Congress Wants Tech Companies to Pay Up for AI Training Data

Do AI companies need to pay for the training data that powers their generative AI systems? The question is hotly contested in Silicon Valley and in a wave of lawsuits filed against tech behemoths like Meta, Google, and OpenAI. In Washington, DC, though, there seems to be a growing consensus that the tech giants need to cough up.

Today, at a Senate hearing on AI’s impact on journalism, lawmakers from both sides of the aisle agreed that OpenAI and others should pay media outlets for using their work in AI projects. “It’s not only morally right,” said Richard Blumenthal, the Democrat who chairs the Judiciary Subcommittee on Privacy, Technology, and the Law that held the hearing. “It’s legally required.”

Josh Hawley, a Republican working with Blumenthal on AI legislation, agreed. “It shouldn’t be that just because the biggest companies in the world want to gobble up your data, they should be able to do it,” he said.

Media industry leaders at the hearing today described how AI companies were imperiling their industry by using their work without compensation. Curtis LeGeyt, CEO of the National Association of Broadcasters, Danielle Coffey, CEO of the News Media Alliance, and Roger Lynch, CEO of Condé Nast, all spoke in favor of licensing. (WIRED is owned by Condé Nast.)

Coffey claimed that AI companies “eviscerate the quality content they feed upon,” and Lynch characterized training data scraped without permission as “stolen goods.” Coffey and Lynch also both said that they believe AI companies are infringing on copyright under current law. Lynch urged lawmakers to clarify that using journalistic content without first brokering licensing agreements is not protected by fair use, a legal doctrine that permits some unlicensed uses of copyrighted material under certain conditions.

Common Ground

Senate hearings can be adversarial, but the mood today was largely congenial. The lawmakers and media industry insiders often applauded each other’s statements. “If Congress could clarify that the use of our content, or other publisher content, for the training and output of AI models is not fair use, then the free market will take care of the rest,” Lynch said at one point. “That seems eminently reasonable to me,” Hawley replied.

Journalism professor Jeff Jarvis was the hearing’s only discordant voice. He asserted that training on data obtained without payment is, indeed, fair use, and spoke against compulsory licensing, arguing that it would damage the information ecosystem rather than safeguard it. “I must say that I am offended to see publishers lobby for protectionist legislation, trading on the political capital earned through journalism,” he said, jabbing at his fellow speakers. (Jarvis was also the subject of the hearing’s only truly contentious line of questioning, from Republican Marsha Blackburn, who needled Jarvis about whether AI is biased against conservatives and, as evidence, recited an AI-generated poem praising President Biden.)

Outside of the committee room, there is less agreement that mandatory licensing is necessary. OpenAI and other AI companies have argued that it’s not viable to license all training data, and some independent AI experts agree.
