Another OpenAI Executive Choked When Asked If Sora Was Trained on YouTube Data

“The answer is somehow worse than yes.”

Flummoxed

Yet another OpenAI executive has been caught lacking on camera when asked if the company’s new Sora video generator was trained using YouTube videos.

During a recent talk at Bloomberg’s Tech Summit in San Francisco, OpenAI chief operating officer Brad Lightcap launched into a rambling, word vomit-style monologue in an attempt to deflect questions about Sora’s training data.

“Can you say, and clear up once and for all, whether Sora was trained on YouTube data?” Bloomberg’s Shirin Ghaffary asked the COO, prompting a wordy non-response.

“Yeah, I mean, look, the conversation around data is really important,” Lightcap said. “We obviously, like, need to know, kind of, where data comes from.”

After a long-winded description of a future “content ID system for AI” that would allow creators to opt in and out of their content being used as training data, the executive seemed to come even closer than OpenAI’s chief technology officer Mira Murati to admitting that Sora was trained on data from YouTube.

“So, yeah, we’re looking at this problem,” Lightcap said. “It’s really hard.”

He went on to say that while OpenAI doesn’t “have all the answers” to this “hard” question, it may “by 2026.”

“So no answer on the YouTube,” Ghaffary quipped back. “For now.”

Confirmation Bias

Naturally, Lightcap’s on-camera gaffe draws comparisons to Murati’s similarly cringe-inducing stumble in March, when, in an interview with the Wall Street Journal, the CTO also choked when asked directly if Sora was trained on YouTube data.

“We used publicly available data and licensed data,” Murati said.

“So, videos on YouTube?” the WSJ’s Joanna Stern followed up.

“I’m actually not sure about that,” the CTO responded, and after a protracted back and forth attempted to explain herself by saying that although she thought the data was “publicly available,” she was “not confident” about it.

Following that awkward exchange, Murati confirmed to the newspaper that Shutterstock videos had been used, but the jury’s technically still out on whether YouTube videos were also part of the Sora training data — though as one finance journalist joked, the Lightcap retort all but confirms that it was.

“That is a hard yes in AI speak,” Sherwood News’ Rani Molla tweeted, later adding that “the answer is somehow worse than yes.”

