Soon after OpenAI announced its video-generating artificial intelligence model Sora in February, Nvidia leadership decided to compete.
“We need one Sora like model,” wrote Sanja Fidler, vice president of AI research at Nvidia, in a company Slack channel shared with Proof News and first reported on by 404 Media. In a matter of days, Nvidia assembled more than a hundred workers to help lay the training foundation for a similar “state of the art” video model.
Both Nvidia’s and OpenAI’s artificial intelligence prowess helped the companies ascend into Silicon Valley royalty over the last few years: OpenAI’s 2020 release of GPT-3 made it the fastest-growing internet service ever by some estimates, and the demand for Nvidia’s AI chips has stretched its valuation well into the trillions of dollars.
And yet, despite being flush with cash and engineering talent, Nvidia faced a big hurdle in building its new video model. The project would need millions of videos on which to train.
An investigation by Proof News found that Nvidia's team began curating video datasets from around the internet, ranging in size from hundreds of clips to hundreds of millions. According to the company Slack and internal documents, staff quickly focused on YouTube, home to billions of videos, which Nvidia’s workforce gathered by downloading datasets of previously scraped videos as well as scraping their own.
Nvidia leadership also set their sights on traditional Hollywood. Communications show Nvidia staff discussed how to pull video from Discovery and Netflix and scrape clips from IMDb, an entertainment industry website that features trailers and clips. There’s no indication in the documents Proof reviewed that videos from those services were taken.
Some staff grew concerned that Nvidia, in its hunger for video, might be plowing into unlawful territory. Employees raised questions about copyright, authorization, and legal clearance, Slack discussions show.
Nvidia leadership urged them to push ahead.
“In the meeting today, we get [sic] permissions to download all kind of data,” wrote Ming-Yu Liu, a vice president of research for Nvidia. “Should we download the whole Netflix too? How could we operationalize this?”
In the intensely competitive AI race, companies have invested in more resource-consuming training for their models, but finding enough high-quality material can pose a challenge, sending staff off to scrape large swaths of the internet. By one estimate, the AI industry is poised to run out of new sources of text to train language models as soon as the next couple of years, having consumed humankind’s library.
On the company’s Slack, Nvidia employees discussed “strong indications” that OpenAI had tapped blockbuster films to create Sora, because it generated images that looked like a mashup of creatures from Avatar and Lord of the Rings.
OpenAI did not respond to a request for comment.
According to Slack communications, Nvidia’s scheme was to download videos and process them for a foundation model, which could be fine-tuned for “downstream applications.” The team assembled to identify, download, and process the videos hailed from numerous branches of Nvidia, including GEAR, home to Nvidia’s GR00T group, which is working on creating a humanoid robot that can perform tasks around the house like making juice, playing the drums, and dabbing, according to a video on the company’s site. Team members also came from Nvidia’s autonomous vehicles division, which produces self-driving technology for companies like Mercedes, and Omniverse, which includes Nvidia’s foray into creating digital humans.
By May, Nvidia had built a pipeline for obtaining and readying 80 years of video runtime every day, or what Liu described in an email as “a video data factory” yielding a “human lifetime” of training content a day.
The congratulatory email went to the highest levels of the company. Nvidia’s CEO Jensen Huang replied all: “Great update.”
The video-generating model, internally dubbed Cosmos, has yet to be publicly announced. Internal communications show employees expected its training materials to be ready by the fall.
Part of management’s justification to staff for pursuing such large swaths of training content was that the videos were being used for research purposes, according to Slack communications and someone familiar with the project who asked not to be named for fear of professional repercussions.
“They use ‘research purposes’ to avoid responsibility to use licensed data,” the source said. “I felt it was weird and not ethical.”
Stephanie Matthews, a spokesperson for Nvidia, did not respond directly to a question about how the company squares its internal messaging on the model being used for research purposes with direct references to how it would be used in commercial products.
“We respect the rights of all content creators and are confident that our models and our research efforts are in full compliance with the letter and the spirit of copyright law,” Matthews wrote in an emailed statement. “Copyright law protects particular expressions but not facts, ideas, data, or information. Anyone is free to learn facts, ideas, data, or information from another source and use it to make their own expressions. Fair use also protects the ability to use a work for a transformative purpose, such as model training.”
This isn’t the first time Nvidia has used the work of creatives without their consent. Earlier this year, Proof News found Nvidia, among other tech heavyweights, used a dataset of subtitles culled from YouTube videos to train artificial intelligence models. YouTubers from across the United States and abroad expressed alarm about their content being taken, calling the actions of AI developers “exploitative” and “violating.”
YouTube’s terms of service prohibit scraping its creators’ videos.
At one point, Nvidia workers discussed on Slack concerns that the scale of their scraping might cause YouTube to block their IP address. One worker suggested using BrightData, a company that, among other things, provides proxy IP addresses for web scraping.
Another employee responded that they had found a different way around the problem: restarting a virtual machine on Amazon Web Services. “So, that’s not a problem so far,” they wrote.
Another employee indicated that they would include an update on that solution for “JHH,” an apparent reference to Huang.
In the past year, AI developers have come under fire for gobbling up mountains of digital works, including books, movies, music, and news articles, leading some copyright holders to file lawsuits against tech giants for unauthorized use of their work. The tech companies argue their actions constitute fair use, and much of the litigation is ongoing.
Writers, actors, animators and visual effects artists have sought to build guardrails in their union contracts in hopes of warding off future job losses to AI. But studio agreements have no bearing on the actions of technology companies using these creatives’ work to train video-generating models.
At least one Nvidia employee tasked with the Cosmos training project brought up Hollywood’s “hypersensitivity” to artificial intelligence, circulating articles that warned of potential job losses and efforts to address AI in union bargaining.
"What we are doing will lead to zero publications,” wrote Liu on Slack in response. “Given we are not publishing anything, there will be no negative sentiment.”
Nvidia has negotiated deals for copyrighted content for other projects. For example, the company is working with Shutterstock, which has a library of millions of videos and images, to build 3D-generative models. The company also has a partnership with Getty to use that company’s images.
When another employee asked on Slack if they’d gotten legal approval to download YouTube data, management responded right away.
"This is an executive decision,” Liu said. “We have an umbrella approval for all of the data."