Nvidia Scrapes YouTube, Eyes Netflix, Discovery to Train New AI Video Model
Author: Proof News
Uploaded: 2024-08-09
Views: 844
Soon after OpenAI announced its video-generating artificial intelligence model Sora in February, Nvidia leadership decided to compete.
“We need one Sora like model,” wrote Sanja Fidler, vice president of AI research at Nvidia, in a company Slack channel shared with Proof News and first reported on by 404 Media. In a matter of days, Nvidia assembled more than a hundred workers to help lay the training foundation for a similar “state of the art” video model.
An investigation by Proof News found that Nvidia's team began curating video datasets from around the internet, ranging in size from hundreds of clips to hundreds of millions. According to the company Slack and internal documents, staff quickly focused on YouTube, home to billions of videos, which Nvidia’s workforce gathered by downloading datasets of previously scraped videos as well as scraping their own. They also discussed how to pull video from Discovery and Netflix.
Ingredients
Hypothesis: Nvidia downloaded millions of videos without permission from YouTube and potentially other sources in order to create a huge training set for a Sora-like video generation model.
Sample size: Internal Nvidia communications, consisting of Slack messages and emails, shared with Proof News. We are omitting the exact size of our sample to protect our source.
Techniques: We read through a large number of Slack messages and emails in order to establish whether the company had permission to use the videos and how it intended to use them.
Key findings: Nvidia built what Ming-Yu Liu, a vice president of research at Nvidia, described in an email as “a video data factory” yielding a “human lifetime” worth of training content a day. Communications described the model as a foundation for commercial applications.
Limitations: We do not know if Nvidia obtained video from sources beyond YouTube and a handful of datasets mentioned in communications. Nvidia did not respond directly to our questions.
Why we think news needs an ingredients label
• What's in your news?
Links
Read the full investigation
https://www.proofnews.org/nvidia-scra...
Read 404 Media’s report
https://www.404media.co/nvidia-ai-scr...
Watch our previous report on tech companies using YouTube to train AI
• Was your favorite YouTube channel used to ...
https://www.proofnews.org/
/ proof_news
/ proof__news
Join us in making trustworthy, verifiable information the new baseline:
https://www.proofnews.org/donate/