Proof Ingredients: How AI companies stole your YouTube videos
Author: Proof News
Uploaded: 2024-07-24
Views: 334
YouTubers have long wondered whether AI companies have scraped their work to train models, and Proof News investigative reporter Annie Gilbertson has proven it. She found that big companies, including Apple, Anthropic, Nvidia, and Bloomberg, have all used a dataset containing the transcripts of more than 170,000 YouTube videos, including videos by megastars like MrBeast, Marques Brownlee, and PewDiePie.
In this interview, Proof founder Julia Angwin talks to Annie about the investigation and what went into it. The interview is the first in our new series, Proof Ingredients. In this series, Julia will talk to journalists, researchers and content creators about what their investigations are made of, walking through the hypothesis, sample size, techniques, key findings, and limitations. Hopefully these ingredients help you evaluate our work and give you a framework for judging other news, too.
Ingredients
Hypothesis: AI companies are using YouTube videos to build models that may come to compete against YouTube creators.
Sample size: A 5.7 GB (489-million-word) training dataset called YouTube Subtitles.
Techniques: We linked subtitles in the dataset to videos on YouTube to determine whose creative material was used to train AI models. We found evidence that AI companies used the data in white papers and online posts.
Key findings: The training data contained transcripts of 173,536 YouTube videos, more than 12,000 of which have since been deleted from the platform but were still ingested by AI models.
Limitations: AI companies do not often disclose what data they use to train their models, so we are unable to produce a comprehensive list of companies that used this dataset.
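To make the linking technique concrete, here is a minimal sketch in Python of how subtitle entries might be matched to channels. It is an illustration under stated assumptions, not Proof's actual code: the function names, the video IDs, and the channel names are all hypothetical placeholders, and it assumes that each dataset entry carries a YouTube video ID and that channel metadata has been looked up separately.

```python
from collections import Counter


def watch_url(video_id: str) -> str:
    """Build a standard YouTube watch URL from a video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


def link_to_channels(dataset_ids, metadata):
    """Tally videos per channel.

    IDs with no metadata entry are collected separately, since a
    missing lookup may mean the video is no longer on the platform.
    """
    per_channel = Counter()
    unavailable = []
    for video_id in dataset_ids:
        channel = metadata.get(video_id)
        if channel is None:
            unavailable.append(video_id)
        else:
            per_channel[channel] += 1
    return per_channel, unavailable


# Hypothetical placeholder IDs and channels, not real dataset entries.
dataset_ids = ["id_aaa", "id_bbb", "id_ccc", "id_ddd"]
metadata = {"id_aaa": "Channel A", "id_bbb": "Channel A", "id_ccc": "Channel B"}

per_channel, unavailable = link_to_channels(dataset_ids, metadata)
print(per_channel.most_common())
print([watch_url(v) for v in unavailable])
```

A missing metadata entry is only a hint that a video was deleted. It could also be private or region-restricted, so a real analysis would need to check each case.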
Why we think news needs an ingredients label
• What's in your news?
Links
Full story on Proof News
https://www.proofnews.org/apple-nvidi...
Search tool — see if you or your favorite YouTuber were used by AI giants
https://www.proofnews.org/youtube-ai-...
Research paper about The Pile published by EleutherAI
https://arxiv.org/abs/2101.00027
NYT article about OpenAI and Google’s use of YouTube transcripts in AI training
https://www.nytimes.com/2024/04/06/te...
WSJ interview with OpenAI CTO Mira Murati
• OpenAI's Sora Made Me Crazy AI Videos—Then...
https://www.proofnews.org/
/ proof_news
/ proof__news
Join us in making trustworthy, verifiable information the new baseline:
https://www.proofnews.org/donate/