Creating a large dataset for pretraining LLMs by Guilherme Penedo

Author: Data Makers Fest

Uploaded: 2025-04-24

Views: 117

Description:

How do you build a dataset capable of training a powerful Large Language Model (LLM)? In this Data Makers Fest talk, Guilherme Penedo explores the essential steps in creating large-scale pretraining datasets for LLMs.

The session covers key insights from recent dataset projects such as RefinedWeb, Dolma, and Yi, as well as open-source tools such as Datatrove that streamline collecting massive text datasets and scaling them up.

Watch the full video to understand the fundamentals of dataset curation and how it impacts LLM performance.

::::::
If you love watching content like this, consider joining us in person at the next event: www.datamakersfest.com

👉 FOLLOW US
Instagram: /datamakersfest
LinkedIn: /data-makers-fest

Our channel features talks for anyone building products and services with and around data. Subscribe to our channel for videos on Data Science, Machine Learning, AI, Data Engineering, and more.

Data Makers Fest videos may be used for non-commercial purposes under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0). To use the talk for other purposes, please contact us at [email protected].

#datamakersfest #datascience #ai #machinelearning #dataengineering