[SPCL_Bcast] Data Selection - Data Challenges when Training Generative Models
Author: Scalable Parallel Computing Lab, SPCL @ ETH Zurich
Uploaded: 2025-05-15
Views: 132
Speaker: Theodoros Rekatsinas (Axelera AI)
Venue: SPCL_Bcast #57, recorded on 8th May 2025
Abstract: This talk explores how strategic data selection can improve the efficiency of training generative AI models. I will cover approaches for both pre-training and fine-tuning that achieve performance comparable to full training while using only a fraction of the data. The talk covers key filtering techniques and data selection methods for efficient pre-training, as well as the connection between data selection and optimal transport for fine-tuning. I will conclude with promising future directions for adaptive data selection research.
See https://spcl.inf.ethz.ch/Bcast/ for more talks.