Advanced LLM Post-Training: SFT, DPO, Reinforcement Learning w/ Maxime Labonne (Liquid AI)
Author: Youth AI Initiative
Uploaded: 2025-11-24
Views: 94
In this exclusive guest lecture for the Youth AI Initiative, we hosted Maxime Labonne (Head of Post-Training at Liquid AI & Author of the LLM Engineer's Handbook) for a masterclass on the modern Large Language Model (LLM) training pipeline.
Maxime went far beyond the basics, breaking down the exact techniques top labs use to turn base models into powerful, aligned products. He covered the full stack, from dataset creation to cutting-edge techniques like GRPO.
🚀 What You Will Learn:
📌 Supervised Fine-Tuning (SFT)
How to structure instruction data and teach models to follow specific commands.
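For a concrete picture of what "structuring instruction data" means, here is a minimal Python sketch (not from the lecture) that formats an (instruction, response) pair with ChatML-style tags, one common chat-template convention; the exact tags depend on the model family:

```python
# Minimal sketch: turning a raw (instruction, response) pair into a chat
# format for SFT. The ChatML-style tags are one common convention; other
# model families use different templates.
def format_sample(instruction: str, response: str) -> str:
    return (
        "<|im_start|>user\n" + instruction + "<|im_end|>\n"
        "<|im_start|>assistant\n" + response + "<|im_end|>\n"
    )

sample = format_sample(
    "Summarize the water cycle in one sentence.",
    "Water evaporates, condenses into clouds, and returns as precipitation.",
)
print(sample)
```

During SFT, the loss is typically computed only on the assistant tokens, so the model learns to produce responses rather than to repeat prompts.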
📌 Preference Alignment (DPO)
A deep dive into Direct Preference Optimization and how it aligns models with human preferences without training a separate reward model.
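As a rough reference, this is a minimal PyTorch sketch of the DPO objective from Rafailov et al. (2023); the tensor names are illustrative, and each input is the summed log-probability of a completion under either the policy being trained or a frozen reference model:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the DPO loss. Each argument is a tensor of summed
# log-probs for chosen/rejected completions under the trained policy
# or the frozen reference model (names are illustrative).
def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward = beta * (policy log-prob - reference log-prob)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```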
📌 Reinforcement Learning (GRPO) & Reasoning Models
How new “reasoning models” (like DeepSeek-R1) use Group Relative Policy Optimization (GRPO) to “think,” plan, and verify their chain of thought before answering.
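The core trick in GRPO is computing advantages relative to a group of completions sampled for the same prompt, instead of relying on a learned value model. A minimal sketch, with illustrative names:

```python
import torch

# Minimal sketch of GRPO's group-relative advantage (DeepSeek-R1 style):
# sample several completions per prompt, score each with a verifiable
# reward, and normalize rewards within the group.
def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    # rewards: (num_prompts, group_size) scores for the sampled completions
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# e.g. 1.0 = the completion reached a verifiably correct final answer
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```

Completions that beat their group's average get positive advantages, reinforcing whatever chain of thought produced them.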
📌 Efficient Training Techniques
A comparison of LoRA, QLoRA, and Full Fine-Tuning, including how to train models on limited hardware.
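To make the LoRA/QLoRA comparison concrete, here is a minimal sketch of a LoRA adapter wrapped around a frozen linear layer (illustrative, not the talk's code); QLoRA applies the same low-rank update but stores the frozen base weights in 4-bit precision:

```python
import torch
import torch.nn as nn

# Minimal sketch of a LoRA adapter: the frozen weight W is augmented with
# a low-rank update B @ A, so only r * (d_in + d_out) parameters train.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        # A starts small, B starts at zero, so training begins at W exactly.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Only A and B are trained: a rank-8 adapter on a 4096x4096 layer updates about 65K parameters instead of roughly 16.8M, which is what makes fine-tuning on limited hardware practical.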
📌 Dataset Curation
What makes a dataset truly “good”: accuracy, diversity, complexity, filtering, and balancing.
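As one small illustration of the filtering and deduplication steps, here is a sketch of two cheap heuristics (the threshold is hypothetical; real pipelines also score quality, deduplicate semantically, and balance topics):

```python
# Minimal sketch of two cheap curation filters: length-based filtering
# and exact-match deduplication over (instruction, response) pairs.
def curate(samples):
    seen, kept = set(), []
    for s in samples:  # s = {"instruction": ..., "response": ...}
        if len(s["response"].split()) < 3:  # drop trivial answers
            continue
        key = (s["instruction"].strip().lower(),
               s["response"].strip().lower())
        if key in seen:  # exact-match dedup
            continue
        seen.add(key)
        kept.append(s)
    return kept
```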
🌟 About the Youth AI Initiative
The Youth AI Initiative is a free, 6-week intensive AI incubator for the brightest high school students. We bridge the gap between academic theory and real-world application through expert-led curriculum and guest insights from leaders at Microsoft, Hugging Face, Liquid AI, and more.
🌐 Learn More: https://youthaiinitiative.com/
📣 Connect With Us:
LinkedIn: https://www.linkedin.com/company/youth-ai-...
Instagram: https://www.instagram.com/youth_ai_initiat...
Twitter: https://x.com/YouthAIInit
🙏 Special Thanks
A huge thank you to our main sponsor Tam Finans and our community supporter Global Turks AI.
📍 Timestamps:
Introduction 00:00
What is Post-Training 00:53
Supervised Fine-Tuning 05:04
Preference Alignment (DPO) 14:00
Reinforcement Learning (GRPO) 18:10
Conclusion 24:30
Q&A 26:12