Advanced LLM Post-Training: SFT, DPO, Reinforcement Learning w/ Maxime Labonne (Liquid AI)
Author: Youth AI Initiative
Uploaded: 2025-11-24
Views: 94
In this exclusive guest lecture for the Youth AI Initiative, we hosted Maxime Labonne (Head of Post-Training at Liquid AI & Author of the LLM Engineer's Handbook) for a masterclass on the modern Large Language Model (LLM) training pipeline.
Maxime went far beyond the basics, breaking down the exact techniques top labs use to turn base models into powerful, aligned products. He covered the full stack, from dataset creation to cutting-edge techniques like GRPO.
🚀 What You Will Learn:
📌 Supervised Fine-Tuning (SFT)
How to structure instruction data and teach models to follow specific commands.
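For a concrete picture of what "structuring instruction data" means, here is a minimal Python sketch (not from the lecture) that formats an (instruction, response) pair with ChatML-style tags, one common chat-template convention; the exact tags depend on the model family:

```python
# Minimal sketch: turning a raw (instruction, response) pair into a chat
# format for SFT. The ChatML-style tags are one common convention; other
# model families use different templates.
def format_sample(instruction: str, response: str) -> str:
    return (
        "<|im_start|>user\n" + instruction + "<|im_end|>\n"
        "<|im_start|>assistant\n" + response + "<|im_end|>\n"
    )

sample = format_sample(
    "Summarize the water cycle in one sentence.",
    "Water evaporates, condenses into clouds, and returns as precipitation.",
)
print(sample)
```

During SFT, the loss is typically computed only on the assistant tokens, so the model learns to produce responses rather than to repeat prompts.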
📌 Preference Alignment (DPO)
A deep dive into Direct Preference Optimization and how it aligns models with human preferences without training a separate reward model.
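As a rough reference, this is a minimal PyTorch sketch of the DPO objective from Rafailov et al. (2023); the tensor names are illustrative, and each input is the summed log-probability of a completion under either the policy being trained or a frozen reference model:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the DPO loss. Each argument is a tensor of summed
# log-probs for chosen/rejected completions under the trained policy
# or the frozen reference model (names are illustrative).
def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward = beta * (policy log-prob - reference log-prob)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```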
📌 Reinforcement Learning (GRPO) & Reasoning Models
How new “reasoning models” (like DeepSeek-R1) use Group Relative Policy Optimization (GRPO) to “think,” plan, and verify their chain of thought before answering.
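The core trick in GRPO is computing advantages relative to a group of completions sampled for the same prompt, instead of relying on a learned value model. A minimal sketch, with illustrative names:

```python
import torch

# Minimal sketch of GRPO's group-relative advantage (DeepSeek-R1 style):
# sample several completions per prompt, score each with a verifiable
# reward, and normalize rewards within the group.
def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    # rewards: (num_prompts, group_size) scores for the sampled completions
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# e.g. 1.0 = the completion reached a verifiably correct final answer
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```

Completions that beat their group's average get positive advantages, reinforcing whatever chain of thought produced them.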
📌 Efficient Training Techniques
A comparison of LoRA, QLoRA, and Full Fine-Tuning, including how to train models on limited hardware.
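To make the LoRA/QLoRA comparison concrete, here is a minimal sketch of a LoRA adapter wrapped around a frozen linear layer (illustrative, not the talk's code); QLoRA applies the same low-rank update but stores the frozen base weights in 4-bit precision:

```python
import torch
import torch.nn as nn

# Minimal sketch of a LoRA adapter: the frozen weight W is augmented with
# a low-rank update B @ A, so only r * (d_in + d_out) parameters train.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        # A starts small, B starts at zero, so training begins at W exactly.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Only A and B are trained: a rank-8 adapter on a 4096x4096 layer updates about 65K parameters instead of roughly 16.8M, which is what makes fine-tuning on limited hardware practical.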
📌 Dataset Curation
What makes a dataset truly “good”: accuracy, diversity, complexity, filtering, and balancing.
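As one small illustration of the filtering and deduplication steps, here is a sketch of two cheap heuristics (the threshold is hypothetical; real pipelines also score quality, deduplicate semantically, and balance topics):

```python
# Minimal sketch of two cheap curation filters: length-based filtering
# and exact-match deduplication over (instruction, response) pairs.
def curate(samples):
    seen, kept = set(), []
    for s in samples:  # s = {"instruction": ..., "response": ...}
        if len(s["response"].split()) < 3:  # drop trivial answers
            continue
        key = (s["instruction"].strip().lower(),
               s["response"].strip().lower())
        if key in seen:  # exact-match dedup
            continue
        seen.add(key)
        kept.append(s)
    return kept
```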
🌟 About the Youth AI Initiative
The Youth AI Initiative is a free, 6-week intensive AI incubator for the brightest high school students. We bridge the gap between academic theory and real-world application through expert-led curriculum and guest insights from leaders at Microsoft, Hugging Face, Liquid AI, and more.
🌐 Learn More: https://youthaiinitiative.com/
📣 Connect With Us:
LinkedIn: https://www.linkedin.com/company/youth-ai-...
Instagram: https://www.instagram.com/youth_ai_initiat...
Twitter: https://x.com/YouthAIInit
🙏 Special Thanks
A huge thank you to our main sponsor Tam Finans and our community supporter Global Turks AI.
📍 Timestamps:
Introduction 00:00
What is Post-Training 00:53
Supervised Fine-Tuning 05:04
Preference Alignment (DPO) 14:00
Reinforcement Learning (GRPO) 18:10
Conclusion 24:30
Q&A 26:12