How language model post-training is done today

Author: Interconnects AI

Uploaded: 2025-01-08

Views: 11600

Description:

I’m far more optimistic about the state of open recipes for, and knowledge of, post-training at the start of 2025 than I was at the start of 2024. Last year, one of my first posts argued that open post-training won’t match the likes of GPT-4. This is still the case, but now we at least better understand the scope of what we are working with.

It’s a good time to record an overview of what post-training looks like today. I gave a version of this talk for the first time in 2023, when it felt like a review of the InstructGPT paper rather than something grounded in reproduced results from the literature. In 2024, the scientific community made substantial progress in actually training these models and expanding the frontier of knowledge. Doing one of these talks every year feels like a good way to keep tabs on the state of play (whereas last year, I just had a bunch of links to add to the conversation on where to start).

00:00 Introduction
10:00 Prompts & Skill Selection
14:19 Instruction Finetuning
21:45 Preference Finetuning
36:17 Reinforcement Finetuning
45:28 Open Questions
52:02 Wrap Up

Slides: https://docs.google.com/presentation/...

More context: https://www.interconnects.ai/p/the-st...

Get Interconnects (https://www.interconnects.ai/)...
... on YouTube: / @interconnects
... on Twitter: https://x.com/interconnectsai
... on LinkedIn: / interconnects-ai
... on Spotify: https://open.spotify.com/show/2UE6s7w...
... on Apple Podcasts: https://podcasts.apple.com/us/podcast...
