Why LLMs Shouldn’t Follow Instructions (But Do)
Author: ML Guy
Uploaded: 2026-01-11
Views: 40
A pretrained language model can predict text, but it doesn’t know how to help you.
In this video, we break down how raw LLMs are transformed into instruction-following assistants like ChatGPT. You’ll learn how fine-tuning, human preference data, and reinforcement learning from human feedback (RLHF) reshape a model’s behavior — without changing its architecture.
We cover:
Why next-token prediction alone is not enough
Supervised fine-tuning with instruction–response pairs (see the first sketch after this list)
How human rankings become a reward model (see the second sketch below)
What RLHF actually optimizes (and what it doesn’t)
How safety, refusals, and “helpfulness” emerge statistically
Common misconceptions about alignment and hard-coded rules
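The supervised fine-tuning step above uses the same next-token objective the base model was pretrained with, just restricted to curated instruction-response pairs, with the loss computed only on the response tokens. A minimal sketch, assuming the Hugging Face transformers library and gpt2 as a stand-in model; the prompt and response strings are made up for illustration:

```python
# Minimal sketch of supervised fine-tuning (SFT) on one instruction-response pair.
# Assumes the Hugging Face transformers library; "gpt2" is a stand-in model and
# the example strings are illustrative, not taken from the video.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Instruction: Summarize photosynthesis in one sentence.\nResponse:"
response = " Plants convert sunlight, water, and CO2 into sugar and oxygen."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids

# Standard next-token prediction loss, but the prompt positions are masked
# with -100 (ignored by the loss), so gradients come only from the response.
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

outputs = model(input_ids=full_ids, labels=labels)
outputs.loss.backward()  # an optimizer step would follow in a real SFT loop
```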
This episode connects training objectives to real-world behavior — and explains why alignment is one of the hardest unsolved problems in modern AI.
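On the reward-model side, a common recipe (not necessarily the exact one shown in the video) trains a scalar scorer so that the response labelers ranked higher gets a higher score than the one they ranked lower, using a Bradley-Terry style pairwise loss. A toy PyTorch sketch, with a hypothetical ToyRewardModel standing in for the LLM backbone that real pipelines reuse:

```python
# Minimal sketch of turning pairwise human rankings into a reward model.
# The embedding "backbone" here is a toy stand-in; real reward models reuse
# the fine-tuned LLM with a scalar head on top.
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, 1)  # one scalar "reward" per response

    def forward(self, token_ids):
        # Mean-pool token embeddings, then project to a single score.
        return self.head(self.embed(token_ids).mean(dim=1)).squeeze(-1)

reward_model = ToyRewardModel()

# Hypothetical token ids for a response the labeler preferred ("chosen")
# and one they ranked lower ("rejected") for the same prompt.
chosen = torch.randint(0, 1000, (4, 32))
rejected = torch.randint(0, 1000, (4, 32))

# Bradley-Terry style pairwise loss: push the chosen response's score above
# the rejected one's. This is how rankings become a differentiable reward signal.
loss = -torch.nn.functional.logsigmoid(
    reward_model(chosen) - reward_model(rejected)
).mean()
loss.backward()
```

The later RLHF stage then tunes the assistant to maximize this learned reward, typically with a KL penalty that keeps it close to the supervised fine-tuned model, which is a large part of what "RLHF actually optimizes" in practice.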