The Irony of RL in LLMs (And its insane new Meta)
Author: bycloud
Uploaded: 2026-01-21
Views: 10060
Start learning cyber security with TryHackMe: https://tryhackme.com/bycloud Use my code "BYCLOUD25" to get 25% off on annual subscription!
This video breaks down what's wrong with scaling RL for LLMs, especially as a path toward AGI, and why RL still matters. RL is noisy and can hurt generalization, yet it enables exploration and self-correction that pretraining can’t, which leaves this direction stuck between a rock and a hard place. We’ll also look at why LoRA is becoming the practical way to do RL cheaply: swappable adapters that can match full fine-tuning on reasoning and make personalized agents easier to deploy, which could be a promising way to apply RL on a massive scale.
my latest project: Intuitive AI Academy
https://intuitiveai.academy/
code "NYNM" for 50% off forever (limited to 50)
Dwarkesh Podcast w/ AK
[YouTube] • Andrej Karpathy — “We’re summoning ghosts,...
Dwarkesh Podcast w/ Ilya
[YouTube] • Ilya Sutskever – We're moving from the age...
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
[Paper] https://arxiv.org/abs/2506.01939
The Path Not Taken: RLVR Provably Learns Off the Principals
[Paper] https://arxiv.org/abs/2511.08567
LoRA Without Regret
[Blog] https://thinkingmachines.ai/blog/lora/
Tina: Tiny Reasoning Models via LoRA
[Paper] https://arxiv.org/abs/2504.15777
Tinker
[Website] https://thinkingmachines.ai/tinker/
My Newsletter
https://mail.bycloud.ai/
My Patreon
/ bycloud
Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI
This video is supported by the kind Patrons & YouTube Members:
🙏Spam Maj, Alex, Chris LeDoux, DX Research Group, Poof N' Inu, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon, Lame Plane, Matej Macak
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Business Inquiries] bycloud@smoothmedia.co
[Profile & Banner Art] / pygm7
[Video Editor] Abhay and @Booga04
[Ko-fi] https://ko-fi.com/bycloudai