Finbarr Timbers on the Future of Reinforcement Learning

Author: Interconnects AI

Uploaded: 2024-12-05

Views: 1591

Description:

Finbarr Timbers is an AI researcher who writes one of the technical AI blogs I’ve been recommending for a long time, and he has a variety of experience at top AI labs, including DeepMind and Midjourney. The goal of this interview was to do a few things:
1. Revisit what reinforcement learning (RL) actually is, its origins, and its motivations.
2. Contextualize the major breakthroughs of deep RL in the last decade, from DQN for Atari to AlphaZero to ChatGPT. How could we have seen the resurgence coming? (see the timeline below for the major events we cover)
3. Cover modern uses for RL, including o1 and RLHF, and the future of finetuning all ML models.
4. Address some of the critiques like “RL doesn’t work yet.”

Full episode, links, and transcript: https://www.interconnects.ai/p/finbar...

Chapters:
00:00:00 Introduction
00:02:14 Reinforcement Learning Fundamentals
00:09:03 The Bitter Lesson
00:12:07 Reward Modeling and Its Challenges in RL
00:16:03 Historical Milestones in Deep RL
00:21:18 OpenAI Five and Challenges in Complex RL Environments
00:25:24 Recent-ish Developments in RL: MuZero, Decision Transformer, and RLHF
00:30:29 OpenAI's o1 and Exploration in Language Models
00:40:00 Tülu 3 and Challenges in RL Training for Language Models
00:46:48 Comparing Different AI Assistants
00:49:44 Management in AI Research
00:55:30 Building Effective AI Teams
01:01:55 The Need for Personal Branding

Timeline of RL and what was happening at the time
In the last decade of deep RL, there have been a few phases.
Era 1: Deep RL fundamentals, when modern algorithms were designed and proven.
Era 2: Major projects, including AlphaZero, OpenAI Five, and all the projects that put RL on the map.
Era 3: Slowdown, when DeepMind and OpenAI no longer had major RL projects and RL's cultural relevance declined.
Era 4: RLHF and widening success, RL’s new life post-ChatGPT.

Spanning these eras are the following events. The list is incomplete, but enough to inspire a conversation.

Early era: TD-Gammon, REINFORCE, etc.
2013: Deep Q Learning (Atari)
2014: Google acquires DeepMind
2016: AlphaGo defeats Lee Sedol
2017: PPO paper, AlphaZero (no human data)
2018: OpenAI Five, GPT-2
2019: AlphaStar, early papers on robotic sim2real with RL (see blog post)
2020: MuZero
2021: Decision Transformer
2022: ChatGPT, sim2real continues
2023: Scaling laws for RL (blog post), doubts about RL
2024: o1, post-training, RL’s bloom

