⚡️Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect

Author: Latent Space

Uploaded: 2025-05-22

Views: 4287

Description:

Claude 4 controversies, reactions, LMArena and all that jazz.

References:

Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment: https://x.com/willccbb/status/1925389...

Verifiers: https://github.com/willccbb/verifiers

Timestamps

00:00 Introduction to the Podcast and Guests

01:00 Discussion on Claude 4 and AI Models

03:07 Extended Thinking and Tool Use in AI

06:47 Technical Highlights and Model Trustworthiness

10:31 Thinking Budgets and Their Implications

13:38 Controversy Surrounding Opus and AI Ethics

18:49 Reflections on AI Tools and Their Limitations

21:58 The Chaos of Predictive Systems

22:56 Marketing and Safety in AI Models

24:30 Evaluating AI Companies and Their Strategies

25:53 The Role of Academia in AI Evaluations

27:43 Teaching Taste in Research

28:41 Making Educated Bets in AI Research

30:12 Recent Developments in Multi-Turn Tool Use

32:50 Incentivizing Tool Use in AI Models

34:45 The Future of Reward Models in AI

39:10 Exploring Flexible Reward Systems

