⚡️Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect

Author: Latent Space

Uploaded: 2025-05-22

Views: 4287

Description:

Claude 4 controversies, reactions, LMArena and all that jazz.

References:

Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment: https://x.com/willccbb/status/1925389...

Verifiers: https://github.com/willccbb/verifiers

Timestamps

00:00 Introduction to the Podcast and Guests

01:00 Discussion on Claude 4 and AI Models

03:07 Extended Thinking and Tool Use in AI

06:47 Technical Highlights and Model Trustworthiness

10:31 Thinking Budgets and Their Implications

13:38 Controversy Surrounding Opus and AI Ethics

18:49 Reflections on AI Tools and Their Limitations

21:58 The Chaos of Predictive Systems

22:56 Marketing and Safety in AI Models

24:30 Evaluating AI Companies and Their Strategies

25:53 The Role of Academia in AI Evaluations

27:43 Teaching Taste in Research

28:41 Making Educated Bets in AI Research

30:12 Recent Developments in Multi-Turn Tool Use

32:50 Incentivizing Tool Use in AI Models

34:45 The Future of Reward Models in AI

39:10 Exploring Flexible Reward Systems
