
From Prompts to Policies: How RL Builds Better AI Agents [Mahesh Sathiamoorthy] - 731

Author: The TWIML AI Podcast with Sam Charrington

Uploaded: 2025-05-13

Views: 1501

Description:

Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, explains why RL offers a more robust alternative to prompting, and shows how it can improve multi-step tool-use capabilities. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies his team has used, and Bespoke Labs' open-source libraries like Curator. Finally, we touch on the models MiniCheck for hallucination detection and MiniChart for chart-based QA.

🗒️ For the full list of resources for this episode, visit the show notes page: https://twimlai.com/go/731.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confi...


🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: /twimlai
Follow us on LinkedIn: /twimlai
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/


📖 CHAPTERS
===============================
00:00 - Introduction
3:54 - Importance of data
7:50 - RL as a tool in data curation
10:21 - Curator
12:34 - Contemporary applications of reinforcement learning (RL)
22:33 - Improving models with RL fine-tuning
24:05 - Improving multi-turn tool use with RL
26:04 - Advantages of RL
31:06 - Reward shaping
33:50 - Findings in applying RL to tool use
35:42 - Examples of applying RL in tool use
40:57 - Compute costs of RL vs. SFT
43:25 - Future of democratizing agentic tools
46:20 - Evaluation of results
49:45 - How multi-turn tool use differs from single-turn
52:46 - MiniChart and MiniCheck
57:32 - Bespoke Labs
58:57 - Future directions


🔗 LINKS & RESOURCES
===============================
Improving Multi-Turn Tool Use with Reinforcement Learning - https://www.bespokelabs.ai/blog/impro...
Bespoke Curator - https://github.com/bespokelabsai/cura...
Bespoke-Minicheck - https://www.bespokelabs.ai/bespoke-mi...
MiniChart Playground - https://playground.bespokelabs.ai/min...


📸 Camera: https://amzn.to/3TQ3zsg
🎙️Microphone: https://amzn.to/3t5zXeV
🚦Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5
