Reinforcement learning & fine-tuning on TPUs | The Agent Factory Podcast
Author: Google Cloud Tech
Uploaded: 2025-12-22
Views: 369
With Gemini 3 crushing benchmarks by training and serving solely on TPUs, we're diving deep into the infrastructure that powers the next generation of AI agents. In this holiday special of The Agent Factory, we go beyond the hype to explore how developers can use TPUs and Reinforcement Learning (RL) to build specialized, production-ready agents at scale.
Join hosts Shir Meir Lador and Don McCasland and special guest Kyle Meggs, Product Manager on the Google TPU Training Team. We break down the "why" and "how" of fine-tuning, the critical role of RL in model alignment and safety, and how Google's TPU architecture offers unmatched efficiency for these complex workloads. Plus, don't miss the hands-on demo of MaxText 2.0 running a GRPO job on TPU infrastructure.
In this episode, you will learn:
1️⃣ Fine-tuning fundamentals: When to choose fine-tuning over prompt engineering (focusing on specialization, privacy, and cost).
2️⃣ The model lifecycle: A clear breakdown of pre-training vs. post-training (SFT & RL), featuring Andrej Karpathy’s "chemistry textbook" analogy.
3️⃣ Reinforcement learning deep dive: When should you use RL? What added value does it bring? What are the latest advancements in the field?
4️⃣ The TPU advantage: How TPU pods and Inter-Chip Interconnect (ICI) solve critical bottlenecks in large-scale fine-tuning.
5️⃣ RL on TPU demo: A technical look at the MaxText 2.0 stack running Reinforcement Learning (GRPO) on Google Cloud TPUs.
Chapters:
0:00 - Introduction: Gemini 3 and the rise of TPUs
3:13 - Why fine-tune? Specialization and privacy
3:52 - What is fine-tuning? (SFT and RL explained)
5:50 - What is RL and why do we need it?
7:10 - The added value in RL
8:33 - Industry pulse: Why 2025 is the year of RL (DeepSeek-R1, Grok 4, Gemini 3)
10:46 - The challenges of RL: Infrastructure, algorithms, and orchestration
12:52 - Factory floor: How TPUs are designed for scale
15:53 - [Demo] Reinforcement Learning (GRPO) with MaxText 2.0 on TPUs
21:46 - Scaling to 1000+ chips and season wrap up
About The Agent Factory: "The Agent Factory" is a video-first technical podcast for developers, by developers, focused on building production-ready AI agents. We explore how to design, build, deploy, and manage agents that bring real value.
🔗 Resources & links mentioned:
➖ Post-training docs → https://goo.gle/4sbBLAd
➖ Google Cloud TPU (Ironwood) documentation → https://goo.gle/3MMFOCY
🔗 Google Cloud open source code:
➖ MaxText → https://goo.gle/4pcDQt4
➖ GPU recipes → https://goo.gle/495tp4x
➖ TPU recipes → https://goo.gle/4qgMF5U
➖ Andrej Karpathy - Chemistry Analogy → https://goo.gle/4pQcMAO
➖ Paper: "Small Language Models are the Future of Agentic AI" (Nvidia) → https://goo.gle/4qmLQIH
➖ Fine-tuning blog → https://goo.gle/4pR211n
🔔 Follow Shir → https://goo.gle/49SAveB
🔔 Follow Don → https://goo.gle/3KKCrff
🔔 Follow Kyle → https://goo.gle/4j7Mg3k
Join the conversation on social media with the hashtag #TheAgentFactory.
Connect with the community at the Google Developer Program forums. → https://goo.gle/4oP9bmb
Watch more Agent Factory → The Agent Factory
🔔 Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#TPU #ReinforcementLearning #FineTuning
Speakers: Shir Meir Lador, Kyle Meggs, Don McCasland
Products Mentioned: TPU, Gemini 3, MaxText