Reinforcement learning & fine-tuning on TPUs | The Agent Factory Podcast

Author: Google Cloud Tech

Uploaded: 2025-12-22

Views: 369

Description:

With Gemini 3 crushing benchmarks by training and serving solely on TPUs, we're diving deep into the infrastructure that powers the next generation of AI agents. In this holiday special of The Agent Factory, we go beyond the hype to explore how developers can use TPUs and Reinforcement Learning (RL) to build specialized, production-ready agents at scale.

Join hosts Shir Meir Lador and Don McCasland and special guest Kyle Meggs, Product Manager on the Google TPU Training Team. We break down the "why" and "how" of fine-tuning, the critical role of RL in model alignment and safety, and how Google's TPU architecture offers unmatched efficiency for these complex workloads. Plus, don't miss the hands-on demo of MaxText 2.0 running a GRPO job on TPU infrastructure.

In this episode, you will learn:
1️⃣ Fine-tuning fundamentals: When to choose fine-tuning over prompt engineering (focusing on specialization, privacy, and cost).
2️⃣ The model lifecycle: A clear breakdown of pre-training vs. post-training (SFT & RL), featuring Andrej Karpathy’s "chemistry textbook" analogy.
3️⃣ Reinforcement learning deep dive: When should you use RL? What added value does it bring? What are the latest advancements in the field?
4️⃣ The TPU advantage: How TPU pods and Inter-Chip Interconnect (ICI) solve critical bottlenecks in large-scale fine-tuning.
5️⃣ RL on TPU demo: A technical look at the MaxText 2.0 stack running Reinforcement Learning (GRPO) on Google Cloud TPUs.
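
For readers new to GRPO (Group Relative Policy Optimization), the core idea is that several completions are sampled for the same prompt and each one is scored relative to the others in its group. The snippet below is a minimal sketch of that normalization step only, not the MaxText implementation shown in the demo. The function name, the example rewards, the `eps` constant, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled completions.

    Each completion is scored against its own group:
        advantage_i = (r_i - mean(r)) / (std(r) + eps)
    Population std and eps are illustrative choices (assumptions).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four completions of a single prompt
# (for example, 1.0 when the final answer is correct, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to produce a baseline. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.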

Chapters:
0:00 - Introduction: Gemini 3 and the rise of TPUs
3:13 - Why fine-tune? Specialization and privacy
3:52 - What is fine-tuning? (SFT and RL explained)
5:50 - What is RL and why do we need it?
7:10 - The added value in RL
8:33 - Industry pulse: Why 2025 is the year of RL (DeepSeek-R1, Grok 4, Gemini 3)
10:46 - The challenges of RL: Infrastructure, algorithms, and orchestration
12:52 - Factory floor: How TPUs are designed for scale
15:53 - [Demo] Reinforcement Learning (GRPO) with MaxText 2.0 on TPUs
21:46 - Scaling to 1000+ chips and season wrap up

About The Agent Factory: "The Agent Factory" is a video-first technical podcast for developers, by developers, focused on building production-ready AI agents. We explore how to design, build, deploy, and manage agents that bring real value.

🔗 Resources & links mentioned:
➖ Post-training docs → https://goo.gle/4sbBLAd
➖ Google Cloud TPU (Ironwood) documentation → https://goo.gle/3MMFOCY

🔗 Google Cloud open source code:
➖ MaxText → https://goo.gle/4pcDQt4
➖ GPU recipes → https://goo.gle/495tp4x
➖ TPU recipes → https://goo.gle/4qgMF5U
➖ Andrej Karpathy - Chemistry Analogy → https://goo.gle/4pQcMAO
➖ Paper: "Small Language Models are the Future of Agentic AI" (Nvidia) → https://goo.gle/4qmLQIH
➖ Fine-tuning blog → https://goo.gle/4pR211n

🔔 Follow Shir → https://goo.gle/49SAveB
🔔 Follow Don → https://goo.gle/3KKCrff
🔔 Follow Kyle → https://goo.gle/4j7Mg3k

Join the conversation on social media with the hashtag #TheAgentFactory.

Connect with the community at the Google Developer Program forums. → https://goo.gle/4oP9bmb

Watch more Agent Factory → The Agent Factory

🔔 Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#TPU #ReinforcementLearning #FineTuning

Speakers: Shir Meir Lador, Kyle Meggs, Don McCasland
Products Mentioned: TPU, Gemini 3, MaxText
