From "Saying" to "Doing": Building Infra for Physical AI | PhD Kaylin Song with Tommaso Di Bartolo
Author: Phygtl
Uploaded: 2025-12-01
Views: 43
In this episode of The Physical AI, host Tommaso Di Bartolo sits down with Yueqi (Kaylyn) Song, a PhD researcher at Carnegie Mellon University’s Language Technologies Institute. They dive deep into the next evolution of embodied intelligence: Generalist Agents.
We explore why scaling Large Language Models (LLMs) isn't enough to solve real-world reasoning and how we are moving away from pixel-based web scraping toward structured "Physical APIs." Kaylyn explains her groundbreaking work on the Agent Data Protocol (ADP), the infrastructure layer that turns messy data into a standardized language for robots and digital agents.
🚀 In this episode, we cover:
The Agent Data Protocol (ADP): Treating data formats as infrastructure to unify browsing, coding, and API actions.
Imagination vs. Hallucination: How to build trust, provenance, and safety into autonomous agents.
The End of Screenshots: Why the future of agents lies in structured APIs, not visual scraping.
The Reasoning Gap: Why multi-modal models fail at simple visual puzzles and how to fix it.
Physical AI Future: How humans will collaborate with robots in an unstructured physical world.
0:00 - Introduction: Where AI meets the real world
2:03 - What is the Agent Data Protocol (ADP)?
6:30 - Preventing AI Hallucinations in Physical Agents
9:26 - Moving Beyond Browsing: APIs vs. Screenshots
13:45 - The concept of "Physical APIs"
17:27 - Why LLMs struggle with spatial reasoning (Visual Puzzles)
26:30 - The future of Human-Agent Collaboration