Building the Next Generation of Conversational AI

a16z

andreessen horowitz

Author: a16z

Uploaded: Mar 15, 2025

Views: 10,681

Description:

Inside the Code: Ankit Kumar (Sesame) & Anjney Midha (a16z) on the Future of Voice AI

What goes into building a truly natural-sounding AI voice? In this episode, Sesame’s cofounder and CTO, Ankit Kumar, joins a16z’s Anjney Midha for a deep dive into the research and engineering behind their voice technology.

They discuss the technical challenges of real-time speech generation, the trade-offs in balancing personality with efficiency, and why the team is open-sourcing key components of their model. Ankit breaks down the complexities of multimodal AI, full-duplex conversation modeling, and the computational optimizations that enable low-latency interactions. They also explore the evolution of natural language as a user interface and its potential to redefine human-computer interaction.

Plus, we take audience questions on everything from scaling laws in speech synthesis to the role of in-context learning in making AI voices more expressive.

Key Takeaways:
How Sesame achieves natural voice interactions through real-time speech generation.
The impact of open-sourcing their speech model and what it means for AI research.
The role of full-duplex modeling in improving AI responsiveness.
How computational efficiency and system latency shape AI conversation quality.
The growing role of natural language as a user interface in AI-driven experiences.

For anyone interested in AI and voice technology, this episode offers an in-depth look at the latest advancements pushing the boundaries of human-computer interaction.

Follow everyone on X:
Ankit Kumar - https://x.com/_apkumar
Anjney Midha - https://x.com/anjneymidha

Check out everything a16z is doing with artificial intelligence, including articles, projects, and more podcasts here – https://a16z.com/ai/

Chapters:
0:00 - 00:51 | Intro
00:52 - 04:58 | Challenges Of Building
04:59 - 07:45 | Q + A: What Was Done To Bridge Transcription And Text Processing?
07:46 - 09:57 | How Is Sesame So Much Better Than Others?
09:58 - 12:42 | Challenges In Making AI Accessible To All
12:43 - 14:10 | Great Researchers Prioritize User Experience
14:11 - 15:47 | What Is Good Taste In ML?
15:48 - 17:45 | Problems That Can Be Solved That Add Value To The World
17:46 - 26:25 | Open Source Audio For Speech Generation
26:26 - 34:00 | Contextual Speech vs Text to Speech, Differences
34:01 - 35:50 | Value Proposition Of Glasses With No Friction
35:51 - 38:00 | General Purpose API vs Open Source Model
38:01 - 40:47 | Creating High Quality APIs
40:48 - 45:54 | Companions And How Sesame Will Handle Context Retention In Long Conversations
45:55 - 46:59 | Talent: What It Takes To Become A Part Of The Sesame Team
47:00 - 54:37 | How Scaling Laws For Speech Differ From Text
54:38 - 58:33 | How Can An Organic Conversation Be Preserved Using A Voice Companion?
58:34 - 1:03:52 | App Building Technology: Roadmap
1:03:53 - 1:09:09 | Architectures and Transformers
1:09:10 - 1:15:56 | The Focus On Personality, And The Differences In Products
1:15:57 - 1:25:25 | New AI Interface: Interacting With AI Companion
1:25:26 - 1:26:56 | Companion Challenges
1:26:57 - 1:29:22 | Computing Interface Of The Future
1:29:23 - 1:31:45 | Focused Product Experience Built By Small Teams
1:31:46 - 1:36:13 | Join Sesame If You Want To Make A Consumer Product People Love

