Scaling AI: From Custom Speech Stacks to Evaluating Agents with LLMs
Author: Fonzi AI Engineering Community
Uploaded: 2026-01-20
In this technical deep dive, experts from Suki, Robinhood, and Databricks share first-hand insights on building and scaling modern AI systems. We cover the full lifecycle of AI development: from designing domain-specific speech stacks and foundational models to implementing rigorous evaluation frameworks for autonomous agents.
Timestamps:
0:00 - Introduction
0:25 - Part 1: Building a Speech Stack (Healthcare focus)
4:15 - Architecture & Workflow of a Speech Assistant
8:30 - Model Selection: Transformers vs. Distillation
12:46 - Part 2: Building Foundational Models (The "Why")
17:20 - Feature Exploration & Data Strategies
22:15 - Avoiding Common Pitfalls: Attention vs. Causation
26:42 - Part 3: Evaluating Agents & LLM-as-a-Judge
31:10 - Intro to MLflow for GenAI Evaluation
35:40 - Solving Bias in LLM Judges (Positional & Verbosity)
41:31 - Final Summary & Future Trends
What You’ll Learn:
Part 1: Building a Speech Stack for High-Stakes Domains (Suki)
Medical Domain Challenges: Handling complex terminology, background noise, and diverse accents.
The Architecture: A look at the full stack, from on-device wake-word detection to back-end state management.
Optimization: Why modeling is only about 30% of the effort, and how distillation helps hit roughly 500 ms latency (a minimal distillation sketch follows this list).
Data Strategy: Why quality beats quantity and the importance of expert verification from day one.
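To make the distillation point concrete, here is a minimal, hypothetical sketch of response-based knowledge distillation: a large "teacher" model supervises a small, low-latency "student". The model shapes, loss weights, and temperature are illustrative assumptions, not Suki's actual stack.

```python
# Hypothetical distillation sketch: a small student learns from a large teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL loss (teacher) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                              # rescale gradients by T^2 (standard trick)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy stand-ins for a large model and its distilled, latency-friendly student.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

features = torch.randn(32, 128)              # placeholder input batch
labels = torch.randint(0, 10, (32,))         # placeholder ground-truth labels

with torch.no_grad():
    t_logits = teacher(features)             # teacher runs offline; only student ships
loss = distillation_loss(student(features), t_logits, labels)
loss.backward()
opt.step()
```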
Part 2: Designing Foundational Models for the Enterprise (Robinhood)
The Power of Representations: Why foundational models accelerate experimentation across different company use cases (see the frozen-encoder sketch after this list).
Behavioral Encoding: Moving beyond proxy signals to capture true user behavior.
Caveats & Iteration: Understanding why "Attention is not Causation" and how to use ablations to find hidden correlations.
Efficiency: Using large models as "teachers" to distill smaller, cost-effective production models.
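One common way representations speed up experimentation is to freeze a shared encoder and train only cheap task-specific heads on its embeddings. The sketch below is an assumption-laden illustration of that pattern, not Robinhood's actual code; all class names and dimensions are hypothetical.

```python
# Illustrative sketch: reuse a frozen, shared "behavior" encoder across use cases.
import torch
import torch.nn as nn

class FoundationalEncoder(nn.Module):
    """Stand-in for a shared model mapping raw behavioral features to embeddings."""
    def __init__(self, in_dim=256, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
    def forward(self, x):
        return self.net(x)

encoder = FoundationalEncoder().eval()
for p in encoder.parameters():               # freeze: downstream teams don't retrain it
    p.requires_grad_(False)

# Each use case (fraud, churn, recommendations, ...) trains only a small head.
churn_head = nn.Linear(64, 2)
opt = torch.optim.Adam(churn_head.parameters(), lr=1e-3)

x = torch.randn(32, 256)                     # placeholder behavioral features
y = torch.randint(0, 2, (32,))               # placeholder labels for one use case

with torch.no_grad():
    emb = encoder(x)                          # shared representation, computed once
loss = nn.functional.cross_entropy(churn_head(emb), y)
loss.backward()
opt.step()
```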
Part 3: Evaluating Agents using LLM-as-a-Judge (Databricks)
The Evaluation Crisis: Why BLEU and ROUGE fail in the "Year of the Agents."
LLM as a Judge: How to leverage models for scalable, human-aligned evaluation using MLflow.
Best Practices: Creating precise rubrics, scoring scales, and providing justifications (rationale) for scores.
Bias Mitigation: Strategies to overcome positional and verbosity bias in automated judging (a minimal judge sketch with position swapping follows this list).
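The talk walks through this workflow in MLflow; the framework-agnostic sketch below only illustrates the underlying LLM-as-a-judge pattern. The rubric wording, scoring scale, and the call_judge function are hypothetical placeholders for whatever judge model and client you use, not MLflow's API.

```python
# Sketch of LLM-as-a-judge with a rubric, justifications, and position swapping.
RUBRIC = """You are grading an AI agent's answer.
Score 1-5 against this rubric:
5 = factually correct, complete, directly answers the question
3 = partially correct or missing key details
1 = incorrect or irrelevant
Do not reward longer answers over shorter, correct ones (verbosity bias).
Return JSON: {"score": <int>, "justification": "<one sentence>"}."""

def call_judge(prompt: str) -> dict:
    """Placeholder: send `prompt` to your judge LLM and parse its JSON reply."""
    raise NotImplementedError

def judge_pairwise(question: str, answer_a: str, answer_b: str) -> str:
    """Compare two answers twice with positions swapped to counter positional bias."""
    def ask(first: str, second: str) -> str:
        prompt = (f"{RUBRIC}\n\nQuestion: {question}\n"
                  f"Answer A: {first}\nAnswer B: {second}\n"
                  "Which answer is better? Reply with the JSON plus a 'winner' field (A or B).")
        return call_judge(prompt)["winner"]

    w1 = ask(answer_a, answer_b)             # original order
    w2 = ask(answer_b, answer_a)             # swapped order
    if (w1, w2) == ("A", "B"):               # same underlying winner both times
        return "first"
    if (w1, w2) == ("B", "A"):
        return "second"
    return "tie"                              # judges disagree: treat as tie or re-judge
```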
#AI #MachineLearning #LLM #SpeechRecognition #GenAI #Databricks #MLOps #Agents #FoundationalModels