Lessons from the Trenches: Building LLM Evals That Work IRL: Aparna Dhinakaran
Author: AI Engineer
Uploaded: Feb 6, 2025
Views: 768
With nearly two-thirds of enterprise developers planning production deployments of large language models this year, LLM evaluation has never been more important. It is also an area where confusion reigns, starting with ambiguity around what "LLM evals" even means. LLM model evaluation, which quantifies general fitness (e.g., on the Hugging Face leaderboard), is often conflated with task-specific LLM system evaluation. And while many foundation model providers offer their own evals, AI engineers building LLM systems designed to plug into many models or tools need a way to rigorously and objectively evaluate both different foundation models and their own systems. In this session, Arize AI founder Aparna Dhinakaran will release research onstage and walk attendees through real-life examples of building an LLM eval from scratch. The session builds on multiple research pieces that have garnered millions of views across social platforms, diving into techniques for building robust LLM evals and, ultimately, for better understanding the limits of LLM capabilities. Want to build your own LLM task evals for a specific use case using open-source tools? Want to see the latest research on which foundation models your company should be using for specific use cases? You won't want to miss this session!
Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/20... & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025
About Aparna
Aparna Dhinakaran is the Co-Founder and Chief Product Officer at Arize AI, a pioneer and early leader in AI observability and LLM evaluation. A frequent speaker at top conferences and thought leader in the space, Dhinakaran is a Forbes 30 Under 30 honoree. Before Arize, Dhinakaran was an ML engineer and leader at Uber, Apple, and TubeMogul (acquired by Adobe). During her time at Uber, she built several core ML Infrastructure platforms, including Michelangelo. She has a bachelor’s from Berkeley's Electrical Engineering and Computer Science program, where she published research with Berkeley's AI Research group. She is on a leave of absence from the Computer Vision Ph.D. program at Cornell University.