OpenAI DevDay 2024 | Community Spotlight | Sierra
Author: OpenAI
Uploaded: Dec 17, 2024
Views: 4,131
Realistic agent benchmarks with LLMs: Measuring the performance and reliability of AI agents is challenging, especially in dynamic, real-world scenarios involving human interaction such as customer service. Sierra used OpenAI's GPT-4 and GPT-4o models to generate synthetic data and scenarios to simulate human users interacting with a customer service agent, resulting in the creation of τ-bench. This session will cover the technical challenges faced while creating the data and benchmark, findings from evaluating multiple LLM-based agents on τ-bench, and a discussion on building dynamic agent evaluations with foundation models.
