OpenAI DevDay 2024 | Community Spotlight | Sierra

Author: OpenAI

Uploaded: December 17, 2024

Views: 4,169

Description:

Realistic agent benchmarks with LLMs: Measuring the performance and reliability of AI agents is challenging, especially in dynamic, real-world scenarios involving human interaction such as customer service. Sierra used OpenAI's GPT-4 and GPT-4o models to generate synthetic data and scenarios to simulate human users interacting with a customer service agent, resulting in the creation of τ-bench. This session will cover the technical challenges faced while creating the data and benchmark, findings from evaluating multiple LLM-based agents on τ-bench, and a discussion on building dynamic agent evaluations with foundation models.


