A Roadmap for High-Stakes Evaluation in the Age of Agentic AI – Chandler Smith | IASEAI 2025
Author: International Association for Safe & Ethical AI
Uploaded: 2025-08-12
Views: 24
As AI moves into high-stakes environments, how do we ensure our benchmarks keep up?
In this IASEAI ’25 session, "Better Benchmarks: A Roadmap for High-Stakes Evaluation in the Age of Agentic AI," Chandler Smith (Research Engineer at the Cooperative AI Foundation) examines the shortcomings of current AI benchmarks, such as limited replicability and poor statistical reporting, and outlines a framework for building more rigorous, transparent, and trustworthy evaluations. Smith also explores risks unique to multiagent systems, including miscoordination, conflict, and collusion, and proposes how benchmarking can evolve to address them in the age of agentic AI.
About IASEAI: https://www.iaseai.org
Chandler Smith: https://www.cooperativeai.com/team
#ChandlerSmith #AIBenchmarks #AISafety #AgenticAI #IASEAI