Inspect - An LLM Eval Framework Used by Anthropic, DeepMind, Grok and More.
Author: Hamel Husain
Uploaded: 2025-06-21
Views: 4225
Join the AI Evals Course starting Jan 27, 2026: https://maven.com/parlance-labs/evals...
JJ Allaire on Inspect AI Evals for LLMs
JJ Allaire, founder of RStudio (Posit), presents Inspect AI, a Python-based framework for flexible and scalable LLM evaluations created at the UK AI Security Institute. Allaire highlights its extensive use in academia and industry, its open-source nature, and its design for handling complex evaluation tasks through composable solvers and scorers. The discussion covers its integration capabilities, user contributions, and compatibility with production systems, making it a comprehensive tool for evaluating and improving language models.
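The solver/scorer split mentioned above can be illustrated with a small, library-free sketch. This is a conceptual outline, not Inspect's actual API: the names (`Sample`, `echo_solver`, `exact_scorer`, `run_eval`) are illustrative. A solver produces an answer for each sample; a scorer grades that answer against the sample's target.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Sample:
    input: str
    target: str

def echo_solver(sample: Sample) -> str:
    # Stand-in for a model call: a real solver would query an LLM.
    return "4" if sample.input == "What is 2 + 2?" else ""

def exact_scorer(output: str, target: str) -> bool:
    # Grade by exact string match, the simplest scoring strategy.
    return output.strip() == target.strip()

def run_eval(dataset: list[Sample],
             solver: Callable[[Sample], str],
             scorer: Callable[[str, str], bool]) -> float:
    # Accuracy: fraction of samples where the solver's output scores correct.
    correct = sum(scorer(solver(s), s.target) for s in dataset)
    return correct / len(dataset)

dataset = [Sample("What is 2 + 2?", "4"),
           Sample("Capital of France?", "Paris")]
print(run_eval(dataset, echo_solver, exact_scorer))  # → 0.5
```

Because solver and scorer are independent callables, either can be swapped out (e.g. a model-graded scorer instead of exact match) without touching the rest of the harness, which is the flexibility the talk emphasizes.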
00:00 Introduction and Guest Speaker Introduction
00:03 JJ Allaire's Background and Contributions
01:11 Introduction to Inspect AI Framework
01:55 Features and Capabilities of Inspect AI
07:01 High-Level and Low-Level API Overview
08:45 Advanced Use Cases and Examples
17:26 Agent Bridge and Production Integration
21:54 Inspect Evals and Practical Applications
22:36 Introduction to Reproducing Evals
22:51 Foundation Model Evals
23:43 Scoring and Benchmarks
24:33 Production and Logging Tools
25:18 Web Publishing and Visualization
26:42 Sandbox Environments
28:43 Community and Contributions
29:29 Web Search and Browser Tools
31:30 Questions and Answers
35:07 Annotation Tools and Future Plans
39:21 Experiment Tracking and Analysis
42:20 Final Remarks and Wrap-Up