Inspect - An LLM Eval Framework Used by Anthropic, DeepMind, Grok and More.
Author: Hamel Husain
Uploaded: 2025-06-21
Views: 4225
Join the AI Evals Course starting Jan 27, 2026: https://maven.com/parlance-labs/evals...
JJ Allaire on Inspect AI Evals for LLMs
JJ Allaire, founder of RStudio (Posit), presents Inspect AI, a Python-based framework for flexible and scalable LLM evaluations created at the UK AI Security Institute. Allaire highlights its extensive use in academia and industry, its open-source nature, and its design for handling complex evaluation tasks through composable solvers and scorers. The discussion covers its integration capabilities, user contributions, and compatibility with production systems, making it a comprehensive tool for evaluating and improving language models.
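The solver/scorer split mentioned above can be illustrated with a small, library-free sketch. This is a conceptual outline, not Inspect's actual API: the names (`Sample`, `echo_solver`, `exact_scorer`, `run_eval`) are illustrative. A solver produces an answer for each sample; a scorer grades that answer against the sample's target.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Sample:
    input: str
    target: str

def echo_solver(sample: Sample) -> str:
    # Stand-in for a model call: a real solver would query an LLM.
    return "4" if sample.input == "What is 2 + 2?" else ""

def exact_scorer(output: str, target: str) -> bool:
    # Grade by exact string match, the simplest scoring strategy.
    return output.strip() == target.strip()

def run_eval(dataset: list[Sample],
             solver: Callable[[Sample], str],
             scorer: Callable[[str, str], bool]) -> float:
    # Accuracy: fraction of samples where the solver's output scores correct.
    correct = sum(scorer(solver(s), s.target) for s in dataset)
    return correct / len(dataset)

dataset = [Sample("What is 2 + 2?", "4"),
           Sample("Capital of France?", "Paris")]
print(run_eval(dataset, echo_solver, exact_scorer))  # → 0.5
```

Because solver and scorer are independent callables, either can be swapped out (e.g. a model-graded scorer instead of exact match) without touching the rest of the harness, which is the flexibility the talk emphasizes.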
00:00 Introduction and Guest Speaker Introduction
00:03 JJ Allaire's Background and Contributions
01:11 Introduction to Inspect AI Framework
01:55 Features and Capabilities of Inspect AI
07:01 High-Level and Low-Level API Overview
08:45 Advanced Use Cases and Examples
17:26 Agent Bridge and Production Integration
21:54 Inspect Evals and Practical Applications
22:36 Introduction to Reproducing Evals
22:51 Foundation Model Evals
23:43 Scoring and Benchmarks
24:33 Production and Logging Tools
25:18 Web Publishing and Visualization
26:42 Sandbox Environments
28:43 Community and Contributions
29:29 Web Search and Browser Tools
31:30 Questions and Answers
35:07 Annotation Tools and Future Plans
39:21 Experiment Tracking and Analysis
42:20 Final Remarks and Wrap-Up