AI safety Thursdays: Chain-of-Thought Monitoring and AI Control

Author: Trajectory Labs

Uploaded: 2025-10-30

Views: 93

Description:

Modern reasoning models do a lot of thinking in natural language before producing their outputs. Can we catch misbehaviours by our LLMs and interpret their motivations simply by reading these chains of thought?

In this talk, Rauno Arike and Rohan Subramani will give an overview of research areas in chain-of-thought monitorability and AI control, and discuss their recent research on how useful chain-of-thought monitoring is for ensuring that LLM agents pursue only the objectives their developers intended.
