AI Safety Thursdays: Chain-of-Thought Monitoring and AI Control
Author: Trajectory Labs
Uploaded: 2025-10-30
Views: 93
Modern reasoning models do a lot of thinking in natural language before producing their outputs. Can we catch misbehaviours by our LLMs and interpret their motivations simply by reading these chains of thought?
In this talk, Rauno Arike and Rohan Subramani will give an overview of research areas in chain-of-thought monitorability and AI control, and discuss their recent research on how useful chain-of-thought monitoring is for ensuring that LLM agents pursue only the objectives their developers intended.