Hoagy Cunningham — Finding distributed features in LLMs with sparse autoencoders [TAIS 2024]
Author: AI Safety 東京
Uploaded: 2024-05-14
Views: 5723
One of the core roadblocks to understanding the computation inside a transformer is the fact that individual neurons do not seem to be a fruitful unit of analysis. Meanwhile, directions in activation space have proven to contain huge amounts of information and to facilitate control. With such an exponentially large space of potential directions, though, how can we find the important ones before we know what to look for, or hope to get a comprehensive list of the directions being used? In the last year, sparse autoencoders (SAEs) have emerged as a potential tool for solving these problems. In this talk I will explain how SAEs work, trace the lines of thought that led to their creation, and discuss the current state of progress.
This is a recording from TAIS 2024, a technical AI safety conference hosted at the Plaza Heisei in Tokyo on April 5th–6th. TAIS 2024 was organised by AI Safety Tokyo, sponsored by Noeon Research, in collaboration with AI Alignment Network, AI Industry Foundation and Reaktor Japan.
0:00 Talk
22:50 Q&A