In-Context Learning & "Model Systems" Interpretability (Stanford lecture 3) - Ekdeep Singh Lubana
Author: Goodfire
Uploaded: 2025-12-11
Views: 1070
What counts as an explanation of how an LLM works?
In our last Stanford guest lecture, Ekdeep explains the different levels of analysis in interpretability, and outlines his neuro-inspired "model systems approach".
Plus: how in-context learning (ICL) and many-shot jailbreaking can be explained by LLM representations changing in-context, as a case study for that approach.
00:33 - What counts as an explanation?
04:47 - Levels of analysis & standard interpretability approaches
18:19 - The "model systems" approach to interp
(Case study on in-context learning)
23:36 - How LLM representations change in-context
44:10 - Modeling ICL with rational analysis
1:10:54 - Conclusion & questions
Read more about our research: https://www.goodfire.ai/research
Follow us on X: https://x.com/GoodfireAI