Assessing skeptical views of interpretability research
Author: Chris Potts
Uploaded: 2025-11-10
Views: 4689
Stanford AI Lab Faculty Lunch, November 7, 2025. Updated version of https://web.stanford.edu/~cgpotts/blo...
0:59 - Severance
1:45 - Explainable AI, Anthropic Interp, Stanford Interp
5:15 - Interpretability methods: Attribution, Probes, Interventions
15:27 - Skeptical positions
16:42 - "Interpretability cannot be achieved"
18:32 - "Interpretability is merely analysis"
21:14 - "Analysis is overrated"
27:33 - "Interpretability is not leading to improvements"
30:04 - "Interpretability is not helping with AI safety"
36:08 - Summary, and Aryaman's sweatshirt