The Dark Matter of AI [Mechanistic Interpretability]

Author: Welch Labs

Uploaded: Dec 23, 2024

Views: 181,704

Description:

Welch Labs Imaginary Numbers Book!
https://www.welchlabs.com/resources/i...

Welch Labs Posters: https://www.welchlabs.com/resources

Special Thanks to Patrons / welchlabs

Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti, Brian Henry, Tim Palade, Petar Vecutin, Nicolas baumann, Jason Singh, Robert Riley, vornska, Barry Silverman

My Gemma walkthrough notebook: https://colab.research.google.com/dri...
Most animations made with Manim: https://github.com/3b1b/manim

References and Further Reading
Chris Olah’s original “Dark Matter of Neural Networks” post: https://transformer-circuits.pub/2024...
Great recent interview with Chris Olah: Dario Amodei: Anthropic CEO on Claude...
Gemma Scope: https://arxiv.org/pdf/2408.05147
Experiment with SAEs yourself here! https://www.neuronpedia.org/
Relevant work from the Anthropic team:
https://transformer-circuits.pub/2022...
https://transformer-circuits.pub/2023...
https://transformer-circuits.pub/2024...
Excellent intro to Mechanistic Interpretability: https://arena3-chapter1-transformer-i...
Neel Nanda’s Mechanistic Interpretability Explainer: https://dynalist.io/d/n2ZWtnoYHrU1s4v...
Transformer Lens: https://github.com/TransformerLensOrg...
SAE Lens: https://jbloomaus.github.io/SAELens/

Technical Notes
1. There are more advanced and more meaningful ways to map mid-layer vectors to outputs; see: https://arxiv.org/pdf/2303.08112, https://neuralblog.github.io/logit-pr..., https://www.lesswrong.com/posts/AcKRB...
2. The 6x2304 matrix is actually 7x2304; we’re ignoring the BOS (beginning-of-sequence) token.
3. Gemma also includes positional embeddings and many normalization layers, which we didn’t really cover.
4. I sometimes conflate tokens and words. In this example each word is a single token, so the distinction doesn’t matter much.
5. The “_” characters represent spaces in the token strings.
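
To make notes 1, 2 and 5 concrete, here is a minimal sketch of one simple way to map a mid-layer vector to an output token: project it directly through the unembedding matrix and take a softmax. Everything in it is an assumption for illustration: the toy width of 8 (the model in the video uses 2304), the random weights, the tiny vocabulary, and the helper name `logit_lens`. It also skips the normalization layers mentioned in note 3, so it is not how Gemma or the walkthrough notebook actually computes anything.

```python
import math
import random

D_MODEL = 8  # toy width for illustration; the video's matrix is 2304 wide
# Hypothetical toy vocabulary; "_" stands for a leading space (note 5).
VOCAB = ["<bos>", "_The", "_cat", "_sat", "_on", "_the", "_mat"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden, unembed):
    """Project one residual-stream vector straight to vocab probabilities.

    `unembed` holds one weight vector per vocabulary entry, so each logit
    is a dot product between `hidden` and that entry's weights.
    """
    logits = [sum(h * w for h, w in zip(hidden, weights)) for weights in unembed]
    return softmax(logits)


rng = random.Random(0)

# One row per token position, including the leading BOS token: 7 x D_MODEL.
residual = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(7)]
# Random stand-in for the unembedding matrix: len(VOCAB) x D_MODEL.
unembed = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in VOCAB]

# Dropping the BOS row leaves the 6 x D_MODEL matrix from note 2.
without_bos = residual[1:]

# Read out a prediction from the last position's vector.
probs = logit_lens(without_bos[-1], unembed)
best = max(range(len(VOCAB)), key=lambda i: probs[i])

# Turn the token string back into readable text (note 5).
print(repr(VOCAB[best].replace("_", " ")))
```

With random weights the printed token is meaningless; the point is only the shapes and the order of operations. The links in note 1 describe more careful variants of this projection.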

