Transformers explained | The architecture behind LLMs

Author: AI Coffee Break with Letitia

Uploaded: 2024-01-21

Views: 37,434

Description:

All you need to know about the transformer architecture: How to structure the inputs, attention (Queries, Keys, Values), positional embeddings, residual connections. Bonus: an overview of the difference between Recurrent Neural Networks (RNNs) and transformers.
Correction at 9:19: the order of multiplication should be the opposite, x1 (vector) × Wq (matrix) = q1 (vector); otherwise we do not get the 1×3 dimensionality at the end. Sorry for messing up the animation!
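To make the corrected dimensionality concrete, here is a minimal NumPy sketch; the toy sizes (a 1×4 token embedding and a 4×3 projection matrix) are illustrative assumptions, not the exact values from the video:

import numpy as np

x1 = np.random.randn(1, 4)   # token embedding as a row vector (1 x d_model), here d_model = 4
Wq = np.random.randn(4, 3)   # learned query projection matrix (d_model x d_k), here d_k = 3

q1 = x1 @ Wq                 # row vector times matrix: (1 x 4) @ (4 x 3) -> (1 x 3)
print(q1.shape)              # (1, 3) -- the 1x3 query vector; Wq @ x1 would not even be defined

The same ordering applies to the key and value projections Wk and Wv.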

Check this out for a super cool transformer visualisation! 👏 https://poloclub.github.io/transforme...

➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring....

Outline:
00:00 Transformers explained
00:47 Text inputs
02:29 Image inputs
03:57 Next word prediction / Classification
06:08 The transformer layer: 1. MLP sublayer
06:47 2. Attention explained
07:57 Attention vs. self-attention
08:35 Queries, Keys, Values
09:19 Order of multiplication should be the opposite: x1(vector) * Wq(matrix) = q1(vector).
11:26 Multi-head attention
13:04 Attention scales quadratically
13:53 Positional embeddings
15:11 Residual connections and Normalization Layers
17:09 Masked Language Modelling
17:59 Difference to RNNs
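For readers skimming the outline, here is a minimal self-attention sketch covering the 08:35–13:04 segment (queries, keys, values, and why attention scales quadratically). The tiny dimensions are illustrative assumptions; a real transformer layer adds multi-head projections, masking, an output projection, residual connections and normalization:

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

n, d_model, d_k = 5, 8, 4           # sequence length, embedding size, head size (toy values)
X = np.random.randn(n, d_model)     # one embedding per token

Wq = np.random.randn(d_model, d_k)  # learned query/key/value projections
Wk = np.random.randn(d_model, d_k)
Wv = np.random.randn(d_model, d_k)

Q, K, V = X @ Wq, X @ Wk, X @ Wv    # queries, keys, values: each (n x d_k)
scores = Q @ K.T / np.sqrt(d_k)     # (n x n): every token attends to every token,
                                    # which is why attention scales quadratically with n
out = softmax(scores, axis=-1) @ V  # attention-weighted sum of values, (n x d_k)

Multi-head attention (11:26 in the outline) runs several such projections in parallel and concatenates their outputs before the MLP sublayer.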

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, ‪@Mutual_Information‬ , Kshitij

Our old Transformer explained 📺 video:    • The Transformer neural network architectur...  
📺 Tokenization explained:    • What is tokenization and how does it work?...  
📺 Word embeddings:    • How modern search engines work – Vector da...  
📽️ Replacing Self-Attention:    • Replacing Self-attention  
📽️ Position embeddings:    • Position encodings in Transformers explain...  
‪@SerranoAcademy‬ Transformer series:    • The Attention Mechanism in Large Language ...  

📄 Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in neural information processing systems 30 (2017).
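For reference, a minimal sketch of the sinusoidal positional encoding defined in the cited paper; it is added to the token embeddings so that the otherwise order-agnostic attention can see word positions (toy helper, assumes an even d_model):

import numpy as np

def sinusoidal_positions(n, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(n)[:, None]                     # (n, 1) positions
    i = np.arange(d_model // 2)[None, :]            # (1, d_model/2) dimension indices
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n, d_model))
    pe[:, 0::2] = np.sin(angles)                    # even dimensions
    pe[:, 1::2] = np.cos(angles)                    # odd dimensions
    return pe                                       # (n x d_model), added element-wise to embeddings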

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon:   / aicoffeebreak  
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:
AICoffeeBreakQuiz:    / aicoffeebreak  
Twitter:   / aicoffeebreak  
Reddit:   / aicoffeebreak  
YouTube:    / aicoffeebreak  

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Music 🎵 : Sunset n Beachz - Ofshane
Video editing: Nils Trost

Related videos

Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained
Transformer, explained in detail | Igor Kotenkov | NLP Lecture (in Russian)
LLM Lecture: A Deep Dive into Transformers, Prompts, and Human Feedback
Transhumanism vs AI: Who Replaces Humanity? | Documentary
Energy-Based Transformers explained | How EBTs and EBMs work
Why are transformers replacing CNNs?
MAMBA and State Space Models explained | SSM explained
Transformers explained: the discovery that changed artificial intelligence forever
Visualizing transformers and attention | Talk for TNG Big Tech Day '24
Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained
Rotary Positional Embeddings Explained | Transformer
Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 1 - Transformer
How DeepSeek Rewrote the Transformer [MLA]
I Visualised Attention in Transformers
Positional embeddings in transformers EXPLAINED | Demystifying positional encodings.
LLMs and GPT: how do large language models work? A visual introduction to transformers
Transformer Explained
How AI Taught Itself to See [DINOv3]
MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention
Introduction to Vision Transformer (ViT) | An image is worth 16x16 words | Computer Vision Series
