Deconstructing Self-Attention: Q, K, V Explained Step-by-Step
Author: The AI Lab Journal
Uploaded: 2025-11-27
Views: 17
Self-attention is the core trick behind Transformers and modern LLMs… but the math can feel mysterious.
In this video, we slowly deconstruct self-attention using a tiny sentence – “I love cats” – and follow every number from input embeddings all the way to the final contextual vectors.
No hand-waving, no big jumps. Just one clear example that shows exactly how Q, K, and V work together.
🔍 What you’ll learn
Why models need context to understand words like “bank”
How a simple sentence becomes an embedding matrix X
What Query (Q), Key (K) and Value (V) really represent
How we compute attention scores with QKᵀ and scale by √dₖ
How softmax turns scores into attention weights (percentages)
How the model mixes Value vectors to create contextualized outputs
How all of this collapses into the famous formula:
Attention(Q, K, V) = softmax(Q Kᵀ / √dₖ) V
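The pipeline above can be sketched in a few lines of NumPy. This is a minimal, illustrative single-head version: the sentence length (3 tokens for "I love cats"), the dimension `d_k = 4`, and the random stand-ins for the learned weight matrices are all assumptions, not the actual numbers used in the video.

```python
import numpy as np

# Toy single-head self-attention for a 3-token sentence ("I love cats").
# All values below are made-up stand-ins; the video walks through its own example.

d_k = 4                                   # assumed query/key dimension
rng = np.random.default_rng(0)

X = rng.standard_normal((3, d_k))         # embedding matrix X: one row per token

# Random stand-ins for the learned projection matrices W_Q, W_K, W_V
W_q = rng.standard_normal((d_k, d_k))
W_k = rng.standard_normal((d_k, d_k))
W_v = rng.standard_normal((d_k, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v       # queries, keys, values

scores = Q @ K.T / np.sqrt(d_k)           # QK^T, scaled by sqrt(d_k)

# Row-wise softmax: each row becomes attention weights summing to 1
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

output = weights @ V                      # mix Value vectors per token

print(weights.round(3))                   # attention weights (each row sums to 1)
print(output.shape)                       # one contextual vector per token
```

Each row of `weights` is the "percentages" mentioned above, and `output` holds the contextualized vectors that replace the original embeddings.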
👩💻 Who is this for?
Beginners who know a bit of linear algebra and want to see it in action
Developers and ML learners reading “Attention Is All You Need”
Anyone who has heard “self-attention” a thousand times and finally wants a concrete, visual example
📚 How this fits in The AI Lab Journal
This video is part of my learning-in-public series where I document my journey into AI math and deep learning. If you haven’t seen it yet, check out the previous episode on vectors and matrices for AI, which lays the groundwork for this one.