Deconstructing Self-Attention: Q, K, V Explained Step-by-Step
Author: The AI Lab Journal
Uploaded: 2025-11-27
Views: 17
Self-attention is the core trick behind Transformers and modern LLMs… but the math can feel mysterious.
In this video, we slowly deconstruct self-attention using a tiny sentence – “I love cats” – and follow every number from input embeddings all the way to the final contextual vectors.
No hand-waving, no big jumps. Just one clear example that shows exactly how Q, K, and V work together.
🔍 What you’ll learn
Why models need context to understand words like “bank”
How a simple sentence becomes an embedding matrix X
What Query (Q), Key (K) and Value (V) really represent
How we compute attention scores with QKᵀ and scale by √dₖ
How softmax turns scores into attention weights (percentages)
How the model mixes Value vectors to create contextualized outputs
How all of this collapses into the famous formula:
Attention(Q, K, V) = softmax(Q Kᵀ / √dₖ) V
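The pipeline above can be sketched in a few lines of NumPy. This is a minimal, illustrative single-head version: the sentence length (3 tokens for "I love cats"), the dimension `d_k = 4`, and the random stand-ins for the learned weight matrices are all assumptions, not the actual numbers used in the video.

```python
import numpy as np

# Toy single-head self-attention for a 3-token sentence ("I love cats").
# All values below are made-up stand-ins; the video walks through its own example.

d_k = 4                                   # assumed query/key dimension
rng = np.random.default_rng(0)

X = rng.standard_normal((3, d_k))         # embedding matrix X: one row per token

# Random stand-ins for the learned projection matrices W_Q, W_K, W_V
W_q = rng.standard_normal((d_k, d_k))
W_k = rng.standard_normal((d_k, d_k))
W_v = rng.standard_normal((d_k, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v       # queries, keys, values

scores = Q @ K.T / np.sqrt(d_k)           # QK^T, scaled by sqrt(d_k)

# Row-wise softmax: each row becomes attention weights summing to 1
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

output = weights @ V                      # mix Value vectors per token

print(weights.round(3))                   # attention weights (each row sums to 1)
print(output.shape)                       # one contextual vector per token
```

Each row of `weights` is the "percentages" mentioned above, and `output` holds the contextualized vectors that replace the original embeddings.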
👩💻 Who is this for?
Beginners who know a bit of linear algebra and want to see it in action
Developers and ML learners reading “Attention Is All You Need”
Anyone who has heard “self-attention” a thousand times and finally wants a concrete, visual example
📚 How this fits in The AI Lab Journal
This video is part of my learning-in-public series where I document my journey into AI math and deep learning. If you haven’t seen it yet, check out the previous episode on vectors and matrices for AI, which lays the groundwork for this one.