Decoder-Only Transformers, ChatGPTs specific Transformer, Clearly Explained!!!

Author: StatQuest with Josh Starmer

Uploaded: 2023-08-27

Views: 206147

Description:

Transformers are taking over AI right now, and quite possibly their most famous use is in ChatGPT. ChatGPT uses a specific type of Transformer called a Decoder-Only Transformer, and this StatQuest shows you how they work, one step at a time. And at the end (at 32:14), we talk about the differences between a Normal Transformer and a Decoder-Only Transformer. BAM!
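The core of a Decoder-Only Transformer is masked self-attention. Each word can only look at itself and the words that came before it, and that restriction is what lets the model generate text one word at a time. The sketch below is a minimal Python version under a few assumptions: the query, key, and value vectors for each token have already been computed (the weight matrices that produce them are omitted), and similarity uses the scaled dot product from the original Transformer paper. The function names and toy vectors are hypothetical and are not taken from the video.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the largest score before exponentiating for numerical stability.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(queries: List[Vector], keys: List[Vector],
                          values: List[Vector]) -> List[Vector]:
    """Return one output vector per token; token i only sees tokens 0..i."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Scaled dot-product similarity with the current and earlier tokens only.
        # Later tokens are never scored, which is equivalent to masking them out.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible tokens' value vectors.
        out = [sum(w * values[j][c] for j, w in enumerate(weights))
               for c in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Toy example: three tokens, two dimensions, with Q = K = V for simplicity.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = masked_self_attention(tokens, tokens, tokens)
print(result[0])  # [1.0, 0.0]: the first token can only attend to itself
```

Because token i never scores the tokens after it, changing a later token cannot change the output for an earlier one. That property is what makes the method autoregressive.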

NOTE: If you're interested in learning more about Backpropagation, check out these 'Quests:
The Chain Rule:    • The Chain Rule, Clearly Explained!!!
Gradient Descent:    • Gradient Descent, Step-by-Step
Backpropagation Main Ideas:    • Neural Networks Pt. 2: Backpropagation Mai...
Backpropagation Details Part 1:    • Backpropagation Details Pt. 1: Optimizing ...
Backpropagation Details Part 2:    • Backpropagation Details Pt. 2: Going bonke...

If you're interested in learning more about the SoftMax function, check out:
   • Neural Networks Part 5: ArgMax and SoftMax
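When a Transformer generates the next word, its final output values (one per word in the vocabulary) can be turned into probabilities with SoftMax, and the word with the highest probability can be picked with ArgMax. The sketch below shows that final step under stated assumptions: the five-word vocabulary and the output values are made up, and greedy ArgMax selection is only one way to choose the next word.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    # Shift by the maximum so math.exp never overflows; the result is unchanged.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    # Index of the largest value.
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical vocabulary and hypothetical output values for the next word.
vocab = ["what", "is", "statquest", "awesome", "<EOS>"]
logits = [0.5, -1.0, 0.2, 3.1, 1.2]

probs = softmax(logits)
print(vocab[argmax(probs)])  # awesome
```

SoftMax preserves the ordering of its inputs, so ArgMax picks the same word whether it is applied before or after SoftMax. SoftMax matters when the probabilities themselves are needed, for example when training with cross-entropy loss.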

If you're interested in learning more about Word Embedding, check out:    • Word Embedding and Word2Vec, Clearly Expla...

If you'd like to learn more about calculating similarities in the context of neural networks and the Dot Product, check out:
Cosine Similarity:    • Cosine Similarity, Clearly Explained!!!
Attention:    • Attention for Neural Networks, Clearly Exp...
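Transformer attention scores the similarity between query and key vectors with the Dot Product, while Cosine Similarity is the Dot Product divided by the lengths of the two vectors. The minimal plain-Python sketch below uses hypothetical example vectors to show the practical difference: scaling a vector changes its Dot Product with another vector but leaves the Cosine Similarity unchanged.

```python
import math
from typing import List


def dot_product(a: List[float], b: List[float]) -> float:
    # Sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Dot product normalized by both vector lengths, so only the angle matters.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


a = [1.0, 2.0]
b = [2.0, 4.0]           # same direction as a, twice as long
b_scaled = [20.0, 40.0]  # same direction, 20x as long

print(dot_product(a, b), dot_product(a, b_scaled))  # 10.0 100.0
print(round(cosine_similarity(a, b), 6), round(cosine_similarity(a, b_scaled), 6))  # 1.0 1.0
```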

If you'd like to learn more about Normal Transformers, see:    • Transformer Neural Networks, ChatGPT's fou...

For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/

If you'd like to support StatQuest, please consider...

Patreon:   / statquest
...or...
YouTube Membership:    / @statquest

...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/

...or just donating to StatQuest!
https://www.paypal.me/statquest

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
  / joshuastarmer

0:00 Awesome song and introduction
1:34 Word Embedding
7:26 Position Encoding
10:10 Masked Self-Attention, an Autoregressive method
22:35 Residual Connections
23:00 Generating the next word in the prompt
26:23 Review of encoding and generating the prompt
27:20 Generating the output, Part 1
28:46 Masked Self-Attention while generating the output
30:40 Generating the output, Part 2
32:14 Normal Transformers vs Decoder-Only Transformers
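Position Encoding (7:26) adds a different pattern of numbers to each word's embedding, so the same word at different positions gets a different input vector. The sketch below is minimal and assumes the sine-and-cosine scheme from the original Transformer paper, where dimensions 2k and 2k+1 share the frequency 1/10000^(2k/d). The function names and toy embedding are hypothetical, and the exact values used in the video may differ.

```python
import math
from typing import List


def position_encoding(position: int, d_model: int) -> List[float]:
    """Sinusoidal position values for one position (original Transformer scheme)."""
    values = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the frequency 1 / 10000**(2k / d_model).
        angle = position / (10000 ** ((i - i % 2) / d_model))
        # Even dimensions use sine, odd dimensions use cosine.
        values.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return values


def add_position_encoding(embeddings: List[List[float]]) -> List[List[float]]:
    # Add the position values element-wise to each word embedding.
    return [[e + p for e, p in zip(vec, position_encoding(pos, len(vec)))]
            for pos, vec in enumerate(embeddings)]


# The same (hypothetical) word embedding used at positions 0 and 1.
word = [1.0, 1.0]
encoded = add_position_encoding([word, word])
print(encoded[0])                # [1.0, 2.0]
print(encoded[0] != encoded[1])  # True
```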

#StatQuest

Related videos

Encoder-Only Transformers (like BERT) for RAG, Clearly Explained!!!

LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!!

Why Is M-Theory Crucial For Quantum Gravity?

Why Are Transformers Replacing CNNs?

The Man Who Almost Broke Math (And Himself...) - Axiom of Choice

NotebookLM: Tables from Anything. 4 Ways to Use It

LLMs and GPT - How Do Large Language Models Work? A Visual Introduction to Transformers

Which Transformer Architecture Is Better? Encoder-Only Models, Encoder-Decoder Models, Models ...

Visualizing transformers and attention | Talk for TNG Big Tech Day '24

The Trillion Dollar Equation

THIS is why large language models can understand the world

Backpropagation Details Pt. 1: Optimizing 3 parameters simultaneously.

Word Embedding and Word2Vec, Clearly Explained!!!

Sequence-to-Sequence (seq2seq) Encoder-Decoder Neural Networks, Clearly Explained!!!

Visualizing Attention, a Transformer's Heart | Chapter 6, Deep Learning

Recurrent Neural Networks (RNNs), Clearly Explained!!!

The Essential Main Ideas of Neural Networks

Transformers Explained: The Discovery That Changed Artificial Intelligence Forever

I Visualised Attention in Transformers
