Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Author: Umar Jamil

Uploaded: 2023-05-28

Views: 546,106

Description:

A complete explanation of all the layers of a Transformer model: Multi-Head Self-Attention and Positional Encoding, including all the matrix multiplications, plus a full description of the training and inference process.

Paper: Attention is all you need - https://arxiv.org/abs/1706.03762

Slides PDF: https://github.com/hkproj/transformer...
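
The self-attention covered in the video reduces to a few matrix multiplications: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The snippet below is a minimal single-head sketch in PyTorch, not taken from the video or the slides; the function name, tensor shapes, and toy inputs are assumptions chosen only for illustration.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k) -- illustrative single-head shapes
    d_k = q.size(-1)
    # (batch, seq_len, seq_len): similarity of every query with every key
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # positions where mask == 0 are hidden (e.g. padding or future tokens)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = torch.softmax(scores, dim=-1)  # attention weights, each row sums to 1
    return attn @ v                       # weighted sum of the values

# toy usage: batch of 2 sequences, 5 tokens, 64-dimensional keys/values
x = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # torch.Size([2, 5, 64])
```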

Chapters
00:00 - Intro
01:10 - RNN and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
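
The Positional Encoding chapter refers to the fixed sinusoidal encodings from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), which are added to the input embeddings before the first encoder layer. Below is a small sketch of that formula; the values of seq_len and d_model are assumptions for the example.

```python
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angle = pos / torch.pow(10000.0, i / d_model)                  # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # even indices get sine
    pe[:, 1::2] = torch.cos(angle)  # odd indices get cosine
    return pe

# added to the input embeddings; shapes chosen only for illustration
pe = sinusoidal_positional_encoding(seq_len=50, d_model=512)
print(pe.shape)  # torch.Size([50, 512])
```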

Related videos

МАШИННОЕ ОБУЧЕНИЕ - ВСЕ ЧТО НУЖНО ЗНАТЬ (мыш, 2 months ago)
Визуализация внимания, сердце трансформера | Глава 6, Глубокое обучение (3Blue1Brown, 1 year ago)
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference. (Umar Jamil, 2 years ago)
Andrej Karpathy: Software Is Changing (Again) (Y Combinator, 7 days ago)
1-Bit LLM: The Most Efficient LLM Possible? (bycloud, 8 days ago)
Variational Autoencoder - Model, ELBO, loss function and maths explained easily! (Umar Jamil, 2 years ago)
Visualizing transformers and attention | Talk for TNG Big Tech Day '24 (Grant Sanderson, 7 months ago)
Let's build GPT: from scratch, in code, spelled out. (Andrej Karpathy, 2 years ago)
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU (Umar Jamil, 1 year ago)
BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token (Umar Jamil, 1 year ago)
