Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Author: Umar Jamil

Uploaded: 2023-05-28

Views: 546,106

Description:

A complete explanation of all the layers of a Transformer model: Multi-Head Self-Attention and Positional Encoding, including all the matrix multiplications, plus a full description of the training and inference process.

Paper: Attention is all you need - https://arxiv.org/abs/1706.03762

Slides PDF: https://github.com/hkproj/transformer...
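
The self-attention covered in the video reduces to a few matrix multiplications: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The snippet below is a minimal single-head sketch in PyTorch, not taken from the video or the slides; the function name, tensor shapes, and toy inputs are assumptions chosen only for illustration.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k) -- illustrative single-head shapes
    d_k = q.size(-1)
    # (batch, seq_len, seq_len): similarity of every query with every key
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # positions where mask == 0 are hidden (e.g. padding or future tokens)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = torch.softmax(scores, dim=-1)  # attention weights, each row sums to 1
    return attn @ v                       # weighted sum of the values

# toy usage: batch of 2 sequences, 5 tokens, 64-dimensional keys/values
x = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # torch.Size([2, 5, 64])
```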

Chapters
00:00 - Intro
01:10 - RNN and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
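
The Positional Encoding chapter refers to the fixed sinusoidal encodings from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), which are added to the input embeddings before the first encoder layer. Below is a small sketch of that formula; the values of seq_len and d_model are assumptions for the example.

```python
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angle = pos / torch.pow(10000.0, i / d_model)                  # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # even indices get sine
    pe[:, 1::2] = torch.cos(angle)  # odd indices get cosine
    return pe

# added to the input embeddings; shapes chosen only for illustration
pe = sinusoidal_positional_encoding(seq_len=50, d_model=512)
print(pe.shape)  # torch.Size([50, 512])
```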

Related videos

МАШИННОЕ ОБУЧЕНИЕ - ВСЕ ЧТО НУЖНО ЗНАТЬ (мыш, 2 months ago)
Визуализация внимания, сердце трансформера | Глава 6, Глубокое обучение (3Blue1Brown, 1 year ago)
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference. (Umar Jamil, 2 years ago)
Andrej Karpathy: Software Is Changing (Again) (Y Combinator, 7 days ago)
1-Bit LLM: The Most Efficient LLM Possible? (bycloud, 8 days ago)
Variational Autoencoder - Model, ELBO, loss function and maths explained easily! (Umar Jamil, 2 years ago)
Visualizing transformers and attention | Talk for TNG Big Tech Day '24 (Grant Sanderson, 7 months ago)
Let's build GPT: from scratch, in code, spelled out. (Andrej Karpathy, 2 years ago)
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU (Umar Jamil, 1 year ago)
BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token (Umar Jamil, 1 year ago)
