Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained
Author: DataMListic
Uploaded: Jan 2, 2024
Views: 6,703
In this video, we explore how Multi-Head Attention (MHA), Multi-Query Attention (MQA), and Grouped-Query Attention (GQA) work, and what the pros and cons of each are.
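As a rough illustration of how the three variants differ (a minimal sketch, not code from the video): MHA keeps one key/value head per query head, MQA shares a single key/value head across all query heads, and GQA shares each key/value head within a group of query heads. The function names, the contiguous grouping of query heads, the batch size of 1, and the toy model sizes below are all assumptions made for this example.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head that a query head reads from.

    Assumes query heads are grouped contiguously (heads 0..g-1 share KV head 0, etc.).
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of the key/value cache for one sequence (batch size 1 assumed)."""
    # Factor 2: one cached tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


N_Q_HEADS = 8  # hypothetical toy model
variants = {"MHA": 8, "GQA": 2, "MQA": 1}  # number of key/value heads per variant

for name, n_kv in variants.items():
    mapping = [kv_head_for_query_head(q, N_Q_HEADS, n_kv) for q in range(N_Q_HEADS)]
    size = kv_cache_bytes(n_layers=4, n_kv_heads=n_kv, head_dim=64, seq_len=1024)
    print(f"{name}: query head -> KV head {mapping}, KV cache = {size} bytes")
```

With these toy numbers, the cache shrinks in proportion to the number of key/value heads, which is the memory trade-off between the three variants: MQA stores the least, MHA the most, and GQA sits in between.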
References
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Self-Attention Mechanism Explained: • Transformer Self-Attention Mechanism ...
Attention Is All You Need paper: https://arxiv.org/abs/1706.03762
Fast Transformer Decoding: One Write-Head is All You Need paper: https://arxiv.org/abs/1911.02150
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints paper: https://arxiv.org/abs/2305.13245
Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why Language Models Hallucinate: • Why LLMs Hallucinate
Grounding DINO, Open-Set Object Detection: • Object Detection Part 8: Grounding DI...
Detection Transformers (DETR), Object Queries: • Object Detection Part 7: Detection Tr...
Wav2vec2 A Framework for Self-Supervised Learning of Speech Representations - Paper Explained: • Wav2vec2 A Framework for Self-Supervi...
Transformer Self-Attention Mechanism Explained: • Transformer Self-Attention Mechanism ...
How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA): • Low-Rank Adaptation (LoRA) Explained
Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Intro
00:37 - Multi-Head Attention (MHA)
01:45 - Multi-Query Attention (MQA)
03:36 - Grouped-Query Attention (GQA)
05:04 - MHA vs MQA vs GQA
06:58 - Outro
Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic / datamlistic
📸 Instagram: @datamlistic / datamlistic
📱 TikTok: @datamlistic / datamlistic
Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)
If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: / datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
#transformers #mha #mqa #gqa
