A Visual Guide to Mixture of Experts (MoE) in LLMs
Author: Maarten Grootendorst
Uploaded: November 18, 2024
Views: 24,026
In this highly visual guide, we explore the architecture of Mixture of Experts (MoE) in Large Language Models (LLMs) and Vision Language Models.
Timeline
0:00 Introduction
0:34 A Simplified Perspective
2:14 The Architecture of Experts
3:05 The Router
4:08 Dense vs. Sparse Layers
4:33 Going through a MoE Layer
5:35 Load Balancing
6:05 KeepTopK
7:27 Token Choice and Top-K Routing
7:48 Auxiliary Loss
9:23 Expert Capacity
10:40 Counting Parameters with Mixtral 8x7B
13:42 MoE in Vision Language Models
13:57 Vision Transformer
14:45 Vision-MoE
15:50 Soft-MoE
19:11 Bonus Content!
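
As a companion to the Router, KeepTopK, and MoE Layer chapters above, here is a minimal NumPy sketch of a sparse MoE layer with top-k routing. The dimensions, expert count, and ReLU feed-forward experts are illustrative assumptions, not the exact setup from the video or from Mixtral.

```python
# Minimal sketch of a sparse MoE layer with KeepTopK routing (NumPy,
# illustrative only -- sizes and expert design are assumptions).
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 16, 64      # assumed hidden / feed-forward sizes
num_experts, top_k = 8, 2   # Mixtral-style: 8 experts, 2 active per token
num_tokens = 4

# Each expert is an independent feed-forward network (two linear layers).
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(num_experts)
]
router_w = rng.normal(size=(d_model, num_experts))  # router (gating) weights

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x):
    """x: (num_tokens, d_model) -> (num_tokens, d_model)."""
    logits = x @ router_w                               # (tokens, experts)
    topk_idx = np.argsort(logits, axis=-1)[:, -top_k:]  # KeepTopK: keep the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax only over the selected experts' logits; the rest are dropped.
        gates = softmax(logits[t, topk_idx[t]])
        for gate, e in zip(gates, topk_idx[t]):
            w1, w2 = experts[e]
            h = np.maximum(x[t] @ w1, 0.0)   # ReLU feed-forward expert
            out[t] += gate * (h @ w2)        # gate-weighted sum of expert outputs
    return out

tokens = rng.normal(size=(num_tokens, d_model))
print(moe_layer(tokens).shape)  # (4, 16): each token only ran 2 of the 8 experts
```

This is also why the parameter count in the Mixtral 8x7B chapter works out the way it does: the model stores all eight expert FFNs per layer (roughly 47B parameters in total), but each token only activates two of them plus the shared attention and embedding weights, so only about 13B parameters are used per token.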
🛠️ Written version of this visual guide
https://newsletter.maartengrootendors...
Subscribe to my newsletter for more visual guides:
✉️ Newsletter https://newsletter.maartengrootendors...
I wrote a book!
📚 Hands-On Large Language Models
https://llm-book.com/
#datascience #machinelearning #ai
