A Visual Guide to Mixture of Experts (MoE) in LLMs

Author: Maarten Grootendorst

Uploaded: Nov 18, 2024

Views: 24,026

Description:

In this highly visual guide, we explore the Mixture of Experts (MoE) architecture in Large Language Models (LLMs) and Vision Language Models.

Timeline
0:00 Introduction
0:34 A Simplified Perspective
2:14 The Architecture of Experts
3:05 The Router
4:08 Dense vs. Sparse Layers
4:33 Going through a MoE Layer
5:35 Load Balancing
6:05 KeepTopK
7:27 Token Choice and Top-K Routing
7:48 Auxiliary Loss
9:23 Expert Capacity
10:40 Counting Parameters with Mixtral 8x7B
13:42 MoE in Vision Language Models
13:57 Vision Transformer
14:45 Vision-MoE
15:50 Soft-MoE
19:11 Bonus Content!

🛠️ Written version of this visual guide
https://newsletter.maartengrootendors...

Support my newsletter for more visual guides:
✉️ Newsletter https://newsletter.maartengrootendors...

I wrote a book!
📚 Hands-On Large Language Models
https://llm-book.com/

#datascience #machinelearning #ai

Related videos

Intuition behind Mamba and State Space Models | Enhancing LLMs!

Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ)

LLMs and GPT: how do large language models work? A visual introduction to transformers

What is Mixture of Experts?

How to Improve LLMs with RAG (Overview + Python Code)

4 Hours Chopin for Studying, Concentration & Relaxation

Topic Modeling with Llama 2

Illustrated Guide to Transformers Neural Network: A step by step explanation

Gradient descent, how neural networks learn | Chapter 2, Deep learning

How DeepSeek Rewrote the Transformer [MLA]