Llama 4 Explained: Architecture, Long Context, and Native Multimodality
Author: Julia Turc
Uploaded: April 10, 2025
Views: 4,629
Curious how Meta’s Llama 4 works under the hood? In this deep dive, I reverse-engineer the Llama 4 architecture based on Meta’s official blog post and unpack the innovations that enable its 10M token context window and native multimodality.
✅ What makes Llama 4 natively multimodal?
✅ How does it support long context lengths? Is RAG obsolete?
✅ How good is it *really*?
🔍 Topics covered (with papers):
🔵 Early fusion (https://arxiv.org/pdf/2405.09818)
🔵 Context Parallelism / Ring Attention (https://arxiv.org/pdf/2310.01889)
🔵 Rotary Positional Embeddings / RoPE (https://arxiv.org/pdf/2104.09864), sketched in code after this list
🔵 Position Interpolation (https://arxiv.org/pdf/2306.15595)
🔵 No Positional Embeddings / NoPE (https://arxiv.org/pdf/2305.19466)
🔵 New training strategies: Mid-training, MetaP
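
Of the topics above, RoPE is the easiest to make concrete. Here is a minimal NumPy sketch of the rotation described in the RoFormer paper linked above; the function name, the base of 10000, and the example sizes are illustrative defaults, not details taken from Llama 4's actual code.

```python
# Minimal RoPE sketch (assumed standard formulation, base=10000; not Llama 4's code).
import numpy as np

def rope(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive dimension pairs of `x` by position-dependent angles."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE expects an even embedding dimension"
    # One frequency per dimension pair: theta_i = base^(-2i/d)
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = position * freqs                 # shape (d/2,)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                 # even / odd dimensions form the pairs
    rotated = np.empty_like(x)
    rotated[0::2] = x1 * cos - x2 * sin       # standard 2D rotation per pair
    rotated[1::2] = x1 * sin + x2 * cos
    return rotated

# Example: rotate an 8-dimensional query vector at position 5.
q = rope(np.random.randn(8), position=5)
```

Because attention scores between vectors rotated this way depend only on their relative offset, methods like Position Interpolation can extend the context simply by rescaling `position` before the rotation, which is one of the long-context routes discussed in the video.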
This video is ideal for engineers and researchers curious about how LLMs scale, why Llama 4 matters, and what's next for long-context transformers.
📌 Note: This is a corrected re-upload due to A/V sync issues in the previous version.
#Llama4 #MetaAI #MultimodalLLM #LongContext
00:00 Intro
00:55 Behemoth, Maverick, Scout & Mixture-of-Experts
02:36 Multimodality in Llama 3
05:02 Native multimodality in Llama 4
08:27 10M context window
09:41 Ring Attention
12:28 Length generalization
16:56 New training techniques
20:21 Is RAG dead?
21:08 Evaluation
