Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Author: Gabriel Mongaras

Uploaded: 2025-02-21

Views: 5629

Description:

Paper: https://arxiv.org/abs/2502.11089

Notes: https://drive.google.com/open?id=1HLE...

00:00 Intro
01:30 Sparse attention
05:48 Token compression attention
13:10 Token selection attention
20:50 Window attention and putting everything together
28:10 Token selection kernel
34:22 Results
