Pushing the Limits of Sparse Attention in LLMs - Marcos Treviso | ASAP 49
Author: ASAP Seminar Series
Uploaded: 2025-11-20
Views: 157
Paper: https://arxiv.org/pdf/2502.12082
Speaker: https://mtreviso.github.io/
Slides: https://asap-seminar.github.io/assets...
0:00: Seminar introduction
0:28: Talk overview
1:36: Transformer context limits
3:11: Attention dispersion issues
4:40: Softmax as culprit
5:24: Probability simplex view
7:59: Alpha-entmax family
11:02: Long-context theory
14:33: NAPE positional encodings
15:53: Generalization benchmarks
18:34: Scaling and efficiency
21:18: FlashAttention recap
23:40: Root-finding for tau
26:04: Hybrid Halley-bisection
27:54: Sparse block kernels
29:24: Language modeling gains
31:24: Llama3 sparsity patterns
33:18: Inference-time sparsity ideas
36:01: Adapting softmax models
40:37: Trainable alpha experiments
43:07: Block size considerations
45:44: Fine-grained sparsity discussion
51:07: Tau sensitivity questions
55:38: Attention sink discussion
59:55: Closing thanks
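
For readers skimming the chapters on the alpha-entmax family and root-finding for tau: the sketch below is a minimal, illustrative implementation of 1.5-entmax using plain bisection to locate the threshold tau, not the hybrid Halley-bisection method or the kernels discussed in the talk. The function name `entmax15_bisect` and the tolerance settings are assumptions made for the example.

```python
import numpy as np

def entmax15_bisect(z, n_iter=50):
    """Illustrative 1.5-entmax via bisection on the threshold tau.

    Solves sum_i [z_i/2 - tau]_+^2 = 1 for tau, then returns the
    resulting sparse probability vector (many entries become exactly 0).
    This is a sketch for intuition, not the talk's optimized method.
    """
    z = np.asarray(z, dtype=np.float64) / 2.0
    # tau lies in [max(z) - 1, max(z)]: at the upper bound the sum is 0,
    # at the lower bound the largest term alone already contributes 1.
    lo, hi = z.max() - 1.0, z.max()
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        mass = np.square(np.clip(z - tau, 0.0, None)).sum()
        if mass < 1.0:
            hi = tau  # tau too large: total mass fell below 1
        else:
            lo = tau  # tau too small: total mass still at or above 1
    tau = 0.5 * (lo + hi)
    return np.square(np.clip(z - tau, 0.0, None))

scores = [2.0, 1.0, 0.1, -1.0]
print(entmax15_bisect(scores))  # low-scoring entries get exactly zero weight
```

Unlike softmax, which assigns nonzero weight to every score, the thresholding step here zeroes out low-scoring entries exactly, which is the sparsity property the talk builds on.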