Do we need Attention? A Mamba Primer
Author: Sasha Rush
Uploaded: 2024-04-05
Views: 12733
A Technical Primer on Mamba and Friends. With Yair Schiff (https://yair-schiff.github.io/)
Slides: https://github.com/srush/mamba-primer...
Main focus:
Mamba: Linear-Time Sequence Modeling with Selective State Spaces http://arxiv.org/abs/2312.00752 from Albert Gu and Tri Dao.
Simplified State Space Layers for Sequence Modeling http://arxiv.org/abs/2208.04933 from Jimmy T. H. Smith, Andrew Warrington, and Scott W. Linderman.
00:00 - Intro
04:03 - Section 1 - Linear Time-Varying Recurrences
12:07 - Section 2 - Associative Scan
16:27 - Section 3 - Continuous-Time SSMs
26:55 - Section 4 - Large States and Hardware-Aware Parameterizations
34:56 - Conclusion
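A minimal sketch of the machinery behind Sections 1-3, in plain NumPy rather than anything taken from the talk or the Mamba codebase (scalar state, illustrative names, and a sequential loop standing in for the parallel scan): the linear time-varying recurrence h_t = a_t*h_{t-1} + b_t*x_t, its associative-scan formulation, and a simplified discretization step.

import numpy as np

# Section 3 (sketch): discretize a continuous-time SSM dh/dt = A h + B x with
# zero-order hold for A and a simplified Euler-style rule for B (the exact ZOH
# input term differs; this only shows the shape of the computation).
def discretize(A_cont, B_cont, dt):
    A_disc = np.exp(dt * A_cont)
    B_disc = dt * B_cont
    return A_disc, B_disc

# Section 1 (sketch): linear time-varying recurrence h_t = a_t * h_{t-1} + b_t * x_t.
def sequential_ssm(a, bx):
    h, out = 0.0, []
    for a_t, bx_t in zip(a, bx):
        h = a_t * h + bx_t
        out.append(h)
    return np.array(out)

# Section 2 (sketch): the same recurrence as a scan. A pair (a, b) stands for the
# affine map h -> a*h + b; composing two such maps is associative, which is what
# lets a parallel prefix scan compute every h_t in O(log T) depth.
def combine(p, q):
    a1, b1 = p
    a2, b2 = q
    return (a2 * a1, a2 * b1 + b2)

def scan_ssm(a, bx):
    acc, out = (1.0, 0.0), []
    for a_t, bx_t in zip(a, bx):   # written as a loop here; the combine itself is associative
        acc = combine(acc, (a_t, bx_t))
        out.append(acc[1])
    return np.array(out)

rng = np.random.default_rng(0)
T = 16
dt = rng.uniform(0.01, 0.1, T)                 # input-dependent step sizes ("selectivity")
a, b = discretize(A_cont=-1.0, B_cont=1.0, dt=dt)
x = rng.normal(size=T)
assert np.allclose(sequential_ssm(a, b * x), scan_ssm(a, b * x))

The point of the associative form is that the loop in scan_ssm can be replaced by a parallel prefix scan over the (a, b) pairs, which is how these models train in parallel without attention.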
Further reading:
Yang S, Wang B, Shen Y, Panda R, Kim Y. Gated Linear Attention Transformers with Hardware-Efficient Training. http://arxiv.org/abs/2312.06635
Arora S, Eyuboglu S, Zhang M, Timalsina A, Alberti S, Zinsley D, Zou J, Rudra A, Ré C. Simple linear attention language models balance the recall-throughput tradeoff. http://arxiv.org/abs/2402.18668
De S, Smith SL, Fernando A, Botev A, Cristian-Muraru G, Gu A, Haroun R, Berrada L, Chen Y, Srinivasan S, Desjardins G, Doucet A, Budden D, Teh YW, Pascanu R, De Freitas N, Gulcehre C. Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models. http://arxiv.org/abs/2402.19427
Sun Y, Dong L, Huang S, Ma S, Xia Y, Xue J, Wang J, Wei F. Retentive Network: A Successor to Transformer for Large Language Models. http://arxiv.org/abs/2307.08621