
MAMBA from Scratch: Neural Nets Better and Faster than Transformers

Author: Algorithmic Simplicity

Uploaded: 2024-04-30

Views: 285314

Description:

Mamba is a new neural network architecture that came out this year, and it performs better than transformers at language modelling! This is probably the most exciting development in AI since 2017. In this video I explain how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!

Mamba paper: https://openreview.net/forum?id=AL1fq...
Linear RNN paper: https://openreview.net/forum?id=M3Yd3...

#mamba
#deeplearning
#largelanguagemodels

00:00 Intro
01:33 Recurrent Neural Networks
05:24 Linear Recurrent Neural Networks
06:57 Parallelizing Linear RNNs
15:33 Vanishing and Exploding Gradients
19:08 Stable initialization
21:53 State Space Models
24:33 Mamba
25:26 The High Performance Memory Trick
27:35 The Mamba Drama
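The chapters on linear RNNs and on parallelizing them rest on one property. A linear recurrence h_t = a_t * h_{t-1} + b_t is a chain of affine maps, and composing affine maps is associative. That means every h_t can be obtained with a parallel prefix scan instead of a strictly sequential loop. The sketch below only illustrates that idea and is not code from the video or the Mamba paper. It makes three assumptions: the recurrence is scalar (real models use vectors with a diagonal a_t), the scan algorithm is Hillis-Steele (one of several possible choices), and the function names `combine`, `linear_rnn_sequential` and `linear_rnn_scan` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical helper names, for illustration only.
# A pair (a, b) represents the affine map h -> a * h + b.
Pair = Tuple[float, float]


def combine(first: Pair, second: Pair) -> Pair:
    """Compose two affine steps: apply `first`, then `second`.

    second(first(h)) = a2 * (a1 * h + b1) + b2 = (a1 * a2) * h + (a2 * b1 + b2).
    Composition of maps is associative, which is what permits a parallel scan.
    """
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def linear_rnn_sequential(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    """Reference loop: h_t = a_t * h_{t-1} + b_t, one step after another."""
    hs = []
    h = h0
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        hs.append(h)
    return hs


def linear_rnn_scan(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    """Hillis-Steele inclusive scan over `combine` (an assumed algorithm choice).

    There are about log2(n) rounds. Inside a round, every position reads only
    values from the previous round, so on parallel hardware a round could run
    in one step. Here it runs serially; only the dependency structure is shown.
    """
    prefix = list(zip(a, b))
    n = len(prefix)
    offset = 1
    while offset < n:
        # The right-hand side is built from the old `prefix` before reassignment.
        prefix = [
            combine(prefix[i - offset], prefix[i]) if i >= offset else prefix[i]
            for i in range(n)
        ]
        offset *= 2
    # prefix[t] now holds the composed map for steps 0..t; apply it to h0.
    return [a_cum * h0 + b_cum for a_cum, b_cum in prefix]
```

Both functions should agree up to floating-point rounding. For example, `linear_rnn_sequential([0.5, 0.5], [1.0, 1.0])` and `linear_rnn_scan([0.5, 0.5], [1.0, 1.0])` both return `[1.0, 1.5]`. The scan performs more total work than the loop, roughly n log n combines against n steps. What it buys is a depth of only about log2(n) sequential rounds.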


Related videos

Intuition behind Mamba and State Space Models | Enhancing LLMs!

AlphaFold - The Most Useful Thing AI Has Ever Done

LLM and GPT - how do large language models work? A visual introduction to transformers

Markov Chains: the mathematics of prediction [Veritasium]

The Strange Math That Predicts (Almost) Anything

Rich Sutton, The OaK Architecture: A Vision of SuperIntelligence from Experience - RLC 2025

Generative Model That Won 2024 Nobel Prize

Ilya Sutskever – We're moving from the age of scaling to the age of research

The Most Misunderstood Concept in Physics

THIS is why large language models can understand the world

But what is quantum computing? (Grover's Algorithm)

The mind behind Linux | Linus Torvalds | TED

Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Paper Explained)

The Quantum Diode That Breaks Physics. The Most Interesting Video!

Flow-Matching vs Diffusion Models explained side by side

The Real Reason Huge AI Models Actually Work [Prof. Andrew Wilson]

Visualizing transformers and attention | Talk for TNG Big Tech Day '24

But how do AI images and videos actually work? | Guest video by Welch Labs

This Simple Optimizer Is Revolutionizing How We Train AI [Muon]

The Misconception that Almost Stopped AI [How Models Learn Part 1]
