Lecture 80: How FlashAttention 4 Works

Author: GPU MODE

Uploaded: 2025-10-01

Views: 4357

Description:

Speaker: Charles Frye

The source code (in CuTe) for the FlashAttention 4 forward pass on Blackwell GPUs has recently been released. The following blog post, https://modal.com/blog/reverse-engine..., covers findings from reading through that source code, along with the changes from FA1, FA2, and FA3 to FA4.

Related videos

How FlashAttention 4 Works

Lecture 50: A learning journey CUDA, Triton, Flash Attention

GPU Programming and Language Design with Chris Lattner

Yann LeCun | Self-Supervised Learning, JEPA, World Models, and the future of AI

Everything You Need To Know About CUDA Tensor Cores (98% util)

Getting Started with CuTe DSL

How Attention Became So Efficient [GQA/MLA/DSA]

Tri Dao: The End of Nvidia's Dominance, Why Inference Costs Dropped, and the Next 10x ...

Getting Started with CUDA and Parallel Programming | NVIDIA GTC 2025 Session

FlashAttention - Tri Dao | Stanford MLSys #67

Lecture 36: CUTLASS and Flash Attention 3

What C++ Needs to be Safe - John Lakos - C++ on Sea 2025

Richard Sutton – Father of RL thinks LLMs are a dead end

Andrej Karpathy: Software Is Changing (Again)

Welcome to v1.0 of the meta::[[verse]]! - Inbal Levi - C++Now 2025

Lecture 23: Tensor Cores

CUDA: New Features and Beyond | NVIDIA GTC 2024

GTC 2022 - CUDA: New Features and Beyond - Stephen Jones, CUDA Architect, NVIDIA

Visualizing transformers and attention | Talk for TNG Big Tech Day '24

Making GPUs Truly Fast: A Deep Dive into Training Performance