
Formalized Deep Learning Architectures for Automated Low-Level Kernel Optimization

Author: GPU MODE

Uploaded: 2025-10-18

Views: 938

Description:

Abstract: Vincent Abbott is a PhD student in the Zardini Lab at the Massachusetts Institute of Technology (MIT). He has developed a formal framework for describing the relationship between the mathematical function implemented by a deep learning model, its resource usage, and its low-level implementation. These methods are based on category-theoretic diagrams [1], which the Zardini Lab has developed into a tool for rapidly deriving low-level algorithms, as presented in their recent work FlashAttention on a Napkin [2]. The methods have been put into practice to derive a FlashAttention-like algorithm for an attention variant from first principles [3].
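The "resource usage" side of such a framework can be illustrated with a back-of-envelope performance model for attention. The following sketch is purely illustrative and is not the model from the talk or the papers; the function name, the 2-bytes-per-element assumption (fp16), and the traffic accounting are all assumptions made here.

```python
def attention_perf_model(n, d, bytes_per_el=2, fused=False):
    """Back-of-envelope FLOP and HBM-traffic model for softmax attention.

    n: sequence length, d: head dimension. All constants are illustrative
    assumptions (fp16 storage, ideal caching), not figures from the talk.
    """
    flops = 4 * n * n * d                 # Q K^T and P V matmuls, 2 FLOPs/MAC each
    io = 4 * n * d * bytes_per_el         # read Q, K, V; write O
    if not fused:
        # A naive kernel round-trips the n x n score/probability matrices
        # through HBM: write S, read S, write P, read P.
        io += 4 * n * n * bytes_per_el
    return flops, io, flops / io          # arithmetic intensity in FLOP/byte

for fused in (False, True):
    f, b, ai = attention_perf_model(n=4096, d=64, fused=fused)
    print(f"fused={fused}: {f/1e9:.2f} GFLOP, {b/1e6:.1f} MB, {ai:.1f} FLOP/B")
```

Even this crude model shows why fusing matters: eliminating the n × n intermediate raises arithmetic intensity by orders of magnitude at long sequence lengths, moving the kernel from memory-bound toward compute-bound.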

Recently, he has been working on encoding the underlying mathematics into an automated tool for diagram generation and algorithm optimization. In this talk, Vincent Abbott will cover formal diagrams for deep learning models, show how they can be used to derive low-level algorithms such as FlashAttention and corresponding performance models, and preview work related to automated tools for diagramming and analyzing algorithms.

[1] https://openreview.net/forum?id=RyZB4...
[2] https://openreview.net/forum?id=pF2uk...
[3] https://dl.acm.org/doi/10.1007/978-3-...
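The FlashAttention-style rewriting the abstract refers to, replacing a materialized score matrix with a streaming block-wise computation, can be sketched in a few lines of NumPy. This is an illustrative sketch of the online-softmax idea only, not code from the talk or papers; the function names, block size, and shapes are assumptions made here.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(d)) V, materializing the full n x n scores."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=4):
    """FlashAttention-style streaming: visit K/V in blocks, carrying a running
    row-max and normalizer so the n x n score matrix is never materialized."""
    n, d = Q.shape
    O = np.zeros((n, d))
    m = np.full(n, -np.inf)          # running row max
    l = np.zeros(n)                  # running softmax normalizer
    scale = 1.0 / np.sqrt(d)
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T * scale                    # (n, block) partial scores
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])
        corr = np.exp(m - m_new)                # rescale earlier partial sums
        l = l * corr + P.sum(axis=-1)
        O = O * corr[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V)))  # → True
```

The online-softmax recurrence is exact, not an approximation, which is what makes a principled derivation of the tiled algorithm from the naive one possible in the first place.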


Related videos

Lecture 83: Formalized Kernel Derivation

Helion: A high-level DSL for ML kernels

RAG Implementation PART 2: LCEL Deep Dive for Next-Gen LLM Applications

Mirage (MPK): Compiling LLMs into Mega Kernels

I finally understood Tensors intuitively! (My mind is blown)

Instant attention: the fastest attention mechanism?

2025 MIT Integration Bee - Finals

How FlashAttention 4 Works

What AI books get wrong [Double Descent]

Lecture 84: Numerics and AI

Low bit dtypes, sparsity and determinism

This AI supercomputer can fit on your desk...

Nova: A Modern Nvidia GPU 🎮 Driver in Rust 🦀 for the Linux Kernel 🐧

A Once-in-a-Century Proof: The Kakeya Conjecture

Futhark: High-performance purely functional data-parallel array programming

Current AI Models have 3 Unfixable Problems

Why TERNARY LOGIC Makes More Sense Than Boolean Logic

Stanford CS230 | Autumn 2025 | Lecture 1: Introduction to Deep Learning

DeepSeek OCR: more than just OCR

Lecture 79 Mirage (MPK): Compiling LLMs into Mega Kernels
