TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)

Author: Yannic Kilcher

Uploaded: 2025-12-27

Views: 7657

Description:

Paper: https://arxiv.org/abs/2511.08923

Abstract:
Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation and forfeits its potential parallelizability. We introduce TiDAR, a sequence-level hybrid architecture that drafts tokens (Thinking) in Diffusion and samples final outputs (Talking) AutoRegressively - all within a single forward pass using specially designed structured attention masks. This design exploits the free GPU compute density, achieving a strong balance between drafting and verification capacity. Moreover, TiDAR is designed to be serving-friendly (low overhead) as a standalone model. We extensively evaluate TiDAR against AR models, speculative decoding, and diffusion variants across generative and likelihood tasks at 1.5B and 8B scales. Thanks to the parallel drafting and sampling as well as exact KV cache support, TiDAR outperforms speculative decoding in measured throughput and surpasses diffusion models like Dream and Llada in both efficiency and quality. Most notably, TiDAR is the first architecture to close the quality gap with AR models while delivering 4.71x to 5.91x more tokens per second.
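
The abstract mentions "specially designed structured attention masks" but does not spell out their layout. As a rough illustration only, the sketch below assumes a sequence split into an autoregressive prefix that attends causally, followed by a block of draft slots that can see the whole prefix and each other. The function names `hybrid_attention_mask` and `render`, and this particular layout, are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the layout below is an assumption, not the
# mask defined in the TiDAR paper.

def hybrid_attention_mask(n_prefix: int, n_draft: int) -> list[list[bool]]:
    """Build a boolean mask where mask[q][k] is True when query q may attend to key k.

    Assumed layout (hypothetical):
      positions [0, n_prefix)              -> autoregressive tokens, causal attention
      positions [n_prefix, n_prefix+n_draft) -> draft slots, bidirectional within the
                                              block and full access to the prefix
    """
    n = n_prefix + n_draft
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if q < n_prefix:
                # Causal part: a token sees itself and earlier tokens only,
                # and never sees the draft slots that follow it.
                mask[q][k] = k <= q
            else:
                # Draft part: sees the whole prefix and every draft slot.
                mask[q][k] = True
    return mask


def render(mask: list[list[bool]]) -> str:
    """Render the mask as text: 'x' means attention allowed, '.' means blocked."""
    return "\n".join("".join("x" if cell else "." for cell in row) for row in mask)


if __name__ == "__main__":
    # Three autoregressive positions followed by two draft slots.
    print(render(hybrid_attention_mask(3, 2)))
```

With three prefix positions and two draft slots, the top three rows form a lower-triangular (causal) pattern and the bottom two rows are fully filled. Under this assumed layout, one forward pass can serve both roles because the causal rows never look at the draft slots.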

Authors: Jingyu Liu, Xin Dong, Zhifan Ye, Rishabh Mehta, Yonggan Fu, Vartika Singh, Jan Kautz, Ce Zhang, Pavlo Molchanov

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube:    / yannickilcher
Twitter:   / ykilcher
Discord: https://ykilcher.com/discord
LinkedIn:   / ykilcher

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon:   / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
