Diffusion Models That Speak: Discrete Latent Codes for Images w/ Michael Noukhovitch
Author: SAIL Media
Uploaded: 2025-12-18
Views: 115
At NeurIPS, we welcomed @michaeln5524, a PhD student at Mila also affiliated with Ai2, to the SAIL booth to discuss his new paper on compositional discrete latent codes.
He explains how representing images as discrete tokens, rather than continuous vectors, lets diffusion models "speak" image data just like text. This unified approach could bridge the gap between language models and vision, potentially leading to more robust multimodal reasoning.
Key Topics:
Discrete vs. Continuous: Why representing images as discrete codes (like language tokens) is powerful for generative modeling.
Unified Models: How "diffusion language models" can generate both text and images in the same latent space.
Semantic Meaning: The argument that discrete tokens offer better compositional semantics than raw pixels.
Multi-Agent RL: Michael also touches on his other research into multi-agent reinforcement learning and avoiding the "tragedy of the commons" in AI interactions.
Special thanks to @lambda-ai for sponsoring the SAIL booth for NeurIPS 2025!