Liming Wang, Can Diffusion Model Disentangle? A Theoretical Perspective
Author: MIT Embodied Intelligence
Uploaded: 2025-03-19
Views: 400
Title: Can Diffusion Model Disentangle? A Theoretical Perspective
Abstract:
This talk presents a novel theoretical framework for understanding how diffusion models can learn disentangled representations. Within this framework, we establish identifiability conditions for general disentangled latent variable models, analyze training dynamics, and derive sample complexity bounds for disentangled latent subspace models. To validate our theory, we conduct disentanglement experiments across diverse tasks and modalities, including subspace recovery in latent subspace Gaussian mixture models, image colorization, image denoising, and voice conversion for speech classification. Additionally, our experiments show that training strategies inspired by our theory, such as style guidance regularization, consistently enhance disentanglement performance.
Biography:
Liming Wang is a postdoctoral associate in the Spoken Language Systems Group at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research interests broadly encompass the practical and theoretical aspects of self-supervised speech processing and multimodal learning, with the goal of improving accessibility and inclusivity of speech and language technology.