Latent Action Diffusion: Unifying Robot Control Across Diverse Hands and Grippers
Author: Foundation Models For Robotics
Uploaded: 2025-12-17
#Robotics #MachineLearning #DiffusionModels #CrossEmbodimentLearning #RobotManipulation #AI #LatentActionDiffusion #DiffusionPolicy
End-to-end learning is a powerful paradigm for robotic manipulation, but its effectiveness is often hampered by data scarcity and the heterogeneity of action spaces across different robot embodiments. This diversity in action spaces, known as the "embodiment gap," presents a major barrier to cross-embodiment learning and skill transfer.
We introduce **Latent Action Diffusion (LAD)**, a novel framework that addresses this challenge by learning diffusion policies within a **unified latent action space**. This approach allows us to unify diverse end-effector actions, accelerating generalization across embodiments and enabling efficient multi-robot control.
*How Latent Action Diffusion Works:*
Our method treats the alignment of different end-effector action spaces as a multimodal representation learning problem. The framework operates in three main stages:
1. *Creating Aligned Action Pairs (Data Generation):* We use *retargeting functions* (typically used for teleoperating robots from human hands) to establish cross-modal correspondences between different action spaces, such as those of anthropomorphic robotic hands, a human hand, and a parallel jaw gripper. This generates tuples of paired end-effector poses (see the first sketch after this list).
2. *Contrastive Latent Space Learning:* We train modality-specific encoders to project these diverse actions into a shared latent space using a *pairwise InfoNCE loss* (a contrastive loss) to ensure semantic alignment (second sketch below). Subsequently, modality-specific decoders are trained to reconstruct the original poses from the latent representations, while the encoders are fine-tuned to improve reconstruction quality.
3. *Policy Learning:* We employ Diffusion Policy, factorizing the policy into a latent, *embodiment-agnostic policy* trained on latent actions, and multiple **embodiment-specific action decoders**. At inference time, the appropriate decoder translates the latent action output back into the specific robot's explicit action space (third sketch below).
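Stage 1 can be illustrated with a toy retargeting function. The keypoint layout (21×3, MediaPipe-style) and the thumb-index distance heuristic below are assumptions for illustration, not the paper's actual retargeting functions:

```python
# Toy retargeting: derive a parallel-gripper command from human hand
# keypoints, yielding one aligned (hand, gripper) action pair.
# Keypoint layout and the distance heuristic are illustrative assumptions.
import numpy as np

def retarget_to_gripper(hand_keypoints: np.ndarray, max_width: float = 0.08) -> float:
    """Map 21x3 hand keypoints to a gripper width in meters, using the
    thumb-tip / index-tip distance clipped to the gripper's range."""
    thumb_tip, index_tip = hand_keypoints[4], hand_keypoints[8]
    return float(np.clip(np.linalg.norm(thumb_tip - index_tip), 0.0, max_width))

# One paired training sample: the human hand action and its retargeted
# gripper action.
hand = np.random.rand(21, 3)
pair = (hand, retarget_to_gripper(hand))
```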
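For stage 2, here is a minimal sketch of pairwise InfoNCE alignment between two action modalities, assuming simple MLP encoders and illustrative action dimensions (the paper's architectures may differ):

```python
# Pairwise InfoNCE alignment between two action modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 32  # assumed latent action dimensionality

def make_encoder(action_dim: int) -> nn.Module:
    """Small MLP projecting one embodiment's action into the shared latent."""
    return nn.Sequential(nn.Linear(action_dim, 128), nn.ReLU(),
                         nn.Linear(128, LATENT_DIM))

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired latents of shape (B, D):
    matching rows are positives, all other rows in the batch are negatives."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature       # (B, B) cosine similarities
    targets = torch.arange(z_a.shape[0])     # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# Paired actions from the retargeting step: e.g. 21x3 hand keypoints
# (flattened to 63) vs. an 8-D gripper pose + width (dimensions assumed).
enc_hand, enc_gripper = make_encoder(63), make_encoder(8)
hand_batch, gripper_batch = torch.randn(256, 63), torch.randn(256, 8)
loss = info_nce(enc_hand(hand_batch), enc_gripper(gripper_batch))
loss.backward()
```

With more than two embodiments, the same loss is applied over every pair of modalities, which is what makes the objective *pairwise*; the reconstruction decoders are then trained on top of these aligned latents.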
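For stage 3, a sketch of the factorized inference path: one embodiment-agnostic latent policy plus per-embodiment action decoders. The toy denoising loop, class names, and dimensions are placeholders, not the paper's Diffusion Policy implementation:

```python
# Factorized inference: embodiment-agnostic latent policy plus
# per-embodiment action decoders. The denoising loop is a toy stand-in.
import torch
import torch.nn as nn

class LatentDiffusionPolicy(nn.Module):
    """Stand-in for a Diffusion Policy that denoises latent actions."""
    def __init__(self, obs_dim: int, latent_dim: int, steps: int = 10):
        super().__init__()
        self.latent_dim, self.steps = latent_dim, steps
        self.denoiser = nn.Sequential(
            nn.Linear(obs_dim + latent_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))

    @torch.no_grad()
    def sample(self, obs: torch.Tensor) -> torch.Tensor:
        """Iteratively refine a noise sample into a latent action,
        conditioned on the observation and a scalar timestep."""
        z = torch.randn(obs.shape[0], self.latent_dim)
        for t in reversed(range(self.steps)):
            t_feat = torch.full((obs.shape[0], 1), t / self.steps)
            z = z - self.denoiser(torch.cat([obs, z, t_feat], dim=-1))
        return z

# One decoder per embodiment, trained to map the shared latent back to
# explicit actions (output dimensions are assumptions).
decoders = {
    "faive_hand": nn.Linear(32, 11),     # e.g. finger joint targets
    "franka_gripper": nn.Linear(32, 8),  # e.g. EE pose + gripper width
}

policy = LatentDiffusionPolicy(obs_dim=64, latent_dim=32)
obs = torch.randn(1, 64)
z = policy.sample(obs)                   # embodiment-agnostic latent action
action = decoders["franka_gripper"](z)   # explicit action for one robot
```

Swapping the key into `decoders` is all that changes between robots; the latent policy itself never sees an embodiment-specific action.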
*Key Results and Contributions:*
By co-training on cross-embodiment data using this unified latent action space, we demonstrate substantial real-world performance gains, enabling robust multi-robot control across significantly different robot morphologies, including the **Faive robotic hand**, the **mimic hand**, and a **Franka parallel gripper**.
*Significant Skill Transfer:* Our approach facilitates positive skill transfer, yielding up to *25.3%* absolute success rate improvement in tasks like Block Stacking. The average improvement across embodiments and tasks was 13.4%.
*Improved Manipulation:* In the multi-stage Block Stacking task, the Franka gripper showed improved success rates in fine-grained stages ("Stack blocks" and "Put into box"), indicating it learned more precise manipulation strategies from the dexterous hand data.
*Unifying Control:* The framework successfully unifies diverse end-effector action spaces into a single, semantically aligned latent space, significantly reducing the need for extensive data collection for each new robot morphology.
Our results confirm that this method provides a path forward for effectively sharing and reusing datasets across embodiments, addressing a key challenge in scalable and efficient robotic learning.
---
*Project Page:* https://mimicrobotics.github.io/lad/
*Paper Citation:*
@misc{bauer2025latentactiondiffusioncrossembodiment,
  title={Latent Action Diffusion for Cross-Embodiment Manipulation},
  author={Erik Bauer and Elvis Nava and Robert K. Katzschmann},
  year={2025},
  eprint={2506.14608},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2506.14608},
}
---
*Video Tags:*
Robotics, Latent Action Diffusion, Diffusion Policy, Cross-Embodiment Manipulation, Robot Learning, End-to-End Learning, Multi-Robot Control, Dexterous Hand, Franka Gripper, Latent Space, Contrastive Learning, Skill Transfer, Robotic AI, Faive Hand, Mimic Hand, Action Space Alignment, Autonomous Manipulation, Multi-Task Robotics, Unifying Robot Control.