Latent Action Diffusion: Unifying Robot Control Across Diverse Hands and Grippers
Author: Foundation Models For Robotics
Uploaded: 2025-12-17
#Robotics #MachineLearning #DiffusionModels #CrossEmbodimentLearning #RobotManipulation #AI #LatentActionDiffusion #DiffusionPolicy
End-to-end learning is a powerful paradigm for robotic manipulation, but its effectiveness is often hampered by data scarcity and the heterogeneity of action spaces across different robot embodiments. This diversity in action spaces, known as the "embodiment gap," presents a major barrier to cross-embodiment learning and skill transfer.
We introduce **Latent Action Diffusion (LAD)**, a novel framework that addresses this challenge by learning diffusion policies within a **unified latent action space**. This approach allows us to unify diverse end-effector actions, accelerating generalization across embodiments and enabling efficient multi-robot control.
*How Latent Action Diffusion Works:*
Our method treats the alignment of different end-effector action spaces as a multimodal representation learning problem. The framework operates in three main stages:
1. *Creating Aligned Action Pairs (Data Generation):* We use *retargeting functions* (typically used for teleoperating robots from human hands) to establish cross-modal correspondences between different action spaces, such as those of anthropomorphic robotic hands, a human hand, and a parallel jaw gripper. This generates tuples of paired end-effector poses (see the first sketch after this list).
2. *Contrastive Latent Space Learning:* We train modality-specific encoders to project these diverse actions into a shared latent space using a *pairwise InfoNCE loss* (a contrastive loss) to ensure semantic alignment (second sketch below). Subsequently, modality-specific decoders are trained to reconstruct the original poses from the latent representations, while the encoders are fine-tuned to improve reconstruction quality.
3. *Policy Learning:* We employ Diffusion Policy, factorizing the policy into a latent, *embodiment-agnostic policy* trained on latent actions, and multiple **embodiment-specific action decoders**. At inference time, the appropriate decoder translates the latent action output back into the specific robot's explicit action space (third sketch below).
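Stage 1 can be illustrated with a toy retargeting function. The keypoint layout (21×3, MediaPipe-style) and the thumb-index distance heuristic below are assumptions for illustration, not the paper's actual retargeting functions:

```python
# Toy retargeting: derive a parallel-gripper command from human hand
# keypoints, yielding one aligned (hand, gripper) action pair.
# Keypoint layout and the distance heuristic are illustrative assumptions.
import numpy as np

def retarget_to_gripper(hand_keypoints: np.ndarray, max_width: float = 0.08) -> float:
    """Map 21x3 hand keypoints to a gripper width in meters, using the
    thumb-tip / index-tip distance clipped to the gripper's range."""
    thumb_tip, index_tip = hand_keypoints[4], hand_keypoints[8]
    return float(np.clip(np.linalg.norm(thumb_tip - index_tip), 0.0, max_width))

# One paired training sample: the human hand action and its retargeted
# gripper action.
hand = np.random.rand(21, 3)
pair = (hand, retarget_to_gripper(hand))
```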
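For stage 2, here is a minimal sketch of pairwise InfoNCE alignment between two action modalities, assuming simple MLP encoders and illustrative action dimensions (the paper's architectures may differ):

```python
# Pairwise InfoNCE alignment between two action modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 32  # assumed latent action dimensionality

def make_encoder(action_dim: int) -> nn.Module:
    """Small MLP projecting one embodiment's action into the shared latent."""
    return nn.Sequential(nn.Linear(action_dim, 128), nn.ReLU(),
                         nn.Linear(128, LATENT_DIM))

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired latents of shape (B, D):
    matching rows are positives, all other rows in the batch are negatives."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature       # (B, B) cosine similarities
    targets = torch.arange(z_a.shape[0])     # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# Paired actions from the retargeting step: e.g. 21x3 hand keypoints
# (flattened to 63) vs. an 8-D gripper pose + width (dimensions assumed).
enc_hand, enc_gripper = make_encoder(63), make_encoder(8)
hand_batch, gripper_batch = torch.randn(256, 63), torch.randn(256, 8)
loss = info_nce(enc_hand(hand_batch), enc_gripper(gripper_batch))
loss.backward()
```

With more than two embodiments, the same loss is applied over every pair of modalities, which is what makes the objective *pairwise*; the reconstruction decoders are then trained on top of these aligned latents.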
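For stage 3, a sketch of the factorized inference path: one embodiment-agnostic latent policy plus per-embodiment action decoders. The toy denoising loop, class names, and dimensions are placeholders, not the paper's Diffusion Policy implementation:

```python
# Factorized inference: embodiment-agnostic latent policy plus
# per-embodiment action decoders. The denoising loop is a toy stand-in.
import torch
import torch.nn as nn

class LatentDiffusionPolicy(nn.Module):
    """Stand-in for a Diffusion Policy that denoises latent actions."""
    def __init__(self, obs_dim: int, latent_dim: int, steps: int = 10):
        super().__init__()
        self.latent_dim, self.steps = latent_dim, steps
        self.denoiser = nn.Sequential(
            nn.Linear(obs_dim + latent_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))

    @torch.no_grad()
    def sample(self, obs: torch.Tensor) -> torch.Tensor:
        """Iteratively refine a noise sample into a latent action,
        conditioned on the observation and a scalar timestep."""
        z = torch.randn(obs.shape[0], self.latent_dim)
        for t in reversed(range(self.steps)):
            t_feat = torch.full((obs.shape[0], 1), t / self.steps)
            z = z - self.denoiser(torch.cat([obs, z, t_feat], dim=-1))
        return z

# One decoder per embodiment, trained to map the shared latent back to
# explicit actions (output dimensions are assumptions).
decoders = {
    "faive_hand": nn.Linear(32, 11),     # e.g. finger joint targets
    "franka_gripper": nn.Linear(32, 8),  # e.g. EE pose + gripper width
}

policy = LatentDiffusionPolicy(obs_dim=64, latent_dim=32)
obs = torch.randn(1, 64)
z = policy.sample(obs)                   # embodiment-agnostic latent action
action = decoders["franka_gripper"](z)   # explicit action for one robot
```

Swapping the key into `decoders` is all that changes between robots; the latent policy itself never sees an embodiment-specific action.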
*Key Results and Contributions:*
By co-training on cross-embodiment data using this unified latent action space, we demonstrate substantial real-world performance gains, enabling robust multi-robot control across significantly different robot morphologies, including the **Faive robotic hand**, the **mimic hand**, and a **Franka parallel gripper**.
*Significant Skill Transfer:* Our approach facilitates positive skill transfer, yielding up to *25.3%* absolute success rate improvement in tasks like Block Stacking. The average improvement across embodiments and tasks was 13.4%.
*Improved Manipulation:* In the multi-stage Block Stacking task, the Franka gripper showed improved success rates in fine-grained stages ("Stack blocks" and "Put into box"), indicating it learned more precise manipulation strategies from the dexterous hand data.
*Unifying Control:* The framework successfully unifies diverse end-effector action spaces into a single, semantically aligned latent space, significantly reducing the need for extensive data collection for each new robot morphology.
Our results confirm that this method provides a path forward for effectively sharing and reusing datasets across embodiments, addressing a key challenge in scalable and efficient robotic learning.
---
*Project Page:* https://mimicrobotics.github.io/lad/
*Paper Citation:*
@misc{bauer2025latentactiondiffusioncrossembodiment,
  title={Latent Action Diffusion for Cross-Embodiment Manipulation},
  author={Erik Bauer and Elvis Nava and Robert K. Katzschmann},
  year={2025},
  eprint={2506.14608},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2506.14608},
}
---
*Video Tags:*
Robotics, Latent Action Diffusion, Diffusion Policy, Cross-Embodiment Manipulation, Robot Learning, End-to-End Learning, Multi-Robot Control, Dexterous Hand, Franka Gripper, Latent Space, Contrastive Learning, Skill Transfer, Robotic AI, Faive Hand, Mimic Hand, Action Space Alignment, Autonomous Manipulation, Multi-Task Robotics, Unifying Robot Control.