Learning Language-Guided Visuomotor Policies for Robotic Manipulation
Author: Ai2
Uploaded: 2025-05-01
Views: 167
Abstract: In this presentation, we will focus on the problem of learning language-guided visuomotor policies for robotic manipulation. We will explore different approaches to enabling robots to interpret natural language instructions, perceive the current state of the environment, and act accordingly to solve a given task. We will begin by discussing the visual gap between simulation and the real world for policy transfer. Training in simulation is safer and faster than training on hardware, but visual and physical mismatches often cause policies to fail once transferred to the real robot. To address this, we introduce a data-driven method for optimizing domain randomization parameters, enabling more effective sim-to-real transfer while minimizing the need for manual tuning and real-world trials.
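As a rough illustration of what data-driven tuning of domain randomization could look like, here is a minimal Python sketch (all names and the toy objective are hypothetical stand-ins, not the method presented in the talk): it searches over candidate randomization ranges and keeps the one with the best transfer score, where a real system would train a policy in the randomized simulator and score it against real-world observations or trials.

import random

def sample_params():
    # Hypothetical candidate: each entry sets the width of one randomization range.
    return {
        "light_intensity_range": random.uniform(0.0, 1.0),
        "texture_noise_range": random.uniform(0.0, 1.0),
        "camera_jitter_range": random.uniform(0.0, 0.2),
    }

def transfer_score(params):
    # Stand-in objective for the sketch: in practice this would train a
    # policy in simulation randomized with `params` and evaluate it on a
    # small set of real observations or rollouts.
    target = {"light_intensity_range": 0.6,
              "texture_noise_range": 0.4,
              "camera_jitter_range": 0.05}
    return -sum((params[k] - target[k]) ** 2 for k in params)

best_params, best_score = None, float("-inf")
for _ in range(200):  # search budget: how many candidates we can afford to evaluate
    candidate = sample_params()
    score = transfer_score(candidate)
    if score > best_score:
        best_params, best_score = candidate, score
print("selected randomization parameters:", best_params)

The point of the sketch is only the outer loop: the randomization ranges become a searchable quantity scored by data, rather than constants tuned by hand.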
We then turn to language-guided policy learning, starting with Hiveformer, a 2D model that integrates images and natural language instructions to perform manipulation tasks. To overcome the limitations of 2D inputs, such as the lack of depth information and sensitivity to occlusion, we introduce PolarNet and 3D-LOTUS, 3D point cloud-based models that yield more precise policies with better performance. In the final part of the talk, we address the challenge of generalization in robotic manipulation. Many current approaches perform well on the tasks they were trained on but fail to transfer to novel tasks. To address this problem, we propose a comprehensive benchmark with four levels of increasing difficulty, covering novel object placements, rigid and articulated objects, and long-horizon tasks. Finally, we present 3D-LOTUS++, a generalist model that integrates three components: 3D-LOTUS as a trajectory prediction module, a large language model for task planning, and a vision-language model for object grounding.
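The three-component design of 3D-LOTUS++ can be summarized with the following minimal Python sketch (all interfaces are hypothetical, not the actual APIs): the planner decomposes the instruction into object-centric steps, the grounder locates each named object in the 3D observation, and the motion module predicts how to execute the step.

from dataclasses import dataclass

@dataclass
class Step:
    action: str       # e.g. "pick" or "place"
    object_name: str  # the object this step refers to

def llm_plan(instruction: str) -> list[Step]:
    # Stand-in for the LLM task planner: decompose the instruction
    # into a sequence of object-centric steps.
    return [Step("pick", "red block"), Step("place", "blue bowl")]

def vlm_ground(object_name: str, point_cloud) -> tuple[float, float, float]:
    # Stand-in for the vision-language grounding module: locate the
    # named object in the current 3D observation.
    return (0.4, 0.1, 0.02)

def policy_act(step: Step, location, point_cloud) -> list[float]:
    # Stand-in for the trajectory prediction module (the role 3D-LOTUS
    # plays in the talk): predict a motion for the grounded step.
    x, y, z = location
    return [x, y, z + 0.1]  # e.g. a pre-grasp waypoint above the object

def run(instruction: str, point_cloud=None):
    for step in llm_plan(instruction):
        location = vlm_ground(step.object_name, point_cloud)
        waypoint = policy_act(step, location, point_cloud)
        print(step.action, step.object_name, "->", waypoint)

run("put the red block in the blue bowl")

The modularity is the main design choice here: because planning, grounding, and motion prediction are separate components, each can be swapped or improved independently.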
Bio: Ricardo Garcia-Pinel is a final-year (graduating in Spring 2025) PhD student at Inria Paris | ENS (Willow team) working on language-guided visuomotor policies for robotic manipulation. He is supervised by Cordelia Schmid and Shizhe Chen. Ricardo received his BS degree in Telecommunication Technologies and Services and his MS degree in Telecommunication Engineering from the Technical University of Madrid (UPM), Spain, in 2015 and 2018, respectively. Since then, he has worked on multiple computer vision and robotics projects, including multi-agent reinforcement learning for quadcopters, semantic segmentation, neural motion planning, and visual sim-to-real policy transfer. Currently, he is working on language-guided visuomotor policy learning for robotic manipulation, with a focus on policy generalization. His contributions in this field include Hiveformer [1], PolarNet [2], 3D-LOTUS [3], and GEMBench [3]. For more information about his projects, see his webpage: https://rjgpinel.github.io/ or his CV:
https://rjgpinel.github.io/files/resu...
[1] Instruction-driven History-aware Policies for Robotic Manipulations, CoRL 2022.
[2] PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation, CoRL 2023.
[3] Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy, ICRA 2025.
