Learning Language-Guided Visuomotor Policies for Robotic Manipulation
Author: Ai2
Uploaded: 2025-05-01
Views: 167
Abstract: In this presentation, we will focus on the problem of learning language-guided visuomotor policies for robotic manipulation. We will explore different approaches to enabling robots to interpret natural language instructions, perceive the current state of the environment, and act accordingly to solve a given task. We will begin by discussing the visual gap between simulation and the real world for policy transfer. Training in simulation is safer and faster than training on hardware, but visual and physical mismatches often cause policies to fail once transferred to the real robot. To address this, we introduce a data-driven method for optimizing domain randomization parameters, enabling more effective sim-to-real transfer while minimizing the need for manual tuning and real-world trials.
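As a rough illustration of what data-driven tuning of domain randomization could look like, here is a minimal Python sketch (all names and the toy objective are hypothetical stand-ins, not the method presented in the talk): it searches over candidate randomization ranges and keeps the one with the best transfer score, where a real system would train a policy in the randomized simulator and score it against real-world observations or trials.

import random

def sample_params():
    # Hypothetical candidate: each entry sets the width of one randomization range.
    return {
        "light_intensity_range": random.uniform(0.0, 1.0),
        "texture_noise_range": random.uniform(0.0, 1.0),
        "camera_jitter_range": random.uniform(0.0, 0.2),
    }

def transfer_score(params):
    # Stand-in objective for the sketch: in practice this would train a
    # policy in simulation randomized with `params` and evaluate it on a
    # small set of real observations or rollouts.
    target = {"light_intensity_range": 0.6,
              "texture_noise_range": 0.4,
              "camera_jitter_range": 0.05}
    return -sum((params[k] - target[k]) ** 2 for k in params)

best_params, best_score = None, float("-inf")
for _ in range(200):  # search budget: how many candidates we can afford to evaluate
    candidate = sample_params()
    score = transfer_score(candidate)
    if score > best_score:
        best_params, best_score = candidate, score
print("selected randomization parameters:", best_params)

The point of the sketch is only the outer loop: the randomization ranges become a searchable quantity scored by data, rather than constants tuned by hand.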
We then turn to language-guided policy learning, starting with Hiveformer, a 2D model that integrates images and natural language instructions to perform manipulation tasks. To overcome the limitations of 2D inputs, such as the lack of depth information and sensitivity to occlusion, we introduce PolarNet and 3D-LOTUS, 3D point cloud-based models that yield more precise policies with better performance. In the final part of the talk, we address the challenge of generalization in robotic manipulation. Many current approaches perform well on the tasks they were trained on but fail to transfer to novel tasks. To address this problem, we propose a comprehensive benchmark with four levels of increasing difficulty, covering novel object placements, rigid and articulated objects, and long-horizon tasks. Finally, we present 3D-LOTUS++, a generalist model that integrates three components: 3D-LOTUS as a trajectory prediction module, a large language model for task planning, and a vision-language model for object grounding.
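The three-component design of 3D-LOTUS++ can be summarized with the following minimal Python sketch (all interfaces are hypothetical, not the actual APIs): the planner decomposes the instruction into object-centric steps, the grounder locates each named object in the 3D observation, and the motion module predicts how to execute the step.

from dataclasses import dataclass

@dataclass
class Step:
    action: str       # e.g. "pick" or "place"
    object_name: str  # the object this step refers to

def llm_plan(instruction: str) -> list[Step]:
    # Stand-in for the LLM task planner: decompose the instruction
    # into a sequence of object-centric steps.
    return [Step("pick", "red block"), Step("place", "blue bowl")]

def vlm_ground(object_name: str, point_cloud) -> tuple[float, float, float]:
    # Stand-in for the vision-language grounding module: locate the
    # named object in the current 3D observation.
    return (0.4, 0.1, 0.02)

def policy_act(step: Step, location, point_cloud) -> list[float]:
    # Stand-in for the trajectory prediction module (the role 3D-LOTUS
    # plays in the talk): predict a motion for the grounded step.
    x, y, z = location
    return [x, y, z + 0.1]  # e.g. a pre-grasp waypoint above the object

def run(instruction: str, point_cloud=None):
    for step in llm_plan(instruction):
        location = vlm_ground(step.object_name, point_cloud)
        waypoint = policy_act(step, location, point_cloud)
        print(step.action, step.object_name, "->", waypoint)

run("put the red block in the blue bowl")

The modularity is the main design choice here: because planning, grounding, and motion prediction are separate components, each can be swapped or improved independently.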
Bio: Ricardo Garcia-Pinel is a final-year (graduating in Spring 2025) PhD student at Inria Paris | ENS (Willow team) working on language-guided visuomotor policies for robotic manipulation. He is supervised by Cordelia Schmid and Shizhe Chen. Ricardo received his BS degree in Telecommunication Technologies and Services and his MS degree in Telecommunication Engineering from the Technical University of Madrid (UPM), Spain, in 2015 and 2018, respectively. Since then, he has worked on multiple computer vision and robotics projects, including multi-agent reinforcement learning for quadcopters, semantic segmentation, neural motion planning, and visual sim-to-real policy transfer. Currently, he is working on language-guided visuomotor policy learning for robotic manipulation, with a focus on policy generalization. His contributions in this field include Hiveformer [1], PolarNet [2], 3D-LOTUS [3], and GEMBench [3]. For more information about his projects, see his webpage: https://rjgpinel.github.io/ or his CV:
https://rjgpinel.github.io/files/resu...
[1] Instruction-driven History-aware Policies for Robotic Manipulations, CoRL 2022.
[2] PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation, CoRL 2023.
[3] Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy, ICRA 2025.
