Yevgen Chebotar: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Author: Montreal Robotics

Uploaded: 2023-11-17

Views: 1539

Description:

We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to such category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
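The recipe described above hinges on one concrete trick: continuous robot actions are discretized and written out as text tokens, so the vision-language model can emit them the same way it emits words. The sketch below is illustrative only; the bin count (256), the per-dimension value ranges, and the space-separated integer format are assumptions for the example, not RT-2's exact tokenization scheme.

```python
import numpy as np

def action_to_text(action, low, high, n_bins=256):
    """Discretize a continuous action vector into integer bins and render it
    as a plain-text string, so the policy can be trained on it like language."""
    action = np.clip(action, low, high)
    # Map each dimension to an integer bin in [0, n_bins - 1].
    bins = np.round((action - low) / (high - low) * (n_bins - 1)).astype(int)
    return " ".join(str(b) for b in bins)

def text_to_action(token_str, low, high, n_bins=256):
    """Invert the mapping: parse a generated token string back into a
    continuous action vector to send to the robot controller."""
    bins = np.array([int(t) for t in token_str.split()], dtype=float)
    return low + bins / (n_bins - 1) * (high - low)

# Hypothetical 7-DoF action: end-effector deltas plus gripper, all in [-1, 1].
low, high = np.full(7, -1.0), np.full(7, 1.0)
a = np.array([0.1, -0.3, 0.0, 0.5, -1.0, 1.0, 0.2])
tokens = action_to_text(a, low, high)
print(tokens)                             # space-separated bin indices, e.g. "140 89 128 ..."
print(text_to_action(tokens, low, high))  # recovers `a` up to quantization error
```

In the paper's framing, such action strings are simply mixed into the model's training data alongside web-scale vision-language examples, so the same output head that answers visual questions also produces robot commands.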


Related videos

Giulia Vezzani: RoboCat: A self-improving generalist for robotic manipulation
Vision Language Action Models for Autonomous Driving at Wayve
Ink & Code Society: Gotta Have Skillz
What Are Vision Language Models? How AI Sees & Understands Images
Robotics Transformer w/ Visual-LLM explained: RT-2
Dhruv Shah - Learning General-Purpose Robot Navigation | Nuro Technical Talks
Robotics in the Age of Generative AI with Vincent Vanhoucke, Google DeepMind | NVIDIA GTC 2024
Supplementary video for RT-1: Robotics Transformer for Real-World Control at Scale
RobotLearning: Gemini Robotics
LLMs Meet Robotics: What Are Vision-Language-Action Models? (VLA Series Ep.1)
U of T Robotics Institute Seminar: Sergey Levine (UC Berkeley)
Stanford Seminar - Connecting Robotics and Foundation Models, Brian Ichter of Google DeepMind
Jiayuan Mao - Learning, Reasoning, and Planning with Neuro-Symbolic Concepts
Dhruv Shah: A General-Purpose Robotic Navigation Model
Paper Club with Peter: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Stanford CS25: V2 I Robotics and Imitation Learning
Jiafei Duan - Towards Robotics Foundation Models that can Reason
Vision Language Models for Robotics | ROS Developers Open Class #179
Turn ANY File into LLM Knowledge in SECONDS
Jason Ma: Foundation Reward Models for General Robot Skill Acquisition
