RobotLearning: Scaling Offline Reinforcement Learning
Author: Montreal Robotics
Uploaded: 2025-03-20
Views: 363
I started discussing offline reinforcement learning, highlighting its potential to learn from pre-existing datasets, a departure from online RL's data inefficiency and divergence issues. I emphasized the goal of training a policy from offline data without divergence, similar to supervised learning. We explored the concept of "stitching" trajectories, a unique advantage of RL, where optimal paths can be constructed from disparate data segments, leveraging the Markov property. However, I also pointed out that this is difficult to achieve in practice, especially with partial observations. We discussed model-based RL as a potential solution but acknowledged the challenges of error accumulation in long-horizon planning. I then introduced the Decision Transformer, a supervised learning approach using returns as input to generate trajectories, aiming to minimize error across the entire sequence. However, I noted its limitations in stitching and handling stochasticity.
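As a concrete illustration of the return-conditioned supervised learning idea behind the Decision Transformer, here is a minimal PyTorch sketch: a small transformer encoder reads (return-to-go, state, previous action) tokens under a causal mask and regresses onto the logged actions. The token scheme, layer sizes, and names are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of return-conditioned sequence modelling in the spirit of the
# Decision Transformer. Shapes, layer sizes, and names are assumptions made for
# illustration only.
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    """Predicts the next action from (return-to-go, state, previous action) tokens."""

    def __init__(self, state_dim, act_dim, embed_dim=128, n_heads=4, n_layers=2, max_len=64):
        super().__init__()
        self.embed_rtg = nn.Linear(1, embed_dim)           # return-to-go token
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_prev_action = nn.Linear(act_dim, embed_dim)
        self.pos_embed = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(embed_dim, act_dim)

    def forward(self, rtg, states, prev_actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), prev_actions: (B, T, act_dim)
        B, T, _ = states.shape
        pos = self.pos_embed(torch.arange(T, device=states.device))
        tokens = (self.embed_rtg(rtg) + self.embed_state(states)
                  + self.embed_prev_action(prev_actions) + pos)
        # Causal mask: each timestep may only attend to itself and the past.
        mask = torch.triu(torch.full((T, T), float("-inf"), device=states.device), diagonal=1)
        return self.predict_action(self.encoder(tokens, mask=mask))

def returns_to_go(rewards):
    """Suffix sum of rewards: the return achieved from each step onward."""
    return torch.flip(torch.cumsum(torch.flip(rewards, dims=[-1]), dim=-1), dims=[-1])

# Training is plain supervised regression onto the logged actions, conditioned on
# the return the logged trajectory actually obtained from each step.
if __name__ == "__main__":
    B, T, state_dim, act_dim = 8, 20, 11, 3
    states = torch.randn(B, T, state_dim)
    actions = torch.randn(B, T, act_dim)
    rewards = torch.randn(B, T)
    rtg = returns_to_go(rewards).unsqueeze(-1)
    prev_actions = torch.cat([torch.zeros_like(actions[:, :1]), actions[:, :-1]], dim=1)

    model = ReturnConditionedPolicy(state_dim, act_dim)
    pred = model(rtg, states, prev_actions)
    loss = nn.functional.mse_loss(pred, actions)  # minimize error across the whole sequence
    loss.backward()
    print(f"sequence-modelling loss: {loss.item():.4f}")
```

Conditioning on the return-to-go is what lets the same supervised model be prompted for high-return behaviour at evaluation time, but, as noted above, it does not by itself provide stitching or robustness to stochastic returns the way value-based offline RL methods aim to.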
I then discussed recent papers on adapting offline RL methods to large transformers, on incorporating offline data to improve early online training performance, and on performing offline-to-online RL without needing to keep the original offline dataset around, which is typically required.
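One common recipe in the offline-to-online setting is to keep the offline transitions in their own buffer and draw each gradient-update batch partly from offline data and partly from freshly collected online data. The helper below and its 50/50 split are a hedged sketch of that baseline pattern, not any specific paper's method.

```python
import random

def sample_mixed_batch(offline_data, online_buffer, batch_size=256):
    """Draw a batch of transitions, half from the offline dataset and half from
    the online replay buffer (falling back gracefully if either side is small)."""
    half = batch_size // 2
    offline_part = random.sample(offline_data, min(half, len(offline_data)))
    online_part = random.sample(online_buffer,
                                min(batch_size - len(offline_part), len(online_buffer)))
    return offline_part + online_part
```

The methods mentioned in the talk aim to avoid keeping `offline_data` around at all; the sketch only shows the conventional recipe they improve on.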