Reinforcement Learning, Model Predictive Control, and the Newton Step for Solving Bellman's Equation

Author: Dimitri Bertsekas

Uploaded: 2025-06-03

Views: 6280

Description:

Slides at https://web.mit.edu/dimitrib/www/MPC....

We focus on a conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents.
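The split between the two algorithms can be illustrated with a minimal sketch. The toy problem below is hypothetical and not taken from the talk: states 0 to 5 on a line, state 0 as the goal, and two moves (`step` and `jump`) with made-up costs. Off-line training here simply tabulates the cost of a fixed base policy ("always step"), which plays the role of the position evaluator. On-line play then picks each move by one-step lookahead against that table.

```python
# Hypothetical toy problem: states 0..5, state 0 is the goal.
GOAL = 0
# Action name -> (distance moved toward the goal, stage cost). Values are made up.
ACTIONS = {"step": (1, 1.0), "jump": (2, 1.5)}


def next_state(state, action):
    """Deterministic transition; never moves past the goal."""
    return max(state - ACTIONS[action][0], GOAL)


def stage_cost(action):
    return ACTIONS[action][1]


def offline_training(num_states):
    """Off-line: tabulate the cost-to-go of the base policy 'always step'."""
    j_tilde = [0.0] * num_states
    for s in range(1, num_states):
        j_tilde[s] = stage_cost("step") + j_tilde[next_state(s, "step")]
    return j_tilde


def online_play(state, j_tilde):
    """On-line: one-step lookahead using the off-line evaluator j_tilde."""
    return min(ACTIONS, key=lambda a: stage_cost(a) + j_tilde[next_state(state, a)])


def play_from(state, j_tilde):
    """Run the on-line player to the goal; return total cost and the moves made."""
    total, moves = 0.0, []
    while state != GOAL:
        action = online_play(state, j_tilde)
        total += stage_cost(action)
        moves.append(action)
        state = next_state(state, action)
    return total, moves


j_tilde = offline_training(6)
total, moves = play_from(5, j_tilde)
print(j_tilde[5], total, moves)  # 5.0 4.0 ['jump', 'jump', 'step']
```

In this toy case the evaluator rates state 5 at cost 5.0, yet the on-line player reaches the goal at cost 4.0. The lookahead step improves on the policy that produced the evaluator, which is the kind of synergy the description refers to.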

Significantly, the synergy between off-line training and on-line play also underlies MPC (as well as other major classes of sequential decision problems), and indeed the MPC design architecture is very similar to that of AlphaZero and TD-Gammon. This conceptual insight provides a vehicle for bridging the cultural gap between RL and MPC, and sheds new light on some fundamental issues in MPC. These include the enhancement of stability properties through rollout, the treatment of uncertainty through the use of certainty equivalence, the resilience of MPC in adaptive control settings that involve changing system parameters, and the insights provided by the superlinear performance bounds implied by Newton's method.
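The Newton-step connection can be checked numerically on a scalar linear-quadratic problem. This is a sketch under assumptions chosen here, not taken from the description: the system x' = a x + b u, the stage cost q x^2 + r u^2, and the parameter values are all illustrative. For this problem Bellman's equation reduces to the scalar Riccati equation K = F(K). The code compares two quantities: the cost coefficient of the policy obtained by one-step lookahead with terminal cost K_tilde x^2, and the result of one Newton iteration for K = F(K) started at K_tilde.

```python
def riccati(K, a, b, q, r):
    """Riccati operator F(K) = q + a^2 r K / (r + b^2 K)."""
    return q + a * a * r * K / (r + b * b * K)


def riccati_slope(K, a, b, r):
    """Derivative F'(K) = (a r / (r + b^2 K))^2."""
    return (a * r / (r + b * b * K)) ** 2


def newton_step(K, a, b, q, r):
    """One Newton iteration for K = F(K): fixed point of F linearized at K."""
    slope = riccati_slope(K, a, b, r)
    return (riccati(K, a, b, q, r) - slope * K) / (1.0 - slope)


def lookahead_policy_cost(K_tilde, a, b, q, r):
    """Cost coefficient of the policy u = L x from one-step lookahead on K_tilde."""
    L = -a * b * K_tilde / (r + b * b * K_tilde)
    closed_loop = a + b * L
    # The geometric-series formula below assumes |closed_loop| < 1 (stable loop).
    return (q + r * L * L) / (1.0 - closed_loop ** 2)


def solve_riccati(a, b, q, r, iters=200):
    """Reference solution K* by repeated application of F (value iteration)."""
    K = 0.0
    for _ in range(iters):
        K = riccati(K, a, b, q, r)
    return K


# Illustrative parameters (assumed, not from the source).
a, b, q, r = 1.0, 1.0, 1.0, 1.0
K_tilde = 3.0  # a deliberately poor off-line approximation
K_star = solve_riccati(a, b, q, r)
K_newton = newton_step(K_tilde, a, b, q, r)
K_policy = lookahead_policy_cost(K_tilde, a, b, q, r)
print(K_star, K_newton, K_policy)
```

With these values the two computed coefficients agree to rounding error, and the error relative to K* drops from about 1.38 at K_tilde to about 0.05 after the single step. That sharp reduction is what a superlinear bound predicts for this example.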

We discuss application contexts for our framework, including a computer chess architecture based on MPC.


