Multiagent Reinforcement Learning: Rollout and Policy Iteration

Author: Dimitri Bertsekas

Uploaded: 2020-11-02

Views: 5416

Description:

To download the slides (PDF) and the associated research papers, visit the author's website: http://web.mit.edu/dimitrib/www/RLboo...
We focus on rollout and policy iteration (PI) methods for problems where the control consists of multiple components each selected (conceptually) by a separate agent. Based on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach, whereby at every stage, the agents sequentially (one-at-a-time) execute a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of total computation required at every stage grows linearly with the number of agents. By contrast, in the standard rollout algorithm, the amount of total computation grows exponentially with the number of agents. Despite the dramatic reduction in required computation, we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout: it guarantees an improved performance relative to the base policy. We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property, without any on-line coordination of control selection between the agents.
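
The contrast between standard rollout (joint minimization over all agents' controls) and one-agent-at-a-time multiagent rollout can be illustrated with a small sketch. It assumes a hypothetical deterministic toy problem that is not from the lecture: m agents each pick a move in {-1, 0, 1}, the shared integer state should be driven to 0, and the base policy moves every agent one unit toward 0. All names (`step`, `stage_cost`, `base_policy`, `standard_rollout`, `multiagent_rollout`, `simulate`) are illustrative assumptions. The code counts how many Q-factors each scheme computes.

```python
import itertools
from typing import Callable, List, Tuple

Control = Tuple[int, ...]
Controller = Callable[[int, int, int, int, List[int]], Control]

# HYPOTHETICAL toy problem (not from the lecture): m agents each pick a move
# in {-1, 0, 1}; the shared integer state should be driven to 0 in N stages.
MOVES = (-1, 0, 1)


def step(x: int, u: Control) -> int:
    """System equation x_{k+1} = f(x_k, u_k): all agents' moves add up."""
    return x + sum(u)


def stage_cost(x: int, u: Control) -> float:
    """Stage cost g(x_k, u_k): distance from 0 plus a small effort penalty."""
    return abs(x) + 0.1 * sum(abs(ui) for ui in u)


def base_policy(x: int, m: int) -> Control:
    """Hypothetical base policy: every agent moves one unit toward 0."""
    s = -1 if x > 0 else (1 if x < 0 else 0)
    return (s,) * m


def base_cost(x: int, k: int, N: int, m: int) -> float:
    """Cost of following the base policy from state x at stage k to stage N."""
    total = 0.0
    for _ in range(k, N):
        u = base_policy(x, m)
        total += stage_cost(x, u)
        x = step(x, u)
    return total + abs(x)  # terminal cost |x_N|


def q_factor(x: int, u: Control, k: int, N: int, m: int, counter: List[int]) -> float:
    """Rollout Q-factor: apply u now, then follow the base policy."""
    counter[0] += 1
    return stage_cost(x, u) + base_cost(step(x, u), k + 1, N, m)


def standard_rollout(x: int, k: int, N: int, m: int, counter: List[int]) -> Control:
    """Minimize jointly over all |MOVES|**m controls: exponential in m."""
    return min(itertools.product(MOVES, repeat=m),
               key=lambda u: q_factor(x, u, k, N, m, counter))


def multiagent_rollout(x: int, k: int, N: int, m: int, counter: List[int]) -> Control:
    """One-agent-at-a-time rollout: m * |MOVES| Q-factors, linear in m."""
    u = list(base_policy(x, m))  # agents not yet processed use base controls
    for i in range(m):
        # Agent i optimizes only its own component; agents 0..i-1 keep the
        # controls they already chose, agents i+1..m-1 keep base controls.
        u[i] = min(MOVES, key=lambda v: q_factor(
            x, tuple(u[:i] + [v] + u[i + 1:]), k, N, m, counter))
    return tuple(u)


def simulate(controller: Controller, x0: int, N: int, m: int) -> Tuple[float, int]:
    """Run a controller for N stages; return (total cost, Q-factors computed)."""
    counter = [0]
    x, total = x0, 0.0
    for k in range(N):
        u = controller(x, k, N, m, counter)
        total += stage_cost(x, u)
        x = step(x, u)
    return total + abs(x), counter[0]


if __name__ == "__main__":
    print("base cost:", base_cost(5, 0, 6, 4))
    for name, ctrl in [("standard", standard_rollout), ("multiagent", multiagent_rollout)]:
        print(name, simulate(ctrl, x0=5, N=6, m=4))
```

With 4 agents and 6 stages, standard rollout computes 6 · 3^4 = 486 Q-factors, while the multiagent version computes 6 · 4 · 3 = 72. The base policy alone costs 19.4 from x0 = 5 (it overshoots back and forth around 0). Because each agent's minimization includes its base control, the Q-factor can only decrease as agents are processed, so both rollout variants should end with a cost no higher than the base policy's; this is the cost improvement property described above, here for a deterministic problem with a state-feedback base policy.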

We also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation. For one of our PI algorithms, we prove convergence to an agent-by-agent optimal policy, thus establishing a connection with the theory of teams. For another PI algorithm, which is executed over a more complex state space, we prove convergence to an optimal policy. Approximate forms of these algorithms are also given, based on the use of policy and value neural networks. These PI algorithms, in both their exact and approximate forms, are strictly off-line methods, but they can be used to provide a base policy for use in an on-line multiagent rollout scheme.
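
The idea of one-agent-at-a-time policy improvement inside exact PI can be sketched on a hypothetical two-agent discounted problem (states, dynamics `f`, costs `g`, and discount `ALPHA` are arbitrary assumptions, not from the lecture). This is an illustrative sketch, not the author's algorithm verbatim: policy evaluation is approximated by iterating the fixed-policy Bellman operator, and the rule of switching controls only on strict improvement is an assumed tie-breaking choice that prevents cycling. At termination, no single agent can lower the Q-factor by changing only its own control at any state, i.e., the policy is agent-by-agent optimal.

```python
from typing import List, Tuple

# HYPOTHETICAL toy problem (not from the lecture): a discounted problem with
# S states and two agents, each choosing a control in {0, 1, 2}.
S, ACTIONS, ALPHA = 6, (0, 1, 2), 0.9
Control = Tuple[int, int]
Policy = List[Control]  # policy[x] = (agent 1 control, agent 2 control)


def f(x: int, u: Control) -> int:
    """Deterministic system equation (arbitrary choice)."""
    return (x + u[0] + 2 * u[1]) % S


def g(x: int, u: Control) -> float:
    """Stage cost in [0, 1] (arbitrary deterministic table)."""
    return ((7 * x + 3 * u[0] + 5 * u[1]) % 11) / 10.0


def q(J: List[float], x: int, u: Control) -> float:
    """Q-factor g(x, u) + alpha * J(f(x, u))."""
    return g(x, u) + ALPHA * J[f(x, u)]


def evaluate(policy: Policy, tol: float = 1e-13) -> List[float]:
    """Policy evaluation, approximated by iterating J <- T_mu J to convergence."""
    J = [0.0] * S
    while True:
        J_new = [q(J, x, policy[x]) for x in range(S)]
        if max(abs(a - b) for a, b in zip(J_new, J)) < tol:
            return J_new
        J = J_new


def improve_one_agent_at_a_time(policy: Policy, J: List[float], eps: float = 1e-9) -> Policy:
    """Agent 1 improves with agent 2 held at its current control; then agent 2
    improves given agent 1's new control. This considers 2 * |ACTIONS|
    candidate controls per state instead of |ACTIONS| ** 2 joint ones."""
    new_policy = []
    for x in range(S):
        u = list(policy[x])
        for i in range(2):  # agents processed sequentially
            for v in ACTIONS:
                trial = u.copy()
                trial[i] = v
                # Switch only on strict improvement, so ties cannot cause cycling.
                if q(J, x, (trial[0], trial[1])) < q(J, x, (u[0], u[1])) - eps:
                    u = trial
        new_policy.append((u[0], u[1]))
    return new_policy


def multiagent_pi(policy: Policy, max_iters: int = 100) -> Tuple[Policy, List[List[float]]]:
    """Alternate evaluation and one-agent-at-a-time improvement until no change."""
    history = []
    for _ in range(max_iters):
        J = evaluate(policy)
        history.append(J)
        new_policy = improve_one_agent_at_a_time(policy, J)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, history


if __name__ == "__main__":
    final_policy, history = multiagent_pi([(0, 0)] * S)
    print("iterations:", len(history))
    print("agent-by-agent optimal policy:", final_policy)
    print("cost:", [round(c, 4) for c in history[-1]])
```

Since every switch strictly lowers a Q-factor, the cost vectors in `history` are nonincreasing state by state, and with finitely many policies the loop must stop. The resulting policy need not be globally optimal; agent-by-agent optimality is the weaker, team-theoretic notion mentioned above.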

The material of this lecture is partly contained in the author's recent book Rollout, Policy Iteration, and Distributed Reinforcement Learning, Athena Scientific, 2020.
See http://web.mit.edu/dimitrib/www/dp_ro...

Related videos

MIT Lecture, Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control, Oct 2022

We Will Be Watching. Alexei Venediktov* and Sergei Buntman / 17.01.26

Policy Gradient Theorem Explained - Reinforcement Learning

Model Based Reinforcement Learning: Policy Iteration, Value Iteration, and Dynamic Programming

Introduction to Policy Gradient Methods: Deep Reinforcement Learning

AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning

Giving Up Territory? / Troops Have Abandoned Their Positions

Reinforcement Learning, Model Predictive Control, and the Newton Step for Solving Bellman's Equation

The Magic of Transistors: How Did We Teach Computers to Think with Bits of Silicon?

Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning

The Poincaré Conjecture: Alexei Savvateev on PostNauka

Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation

Introduction to Multi-Agent Reinforcement Learning

Deep RL Bootcamp Lecture 4A: Policy Gradients

Plenary lecture at IFAC Nonlinear MPC, 2024; Model Predictive Control and Reinforcement Learning

"Learning to Communicate in Multi-Agent Systems" - Amanda Prorok

AI 101 with Brandon Leshchinskiy

Google Gemini Is the New Siri. What Does This Mean for Owners of Apple Devices? Tech News of the Week

Abstract Dynamic Programming, Reinforcement Learning, Newton's Method, and Gradient Optimization

Reinforcement Learning Series: Overview of Methods
