L4 TRPO and PPO (Foundations of Deep RL Series)

Author: Pieter Abbeel

Uploaded: 2021-08-24

Views: 46619

Description:

Lecture 4 of a 6-lecture series on the Foundations of Deep RL
Topic: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO)
Instructor: Pieter Abbeel

Slides: https://www.dropbox.com/s/bodgpysmm6l...

Related videos

L5 DDPG and SAC (Foundations of Deep RL Series)

L3 Policy Gradients and Advantage Estimation (Foundations of Deep RL Series)

CS885 Lecture 14c: Trust Region Methods

Proximal Policy Optimization (PPO): How to Train Large Language Models

L2 Deep Q-Learning (Foundations of Deep RL Series)

L1 MDPs, Exact Solution Methods, Max-ent RL (Foundations of Deep RL Series)

Machine Learning. Reinforcement Learning. K.V. Vorontsov, School of Data Analysis, Yandex.

L6 Model-based RL (Foundations of Deep RL Series)

Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO Tutorial

DeepMind x UCL RL Lecture Series - Policy-Gradient and Actor-Critic methods [9/13]

Policy Gradient Methods | Reinforcement Learning Part 6

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence

MIT 6.S191: Reinforcement Learning

Deep RL Bootcamp Lecture 1: Motivation + Overview + Exact Solution Methods

Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation

SARSA and Q-learning Algorithms: The Basis for Studying Reinforcement Learning // "Reinforcement Learning"

Overview of Deep Reinforcement Learning Methods

Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 1 - Introduction - Emma Brunskill

Policy Gradient Theorem Explained - Reinforcement Learning
