Proximal Policy Optimization Explained
Author: Edan Meyer
Uploaded: 2021-05-20
Views: 75587
Ever wondered "what is proximal policy optimization?" Well, this is the video for you. Proximal Policy Optimization (PPO) is a reinforcement learning training method. It falls into the category of policy gradient methods, where a predictor (the policy) is trained on a gradient derived directly from the reward signal. PPO is sample efficient and very stable, which makes it great for RL control problems like robotics, as well as many other tasks.
RL theory series: Reinforcement Learning Made Simple
^ Watch the series above if you were confused
PPO paper: https://arxiv.org/abs/1707.06347
TRPO paper: https://arxiv.org/abs/1502.05477
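For readers who want something concrete before watching, here is a minimal sketch of the clipped surrogate objective described in the PPO paper linked above. It is an illustration, not code from the video: the function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for this example, and it works on plain Python lists rather than tensors.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical argument names).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The `min` is what keeps updates "proximal": once the ratio moves outside the clip range in the direction that would increase the objective, the term stops growing, so there is no incentive to push the policy further from the one that collected the data.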