Uri Sherman - Convergence of Policy Mirror Descent Beyond Compatible Function Approximation (Heb)
Author: HUJI Machine Learning Club
Uploaded: 2025-04-08
Views: 103
Time and Place
Thursday, April 3rd, 2025, 10:30 AM, room B220
Speaker
Uri Sherman (TAU)
Title
Convergence of Policy Mirror Descent Beyond Compatible Function Approximation
Abstract:
Policy optimization methods are one of the most widely used classes of Reinforcement Learning algorithms. Modern instantiations of policy optimization roughly follow the Policy Mirror Descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working with parametric policy classes in large-scale environments.
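For readers unfamiliar with the template, a standard way to write the generic PMD update (this is the textbook form, not necessarily the exact variant analyzed in the talk; the symbols $\eta_k$, $D_h$, and $\Pi$ are introduced here only for illustration) is

$$\pi_{k+1}(\cdot \mid s) \in \arg\max_{\pi \in \Pi} \Big\{ \eta_k \big\langle Q^{\pi_k}(s,\cdot),\, \pi(\cdot \mid s) \big\rangle - D_h\big(\pi(\cdot \mid s),\, \pi_k(\cdot \mid s)\big) \Big\} \quad \text{for every state } s,$$

where $\eta_k$ is a step size, $Q^{\pi_k}$ is the action-value function of the current policy $\pi_k$, and $D_h$ is the Bregman divergence induced by a mirror map $h$ (choosing $h$ to be negative entropy yields the KL divergence and recovers Natural Policy Gradient as a special case).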
In this talk, I will present our recent results that establish convergence of PMD (with rates that are independent of the cardinality of the state space) for general policy classes, subject to a variational gradient dominance condition that is strictly weaker than the closure conditions studied in prior work. Along the way, I will discuss the key feature of our analysis technique, which casts PMD as a proximal point algorithm operating in a non-Euclidean space, where the proximal operator adapts to the local smoothness of the objective.
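As a rough sketch of the proximal point viewpoint mentioned above (the classical template only; the divergence $D$ and step size $\eta_k$ below are generic placeholders, and the talk's adaptive, locally smooth proximal operator is a refinement of this), a proximal point step on the policy optimization objective reads

$$\pi_{k+1} \in \arg\max_{\pi \in \Pi} \Big\{ V^{\pi}(\rho) - \tfrac{1}{\eta_k}\, D(\pi, \pi_k) \Big\},$$

where $V^{\pi}(\rho)$ is the expected return of policy $\pi$ from initial state distribution $\rho$ and $D$ is a (non-Euclidean) divergence to the previous iterate. Per the abstract, the analysis interprets PMD iterates as steps of this kind, with the proximal term adapting to the local smoothness of the objective.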
Based on joint work with Tomer Koren and Yishay Mansour.
Bio:
Uri is a fifth-year PhD student at Tel Aviv University, advised by Yishay Mansour and Tomer Koren. Prior to his PhD, Uri spent a few years in various engineering and management positions in the private sector, and before that obtained his B.Sc. from Tel Aviv University and M.Sc. from the Weizmann Institute of Science, where he worked under the supervision of Prof. Uriel Feige. Uri's research interests are in reinforcement learning and optimization.