Martha White | Advances in Value Estimation in Reinforcement Learning

Author: London Machine Learning Meetup

Uploaded: 2022-04-23

Views: 481

Description:

Sponsored by Evolution AI: https://www.evolution.ai/

Paper: https://arxiv.org/abs/2104.13844

Abstract: Temporal difference learning algorithms underlie most approaches in reinforcement learning, for both prediction and control. A well-known issue is that these approaches can diverge under nonlinear function approximation, such as with neural networks, and in the off-policy setting where data is generated by a different policy than the one being learned. Naturally, there has been a flurry of work towards resolving this issue, primarily through sound gradient-based methods, but many of these approaches have been avoided due to a perception that they are ineffective or hard to use. In this talk, I will discuss a new generalized objective that unifies several previous approaches and facilitates creating easy-to-use algorithms that consistently outperform temporal difference learning approaches in our experiments.
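
To illustrate the off-policy divergence the abstract mentions, here is a minimal sketch that is not taken from the talk or the paper. It assumes a commonly used two-state toy problem with linear function approximation: state s1 has feature 1, state s2 has feature 2, so the value estimates are w and 2w. Only the transition s1 -> s2 (reward 0) is ever updated, which is what an off-policy data distribution can produce. The function name and all parameter values are hypothetical choices for illustration.

```python
def semi_gradient_td_updates(w0=1.0, gamma=0.9, alpha=0.1, steps=50):
    """Repeatedly apply semi-gradient TD(0) to the single transition s1 -> s2.

    Assumed toy setup (not from the talk): v(s1) = w * 1, v(s2) = w * 2,
    reward 0, and only this one transition is ever sampled.
    Returns the list of weights after each update, starting with w0.
    """
    x1, x2 = 1.0, 2.0  # features of s1 and s2
    reward = 0.0
    w = w0
    history = [w]
    for _ in range(steps):
        # TD error: bootstrapped target minus current estimate for s1
        td_error = reward + gamma * (w * x2) - w * x1
        # Semi-gradient step: the gradient of v(s1) with respect to w is x1
        w += alpha * td_error * x1
        history.append(w)
    return history


if __name__ == "__main__":
    diverging = semi_gradient_td_updates(gamma=0.9)
    shrinking = semi_gradient_td_updates(gamma=0.4)
    print("gamma=0.9, final w:", diverging[-1])
    print("gamma=0.4, final w:", shrinking[-1])
```

In this setup each update multiplies w by 1 + alpha * (2 * gamma - 1), so the weight grows without bound whenever gamma exceeds 0.5 and shrinks toward zero otherwise. Gradient-based methods of the kind the abstract refers to are designed to avoid this behaviour; they are not implemented in this sketch.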

Bio: Martha White is an Associate Professor of Computing Science at the University of Alberta and a PI of Amii (the Alberta Machine Intelligence Institute), which is one of the top machine learning centres in the world. She holds a Canada CIFAR AI Chair and received IEEE’s “AI’s 10 to Watch: The Future of AI” award in 2020. She has authored more than 50 papers in top journals and conferences. Martha is an associate editor for TPAMI, and has served as co-program chair for ICLR and area chair for many conferences in AI and ML, including ICML, NeurIPS, AAAI and IJCAI. Her research focus is on developing algorithms for agents continually learning on streams of data, with an emphasis on representation learning and reinforcement learning.
