Training more effective learned optimizers, and using them to train themselves (Paper Explained)

Author: Yannic Kilcher

Uploaded: 2020-10-03

Views: 19,496

Description:

#ai #research #optimization

Optimization is still the domain of simple, hand-crafted algorithms. An ML engineer not only has to pick a suitable algorithm for their problem but also often has to run a grid search over its hyperparameters. This paper proposes to learn a single, unified optimization algorithm, expressed not as an equation but as an LSTM-based neural network, that can act as an optimizer for any deep learning problem and ultimately optimize itself.
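
As a rough picture of what "the optimizer is a neural network" means: instead of a fixed rule like SGD's w ← w − ηg, a small network reads per-parameter features and emits the update. The sketch below uses a tiny MLP and a least-squares task purely as stand-ins for the paper's hierarchical LSTM and its task distribution; all names and sizes here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_meta_params(n_features=2, hidden=8):
    """Random meta-parameters for the tiny learned optimizer."""
    return [rng.normal(0.0, 0.1, (n_features, hidden)),
            np.zeros(hidden),
            rng.normal(0.0, 0.1, (hidden, 1)),
            np.zeros(1)]

def learned_update(features, theta):
    """Stand-in for the paper's hierarchical LSTM optimizer: a
    one-hidden-layer MLP mapping per-parameter features
    (gradient, momentum) to an additive parameter update."""
    W1, b1, W2, b2 = theta
    h = np.tanh(features @ W1 + b1)      # (n_params, hidden)
    return (h @ W2 + b2)[:, 0]           # (n_params,)

# Inner task: least squares, f(w) = ||A w - b||^2.
A = rng.normal(size=(10, 5))
b = rng.normal(size=10)
grad_f = lambda w: 2.0 * A.T @ (A @ w - b)

def inner_train(theta, steps=50):
    """Run the learned optimizer on the task; return the final loss.
    Meta-training would tune `theta` to make this number small."""
    w, mom = np.zeros(5), np.zeros(5)
    for _ in range(steps):
        g = grad_f(w)
        mom = 0.9 * mom + 0.1 * g
        feats = np.stack([g, mom], axis=-1)   # per-parameter features
        w = w + learned_update(feats, theta)  # the optimizer IS a network
    return float(np.sum((A @ w - b) ** 2))

print("final task loss:", inner_train(init_meta_params()))
```

Meta-training then treats the value returned by inner_train as the objective and tunes theta across many tasks, which is where the Evolution Strategies section of the video comes in.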

OUTLINE:
0:00 - Intro & Outline
2:20 - From Hand-Crafted to Learned Features
4:25 - Current Optimization Algorithm
9:40 - Learned Optimization
15:50 - Optimizer Architecture
22:50 - Optimizing the Optimizer using Evolution Strategies (see the sketch after this outline)
30:30 - Task Dataset
34:00 - Main Results
36:50 - Implicit Regularization in the Learned Optimizer
41:05 - Generalization across Tasks
41:40 - Scaling Up
45:30 - The Learned Optimizer Trains Itself
47:20 - Pseudocode
49:45 - Broader Impact Statement
52:55 - Conclusion & Comments
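
The 22:50 section, "Optimizing the Optimizer using Evolution Strategies", refers to estimating meta-gradients from random perturbations of the optimizer's weights rather than backpropagating through the long unrolled inner training loop. A minimal antithetic-sampling ES sketch follows; the toy meta-loss, population size, and step sizes are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def meta_loss(theta):
    """Placeholder for 'train a task with optimizer weights theta and
    return the final training loss'; here, a smooth toy function."""
    return float(np.sum((theta - 3.0) ** 2) + np.sum(np.sin(theta)))

theta = np.zeros(16)               # meta-parameters of the learned optimizer
sigma, lr, pop = 0.1, 0.05, 64     # ES noise scale, meta step size, population

for step in range(200):
    eps = rng.normal(size=(pop // 2, theta.size))
    eps = np.concatenate([eps, -eps])           # antithetic pairs cut variance
    losses = np.array([meta_loss(theta + sigma * e) for e in eps])
    # ES estimator: grad E[f(theta + sigma*eps)] ~ mean(f * eps) / sigma
    grad = (losses[:, None] * eps).mean(axis=0) / sigma
    theta -= lr * grad                          # plain SGD on the meta-loss

print("meta-loss after ES:", meta_loss(theta))
```

Because ES only needs function evaluations, each population member can be a full inner training run executed in parallel, with no gradients flowing through the unroll.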

Paper: https://arxiv.org/abs/2009.11243

Abstract:
Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from existing first order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out of distribution tasks such as training themselves from scratch.
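
The abstract's final claim, that the optimizer can train itself, amounts to feeding the optimizer's own meta-gradients through the same learned update rule it applies to task parameters. Below is a toy sketch of that loop, with a fixed tanh rule standing in for the trained LSTM; everything here is an illustrative assumption, not the paper's learned weights.

```python
import numpy as np

rng = np.random.default_rng(2)

def learned_update(g, mom, scale=0.05):
    """Frozen stand-in for an already-trained learned optimizer: a
    fixed nonlinear rule from (gradient, momentum) to an update.
    The real rule is an LSTM; this is purely illustrative."""
    return -scale * np.tanh(g + 0.5 * mom)

# "The learned optimizer trains itself": apply the same update rule
# to the optimizer's own meta-parameters instead of a task's weights.
theta = rng.normal(size=8)           # stands in for the optimizer's weights
mom = np.zeros(8)
for _ in range(300):
    meta_grad = 2.0 * (theta - 1.0)  # gradient of toy meta-loss ||theta-1||^2
    mom = 0.9 * mom + 0.1 * meta_grad
    theta = theta + learned_update(meta_grad, mom)

print("theta after self-training:", theta.round(2))  # drifts toward 1.0
```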

Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

Links:
YouTube: /yannickilcher
Twitter: /ykilcher
Discord: /discord
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: /yannic-kilcher-488534136

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: /yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Related videos

Descending through a Crowded Valley -- Benchmarking Deep Learning Optimizers (Paper Explained)
LSTM is dead. Long Live Transformers!
Something Weird Happens When E=-mc²
Understanding GD&T
How Are Microchips Made? 🖥️🛠️ Processor Manufacturing Steps
[Classic] Deep Residual Learning for Image Recognition (Paper Explained)
4 Hours Chopin for Studying, Concentration & Relaxation
LLMs and GPT: How Do Large Language Models Work? A Visual Introduction to Transformers
Refraction and the "Slowing" of Light | Based on a Richard Feynman Lecture
[Classic] Generative Adversarial Networks (Paper Explained)
Did God Create DNA? The Latest Science on Its Structure and How Information Works in Living Organisms
Learn Microsoft Active Directory (ADDS) in 30 Minutes
DINO: Emerging Properties in Self-Supervised Vision Transformers (Facebook AI Research Explained)
How LLMs Can Store Facts | Chapter 7, Deep Learning
Fast reinforcement learning with generalized policy updates (Paper Explained)
Neural Architecture Search Without Training (Explained)
The Moment We Stopped Understanding AI [AlexNet]
Perceiver: General Perception with Iterative Attention (Google DeepMind Research Paper Explained)
Feedback Transformers: Addressing Some Limitations of Transformers with Feedback Memory (Explained)
But What Is a Neural Network? | Chapter 1, Deep Learning
