An Old Problem - Ep. 5 (Deep Learning SIMPLIFIED)

Автор: DeepLearning.TV

Загружено: 2015-12-14

Просмотров: 260824

Описание:

If deep neural networks are so powerful, why aren’t they used more often? The reason is that they are very difficult to train due to an issue known as the vanishing gradient.

Deep Learning TV on
Facebook:   / deeplearningtv
Twitter:   / deeplearningtv

To train a neural network over a large set of labelled data, you must continuously compute the difference between the network’s predicted output and the actual output. This difference is called the cost, and the process for training a net is known as backpropagation, or backprop. During backprop, weights and biases are tweaked slightly until the lowest possible cost is achieved. An important aspect of this process is the gradient, which is a measure of how much the cost changes with respect to a change in a weight or bias value.

Backprop suffers from a fundamental problem known as the vanishing gradient. During training, the gradient decreases in value back through the net. Because higher gradient values lead to faster training, the layers closest to the input layer take the longest to train. Unfortunately, these initial layers are responsible for detecting the simple patterns in the data, while the later layers help to combine the simple patterns into complex patterns. Without properly detecting simple patterns, a deep net will not have the building blocks necessary to handle the complexity. This problem is the equivalent of to trying to build a house without the proper foundation.

Have you ever had this difficulty while using backpropagation? Please comment and let me know your thoughts.

So what causes the gradient to decay back through the net? Backprop, as the name suggests, requires the gradient to be calculated first at the output layer, then backwards across the net to the first hidden layer. Each time the gradient is calculated, the net must compute the product of all the previous gradients up to that point. Since all the gradients are fractions between 0 and 1 – and the product of fractions in this range results in a smaller fraction – the gradient continues to shrink.

For example, if the first two gradients are one fourth and one third, then the next gradient would be one fourth of one third, which is one twelfth. The following gradient would be one twelfth of one fourth, which is one forty-eighth, and so on. Since the layers near the input layer receive the smallest gradients, the net would take a very long time to train. As a subsequent result, the overall accuracy would suffer.

Credits
Nickey Pickorita (YouTube art) -
https://www.upwork.com/freelancers/~0...
Isabel Descutner (Voice) -
   / isabeldescutner
Dan Partynski (Copy Editing) -
  / danielpartynski
Jagannath Rajagopal (Creator, Producer and Director) -
  / jagannathrajagopal

An Old Problem - Ep. 5 (Deep Learning SIMPLIFIED)

Доступные форматы для скачивания:

Скачать видео mp4

Информация по загрузке:

Скачать аудио mp3

Похожие видео

Restricted Boltzmann Machines - Ep. 6 (Deep Learning SIMPLIFIED)

Restricted Boltzmann Machines - Ep. 6 (Deep Learning SIMPLIFIED)

Что происходит с нейросетью во время обучения?

Что происходит с нейросетью во время обучения?

Но что такое нейронная сеть? | Глава 1. Глубокое обучение

Но что такое нейронная сеть? | Глава 1. Глубокое обучение

Объяснение исчезающего и взрывающегося градиента | Проблема, возникающая в результате обратного р...

Объяснение исчезающего и взрывающегося градиента | Проблема, возникающая в результате обратного р...

Оптимизация для глубокого обучения (Momentum, RMSprop, AdaGrad, Adam)

Оптимизация для глубокого обучения (Momentum, RMSprop, AdaGrad, Adam)

Проблема исчезающего градиента || Краткое объяснение

Проблема исчезающего градиента || Краткое объяснение

Создание нейронной сети С НУЛЯ (без Tensorflow/Pytorch, только NumPy и математика)

Создание нейронной сети С НУЛЯ (без Tensorflow/Pytorch, только NumPy и математика)

CNN: Convolutional Neural Networks Explained - Computerphile

CNN: Convolutional Neural Networks Explained - Computerphile

Deep Belief Nets - Ep. 7 (Deep Learning SIMPLIFIED)

Deep Belief Nets - Ep. 7 (Deep Learning SIMPLIFIED)

Исчезающие/взрывающиеся градиенты (C2W1L10)

Исчезающие/взрывающиеся градиенты (C2W1L10)

Маска подсети — пояснения

Маска подсети — пояснения

Градиентный спуск, как обучаются нейросети | Глава 2, Глубинное обучение

Градиентный спуск, как обучаются нейросети | Глава 2, Глубинное обучение

Что такое проблема исчезающих/взрывающихся градиентов в нейронных сетях?

Что такое проблема исчезающих/взрывающихся градиентов в нейронных сетях?

Convolutional Neural Networks - Ep. 8 (Deep Learning SIMPLIFIED)

Convolutional Neural Networks - Ep. 8 (Deep Learning SIMPLIFIED)

4 Hours Chopin for Studying, Concentration & Relaxation

4 Hours Chopin for Studying, Concentration & Relaxation

Почему в Римской Империи притесняли ТОЛЬКО ХРИСТИАН?

Почему в Римской Империи притесняли ТОЛЬКО ХРИСТИАН?

Nuts and Bolts of Applying Deep Learning (Andrew Ng)

Nuts and Bolts of Applying Deep Learning (Andrew Ng)

LLM и GPT - как работают большие языковые модели? Визуальное введение в трансформеры

LLM и GPT - как работают большие языковые модели? Визуальное введение в трансформеры

Neural Networks Demystified [Part 3: Gradient Descent]

Neural Networks Demystified [Part 3: Gradient Descent]

Глава Neuralink: чип в мозге заменит вам телефон

Глава Neuralink: чип в мозге заменит вам телефон