Introduction to Reinforcement Learning | DigiKey

Author: DigiKey

Uploaded: Jul 17, 2023

Views: 41,222

Description:

Reinforcement Learning (RL) is a field of machine learning that aims to find optimal solutions to control theory problems for various tasks. It employs an artificial intelligence (AI) “agent” that takes in observations, chooses actions, and learns from rewards. Modern RL algorithms train agents using trial-and-error approaches that involve directly interacting with the given environment.

In the video, we cover the basic theory behind RL and demonstrate how to use Farama Foundation Gymnasium (https://gymnasium.farama.org/) and Stable Baselines3 (https://stable-baselines3.readthedocs...) in Python to train an AI agent to solve the classic cartpole (https://gymnasium.farama.org/environm...) control theory problem. At the end of the video, we encourage you to apply what you have learned to the slightly more advanced inverted pendulum problem (https://gymnasium.farama.org/environm...).

The solution to the challenge can be found here: https://www.digikey.com/en/maker/proj...

Code for training RL agents to solve both the cartpole and pendulum problems can be found here: https://github.com/ShawnHymel/reinfor...

In RL, the environment can be anything the agent interacts with, such as board games, video games, virtual settings, or the real world. We often use a code wrapper (e.g. Gymnasium) to observe this environment, perform agent-specified actions, and assign rewards. Note that rewards are considered part of the environment and are instrumental in training.
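As a rough illustration of the wrapper role described above, the sketch below defines a toy environment and an interaction loop using only the Python standard library. `CoinGuessEnv` and `run_episode` are hypothetical names invented for this example and are not part of Gymnasium; only the shape of `reset()` (returning an observation and an info dict) and `step()` (returning observation, reward, terminated, truncated, info) is meant to mirror Gymnasium's convention.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment shaped like a Gymnasium env (assumption)."""

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation plus an info dict.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin, {}

    def step(self, action):
        # The reward is assigned by the environment, not by the agent.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        terminated = False                        # this toy task has no failure state
        truncated = self.steps >= self.max_steps  # episode ends on a step limit
        return self.coin, reward, terminated, truncated, {}


def run_episode(env, choose_action):
    """Run one episode and return the total reward collected."""
    obs, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = choose_action(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward
```

Calling `run_episode(CoinGuessEnv(seed=0), lambda obs: obs)` runs a policy that simply echoes the observation, which is the best possible behavior in this toy task.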

The decision-making process for choosing actions based on observations is known as the “policy.” During training, the agent either selects actions at random or follows its current policy. The environment then returns a new observation and a reward, which the training algorithm uses to steer the agent toward actions with higher predicted total future reward.
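One common way to mix random and policy-driven action selection is an epsilon-greedy rule paired with a tabular Q-learning update. The sketch below is a minimal illustration under that assumption, not necessarily the exact method shown in the video; the function names and the `alpha` (learning rate) and `gamma` (discount factor) defaults are hypothetical choices for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_update(q_table, obs, action, reward, next_obs, done, alpha=0.1, gamma=0.99):
    """Move Q(obs, action) toward the reward plus the discounted best next value."""
    if done:
        target = reward  # no future reward after the episode ends
    else:
        target = reward + gamma * max(q_table[next_obs])
    q_table[obs][action] += alpha * (target - q_table[obs][action])
```

Here `q_table` is assumed to be a dict (or list) mapping each discrete observation to a list of per-action value estimates. Setting `epsilon` near 1 favors exploration, and lowering it over time shifts the agent toward exploiting what it has learned.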

The cartpole problem consists of a virtual pole balanced on top of a cart that can only move left and right. The goal is to design an AI agent that can keep the pole balanced by pushing the cart left or right. In the video, we use Deep Q-Learning (https://towardsdatascience.com/deep-q...) to train a Deep Q-Network (DQN) to solve the cartpole problem.
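To make the task itself concrete, here is a standard-library sketch of cartpole dynamics with a hand-written rule in place of a trained agent. It does not use Deep Q-Learning; `lean_policy` is a hypothetical stand-in that pushes the cart toward the side the pole is leaning. The physical constants and failure limits are assumptions chosen to resemble the defaults commonly used for this problem, and the integration is simple Euler stepping.

```python
import math

# Assumed constants (resembling common cartpole defaults, not taken from the video).
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
HALF_POLE_LENGTH = 0.5
FORCE_MAG = 10.0
TAU = 0.02                          # seconds per simulation step
X_LIMIT = 2.4                       # cart position limit
THETA_LIMIT = 12 * math.pi / 180    # pole angle limit in radians


def cartpole_step(state, action):
    """Advance the cart-pole one Euler step; action 1 pushes right, 0 pushes left."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = MASS_CART + MASS_POLE
    pole_mass_length = MASS_POLE * HALF_POLE_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc
    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), terminated


def lean_policy(state):
    """Hypothetical baseline: push toward the side the pole is leaning."""
    return 1 if state[2] > 0 else 0


def steps_survived(policy, state=(0.0, 0.0, 0.05, 0.0), max_steps=500):
    """Count how many steps the pole stays within limits under a policy."""
    for step in range(max_steps):
        state, terminated = cartpole_step(state, policy(state))
        if terminated:
            return step + 1
    return max_steps
```

Comparing `steps_survived(lean_policy)` with `steps_survived(lambda s: 0)` (always push left) gives a feel for why a learned policy is useful: even a crude rule outlasts a constant push, and a trained agent aims to do better still.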

We list some recommended reading and viewing materials below if you would like to dive deeper into reinforcement learning.

Articles:
Reinforcement Learning Algorithms — an intuitive overview -   / reinforcement-learning-algorithms-an-intui...
Which Reinforcement learning-RL algorithm to use where, when and in what scenario? - https://medium.datadriveninvestor.com...
Q-Learning vs. Deep Q-Learning vs. Deep Q-Network - https://www.baeldung.com/cs/q-learnin...
Deep Q Networks (DQN) With the Cartpole Environment - https://wandb.ai/safijari/dqn-tutoria...
RL — Proximal Policy Optimization (PPO) Explained -   / rl-proximal-policy-optimization-ppo-explained
Proximal Policy Optimization (PPO) - https://huggingface.co/blog/deep-rl-ppo

Related Videos:
Exploring Reinforcement Learning: Can AI Learn to Play QWOP?
Intro to Edge AI
Related Project Links:
Intro to Reinforcement Learning Using Gymnasium and Stable Baselines3
Related Articles:
Teach an AI to play QWOP
What is Edge AI? Machine Learning + IoT

Learn more:
Maker.io - https://www.digikey.com/en/maker
DigiKey’s Blog – TheCircuit https://www.digikey.com/en/blog
Connect with Digi-Key on Facebook   / digikey.electronics
And follow us on Twitter   / digikey

00:00 - Intro
00:59 - History of reinforcement learning
02:14 - Environment and agent interaction loop
06:21 - Gymnasium and Stable Baselines3
07:55 - Hands-on: how to set up a gymnasium environment
26:57 - Markov decision process
31:02 - Bellman equation for the state-value function
34:12 - Bellman equation for the action-value function
35:47 - Bellman optimality equations
36:43 - Exploration vs. exploitation
38:39 - Recommended textbook
39:25 - Model-based vs. model-free algorithms
40:27 - On-policy vs. off-policy algorithms
41:19 - Discrete vs. continuous action space
42:36 - Discrete vs. continuous observation space
43:56 - Overview of modern reinforcement learning algorithms
46:29 - Q-learning
49:27 - Deep Q-network (DQN)
51:59 - Hands-on: how to train a DQN agent
01:12:36 - Usefulness of reinforcement learning
01:13:26 - Challenge: inverted pendulum
01:14:10 - Conclusion
