Wired to Learn #4: Reinforcement Learning Bot Understands Tic Tac Toe after 10,000 Games!

Author: Prof. Weimao Ke

Uploaded: 2023-08-04

Views: 164

Description:

Create a Reinforcement Learning Bot from scratch for Tic Tac Toe, with code available at:
https://github.com/keweimao/wired

The following video explains the idea of Reinforcement Learning (RL):
• Reinforcement Learning

Key ideas of RL applied to the Tic-Tac-Toe bot:
1. Monitors every state of the game, i.e. the 'X', 'O', and ' ' marks on the 3x3 grid (at most `3^9 = 19683` configurations, not all of which are reachable in legal play).
2. Makes a move and, depending on the `exploration rate`, it will select:
EITHER a random move to *explore* different situations
OR the *best move* based on past rewards
3. No reward or penalty if there is no immediate winner.
4. In the end:
IF the bot wins, the LAST move will receive a reward of `1`
IF the bot loses, the LAST move will receive a penalty of `-1`
5. This repeats with NEW games and the bot continues to learn.
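
The state tracking and move selection in steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the names `state_key`, `legal_moves`, `choose_move`, and `q_table`, as well as the choice of a dictionary keyed by (state, move), are assumptions made for this example.

```python
import random

def state_key(board):
    """Encode a board (list of 9 marks: 'X', 'O' or ' ') as a string key."""
    return "".join(board)

def legal_moves(board):
    """Return the indices (0-8) of the empty cells."""
    return [i for i, mark in enumerate(board) if mark == " "]

def choose_move(q_table, board, exploration_rate, rng=random):
    """Pick a move: explore at random, or exploit the best known value.

    q_table is a hypothetical dict mapping (state_key, move) to a learned
    value; pairs that have never been seen default to 0.0.
    """
    moves = legal_moves(board)
    if not moves:
        raise ValueError("no legal moves on a full board")
    if rng.random() < exploration_rate:
        # EXPLORE: try a random legal move
        return rng.choice(moves)
    # EXPLOIT: take the move with the highest value learned so far
    key = state_key(board)
    return max(moves, key=lambda m: q_table.get((key, m), 0.0))
```

With `exploration_rate=0.0` the bot always takes the best known move, and with `1.0` it always plays at random. A common choice is to start high and lower the rate as training progresses.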

Reading the above, one may wonder whether ONLY the LAST move is rewarded. The answer is NO.
(1) *All actions leading to* the LAST move (for a win or loss) will be rewarded or penalized, but the reward/penalty will be **discounted**.
```
For example, given RLBot Actions:
Move1 to Move2 to Move3 (WIN)

Its rewards will be like:

Move3 (1 point) to Move2 (0.9 points) to Move1 (0.9*0.9 = 0.81 points)
```

(2) The game has to be *repeated* many times to train the bot, so that rewards/penalties propagate back to earlier moves.
(3) Another parameter, the `learning rate`, determines *how fast* the bot updates its stored reward/penalty values.
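
Points (1) to (3) can be illustrated with a short sketch that walks backward through the bot's moves after a game ends. It assumes a discount of 0.9 (as in the example above), a hypothetical `q_table` dictionary keyed by (state, move), and a hypothetical helper name `update_from_game`. Note that this is a simplified, Monte Carlo-style update that touches every move of the game at once. The repository may instead use a one-step Q-learning update, in which values reach earlier moves only over repeated games.

```python
def update_from_game(q_table, history, final_reward,
                     discount=0.9, learning_rate=0.5):
    """Update stored values for every move the bot made in one game.

    history: list of (state_key, move) pairs in the order they were played.
    final_reward: 1 for a win, -1 for a loss, 0 otherwise.
    """
    target = final_reward
    # Walk from the LAST move back to the first one
    for state, move in reversed(history):
        old = q_table.get((state, move), 0.0)
        # The learning rate controls how far the old value moves
        # toward the (discounted) target in a single game
        q_table[(state, move)] = old + learning_rate * (target - old)
        # Earlier moves receive a smaller share of the reward
        target *= discount
    return q_table
```

With `learning_rate=1.0`, a single win reproduces the numbers in the example: 1 for Move3, 0.9 for Move2, and 0.81 for Move1. With a smaller learning rate, each game nudges the stored values only part of the way, so the same outcome has to be seen repeatedly before the values settle.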
