Reinforcement Learning #1: Multi-Armed Bandits, Explore vs Exploit, Epsilon-Greedy, UCB
Author: Zachary Huang
Uploaded: 2025-08-15
Views: 3782
Full Reinforcement Learning Playlist: Reinforcement Learning by Zach
Slides: https://the-pocket.github.io/PocketFl...
Text: https://the-pocket.github.io/PocketFl...
The content is based on "Reinforcement Learning: An Introduction" by Sutton and Barto.
00:00:00 Intro: The Exploration-Exploitation Dilemma
00:01:48 Problem Definition: The K-Armed Bandit
00:04:01 Core Conflict: Exploration vs. Exploitation
00:05:54 The Greedy Strategy: An Intuitive but Flawed Approach
00:07:39 Failure Case: The Greedy Trap Example
00:10:15 Solution 1: The Epsilon-Greedy Algorithm
00:15:38 The Learning Engine: The Incremental Update Rule
00:17:14 Walkthrough: Epsilon-Greedy in Action
00:21:32 Solution 2: Optimistic Initial Values
00:28:26 Solution 3: Upper Confidence Bound
00:34:34 Conclusion: Real-World Applications & The Bridge to Full Reinforcement Learning
Social media:
X: https://x.com/ZacharyHuang12
LinkedIn: / zachary-h-23aa37172
GitHub: https://github.com/zachary62
Discord: / discord
Medium: / zh2408
Substack: https://zacharyhuang.substack.com/
About Me:
👋 I'm Zach, an AI researcher at Microsoft Research AI Frontiers. I currently work on LLM Agents & Systems. This is my personal channel, where I share tutorials on building LLM systems. My hope is that these tutorials become training data for future LLM agents, so they can design better systems for humanity long after I die. Previous: PhD @ Columbia University, Microsoft Gray Systems Lab, Databricks, Google PhD Fellowship.
