Aleksei Petrenko

DexPBT: Dual-Arm Reorientation

DexPBT: Dual-Arm Regrasping

DexPBT: Object Reorientation (Alternative Behavior)

DexPBT: Object Reorientation

DexPBT: Regrasping

DexPBT: Grasp-and-Throw

Decentralized PBT (animation)

Sample Factory v2.0 Promo Video

Accelerated Synchronous RL - Timeline Diagram

Double Buffered Sampling - Timeline Diagram

Synchronous Reinforcement Learning - Timeline Diagram

Asynchronous Reinforcement Learning - Timeline Diagram

Streets of Willow CCW, 1:31 in a V6 Dodge Challenger

Megaverse-8 TowerBuilding. Generalization to larger structures

Megaverse-8 TowerBuilding. APPO agent assembling a 10-level tower

VizDoom: APPO agent, voice instruction once scenario

VizDoom: APPO agent, voice instruction scenario

VizDoom: APPO agent in the music recognition scenario

VizDoom: Amateur human player vs RGB+Audio RL agent (APPO)

Sample Factory: Asynchronous Reinforcement Learning at 100000+ FPS

PPO with recurrent policy on VizDoom D3_Battle

APPO Vizdoom vs 100% bots (frameskip=2)

PPO baseline on VizDoom D3_Battle

Curious A2C agent solves the hard DOOM maze

Curiosity-driven Exploration - failure mode

Advantage Actor-Critic getting out of Doom maze

Advantage Actor-Critic solves 6x6 Snake (Reinforcement Learning)

Feed-forward A2C in a partially-observable version of MicroTbs

Advantage Actor-Critic (A2C) plays MicroTbs

DQN in a gridworld
