
# Reproduce PPO with PARL

Based on PARL, we have reproduced the PPO deep reinforcement learning algorithm, matching the results reported in the paper on the Mujoco benchmarks.

Paper: PPO in [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)

## Mujoco/Atari games introduction

PARL currently supports the open-source version of Mujoco provided by DeepMind, so users do not need to download Mujoco binaries, install mujoco-py, or obtain a license. For more details, please visit [Mujoco](https://github.com/google-deepmind/mujoco).
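For example, with the pip-installable `mujoco` package and `gym>=0.26` in place (a minimal sketch; see the dependency notes below), a Mujoco environment can be created without any license setup:

```bash
# Quick check: build a Mujoco env through gym -- no mujoco-py or license required
# (assumes gym>=0.26, whose reset() returns an (obs, info) pair)
python -c "import gym; obs, info = gym.make('HalfCheetah-v4').reset(); print(obs.shape)"
```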

## Benchmark result

### 1. Mujoco games results

*(figure: mujoco-result)*

### 2. Atari games results

*(figure: atari-result)*

- Each experiment was run three times with different seeds.

## How to use

Mujoco-Dependencies and Atari-Dependencies: install PARL and gym, plus the Mujoco or Atari environment packages respectively; a typical installation is sketched below.
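The following is a sketch only: the package names are inferred from the commands in this README, the deep-learning framework required by `train.py` should be installed per the repository, and versions should follow the repository's pins.

```bash
# Core RL library (also provides the xparl CLI) and the environment toolkit
pip install parl gym

# Mujoco games: DeepMind's open-source mujoco bindings (no license needed)
pip install mujoco

# Atari games: gym's Atari extras (ROM support)
pip install "gym[atari]"
```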

### Training

```bash
# To train an agent for discrete action game (Atari: PongNoFrameskip-v4 by default)
python train.py

# To train an agent for continuous action game (Mujoco)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000
```

### Distributed Training

When environment simulation is very slow, you can accelerate training by setting `xparl_addr` and `env_num > 1`.
First, start a local cluster with 8 CPUs:

```bash
xparl start --port 8010 --cpu_num 8
```

Note that if you have already started a master, you don't need to run the above command. For more information about the cluster, please refer to our documentation.
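Before launching training, you can confirm the master is up (assuming the standard `xparl` subcommands that ship with PARL):

```bash
# Show the status of the cluster started above
xparl status
```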

Then we can start the distributed training by running:

```bash
# To train an agent distributedly

# for discrete action game (Atari games)
python train.py --env "PongNoFrameskip-v4" --env_num 8 --xparl_addr 'localhost:8010'

# for continuous action game (Mujoco games)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000 --env_num 5 --xparl_addr 'localhost:8010'
```
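When training finishes, the local cluster can be shut down with the same CLI:

```bash
# Stop the master and all workers started on this machine
xparl stop
```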