
# Reproduce PPO with PARL

Based on PARL, we have reproduced the PPO deep reinforcement learning algorithm, matching the results reported in the paper on the Mujoco benchmarks.

Paper: PPO in [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)

## Mujoco/Atari games introduction

PARL currently supports the open-source version of Mujoco provided by DeepMind, so users do not need to download Mujoco binaries, install mujoco-py, or obtain a license. For more details, please visit [Mujoco](https://github.com/google-deepmind/mujoco).
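For example, with the pip-installable `mujoco` package and `gym>=0.26` in place (a minimal sketch; see the dependency notes below), a Mujoco environment can be created without any license setup:

```bash
# Quick check: build a Mujoco env through gym -- no mujoco-py or license required
# (assumes gym>=0.26, whose reset() returns an (obs, info) pair)
python -c "import gym; obs, info = gym.make('HalfCheetah-v4').reset(); print(obs.shape)"
```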

## Benchmark result

### 1. Mujoco games results

*(figure: mujoco-result)*

### 2. Atari games results

*(figure: atari-result)*

- Each experiment was run three times with different seeds.

## How to use

Mujoco-Dependencies and Atari-Dependencies: install PARL and gym, plus the Mujoco or Atari environment packages respectively; a typical installation is sketched below.
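The following is a sketch only: the package names are inferred from the commands in this README, the deep-learning framework required by `train.py` should be installed per the repository, and versions should follow the repository's pins.

```bash
# Core RL library (also provides the xparl CLI) and the environment toolkit
pip install parl gym

# Mujoco games: DeepMind's open-source mujoco bindings (no license needed)
pip install mujoco

# Atari games: gym's Atari extras (ROM support)
pip install "gym[atari]"
```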

### Training

```bash
# To train an agent for discrete action game (Atari: PongNoFrameskip-v4 by default)
python train.py

# To train an agent for continuous action game (Mujoco)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000
```

### Distributed Training

When environment simulation is very slow, you can accelerate training by setting `xparl_addr` and `env_num > 1`.
First, start a local cluster with 8 CPUs:

```bash
xparl start --port 8010 --cpu_num 8
```

Note that if you have already started a master, you don't need to run the above command. For more information about the cluster, please refer to our documentation.
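Before launching training, you can confirm the master is up (assuming the standard `xparl` subcommands that ship with PARL):

```bash
# Show the status of the cluster started above
xparl status
```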

Then we can start the distributed training by running:

```bash
# To train an agent distributedly

# for discrete action game (Atari games)
python train.py --env "PongNoFrameskip-v4" --env_num 8 --xparl_addr 'localhost:8010'

# for continuous action game (Mujoco games)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000 --env_num 5 --xparl_addr 'localhost:8010'
```
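When training finishes, the local cluster can be shut down with the same CLI:

```bash
# Stop the master and all workers started on this machine
xparl stop
```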