For Project 2 of Udacity's Deep Reinforcement Learning Nanodegree, we were tasked with teaching an agent to maintain a target position using the "Reacher" environment configured by Udacity on Unity's ML-Agents platform.
For further information on the environment, see the accompanying project Report or Udacity's project GitHub repo.
In this project, we explored a variety of policies for solving this continuous state- and action-space environment, including Deep Deterministic Policy Gradient (DDPG), Distributed Distributional Deep Deterministic Policy Gradient (D4PG), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3). We use DDPG for our base implementation, but work-in-progress implementations of the remaining policies are also available in this repo.
The algorithms are further explained in the accompanying Report.
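DDPG, the base implementation here, trains its critic toward a bootstrapped target computed with slowly moving target networks, and updates those target networks by Polyak ("soft") averaging. The snippet below is a minimal sketch of just those two update rules, not the agent code in this repo: plain Python lists stand in for network weights, the target critic's value Q'(s', mu'(s')) is passed in as a number, and the defaults `gamma=0.99` and `tau=1e-3` are illustrative values only.

```python
# Sketch of two DDPG building blocks. Plain lists stand in for network
# parameters; no neural networks are involved.

def td_target(reward, next_q, done, gamma=0.99):
    """Critic target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    `next_q` is assumed to already be the target critic's value for the
    next state and the target actor's action in that state.
    """
    return reward + gamma * (1.0 - float(done)) * next_q


def soft_update(target_params, local_params, tau=1e-3):
    """Polyak averaging: theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    return [tau * local + (1.0 - tau) * target
            for target, local in zip(target_params, local_params)]


if __name__ == "__main__":
    print(td_target(1.0, 2.0, done=False, gamma=0.5))     # 2.0
    print(td_target(1.0, 2.0, done=True, gamma=0.5))      # 1.0 (no bootstrap at episode end)
    print(soft_update([0.0, 1.0], [1.0, 0.0], tau=0.5))   # [0.5, 0.5]
```

A small `tau` makes the target networks lag behind the learned networks, which keeps the critic's regression target from shifting on every gradient step.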
To set up the Python (conda) environment, type the following in the root directory:
conda env update --file=environment_drlnd.yml
This requires installation of OpenAI Gym and Unity's ML-Agents.
In the root directory, run python setup.py
to set up directories and download the specified environments. When running this script, have the full path to your root repo folder ready, and end the input with a "/".
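Because the root path is expected to end with "/", it can help to normalize whatever path you type before using it. The helper below is a hypothetical sketch (`normalize_root` is not part of setup.py, which may handle its input differently) and assumes POSIX-style paths.

```python
import os


def normalize_root(path):
    """Return an absolute root path ending in exactly one '/'.

    Hypothetical helper for illustration; assumes POSIX-style paths.
    """
    # Trim stray whitespace, expand "~", and collapse redundant separators.
    path = os.path.abspath(os.path.expanduser(path.strip()))
    # abspath drops any trailing "/", so add exactly one back.
    return path.rstrip("/") + "/"


if __name__ == "__main__":
    print(normalize_root("/tmp/repo"))    # /tmp/repo/
    print(normalize_root("/tmp/repo//"))  # /tmp/repo/
```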
If you need to further review and access environment implementation, visit the project repo here.
The key files in this repo include:
main.py: Run this script to train the agent(s) on the environment(s) selected in its agent and environment dictionaries.
util.py: Contains functions to train in Unity and OpenAI Gym environments and to chart results.
agents folder: Contains the agent classes implementing the specified policies. See the accompanying Report for additional details on agent implementations.
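main.py chooses what to train from an environment dictionary and an agent dictionary. The sketch below illustrates that selection pattern under stated assumptions: the dictionary keys, the `run` flag, and the `make_ddpg`/`make_td3` factories are hypothetical placeholders rather than the actual contents of main.py, and Reacher's 33-dimensional state and 4-dimensional action are used only as example sizes.

```python
# Hypothetical sketch of dictionary-driven experiment selection.
# Factories return plain dicts here instead of real agent objects.

def make_ddpg(state_size, action_size):
    return {"policy": "DDPG", "state_size": state_size, "action_size": action_size}


def make_td3(state_size, action_size):
    return {"policy": "TD3", "state_size": state_size, "action_size": action_size}


# Toggle "run" to choose which environments and agents are trained.
environments = {
    "Reacher": {"run": True, "state_size": 33, "action_size": 4},
    "Pendulum-v0": {"run": False, "state_size": 3, "action_size": 1},
}
agents = {
    "DDPG": {"run": True, "factory": make_ddpg},
    "TD3": {"run": False, "factory": make_td3},
}


def selected_runs(environments, agents):
    """Yield (env_name, agent) for every enabled environment/agent pair."""
    for env_name, env_cfg in environments.items():
        if not env_cfg["run"]:
            continue
        for agent_cfg in agents.values():
            if agent_cfg["run"]:
                agent = agent_cfg["factory"](env_cfg["state_size"], env_cfg["action_size"])
                yield env_name, agent


if __name__ == "__main__":
    for env_name, agent in selected_runs(environments, agents):
        print(env_name, agent["policy"])  # Reacher DDPG
```

Keeping each agent behind a factory keyed by name means adding a policy only requires a new dictionary entry, with the environment's state and action sizes passed in at construction time.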
To train the agent, first open main.py in your favorite text editor (e.g. nano main.py or vi main.py). Make sure the path to the root repo folder is correct and that the proper environments and agents (policies) are selected. Then, in the command line, run:
source activate drlnd  # to activate the Python (conda) environment
python main.py  # to train the selected agent (policy) on the selected environment
Charts the results from the model results file.
Contains the "checkpoint" model weights of each implementation.
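When charting training results, per-episode scores are noisy, so they are commonly plotted alongside a rolling mean. The function below is a minimal sketch of that smoothing step, not the charting code in this repo; the 100-episode default window is an assumption chosen as a common convention.

```python
from collections import deque


def rolling_mean(scores, window=100):
    """Mean of the most recent `window` scores at each episode.

    Early episodes average over however many scores exist so far.
    """
    recent = deque(maxlen=window)  # drops the oldest score once full
    means = []
    for score in scores:
        recent.append(score)
        means.append(sum(recent) / len(recent))
    return means


if __name__ == "__main__":
    print(rolling_mean([1, 2, 3, 4], window=2))  # [1.0, 1.5, 2.5, 3.5]
```

The resulting list has one entry per episode, so it can be plotted on the same episode axis as the raw scores.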
The algorithms used in this project were inspired by a variety of sources and authors, including implementations from the following GitHub handles: