Implementation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) Algorithm to train 2 Agents to play a game of Tennis against each other, using a modified version of the Tennis Unity ML-Agents Environment and a Multi-Layer Feedforward Neural Network Model built with the PyTorch Library
For this project the task is to train 2 Agents to play a game of Tennis against each other. In this environment, both agents control rackets to bounce a ball over a net. A reward of +0.1 is provided if an agent hits the ball over the net, whereas a reward (penalty) of -0.01 is provided if an agent lets the ball hit the ground or hits the ball out of bounds. Thus, the goal of each agent is to keep the ball in play.
Trained Agents properly playing Tennis, controlling the rackets to consistently hit the Tennis Ball over the Net
The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation.
The Action Space is continuous. 2 continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
The task is episodic, and in order to solve the environment, the agents must get an average score of +0.5 (over 100 consecutive episodes, after taking the maximum over both agents).
- After each episode, the sum of the rewards that each agent received (without discounting) is taken to get a score for each agent. This yields 2 (potentially different) scores.
- Only the maximum of these 2 scores is considered for each episode.
The environment is considered solved when the average of this maximum score per episode (over 100 consecutive episodes) is at least +0.5.
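The scoring rule can be summarized with the short sketch below (a minimal illustration, not taken from the project code; the variable and function names are assumptions):

import numpy as np
from collections import deque

scores_window = deque(maxlen=100)              # rolling window of the last 100 episode scores

def record_episode(episode_rewards, scores_window):
    # episode_rewards: array of shape (2,) holding each agent's undiscounted return for the episode
    episode_score = np.max(episode_rewards)    # keep only the maximum of the 2 agents' scores
    scores_window.append(episode_score)
    solved = len(scores_window) == 100 and np.mean(scores_window) >= 0.5
    return episode_score, solved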
a) Download and install Anaconda 3 (latest version 5.3) from this link (https://www.anaconda.com/download/)
for the specific Operating System and Architecture (64-bit or 32-bit) being used,
choosing the installer for Python 3.6 or a later version
b) Create (and activate) a new environment with Python 3.6:
Open Anaconda prompt and then execute the below given commands
Linux or Mac:
conda create --name drlnd python=3.6
source activate drlnd
Windows:
conda create --name drlnd python=3.6
activate drlnd
c) Minimal Installation of the OpenAI Gym Environment
Below are the instructions to do a minimal install of gym:
git clone https://github.com/openai/gym.git
cd gym
pip install -e .
A minimal install of the packaged version can be done directly from PyPI:
pip install gym
d) Clone the repository (https://github.com/udacity/deep-reinforcement-learning.git) and navigate to the python/ folder.
Then, install several dependencies by executing the below commands in Anaconda Prompt Shell :
git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python
pip install .
e) Create an IPython Kernel for the drlnd environment:
python -m ipykernel install --user --name drlnd --display-name "drlnd"
f) Before running code in a notebook, change the kernel to match the drlnd environment by using the drop-down Kernel menu.
a) Clone the GitHub Repository (https://github.com/Unity-Technologies/ml-agents.git)
and install the required libraries by running the below mentioned commands in the Anaconda Prompt
git clone https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/ml-agents (navigate inside ml-agents subfolder)
pip install . (installs the required modules)
a) For this project, Unity does not need to be installed because a ready-made built environment has already been provided,
and can be downloaded from one of the links below as per the operating system being used:
Linux: https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Linux.zip
Mac OSX: https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis.app.zip
Windows (32-bit): https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Windows_x86.zip
Windows (64-bit): https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Windows_x86_64.zip
Place the downloaded file in both the p3_collab-compet/ and python/ folders in the DRLND GitHub repository,
and unzip (or decompress) the file (a short sketch of loading the unzipped environment in Python is given after step c) below).
b) (For Windows users) Check out this link for help in determining whether the system is running a 32-bit or 64-bit
version of the Windows operating system.
(https://support.microsoft.com/en-us/help/827218/how-to-determine-whether-a-computer-is-running-a-32-bit-version-or-64)
c) (For AWS) If the agent is to be trained on AWS (and a virtual screen is not enabled), then please use this link
(https://s3-us-west-1.amazonaws.com/udacity-drlnd/P3/Tennis/Tennis_Linux_NoVis.zip) to obtain
the "headless" version of the Tennis environment. Watching the agent during training is not possible without enabling a virtual
screen. (To watch the agent, follow the instructions to enable a virtual screen
(https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-on-Amazon-Web-Service.md)
and then download the environment for the Linux operating system above.)
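As referenced in step a) above, once the environment file is unzipped it can be loaded from Python with the unityagents package. A minimal sketch is shown below; the file_name path assumes the Windows (64-bit) build placed inside the python/ folder and must be adjusted to the actual download and operating system:

import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Tennis_Windows_x86_64/Tennis.exe")   # adjust path/extension per OS
brain_name = env.brain_names[0]                    # the Tennis environment exposes a single brain
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)                  # 2 agents, one per racket
action_size = brain.vector_action_space_size       # 2 continuous actions per agent
state_size = env_info.vector_observations.shape[1]
print(num_agents, state_size, action_size)

# one random step: actions for both agents, each component clipped to [-1, 1]
actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
env_info = env.step(actions)[brain_name]
print(env_info.rewards)                            # per-agent rewards for this step
env.close()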
- First of all, clone this repository (https://github.com/PranayKr/Deep_Reinforcement_Learning_Projects.git) on the local system.
- Also clone the repository (https://github.com/udacity/deep-reinforcement-learning.git) mentioned previously on the local system.
- Now place all the source code files and pretrained model weights present in this cloned GitHub repo inside the python/ folder of the cloned Deep-RL repository folder.
- Next, place the folder containing the downloaded Unity environment file for the Windows (64-bit) OS inside the python/ folder of the cloned Deep-RL repository folder.
- Open an Anaconda prompt shell window and navigate inside the python/ folder of the cloned Deep-RL repository folder.
- Run the command "jupyter notebook" from the Anaconda prompt shell window to open the Jupyter Notebook web app in the browser, from where any of the provided training and testing notebooks (.ipynb files) can be opened.
- Before running/executing code in a notebook, change the kernel (IPython Kernel created for drlnd environment) to match the drlnd environment by using the drop-down Kernel menu.
- The source code present in the provided training and testing notebooks (.ipynb files) can also be collated into new Python files (.py files) and then executed directly from the Anaconda prompt shell window using the command "python <filename.py>" (see the example below).
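For example (assuming the Jupyter tooling installed above), a notebook can be converted to a script with nbconvert and then run from the prompt:

jupyter nbconvert --to script Tennis-Solution-Working.ipynb
python Tennis-Solution-Working.py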
NOTE:
- All the cells can be executed at once by choosing the option (Restart and Run All) in the Kernel tab.
- Please change the name of the (*.pth) file where the model weights are saved during training, to avoid overwriting the already provided pre-trained model weights that use the same filenames (a minimal save/load illustration is given below).
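A hedged illustration of the PyTorch save/load pattern is shown below; the model here is only a stand-in, while the actual training code saves the actor/critic networks defined in MADDPG_Model.py:

import torch
import torch.nn as nn

model = nn.Linear(8, 2)                                              # stand-in for a trained network
torch.save(model.state_dict(), 'my_checkpoint_actor_agent_0.pth')    # new filename keeps the provided weights intact
model.load_state_dict(torch.load('my_checkpoint_actor_agent_0.pth'))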
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) Algorithm Training / Testing Details (Files Used) :
For Training : Open the below mentioned Jupyter Notebook and execute all the cells
Tennis-Solution-Working.ipynb
Neural Net Model Architecture file Used : MADDPG_Model.py
The Unity Agent file used : MADDPG_Agent.py
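A minimal sketch of what the feedforward actor and critic networks might look like is shown below; this is not the exact contents of MADDPG_Model.py, and the hidden-layer sizes are assumptions. In MADDPG each agent has its own actor acting on its local observation, while each critic is centralized and scores the joint observations and actions of both agents:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    # maps a single agent's observation to a continuous action in [-1, 1]
    def __init__(self, state_size, action_size, fc1_units=256, fc2_units=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))        # tanh keeps every action component in [-1, 1]

class Critic(nn.Module):
    # centralized critic: scores the concatenated observations and actions of all agents
    def __init__(self, full_state_size, full_action_size, fc1_units=256, fc2_units=128):
        super().__init__()
        self.fc1 = nn.Linear(full_state_size + full_action_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, 1)

    def forward(self, full_states, full_actions):
        x = torch.cat((full_states, full_actions), dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)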
For Testing : open the Jupyter Notebook file "Tennis-MADDPGAgent_Test.ipynb" and run the code to test the
results obtained using the pre-trained model weights for the Actor Neural-Net Model and
Critic Neural-Net Model of each of the 2 pre-trained Agents (see the sketch after the list of checkpoint files below)
Pretrained Model Weights provided : 1) checkpoint_actor_agent_0.pth
2) checkpoint_actor_agent_1.pth
3) checkpoint_critic_agent_0.pth
4) checkpoint_critic_agent_1.pth
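As referenced above, a hedged sketch of how the provided actor checkpoints could be loaded for evaluation is given below. It reuses the Actor sketch from the training section and only works if the network architecture matches the one the weights were saved from; the test notebook itself uses the classes in MADDPG_Model.py / MADDPG_Agent.py:

import numpy as np
import torch
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Tennis_Windows_x86_64/Tennis.exe")   # adjust path per OS
brain_name = env.brain_names[0]
env_info = env.reset(train_mode=False)[brain_name]
state_size = env_info.vector_observations.shape[1]
action_size = env.brains[brain_name].vector_action_space_size

actors = [Actor(state_size, action_size), Actor(state_size, action_size)]
actors[0].load_state_dict(torch.load('checkpoint_actor_agent_0.pth'))
actors[1].load_state_dict(torch.load('checkpoint_actor_agent_1.pth'))

states = env_info.vector_observations
scores = np.zeros(2)
while True:
    with torch.no_grad():
        actions = np.vstack([actor(torch.from_numpy(s).float()).numpy()
                             for actor, s in zip(actors, states)])
    env_info = env.step(actions)[brain_name]    # both agents act greedily from their own observations
    scores += env_info.rewards
    states = env_info.vector_observations
    if np.any(env_info.local_done):
        break
print('Episode score (max over both agents):', np.max(scores))
env.close()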