Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor - A reproduction
Reproduction of SAC from scratch, following Haarnoja et al. (2018) and Haarnoja et al. (2019).
- Install MuJoCo (see the official "Install MuJoCo" instructions)
Specifically for Ubuntu:
- Download the MuJoCo version 2.1.0 binaries for Linux
- Extract the downloaded `mujoco210` directory into `~/.mujoco/mujoco210`.
- Set the `LD_LIBRARY_PATH` environment variable (for instance in `.bashrc`):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HOME}/.mujoco/mujoco210/bin
- Clone repository including submodule for TD3:
git clone --recurse-submodules git@github.com:Heyjuke58/sac_reproduction.git
- Create and activate the conda environment from `environment.yml`:
conda env create -f environment.yml
conda activate sac2
- To test your setup, you can run
python try_mujoco.py
Set the desired hyperparameters in `src/hyperparameters.py`, then run:
python main.py --sac=<sac_env> --seed=<seed>
python main.py --td3=<td3_env> --seed=<seed>
The seed is optional; the default seed is given in `src/hyperparameters.py`.
Possible SAC envs: {SAC_HOPPER, SAC_CHEETAH, SAC_HOPPER_FIXED_ALPHA, SAC_CHEETAH_FIXED_ALPHA}
Possible TD3 envs: {TD3_HOPPER, TD3_CHEETAH}
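The `_FIXED_ALPHA` variants suggest that both a constant entropy temperature and the automatic temperature tuning of Haarnoja et al. (2019) are supported. In the tuned case, α is adjusted so the policy's entropy tracks a target H̄ (conventionally −dim(action space)). A minimal pure-Python sketch of that idea, with illustrative names not taken from this repository:

```python
import math

def alpha_update(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on J(alpha) = E[-alpha * (log_pi + H_target)].

    Optimizing log_alpha keeps alpha positive. Since
    dJ/d(log_alpha) = -alpha * mean(log_pi + H_target),
    alpha grows when the policy's entropy (-mean(log_pi)) falls below the
    target (policy too deterministic) and shrinks when it exceeds it.
    """
    alpha = math.exp(log_alpha)
    mean_excess = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -alpha * mean_excess
    return log_alpha - lr * grad

# Hopper's action space is 3-dimensional, so the usual target entropy is -3.
# High log-probs (entropy below target) push log_alpha, and hence alpha, up.
new_log_alpha = alpha_update(0.0, [5.0, 4.0], target_entropy=-3.0)
```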
To see what a learned policy actually does in the environment, you can evaluate a learned policy (from the `models` folder) yourself by running:
python render_policy.py --model=<path-to-model-file>
Note that the file name needs to adhere to the expected format, as the algorithm and environment types are extracted from it.
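The exact naming convention is defined by `render_policy.py`; the sketch below only illustrates why the name must follow a fixed pattern. The `<algorithm>_<environment>_...` format assumed here is hypothetical:

```python
import re

# Hypothetical convention: "<algorithm>_<environment>_<anything>", e.g.
# "sac_hopper_seed0.pt". The real repository may use a different pattern.
PATTERN = re.compile(r"^(sac|td3)_(hopper|halfcheetah)_.*", re.IGNORECASE)

def parse_model_filename(name):
    """Extract (algorithm, environment) from a model file name."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized model file name: {name!r}")
    return m.group(1).lower(), m.group(2).lower()
```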
To plot results from runs, move one or more result CSV files into the folder `results/plot` and run the following command:
python plot.py --x=<x> -y=<y> -e=<env>
Choices:
- `x`: ["time", "env_steps", "grad_steps"]
- `y`: ["avg_return", "log_probs_alpha"]
- `e`: ["Hopper", "HalfCheetah"]
- `b`: bin size (to reduce noise in the plot)
Note that the data from every file is split by the `seed` column so that the return is averaged over multiple runs.
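Binning works by averaging all y-values whose x-coordinate falls into the same window; a small stdlib-only sketch (the function name and column handling are illustrative, not the repo's actual `plot.py` code):

```python
from collections import defaultdict
from statistics import mean

def bin_average(xs, ys, bin_size):
    """Average y-values whose x falls into the same bin of width bin_size.

    Returns a sorted list of (bin_start, mean_y) pairs; applying this to a
    noisy return curve smooths it at the cost of x-resolution.
    """
    bins = defaultdict(list)
    for x, y in zip(xs, ys):
        bins[x // bin_size].append(y)
    return sorted((b * bin_size, mean(v)) for b, v in bins.items())
```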
...can all be found in their respective files in the `results` folder.
...can be found in `environment-exact.yml`.
Runs were split between 2 GPUs:
- Nvidia GeForce RTX 3060 Laptop GPU: 23.2 h at 17 W ≅ 0.3944 kWh ≘ 12.68 cents (€) (assuming 32.16 cents/kWh)
- Nvidia GeForce GTX 1070 Ti: 19.56 h at 70 W ≅ 1.3692 kWh ≘ 44.03 cents

Total: 56.71 cents (final evaluation only, GPU costs only)
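The energy and cost figures above follow from simple arithmetic (runtime × average power draw, priced at the assumed 32.16 cents/kWh):

```python
PRICE_CT_PER_KWH = 32.16  # assumed electricity price from the text

def gpu_cost(hours, watts):
    """Return (energy in kWh, cost in euro cents) for a run."""
    kwh = hours * watts / 1000.0
    return kwh, kwh * PRICE_CT_PER_KWH

kwh_3060, ct_3060 = gpu_cost(23.2, 17)   # 0.3944 kWh, ~12.68 ct
kwh_1070, ct_1070 = gpu_cost(19.56, 70)  # 1.3692 kWh, ~44.03 ct
total_ct = ct_3060 + ct_1070             # ~56.71 ct
```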