ml-mipt/week1_11_RL_outside_games at master · vinnibuh/ml-mipt · GitHub

README.md
basic_model_torch.py
main_dataset.txt
scheme.svg
setup.py
voc.py
week11_RL_for_seq2sec.ipynb
week11_Sequence_learning.pdf

README.md

RL for seq2seq practice:

Further reading:

Actually proving the policy gradient for discounted rewards - article
On variance of policy gradient and optimal baselines: article, another article
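
For orientation, here is a short statement of the result those readings discuss. This is a sketch in standard textbook notation, not taken from the linked articles. The symbols $\pi_\theta$ (policy), $\gamma$ (discount factor), $G_t$ (discounted return from step $t$), and $b(s_t)$ (a state-dependent baseline) are assumptions introduced here for illustration.

```latex
% Discounted objective and return-to-go
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_t\Big],
\qquad
G_t = \sum_{k=t}^{\infty} \gamma^{\,k-t} r_k

% Policy gradient with a baseline
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^{t}\,
      \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\big(G_t - b(s_t)\big)\Big]

% The baseline adds no bias because, for any fixed state s,
\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s)\big]
  = b(s)\, \nabla_\theta \sum_{a} \pi_\theta(a \mid s)
  = b(s)\, \nabla_\theta 1
  = 0
```

Many practical implementations drop the leading $\gamma^{t}$ factor, which is one reason a careful proof for the discounted case is worth reading. The choice of $b(s_t)$ changes only the variance of the estimate, not its expectation.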

Based on Practical_RL week07