week1_11_RL_outside_games

RL for seq2seq practice: Open In Colab

Further readings:

  • A proof of the policy gradient theorem for discounted rewards - article
  • On the variance of the policy gradient and optimal baselines: article, another article

Based on Practical_RL week07