We read the DQN paper, Playing Atari with Deep Reinforcement Learning by Mnih et al., and a variation of the DQN algorithm, DRQN, introduced in Deep Recurrent Q-Learning for Partially Observable MDPs by Hausknecht & Stone.
Here are annotated versions of the papers with some useful side notes and external references: DQN and DRQN.
Some of the questions that came up while discussing these papers:
- How does the discount factor \(\gamma \) affect the policy that is learnt? Also, is there an optimal \(\gamma \) for each particular game?
- Further, how does one decide on the correct \(\gamma \) in a more strategic game like Dota versus in CartPole? The main argument is that in CartPole you always want to give immediate rewards a lot of weight, given the nature of the game, whereas in a more strategic game that is not necessarily the case. (A rough heuristic for thinking about this is sketched below.)
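One way to reason about this is the standard effective-horizon heuristic: under discounting, a reward \(k\) steps away is weighted by \(\gamma^k\), so rewards more than roughly \(1/(1-\gamma)\) steps away contribute little to the return. The typical delay between actions and meaningful rewards in a game then suggests a range for \(\gamma \). A minimal sketch of the heuristic (my own, not from either paper):

```python
# Rough heuristic: with discount gamma, a reward k steps away is scaled by
# gamma**k, so rewards beyond ~1/(1-gamma) steps contribute little.
for gamma in (0.9, 0.99, 0.999):
    horizon = 1.0 / (1.0 - gamma)
    # weight left on a reward that arrives exactly `horizon` steps away
    weight = gamma ** round(horizon)
    print(f"gamma={gamma}: effective horizon ~{horizon:.0f} steps, "
          f"weight at that horizon ~{weight:.2f}")
```

By this heuristic, \(\gamma = 0.99\) already "sees" on the order of a hundred steps, which covers most of a CartPole episode, while a game with much longer delays between decisions and payoffs would push \(\gamma \) closer to 1.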
Here's a CartPole demo, built with help from external implementations here. I have added some experiments around the questions that came up during the reading group.
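For reference, here is a minimal sketch (my own, not taken from the demo or the papers' code) of the one-step DQN target. This is the only place \(\gamma \) enters the update, so it is the quantity a \(\gamma \) sweep effectively changes:

```python
import torch

def dqn_target(rewards, next_q_values, dones, gamma=0.99):
    """One-step TD target: r + gamma * max_a' Q_target(s', a').

    rewards:       (batch,) immediate rewards
    next_q_values: (batch, n_actions) Q-values of s' from the target network
    dones:         (batch,) 1.0 where the episode terminated at s'
    """
    max_next_q = next_q_values.max(dim=1).values
    # terminal states bootstrap nothing beyond the final reward;
    # a smaller gamma shrinks the bootstrapped term and makes the policy myopic
    return rewards + gamma * (1.0 - dones) * max_next_q

# example usage with dummy tensors
rewards = torch.tensor([1.0, 1.0])
next_q = torch.tensor([[0.2, 0.7], [0.5, 0.1]])
dones = torch.tensor([0.0, 1.0])
print(dqn_target(rewards, next_q, dones, gamma=0.9))  # tensor([1.63, 1.00])
```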