We read the DQN paper, "Playing Atari with Deep Reinforcement Learning" by Mnih et al., and a variation of the DQN algorithm, DRQN, from "Deep Recurrent Q-Learning for Partially Observable MDPs" by Hausknecht & Stone.
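For reference, the part of DQN where the discount \(\gamma\) enters is the one-step target \(y = r + \gamma \max_{a'} Q(s', a')\), with \(y = r\) when \(s'\) is terminal. The network's \(Q(s, a)\) is regressed toward this target. Below is a minimal NumPy sketch of computing the targets for a sampled minibatch. The function name and array layout are our own assumptions rather than code from either paper, and the network, replay memory and loss are left out.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Hypothetical helper: one-step Q-learning targets for a minibatch.

    rewards:       shape (B,)   reward r_j of each sampled transition
    next_q_values: shape (B, A) Q(s'_j, a') for every action a'
    dones:         shape (B,)   1.0 if s'_j is terminal, else 0.0
    """
    # Greedy bootstrap: best action value in the next state.
    max_next_q = next_q_values.max(axis=1)
    # Terminal transitions keep only the immediate reward (y = r).
    return rewards + gamma * (1.0 - dones) * max_next_q

# Usage: two transitions, the second one ends the episode.
targets = dqn_targets(
    rewards=np.array([1.0, 0.0]),
    next_q_values=np.array([[0.5, 2.0], [3.0, 1.0]]),
    dones=np.array([0.0, 1.0]),
    gamma=0.9,
)
print(targets)
```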

Here are annotated versions of both papers, with some useful side notes and external references: DQN and DRQN.

Some of the questions that came up while discussing these papers:

How does the discount factor \(\gamma\) affect the policy that is learnt? Also, is there an optimal \(\gamma\) for each particular game?
Further, how does one choose the right \(\gamma\) in a more strategic game like Dota versus in CartPole? The main argument is that in CartPole you always want to give immediate rewards a lot of weight, given the nature of the game, whereas in a more strategic game that is not necessarily the case.
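One way to make the \(\gamma\) question concrete: the discounted return is \(G_0 = \sum_t \gamma^t r_t\), so a reward \(t\) steps away is weighted by \(\gamma^t\), and for a constant reward the weights sum to about \(1/(1-\gamma)\), often read as an "effective horizon". The sketch below computes both for a few values of \(\gamma\), assuming a hypothetical CartPole-style episode of 500 steps that pays +1 per step; the function names are illustrative.

```python
def discounted_return(rewards, gamma):
    """G_0 = sum_t gamma**t * r_t, accumulated backwards from the last reward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def effective_horizon(gamma):
    """Heuristic number of steps that matter: sum of gamma**t over all t, i.e. 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)

# Assumed CartPole-style stream: +1 for each of 500 steps the pole stays up.
rewards = [1.0] * 500
for gamma in (0.9, 0.99, 0.999):
    print(f"gamma={gamma}: return={discounted_return(rewards, gamma):.2f}, "
          f"horizon={effective_horizon(gamma):.1f}")
```

With \(\gamma = 0.9\), a reward 50 steps away gets less than 1% of the weight of an immediate one, while \(\gamma = 0.999\) still counts rewards hundreds of steps out, which is closer to what a long-horizon strategic game would seem to need.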

Here's a CartPole demo, built with help from external implementations here. I have added some experiments around the questions that came up during the reading group.
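To see how \(\gamma\) can change the learned policy itself, and not just the values, here is a hypothetical toy problem (separate from the CartPole demo): from the start state the agent can either grab a reward of 1 immediately, ending the episode, or walk down a corridor of five zero-reward steps and then collect 10. Backward induction gives \(Q(\text{grab}) = 1\) and \(Q(\text{wait}) = 10\gamma^5\), so the greedy policy switches from grab to wait once \(\gamma > 10^{-1/5} \approx 0.63\). The environment and function name are illustrative assumptions.

```python
def first_step_q_values(gamma, delay=5, small=1.0, big=10.0):
    """Hypothetical toy MDP: Q-values of 'grab' and 'wait' at the start state.

    Corridor states 1..delay give reward 0, except the last one, which pays `big`
    and ends the episode. Values are computed by backward induction.
    """
    # V(delay) = big; V(i) = gamma * V(i + 1) for the zero-reward corridor states.
    v = big
    for _ in range(delay - 1):
        v = gamma * v
    # v is now V(1), the value of the first corridor state.
    q_grab = small      # take the small reward now; episode ends
    q_wait = gamma * v  # reward 0 now, then continue down the corridor
    return q_grab, q_wait

for gamma in (0.5, 0.6, 0.7, 0.9, 0.99):
    q_grab, q_wait = first_step_q_values(gamma)
    action = "wait" if q_wait > q_grab else "grab"
    print(f"gamma={gamma}: Q(grab)={q_grab:.3f}, Q(wait)={q_wait:.3f} -> {action}")
```

This is the same trade-off behind the CartPole versus Dota question: a small \(\gamma\) makes the agent myopic even when patience pays off.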