We read the DQN paper, Playing Atari with Deep Reinforcement Learning by Mnih et al., and a variation of the DQN algorithm, DRQN, from Deep Recurrent Q-Learning for Partially Observable MDPs by Hausknecht & Stone.

Here are annotated versions of the papers with some useful side notes and external references: DQN and DRQN.

Some of the questions that came up while discussing these papers:

Here's a CartPole demo, built with help from external implementations here. I have added some experiments around the questions that came up during the reading group.