DQN and DRQN: Reading Group Notes

Deep Q-Networks and Deep Recurrent Q-Learning

We read the DQN paper, Playing Atari with Deep Reinforcement Learning by Mnih et al., and a variation of the DQN algorithm, DRQN, from Deep Recurrent Q-Learning for Partially Observable MDPs by Hausknecht & Stone.

Here is an annotated version of the papers with some useful side notes and external references: DQN and DRQN.

Some of the questions that came up while discussing these papers:

  • How does the discount factor \(\gamma\) affect the policy that is learnt? Is there an optimal \(\gamma\) for each particular game?
  • Further, how does one decide on the right \(\gamma\) for a more strategic game like Dota versus a reactive one like CartPole? The main argument was that in CartPole you always want to give immediate rewards a lot of weight, given the nature of the game, whereas in a more strategic game with delayed payoffs that is not necessarily the case (see the worked example after this list).
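
To make the trade-off in the second question concrete, recall how the return weights future rewards: a reward arriving \(k\) steps in the future contributes \(\gamma^k\) of its value, so the effective horizon is roughly \(1/(1-\gamma)\). For instance, a reward 50 steps away retains about 61% of its value at \(\gamma = 0.99\) but only about 0.5% at \(\gamma = 0.9\):

\[
G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0.99^{50} \approx 0.61, \qquad 0.9^{50} \approx 0.005.
\]

This is why a reactive task like CartPole tolerates a smaller \(\gamma\), while games with long-delayed payoffs push \(\gamma\) closer to 1.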

Here's a CartPole demo, put together with help from external implementations, here. I have added some experiments around the questions that came up during the reading group.
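
For reference, a minimal DQN training loop on CartPole looks roughly like the sketch below. This assumes gymnasium and PyTorch are installed and is not the code behind the linked demo; for brevity it also omits the separate target network used in the Nature version of DQN.

```python
# Minimal DQN sketch for CartPole (illustrative only).
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

GAMMA = 0.99        # discount factor -- the hyperparameter discussed above
EPSILON = 0.1       # fixed epsilon-greedy exploration rate
BATCH_SIZE = 64

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

# Small feed-forward Q-network mapping a state to one Q-value per action.
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # experience replay buffer

for episode in range(200):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
                action = int(q_values.argmax().item())

        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        replay.append((state, action, reward, next_state, float(terminated)))
        state = next_state

        if len(replay) >= BATCH_SIZE:
            s, a, r, s2, term = zip(*random.sample(replay, BATCH_SIZE))
            s = torch.as_tensor(np.array(s), dtype=torch.float32)
            s2 = torch.as_tensor(np.array(s2), dtype=torch.float32)
            a = torch.as_tensor(a, dtype=torch.int64)
            r = torch.as_tensor(r, dtype=torch.float32)
            term = torch.as_tensor(term, dtype=torch.float32)

            # Bellman target: r + gamma * max_a' Q(s', a'), cut off at terminal states.
            with torch.no_grad():
                target = r + GAMMA * (1.0 - term) * q_net(s2).max(dim=1).values
            q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q_sa, target)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Sweeping GAMMA over a few values (e.g. 0.9, 0.99, 0.999) in a loop like this is one simple way to run the kind of \(\gamma\) experiments mentioned above.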