A3C: Asynchronous Advantage Actor-Critic

Reading Group: Asynchronous Methods for Deep Reinforcement Learning (Mnih et al., 2016)

Motivation

The sequence of observed data encountered by an online RL agent is non-stationary, and online RL updates are strongly correlated. By storing the agent's data in an experience replay memory, the data can be batched or randomly sampled from different time steps, which reduces non-stationarity and decorrelates the updates.
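
As a reminder of the mechanism being replaced, here is a minimal sketch of a uniform replay buffer; the class, method, and parameter names are illustrative, not taken from the paper or any DQN codebase:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling mixes transitions from many different time steps,
        # so a training batch is far less correlated than consecutive experience.
        return random.sample(self.buffer, batch_size)
```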

Drawbacks of Experience Replay

  • Uses more memory and computation per interaction
  • Requires off-policy learning algorithms that can update from data generated by an older policy

Asynchronous RL Framework

The paper presents multi-threaded asynchronous variants of:

  • One-step SARSA
  • One-step Q-learning
  • N-step Q-learning
  • Advantage actor-critic (A3C); see the sketch after this list
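
As a rough sketch of the advantage actor-critic update at the core of A3C, the snippet below computes n-step returns and advantages for one rollout segment collected by a single worker. The function name and exact conventions are assumptions for illustration, not the paper's code:

```python
import numpy as np

def a3c_targets(rewards, values, bootstrap_value, gamma=0.99):
    """Compute n-step returns and advantages for one rollout segment.

    rewards:          r_t collected over the rollout (length n)
    values:           critic estimates V(s_t) for the visited states (length n)
    bootstrap_value:  V(s_n) for the last state, or 0.0 if the episode ended
    """
    R = bootstrap_value
    returns, advantages = [], []
    for r, v in zip(reversed(rewards), reversed(values)):
        R = r + gamma * R              # n-step discounted return
        returns.append(R)
        advantages.append(R - v)       # advantage = return minus value baseline
    return np.array(returns[::-1]), np.array(advantages[::-1])

# The policy loss is -log pi(a_t|s_t) * advantage_t (plus an entropy bonus),
# and the value loss is (return_t - V(s_t))^2, both summed over the segment.
```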

Key insight: Actor-critic is an on-policy policy search method, while Q-learning is an off-policy value-based method. Running multiple agents in parallel on different threads provides diverse, decorrelated experience without a replay memory.
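
A structural sketch of one actor-learner thread follows: each thread keeps its own environment and exploration seed, accumulates gradients over a short rollout, and applies them to shared parameters without locking. The names and the stand-in random gradient are illustrative assumptions, not the paper's implementation:

```python
import threading
import numpy as np

GLOBAL_PARAMS = np.zeros(16)       # shared policy/value parameters
T = [0]                            # shared global step counter
T_MAX, ROLLOUT_LEN, LR = 20_000, 5, 1e-3

def actor_learner(seed):
    """One thread: its own environment instance and exploration, shared parameters."""
    rng = np.random.default_rng(seed)           # different seed => different experience
    while T[0] < T_MAX:
        local_params = GLOBAL_PARAMS.copy()     # synchronize a thread-local copy
        grad = np.zeros_like(local_params)
        for _ in range(ROLLOUT_LEN):
            # ... act in the thread-local environment, store (s, a, r) ...
            grad += rng.normal(size=grad.shape)  # stand-in for policy/value gradients
            T[0] += 1
        # Apply the accumulated gradient to the shared parameters without a lock.
        GLOBAL_PARAMS[:] = GLOBAL_PARAMS - LR * grad

threads = [threading.Thread(target=actor_learner, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```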

Key Benefits

  • Decorrelated updates: Different threads explore different parts of the environment
  • No replay memory needed: Enables on-policy methods like actor-critic
  • CPU-friendly: Runs on multi-core CPUs rather than requiring GPUs

Related Work

  • Gorila Framework: Distributed RL with parameter servers
  • Hogwild!: Lock-free parallel SGD
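
The asynchronous updates are applied in the style of Hogwild!: threads write to shared parameters without locking. A minimal sketch on a toy least-squares problem (the problem setup and all names are illustrative, not from the Hogwild! paper):

```python
import threading
import numpy as np

# Toy problem: recover w_true from noisy linear measurements.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X = rng.normal(size=(4000, 5))
y = X @ w_true + 0.01 * rng.normal(size=4000)

w = np.zeros(5)                    # shared parameters, updated without any lock

def hogwild_worker(rows, lr=0.01):
    for i in rows:
        # Each thread reads the current (possibly stale) w, computes a gradient
        # on one example, and writes the update back without synchronization.
        grad = (X[i] @ w - y[i]) * X[i]
        w[:] = w - lr * grad

chunks = np.array_split(np.arange(len(X)), 4)
threads = [threading.Thread(target=hogwild_worker, args=(c,)) for c in chunks]
for t in threads: t.start()
for t in threads: t.join()
print("parameter error:", np.linalg.norm(w - w_true))
```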