Reading Group: Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al.
The sequence of observations encountered by an online RL agent is non-stationary, and updates computed from consecutive transitions are strongly correlated. By storing the agent's experience in a replay memory, the data can be batched or randomly sampled from different time-steps, which breaks these temporal correlations.
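As a concrete illustration, here is a minimal replay-buffer sketch (the class and method names are hypothetical, not the paper's implementation): transitions are appended in the order the agent experiences them, and training mini-batches are drawn by uniform random sampling, which decorrelates the updates as described above.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay memory (illustrative sketch).

    Stores (state, action, reward, next_state, done) transitions and
    returns uniformly sampled mini-batches, decorrelating updates from
    the order in which the agent observed the data.
    """

    def __init__(self, capacity=100_000):
        # Bounded FIFO: once full, the oldest transitions are evicted.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Append one transition observed online.
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling over stored time-steps breaks the
        # strong correlation between consecutive online transitions.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```

In use, the agent would call `buffer.push(...)` after each environment step and, once `len(buffer)` exceeds the batch size, train on `buffer.sample(32)` instead of on the most recent transition.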
Drawbacks of experience replay