To perform experience replay, we store the agent's experiences $e_t = (s_t, a_t, r_t, s_{t+1})$ at each time step $t$.
Then, rather than learning only from the most recent transition, we apply the Q-learning update to minibatches of stored experiences drawn uniformly at random.
* Sampling at random breaks the correlations between consecutive observations in the sequence and smooths over changes in the data distribution.
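* The iterative update adjusts $Q$ toward target values $y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-)$, where the target parameters $\theta^-$ are only updated periodically (copied from the online network every $C$ steps). This further reduces correlations between $Q$ and its target.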
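The two ideas above (a replay memory sampled uniformly at random, and targets computed from a periodically refreshed copy of $Q$) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DQN implementation: a tabular $Q$ (a list of per-state action values) stands in for the neural network, a simple incremental update stands in for the gradient step, the environment is a hypothetical 5-state chain (action 0 moves left, action 1 moves right, reward 1 for reaching the last state), and the behavior policy is uniformly random. The names `ReplayBuffer`, `q_learning_step`, `run` and every hyperparameter value are hypothetical choices made for this example.

```python
import random
from collections import deque, namedtuple

# One experience e_t = (s_t, a_t, r_t, s_{t+1}); the `done` flag is an added assumption.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-capacity replay memory; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform random minibatch: ignores the temporal order of consecutive steps.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


def q_learning_step(q, q_target, batch, gamma=0.99, lr=0.1):
    """Tabular stand-in for the DQN update: move Q(s, a) toward
    y = r + gamma * max_a' Q_target(s', a'), computed with the frozen target table."""
    for s, a, r, s_next, done in batch:
        y = r if done else r + gamma * max(q_target[s_next])
        q[s][a] += lr * (y - q[s][a])


def run(num_steps=2000, n_states=5, capacity=500, batch_size=32, target_every=100, seed=0):
    """Hypothetical toy chain: action 0 moves left, action 1 moves right;
    reaching the last state gives reward 1 and resets the agent to state 0."""
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    q_target = [row[:] for row in q]  # frozen copy used only for targets
    buffer = ReplayBuffer(capacity)
    s = 0
    for step in range(1, num_steps + 1):
        a = random.randrange(2)  # uniformly random behavior policy (simplification)
        s_next = max(0, s - 1) if a == 0 else s + 1
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0
        buffer.push(s, a, r, s_next, done)  # store e_t
        s = 0 if done else s_next
        if len(buffer) >= batch_size:
            q_learning_step(q, q_target, buffer.sample(batch_size))
        if step % target_every == 0:
            q_target = [row[:] for row in q]  # periodic target refresh
    return q


if __name__ == "__main__":
    for state, values in enumerate(run()):
        print(state, [round(v, 3) for v in values])
```

Because Q-learning is off-policy, the learned table should prefer moving right in every non-terminal state even though the data come from a random policy. In DQN proper, `q` would be a network with parameters $\theta$, `q_learning_step` would be a gradient step on the squared error $(y_t - Q(s_t, a_t; \theta))^2$, and the refresh would copy $\theta$ into $\theta^-$.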