Jul 28, 2023
This is a subjective statement, since many factors can affect convergence. However, on-policy learning draws its samples from a more optimal policy, so it can converge faster than learning from samples drawn from a less optimal policy. That said, this relies on certain assumptions, including having safeguards in place to avoid unstable training.