This is an subjective statement since tons of things can impact convergence.

Jul 28, 2023

This is an subjective statement since tons of things can impact convergence. But on-policy samples from an more optimal policy so it can have faster convergence from less optimal policy. But that we have certain assumptions that including we have safeguard to avoid unstable training.

Written by Jonathan Hui

No responses yet