Jonathan Hui
Jan 19, 2023

--

the new policy is the one estimated by the DNN. it is trained with gradient descent similar to the policy gradient.

--

--

No responses yet