the new policy is the one estimated by the DNN.

In clipped PPO, what is the actual loss function we pass to the Adam optimizer?
1
Nadav B.
Jonathan Hui
·Follow
Jan 19, 2023
--
the new policy is the one estimated by the DNN. it is trained with gradient descent similar to the policy gradient.
--
--
Written by Jonathan Hui41K Followers
·39 Following
Deep Learning
No responses yet
Help
Status
About
Careers
Press
Blog
Privacy
Terms
Text to speech
Teams