Jonathan Hui
I am confused: you said the policy gradient updates only at the end of every episode, but your…

JJ Zeng

Jonathan Hui

Sep 25, 2018

θ is updated only once per τ, where τ is the whole trajectory (episode).
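
For reference, this is how the Monte Carlo policy gradient (REINFORCE) update is commonly written. The notation is an assumption rather than a quote from the post: N is the number of sampled trajectories, T the episode length, r the per-step reward, and α the learning rate. Both inner sums run over the whole trajectory, which is why θ can only change after the episode has finished.

```latex
\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}
  \left(\sum_{t=1}^{T}\nabla_\theta \log \pi_\theta(a_{i,t}\mid s_{i,t})\right)
  \left(\sum_{t=1}^{T} r(s_{i,t}, a_{i,t})\right),
\qquad
\theta \leftarrow \theta + \alpha\,\nabla_\theta J(\theta)
```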


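BTW, policy gradient with Monte Carlo rollouts updates once per episode. There are other ways to approximate the rewards, such as bootstrapped (temporal-difference) estimates, and those do not have to wait for the episode to end.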
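
A minimal sketch of the once-per-episode update, assuming a toy setting that is not from the original post: a state-independent softmax policy over two actions, a fixed horizon, and a reward of 1 whenever action 1 is chosen. The names `run_episode` and `train` are hypothetical. The point is only the placement of the update: the rollout loop accumulates the log-probability gradients and the return, and θ changes exactly once after the loop ends.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, horizon, rng):
    """Roll out one full episode WITHOUT touching theta.

    Returns the summed gradient of log pi(a_t) over the episode
    and the total (undiscounted) return.
    """
    grad_log_prob = [0.0] * len(theta)
    episode_return = 0.0
    for _ in range(horizon):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        # For a softmax policy: d/d theta_k log pi(a) = 1[k == a] - pi(k)
        for k in range(len(theta)):
            grad_log_prob[k] += (1.0 if k == action else 0.0) - probs[k]
        # Toy reward (assumption): action 1 pays 1, action 0 pays 0
        episode_return += 1.0 if action == 1 else 0.0
    return grad_log_prob, episode_return


def train(num_episodes=200, horizon=5, lr=0.1, seed=0):
    """Monte Carlo policy gradient: one theta update per episode."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    num_updates = 0
    for _ in range(num_episodes):
        grad_log_prob, episode_return = run_episode(theta, horizon, rng)
        # The single update for this trajectory, applied after it ends
        theta = [t + lr * episode_return * g
                 for t, g in zip(theta, grad_log_prob)]
        num_updates += 1
    return theta, num_updates
```

Calling `train()` returns the learned logits together with the number of updates, which equals the number of episodes rather than the number of environment steps. A method that bootstraps its reward estimate could instead move the update inside the step loop.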
