RL — Guided Policy Search (GPS)

Motivation of GPS

  • Many RL methods need weeks of physical simulation for training.
  • Robotic control often relies on camera images to observe the external environment. Inferring actions from such high-dimensional, entangled data is hard.
  • Executing a partially learned, potentially bad policy can put the robot in harm’s way.
  • Potential guesses may be ill-conditioned.
  • Policy drift may lead us to states that we never encountered during training.

Guided Policy Search (GPS)


Dual Gradient Descent (DGD)
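
As a rough illustration of dual gradient descent, here is a minimal sketch on a toy problem: minimize f(x, y) = x² + y² subject to C(x, y) = x + y − 1 = 0. The toy problem, the function names, and the step size are assumptions made for this example and are not the GPS objective itself. Each iteration minimizes the Lagrangian L = f + λC over (x, y) for a fixed λ, then takes a gradient ascent step on λ using the constraint violation.

```python
def argmin_lagrangian(lam):
    # Inner step: minimize L(x, y, lam) = x^2 + y^2 + lam * (x + y - 1).
    # Setting the gradient to zero gives 2x + lam = 0 and 2y + lam = 0.
    x = -lam / 2.0
    y = -lam / 2.0
    return x, y


def constraint(x, y):
    # C(x, y) = 0 when the constraint x + y = 1 is satisfied.
    return x + y - 1.0


def dual_gradient_descent(alpha=0.5, steps=50):
    lam = 0.0  # initial Lagrange multiplier (arbitrary starting guess)
    for _ in range(steps):
        x, y = argmin_lagrangian(lam)
        # Outer step: the gradient of the dual function with respect to lam
        # is the constraint violation at the inner minimizer.
        lam += alpha * constraint(x, y)
    x, y = argmin_lagrangian(lam)
    return x, y, lam


x, y, lam = dual_gradient_descent()
print(x, y, lam)
```

On this toy problem the iterates approach x = y = 0.5 with λ = −1. In a setting like GPS the inner minimization has no closed form, so each inner step would itself be an optimization.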

Deterministic GPS


Intuition

Imitate optimal control


Stochastic GPS


Augmented Lagrangian
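
To make the idea concrete, the sketch below applies an augmented Lagrangian to a toy problem: minimize f(x, y) = x² + y² subject to C(x, y) = x + y − 1 = 0. The toy problem, the names, and the penalty weight ρ are assumptions for illustration only. The augmented Lagrangian adds a quadratic penalty on the constraint, L = f + λC + (ρ/2)C², and the multiplier is updated with step size ρ.

```python
def argmin_augmented(lam, rho):
    # Minimize x^2 + y^2 + lam * C + (rho / 2) * C^2 with C = x + y - 1.
    # By symmetry x = y, and 2x + lam + rho * (2x - 1) = 0 gives:
    x = (rho - lam) / (2.0 * (1.0 + rho))
    return x, x


def violation(x, y):
    # Constraint violation C(x, y); zero when x + y = 1.
    return x + y - 1.0


def augmented_lagrangian(rho=1.0, steps=50):
    lam = 0.0  # initial multiplier (arbitrary starting guess)
    for _ in range(steps):
        x, y = argmin_augmented(lam, rho)
        # Multiplier update uses the penalty weight as the step size.
        lam += rho * violation(x, y)
    x, y = argmin_augmented(lam, rho)
    return x, y, lam


ax, ay, alam = augmented_lagrangian()
print(ax, ay, alam)
```

The quadratic term penalizes constraint violation even before the multiplier has converged, which tends to make the iteration better behaved than the plain Lagrangian version.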

Multiple trajectories


Note
