RL — Dual Gradient Descent


Dual Gradient Descent is a popular method for optimizing an objective under a constraint. In reinforcement learning, it lets us handle constrained problems such as trajectory optimization.

The problem we want to solve has the form:

min_x f(x)   subject to   C(x) = 0

The key idea is transforming the constrained objective into a Lagrange dual function that can be optimized iteratively. The Lagrangian 𝓛 and the Lagrange dual function g are defined as:

𝓛(x, λ) = f(x) + λ C(x)

g(λ) = min_x 𝓛(x, λ)

where λ is a scalar called the Lagrange multiplier.

The dual function g is a lower bound on the optimal value of the original optimization problem. Moreover, if f is convex, strong duality often holds, which says that the maximum of g equals the minimum of the original constrained problem. Hence, if we find the λ that maximizes g, we have solved the optimization problem.
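To see why g is a lower bound, take any x̃ that satisfies the constraint, i.e. C(x̃) = 0. Then

g(λ) = min_x 𝓛(x, λ) ≤ 𝓛(x̃, λ) = f(x̃) + λ C(x̃) = f(x̃)

so for every λ, g(λ) is no larger than f at any feasible point, and in particular no larger than the optimal value.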

So we start with an initial guess for λ and use any optimization method to solve the resulting unconstrained problem:

x* ← argmin_x 𝓛(x, λ) = argmin_x ( f(x) + λ C(x) )

Next, we apply gradient ascent to update λ in order to maximize g. The gradient of g is:

dg/dλ = d𝓛(x*, λ)/dλ

i.e.

dg/dλ = C(x*)

Since x* minimizes 𝓛 for the current λ, the gradient of g with respect to λ is simply the constraint value evaluated at x*.

In step 1 below, we find the x that minimizes 𝓛 for the current λ value, and then we take a gradient ascent step on g (steps 2 and 3).

1. x* ← argmin_x 𝓛(x, λ)
2. compute dg/dλ = d𝓛(x*, λ)/dλ = C(x*)
3. λ ← λ + α dg/dλ

where α is a step size.

We alternate between minimizing the Lagrangian 𝓛 with respect to the primal variable x and incrementing the Lagrange multiplier λ along the gradient of g. By repeating these iterations, the solution converges.
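To make the loop concrete, here is a minimal sketch in Python for a tiny made-up problem: minimize f(x) = x² subject to C(x) = x − 1 = 0 (the problem, step size, and iteration count are illustrative choices):

```python
# Toy problem (illustrative): minimize f(x) = x^2 subject to C(x) = x - 1 = 0.
# The constrained optimum is x = 1, f(x) = 1, with multiplier lambda = -2.

def C(x):
    return x - 1.0

def argmin_lagrangian(lam):
    # Step 1: minimize L(x, lambda) = x^2 + lambda * (x - 1) over x.
    # Here dL/dx = 2x + lambda = 0 gives a closed form; in general any
    # optimizer (e.g. iLQR in trajectory optimization) can be used.
    return -lam / 2.0

lam = 0.0      # initial guess for the Lagrange multiplier
alpha = 0.5    # step size for the dual (gradient ascent) update

for _ in range(50):
    x_star = argmin_lagrangian(lam)   # step 1: x* = argmin_x L(x, lambda)
    dg_dlam = C(x_star)               # step 2: dg/dlambda = C(x*)
    lam += alpha * dg_dlam            # step 3: gradient ascent on g

print(x_star, lam)   # converges toward x = 1 and lambda = -2
```

The inner minimization has a closed form here only because the toy problem is quadratic; in practice it is whatever optimizer suits the problem.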

Visualization

Let’s visualize how this algorithm works.

[Figure: the region G of (C(x), f(x)) values, with the constrained optimum and the supporting line. Modified from source.]

Let y = C(x) and z = f(x). As x varies, the pairs (y, z) trace out the region G shown in the figure, where we plot z against y. Our solution is the orange dot: the minimum of f within G that also satisfies the constraint C(x) = 0, i.e. lies on the z-axis. The orange line is defined by the Lagrangian: it is the set of points with z + λy = g(λ). Its slope is −λ, it touches the boundary of G from below, and it crosses y = 0 at height g(λ).

[Figure: adjusting the slope λ of the supporting line. Modified from source.]

Then we use gradient ascent to adjust λ (the slope of the line) so that the line keeps touching G while its intercept at C(x) = 0, which is exactly g(λ), rises toward its maximum. At that maximum, the intercept meets the optimal value of f at the orange dot.

[Figure: the dual gradient descent procedure. Modified from source.]

This is how Dual Gradient Descent works. (Credit)

Example

Let’s go through an example, this time solving it analytically.

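For instance (an illustrative choice of functions): minimize f(x, y) = x² + y² subject to C(x, y) = x + y − 1 = 0. The Lagrangian is

𝓛(x, y, λ) = x² + y² + λ(x + y − 1)

Minimizing over x and y for a fixed λ, ∂𝓛/∂x = 2x + λ = 0 and ∂𝓛/∂y = 2y + λ = 0 give x = y = −λ/2, so the dual function is

g(λ) = λ²/2 + λ(−λ − 1) = −λ²/2 − λ

Maximizing g, dg/dλ = −λ − 1 = 0 gives λ = −1, hence x = y = 1/2 and f = 1/2, which is indeed the minimum of x² + y² on the line x + y = 1.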

Lagrange multiplier

So what is this Lagrange multiplier? We can visualize f with a contour plot showing its level curves f(x, y) = d for different values of d, together with the constraint curve g(x, y) = c (here g denotes the constraint function, following the figure's labels).

[Figure: contour lines f(x, y) = d and the constraint curve g(x, y) = c. Source: Wikipedia.]

Geometrically, the optimal point lies where the gradient of f(x, y) (the blue arrow) is aligned with the gradient of g(x, y) (the red arrow):

∇f(x, y) = λ ∇g(x, y)

where λ is the Lagrange multiplier.
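As a small concrete check of this condition, the following sketch solves ∇f = λ∇g together with the constraint using sympy, for an illustrative pair of functions (f = x + y on the unit circle, not necessarily the functions behind the figure):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

# Illustrative choice: optimize f(x, y) = x + y
# subject to g(x, y) = x**2 + y**2 - 1 = 0 (the unit circle).
f = x + y
g = x**2 + y**2 - 1

# Stationarity: grad f = lam * grad g, plus the constraint g = 0.
equations = [
    sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
    sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
    sp.Eq(g, 0),
]

print(sp.solve(equations, [x, y, lam]))
# Two critical points: (sqrt(2)/2, sqrt(2)/2) and (-sqrt(2)/2, -sqrt(2)/2),
# the maximum and minimum of f on the unit circle.
```

The two solutions are exactly the points where a level curve of f is tangent to the constraint circle, i.e. where the two gradients are parallel.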


Thoughts

Dual Gradient Descent uses any optimization method to minimize the Lagrangian with respect to x for the current value of λ. In trajectory optimization, we use iLQR for this inner optimization. Then we apply gradient ascent to adjust λ. The optimal solution is found by repeating these iterations.
