# RL — Dual Gradient Descent

Dual Gradient Descent is a popular method for optimizing an objective under a constraint. In reinforcement learning, it lets us maximize rewards while enforcing constraints, for example the system dynamics in trajectory optimization.

The key idea is to transform the constrained objective into a Lagrange dual function which can be optimized iteratively. For the problem

$$\min_x f(x) \quad \text{subject to} \quad C(x) = 0,$$

the Lagrangian 𝓛 and the Lagrange dual function g are defined as:

$$\mathcal{L}(x, \lambda) = f(x) + \lambda\, C(x)$$

$$g(\lambda) = \min_x \mathcal{L}(x, \lambda) = \mathcal{L}\big(x^*(\lambda), \lambda\big)$$

where λ is a scalar called the Lagrange multiplier.

The dual function g is a lower bound for the optimal value of the original optimization problem. Indeed, if f is a convex function, strong duality will often hold, which says that the maximum value of g equals the minimum value of the constrained problem. Hence, if we find the λ that maximizes g, we solve the optimization problem.
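To see why g is a lower bound, note that for any λ and any feasible x̃ (i.e., C(x̃) = 0):

$$g(\lambda) = \min_x \mathcal{L}(x, \lambda) \le \mathcal{L}(\tilde{x}, \lambda) = f(\tilde{x}) + \lambda \cdot 0 = f(\tilde{x})$$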

So we start with a random guess of λ and use any optimization method to minimize the now-unconstrained Lagrangian with respect to x.

Next, we apply gradient ascent to update λ in order to maximize g. The gradient of g is:

$$\frac{dg}{d\lambda} = \frac{\partial \mathcal{L}}{\partial \lambda}\bigg|_{x = x^*(\lambda)}$$

i.e.

$$\frac{dg}{d\lambda} = C(x^*)$$

(Because x* minimizes 𝓛, the indirect terms through ∂x*/∂λ vanish, and only the direct dependence of 𝓛 on λ remains.)
In step 1 below, we find the minimizing x based on the current λ value, and then we take a gradient ascent step for g (steps 2 and 3):

1. x* ← arg minₓ 𝓛(x, λ)
2. dg/dλ ← C(x*)
3. λ ← λ + α · dg/dλ

We alternate between minimizing the Lagrangian 𝓛 with respect to the primal variable x and taking a gradient ascent step on the Lagrange multiplier λ. By repeating this iteration, the solution converges.
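To make the loop concrete, here is a minimal sketch in Python on a toy problem (the quadratic objective, linear constraint, step size, and iteration count are all illustrative choices, not from the original post):

```python
import numpy as np
from scipy.optimize import minimize

def f(x):                      # objective to minimize
    return x[0]**2 + x[1]**2

def C(x):                      # equality constraint, we want C(x) = 0
    return x[0] + x[1] - 1.0

def lagrangian(x, lam):        # L(x, lam) = f(x) + lam * C(x)
    return f(x) + lam * C(x)

lam, alpha = 0.0, 0.5          # initial multiplier guess and step size
x = np.zeros(2)
for _ in range(50):
    # Step 1: minimize the Lagrangian over x for the current lambda.
    x = minimize(lagrangian, x, args=(lam,)).x
    # Step 2: the gradient of the dual is the constraint value at x*.
    dg = C(x)
    # Step 3: gradient ascent on the dual.
    lam += alpha * dg

print(x, lam)  # approaches x = (0.5, 0.5), lambda = -1
```

Each outer iteration moves λ in proportion to the remaining constraint violation C(x*), so the multiplier keeps adjusting until the constraint is satisfied.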

# Visualization

Let’s visualize how this algorithm works.

Let y = C(x) and z = f(x). As x varies, the points (y, z) sweep out a region G, and we plot z against y. Our solution is the lowest point of G on the vertical axis: the minimum of f among points satisfying C(x) = 0. The Lagrangian corresponds to a line of slope −λ that supports G from below; its intercept with the vertical axis is the dual value g(λ).

Then we use gradient ascent to adjust λ (the slope of the supporting line) until the intercept g(λ) is as large as possible; at that point the line touches the boundary of G where C(x) = 0, recovering the constrained minimum of f.
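In symbols, using the notation above, the dual value is the vertical-axis intercept of this supporting line:

$$g(\lambda) = \min_{(y,z) \in G} \,(z + \lambda y), \qquad \text{supporting line: } z = -\lambda\, y + g(\lambda)$$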

This is how Dual Gradient Descent works.

# Example

Let’s go through one example, but we will solve it analytically this time.
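As a concrete stand-in (the specific problem is an illustrative choice, the same toy problem used in the code sketch above), minimize f(x, y) = x² + y² subject to C(x, y) = x + y − 1 = 0. The Lagrangian is:

$$\mathcal{L}(x, y, \lambda) = x^2 + y^2 + \lambda\,(x + y - 1)$$

Setting the gradient with respect to (x, y) to zero gives 2x + λ = 0 and 2y + λ = 0, so x = y = −λ/2 and:

$$g(\lambda) = \frac{\lambda^2}{2} + \lambda\,(-\lambda - 1) = -\frac{\lambda^2}{2} - \lambda$$

Maximizing g: dg/dλ = −λ − 1 = 0 gives λ* = −1, hence x = y = 1/2 and the constrained minimum f = 1/2. These are exactly the values the iterative scheme above converges to.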

# Lagrange multiplier

So what is this Lagrange multiplier? We can visualize f with a contour plot, where each contour is the set of points with f(x, y) = d for some value d. C is the constraint function, and the feasible points lie on the curve C(x, y) = 0.

Geometrically, the optimal point lies where the gradient of f(x, y) is aligned (up to sign and scale) with the gradient of C(x, y), i.e., where a contour of f is tangent to the constraint curve:

$$\nabla f(x, y) + \lambda\, \nabla C(x, y) = 0$$

where λ is the Lagrange multiplier.
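As a quick sanity check of this condition at the solution of the example above (the finite-difference helper is purely illustrative):

```python
import numpy as np

def grad(fn, p, eps=1e-6):
    # central finite differences, for illustration only
    return np.array([
        (fn(p + eps * e) - fn(p - eps * e)) / (2 * eps)
        for e in np.eye(len(p))
    ])

f = lambda p: p[0]**2 + p[1]**2
C = lambda p: p[0] + p[1] - 1.0

x_star, lam_star = np.array([0.5, 0.5]), -1.0
print(grad(f, x_star) + lam_star * grad(C, x_star))  # ≈ [0, 0]
```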

# Thoughts

Dual Gradient Descent uses any optimization method to minimize the Lagrangian with respect to x at the current λ value. In trajectory optimization, iLQR serves as this inner optimizer. Then we apply gradient ascent to adjust λ. By repeating these iterations, we find the optimal solution.
