RL — Conjugate Gradient

We use the Conjugate Gradient (CG) method to solve a system of linear equations or to optimize a quadratic function. It is more efficient at these problems than gradient descent.

$$Ax = b$$

$$f(x) = \frac{1}{2} x^T A x - b^T x + c$$

where A is symmetric and positive definite. Setting the gradient Ax − b to zero shows that minimizing the quadratic and solving the linear system are the same problem.
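As a concrete illustration (the matrix and vector values below are made up for this example, not taken from the article), here is a small NumPy sketch of the setup: a symmetric positive-definite A, the quadratic f, and a check that the gradient vanishes at the solution of Ax = b.

```python
import numpy as np

# A small symmetric positive-definite matrix and right-hand side
# (arbitrary illustrative values).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0])

def f(x):
    """Quadratic objective f(x) = 1/2 x^T A x - b^T x."""
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    """Gradient of f, which is A x - b."""
    return A @ x - b

x_star = np.linalg.solve(A, b)   # exact solution of Ax = b
print(grad_f(x_star))            # ~[0, 0]: the minimizer of f solves Ax = b
```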


In a line search, we determine the steepest direction of ascent and then select a step size. For example, in the gradient ascent method, we take a step equal to the gradient times the learning rate. For the figure below, the steepest direction according to the gradient contour (the dotted ellipse) is to move right. The next move may go up and slightly to the left according to the steepest gradient at that point. But doesn’t turning slightly left undo part of the progress of the first move?

(Figure: contour plot of the quadratic showing successive steepest-direction line-search moves.)
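To see this “undoing” behaviour numerically, here is a minimal sketch of steepest descent with an exact line search on the same illustrative quadratic (a variant of the fixed-learning-rate method described above): each step moves along the negative gradient, and successive directions come out orthogonal, producing the zigzag path the figure describes.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)                       # initial guess
for i in range(8):
    r = b - A @ x                     # steepest-descent direction (negative gradient)
    alpha = (r @ r) / (r @ A @ r)     # exact line search along r
    x = x + alpha * r
    print(i, x, np.linalg.norm(b - A @ x))   # many small zigzag steps toward the optimum
```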

The conjugate gradient method is a line search method, but each move does not undo any part of the previous moves. It optimizes a quadratic function in fewer steps than gradient ascent. If x is N-dimensional (N parameters), we can find the optimal point in at most N steps. For every move, we want a direction conjugate to all previous moves. This guarantees that we do not undo part of the progress already made. In short, if x is 4-dimensional, it should take at most 4 moves to reach the optimal point.

  1. We start the ascent in one particular direction.
  2. We settle at the optimal point along that direction.
  3. We find a new direction dj that is A-conjugate to all previous moving directions di.

Mathematically, it means any new direction dj must be A-conjugate to every previous direction di:

$$d_i^T A \, d_j = 0 \quad (i \neq j)$$

where A is the matrix in the quadratic equation. Here are some other examples of A-conjugate vectors in 2-D space.

(Figure: example pairs of A-conjugate vectors in 2-D.)

The A-conjugate vectors are linearly independent of each other. Hence, N A-conjugate vectors can span a space of N dimensions.
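As an aside (not from the article), one way to construct such a set is a Gram-Schmidt-style procedure that A-orthogonalizes an ordinary basis. The sketch below does this for the 2-D example and checks that the resulting directions satisfy the A-conjugacy condition.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# A-orthogonalize the standard basis (Gram-Schmidt with the inner
# product <u, v> = u^T A v) to obtain A-conjugate directions.
basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
directions = []
for u in basis:
    d = u.copy()
    for prev in directions:
        d -= (prev @ A @ u) / (prev @ A @ prev) * prev
    directions.append(d)

d0, d1 = directions
print(d0 @ A @ d1)   # ~0: the two directions are A-conjugate
```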

Hence, the displacement from any starting point x0 to the optimal point x* can be written as a combination of N such directions:

$$x^* - x_0 = \sum_{i=0}^{N-1} \alpha_i d_i$$

The key part of the CG method is to find α and d.

CG Algorithm

But let’s have a preview of the algorithm first. We start with a random (or educated) guess x0 for the solution and calculate the next guess x1 from α and d.

$$x_{i+1} = x_i + \alpha_i d_i$$

d is the direction (the conjugate vector) for the next move.
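Here is a minimal NumPy sketch of the full iteration under these update rules. The residual-based formulas for α and the next direction (including the β term, which the article does not derive) follow the standard textbook form of linear CG; for an N-dimensional symmetric positive-definite A it reaches the solution in at most N steps, up to floating-point error.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve Ax = b for symmetric positive-definite A with the CG method."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x             # residual: how far Ax is from b
    d = r.copy()              # first search direction = steepest direction
    for _ in range(len(b)):   # at most N steps for an N-dimensional x
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)     # optimal step size along d
        x = x + alpha * d              # move to the optimal point along d
        r_new = r - alpha * Ad         # updated residual
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d           # next direction, A-conjugate to the previous ones
        r = r_new
    return x

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))        # matches np.linalg.solve(A, b)
```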

Let’s see how it works in detail. First, we define two terms:

  • e is the error between the current guess and the optimal point.
$$e_i = x_i - x^*$$
  • r measures how far we are from the correct value b (Ax=b). We can view r as the error e transformed by A to the space of b (how far Ax is from b).
$$r_i = b - A x_i$$

From

$$A x^* = b, \qquad e_i = x_i - x^*,$$

we can derive:

$$r_i = b - A x_i = A x^* - A x_i = -A e_i$$
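A quick numerical check of this relation on the illustrative example (the guess x here is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0])

x_star = np.linalg.solve(A, b)   # the optimal point
x = np.array([0.5, -1.0])        # an arbitrary current guess

e = x - x_star                   # error
r = b - A @ x                    # residual
print(np.allclose(r, -A @ e))    # True: r = -A e
```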

Our next point is computed as (where α is a scalar step size and d is the direction):

$$x_{i+1} = x_i + \alpha_i d_i$$

In order for future directions not to undo progress made along previous directions, let’s first try making d and e orthogonal, i.e. the remaining error after taking this step should be orthogonal to the current direction. This makes sense because the remaining moves should not undo any part of the move we just made.

$$d_i^T e_{i+1} = 0 \quad \Rightarrow \quad \alpha_i = -\frac{d_i^T e_i}{d_i^T d_i}$$

This α depends on e, whose value we do not know. So instead of plain orthogonality, let’s try a different requirement: the remaining error should be A-orthogonal to the current search direction. The A-orthogonality condition is:

$$d_i^T A \, e_{i+1} = 0$$

To fulfill this condition, the next point x_{i+1} has to be the optimal point along the search direction d.

Taking the derivative of f(x_{i+1}) with respect to α and setting it to zero shows why:

$$\frac{d}{d\alpha} f(x_{i+1}) = f'(x_{i+1})^T d_i = -\,r_{i+1}^T d_i = d_i^T A \, e_{i+1} = 0$$

Using the A-orthogonal requirement, α is computed as:

$$\alpha_i = \frac{d_i^T r_i}{d_i^T A \, d_i}$$
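As a sanity check (again on the made-up 2-D example), the loop below uses exactly this α, records the search directions, and confirms that they come out A-conjugate and that the solution is reached in N = 2 steps:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)
r = b - A @ x
d = r.copy()
directions = []

for _ in range(len(b)):              # at most N = 2 steps
    Ad = A @ d
    alpha = (d @ r) / (d @ Ad)       # alpha_i = d_i^T r_i / (d_i^T A d_i)
    x = x + alpha * d
    r_new = r - alpha * Ad
    directions.append(d)
    beta = (r_new @ r_new) / (r @ r)
    d = r_new + beta * d
    r = r_new

d0, d1 = directions
print(d0 @ A @ d1)                            # ~0: directions are A-conjugate
print(np.allclose(x, np.linalg.solve(A, b)))  # True: optimum reached in 2 steps
```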

Proof

We will not go through the proof here.
