Machine Learning & Linear Algebra — Special Matrices

[Cover photo by Mert Talay]

In the literature, we often run into statements like "since the matrix is symmetric, it has orthonormal eigenvectors", as if you should know these facts by heart. In linear algebra, there are special matrices with properties that are easy to analyze and manipulate. They may have specific eigenvalues or special relationships among their eigenvectors. They obey rules that make operations easy, and there are methods to factorize a general matrix into these "easier" matrices.

The reduction in operation complexity improves scalability. And even though these matrices are special, they are by no means uncommon: in machine learning and many other applications, we deal with them all the time. In this article, we introduce them so that you feel less intimidated the next time you hear "a positive definite matrix has all positive eigenvalues".

Diagonal matrix

A diagonal matrix S has all non-diagonal elements equal to zero.

S = ⎡s₁ 0  0 ⎤
    ⎢0  s₂ 0 ⎥
    ⎣0  0  s₃⎦

Many factorization methods produce a diagonal matrix as one of their factors. Since the matrix has non-zero entries on the diagonal only, we sometimes write it in terms of a vector.

S = diag(s), where s = (s₁, s₂, …, sₙ)

The inverse of a generic matrix is not easy to calculate, but finding the inverse of a diagonal matrix is: we simply replace each diagonal element sᵢ with 1/sᵢ.

S⁻¹ = diag(1/s₁, 1/s₂, …, 1/sₙ)

Also, matrix multiplication is much simpler if one of the matrices is diagonal. However, when any diagonal element equals zero, or when the diagonal matrix is not square, the inverse does not exist. Even then, the pseudoinverse (which keeps the inverse of 0 as 0) can be used as a substitute in some methods.
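Here is a minimal NumPy sketch of these ideas (the matrix values are made up for illustration):

```python
import numpy as np

# Store the diagonal matrix compactly as the vector of its diagonal entries.
s = np.array([2.0, 5.0, 0.5])
S = np.diag(s)

# Inverting a diagonal matrix: take the reciprocal of each diagonal entry.
S_inv = np.diag(1.0 / s)
print(np.allclose(S @ S_inv, np.eye(3)))   # True

# With a zero on the diagonal the inverse does not exist, but the
# pseudoinverse keeps "the inverse of 0 as 0".
t = np.array([2.0, 0.0, 0.5])
T_pinv = np.linalg.pinv(np.diag(t))
print(T_pinv.diagonal())                   # [0.5 0.  2. ]
```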

Orthogonal matrix

An orthogonal matrix Q is a square matrix that fulfills the following requirement.

QᵀQ = QQᵀ = I

All the columns (v₁, …, vᵢ, …) of Q are orthonormal, i.e. vᵢᵀvⱼ = 0 for i ≠ j and each vᵢ is a unit vector.

This sounds like a tough requirement, but for some matrices, such as symmetric matrices, we can choose the eigenvectors to be orthonormal during factorization.

For example, the 2-D rotation matrix below is orthogonal: each column is a unit vector (cos²θ + sin²θ = 1), and the two columns are perpendicular to each other.

Q = ⎡cos θ  −sin θ⎤
    ⎣sin θ   cos θ⎦

And

QᵀQ = QQᵀ = I

Like a diagonal matrix, an orthogonal matrix has an inverse that is very easy to compute — the inverse of an orthogonal matrix is simply its transpose. This is one key reason why orthogonal matrices are so handy.

Q⁻¹ = Qᵀ

Proof:

Qᵀ = Qᵀ(QQ⁻¹) = (QᵀQ)Q⁻¹ = Q⁻¹

If we multiply x by an orthogonal matrix, any error present in x will not be magnified, because the transformation preserves lengths. This behavior is very desirable for maintaining numerical stability.

‖Qx‖² = (Qx)ᵀ(Qx) = xᵀQᵀQx = xᵀx = ‖x‖²

The relation QᵀQ = I also simplifies many other computations. For example, when projecting a vector b onto the column space of Q, the general projection matrix formula collapses:

P = Q(QᵀQ)⁻¹Qᵀ = QQᵀ

And the projection vector can be simplified as

p = QQᵀb = (q₁ᵀb)q₁ + (q₂ᵀb)q₂ + … + (qₙᵀb)qₙ

where qᵢ is column i of Q.
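These identities are easy to check numerically. In the sketch below (a minimal example, with Q taken from the QR factorization of an arbitrary random matrix), we verify QᵀQ = I, the length-preserving property, and the inverse-free projection:

```python
import numpy as np

rng = np.random.default_rng(0)
# The QR factorization of a random square matrix gives an orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# The inverse of an orthogonal matrix is its transpose: QᵀQ = I.
print(np.allclose(Q.T @ Q, np.eye(4)))                       # True

# Multiplying by Q preserves length: ‖Qx‖ = ‖x‖.
x = rng.standard_normal(4)
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True

# Projecting b onto the span of the first two columns needs no inverse:
# p = Q₂Q₂ᵀb, and the residual b − p is orthogonal to that span.
Q2 = Q[:, :2]
b = rng.standard_normal(4)
p = Q2 @ (Q2.T @ b)
print(np.allclose(Q2.T @ (b - p), 0))                        # True
```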

Symmetric matrix

A matrix is symmetric if its transpose equals itself.

Sᵀ = S

For example,

S = ⎡2 1⎤
    ⎣1 5⎦

Symmetric matrices are among the most important matrices in linear algebra and machine learning. In machine learning (ML), we often use a matrix to hold values f(vᵢ, vⱼ). Such functions are often symmetric, f(x, y) = f(y, x), and the corresponding matrix is therefore symmetric. For instance, f can measure the feature distance between two data points or the covariance of two features.

Symmetric Matrix Properties

A symmetric matrix S is an n × n square matrix with the following properties:

  • Its inverse (when it exists) is also symmetric.
  • All eigenvalues of S are real (not complex).
  • We can choose n eigenvectors of S to be orthonormal even with repeated eigenvalues.
  • A symmetric matrix can be formed by multiplying a matrix A with its transpose — AᵀA or AAᵀ (in general, AᵀA ≠ AAᵀ). In machine learning, the covariance matrix of zero-centered data has this form.
    (AᵀA)ᵀ = Aᵀ(Aᵀ)ᵀ = AᵀA
  • AᵀA is invertible if the columns of A are linearly independent.
  • Every symmetric matrix S can be diagonalized (factorized) with Q formed by the orthonormal eigenvectors vᵢ of S and Λ a diagonal matrix holding all the eigenvalues:
    S = QΛQᵀ, where Q = [v₁ v₂ … vₙ] and Λ = diag(λ₁, λ₂, …, λₙ)

The equation above can be rewritten as

S = λ₁v₁v₁ᵀ + λ₂v₂v₂ᵀ + … + λₙvₙvₙᵀ

where the vᵢ are unit vectors. The eigenvalue λᵢ therefore determines how important each term is. In fact, if λᵢ is too small, we can drop the corresponding term λᵢvᵢvᵢᵀ completely.

This factorization property and "S has n orthonormal eigenvectors" are the two most important properties of a symmetric matrix.
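Both properties are easy to see numerically. The sketch below builds an arbitrary symmetric matrix and uses np.linalg.eigh, NumPy's eigen-solver for symmetric matrices, to verify the factorization, the rank-one expansion, and the effect of dropping a small λᵢ:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
S = A.T @ A  # symmetric by construction

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors.
lam, Q = np.linalg.eigh(S)
print(np.allclose(Q.T @ Q, np.eye(4)))         # orthonormal eigenvectors
print(np.allclose(Q @ np.diag(lam) @ Q.T, S))  # S = QΛQᵀ

# The same factorization as a sum of rank-one terms λᵢvᵢvᵢᵀ.
S_sum = sum(l * np.outer(v, v) for l, v in zip(lam, Q.T))
print(np.allclose(S_sum, S))                   # True

# Dropping the smallest eigenvalue leaves a close rank-3 approximation.
S_approx = S_sum - lam[0] * np.outer(Q[:, 0], Q[:, 0])
print(np.linalg.norm(S - S_approx))            # equals lam[0], the dropped eigenvalue
```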

Orthonormal eigenvectors

Eigenvectors are not unique, but we can often "choose" a set of eigenvectors that meets some specific condition. As mentioned before, the eigenvectors of a symmetric matrix can be chosen to be orthonormal. If S is a symmetric matrix, any two of its eigenvalues λ and μ, with corresponding eigenvectors x and y, fulfill the following condition.

λxᵀy = μxᵀy

Proof

λxᵀy = (λx)ᵀy = (Sx)ᵀy = xᵀSᵀy = xᵀSy = xᵀ(μy) = μxᵀy

From this condition, if λ and μ have different values, the equality forces the inner product xᵀy to be zero. Therefore, x and y are orthogonal, and it is easy to normalize them to unit length — orthonormal. This proves that we can choose the eigenvectors of S to be orthogonal whenever their corresponding eigenvalues are different. Even with repeated eigenvalues, this remains true for a symmetric matrix.

Proof — part 2 (optional)

For an n × n symmetric matrix, we can always find n independent orthonormal eigenvectors. The largest eigenvalue is the maximum of the ratio r(x):

λmax = max_{x ≠ 0} r(x), where r(x) = xᵀSx / xᵀx

To find the maximum, we set the derivative of r(x) to 0. After some manipulation, it can be shown that

Sx = r(x)x

i.e. r(x) achieves the highest ratio when x is an eigenvector with the largest eigenvalue. By induction, we can find the next-highest eigenvalue with an eigenvector orthogonal to the previous one, and so on. This is only a high-level sketch of the proof; for details, please refer to section 7.2 of Gilbert Strang's book (see the reference at the end).
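A quick numeric check of this claim (with an arbitrary symmetric S, and r(x) as defined above):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
S = (A + A.T) / 2  # symmetrize an arbitrary matrix

def r(x):
    # Rayleigh quotient r(x) = xᵀSx / xᵀx.
    return (x @ S @ x) / (x @ x)

lam, Q = np.linalg.eigh(S)  # eigenvalues in ascending order
v_top = Q[:, -1]            # eigenvector of the largest eigenvalue

print(np.isclose(r(v_top), lam[-1]))     # r attains λmax at v_top: True

# Random vectors never exceed the largest eigenvalue.
xs = rng.standard_normal((1000, 5))
print(max(r(x) for x in xs) <= lam[-1])  # True
```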

Spectral theorem

Let’s summarize. Every n × n symmetric matrix S has n real eigenvalues λᵢ with n chosen orthonormal eigenvectors vᵢ.

Svᵢ = λᵢvᵢ for i = 1, …, n

These eigenvalues form the diagonal matrix Λ = diag(λ₁, …, λₙ). We can also concatenate the eigenvectors vᵢ to form V, i.e.,

V = [v₁ v₂ … vₙ]

We rename V as Q. Because Q is orthogonal, it is invertible and Qᵀ = Q⁻¹. Therefore, the symmetric matrix S can be factorized into

S = QΛQ⁻¹ = QΛQᵀ

This is the spectral theorem. Because the transpose is much easier to find than the inverse, symmetric matrices are very desirable in linear algebra.

Positive Definite Matrix

A positive definite matrix has all positive eigenvalues. It is also symmetric, so it inherits all the nice properties above. This may sound unusual, but many matrices in real-life problems are positive definite. For example, the term below computes the energy of a system with state x. If S is positive definite, it guarantees that the energy stays positive unless x is zero.

E = xᵀSx

So if the energy is positive for every non-zero state, the corresponding S is positive definite.

There are many equivalent conditions to test positive definiteness. A symmetric matrix S is positive definite if any of the tests below is true:

1. All eigenvalues > 0:

λᵢ > 0 for all i

2. All upper-left determinants > 0:

det(Sₖ) > 0 for k = 1, …, n, where Sₖ is the upper-left k × k submatrix of S

3. All pivots > 0 (in Gaussian elimination):

d₁ > 0, d₂ > 0, …, dₙ > 0

4. Energy > 0 except at x = 0:

xᵀSx > 0 for all x ≠ 0

5. S can be constructed from a matrix A with linearly independent columns:

S = AᵀA

Verifying that all eigenvalues are positive takes a lot of work, so conditions 2 and 3 are more common tests. For example, positive pivots imply positive eigenvalues (and vice versa). On the other hand, once we prove a matrix is positive definite with one of the tests above, we are guaranteed that it has all the properties above.

Besides positive definite, there are also positive semidefinite, negative definite and negative semidefinite matrices. Positive semidefinite replaces all the ">" conditions above with "≥"; for example, its eigenvalues are greater than or equal to 0. Negative definite and negative semidefinite are the opposites of positive definite and positive semidefinite, respectively.
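Numerically, tests 1 and 3 are the easiest to run. The sketch below (with two small example matrices) checks the eigenvalues directly and also uses the Cholesky factorization, which succeeds exactly when the matrix is positive definite (in effect, the positive-pivots test):

```python
import numpy as np

def is_pd_eigen(S):
    # Test 1: all eigenvalues strictly positive.
    return bool(np.all(np.linalg.eigvalsh(S) > 0))

def is_pd_cholesky(S):
    # Cholesky succeeds iff S is positive definite (positive pivots).
    try:
        np.linalg.cholesky(S)
        return True
    except np.linalg.LinAlgError:
        return False

S_pd  = np.array([[2.0, -1.0], [-1.0, 2.0]])  # eigenvalues 1 and 3
S_psd = np.array([[1.0,  1.0], [ 1.0, 1.0]])  # eigenvalues 0 and 2: only semidefinite

print(is_pd_eigen(S_pd),  is_pd_cholesky(S_pd))   # True True
print(is_pd_eigen(S_psd), is_pd_cholesky(S_psd))  # False False
```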

Proof

In this section, we prove some of the properties above. If S is positive definite, all its eigenvalues λᵢ are positive. Therefore, the energy of any non-zero state x is positive.

xᵀSx = xᵀQΛQᵀx = (Qᵀx)ᵀΛ(Qᵀx) = λ₁(q₁ᵀx)² + … + λₙ(qₙᵀx)² > 0 for x ≠ 0

If S is of the form AᵀA, then S is positive semidefinite under the energy test:

xᵀSx = xᵀAᵀAx = (Ax)ᵀ(Ax) = ‖Ax‖² ≥ 0

Hence, AAᵀ and AᵀA are positive semidefinite.
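A quick check with an arbitrary rectangular A:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
S = A.T @ A

# Energy test: xᵀSx = ‖Ax‖² can never be negative.
x = rng.standard_normal(3)
print(np.isclose(x @ S @ x, np.linalg.norm(A @ x) ** 2))  # True

# All eigenvalues of AᵀA are ≥ 0 (up to floating-point error).
print(np.all(np.linalg.eigvalsh(S) >= -1e-12))            # True
```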

Minimum

In calculus, we set the first-order derivative of f to zero to find its critical points. However, such a point can be a maximum, a minimum or a saddle point. Many machine learning models express their cost functions in the quadratic form xᵀAx, so knowing whether this function is convex is important: if it is convex, any local minimum is also the global minimum. If A is positive definite, the quadratic function is convex.

More generally, for any function f we can compute the Hessian matrix below. If the Hessian is positive definite at a critical point, that point is a local minimum.

H = ⎡∂²f/∂x²   ∂²f/∂x∂y⎤
    ⎣∂²f/∂y∂x  ∂²f/∂y² ⎦
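For the quadratic form f(x) = xᵀAx the Hessian is constant, H = A + Aᵀ (which is 2A when A is symmetric), so the convexity check reduces to a positive-definiteness test. A short sketch with an arbitrarily chosen A:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])  # an arbitrary symmetric example

# For f(x) = xᵀAx, the Hessian is H = A + Aᵀ (= 2A here).
H = A + A.T

# f is convex, and x = 0 is the global minimum, iff H is positive definite.
eigvals = np.linalg.eigvalsh(H)
print(eigvals)                    # both eigenvalues are positive
print(bool(np.all(eigvals > 0)))  # True
```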

Covariance matrix

In machine learning, we are very interested in finding the correlations between features. For example, weight and height have a positive relationship: a scatter plot of weight against height trends upward.

We model such relationships with a covariance matrix Σ.

Σᵢⱼ = E[(xᵢ − μᵢ)(xⱼ − μⱼ)], i.e. Σ = E[(x − μ)(x − μ)ᵀ]

A covariance matrix is always positive semidefinite: it has the same AᵀA-style structure as test 5 above.
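We can confirm both claims with np.cov on made-up data (two correlated features; the values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
# Fake dataset: 2 features (say weight and height) × 100 samples.
# np.cov expects one row per feature by default.
data = rng.standard_normal((2, 100))
data[1] += 0.8 * data[0]  # make the two features positively correlated

cov = np.cov(data)
print(cov[0, 1] > 0)                              # positive covariance: True
print(np.all(np.linalg.eigvalsh(cov) >= -1e-12))  # positive semidefinite: True
```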

Reference

Introduction to Linear Algebra by Gilbert Strang

The Matrix Cookbook by Kaare Petersen & Michael Pedersen
