Prerequisites

We’ll assume that you have read this introductory post on singular value decomposition, a.k.a. SVD.

The theorem

In this post, we’ll get our hands dirty and examine how and why SVD works. But let’s first recall what it is.

Let $A$ be an $n \times m$ real matrix. Then $A = U \Sigma V^T$ where

  • $U$ is an $n \times n$ orthogonal¹ matrix
  • $V$ is an $m \times m$ orthogonal matrix
  • $\Sigma$ is an $n \times m$ matrix in which every entry is zero except the first $r$ diagonal entries, where $r$ is the rank of $A$.
  • Further, the non-zero diagonal entries of $\Sigma$ are precisely the so-called singular values² of $A$, arranged in descending order.

We saw in the introductory post that the columns of $V$ (let’s call them $v_1, v_2, \ldots, v_m$) are mutually perpendicular unit vectors which form a basis of $\mathbb{R}^m$ (a.k.a. an orthonormal basis of $\mathbb{R}^m$).

Further, the vectors $\{A(v_i)\}$ still remain perpendicular to each other in $\mathbb{R}^n$. The lengths of the vectors $\{A(v_i)\}$ are encoded by the diagonal entries of $\Sigma$. The unit vectors along $\{A(v_i)\}$ can be completed to an orthonormal basis of $\mathbb{R}^n$, which is precisely given by the columns of $U$.
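If you’d like to see these claims concretely before we prove anything, here is a quick numerical check using NumPy’s built-in SVD routine (the matrix below is an arbitrary choice of ours, not one from the introductory post):

```python
import numpy as np

# An arbitrary 2x3 real matrix, so n = 2 and m = 3.
A = np.array([[3.0, 1.0, 2.0],
              [0.0, 4.0, 1.0]])

U, s, Vt = np.linalg.svd(A)  # NumPy returns V already transposed

print(np.allclose(U.T @ U, np.eye(2)))    # True: U is an n x n orthogonal matrix
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True: V is an m x m orthogonal matrix
print(s)                                  # singular values, in descending order

# Rebuild the n x m matrix Sigma and confirm A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))     # True
```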

A blackbox

Before jumping to the proof, let’s reveal our secret weapon: the spectral theorem. This theorem, applied to real matrices, states the following:

Let $B$ be an $m \times m$ real and symmetric matrix. Then there exists an orthonormal eigenbasis for $B$.

Wait, what? Let’s unravel that sentence and take the result in, bit by bit.

You give me any real matrix $B$ that is symmetric, i.e. $B_{i,j} = B_{j,i}$. We know that $B$ really represents a linear map $Q: \mathbb{R}^m \to \mathbb{R}^m$ with respect to the standard basis. But the images of the standard basis vectors under $Q$ might of course get distorted: they might no longer be perpendicular to each other, much less continue being the standard basis.

The spectral theorem assures us, however, that there is a much nicer basis available, an orthonormal basis of $\mathbb{R}^m$ in fact, which $Q$ simply scales to different lengths!

More precisely, we can find mutually perpendicular unit vectors $w_1, w_2, \ldots, w_m$ so that $Q(w_i) = \lambda_i w_i$ for some scalars $\lambda_i$. Of course, these scalars are by definition the eigenvalues of $Q$, or equivalently of $B$.
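In code, the spectral theorem wand is NumPy’s eigh routine for symmetric matrices. A small sketch, with an arbitrary symmetric matrix of our choosing:

```python
import numpy as np

# An arbitrary real symmetric matrix B, i.e. B[i, j] == B[j, i].
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lambdas, W = np.linalg.eigh(B)  # columns of W are the eigenvectors w_i

print(np.allclose(W.T @ W, np.eye(3)))  # True: the w_i form an orthonormal basis
for i in range(3):
    # B merely scales each basis vector: B w_i = lambda_i w_i
    print(np.allclose(B @ W[:, i], lambdas[i] * W[:, i]))  # True
```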

A running example and a proof

“If only I had the theorems! Then I should find the proofs easily enough.” - Riemann

Remember $A$, our very down-to-earth $2 \times 2$ matrix from last time:

$$A = \begin{pmatrix} 6 & -2 \\ 7 & 6 \end{pmatrix}$$

Now unfortunately, we cannot use the spectral theorem on $A$ because it is not symmetric. So instead, we first manufacture a symmetric matrix $B$ from $A$ by letting $B = A^T A$. In our running example,

$$B = \begin{pmatrix} 6 & 7 \\ -2 & 6 \end{pmatrix} \begin{pmatrix} 6 & -2 \\ 7 & 6 \end{pmatrix} = \begin{pmatrix} 85 & 30 \\ 30 & 40 \end{pmatrix}$$

The spectral theorem applied to $B$ guarantees that there are two perpendicular unit vectors which are simply scaled by the action of $B$. We’ll not get into how you can find these vectors, but instead wave the spectral theorem wand and hand them to you.

$$v_1 = \begin{pmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{pmatrix} \text{ and } Bv_1 = 100\,v_1, \qquad v_2 = \begin{pmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{pmatrix} \text{ and } Bv_2 = 25\,v_2$$
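If you’d rather not take the wand on faith, a few lines of NumPy reproduce these vectors. One caveat worth flagging: eigh lists eigenvalues in ascending order, and each eigenvector is only determined up to sign, so the columns may come out as $\pm v_2$ followed by $\pm v_1$:

```python
import numpy as np

A = np.array([[6.0, -2.0],
              [7.0,  6.0]])
B = A.T @ A
print(B)  # [[85. 30.] [30. 40.]]

lambdas, V = np.linalg.eigh(B)
print(lambdas)  # [ 25. 100.]
print(V)        # columns: +/- (1, -2)/sqrt(5) and +/- (2, 1)/sqrt(5)
```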

Here’s a visualization of the action of $B$ on the standard basis and on $V = (v_1, v_2)$. Both bases are scaled up a bit for clarity in the picture.

[Figure: the action of $B$ on both bases]

So we have found an orthonormal basis $v_1, v_2$ of $\mathbb{R}^2$, which is very nice for $B$. How does $A$ act on it?

$$Av_1 = \begin{pmatrix} 10/\sqrt{5} \\ 20/\sqrt{5} \end{pmatrix} \text{ and } \|Av_1\| = 10, \qquad Av_2 = \begin{pmatrix} 10/\sqrt{5} \\ -5/\sqrt{5} \end{pmatrix} \text{ and } \|Av_2\| = 5$$

We see that $Av_1$ is perpendicular to $Av_2$! Further, the lengths of $Av_1, Av_2$ are $\sqrt{100} = 10$ and $\sqrt{25} = 5$, which are precisely the singular values of $A$ by definition.

This is not a coincidence. For starters, we know $v_1$ is perpendicular to $v_2$, i.e. $v_1^T v_2 = 0$. Let’s see why $Av_1$ is perpendicular to $Av_2$:

$$Av_1 \cdot Av_2 = (Av_1)^T Av_2 = v_1^T A^T A v_2 = v_1^T B v_2 = v_1^T (25\,v_2) = 25\, v_1^T v_2 = 0$$

Thus $Av_1, Av_2$ are perpendicular to each other. As an exercise, using the fact that the length of a vector $x$ is $\|x\| = \sqrt{x^T x}$, try computing the lengths of $Av_i$.
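And here is a quick numerical check of both the perpendicularity and the exercise, if you’d like to confirm your pencil-and-paper answer:

```python
import numpy as np

A = np.array([[6.0, -2.0],
              [7.0,  6.0]])
v1 = np.array([2.0, 1.0]) / np.sqrt(5)
v2 = np.array([1.0, -2.0]) / np.sqrt(5)

Av1, Av2 = A @ v1, A @ v2
print(np.dot(Av1, Av2))     # ~0.0 (up to floating-point rounding): perpendicular
print(np.linalg.norm(Av1))  # 10.0 = sqrt(100)
print(np.linalg.norm(Av2))  # 5.0  = sqrt(25)
```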

More generally, the same reasoning tells us the following:

If $v_1, v_2$ are perpendicular eigenvectors of $B = A^T A$ for eigenvalues $\lambda_1, \lambda_2$, then $Av_1, Av_2$ are still perpendicular and have lengths $\sqrt{\lambda_1}, \sqrt{\lambda_2}$.

Thus, the orthonormal eigenbasis of $B = A^T A$ given by the spectral theorem is exactly the $V$ in the SVD $A = U \Sigma V^T$!

Summary

Given an $n \times m$ matrix $A$, here are the steps to find the SVD of $A$.

  • Form the $m \times m$ symmetric matrix $B = A^T A$.
  • The spectral theorem gives us an orthonormal eigenbasis $v_1, v_2, \ldots, v_m$ of $\mathbb{R}^m$ for $B$, with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$.
  • $V$ is just the $m \times m$ matrix $(v_1\ v_2\ \cdots\ v_m)$.
  • $\Sigma$ is the $n \times m$ matrix where all non-diagonal entries are $0$ and $\Sigma_{i,i} = \sqrt{\lambda_i}$.
  • $AV$ is an $n \times m$ matrix made of mutually perpendicular columns.
  • Scaling the non-zero columns of $AV$ to unit vectors, complete them to an orthonormal basis of $\mathbb{R}^n$. This orthonormal basis gives you the $n \times n$ matrix $U$.
  • Finally,
$$A = U \Sigma V^T$$

For our running example, we see that this corresponds to

$$\begin{pmatrix} 6 & -2 \\ 7 & 6 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & -1/\sqrt{5} \end{pmatrix} \begin{pmatrix} 10 & 0 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 2/\sqrt{5} & 1/\sqrt{5} \\ 1/\sqrt{5} & -2/\sqrt{5} \end{pmatrix}$$
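The steps above translate almost line by line into code. Below is a minimal sketch (svd_via_spectral is our own illustrative helper, not a library routine; for simplicity it skips the basis-completion step that would be needed when $A$ does not have full row rank):

```python
import numpy as np

def svd_via_spectral(A):
    n, m = A.shape
    B = A.T @ A                               # m x m symmetric matrix
    lambdas, V = np.linalg.eigh(B)            # eigh sorts ascending...
    order = np.argsort(lambdas)[::-1]         # ...so re-sort descending
    lambdas, V = lambdas[order], V[:, order]
    s = np.sqrt(np.clip(lambdas, 0.0, None))  # singular values

    Sigma = np.zeros((n, m))
    r = int(np.sum(s > 1e-12))                # rank = number of non-zero singular values
    Sigma[:r, :r] = np.diag(s[:r])

    # The columns of AV are mutually perpendicular; scale the non-zero
    # ones to unit length to get the first r columns of U. (If r < n,
    # one would complete these to an orthonormal basis of R^n.)
    U = np.zeros((n, n))
    U[:, :r] = (A @ V[:, :r]) / s[:r]
    return U, Sigma, V

A = np.array([[6.0, -2.0],
              [7.0,  6.0]])
U, Sigma, V = svd_via_spectral(A)
print(np.allclose(A, U @ Sigma @ V.T))  # True
print(np.diag(Sigma))                   # [10.  5.]
```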

Footnotes

  1. A square matrix $X$ is orthogonal iff $X^T X = X X^T$ is the identity matrix. 

  2. The singular values of an $n \times m$ matrix $A$ are the square roots of the eigenvalues of $A^T A$, listed with their algebraic multiplicities.