SVD Explained
In the previous post we talked about Principal Component Analysis, a popular statistical technique for dimensionality reduction and feature decorrelation. Another common technique is matrix decomposition, where a matrix is factorized into a product of matrices (Eigendecomposition).
A limitation of Eigendecomposition is that it only works on square matrices. Singular Value Decomposition (SVD) is a generalization of Eigendecomposition that works on any rectangular matrix. It has been widely used in Machine Learning applications such as Latent Semantic Analysis (LSA), where a document-word matrix is decomposed and re-represented while preserving the patterns in the original matrix, and image compression, where a matrix holding the pixel intensities is decomposed and re-represented as the product of three much smaller matrices, from which the original image can be reconstructed.
In this post, I’ll explain mathematically why a Singular Value Decomposition always exists for any rectangular matrix, and how to find it. It is assumed that you have a basic grasp of Linear Algebra (I recommend reading the post on PCA first).
The Math
Let $A$ be any real-valued $m \times n$ matrix, where $m \le n$. It can be shown (a quick check follows the list) that the matrix $A A^T$ is
- symmetric
- positive semidefinite
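For completeness, here is a one-line check of each property; it uses nothing beyond the definition of $A A^T$:

$$(A A^T)^T = (A^T)^T A^T = A A^T, \qquad x^T (A A^T) x = (A^T x)^T (A^T x) = \lVert A^T x \rVert^2 \ge 0 \quad \text{for any } x \in \mathbb{R}^m.$$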
Let $\lambda_1, \lambda_2, \dots, \lambda_m$ be the eigenvalues of $A A^T$, and $u_1, u_2, \dots, u_m$ be the eigenvectors (column vectors of shape $m \times 1$) corresponding to $\lambda_1, \lambda_2, \dots, \lambda_m$, respectively. It follows that

$$A A^T u_i = \lambda_i u_i \quad \text{for } i = 1, 2, \dots, m,$$

and
- The eigenvalues are non-negative: $\lambda_i \ge 0$.
- The eigenvectors corresponding to different eigenvalues are pairwise orthogonal: $u_i^T u_j = 0$ for $i \ne j$ and $\lambda_i \ne \lambda_j$ (short justifications of both facts follow this list).
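Both facts follow from the symmetry and positive semidefiniteness of $A A^T$:

$$\lambda_i\, u_i^T u_i = u_i^T (A A^T u_i) = \lVert A^T u_i \rVert^2 \ge 0,$$

and since $u_i^T u_i > 0$ for any eigenvector, $\lambda_i \ge 0$. Similarly,

$$\lambda_i\, u_i^T u_j = (A A^T u_i)^T u_j = u_i^T (A A^T u_j) = \lambda_j\, u_i^T u_j,$$

so $(\lambda_i - \lambda_j)\, u_i^T u_j = 0$, which forces $u_i^T u_j = 0$ whenever $\lambda_i \ne \lambda_j$.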
We’ll further assume that $u_1, u_2, \dots, u_m$ are unit vectors:

$$u_i^T u_i = 1 \quad \text{for } i = 1, 2, \dots, m.$$
Let’s define another $m$ column vectors (of shape $n \times 1$):

$$v_i = \frac{1}{\sqrt{\lambda_i}} A^T u_i \quad \text{for } i = 1, 2, \dots, m$$

(here we assume all $\lambda_i > 0$, i.e. $A$ has full rank $m$, so the division is well defined).
We can show that

$$A^T A v_i = \lambda_i v_i.$$

In other words, the $\lambda_i$’s are also eigenvalues of the matrix $A^T A$, and the $v_i$’s are the eigenvectors corresponding to $\lambda_i$, for $i = 1, 2, \dots, m$,
and the $v_i$’s are unit vectors too:

$$v_i^T v_i = \frac{1}{\lambda_i} u_i^T A A^T u_i = \frac{1}{\lambda_i} u_i^T (\lambda_i u_i) = u_i^T u_i = 1.$$
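The eigenvector claim above can be verified directly from the definition of $v_i$ and the eigenvalue equation for $u_i$:

$$A^T A v_i = A^T A \left( \frac{1}{\sqrt{\lambda_i}} A^T u_i \right) = \frac{1}{\sqrt{\lambda_i}} A^T (A A^T u_i) = \frac{1}{\sqrt{\lambda_i}} A^T (\lambda_i u_i) = \lambda_i v_i.$$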
Note that $A^T A$ is an $n \times n$ matrix and $n \ge m$, so it might have additional eigenvalues $\lambda_{m+1}, \dots, \lambda_n$, with the corresponding eigenvectors $v_{m+1}, \dots, v_n$, respectively (in fact these additional eigenvalues are all zero, because $\operatorname{rank}(A^T A) = \operatorname{rank}(A) \le m$). Because the eigenvectors correspond to different eigenvalues, they must be pairwise orthogonal:

$$v_i^T v_j = 0 \quad \text{for } i = 1, \dots, m \text{ and } j = m+1, \dots, n.$$
Finally, let’s put together the two groups of eigenvectors into an $m \times m$ matrix $U$ and an $n \times n$ matrix $V$, where

$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}$$

and

$$V = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}.$$
Note that both $U$ and $V$ are invertible because their columns are pairwise orthogonal (hence linearly independent).
Let’s compute the product of the three matrices $U^T$, $A$ and $V$:

$$U^T A V = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_m^T \end{bmatrix} A \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}, \quad \text{whose } (i, j) \text{ entry is } u_i^T A v_j.$$

Remember that we defined $v_j = \frac{1}{\sqrt{\lambda_j}} A^T u_j$, where $j = 1, 2, \dots, m$.

It follows that

$$A v_j = \frac{1}{\sqrt{\lambda_j}} A A^T u_j = \frac{\lambda_j}{\sqrt{\lambda_j}} u_j = \sqrt{\lambda_j}\, u_j \quad \text{for } j = 1, 2, \dots, m,$$

while for $j = m+1, \dots, n$ we have $A v_j = 0$, because $\lVert A v_j \rVert^2 = v_j^T A^T A v_j = \lambda_j v_j^T v_j = 0$.

Therefore for $i = 1, \dots, m$ and $j = 1, \dots, n$,

$$u_i^T A v_j = \begin{cases} \sqrt{\lambda_i} & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

So

$$U^T A V = \begin{bmatrix} \sqrt{\lambda_1} & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & \sqrt{\lambda_m} & 0 & \cdots & 0 \end{bmatrix} =: \Sigma,$$

and, since $U$ and $V$ have orthonormal columns ($U U^T = I_m$ and $V V^T = I_n$), the SVD of $A$ is

$$A = U \Sigma V^T.$$
Recap: the following are the steps to carry out SVD on any rectangular matrix $A$ of shape $m \times n$ ($m \le n$):
- Compute $A A^T$ and find its eigenvalues $\lambda_i$ and unit eigenvectors $u_i$, for $i = 1, 2, \dots, m$.
- Build the $m \times m$ matrix $U = [u_1, u_2, \dots, u_m]$.
- Build the $n \times n$ matrix $V = [v_1, v_2, \dots, v_n]$, where $v_i = \frac{1}{\sqrt{\lambda_i}} A^T u_i$ for $i = 1, \dots, m$, and $v_{m+1}, \dots, v_n$ are eigenvectors of $A^T A$ corresponding to eigenvalues that are not eigenvalues of $A A^T$.
- Build the matrix $\Sigma$ of shape $m \times n$: $\Sigma = \left[ \operatorname{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_m}) \mid 0 \right]$.
- Compute $A = U \Sigma V^T$.
Example
Toy example
First, let’s create a small matrix on which it is easy to test the theory:
import numpy as np
h = 4
w = 5 # w must be >= h
A = np.random.randint(0, 255, size=(h, w)).astype('float32')
Find the eigenvalues and eigenvectors of `A.dot(A.T)` and `A.T.dot(A)`, respectively:
eigval1, eigvec1 = np.linalg.eig(A.dot(A.T))
eigval2, eigvec2 = np.linalg.eig(A.T.dot(A))
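As an optional sanity check (not part of the construction), the eigenvalues of `A.dot(A.T)` should also appear among those of `A.T.dot(A)`, and the single remaining eigenvalue of `A.T.dot(A)` should be numerically close to zero:

print(np.sort(eigval1)[::-1])  # h eigenvalues of A.dot(A.T), largest first
print(np.sort(eigval2)[::-1])  # w eigenvalues of A.T.dot(A); the smallest is ~0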
Build the matrices `U` and `V`:
U = eigvec1.T
# Note: we use `thres` to determine which eigenvalues of `A.T.dot(A)`
# are NOT eigenvalues of `A.dot(A.T)`
thres = 1e-6
indices = [i for i in range(w) if
           np.all(np.abs(eigval2[i] - eigval1) >= thres)]
V = np.hstack([A.T.dot(eigvec1) / np.sqrt(eigval1),
               eigvec2[:, indices].reshape(w, -1)])
Build the matrix `Sigma`:
Sigma = np.hstack([np.diag(np.sqrt(eigval1)), np.zeros((h, w - h))])
Finally, compute `A_SVD`. Note that `U` and `V` are orthonormal matrices, so their inverses are identical to their transposes; also, since `U` was built as `eigvec1.T` (eigenvectors as rows), `U.T` is the matrix $U$ from the derivation above:
A_SVD = U.T.dot(Sigma).dot(V.T)
You can see that the reconstructed matrix is identical to the original, up to numerical precision:
print(A)
array([[199., 227., 237., 107., 120.],
[254., 116., 184., 220., 171.],
[212., 150., 85., 195., 83.],
[ 24., 51., 178., 205., 135.]], dtype=float32)
print(np.round(A_SVD))
array([[199., 227., 237., 107., 120.],
[254., 116., 184., 220., 171.],
[212., 150., 85., 195., 83.],
[ 24., 51., 178., 205., 135.]])
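Beyond eyeballing the rounded values, one can also check the reconstruction error numerically (the loose tolerance below accounts for `A` being stored in float32):

print(np.max(np.abs(A - A_SVD)))        # maximum absolute reconstruction error
print(np.allclose(A, A_SVD, atol=0.1))  # True, with a float32-friendly tolerance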
Image compression
There is a much neater way to carry out SVD than the above procedure. You can use NumPy’s built-in function in one line of code:
u, s, vh = np.linalg.svd(A)
The matrices `u` and `vh` are like `U.T` and `V.T` in the above example, and `s` is a 1-D array holding the $\sqrt{\lambda_i}$’s (a.k.a. Singular Values, hence the name SVD); by default they are sorted in descending order. Usually we’d re-represent the original matrix by keeping only the largest Singular Values, and truncate `u` and `vh` accordingly. This is known as Truncated SVD.
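Before moving on to images, here is a minimal check, reusing the toy matrix `A` (`h = 4`, `w = 5`) from above, that the built-in decomposition reconstructs `A` in the same way: rebuild the `h`-by-`w` middle factor from `s` (called `Sigma_np` here only to avoid clobbering the earlier `Sigma`) and multiply the three factors back together:

Sigma_np = np.zeros((h, w))
Sigma_np[:, :h] = np.diag(s)        # s holds the h singular values of A
A_rebuilt = u.dot(Sigma_np).dot(vh)
print(np.round(A_rebuilt))          # same values as A, up to numerical precision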
Here is the code for compressing a 3-channel (RGB) image, applying Truncated SVD to each color channel separately:
import numpy as np
from PIL import Image
def truncated_svd_3_channel_compression(img, t):
    """`img` is a 3-D array of shape [height, width, channels];
    `t` is the number of singular values to keep.
    """
    # Decompose each color channel separately.
    u0, s0, vh0 = np.linalg.svd(img[:, :, 0])
    u1, s1, vh1 = np.linalg.svd(img[:, :, 1])
    u2, s2, vh2 = np.linalg.svd(img[:, :, 2])
    # Reconstruct each channel from its top-t singular values/vectors.
    r = u0[:, :t].dot(np.diag(s0[:t])).dot(vh0[:t, :])
    g = u1[:, :t].dot(np.diag(s1[:t])).dot(vh1[:t, :])
    b = u2[:, :t].dot(np.diag(s2[:t])).dot(vh2[:t, :])
    # Stack the channels back into an image.
    return np.stack([r, g, b], axis=2).astype('uint8')
img = np.array(Image.open('img.jpg'))
n10 = truncated_svd_3_channel_compression(img, 10)
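A quick way to see why this counts as compression: keeping the top `t` singular values of an `m`-by-`n` channel only requires storing the truncated `u` (`m`-by-`t`), `s` (`t` values) and `vh` (`t`-by-`n`), i.e. `t * (m + n + 1)` numbers instead of `m * n`:

m, n = img.shape[:2]
t = 10
print(t * (m + n + 1), 'numbers per channel instead of', m * n)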
Below are the images reconstructed using different numbers of Singular Values, as well as the original image. The total number of Singular Values is 450 (the height of the image).
Top 10 (L) and 20 (R) Singular Values
Top 30 (L) and 40 (R) Singular Values
Top 50 (L) Singular Values and original image (R)
We can see that as we increase the number of singular values, the reconstructed image becomes less blurry and contains fewer “outlier” pixels, and the one reconstructed from only 50 singular values is already a good approximation of the original image.