Singular Value Decomposition (SVD) Tutorial Using Examples in R
If you have ever looked with any depth at statistical computing for multivariate analysis, there is a good chance you have come across the singular value decomposition (SVD). It is a workhorse for techniques that decompose data, such as correspondence analysis and principal components analysis. In this post I explain, at an intuitive level, how it works. I demonstrate this using examples in R. If you have not come across the SVD before, skip this post! It is only for that rare connoisseur, who has heard of it, wants to understand it a bit better, but is averse to lots of math.
This post walks you through the singular value decomposition in R. I'll show you, step by step, how to compute the SVD in R and discuss its key properties.
Singular value decomposition example in R
The table below shows the standardized residuals from a contingency table relating education to readership of a newspaper. The R code used to generate the table is below. You can find out more about this data and R code in the post about the math of correspondence analysis.
education.by.readership = matrix(c(5, 18, 19, 12, 3, 7, 46, 29, 40, 7, 2, 20, 39, 49, 16),
    nrow = 5,
    dimnames = list(
        "Level of education" = c("Some primary", "Primary completed", "Some secondary",
                                 "Secondary completed", "Some tertiary"),
        "Category of readership" = c("Glance", "Fairly thorough", "Very thorough")))
O = education.by.readership / sum(education.by.readership)
E = rowSums(O) %o% colSums(O)
Z = (O - E) / sqrt(E)
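If you want to view the resulting table of standardized residuals, one way (after running the code above) is simply to print the rounded values:

round(Z, 6)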
How to compute the SVD
The table above is a matrix of numbers. I am going to call it Z. The singular value decomposition is computed using the svd function. The following code computes the singular value decomposition of the matrix Z, and assigns it to a new object called SVD, which contains one vector, d, and two matrices, u and v. The vector, d, contains the singular values. The first matrix, u, contains the left singular vectors, and v contains the right singular vectors. The left singular vectors represent the rows of the input table, and the right singular vectors represent their columns.
SVD = svd(Z)
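If you want to see what the svd function has returned, a quick inspection along these lines works (a sanity check rather than part of the analysis itself):

SVD$d       # the three singular values, in decreasing order
dim(SVD$u)  # 5 x 3: one row per education level
dim(SVD$v)  # 3 x 3: one row per readership category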
Recovering the data
The singular value decomposition (SVD) has four useful properties. The first is that these two matrices and vector can be "multiplied" together to re-create the original input data, Z. In the data we started with (Z), we have a value of -0.064751 in the 5th row, 2nd column. We can work this out from the results of the SVD by multiplying each element of d with the elements of the 5th row of u and the 2nd row of v.
That is: -0.064751 = 0.2652708*0.468524*(-0.4887795) + 0.1135421*(-0.0597979)*0.5896041 + 0*(-0.6474922)*(-0.6430097)
This can be achieved in R using the code:
sum(SVD$d * SVD$u[5, ] * SVD$v[2, ])
Better yet, if we want to recompute the whole table of numbers at once, we can use a bit of matrix algebra:
SVD$u %*% diag(SVD$d) %*% t(SVD$v)
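To confirm that this reproduces Z (up to floating-point error), one possible check is the following; the name Z.recovered is just illustrative:

Z.recovered = SVD$u %*% diag(SVD$d) %*% t(SVD$v)
max(abs(Z - Z.recovered))  # effectively 0; any difference is floating-point noise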
Now, at first glance this property may not seem so useful. Indeed, it does not even seem very clever. We started with a table of 15 numbers. Now, we have one vector and two tables, containing a total of 27 numbers. We seem to be going backwards!
Reducing the data
The second useful property of the SVD relates to the values in d. They are sorted in descending order (ties are possible). Why is this important? Take a look at the last value in d. It is 2.71825390754254E-17. In reality, this is 0 (computers struggle to compute 0 exactly). When recovering the data, we can ignore the last value of d, and also the last column of each of u and v. Their values are multiplied by 0 and thus are irrelevant. Now, we only have 18 numbers to look at. This is still more than the 15 we started with.
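To see this in action, here is a sketch of the reduced reconstruction, keeping only the first two singular values and the first two columns of u and v; it should still match Z to within rounding error:

Z.reduced = SVD$u[, 1:2] %*% diag(SVD$d[1:2]) %*% t(SVD$v[, 1:2])
max(abs(Z - Z.reduced))  # effectively 0, because the discarded singular value is (numerically) 0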
The values of d tell us the relative importance of each of the columns in u and v in describing the original data. We can compute the variance in the original data (Z) that is explained by each column by first squaring the values in d, and then expressing these as proportions. If you run the following R code, it shows that the first dimension explains 85% of the variance in the data.
variance.explained = prop.table(svd(Z)$d^2)
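Rounding the result makes the split easier to see; based on the singular values above, it should come out at roughly 0.85 and 0.15 for the first two dimensions:

round(variance.explained, 2)  # roughly 0.85 0.15 0.00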
So, if we are happy to ignore 15% of the information in the original data, we only need to look at the first column in u and the first column in v. Now we have to look at less than half the numbers that we started with.
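For concreteness, a rank-1 approximation built from just the first singular value and the first columns of u and v looks like this (the name Z.rank1 is just illustrative); it captures about 85% of the variance in Z:

Z.rank1 = SVD$d[1] * SVD$u[, 1] %o% SVD$v[, 1]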
Halving the number of numbers to consider may not seem like a sufficient benefit. However, the bigger the data set, the bigger the saving. For example, if we had a table with 20 rows and 20 columns, we might only need to look at the first couple of columns, considering just 10% of the values we started with. This is the basic logic of techniques like principal components analysis and correspondence analysis. In addition to reducing the number of values we need to look at, this also allows us to chart the values, which saves more time. There is rarely a good way to chart 20 columns of data, but charting 2 columns is usually straightforward.
Two more properties
The third property of the SVD is that the rows of u represent the row categories of the original table, and the rows of v represent the column categories. The fourth property is that the columns of u are orthogonal to each other, and the columns of v are orthogonal to each other. With these two properties combined, we end up with considerable simplicity in future analyses. For example, this allows us to compute uncorrelated principal components in principal components analysis and to produce plots of correspondence analysis. I will walk through this in detail in my forthcoming post on the math of correspondence analysis.
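A quick numerical check of the orthogonality (again, a sanity check rather than part of the analysis): the cross-products of u and of v should both be, to within rounding error, identity matrices.

round(crossprod(SVD$u), 10)  # t(u) %*% u is the 3 x 3 identity matrix
round(crossprod(SVD$v), 10)  # t(v) %*% v is also the 3 x 3 identity matrix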
All the R code in this post has been run using Displayr. Anyone can explore SVD and the R code used in this post in Displayr for free.