Sunday, 10 November 2013

Singular Value Decomposition(SVD) in R

When we have large number of attributes in a dataset, it’s hard to identify which of the variables are the most useful to use. Singular Value Decomposition (SVD) algorithm helps us to reduce the dimensions of data.
Here we’ll learn how to implement SVD.  But before that, we must know what does Dimensional Reduction mean?

Dimensional Reduction is the process of reducing the number of random variables under consideration as feature selection and feature extraction. There are many ways to do it. Here we will read about SVD, how it helps in Dimensional Reduction.

Before looking into implementation part, let’s have a brief overview what SVD is.

Singular Value Decomposition (SVD) is a matrix factorization method used in data mining.
In data mining, this algorithm is used to reduce the number of attributes that are used in a data mining process. This reduction removes unnecessary data that are linearly dependent in the point of view of Linear Algebra. For example, imagine a database which contains a field that stores the water's temperature on several samples and another that stores its state (solid, liquid or gas). It’s easy to see that the second field is dependent from the first and, therefore, SVD could easily show us that it is not important for the analysis.


SVD is the factorization of m x n matrix A, with m > n of real or complex numbers:

A = U S VT

Where U is an orthogonal m x n matrix, S is a diagonal matrix of singular values and V is an orthogonal n x n matrix &


where I is an identity matrix.

To compute SVD :
  • we find the eigen vectors and eigen values of ATA and AAT.
  • The eigenvectors of ATA  are the columns of V and the eigenvectors of AAT are the columns of U. The singular values of A, in the diagonal of matrix S, are the square root of the common positive eigenvalues of AAT and ATA.
  •  If AAT and ATA have the same number of eigenvalues, then A is a square matrix, else eigenvalues of the matrix that have less eigen values are the eigenvalues of the matrix that has more. We can say, that singular values of A are the eigen values of the matrix, between AAT and ATA with less number of eigenvalues.
  • Singular values of matrix A is also known as rank of that matrix that specifies number of linearly independent rows or columns.
  • Rank should not be greater than min(m,n)

Now we have clear idea of SVD, so now we’ll learn how to run SVD in R.

How to run SVD in R:

Here is the example for how to run in R.
Suppose we have a 4x3 matrix.
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
X <- hilbert(4)[, 1:3]
(s <- svd(X))
D <- diag(s$d)
s$u %*% D %* t(s$v) #  X = U D V'
t(s$u) %*% X %*% s$v #  D = U' X V

Svd(X) returns a list with components :
d : a vector containing the singular values of x, of length min(n, p).
u : a matrix whose columns contain the left singular vectors of x.
v : a matrix whose columns contain the right singular vectors of x.

s$d is a vector of singular values as below:

[1] 1.451914187 0.143312317 0.004228883

Using SVD, we can remove noise and linear independent elements with most important singular values. This is very useful in data mining.