When we have large number of attributes in a dataset, it’s
hard to identify which of the variables are the most useful to use. Singular
Value Decomposition (SVD) algorithm helps us to reduce the dimensions of data.

Here we’ll learn how to implement SVD. But before that, we must know what does
Dimensional Reduction mean?

Dimensional
Reduction is the process of reducing the number of random variables under
consideration as feature selection and feature extraction. There are many ways
to do it. Here we will read about SVD, how it helps in Dimensional Reduction.

Before looking into implementation part, let’s have a brief
overview what SVD is.

Singular Value
Decomposition (SVD) is a matrix factorization method used in data mining.

In data mining, this algorithm is used to reduce the number
of attributes that are used in a data mining process. This reduction removes
unnecessary data that are linearly dependent in the point of view of Linear Algebra.
For example, imagine a database which contains a field that stores the water's
temperature on several samples and another that stores its state (solid, liquid
or gas). It’s easy to see that the second field is dependent from the first
and, therefore, SVD could easily show us that it is not important for the
analysis.

**Algorithm:**

SVD is the factorization of m x n matrix A, with m > n of
real or complex numbers:

A = U S V

^{T}
Where U is an orthogonal m x n matrix, S is a diagonal
matrix of singular values and V is an orthogonal n x n matrix &

U

^{T}U = V^{T}V = I
where I is an identity matrix.

**To compute SVD :**

- we find the eigen vectors and eigen values of A
^{T}A and AA^{T}. - The eigenvectors of A
^{T}A are the columns of V and the eigenvectors of AA^{T}are the columns of U. The singular values of A, in the diagonal of matrix S, are the square root of the common positive eigenvalues of AA^{T}and A^{T}A. - If AA
^{T}and A^{T}A have the same number of eigenvalues, then A is a square matrix, else eigenvalues of the matrix that have less eigen values are the eigenvalues of the matrix that has more. We can say, that singular values of A are the eigen values of the matrix, between AA^{T}and A^{T}A with less number of eigenvalues. - Singular values of matrix A is also known as rank of that matrix that specifies number of linearly independent rows or columns.
- Rank should not be greater than min(m,n)

Now we have clear idea of SVD, so now we’ll learn how to run
SVD in R.

**How to run SVD in R:**

Here is the example for how to run in R.

Suppose we have a 4x3 matrix.
hilbert <-
function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }

X <- hilbert(4)[, 1:3]

(s <- svd(X))

D <- diag(s$d)

s$u %*% D %* t(s$v)
# X = U D V'

t(s$u) %*% X %*% s$v
# D = U' X V

**d : a vector containing the singular values of x, of length min(n, p).**

**u : a matrix whose columns contain the left singular vectors of x.**

**v : a matrix whose columns contain the right singular vectors of x.**

**s$d**is a vector of singular values as below:

[1] 1.451914187
0.143312317 0.004228883

Using SVD,
we can remove noise and linear independent elements with most important
singular values. This is very useful in data mining.

## No comments:

## Post a Comment