The singular value decomposition (SVD) is a fundamental matrix decomposition in linear algebra. It is widely applied in many modern techniques, such as high-dimensional data visualization, dimension reduction, data mining, and latent semantic analysis. Although the SVD plays an essential role in these fields, its main weakness is its cubic computational cost, which makes many modern applications infeasible, especially when the scale of the data is huge and growing. It is therefore imperative to develop a fast SVD method for the modern era. When the rank of a matrix is much smaller than the matrix size, fast SVD approaches already exist. In this paper, we focus on this case, with the additional condition that the data is too large to be stored in matrix form. We demonstrate that the resulting fast SVD is sufficiently accurate and, most importantly, can be obtained immediately. With this fast method, many modern SVD-based techniques that were previously infeasible become viable.

The singular value decomposition (SVD) and the principal component analysis (PCA) are fundamental tools in linear algebra and statistics. Many modern applications are based on these two tools, such as linear discriminant analysis [

Currently, there are several well-known methods for computing the SVD. For example, the GR-SVD is a two-step method which performs Householder transformations to reduce the matrix to bidiagonal form and then performs the

The second purpose of this paper is to update the SVD when the matrix is extended by newly arriving data. If the rank of a matrix is much smaller than the matrix size, Matthew proposed a fast SVD updating method for low-rank matrices in 2006 [

Multidimensional scaling (MDS) is a method for representing high-dimensional data in a low-dimensional configuration [
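The classical MDS referred to here follows the standard recipe: double-center the squared distance matrix and take the top eigenvectors of the resulting Gram matrix. A minimal sketch (our illustration, not the paper's code):

```python
import numpy as np

def classical_mds(D, dim):
    """Classical MDS: D is an n-by-n matrix of pairwise Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]       # keep the 'dim' largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Points that truly live in 2 dimensions are recovered up to a rigid motion.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Y = classical_mds(D, 2)
D2 = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
print(np.allclose(D, D2))  # True: pairwise distances are preserved
```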

Assume that

Assume that

Although

The MDS method is useful for dimension reduction. If the matrix

Let

Let

Unlike the previous definition of

In 2008, we adapted the classical MDS so as to reduce the original

The main idea of the fast MDS is to use statistical resampling to split the data into overlapping subsets. We perform the classical MDS on each subset to obtain a compact Euclidean configuration, and then use the overlap information to combine the configurations of the subsets and recover the configuration of the whole data. Hence, we name this fast MDS method the split-and-combine MDS (SCMDS).
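A simplified illustration of the combine step (the paper's actual procedure may differ in detail): configurations returned by the MDS on two overlapping subsets agree only up to a rigid motion, so the shared points determine an orthogonal Procrustes alignment that stitches the second configuration into the frame of the first.

```python
import numpy as np

def align(Y_ref, Y_new, idx_ref, idx_new):
    """Map Y_new into the frame of Y_ref using the shared (overlap) points."""
    A = Y_new[idx_new]
    B = Y_ref[idx_ref]
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    R = U @ Vt                                 # best orthogonal alignment
    return (Y_new - ca) @ R + cb

# Two subsets of the same 2D points, each in its own rotated/shifted frame.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 2))
t = 0.7
R2 = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
Y1 = X[:25]                                    # frame of the first subset
Y2 = X[15:] @ R2 + np.array([3.0, -1.0])       # second subset, other frame

# Points 15..24 are shared; stitch Y2 into Y1's frame via the overlap.
merged = align(Y1, Y2, np.arange(15, 25), np.arange(0, 10))
print(np.allclose(merged, X[15:]))  # True
```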

Let

Assume that

After necessary modification to the sign of columns of

Assume that there are

When the number of samples

Having reviewed the MDS and the SCMDS, we now demonstrate how to adapt the SCMDS method into a fast PCA, and how the fast PCA can become a fast SVD with further modification.

Because the MDS is similar to the PCA when the data configuration is Euclidean, we can adapt the SCMDS method to obtain a fast PCA under a similar constraint when the rank

Equation (

Let

In practice, we do not want to produce the product matrix

However, the result of the SCMDS is of the form

Since

Notice that there are some linear SVD methods for a thin matrix (

The concepts of the SVD and the PCA are very similar. Since the PCA starts from decomposing the covariance matrix of a data set, it can be viewed as first shifting the center of mass of the row vectors to zero. The SVD, on the other hand, operates directly on the product matrix without shifting. If the mean of the matrix rows is zero, the eigenvectors derived by the SVD are equal to the eigenvectors derived by the PCA. We seek a fast method that, given the PCA result, produces the SVD result without recomputing the eigenvectors of the whole data set. The following is the mathematical analysis of this process.
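This equivalence is easy to verify numerically. The sketch below (our illustration) centers the rows of a random data matrix and checks that the left singular vectors coincide, up to sign, with the eigenvectors of the covariance matrix used by the PCA:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 100))           # 5 variables, 100 observations
Xc = X - X.mean(axis=1, keepdims=True)      # make the row means zero

U, s, _ = np.linalg.svd(Xc, full_matrices=False)      # SVD directions
w, V = np.linalg.eigh(Xc @ Xc.T / (Xc.shape[1] - 1))  # PCA eigenvectors
V = V[:, np.argsort(w)[::-1]]               # sort eigenvalues descending

# Singular vectors are unique only up to sign, so compare up to sign.
same = all(np.allclose(U[:, i], V[:, i]) or np.allclose(U[:, i], -V[:, i])
           for i in range(5))
print(same)  # True
```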

Let

If the singular value decomposition of

Checking the matrix size of

On the other hand, if

Let

From the above analysis, we obtain a fast PCA approach by computing the SCMDS first and then adapting the MDS result to obtain the PCA; we name this approach the SCPCA. Similarly, the fast SVD approach, which computes the SCMDS first, then adapts the MDS result to obtain the PCA, and finally adapts the PCA result to the SVD, is called the SCSVD. These two new approaches work when the rank of

In this section, we consider the situation in which the data is updated constantly and the SVD must be recomputed continuously. Instead of scanning all the data again, we use the previous SCSVD result together with the newly arrived data to compute the next SVD. Before introducing our updating method, we first review the general updating methods.

Let

If

Note that the matrix

In general, the extended matrix

Now, we discuss how to update the SVD in the SCSVD approach. If the updated column vectors

If

There is one important difference between the general approach and the SCSVD approach. To obtain the SVD from the MDS result, we need the column mean vector of the matrix. The column mean vector of the original matrix must be computed once, and it is then updated by a simple weighted average whenever new data comes in.
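The weighted-average update of the column mean can be sketched as follows (our own notation; no rescan of the original data is needed):

```python
import numpy as np

def update_mean(mean_old, n_old, X_new):
    """Combine the stored mean of n_old columns with the new columns X_new."""
    n_new = X_new.shape[1]
    total = n_old + n_new
    return (n_old * mean_old + n_new * X_new.mean(axis=1)) / total, total

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 80))   # original data, 80 columns
B = rng.standard_normal((10, 20))   # newly arrived columns

m, n = update_mean(A.mean(axis=1), A.shape[1], B)
print(np.allclose(m, np.hstack([A, B]).mean(axis=1)))  # True
```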

Notice that no matter whether

In this section, we show that our fast PCA and fast SVD methods work well for large matrices of small rank. The simulated matrix is created as the product of two slender matrices and one small diagonal matrix, that is,
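A test matrix of this kind can be generated as follows (our sketch; the paper's exact construction is given by the formula above): the product of two slender orthonormal factors and a small diagonal matrix yields an m-by-n matrix of rank exactly r.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 500, 400, 50
U = np.linalg.qr(rng.standard_normal((m, r)))[0]   # slender m-by-r factor
V = np.linalg.qr(rng.standard_normal((n, r)))[0]   # slender n-by-r factor
S = np.diag(rng.uniform(1.0, 10.0, r))             # small r-by-r diagonal
A = U @ S @ V.T                                    # m-by-n matrix of rank r

print(np.linalg.matrix_rank(A))  # 50
```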

If we increase the matrix to

Comparison of the elapsed time between the economical SVD (solid line) and the SCSVD (dashed line).

Note that when the estimated rank used in the SCSVD is greater than the real rank of the data matrix, there is almost no error (except rounding error) between the economical SVD and the SCSVD. The error between the economical SVD and the SCSVD is computed by comparing orthogonality. Assume that
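One natural orthogonality-based error measure (illustrative only; the paper's exact definition follows the truncated passage above) is the deviation of |U_exact^T U_approx| from the identity. Taking absolute values makes the measure insensitive to the sign ambiguity of singular vectors.

```python
import numpy as np

def orthogonality_error(U_exact, U_approx):
    """Deviation of |U_exact^T U_approx| from the identity (sign-invariant)."""
    k = U_approx.shape[1]
    return np.linalg.norm(np.abs(U_exact[:, :k].T @ U_approx) - np.eye(k))

rng = np.random.default_rng(5)
A = rng.standard_normal((100, 20)) @ rng.standard_normal((20, 100))
U, _, _ = np.linalg.svd(A, full_matrices=False)

# Identical subspaces give (numerically) zero error, even with flipped signs.
print(orthogonality_error(U, -U[:, :20]) < 1e-10)  # True
```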

We would like to explore what happens if the estimated rank is smaller than the true rank. According to the experiment of the SCMDS [

The relationship between the errors and the estimated dimension for matrix sizes from 500 to 4000 with step size 500. The true rank of these matrices is 50. From bottom to top, the curves correspond to matrices of sizes 500, 1000, 1500, …, respectively.

We can see that when the estimated rank decreases, the error rises rapidly. Lines in Figure

The purpose of the second simulation experiment is to observe the approximation performance of applying the SCPCA to a large full-rank matrix. We generate a random matrix with a fixed number of columns and rows, say 1000. The square matrix is created in the form,
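A full-rank matrix with a small "essential rank" can be built as a low-rank product plus small dense noise (our sketch of the truncated construction above): numerically the matrix has full rank, yet only the first r singular values are significant.

```python
import numpy as np

rng = np.random.default_rng(6)
n, r, eps = 500, 50, 1e-3
A = (rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
     + eps * rng.standard_normal((n, n)))          # low rank + small noise

s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.matrix_rank(A) == n)  # True: full numerical rank
print(s[r] / s[0] < 1e-2)             # True: sharp drop after the 50th value
```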

Figure

The effect of the estimated rank on the error. The matrix size ranges from 500-by-500 to 4000-by-4000 (from bottom to top, resp.), and the essential rank is 50 (

In the last experiment, we let the matrix grow and observe the performance of the general and the SCSVD update approaches. We start from a matrix that is formed by

The computational time of the SCMDS approach exceeds that of the general approach; however, the difference is within one second. The error of the SCMDS approach in updating

Comparison of the update time between the SCMDS approach (red line) and the general approach (blue line).

We proposed fast PCA and fast SVD methods derived from the technique of the SCMDS method. The new PCA and SVD methods have the same accuracy as the traditional PCA and SVD methods when the rank of the matrix is much smaller than the matrix size. The results of applying the SCPCA and the SCSVD to a full-rank matrix are also quite reliable when the essential rank of the matrix is much smaller than the matrix size. Thus, we can use the SCSVD in huge data applications to obtain a good approximate initial value. The updating algorithm of the SCMDS approach was discussed and compared with the general update approach. The performance of the SCMDS approach, in both computational time and error, is worse than that of the general approach. Hence, the SCSVD method is recommended for computing the SVD of a large matrix, but not for updating.

This work was supported by the National Science Council.