Split-and-combine Singular Value Decomposition for Large-scale Matrix

The singular value decomposition (SVD) is a fundamental matrix decomposition in linear algebra. It is widely applied in many modern techniques, for example, high-dimensional data visualization, dimension reduction, data mining, latent semantic analysis, and so forth. Although the SVD plays an essential role in these fields, its apparent weakness is the order three computational cost. This order three computational cost makes many modern applications infeasible, especially when the scale of the data is huge and growing. Therefore, it is imperative to develop a fast SVD method in modern era. If the rank of matrix is much smaller than the matrix size, there are already some fast SVD approaches. In this paper, we focus on this case but with the additional condition that the data is considerably huge to be stored as a matrix form. We will demonstrate that this fast SVD result is sufficiently accurate, and most importantly it can be derived immediately. Using this fast method, many infeasible modern techniques based on the SVD will become viable.


Introduction
The singular value decomposition (SVD) and the principle component analysis (PCA) are fundamental in linear algebra and statistics.There are many modern applications based on these two tools, such as linear discriminate analysis [1], multidimensional scaling analysis [2], and feature extraction, high-dimensional data visualization.In recent years, digital information has been proliferating and many analytic methods based on the PCA and the SVD are facing the challenge of their significant computational cost.Thus, it is crucial to develop a fast approach to compute the PCA and the SVD.
Currently there are some well-known methods for computing the SVD.For example, the GR-SVD is a two-step method which performs Householder transformations to reduce the matrix to bidiagonal form then performs the QR iteration to obtain the singular values [3,4].Since the offdiagonal regions are used to store the transform information, this approach is very efficient in saving the computational memory.If we only want to compute a few of the largest singular values and associated singular vectors of a large matrix, the Lanczos bidiagonalization is an important procedure for solving this problem [5][6][7][8].However, the previously mentioned methods all require matrix multiplications for the SVD.One interesting problem is how do we compute the SVD for a matrix when the matrix size is huge and loading the whole matrix into the memory is not possible?The main purpose of this paper is to deal with this problem when the numerical rank of the huge matrix is small.
The second purpose of this paper is to update the SVD when the matrix size is extended by new data updating.If the rank of matrix is much smaller than the matrix size, Matthew proposed a fast SVD updating method for the low-rank matrix in 2006 [9].A rank- thin SVD of an  ×  matrix can be computed in () time for  ≤ √min(, ).Although the matrix size remains unchanged in the Matthew's method, we can still adopt the analysis of this paper for updating the SVD when the matrix size is changed.
Multidimensional scaling (MDS) is a method of representing the high-dimensional data into the low-dimensional configuration [10][11][12].One of the key approaches of the MDS is simply the SVD, that is, if we can find a fast approach of the MDS then it is possible to find a fast approach of the SVD.When the data configuration is Euclidean, the MDS is similar to the PCA, in that both can remove inherent noise with its compact representation of data.The order three computational complexity makes it infeasible to apply to huge data, for example, when the sample size is more than one million.In 2008, Tzeng et al. developed a fast multidimensional scaling method which turned the classical order three MDS method to be linear [13].In this paper, we would like to implement SCMDS to the fast SVD approach, say SCSVD.The following subsections are reviews of the classical MDS and the SCSVD.

Review of the Classical MDS.
Assume that  is an -by- matrix with rank , where the th row of  represents the coordinates of the th data point in an -dimensional space.Let  =   be the product matrix of .Torgerson [10] proposed the first classical MDS method to reconstruct a matrix  of Cartesian coordinates of points from a given symmetric matrix .The key of the classical MDS is to apply the double centering operator and the singular value decomposition.
Assume that 1 is an -by-1 matrix whose elements are all 1's.We define a symmetric matrix  by where   = (1/) 1 1  is the row mean matrix of ,   = (1/) 1 1   is the column mean matrix of , and   = (1/ can be simplified to  = .Since matrix  is symmetric, the SVD decomposes  into  = Σ  .Then we have for some unitary matrix .In practice, we set  =  to obtain the MDS result, namely, √ . Although √  ̸ = , √  = Σ 1/2 = ( − (1/) 1 1  ) for some unitary matrix , it is the same as shifting all the rows of  such that the row's mean is zero and then multiplying the result by .Hence, the row of √  preserves the pairwise relationship among the data points. is derived from  by double centering, whose cost is lower than obtaining √  by the SVD.When the data size is huge, the cost of the classical MDS is dominated by the cost of the SVD, whose ( 3 ) computational complexity makes it infeasible.Therefore, we need a fast MDS method to deal with the huge data problem.
The MDS method is useful in the application of dimension reduction.If the matrix  is extended to the higher dimensional space and then shifted and/or rotated by an affine mapping, the double centering result of the corresponding product matrix stays the same.More generally, let  1 ∈  , 1 be derived from  ∈  , by an affine mapping such that  1 =  + 1   , where  is orthogonal and  ∈  , 1 and  is a nonzero vector in   1 and  ≪ min(,  1 ).That is  1 is a big matrix but with a rank less than .Assume that  1 =  1   1 , the double centering of  1 obtains the same  as double centering of .
Theorem 1.Let  1 be the dimensional extension from  by an affine mapping  1 =  + 1   .Then the double centering of  1   1 is the same as the double centering of   .
Proof.Let  1 =  1   1 = ( + 1   )( + 1   )  and let  =   , we have where  =  is a vector in R  .By the definition of   ,   and   , we have where   is the element of .Then we have, Unlike the previous definition of , if  is a distance matrix with each element  , = √(  −   )  (  −   ), the double centering of  2 is equivalent to −2, provided that ∑  =1   = 0. Hence, the MDS method performs double centering on  2 , multiplies by −1/2, and then performs the SVD, which gives the configurations of the data set.

Review of the SCMDS.
In 2008, we adapted the classical MDS so as to reduce the original ( 3 ) complexity to () [13], in which we have proved that when the data dimension is significantly smaller than the number of data entries, there is a fast linear approach for the classical MDS.
The main idea of fast MDS is using statistical resampling to split data into overlapping subsets.We perform the classical MDS on each subset and get the compact Euclidean configuration.Then we use the overlapping information to combine each configuration of subsets to recover the configuration of the whole data.Hence, we named this fast MDS method by the split-and-combine MDS (SCMDS).
Let  be the -by- product matrix for some  and let I be a random permutation of 1, 2, . . ., .Assume that I  is a nonempty subset of I for  = 1, . . ., , where I  ⋂ I +1 ̸ = 0 for  = 1, . . .,  − 1 and ⋃ I  = I.I  is considered as the index of overlapping subgroups.  is the submatrix of  that is extracted from the rows and columns of  by the index I  .Then   is the product matrix of the submatrix of  by index I  .Since the size of   is much smaller than , we can perform the classical MDS easily on each   to obtain the MDS result, say   .After obtaining the representation coordinates of   for each subgroup, we have to combine these representations into one.Without loss of generality, we only demonstrate how to combine the coordinates of the first two subgroups.
Assume that  1 and  2 are matrices whose columns are the two coordinate sets of the overlapping points with respect to  1 and  2 , respectively.Then there exists an affine mapping that maps  1 to  2 .Let  1 and  2 be the means of columns of  1 and  2 , respectively.In order to obtain the affine mapping, we apply QR factorization to both Then we have It is clear that the mean of the center of column vectors of  1 −  1 1  has been shifted to zero.Because  1 and  2 come from the same data set, the difference between  1 − 1 1  and 1  is a rotation.Therefore, the triangular matrices  1 and  2 should be identical when there is no noise in  1 and  2 .Due to randomness of the sign of columns of   in QR factorization, the sign of columns of   might need to be adjusted according to the corresponding diagonal elements of   , so that the signs of tridiagonal elements of  1 and  2 are the same.
After necessary modification to the sign of columns of   , we conclude that Furthermore, we have Here, the unitary operator is  =  1   2 and the shifting is Since the key step of finding this affine mapping is QR decomposition, the computational cost is ((  ) 3 ), where   is the number of columns of  1 and  2 .Therefore, the cost ((  ) 3 ) complexity is limited by the number of samples in each overlapping region.The proof of the computational cost of SCMDS is given later.
Assume that there are  points in a data set.We divide these  samples into  overlapping subgroups, and let   be the number of points in each subgroup and let   be the number of points in each intersection region.Then we have the relationship or For each subgroup, we apply the classical MDS to compute the configuration of each group data, which costs ((  ) 3 ).
In each overlapping region, we apply QR factorization to compute the affine transformation, which costs ((  ) 3 ).
Assume that the true data dimension is , and the lower bound of   is  + 1.For convenience, we take   =  for some constant  > 2. Then the total computational cost is about When  ≪ , the computational cost ( 2 ) is much smaller than (√), which is the computation time of the fast MDS method proposed by Morrison et al., 2003 [14].The key idea of our fast MDS method is to split data into subgroups, then combine the configurations to recover the whole one.Since all the order three complexities are restricted in the small number of data entries, we can therefore speed up MDS.
Proposition 2. When the number of samples  is much greater than the dimension of data , the computational complexity of SCMDS is ( 2 ), which is linear.

Methodology
After reviewing the MDS and the SCMDS, We will now demonstrate how to adapt the SCMDS method to become the fast PCA and how the fast PCA can become the fast SVD with further modification.

2.1.
From the SCMDS to the SCPCA.Because the MDS is similar to the PCA when the data configuration is Euclidean, we can adapt the SCMDS method to obtain the fast PCA with the similar constraint when the rank  is much smaller than the size  and .Equation (1) shows how to use double centering to convert the product matrix  to  and the √  =  − 1 1  .We note that the MDS result √  is equal to the matrix obtained from shifting the coordinates of  with the mean of data points to the original point followed by a transformation by some unitary matrix .
Let  ∈  , ; the purpose of the PCA is to find the eigenvectors of the covariance matrix of  and to obtain the corresponding scores.Since the covariance matrix of  defined by (1/( − 1))( − (1/) 1 1  )( − (1/) 1 1  )  is symmetric and finding the eigenvectors of the covariance matrix of  is similar to obtaining  of √  in (1), we set  − A(1/) 1 1  =  − (1/) 1 1   by omitting the constant 1/ √  − 1.We note that such omission of the constant will not change the direction of the eigenvectors.When we want to carry out the PCA on , the related matrix  can be obtained by  =  − (1/) 1 1  + 1 , where  ∈   is defined by In other words, carrying out the PCA on A is equivalent to carrying out the MDS on  =   .That is and the result of PCA will be  and (1/ √  − 1)Σ 1/2 .From the result of the MDS ( √  = Σ 1/2 ), we can easily obtain  and Σ by normalizing the column vectors of √ .
In practice, we do not want to produce the product matrix  =   when  is huge.Rather, after we obtain matrix  from , we will randomly produce the index permutation I and use the index of overlapping subgroup I  to compute   .And then we follow the step of the SCMDS to obtain the result, √ .
However, the result of the SCMDS is of the form √  + 1   , where  is an unitary matrix in   and  ∈ R  is usually a nonzero vector.Removal of the factor  can be easily done by computing the average of the row vectors of the SCMDS result.After removing the component  from each row vector, we obtain √ .
Since √  ∈  , and  ≪ , we have which is a  ×  small matrix.It is easy to solve this small eigenvalue problem to obtain , Σ, and .The computational cost from √  + 1   to obtain  and Σ 1/2 is ().Hence, the SCMDS approach to computing the SVD is still linear.
Notice that there are some linear SVD methods for thin matrix ( ≪  or  ≫ ).Even in the case of big matrix with small rank, the traditional SVD method, for example the GR-SVD, can be implemented to be linear (because of the dependency, many components of the matrix become zero in the GR-SVD algorithm).However, if the rank of big matrix is almost full and the numerical rank is small or the matrix size is considerably huge that loading the whole matrix is impossible, the SCMDS has the advantage in computing speed.And we call this approach to obtaining  and Σ 1/2 by the SCMDS to be SCPCA.

2.2.
From the SCPCA to the SCSVD.The concepts of the SVD and the PCA are very similar.Since the PCA starts from decomposing the covariance matrix of a data set, it can be considered as adjusting the center of mass of a row vector to zero.On the other hand, the SVD operates directly on the product matrix without shifting.If the mean of the matrix rows is zero, the eigenvectors derived by the SVD are equal to the eigenvectors derived by the PCA.We are looking for a method which will give a fast approach to produce the SVD result without recomputing the eigenvectors of the whole data set, when the PCA result is given.The following is the mathematical analysis for this process.
Let  be a column matrix of data set: X = −⋅ 1  , where  is the mean of columns of .Hence, the row mean of X is zero.Assume that we have the PCA result of , that is, XX  = Σ 2   .Then we have X = Σ  for some orthogonal matrix .Assume that the rank of X is  and  is much smaller than the matrix size.We observe that rank() =  or rank() =  + 1, depending on whether  is spanned by columns of .If  is spanned by , then where  is the coefficient vector of  when represented by , that is,  =  ⋅ .
If the singular value decomposition of Because the matrix  2 is unitary,  1 =  2 is automatically an orthogonal matrix as well.Then we have the SVD of .
Checking the matrix size of Σ  +  ⋅ 1  , we can see that to compute the SVD of Σ  +  ⋅ 1  is not a big task.This is because Σ  +  ⋅ 1  is a -by- matrix, and under our assumption,  is much smaller than , so we can apply the economic SVD to obtain the decomposition of Σ  +  ⋅ 1  .
On the other hand, if  is not spanned by , the analysis becomes where  +1 is a unit vector defined by Using the same concept of diagonalization in the case when  is spanned by columns of , we find the SVD of 2 is another orthogonal matrix, and hence the SVD of  is completed.Proposition 3. Let  ∈  , (R) be a big size matrix and let its rank  be much smaller than min{, }.Then the SCSVD or the SCPCA has linear computational complexity.
From the above analysis, we can have a fast PCA approach by computing the SCMDS first and then adapt the MDS result to obtain the PCA.We named this approach SCPCA.Similarly, the fast SVD approach, which computes the SCMDS first, then adapts the MDS result to obtain the PCA, and finally adapts the PCA result to the SVD, is called the SCSVD.These two new approaches work when the rank of  is much smaller than the number of samples and the number of variables.To obtain the exact solution, the parameter   must be greater than the rank of .In the SCPCA or SCMDS method, if   ≤ , we only get the approximated solutions of the PCA and the SVD.Under the necessary criterion, we can reduce the computational complexity from min{(), ( 2 ), ( 2 } to min{(), ()}.If the numerical rank is not small, for example,  ≈ min(√, √), the computational complexity becomes almost the same as the original PCA and SVD.Our approach has no advantage in the latter case.

SVD for Continuously Growing Data
In this section, we look for the solution when the data is updated constantly and we need to compute the SVD continuously.Instead of scanning all the data again, we try to use the previous result of the SCSVD together with the new updated data to compute the next SVD.Before introducing our updating method, we review the general updating methods first.
Let  be an -by- matrix, where  is the number of variables and  is the number of samples.And we assume that both  and  are huge.When new data comes in, we collect these new data to form a column matrix which is denoted by .Assume that we have the singular value decomposition of , that is, where  ∈  , (R),  ∈  , (R) are orthogonal and Σ ∈   (R) is a diagonal.Since the data gets updated, the data matrix becomes To compute the singular value decomposition of  1 , we need to compute the eigenvalue and eigenvector of  1   1 .If  is spanned by the column space of , we can represent the column matrix  by  = , where  is the coefficient matrix of  with columns of  as the basis.Since  is orthogonal, the coefficient matrix  can be computed easily by  =   .Then we have Note that the matrix Σ 2 +   is positive symmetric and it is a small size matrix.Using the spectrum theorem, we can decompose this matrix into Σ 2 1   .Because the matrix  is unitary,  1 is  rotated by .In this case, the computational cost is dominated by the rotation from  to  1 , and it is ( 2 ).
In general, the extended matrix  is not spanned by the column space of  and the decomposition of  1   1 should be modified by where The cost to obtain  and  1 is (), the computational cost of QR decomposition of  1 is ( 2 ), the cost to obtain Σ 2 1 is (( + ) 3 ), and the cost of rotation  to  1 is (( + ) 2 ).When  ≫ ,  the cost of update process is linear ().However, if the rank is not small ( ≈  or  ≈ ), the computational cost is dominated by (( + ) 3 ).Now, we discuss how to update the SVD in the SCSVD approach.If the updated column vectors  are spanned by the column vectors of some subgroup of the original data, say   , the dimension will not increase by the extended matrix .We just reset the distance matrix   in this subgroup by where  , is the number of vectors in the original -th subgroup.To compute the MDS result of this updated subgroup, it costs (( , + ) 3 ).Then we will combine this  , +  MDS representation points with the remaining points.The cost of obtaining the combination affine mapping is still ((  + ) 3 ) which is increased by the new update.Combining this new subgroup with the whole data set needs ( 2 ( , −   )).Then transfering the MDS result to the SVD needs ( 2 ), which is linear too.If  is not spanned by any subgroup of the previous SCMDS process, then we have to update  to the largest group of the subgroups.Because the dimension of this updated group will increase, we have to make some modifications in the combined approach.If the column matrix of the MDS configuration of the new updated group is  1 ∈   1 , 1 and the column matrix of the configuration of the other elements is  2 ∈  , 2 , where  1 > , then we will extend the matrix  2 to an  1 -by- 2 matrix by filling zeros in the bottom of the original  2 .After adjusting  1 and  2 to the same dimension of rows, we perform the same combination step as before.
Here, the width of matrix  2 is almost the number of whole data, and this combination step takes ( 2  1 ).There is one important difference between the general approach and the SCSVD approach.To obtain the SVD from the MDS result, we need the column mean vector of the matrix.The column mean vector of the original matrix must be computed, and then the column mean vector is updated by the simple weighted average when the new data comes in.
Notice that no matter whether  is spanned by  or not, the update method in the SCSVD has no advantage over the general update method although the computational costs are of the same order.This is because the true computational cost of the SCSVD updating method is always more than the general updating method, even though they are of the same order.Thus, when we have to update the new data to the matrix on which the SVD will be performed, the general method is recommended.However, when the matrix size is so huge that to expand Σ 2 +   becomes impossible, it is better to recompute the SVD result.In this case and with the condition that the true rank is much smaller than the matrix size, the SCSVD approach is recommended.The relationship between errors and estimated dimension of matrix size from 500 to 4000 with step size 500.The true rank of these matrices is 50.From the bottom to top is matrix of sizes 500, 1000, 1500,. .., respectively.
In the last experimental result, we let the matrix be growing and observe the performance between the general and the SCSVD update approaches.We start from a matrix that is formed by  4000,50 × Λ 50 ×  50,4000 + 0.01 ⋅  4000,4000 , where Λ is a diagonal matrix such that the diagonal terms are positive and decayed rapidly like natural data and each element in  and  is a random number from the normal distribution N(0, 1).Then we update a 100 × 4000 random matrix into the matrix 10 times of the size.The element of the updated matrix is also from the normal distribution N(0, 1).When the matrix gets updated, we compute the SVD decomposition to obtain the first 20 columns of .The updated result of the general approach is  1 ×Σ 1 × 1 , and the updated result of the SCMDS approach is  2 × Σ 2 ×  2 .We compute the error as (25) for  1 and  2 .The reason that we  concern ourselves only with the first 20 bases of the column space is that it is rare to use such a high dimension in the dimension reduction applications.
The computational time of the SCMDS is more than the general approach; however, the difference is within one second.The error of the SCMDS approach to update 100 rows compared with recomputing the SVD decomposition is 4.0151×10 −5 with standard deviation 3.5775×10 −5 .The error of general approach is 1.4371 × 10 −12 with standard deviation 1.5215 × 10 −12 .The update time of the SCMDS and general approach is shown in Figure 4.The red solid line is the time for the SCMDS and the blue line is that for general approach.We can see that the SCMDS approach is about 8 times of general approach.Hence the SCSVD update approach is not recommended for either saving time or controlling error.

Conclusion
We proposed the fast PCA and fast SVD methods derived from the technique of the SCMDS method.The new PCA and the SVD have the same accuracy as the traditional PCA and the SVD method when the rank of a matrix is much smaller than the matrix size.The results of applying the SCPCA and the SCSVD to a full rank matrix are also quite reliable when the essential rank of matrix is much smaller than the matrix size.Thus, we can use the SCSVD in a huge data application to obtain a good approximated initial value.The updating algorithm of the SCMDS approach is discussed and compared with the general update approach.The performances of the SCMDS approach both in the computational time and error are worse than the general approach.Hence, the SCSVD method is only recommended for computing the SVD for a large matrix but is not recommended for the update approach.

Figure 1 :Figure 2 :
Figure 1: Comparison of the elapsed time between economical SVD (the solid line) and SCSVD (the dashed line).

Figure 3 :
Figure3: The effect of estimated rank to the error.The matrix size is from 500-by-500 to 4000-by-4000 (from the bottom to the top, resp.) and its essential rank is 50 ( = 0.01).When the estimated rank is greater than 50, there is almost no composition error.

Figure 4 :
Figure 4: Compare the update time for SCMDS approach (red line) and the general approach (blue line).