Recursive Sample Scaling Low-Rank Representation

The low-rank representation (LRR) method has recently gained enormous popularity due to its robust approach to the subspace segmentation problem, particularly for corrupted data. In this paper, the recursive sample scaling low-rank representation (RSS-LRR) method is proposed. The advantage of RSS-LRR over traditional LRR is that a cosine scaling factor is introduced, which imposes a penalty on each sample to better minimize the influence of noise and outliers. Specifically, the cosine scaling factor is a similarity measure learned to capture each sample's relationship with the principal components of the low-rank representation in the feature space. In other words, the smaller the angle between an individual data sample and the low-rank representation's principal components, the more likely it is that the data sample is clean. Thus, the proposed method can effectively obtain a good low-rank representation influenced mainly by clean data. Several experiments are performed with varying levels of corruption on ORL, CMU PIE, COIL20, COIL100, and LFW in order to evaluate RSS-LRR's effectiveness against state-of-the-art low-rank methods. The experimental results show that RSS-LRR consistently outperforms the compared methods in image clustering and classification tasks.


Introduction
The limitations of classical feature learning techniques such as PCA [1] made the robust principal component analysis (RPCA) method an efficient choice for dealing with noise and outliers. Specifically, RPCA learns a low-rank subspace directly from the original high-dimensional data to preserve its geometric structure in a low-dimensional subspace. This strategy has shown tremendous improvements in several applications [2][3][4][5]. However, as RPCA only seeks a single low-rank subspace, it may still be limited under noise damage, since high-dimensional data are known to reside in multiple low-dimensional subspaces [6]. Thus, extending RPCA's idea, Liu et al. [4,7] proposed a method named "low-rank representation" (LRR). LRR's main advantage over RPCA lies in its aim to learn the data's multiple low-dimensional subspaces and their membership. This approach makes LRR very robust to the negative effects of noise and outliers [8].
Therefore, considering LRR's robustness mentioned above, several attempts were made in the literature, such as references [9][10][11][12][13], to improve its performance. For example, to deal with data from nonlinear subspaces, Tang et al. [13] proposed robust kernel LRR (RKLRR). Liu et al. [11] then adopted a fixed-rank strategy to accelerate LRR's computation. However, their performance can degrade with insufficient or heavily corrupted samples. That is why Xiao et al. [14] had previously proposed latent LRR (LatLRR) for joint subspace segmentation and feature selection. The idea behind LatLRR is to include hidden data in constructing the dictionary to further improve robustness. However, how to handle gross data corruption remains unsolved: the more the data are corrupted by noise, the larger the degradation in classification and clustering performance.
To address this issue, a recursive sample scaling low-rank representation (RSS-LRR) method is proposed in this paper. Since some data samples are more damaged than others under gross corruption, we estimate each data sample's importance using a cosine scaling factor. This scaling factor measures the angle between each data sample and the low-rank representation's principal components in feature space. In this way, noisy data samples are iteratively subdued to overcome their effect.
Thus, the proposed RSS-LRR can effectively obtain a better low-rank representation than existing methods. Our main contributions are summarized as follows: (1) We propose a novel method named "RSS-LRR," which measures each data sample's importance using a cosine scaling factor. This scaling factor is used in our model to capture each data sample's relationship with the low-rank matrix's principal components. (2) The proposed RSS-LRR method can effectively handle noisy data samples by iteratively restricting them with the sample scaling factor, suppressing noise so as to obtain a good low-rank representation.

Related Work
This section presents a brief review of the baseline methods, RPCA and LRR. First, in this paper, matrices are written in uppercase, e.g., X. Thus, ‖X‖_F and ‖X‖_* denote the Frobenius norm and nuclear norm, respectively. The L_1-norm and L_{2,1}-norm are defined by ‖X‖_1 = Σ_{i,j} |x_{ij}| and ‖X‖_{2,1} = Σ_j ‖x_j‖_2, i.e., the sum of the L_2 norms of the columns x_j.

Robust Principal Component Analysis.
To recover a subspace structure from corrupted data, RPCA was proposed in [2]. Its strategy is to decompose a given data matrix into two component matrices by solving the following optimization problem:

min_{D,E} ‖D‖_* + λ‖E‖_1,  s.t.  X = D + E,  (1)

where the data matrix X = [x_1, x_2, ..., x_n] ∈ R^{m×n} contains n samples in the m-dimensional space, D is the low-rank matrix, E is a sparse error matrix, and λ is the regularization parameter balancing the two terms. Thus, RPCA's main objective in the above formula is to obtain the low-rank and sparse components by combining the nuclear norm and the L_1-norm. This approach is proven to be possible under some assumptions [16]. However, as RPCA assumes a single low-rank subspace, its performance can degrade easily.

Low-Rank Representation.
Liu et al. [5] proposed LRR to tackle RPCA's limitations. Specifically, LRR is focused on pursuing a data representation matrix with the lowest rank.
It achieves this by exploiting the data's self-expressiveness property, using the given data itself as a self-dictionary. This way, each data sample is represented as a linear combination of similar samples belonging to the same class. The optimal low-rank matrix obtained by LRR is defined as follows:

min_{Z,E} ‖Z‖_* + λ‖E‖_p,  s.t.  X = XZ + E,  (2)

where the data matrix X = [x_1, x_2, ..., x_n] ∈ R^{m×n} serves as the self-dictionary, E is used to capture the error components, and ‖.‖_p denotes a certain norm chosen according to the type of noise corruption. For example, while ‖.‖_F is a suitable candidate for data damaged by Gaussian noise, ‖.‖_1 is good for random noise. Besides, ‖.‖_{2,1} is an efficient choice when only a part of the data is contaminated.
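In the noiseless special case (E = 0), the minimizer of this problem is known in closed form as the shape interaction matrix Z* = VV^T built from the skinny SVD of X, a classical result from the LRR literature. A small sketch (function name ours):

```python
import numpy as np

def lrr_noiseless(X, tol=1e-10):
    """Closed-form solution of noiseless LRR (min ||Z||_* s.t. X = XZ):
    the shape interaction matrix Z* = V V^T from the skinny SVD X = U S V^T,
    keeping only singular values above a relative tolerance."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))   # numerical rank of X
    V = Vt[:r].T
    return V @ V.T
```

For data drawn from a union of low-dimensional subspaces, the resulting Z is symmetric and satisfies the self-expressiveness constraint X = XZ exactly.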
Although LRR's approach is very effective, particularly in noisy settings, its performance may degrade with insufficient samples. For this reason, Liu et al. [15] also proposed latent LRR, which exploits both the observed and the hidden data to construct the self-dictionary. This strategy is most useful for image restoration [17]. Subsequently, the work of [18] proposed a method for line pattern noise removal to address contaminated instances; it operates in a transform domain by using a line pattern's directional property. Besides, other efforts were made in references [11,12,[19][20][21][22][23][24][25][26][27][28][29][30][31] to improve LRR's discriminative capability.
Notably, Bing-Kun Bao et al. [23] used a fixed-rank approach so as to reduce LRR's singular value decomposition cost. Zhang et al. [24] proposed two methods: the first can reasonably handle noise interference by decomposing given data into two parts, namely, a low-rank sparse principal feature part and a noise-fitting error part. In [25], Tang et al. introduced a diversity regularization and a rank constraint to suppress the redundancy in different data views. In [26], Zhang et al. presented a method that adaptively preserves local information of salient features, thus guaranteeing a block-diagonal coefficient structure. Meanwhile, a compressive robust subspace clustering method was proposed in [27] for dimensionality reduction. However, because subspace techniques, including LRR, do not provide linear dimensionality reduction (LDR) functionality, the feature selective projection (FSP) [28] was proposed. FSP combines feature extraction, feature selection, and LRR into a unified model to promote robust LDR. Likewise, a method was introduced in [29], which exploits a robust dictionary learning strategy to discover hybrid salient low-rank and sparse representations in a factorized compressed space. Furthermore, in an attempt to keep both similarity and local structures, the hierarchical weighted low-rank representation (HWLRR) [30] was proposed. Similarly, the more recent study [31] focused on capturing cross-view information through an approach that preserves both the diversity and consensus information of each data view.

RSS-LRR.
Since real-world data are not always perfect and are inevitably corrupted by noise in practice, most existing low-rank methods cannot guarantee robust performance. Therefore, noise interference must be carefully handled to resolve this drawback. A rational solution is pursued in this paper, which uses a cosine scaling factor to estimate the importance of each data sample. Essentially, we suppose that clean data samples will receive high significance values, while noisy ones will differ from the principal component of the data. Thus, the cosine scaling factor D is introduced into the LRR formulation in equation (2) via the constraint X̃ = X̃Z + E with X̃ = XD. The motivation behind our approach is straightforward: according to reference [32], Z can be decomposed as UΣV^T, where U and V are the left and right low-rank singular vectors. In other words, U (or V) becomes the pursued projection matrix, such that XU is the data projection in feature space. Therefore, XU_1 is chosen as our maximum projection direction, where U_1 is the column of U corresponding to the maximum singular value in Σ. So, for an outlier data sample x_j, the angle between it and the principal component vector XU_1 will differ more than that of a clean sample x_i, as described in Figure 1(a). Hence, d_i expressed in the following is used to estimate the importance of each data sample:

d_i = |cos θ_i| + ε = |x_i^T (XU_1)| / (‖x_i‖_2 ‖XU_1‖_2) + ε,  (3)

where θ_i is the angle between x_i and XU_1, and ε is a small constant that prevents d_i from being 0. Thus, using the significance factors d_i collected in D = Diag(d_1, ..., d_n), a given data matrix can be scaled to minimize the effect of noisy data samples, allowing the low-rank structure to be realized with clean data, as shown in Figure 1(b). As such, the proposed sample scaling low-rank model is obtained as follows:

min_{Z,E} ‖Z‖_* + λ‖E‖_{2,1},  s.t.  X̃ = X̃Z + E,  X̃ = XD,  (4)

where Z denotes the low-rank matrix and E is used to capture the noise elements, similar to equation (2).
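The scaling step described above can be sketched as follows. This is our paraphrase of the paper's procedure, with illustrative function names: the top left-singular vector of Z forms the projection direction XU_1, and |cos θ_i| + ε serves as the per-sample weight.

```python
import numpy as np

def cosine_scaling(X, Z, eps=1e-4):
    """Per-sample importance d_i = |cos(theta_i)| + eps, where theta_i is the
    angle between sample x_i and the principal projection direction X @ u1
    (u1: left singular vector of Z for its largest singular value).
    A paraphrase of the paper's scaling factor, not its exact code."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    u1 = U[:, 0]
    p = X @ u1                                    # principal direction in feature space
    p = p / (np.linalg.norm(p) + eps)
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + eps)
    return np.abs(Xn.T @ p) + eps                 # near 0 when theta_i is close to pi/2

def scale_samples(X, d):
    """New dictionary X_tilde = X @ diag(d): noisy samples are shrunk."""
    return X * d[np.newaxis, :]
```

A sample orthogonal to the principal direction receives a weight near ε and is suppressed in the scaled dictionary, while samples aligned with it keep weights near 1.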
From the above formulation, one can easily check that the scaling factor suits our goal: noisy points have angles close to π/2 and are assigned lower values, which allows them to be detected. Illustratively, if θ_i ≈ π/2, then d_i is almost 0, so x_i is subdued with x̃_i = x_i d_i. Thus, X̃ = XD is used to obtain new training data. Besides, applying SVD to X̃ yields both new singular vectors U_X̃ and new projection directions of the sample space, obtained after suppressing the noisy data samples. As a result, our proposed method can learn an optimal low-rank structure using the new data X̃, where the points closer to the principal component vector are enhanced.
We summarize our model's main characteristics as follows: (i) Unlike existing low-rank methods, which use the input data X itself as the dictionary, a new dictionary X̃ = XD is presented by imposing a recursive scaling factor D on X to suppress the effect of noisy samples. (ii) Specifically, our recursive modeling is very useful for learning a good low-rank representation, especially when the data are heavily contaminated. As shown in equation (4), our focus is to find Z by minimizing equation (4) with the constraint X̃ = X̃Z + E, thus allowing Z to preserve a better low-rank structure using only data samples with large cosine similarity (referred to as clean data samples, as they have a smaller angle with the principal component). In other words, Z is obtained by equation (4) using the clean data X̃.

Optimization.
In this section, we propose an optimization algorithm to solve equation (4). First, following standard practice, we introduce a variable J = Z to relax equation (4). Thus, equation (4) can be recast as

min_{Z,E,J} ‖J‖_* + λ‖E‖_{2,1},  s.t.  X̃ = X̃Z + E,  Z = J.  (5)

The augmented Lagrangian function of equation (5) is given as

L = ‖J‖_* + λ‖E‖_{2,1} + ⟨M_1, X̃ − X̃Z − E⟩ + ⟨M_2, Z − J⟩ + (μ/2)(‖X̃ − X̃Z − E‖_F^2 + ‖Z − J‖_F^2),  (6)

where M_1 and M_2 are Lagrange multipliers, μ > 0 is a penalty parameter, and ‖·‖_F denotes the Frobenius norm of a matrix. Many convex optimization techniques relying on nuclear-norm regularization have been developed [33][34][35], and the optimization problem can be solved via the method in [36].

Computation of J.
According to references [37,38], nuclear-norm minimization methods have stable performance. For computing J, we rewrite equation (6) as

J = argmin_J (1/μ)‖J‖_* + (1/2)‖J − (Z + M_2/μ)‖_F^2,  (7)

which is solved by singular value thresholding.

Computation of Z.
By fixing J and E and substituting X̃ = XD, Z can be updated using the following formula:

Z = (I + X̃^T X̃)^{−1} (X̃^T (X̃ − E) + J + (X̃^T M_1 − M_2)/μ).  (8)
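The J-update above is the well-known singular value thresholding (SVT) operator, i.e., the proximal operator of the nuclear norm. A minimal sketch (function name ours):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*,
    i.e., argmin_J tau * ||J||_* + 0.5 * ||J - A||_F^2 (used for the J-update)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

Applied with A = Z + M_2/μ and tau = 1/μ, this yields the minimizer of equation (7); the operator simply shifts every singular value down by tau and clips at zero.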

Computation of E.
With X̃, Z, and J fixed, E can be solved as follows:

E = argmin_E (λ/μ)‖E‖_{2,1} + (1/2)‖E − (X̃ − X̃Z + M_1/μ)‖_F^2.  (9)

Following reference [39], equation (9) can be solved by the following lemma.
Lemma 1 (see [40]). Let Q = [q_1, q_2, ..., q_n] be a given matrix, and let W^* = argmin_W λ‖W‖_{2,1} + (1/2)‖W − Q‖_F^2. Then the ith column of W^* is

w_i^* = ((‖q_i‖_2 − λ)/‖q_i‖_2) q_i  if ‖q_i‖_2 > λ,  and 0 otherwise.  (10)

Based on Lemma 1, given a matrix Q, W^* can be obtained directly using the above formula, making the computation of E very efficient. The complete solution is given in Algorithm 1.
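Lemma 1 amounts to a column-wise shrinkage operator, the proximal operator of the L_{2,1}-norm. A minimal sketch (function name ours):

```python
import numpy as np

def prox_l21(Q, lam):
    """Column-wise shrinkage from Lemma 1: the minimizer of
    lam * ||W||_{2,1} + 0.5 * ||W - Q||_F^2 scales each column q_i by
    (||q_i|| - lam) / ||q_i|| when ||q_i|| > lam and zeroes it otherwise."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(norms - lam, 0.0) / np.where(norms > 0, norms, 1.0)
    return Q * scale[np.newaxis, :]
```

Columns with L_2 norm below λ are zeroed outright, which is what makes the L_{2,1} penalty suitable for sample-specific corruption: whole noisy columns of E are discarded at once.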

Complexity Analysis.
This section gives an analysis of the computational cost of Algorithm 1. In Step 6 of Algorithm 1, the values of J_k, Z_k, and E_k are warm-started from J_{k−1}, Z_{k−1}, and E_{k−1} in the inner loop. Therefore, as the number of iterations k increases, the time spent pursuing the low-rank variable Z_k decreases quickly. From the above discussion, the computing time in t_k is far shorter than that in t_1.
The cost of SVD is O(n^3), where n denotes the number of data vectors. Besides, computing Z costs O(n^2 m + n^3), where m is the data dimension. E in Step 10 then costs O(dn). Furthermore, supposing that the D subblock requires k iterations until convergence, the J, Z, and E subproblems will be solved for t_k iterations each. Therefore, the combined cost is O(t(n^3 + n^2 m + dn)), where t denotes the number of inner iterations. Thus, when n ≥ m, the cost's upper bound is O(tn^3). Accordingly, the overall computational cost of the proposed method is O(ktn^3).

Background Modeling from Surveillance Video
This experiment is performed using surveillance video with various illumination settings. It is composed of a sequence of 200 grayscale frames of 32 × 32 dimensions. Each algorithm's effectiveness is evaluated using the precision, recall, and F-score metrics, and the parameters are tuned according to the corresponding literature. Precisely, background modeling [44] is measured by manually cutting out the activities. In this experiment, 50% of the frames are randomly selected as the training set, while the remaining frames are treated as the testing set. In Figure 2, we show each algorithm's background recovery and activity segmentation performance. Additionally, it can be observed from Table 1 that RSS-LRR outperforms the other methods in activity segmentation, as it generalizes better to the testing frames.

Experimental Results.
This result further substantiates the effectiveness of our sample scaling factor approach, as a more reliable low-rank object is obtained than with the compared methods. For each dataset, the images are resized to 32 × 32 dimensions in our experiments. As illustrated in Figure 3, these datasets are then corrupted with 5%, 10%, 15%, and 20% random pixel noise to demonstrate each algorithm's robustness to noise. Thus, a spectral clustering algorithm is applied to the similarity matrix of each algorithm to obtain the clustering results, with ten repeated runs to ensure fairness [45]. Tables 2-4 display the clustering accuracy of each algorithm on ORL, CMU PIE, and COIL20, respectively. It is clear that the accuracy of our proposed RSS-LRR method consistently beats those of the compared methods on all three datasets. For example, on the ORL dataset, the accuracy of our proposed method is about 1% higher than that of its closest competitor on clean data. Then, gradually increasing the noise level, one may notice that all the algorithms show reduced performance. However, our proposed method shows more robustness than the other methods, especially with 20% noise damage. For instance, the clustering accuracy of LRR drops from 0.7505 to 0.3497, while that of the proposed method has a smaller drop, from 0.7609 to 0.5650.
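The clustering step described above can be sketched as follows, assuming the common practice of symmetrizing |Z| into an affinity matrix and embedding with the normalized Laplacian. The tiny k-means with deterministic farthest-point initialization stands in for a library call and is not the paper's exact setup.

```python
import numpy as np

def spectral_clustering_from_Z(Z, n_clusters, max_iter=100):
    """Sketch of the clustering step: build a symmetric affinity
    W = (|Z| + |Z|^T)/2 from the representation matrix, embed the samples
    with the bottom eigenvectors of the normalized Laplacian, then run a
    small k-means over the row-normalized embedding."""
    W = (np.abs(Z) + np.abs(Z).T) / 2
    d = W.sum(axis=1)
    Dis = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(W)) - Dis @ W @ Dis          # normalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    F = vecs[:, :n_clusters]                    # eigenvectors of smallest eigenvalues
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    # deterministic farthest-point initialization, then Lloyd's iterations
    centers = [0]
    for _ in range(1, n_clusters):
        dist = np.min(((F[:, None] - F[centers][None]) ** 2).sum(-1), axis=1)
        centers.append(int(np.argmax(dist)))
    C = F[centers]
    for _ in range(max_iter):
        labels = np.argmin(((F[:, None] - C[None]) ** 2).sum(-1), axis=1)
        newC = np.array([F[labels == k].mean(axis=0) if np.any(labels == k) else C[k]
                         for k in range(n_clusters)])
        if np.allclose(newC, C):
            break
        C = newC
    return labels
```

On a block-diagonal representation matrix, the bottom eigenvectors act as cluster indicators, so samples from the same block land in the same cluster.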

Experimental Results.
Similarly, in Tables 3 and 4, we can also see that the proposed method maintains its advantage over the other methods on the CMU PIE and COIL20 datasets. In particular, RSS-LRR's results at a 20% corruption level show that its accuracy is about 4% better than that of GODEC+, which is 0.5069 on the COIL20 dataset. Accordingly, we present the clustering variation graph of each method in Figure 4, which further reveals the robustness of the proposed method to noise. Thus, it is safe to conclude that the proposed method's performance is steadier than that of the other algorithms, especially at high levels of corruption. We attribute this to our scaling factor approach, which iteratively overcomes the noise effect.

Experimental Settings.
In order to evaluate the robustness of RSS-LRR on image recognition under different levels of contiguous occlusions in images [19], we randomly add 6 × 6 and 8 × 8 block occlusions to each dataset, as illustrated in Figure 5. Then, 50% of the samples in each dataset are selected as the training set and the rest as the testing set. Besides, we compare our proposed method's performance with that of similar methods, LRR, LRRLC, latent LRR (LLRR), GLRR, and GODEC+, by adopting the relevant experimental settings in Section 4.2. Each algorithm's classification accuracy is evaluated using the K nearest neighbor (KNN) classifier.

Algorithm 1: RSS-LRR.
(1) Input: training dataset X, regularization parameter λ.
(2) Initialize: t = 0, k = 0, J_0 = 0, Z_0 = 0, E_0 = 0, ϵ_1 = 10^−6, ϵ_2 = 10^−6, ε = 0.0001, D = Diag(ones(n, 1)).
(3) While not converged do
(4) Update X̃ by X̃ = XD.
(5)-(9) Warm-start J_k, Z_k, E_k from the previous outer iteration and update J and Z while fixing the others by equations (7) and (8).
(10) Update E while fixing the others by equation (9).
(11) Update the multipliers M_1 and M_2.
(12) Update μ by μ = min(ρμ, μ_max).
(13) Update t_k = t_k + 1.
(14) Check the convergence conditions.

Experimental Results.
In Table 5, the average classification accuracies are obtained on the ORL dataset under the two levels of contiguous occlusion. From Table 5, we can see that the accuracy of RSS-LRR is only slightly higher than that of GODEC+. On the other hand, RSS-LRR's accuracy is significantly better than that of LRRLC, LLRR, and LRR, by about 5%, 4%, and 12%, respectively, under 6 × 6 occlusion. For example, the classification accuracy of RSS-LRR is 0.6015, while that of LRR is 0.4891. This indicates that our proposed RSS-LRR can obtain a better low-rank representation than existing methods. From Table 6, we can see that the accuracy of RSS-LRR is better than that of LRR by about 6%; under 6 × 6 occlusion, it is better than that of GODEC+ and GLRR by about 2%. Thus, our proposed RSS-LRR demonstrates outstanding classification accuracy among all algorithms. In Table 7, the accuracy of RSS-LRR is higher than that of LRRLC and LRR by about 3% and 6% under 6 × 6 occlusion. For example, the classification accuracy of GLRR is 0.6985 and that of RSS-LRR is 0.7130, while the result of LRR is 0.6522 under 6 × 6 occlusion. Figures 6(a)-6(f) illustrate the variations of classification accuracy with increasing feature dimensionality under different occlusions on the ORL, CMU PIE, and COIL20 datasets.
From Figures 6(a) and 6(b), we can see that RSS-LRR achieves the highest accuracies among all algorithms. In Figures 6(c) and 6(d), the performance of RSS-LRR becomes better than that of the others on the CMU PIE dataset when the feature dimensionality exceeds 70. From Figures 6(e) and 6(f), on the COIL20 dataset, the accuracies of RSS-LRR gradually show its superiority when the feature dimensionality exceeds 70.
From the above discussion, it can be seen that our proposed RSS-LRR shows better performance in classification accuracy compared with the other algorithms.

Experimental Settings.
In this experiment, RSS-LRR's effectiveness is further evaluated on two larger datasets, namely, COIL100 and LFW. The relevant settings from the previous sections are adopted for this large-scale experiment. We give a brief description of each dataset as follows:

COIL100: it has 7200 images of 100 objects, amounting to 72 images per object, with each image taken at pose intervals of 5 degrees.

LFW: the Labeled Faces in the Wild (LFW) dataset originally contains more than 13000 face images, mainly from Internet sources. However, 2484 face images were extracted from 38 classes in our experiments due to fewer samples in some categories. Each image was resized to 64 × 64 pixels, yielding 4096 features per image.

Experimental Results.
From the classification results in Tables 8 and 9, it can be noticed that RSS-LRR's performance is consistently better than that of the compared methods. For instance, while RSS-LRR's accuracy of 0.6052 on the COIL100 dataset corrupted with 6 × 6 block occlusion (Table 8) is slightly better than that of the second-best GODEC+ by over 1%, it is better by over 2% under 8 × 8 block occlusion. Similar results are obtained on the LFW dataset (Table 9), where the proposed method's performance is also better than that of the closely following GODEC+ method, except that a larger margin of over 4% is obtained under 6 × 6 block occlusion.
From the clustering results displayed in Tables 10 and 11, the following can be observed: (i) Although all methods obtain comparable performance on both datasets, their accuracies degrade significantly with increasing noise. This, however, is not unexpected, because higher corruption levels mean that more discriminative data features are destroyed, making it difficult to accurately group similar data samples in the same cluster.
(ii) Furthermore, while the relatively newer method GODEC+ shows more robustness to noise than the older methods, its clustering accuracy on clean data is slightly lower than that of the other methods on the LFW dataset, perhaps due to class imbalance in this dataset. (iii) Overall, RSS-LRR's advantage grows with the noise level. For example, its performance on COIL100 under 0% noise is merely 1% better than that of its closest competitor GODEC+, but it is over 2% better under 20% noise. The same can be said of the LFW dataset, where RSS-LRR's clustering accuracy is only about 2% better than that of LRR on clean data, whereas it is more than 4% better than that of GODEC+, the closest result, under the 20% noise level.
Additionally, to further demonstrate its merit, RSS-LRR's clustering and classification performances on the large-scale datasets are compared with those of two more recent state-of-the-art (SOTA) methods, namely, nonnegative sparse discriminative low-rank representation (NSDLRR) [47] and low-rank and collaborative representations for hyperspectral anomaly detection (LRCRD) [48]. From the classification results displayed in Figure 7, it can be observed that all three methods obtain comparable results on both datasets. However, RSS-LRR shows more robustness, especially on the LFW dataset. Similarly, the clustering results are shown in Figure 8; the results are also close, with the proposed method displaying the best overall performance. Finally, as shown in Figure 9, the proposed algorithm has a strong convergence property, converging within 150 iterations on the COIL100 and ORL datasets.

Conclusion
In this paper, we propose a recursive sample scaling low-rank representation method named "RSS-LRR." Different from existing methods, each data sample's importance is estimated by introducing a cosine scaling factor. This scaling factor is used to capture each sample's relationship with the low-rank representation's principal components in the feature space. Thus, our proposed model can effectively reduce the noise effect by iteratively reducing the importance of noisy samples while learning the robust low-rank matrix. Several experimental results on well-known benchmark datasets, including various experiments conducted on grossly corrupted data, demonstrate that RSS-LRR performs better in clustering and classification tasks than the state-of-the-art methods. In future work, we will extend RSS-LRR's idea to multiview data.

Data Availability
The datasets used in this study are open benchmark datasets that are allowed for use in research. The following is a description of, and a link to, each one of them. ORL contains face images of forty individuals, with each of them contributing ten distinct images under various conditions, such as facial details and different facial expressions: http://cam-orl.co.uk/facedatabase.html. CMU PIE is a face image repository with images of sixty-eight individuals under different settings. It includes thirteen different poses, four different expressions, and forty-two different illuminations: https://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html. COIL20 is an object image dataset consisting of 20 separate objects. Each object contributes 72 grayscale images, amounting to a total of 1440 images: https://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php. COIL100 has 7200 images of 100 objects, amounting to 72 images per object, with each image taken at pose intervals of 5 degrees: https://www.kaggle.com/jessicali9530/coil100. The Labeled Faces in the Wild (LFW) dataset originally contains more than 13000 face images, mainly from Internet sources. However, 2484 face images were extracted from 38 classes in our experiments due to fewer samples in some categories. Each image was resized to 64 × 64 pixels, yielding 4096 features per image: http://viswww.cs.umass.edu/lfw/.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this study.