One-Step Robust Low-Rank Subspace Segmentation for Tumor Sample Clustering

Clustering of tumor samples can help identify cancer types and discover new cancer subtypes, which is essential for effective cancer treatment. Although many traditional clustering methods have been proposed for tumor sample clustering, advanced algorithms with better performance are still needed. Low-rank subspace clustering has become a popular approach in recent years. In this paper, we propose a novel one-step robust low-rank subspace segmentation method (ORLRS) for clustering tumor samples. For a gene expression data set, we seek its lowest-rank representation matrix and the noise matrix. By imposing a discrete constraint on the low-rank matrix, ORLRS learns the cluster indicators of the subspaces directly, without performing spectral clustering, i.e., it performs the clustering task in one step. To improve the robustness of the method, the capped norm is adopted to remove the extreme data outliers in the noise matrix. Furthermore, we derive an efficient optimization algorithm to solve the ORLRS problem. Experiments on several tumor gene expression data sets demonstrate the effectiveness of ORLRS.


Introduction
A tumor is a group of cells that have undergone unregulated growth and often form a mass or lump. Analyzing tumor gene expression data is critical for revealing the pathogenesis of cancer. The advances of various sequencing technologies have made it possible to measure the expression levels of thousands of genes simultaneously [1]. An increasingly pressing challenge is how to interpret these gene expression data to gain insights into the mechanisms of tumors [2]. Many advanced machine learning algorithms [3][4][5][6][7][8][9] have thus been proposed to analyze such data. Among them, clustering can be used to discover tumor samples with similar molecular expression patterns [10,11].
Many traditional clustering methods, such as hierarchical clustering (HC) [12,13], self-organizing maps (SOM) [14], nonnegative matrix factorization (NMF) [15,16], and principal component analysis (PCA) [17][18][19][20], have been used for gene expression data clustering. Gene expression data often contain structures that can be represented and processed by parametric models. Linear subspaces are a natural choice for characterizing a given set of data since they are easy to compute and often effective in real applications. Subspace methods, such as NMF, are essentially based on the assumption that the data are approximately drawn from a low-dimensional subspace. In recent years, these methods have been gaining much attention. For example, Yu et al. proposed a correntropy-based hypergraph regularized NMF (CHNMF) method for clustering and feature selection [21]. Specifically, the correntropy is used in the loss term of CHNMF instead of the Euclidean norm to improve the robustness of the algorithm. CHNMF also uses hypergraph regularization to explore high-order geometric information among sample points. Jiao et al. proposed a hypergraph regularized constrained nonnegative matrix factorization (HCNMF) method for selecting differentially expressed genes and tumor sample classification [22]. HCNMF incorporates a hypergraph regularization constraint to consider higher-order data sample relationships. A nonnegative matrix factorization framework based on multisubspace cell similarity learning for unsupervised scRNA-seq data analysis (MscNMF) was proposed by Wang et al. [23]. MscNMF can learn the gene features and cell features of different subspaces, and the correlation and heterogeneity between cells are more prominent in multiple subspaces, so the final cell similarity learning is more satisfactory.
However, real data can rarely be well represented by a single subspace. A more reasonable model is to assume that the data lie near multiple subspaces, i.e., the data are considered as samples approximately drawn from a mixture of multiple low-dimensional subspaces. Subspace clustering (or segmentation) has been proposed to improve clustering accuracy under this assumption. The goal of subspace clustering is to recover these multiple low-dimensional subspaces, with each subspace corresponding to a cluster. Subspace clustering has obtained promising results in previous studies, and subspace clustering methods have found widespread applications in many areas, such as pattern recognition [24], image processing [25], and bioinformatics [26].
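The union-of-subspaces assumption is easy to make concrete with synthetic data. The sketch below (plain NumPy; the dimensions and the `union_of_subspaces` helper are illustrative, not taken from the paper) draws samples from two independent low-dimensional subspaces and checks that the stacked data matrix has rank equal to the sum of the subspace dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def union_of_subspaces(m=50, n_per=30, dims=(3, 3)):
    """Generate samples drawn from independent low-dimensional subspaces of R^m."""
    blocks, labels = [], []
    for i, d in enumerate(dims):
        basis = np.linalg.qr(rng.standard_normal((m, d)))[0]  # orthonormal basis
        coeff = rng.standard_normal((d, n_per))               # random coordinates
        blocks.append(basis @ coeff)
        labels += [i] * n_per
    return np.hstack(blocks), np.array(labels)

X, y = union_of_subspaces()
print(X.shape)                    # (50, 60): m genes x n samples
print(np.linalg.matrix_rank(X))   # 6: each subspace contributes its dimension
```

The data matrix is low-rank even though no single 3-dimensional subspace contains all samples, which is exactly the structure subspace clustering exploits.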
When the data are clean, i.e., the samples are strictly drawn from multiple subspaces, several existing methods, such as sparse subspace clustering (SSC) [27], low-rank representation (LRR) [5], and the low-rank model with discrete group structure constraint (LRS) [28], are able to solve the subspace clustering problem. SSC clusters data drawn from multiple low-dimensional subspaces based on sparse representation (SR) [29]. Since low-rank structure is well suited to matrix recovery, the multiple subspaces can be exactly recovered by LRR. Recently, many excellent works based on low-rank representation have been published. For example, Tang et al. proposed a multiview subspace clustering model that learns a joint affinity graph based on low-rank representation with diversity regularization and a rank constraint [30]. This method can effectively suppress redundancy and enhance the diversity of different feature views; in addition, the cluster number is used to promote affinity graph learning through the rank constraint. In [31], an unsupervised linear feature selective projection (FSP) method was proposed for feature extraction with low-rank embedding and dual Laplacian regularization. FSP can exploit the inherent relationships within the data and effectively suppress the influence of noise. LRR involves two steps in the clustering task: building the affinity matrix and performing spectral clustering. How to define a good affinity matrix is crucial. Furthermore, spectral clustering transforms the clustering problem into a graph segmentation problem, so the choice of segmentation criterion directly affects the clustering results. To address these concerns, LRS directly grasps the indicators of the different subspaces via a discrete constraint, so that multiple low-rank subspaces can be obtained explicitly. Furthermore, Nie et al. introduced a piecewise function to relax the rank constraint, which makes LRS better at handling noisy data sets than the preliminary version [32].
As pointed out in [33], one major challenge of subspace clustering is dealing with the outliers that exist in data. Therefore, robust subspace clustering has become an active research topic. To address the robustness issue, the main idea is to explore L_{2,1}-norm-based objective functions, since the nonsquared residuals of the L_{2,1}-norm reduce the effects of data outliers. In [34,35], the L_{2,1}-norm is adopted in robust PCA (RPCA) for detecting outliers. In [33], Liu et al. proposed a robust LRR model via the L_{2,1}-norm for subspace clustering. Although the L_{2,1}-norm is robust to outliers, it still suffers from extreme data outliers: the L_{2,1}-norm only reduces, rather than completely removes, their effects. The capped norm is a more robust strategy than the L_{2,1}-norm because it can remove the effects of the outliers entirely. It has recently been studied in many applications [36,37].
In this paper, a one-step robust low-rank subspace segmentation (ORLRS) method based on the discrete constraint and the capped norm is proposed for clustering tumor samples. For a data set X ∈ R^{m×n} with m genes and n samples, a low-rank representation matrix A ∈ R^{m×n} and a noise matrix E ∈ R^{m×n}, i.e., X = A + E, are sought. The low-rank representation of the i-th subspace A_i can be denoted as rank(A_i). Here, we impose a discrete constraint on diagonal matrices I_i ∈ R^{n×n} to obtain the low-rank representation rank(XI_i), where the diagonal entries of I_i belong to {0, 1} and Σ_{i=1}^c I_i = I (c is the total number of subspaces and I is the identity matrix). The indicators of the i-th cluster are encoded in I_i. In contrast to traditional low-rank-based models, we can directly learn the cluster indicators. To avoid trivial solutions and approximate the low-rank constraint, the ranks of all subspaces are minimized simultaneously via the Schatten p-norm, which is a better relaxation of the rank function than the nuclear norm [38]. For the noise matrix E, the capped norm is used to improve robustness. We define θ as a thresholding parameter for identifying extreme data outliers; the capped norm of E is then formulated as ‖E‖_Capped = Σ_{i=1}^n min(‖E_i‖_2, θ). This function caps the contribution of a column E_i at θ whenever ‖E_i‖_2 exceeds θ; hence, it is more robust to outliers than the L_{2,1}-norm. Meanwhile, we derive an efficient optimization algorithm to solve ORLRS with a rigorous theoretical analysis. The main contributions of our paper are as follows: ① Compared with traditional low-rank representation-based methods, ORLRS obtains the clustering result directly by learning a subspace indicator matrix from the low-rank representation matrix without spectral clustering. This avoids the graph construction step of spectral clustering and makes the clustering process simpler. ② We introduce the capped norm into our model and form a novel objective function for the gene expression data clustering task.
The capped norm is used to constrain the noise matrix and improve the robustness of ORLRS. ③ Optimizing the objective function of ORLRS is a nontrivial problem; thus, we derive a new optimization algorithm to solve it and give a rigorous convergence analysis of ORLRS. The remainder of the paper is structured as follows. In Section 2, the proposed ORLRS is presented together with its theoretical analysis. Experimental results are presented in Section 3. In Section 4, conclusions are given.

Methods
We start with a brief introduction of several classical clustering methods. Then, the proposed ORLRS is presented, and its optimal solution and convergence analysis are provided.
The LRR model [33] solves

min_{A,E} ‖A‖_* + λ‖E‖_{2,1} s.t. X = DA + E, (1)

where ‖A‖_* = Σ_i σ_i(A) is the nuclear norm of A, the L_{2,1}-norm of E can detect outliers with column-wise sparsity, D is a dictionary, and λ > 0 is a balance parameter.
A brief explanation of the LRR subspace clustering process is as follows. First, the low-rank problem in equation (1) is solved. Then, the optimal solution A* of equation (1) is used to build the affinity matrix (|A*| + |(A*)^T|)/2, where |·| denotes the element-wise absolute value. Finally, the data are clustered by spectral clustering [39].
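The two-step LRR pipeline described above can be sketched in code. For clean data with the dictionary D = X, the LRR minimizer is known to be the shape-interaction matrix VV^T from the skinny SVD of X; the sketch below uses that closed form as a stand-in for a full LRR solver (a simplification for illustration, not the paper's algorithm) and then applies the affinity construction and spectral clustering from the text:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def lrr_affinity(X, r):
    """Affinity from the closed-form clean-data LRR solution A* = V V^T."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:r].T                       # right singular vectors of the rank-r data
    A = V @ V.T                        # shape-interaction matrix
    return (np.abs(A) + np.abs(A.T)) / 2   # symmetric affinity (|A*| + |A*^T|)/2

# toy data: 20 + 20 samples from two independent 2-dim subspaces of R^40
rng = np.random.default_rng(1)
B1 = np.linalg.qr(rng.standard_normal((40, 2)))[0]
B2 = np.linalg.qr(rng.standard_normal((40, 2)))[0]
X = np.hstack([B1 @ rng.standard_normal((2, 20)), B2 @ rng.standard_normal((2, 20))])

W = lrr_affinity(X, r=4)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
```

For independent subspaces the shape-interaction matrix is block diagonal, so spectral clustering recovers the two groups exactly.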

One-Step Robust Low-Rank Subspace Clustering. In this paper, we propose the one-step robust low-rank subspace clustering (ORLRS) method via the discrete constraint and the capped norm. Different from LRR, ORLRS clusters the data by learning the indicators directly.
Suppose the data matrix X has c subspaces X_1, X_2, ..., X_c; the low-rank representation of each subspace needs to be optimized. In the clustering task, we want each subspace to form its own cluster. To obtain the low-rank representation of each subspace, rank(X_i) would have to be minimized, which has a trivial solution.
Therefore, we solve the problem in another way. We define a cluster indicator matrix C ∈ R^{c×n}: C(i, j) = 1 if the j-th sample belongs to the i-th subspace, and C(i, j) = 0 otherwise. Then, c diagonal matrices I_1, I_2, ..., I_c ∈ R^{n×n} are defined, where the diagonal elements of I_i (1 ≤ i ≤ c) are formed by the i-th row of C, so that Σ_{i=1}^c I_i = I with I the identity matrix.
Then, XI_i represents the i-th subspace of X; that is, rank(X_i) can be rewritten as rank(XI_i). We can obtain the clustering labels in one step by directly optimizing I_i [28].
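The indicator construction can be illustrated directly. The helper below (illustrative, not from the paper) builds the diagonal matrices I_i from a label vector so that X @ I_i keeps exactly the columns of the i-th subspace and zeroes the rest:

```python
import numpy as np

def indicator_matrices(labels, c):
    """Build the diagonal selection matrices I_1..I_c from a label vector.

    Row i of the cluster indicator matrix C (c x n) forms the diagonal of I_i,
    so X @ I_i keeps the columns of X in cluster i and zeroes the others.
    """
    n = len(labels)
    C = np.zeros((c, n))
    C[labels, np.arange(n)] = 1        # one-hot rows of the indicator matrix
    return [np.diag(C[i]) for i in range(c)]

labels = np.array([0, 1, 0, 1, 1])
I = indicator_matrices(labels, 2)
X = np.arange(10, dtype=float).reshape(2, 5)
print(X @ I[0])                        # columns 0 and 2 survive, others are zeroed
```

Note that the I_i sum to the identity matrix, which is exactly the discrete constraint Σ_{i=1}^c I_i = I from the text.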
Finally, the one-step low-rank subspace clustering problem can be defined as

min_{I_i} Σ_{i=1}^c ‖XI_i‖_{S_p}^p, (2)

where ‖XI_i‖_{S_p}^p is the Schatten p-norm of XI_i raised to the power p. The clustering indicators of each subspace can be obtained directly from the optimized diagonal matrices {I_i}_{i=1}^c. However, equation (2) is sensitive to data outliers in practical problems since it does not consider the noise in the data. To address the robustness problem, we represent the gene expression data X ∈ R^{m×n} with m genes and n samples as the sum of a low-rank representation matrix A ∈ R^{m×n} and a noise matrix E ∈ R^{m×n}, i.e., X = A + E, which is the same strategy as in RPCA. Our one-step low-rank subspace clustering problem can then be written as

min_{A,E,I_i} Σ_{i=1}^c ‖AI_i‖_{S_p}^{2k} + λ‖E‖_L s.t. X = A + E, (3)

where λ > 0 is a balance parameter and ‖·‖_L indicates a certain regularization strategy. Note that the Schatten p-norm is used to approximate the low-rank problem in equation (3) since it is a better relaxation of the rank constraint than the nuclear norm [38]. The Schatten p-norm of a matrix is the l_p norm of its singular values, ‖A‖_{S_p} = (Σ_i σ_i^p(A))^{1/p}. In [38], the convergence of the Schatten p-norm with 0 < p ≤ 2 is proved. Here, we set 0 < 2k ≤ 2 to guarantee the convergence of the first term in equation (3), so the range of k is 0 < k ≤ 1.
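The Schatten p-norm used above is simply the l_p norm of the singular values; a minimal sketch:

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm: the l_p norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return float((s ** p).sum() ** (1.0 / p))

A = np.diag([3.0, 4.0])
print(schatten_norm(A, 2))   # p = 2 recovers the Frobenius norm: 5.0
print(schatten_norm(A, 1))   # p = 1 recovers the nuclear norm: 7.0
```

As p decreases toward 0, (Σ σ_i^p) approaches the number of nonzero singular values, i.e., the rank, which is why small p gives a tighter rank relaxation than the nuclear norm.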
To seek better robustness against outliers, we adopt the capped norm to regularize the noise matrix E, i.e., ‖E‖_Capped. Then, equation (3) becomes

min_{A,E,I_i} Σ_{i=1}^c ‖AI_i‖_{S_p}^{2k} + λ Σ_{i=1}^n min(‖E_i‖_2, θ) s.t. X = A + E, (4)

where θ > 0 is a thresholding parameter for identifying data outliers. If ‖E_i‖_2 > θ, we consider E_i an extreme outlier, and its contribution is capped at θ; in this way, the influence of extreme outliers is fixed. For the other data points, min(‖E_i‖_2, θ) reduces to ‖E_i‖_2, i.e., the L_{2,1}-norm. That is, if θ is set to ∞, ‖E‖_Capped is equivalent to ‖E‖_{2,1}. Thus, the capped norm is a more robust strategy than the L_{2,1}-norm.
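A minimal sketch of the capped norm, computed column-wise over samples (the toy matrix is illustrative); comparing it against the uncapped column-norm sum shows how an extreme outlier's influence is bounded:

```python
import numpy as np

def capped_norm(E, theta):
    """‖E‖_Capped = sum_i min(‖E_i‖_2, theta) over the columns (samples) of E."""
    col_norms = np.linalg.norm(E, axis=0)
    return float(np.minimum(col_norms, theta).sum())

E = np.array([[3.0, 0.1, 100.0],
              [4.0, 0.2,   0.0]])   # third column is an extreme outlier
print(capped_norm(E, theta=10.0))          # outlier contributes only theta = 10
print(np.linalg.norm(E, axis=0).sum())     # L_{2,1}-norm keeps its full weight
```

With θ = 10, the outlier column contributes 10 instead of 100, while the two inlier columns are penalized exactly as under the L_{2,1}-norm.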
As a result, ORLRS provides a more robust low-rank subspace clustering model through the capped norm, and the clustering indicators of each subspace can be obtained directly from the optimized diagonal matrices {I_i}_{i=1}^c. We propose an efficient optimization algorithm to solve equation (4) in Section 2.3.

Optimization Algorithm.
The objective function in equation (4) of ORLRS is nonconvex; thus, jointly optimizing A, E, and {I_i}_{i=1}^c is extremely difficult. The augmented Lagrange multiplier (ALM) algorithm is used to optimize equation (4). The Lagrangian function of equation (4) can be written as

min_{A,E,I_i} Σ_{i=1}^c ‖AI_i‖_{S_p}^{2k} + λ‖E‖_Capped + ⟨Y, X − A − E⟩ + (μ/2)‖X − A − E‖_F^2 + g(Λ, {I_i}_{i=1}^c), (5)

where Y is a Lagrange multiplier, μ > 0 is a penalty parameter, ‖·‖_F is the Frobenius norm, and g(Λ, {I_i}_{i=1}^c) encodes the discrete constraints on {I_i}_{i=1}^c. We rewrite equation (5) as equation (6) by completing the square and then divide equation (6) into three subproblems, each updating one variable while fixing the others.

Fixing E and {I_i}_{i=1}^c to Optimize A. The A-subproblem of equation (6) is equation (7), where B = X − E + (Y/μ).
Using Lemma 1, the first term in equation (7) can be rewritten as equation (8); we thus convert the first term in equation (7) to equation (9). Then, equation (7) can be represented as equation (10). Taking the derivative w.r.t. A and setting it to zero gives equation (11), where H = k(A^T A)^{k−1}. So, we can achieve the optimal A in equation (12).

Fixing A and {I_i}_{i=1}^c to Optimize E. Here, the E-subproblem of equation (6) can be denoted as equation (13), where F = X − A + (Y/μ). It can be easily verified that the derivative of equation (13) is equivalent to the derivative of equation (14), with the weights o_i defined in equation (15). Equation (14) can be formulated as equation (16), where O is a diagonal matrix with O_ii = o_i. The problem in equation (16) can be optimized with the iterative reweighted optimization strategy. Fixing O, taking the derivative w.r.t. E, and setting it to zero gives equation (17), so we can obtain the optimal E in equation (18). Fixing E, the updating rule for O is given in equation (19).
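The reweighted E-update described above can be sketched as follows. This is a hedged reading of equations (17)-(19), not a verbatim transcription: we assume the standard capped-norm weight rule o_i = 1/(2‖E_i‖_2) for inlier columns and o_i = 0 for capped (outlier) columns, and the column-wise shrinkage then follows from setting the derivative of the weighted quadratic λ Σ_i o_i‖E_i‖² + (μ/2)‖E − F‖_F² to zero:

```python
import numpy as np

def update_E(F, lam, mu, theta, n_iter=30):
    """Iterative reweighted sketch of the E-update (assumed weight rule).

    Alternates (a) the closed-form column shrinkage
        e_i = mu / (2*lam*o_i + mu) * f_i
    with (b) the reweighting
        o_i = 1/(2*‖e_i‖_2) if ‖e_i‖_2 <= theta else 0,
    so capped (outlier) columns get zero weight and pass through unchanged.
    """
    E = F.copy()
    for _ in range(n_iter):
        norms = np.linalg.norm(E, axis=0)
        o = np.where(norms <= theta,
                     1.0 / (2.0 * np.maximum(norms, 1e-12)),  # inlier weight
                     0.0)                                      # capped outlier
        E = F * (mu / (2.0 * lam * o + mu))                    # per-column shrink
    return E

F = np.array([[0.5, 50.0],
              [0.5,  0.0]])
E = update_E(F, lam=1.0, mu=1.0, theta=5.0)
# the outlier column (norm 50 > theta) keeps zero weight and is copied as-is,
# while the small inlier column is shrunk toward zero
```

Under this rule, small-residual columns are progressively shrunk (the L_{2,1}-style behavior), while extreme columns are left to the capped term.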

Fixing A and E to Optimize {I_i}_{i=1}^c. We can rewrite equation (6) as equation (20). Taking the derivative w.r.t. {I_i}_{i=1}^c and setting it to zero yields equation (21). Since L_i depends on I_i, an iteration-based algorithm is used to obtain the solution of equation (21). First, we calculate L_i using the current solution of I_i. With L_i given, the solution of I_i to the objective function in equation (22) satisfies equation (21). The current solution of I_i is then updated according to the optimal solution of equation (22).
Denoting Z_i = A^T L_i A, equation (22) can be written as equation (23). Since the {I_i}_{i=1}^c are n × n diagonal matrices, the formulation becomes equation (24), where r_ci is the c-th diagonal element of matrix I_i and z_ci is the c-th diagonal element of matrix Z_i. We optimize equation (24) by the assignment rule in equation (25). The algorithm for solving the ORLRS problem is summarized in Algorithm 1.
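Since the I_i are 0/1 diagonal matrices that sum to the identity, the update over the diagonal variables amounts to assigning each sample to the subspace with the smallest diagonal cost. A sketch under that reading of equations (24)-(25) (the helper name is ours):

```python
import numpy as np

def update_indicators(Z_list):
    """Assign each sample to the subspace with the smallest diagonal cost.

    Z_list[i] holds Z_i = A^T L_i A for cluster i; for each sample j we pick
    the cluster i minimizing Z_i[j, j], which solves the linear objective over
    the 0/1 diagonal variables under the sum-to-identity constraint.
    """
    costs = np.stack([np.diag(Z) for Z in Z_list])   # shape (c, n)
    labels = costs.argmin(axis=0)                     # winning cluster per sample
    c, n = costs.shape
    return [np.diag((labels == i).astype(float)) for i in range(c)], labels

Z0 = np.diag([0.1, 5.0, 0.2])
Z1 = np.diag([2.0, 0.3, 9.0])
I_list, labels = update_indicators([Z0, Z1])
print(labels)   # [0 1 0]: samples 0 and 2 join cluster 0, sample 1 joins cluster 1
```

Because each sample independently picks its cheapest subspace, the resulting I_i automatically satisfy the discrete constraint Σ_i I_i = I.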

Convergence Analysis.
In this section, the convergence of the proposed algorithm is analyzed. Theorem 1. At each iteration, the updating rule in Algorithm 1 for matrix A while fixing the others monotonically decreases the objective value in equation (4) when 0 < k ≤ 1.
Proof. It can be verified that equation (12) is the solution to the following problem: min_A (the A-subproblem) s.t. X = A + E. (26) Then, at the t-th iteration, equation (27) holds; that is, equation (28). Equation (28) can be converted to equation (29) according to Lemma 2 in [38].
□ Lemma 2. For any positive definite matrices P, P_t ∈ R^{m×m}, the inequality in equation (30) holds when 0 < p ≤ 2.
Note that we set 0 < k ≤ 1 here, so equation (30) is equivalent to equation (31). Then, we have equation (32). Combining equations (29) and (32) gives equation (33); that is to say, equation (34) holds. Thus, the updating rule for matrix A in Algorithm 1 does not increase the objective value of the problem in equation (10) at each iteration t when 0 < k ≤ 1.

Theorem 2.
At each iteration, the updating rule in Algorithm 1 for matrix E while fixing the others monotonically decreases the objective value in equation (4).
Proof. We first recall the following lemma from [37].
Lemma 3. With the weights o_i defined as in equation (19), the inequality in equation (35) holds. It can be verified that equation (18) is the solution to the problem in equation (36). Suppose the updated E in Algorithm 1, while fixing the others, is Ẽ. Since Ẽ is the optimal solution to equation (36), we have equation (37). According to the definition of O_ii in equation (19) and Lemma 3, we have equation (38). Summing equations (37) and (38) on both sides, we obtain equation (39). Therefore, at each iteration, the updating rule in Algorithm 1 for matrix E while fixing the others monotonically decreases the objective value in equation (4).

Theorem 3.
At each iteration, the updating rule in Algorithm 1 for {I_i}_{i=1}^c while fixing the others monotonically decreases the objective value in equation (4) when k = 1.
Proof. It can be easily verified that equation (25) is the solution to the problem in equation (40). Assume the updated I_i in Algorithm 1 is Ĩ_i. Since Ĩ_i is the optimal solution to equation (22), we have equation (41). According to the definition of L_i in Algorithm 1, equation (41) can be written as equation (42). By the Cauchy-Schwarz inequality, it can be proved that equation (43) holds when p = 1. Thus, combining inequalities (42) and (43), we obtain equation (44). Equation (44) indicates that the updating rule in Algorithm 1 for {I_i}_{i=1}^c while fixing the others monotonically decreases the objective value in equation (4) during the iterations until the algorithm converges when k = 1. In practice, the algorithm also converges when 0 < k < 1; if the objective function of equation (40) is modified accordingly, convergence is also observed [28].
As a result, the objective of equation (4) is nonincreasing under the updates of A, E, and {I_i}_{i=1}^c according to Theorems 1-3, respectively. Therefore, the iterative updating in Algorithm 1 converges to a local optimum.
□
Algorithm 1: ORLRS.
Input: data matrix X ∈ R^{m×n}, number of subspaces c, low-rank constraint parameter k, balance parameter λ, threshold parameter θ.
Initialize: A = X, E = 0, Y = 0, μ = 10^{−6}, ρ = 1.1, μ_max = 10^{10}, ε = 10^{−8}, O as the identity matrix, {I_i}_{i=1}^c such that the discrete constraints in equation (16) are satisfied.
Output: the optimal {I_i}_{i=1}^c, with I_i indicating the i-th cluster.
while not converged do
(1) Fix the others and update A.
(2) Fix the others and update E (and the weight matrix O).
(3) Fix the others and update {I_i}_{i=1}^c, where the c-th diagonal element r_ci of matrix I_i is updated by equation (25).
(4) Update the multiplier Y = Y + μ(X − A − E) and the penalty μ = min(ρμ, μ_max).
end while
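The outer ALM loop of Algorithm 1 can be sketched generically, with the three sub-updates passed in as callables. This skeleton reflects our reading of the algorithm box; the dual and penalty updates are the standard ALM steps implied by the parameters ρ and μ_max:

```python
import numpy as np

def orlrs_skeleton(X, update_A, update_E, update_I, mu=1e-6, rho=1.1,
                   mu_max=1e10, eps=1e-8, max_iter=500):
    """Generic ALM outer loop (a sketch; the three sub-updates are callables)."""
    A, E, Y = X.copy(), np.zeros_like(X), np.zeros_like(X)
    for _ in range(max_iter):
        A = update_A(X, E, Y, mu)       # step (1): A-subproblem
        E = update_E(X, A, Y, mu)       # step (2): E-subproblem
        update_I(A)                     # step (3): refresh indicator matrices
        R = X - A - E                   # primal residual of X = A + E
        Y = Y + mu * R                  # dual (multiplier) ascent
        mu = min(rho * mu, mu_max)      # grow the penalty parameter
        if np.abs(R).max() < eps:       # stop once the constraint is satisfied
            break
    return A, E

# demo with trivial sub-updates: A absorbs everything, E stays zero
A, E = orlrs_skeleton(np.ones((2, 2)),
                      update_A=lambda X, E, Y, mu: X - E + Y / mu,
                      update_E=lambda X, A, Y, mu: np.zeros_like(X),
                      update_I=lambda A: None)
```

With the trivial sub-updates, the residual vanishes after one pass, so the loop exits immediately; in the real algorithm the sub-updates are the closed-form steps derived in Section 2.3.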

Complexity Analysis.
In Algorithm 1, the most expensive computations are L_i = k‖AI_i‖_{S_p}^k (A I_i^2 A^T)^{(k−2)/2} and Z_i = A^T L_i A in Step 3. We suppose m > n for the low-rank representation matrix A ∈ R^{m×n}. First, L_i needs to be computed; denote the SVD of AI_i by UΣV^T.
Evaluation Metrics. Following [28,45-47], clustering accuracy (ACC) is a widely used evaluation metric for tumor clustering. Given a data point x_i, let N_i be its predicted label and T_i its true label. ACC is defined as [45]

ACC = (Σ_{i=1}^n φ(map(N_i), T_i)) / n,

where φ(x, y) = 1 if x = y and φ(x, y) = 0 otherwise, map(N_i) maps N_i to the equivalent label of the raw data, and n is the number of tumor samples. We also evaluate the clustering performance with normalized mutual information (NMI) [48], where M(S, C) is the mutual information between the true class labels C and the clustering labels S, and H(·) is the entropy function. The larger the NMI value, the better the clustering result.
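Both metrics are straightforward to compute. The sketch below implements ACC with the label mapping found by the Hungarian algorithm (our implementation of the `map` step) and uses scikit-learn for NMI; note that scikit-learn's default NMI normalization is the arithmetic mean of the entropies, which can differ slightly from max-normalized variants:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(true, pred):
    """ACC with the best label permutation found on the contingency table."""
    true, pred = np.asarray(true), np.asarray(pred)
    c = int(max(true.max(), pred.max())) + 1
    cost = np.zeros((c, c))
    for t, p in zip(true, pred):
        cost[p, t] += 1                     # contingency counts
    rows, cols = linear_sum_assignment(-cost)   # maximize matched samples
    return cost[rows, cols].sum() / len(true)

true = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 2]                   # same partition, permuted labels
print(clustering_accuracy(true, pred))              # 1.0: permutation is mapped away
print(normalized_mutual_info_score(true, pred))     # 1.0: identical partitions
```

Because ACC is invariant to label permutation, a clustering that merely renames the groups still scores 1.0, which is the behavior the `map` function in the formula provides.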

Gene Expression Data Sets.
A brief introduction to the six gene expression data sets is presented below, and their detailed information is summarized in Table 1.
The Leukemia data contain 25 cases of AML and 47 cases of ALL, packaged into a 7129 × 72 matrix [42]. The DLBCL data consist of 5469 genes and 77 samples, including 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 patients with follicular lymphoma (FL) [43].
The colon cancer data [44] consist of a 2000 × 62 matrix containing tumor and normal samples; the Brain_Tumor1, Brain_Tumor2, and 9_Tumors data sets are summarized in Table 1.

Comparison Algorithms.
We compare LRS [28], Ext-LRR [32], RPCA [3], PLRR [47], robust LRR [33], LatLRR [49], robust NMF [50], and K-means [51] with the proposed method for tumor clustering. Among these methods, LRS is the basic version of our method implementing one-step clustering, and Ext-LRR is a simpler and more effective extension of LRS; RPCA is a classic robust learning algorithm; PLRR (projection LRR) is one of the latest subspace clustering methods for tumor sample clustering; robust LRR and LatLRR are among the best state-of-the-art low-rank subspace segmentation algorithms; robust NMF is a classic NMF-based method widely used for tumor clustering; and K-means is the most commonly used clustering method, embedded into many methods, including PLRR, robust LRR, and LatLRR, to achieve better performance. Since our proposed method is a novel one-step robust low-rank subspace clustering model, we choose these methods as comparison algorithms.

Parameter Setting.
Since gene expression data are high-dimensional with small sample sizes, we use PCA for dimensionality reduction, and we use the K-means method to initialize {I_i}_{i=1}^c in the proposed ORLRS. Three parameters, the threshold parameter θ, the balance parameter λ, and the low-rank constraint parameter k, need to be determined. In the experiments, we investigated one parameter while fixing the other two. Since the initialization of {I_i}_{i=1}^c introduces some uncertainty, the proposed ORLRS method is run 100 times, and the average accuracy over the 100 runs is reported. The parameter choices below are heuristic and might not be the best for tumor clustering.
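The preprocessing described above (PCA reduction of the high-dimensional samples, then K-means to seed the indicator matrices) can be sketched with scikit-learn; the component count below is illustrative, and the data here are random placeholders with the Leukemia dimensions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((7129, 72))   # genes x samples, Leukemia-sized placeholder

# PCA on the samples (sklearn expects rows = samples, so transpose)
Xr = PCA(n_components=30, random_state=0).fit_transform(X.T)

# K-means labels seed the initial indicator matrices I_i
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xr)
C = np.zeros((2, 72))
C[labels, np.arange(72)] = 1
I = [np.diag(C[i]) for i in range(2)]   # satisfies sum_i I_i = identity
```

Running the method many times with different K-means seeds, as the text describes, then averages out the randomness this initialization introduces.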

Determination of Threshold Parameter θ. In the ORLRS model, data outliers are not heuristically determined based on magnitude; they are selected during the optimization process. The data outliers may differ between iterations (with the same thresholding parameter) as we iteratively optimize the objective function of ORLRS; when the algorithm converges, the likely extreme data outliers have been found. Therefore, we only need to determine one value of θ for each data set. Figure 1 presents the results of ORLRS with different θ. Since the gene expression levels differ greatly across data sets, the values of the extreme data outliers also differ greatly, so θ spans a large range over the six data sets. From Figure 1, we observe that ORLRS obtains the best performance with θ = {10^3, 10^5, 40, 90, 400, 10^6} on the Leukemia, DLBCL, Colon cancer, Brain_Tumor1, Brain_Tumor2, and 9_Tumors data, respectively. The results indicate that θ should be chosen appropriately: if θ is too large, some extreme outliers are missed; if θ is too small, important information may be removed, degrading the clustering performance.

Table 1: Summary of the six gene expression data sets.

Data set | Sample types | Genes | Samples | Classes
Leukemia | AML, ALL | 7129 | 72 | 2
DLBCL | DLBCL, FL | 5469 | 77 | 2
Colon cancer | Tumor, normal | 2000 | 62 | 2
Brain_Tumor1 | Medulloblastoma, malignant glioma, normal cerebellum, AT/RTs, PNETs | 5920 | 90 | 5
Brain_Tumor2 | CG, CAO, NCG, NCAO | 10367 | 50 | 4
9_Tumors | NSCLC, colon, breast, ovary, renal, leukemia, melanoma, prostate, CNS | 5726 | 60 | 9

Determination of Balance Parameter λ. Figure 2 presents the results of ORLRS with different λ. ORLRS obtains the best results with λ = {0.6, 1, 1, 1, 0.9, 1.1} on the Leukemia, DLBCL, Colon cancer, Brain_Tumor1, Brain_Tumor2, and 9_Tumors data, respectively.
According to the experimental results on each data set, the clustering accuracy shows an overall upward trend as λ increases toward the best result, and an overall downward trend thereafter. So, we suggest a rough range of 0.1 ≤ λ ≤ 2 for the choice of λ.

Determination of Schatten p-Norm Parameter k.
Since the algorithm converges for the Schatten p-norm parameter 0 < k ≤ 1, we determine the value of k within this range. Figure 3 presents the results of ORLRS with different k. ORLRS achieves the best performance with k = {1, 1, 1, 1, 0.7, 0.4} on the Leukemia, DLBCL, Colon cancer, Brain_Tumor1, Brain_Tumor2, and 9_Tumors data, respectively. As general guidance, we therefore suggest choosing k in 0 < k ≤ 1.

Experimental Results.
In this section, the experimental results of our proposed method and the eight comparison algorithms, i.e., LRS, Ext-LRR, RPCA, PLRR, robust LRR, LatLRR, robust NMF, and K-means, are reported. ORLRS, LRS, and Ext-LRR use K-means to initialize the indicator matrices {I_i}_{i=1}^c. PLRR, robust LRR, and LatLRR use the normalized cuts method to segment the data, which clusters the data points using K-means. For the robust NMF method, we initialize the coefficient matrix and basis matrix randomly. To avoid randomness, we run all methods 100 times; the mean and standard error of the clustering accuracies over the 100 runs are shown in Table 2, with the best result for each data set in bold. Based on the results in Table 2, we make the following observations. ORLRS extends LRS by adding a noise matrix to the objective function to enhance robustness, which explains why ORLRS outperforms LRS: ORLRS achieves roughly 8%-19% higher clustering accuracy than LRS on four data sets (Leukemia, DLBCL, Colon cancer, and Brain_Tumor1), and slightly better performance on the Brain_Tumor2 and 9_Tumors data sets. ORLRS also obtains better results than Ext-LRR on all data sets. Compared with the three classical low-rank-based methods, PLRR, robust LRR, and LatLRR, the clustering accuracy of ORLRS is 1%-9% higher on all six data sets; the main reason is that we use the capped norm to remove the extreme outliers in the noise matrix and the Schatten p-norm to better approximate the low-rank representation. Compared with the traditional methods RPCA, robust NMF, and K-means, ORLRS achieves outstanding results on all six data sets.
The NMI results on five gene expression data sets are shown in Table 3, with the best result for each data set in bold. Since the NMI results of all methods on the colon data are less than 0.1, we only report the results on the remaining five data sets. From Table 3, we observe that ORLRS outperforms PLRR, robust LRR, LatLRR, robust NMF, RPCA, and Ext-LRR on all five data sets. Except on the 9_Tumors data set, our method also outperforms LRS and K-means on the other four data sets.

Convergence Curves and Running Time.
We plotted the convergence curves of ORLRS on the different data sets; they can be found in Figure 4. Our method converges within about 10 iterations on all six data sets. In Table 4, we also report the running time of ORLRS on the six gene expression data sets without dimensionality reduction by PCA. The experiments were implemented in MATLAB R2020b on an ordinary computer with an Intel i9-10900KF CPU (up to 3.70 GHz), 8 GB RAM, and the Windows 10 operating system.

Conclusions
In this paper, a novel one-step robust low-rank subspace clustering method (ORLRS) is proposed for tumor clustering, where the gene expression data set is represented by a low-rank matrix and a noise matrix. Using the Schatten p-norm and the discrete constraint, the low-rank representation of each subspace can be well obtained. Different from traditional low-rank-based methods, such as LRR and LatLRR, ORLRS learns the indicators directly and performs the clustering in one step through the discrete constraint. The capped norm is used to improve the robustness of ORLRS, since it can effectively remove the extreme data outliers in the noise matrix. Furthermore, we propose an efficient algorithm to solve the proposed subspace clustering model and prove its convergence. We can thus discover the clusters of tumor data from the optimal cluster indicators. We tested the proposed ORLRS method on six tumor data sets.
The results show that ORLRS is an effective method for clustering tumor samples.
There remain several interesting directions for future work. First, it might be beneficial to learn a dictionary for ORLRS, since some low-rank subspace segmentation methods achieve significant improvements by learning a dictionary. Second, ORLRS may be extended to solve other problems, such as matrix recovery and classification. Third, ORLRS may be employed in other applications, such as gene clustering and coclustering.

Data Availability
The data used to support the findings of this study are available from the first author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.