Guarantees of Fast Band Restricted Thresholding Algorithm for Low-Rank Matrix Recovery Problem

Affine matrix rank minimization is a famous problem with a wide range of applications. This problem is combinatorial and deemed to be NP-hard. In this paper, we propose a family of fast band restricted thresholding (FBRT) algorithms for low-rank matrix recovery from a small number of linear measurements. Characterized via the restricted isometry constant, we elaborate the theoretical guarantees in both noise-free and noisy cases. Two thresholding operators are discussed, and numerical demonstrations show that FBRT algorithms perform better than some state-of-the-art methods. In particular, the running time of FBRT algorithms is much shorter than that of the commonly used singular value thresholding algorithms.


Introduction
The affine matrix rank minimization (AMRM) problem, which is to recover a low-rank matrix from only a small number of linear measurements, can be described as the following optimization problem:

min_{X ∈ C^{n1×n2}} rank(X)  s.t.  ‖A(X) − b‖_2 ≤ η,  (1)

where A ∈ L(C^{n1×n2}, C^d) is a given linear map, b ∈ C^d is a given vector, and η ≥ 0 is the error tolerance. This problem has received much attention in recent years, and many applications arising in various areas can be captured by solving model (1), for example, matrix completion [1,2], background modeling [3], subspace clustering [4], phase retrieval [5,6], image inpainting [7], and other applications [8,9]. Unfortunately, the AMRM problem is known to be NP-hard. Therefore, without further assumptions on A and X, solving this problem would be computationally intractable. To overcome this shortcoming, many researchers have focused on replacing rank(X) with other penalties G(X), such as the Schatten p-norm (0 < p < 1) [10], Minimax Concave Plus (MCP) [11], Smoothly Clipped Absolute Deviation (SCAD) [12], Logarithm [13], Geman [14], and Laplace [15], and considered the following optimization problem:

min_{X ∈ C^{n1×n2}} G(X)  s.t.  ‖A(X) − b‖_2 ≤ η,  (2)

or the corresponding unconstrained optimization problem:

min_{X ∈ C^{n1×n2}} ‖A(X) − b‖_2^2 + λG(X),  (3)

where λ > 0 is a regularization parameter. Model (3) can be transformed into a fixed point problem described as

X = H(X + sA*(b − A(X))),

where A* is the dual operator of A, H is the thresholding operator, and s is a step size parameter. This naturally yields the corresponding iterative thresholding algorithm. Given a fixed penalty G(·), there have been many theoretical guarantees.
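As an illustration of the fixed-point iteration X ← H(X + sA*(b − A(X))), the following minimal sketch specializes to matrix completion, where A samples a subset of entries and H soft-thresholds the singular values. The threshold, step size, and iteration count are illustrative assumptions, and this is the generic iteration from the literature, not the FBRT algorithm itself:

```python
import numpy as np

def soft_threshold_svd(W, tau):
    """H(W): shrink the singular values of W by tau (soft thresholding)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def iterative_thresholding(A, A_star, b, shape, tau, s=1.0, iters=500):
    """Fixed-point iteration X <- H(X + s * A*(b - A(X)))."""
    X = np.zeros(shape)
    for _ in range(iters):
        X = soft_threshold_svd(X + s * A_star(b - A(X)), tau)
    return X

# Toy instance: A observes a random 50% subset of entries of a rank-2 matrix.
rng = np.random.default_rng(0)
n = 30
X_true = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
mask = rng.random((n, n)) < 0.5

def A(X):        # linear map: collect the observed entries
    return X[mask]

def A_star(y):   # dual map: scatter observed values back, zeros elsewhere
    Z = np.zeros((n, n))
    Z[mask] = y
    return Z

X_hat = iterative_thresholding(A, A_star, A(X_true), (n, n), tau=0.05)
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```

With a small threshold the iteration converges to a slightly biased low-rank estimate; other (hard or band restricted) thresholding rules can be substituted for the soft rule.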
However, there are still some challenges: (I) most convergence results were developed for model (3) with a fixed λ, and it is difficult to choose an appropriate parameter λ; (II) it is hard to design the thresholding operator H(·) for general penalties G(X), except for some special penalties such as the Schatten p-norm (p = 0, 1, 1/2, 2/3) [2,16-18] and SCAD [12]; (III) every iteration of an iterative thresholding algorithm needs to calculate the SVD, and if the original matrix is large, such algorithms become time-consuming. Two papers related to our work should be reviewed here. Chartrand [19] first studied how to compute the penalty G(·) from a given thresholding operator. However, his work did not establish stability guarantees for thresholding algorithms. Wei et al. [20] proposed an iterative hard thresholding algorithm with subspace projections, which has relatively low computational complexity. However, their work did not examine the performance in the noisy case. Taking these works into account, we design a new algorithm that overcomes these shortcomings to a certain extent.
In this paper, we design a family of new fast band restricted thresholding (FBRT) algorithms to solve model (1) and elaborate the theoretical guarantees in both noise-free and noisy cases. Many related works [17,18,21,22] focus on the stability of the optimal solution of the model, while the effect of noise on the algorithm is often explained empirically through numerical experiments. At the end of this paper, we give some simple examples to verify our theoretical results. The remainder of this paper is organized as follows. In Section 2, we introduce some basic preliminary results and notations. In Section 3, we describe the fast band restricted thresholding (FBRT) algorithm. In Section 4, we prove the convergence of the FBRT algorithm. In Section 5, we display the numerical simulations, and we conclude in Section 6. All proofs are presented in the Appendix.

Preliminaries
In this section, we introduce some notations and useful results to facilitate the following research.

Notations.
We denote A as a linear operator from C^{n1×n2} to C^d. For any X, Y ∈ C^{n1×n2}, ⟨X, Y⟩ := tr(Y*X) is the Frobenius inner product. Thus, we naturally obtain the dual operator A*, defined by ⟨A(X), y⟩ = ⟨X, A*(y)⟩ for all y ∈ C^d. We also denote X as the original matrix with rank(X) = r and e = b − A(X) as the noise with error tolerance η = ‖e‖_2. For any subspaces Ω_1, Ω_2 ⊂ C^{n1×n2}, we write Ω_1 + Ω_2 for the sum of subspaces, and we rewrite Ω_1 ∩ (Ω_2)^⊥ as Ω_1\Ω_2.

Some Useful Results.
In this section, we introduce some useful results, which show the relationship between the original low-rank matrix and the solution of model (1). For this purpose, we review the definition of RIC, which plays an important role in the following research.
Definition 1 (see [23]). Given a linear operator A: C^{n1×n2} → C^d and 1 ≤ r ≤ min{n_1, n_2}, the restricted isometry constant (RIC) δ_r is the smallest nonnegative number such that, for any matrix X ∈ C^{n1×n2} with rank(X) ≤ r,

(1 − δ_r)‖X‖_F^2 ≤ ‖A(X)‖_2^2 ≤ (1 + δ_r)‖X‖_F^2,

where ‖·‖_F is the Frobenius norm. For a given linear map A, it is difficult to calculate the restricted isometry constant δ_r, but for a Gaussian random linear map, it is known that d ≥ c_0 r(n_1 + n_2)log(n_1 n_2) measurements (where c_0 depends only on δ) suffice to yield an RIC of δ with high probability [23].
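Although δ_r is intractable to compute exactly, for a concrete map one can Monte Carlo a lower bound on it by sampling random rank-r matrices and recording the worst deviation of ‖A(X)‖_2^2/‖X‖_F^2 from 1. The sketch below uses a Gaussian map scaled by 1/√d; the scaling and trial count are our illustrative assumptions:

```python
import numpy as np

def estimate_ric_lower_bound(A_mat, n1, n2, r, trials=200, rng=None):
    """Monte Carlo lower bound on the restricted isometry constant delta_r.

    A_mat is a d x (n1*n2) matrix representing A via vectorization.  We
    sample random rank-r matrices X with ||X||_F = 1 and track how far
    ||A(X)||_2^2 deviates from 1.  This only LOWER-bounds delta_r, since
    the true constant is a supremum over all rank-r matrices.
    """
    rng = rng or np.random.default_rng(0)
    worst = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
        X /= np.linalg.norm(X)                     # normalize ||X||_F = 1
        ratio = np.linalg.norm(A_mat @ X.ravel()) ** 2
        worst = max(worst, abs(ratio - 1.0))
    return worst

# Gaussian map scaled so that E ||A(X)||_2^2 = ||X||_F^2.
rng = np.random.default_rng(1)
n1 = n2 = 10
d = 400
A_mat = rng.standard_normal((d, n1 * n2)) / np.sqrt(d)
est = estimate_ric_lower_bound(A_mat, n1, n2, r=2, rng=rng)
```

For this well-conditioned Gaussian map the estimated deviation stays well below 1, consistent with the RIC bound cited above.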
In the noisy case, the following theorem claims that a solution of model (1) cannot be far from the original low-rank matrix if A satisfies a certain RIC. The related theorem in compressive sensing is mentioned in [24]; see equation (5.24) there.

Theorem 1. Let Z, Z_η be two solutions of (1) within error tolerance η (i.e., ‖A(Z) − b‖_2 ≤ η and ‖A(Z_η) − b‖_2 ≤ η). If rank(Z) = rank(Z_η) = r and δ_2r < 1, we obtain a stability claim of the form

‖Z − Z_η‖_F ≤ (2/√(1 − δ_2r)) η.

If we take η = 0 in Theorem 1, we obtain the following uniqueness result in the noise-free case.
Corollary 1. Let Z be a solution of (1) with η = 0. If rank(Z) = r and δ_2r < 1, then Z is the unique lowest rank solution.
Taken together, these results imply that the original low-rank matrix X is the solution of model (1), provided A satisfies a certain RIC.

Fast Band Restricted Thresholding Algorithm
In this section, we will design a new fast band restricted thresholding (FBRT) algorithm. For this purpose, we need to review some thresholding algorithms for (1) in previous work and compare FBRT algorithm with other algorithms.
Similar to the compressive sensing problem, both iterative hard thresholding and iterative soft thresholding, also known as SVP [16] and SVT [2], are simple and efficient algorithms for low-rank matrix recovery. In pursuit of better results, alternative algorithms, such as HFPA [22], the thresholding function for the Schatten 2/3-norm [18], SCAD [12], and firm thresholding [25], have been proposed. We present these commonly used thresholding functions in Figure 1, and it is obvious that all of them satisfy the following definition.
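As one concrete member of this family, the firm thresholding function [25] is zero on [0, τ], ramps linearly on (τ, aτ], and is the identity beyond aτ; the a = 3 case matches the FBRT-firm variant tested in the experiments below. A minimal sketch:

```python
import numpy as np

def firm_threshold(t, tau, a=3.0):
    """Firm thresholding: 0 for |t| <= tau, a linear ramp
    sign(t) * a * (|t| - tau) / (a - 1) for tau < |t| <= a * tau,
    and the identity t for |t| > a * tau."""
    t = np.asarray(t, dtype=float)
    return np.where(
        np.abs(t) <= tau,
        0.0,
        np.where(
            np.abs(t) <= a * tau,
            np.sign(t) * a * (np.abs(t) - tau) / (a - 1.0),
            t,
        ),
    )
```

Applied elementwise to the singular values, this gives one of the BRT operators used later; soft and hard thresholding are recovered as limiting cases a → ∞ and a → 1.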
According to Definition 2, Table 1 shows the corresponding band parameter c of these BRT functions.
Given a BRT function h_τ, we obtain a thresholding operator H_τ(·) defined as

H_τ(X) := Σ_i h_τ(σ_i) u_i v_i*,

where X = Σ_i σ_i u_i v_i* is the SVD of X, and the corresponding thresholding algorithm is

X^{(k+1)} = H_τ(W^{(k)}),  W^{(k)} := X^{(k)} + A*(b − A(X^{(k)})),

where τ is the threshold parameter. Every iteration of this algorithm needs to compute the SVD of W^{(k)}. Thus, if N = n_1 + n_2 is large, the algorithm is computationally expensive. To improve efficiency, we propose to orthogonally project W^{(k)} onto a subspace S_k, an idea first applied in [20]. Here S_k, the tangent space of the rank-r matrix manifold at X^{(k)}, is described as

S_k := {U_k Z_1 + Z_2 V_k : Z_1 ∈ C^{r×n2}, Z_2 ∈ C^{n1×r}},

where U_k ∈ C^{n1×r} and V_k ∈ C^{r×n2} contain the left and right singular vectors of X^{(k)}. Furthermore, for any matrix Z ∈ C^{n1×n2}, P_{S_k}(Z) is defined as

P_{S_k}(Z) = U_k U_k* Z + Z V_k* V_k − U_k U_k* Z V_k* V_k.

With these ingredients, we derive the fast band restricted thresholding (FBRT) algorithm described in Algorithm 1.
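The projection P_{S_k} never requires forming an n_1 n_2 × n_1 n_2 projector; it can be applied with a few thin matrix products. A sketch, storing the singular vector blocks as column-orthonormal U (n1 × r) and V (n2 × r) in the real case (for complex data, the conjugate transpose would replace `.T`):

```python
import numpy as np

def tangent_projection(Z, U, V):
    """P_S(Z) = U U* Z + Z V V* - U U* Z V V*: projection onto the tangent
    space of the rank-r matrix manifold at X = U S V*."""
    UtZ = U.T @ Z                                   # r x n2
    ZV = Z @ V                                      # n1 x r
    return U @ UtZ + ZV @ V.T - U @ (UtZ @ V) @ V.T

rng = np.random.default_rng(0)
n1, n2, r = 8, 6, 2
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
U, _, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :r], Vt[:r].T
Z = rng.standard_normal((n1, n2))
P = tangent_projection(Z, U, V)
```

Since P_S is an orthogonal projector, applying it twice changes nothing, and X itself lies in its own tangent space; both facts make convenient sanity checks.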

Remark 1.
The threshold parameter τ plays an important role and affects the performance of the FBRT algorithm. Here, we let τ = σ^{(k)}_{r+1}, the (r + 1)th singular value of W^{(k)}, and we show the theoretical performance of the FBRT algorithm in the following discussion.

Remark 2.
The stopping criterion is ‖A(X^{(k)}) − b‖_2 ≤ ε. Taking Theorem 1 into account, X^{(k)} cannot be far from X if A satisfies a certain RIC. Moreover, the case in which X^{(n)} is close to X is an important one in the case analysis, and the details will be shown in the following section.
It is worth noting that W^{(k)} = X^{(k)} + A*(b − A(X^{(k)})) is in general a full-rank matrix, and computing its SVD uses O(N^3) floating point operations (flops), where N = n_1 + n_2. Meanwhile, W^{(k)} in every iteration of the FBRT algorithm is a matrix of rank at most 2r, and using the QR factorization we obtain

W^{(k)} = QR = Q(Q_1 R_1)* = Q R_1* Q_1*,

where W^{(k)} = QR and R* = Q_1 R_1 are the QR factorizations of W^{(k)} and R*, with Q ∈ C^{n1×2r}, R ∈ C^{2r×n2}, R_1 ∈ C^{2r×2r}, and Q_1 ∈ C^{n2×2r}. Thus, the SVD of W^{(k)} can be obtained from the SVD of the small matrix R_1, and computing the SVD of W^{(k)} uses only O(r^3) flops beyond the cost of the QR factorizations.
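This QR route can be checked numerically. The sketch below assumes W is supplied in factored form W = L R with inner dimension k (here k = 2r), which is how the low-rank iterate arises in practice; two thin QR factorizations reduce the problem to a k × k SVD:

```python
import numpy as np

def svd_from_factors(L, R):
    """SVD of W = L @ R (L: n1 x k, R: k x n2) via two thin QR
    factorizations and one small k x k SVD; the full n1 x n2 SVD is
    never formed.  Returns (U, s, Vt) with W = (U * s) @ Vt."""
    Q, T = np.linalg.qr(L)            # L = Q T,      Q:  n1 x k
    Q1, R1 = np.linalg.qr((T @ R).T)  # (T R)* = Q1 R1, Q1: n2 x k
    Us, s, Vts = np.linalg.svd(R1.T)  # small k x k SVD of R1*
    return Q @ Us, s, Vts @ Q1.T

rng = np.random.default_rng(0)
n1, n2, k = 200, 150, 6
L, R = rng.standard_normal((n1, k)), rng.standard_normal((k, n2))
U, s, Vt = svd_from_factors(L, R)
```

Because W = Q R_1* Q_1* and R_1* = U_s Σ V_s*, the left and right singular vectors of W are Q U_s and Q_1 V_s*, so the recovered factors reproduce W and its leading singular values exactly.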

Analysis of the FBRT Algorithm
In this section, we study the performance of the FBRT algorithm theoretically; Theorems 2 and 3 below give the theoretical guarantees in the noise-free and noisy cases, respectively.
Theorem 2. If the thresholding operator in the FBRT algorithm is the BRT operator with a band parameter c and the following constant ρ is less than 1, we have

In particular, ρ < 1 can be satisfied if

The proof of Theorem 2 is given in the Appendix, and we now discuss the condition of Theorem 2, which characterizes the performance of the FBRT algorithm on the AMRM problem.
Remark 3. According to Theorem 2, the performance of the FBRT algorithm depends on σ_max(X)/σ_min(X), i.e., the condition number of the original rank-r matrix X. The condition number plays an important role in the projection operator P_{S_n}.
In the noisy case, the error tolerance η is a significant parameter. According to Theorem 1, there exists a gap between the solution of model (1) and the original low-rank matrix. Thus, we always assume that the model error is small.
Theorem 3. Suppose b = A(X) + e, where rank(X) = r and η = ‖e‖_2 is the error tolerance. Let the sequence X^{(n)} be defined by the FBRT algorithm with ‖A(X) − b‖_2 ≤ η. If the thresholding operator in the FBRT algorithm is the BRT operator with a band parameter c and A satisfies

then the sequence X^{(n)} must satisfy one of the following results:

(i) There exists a positive integer n such that

where c′ = 4(μ + 1)

where ρ = μ/(μ + 1) < 1.
The proof of this theorem is also presented in the Appendix. First of all, if δ_3r satisfies the condition of Theorem 3, then δ_3r ≤ μ/(μ + 1) < 1. Moreover, when the error tolerance η is not too large, the result of Theorem 3 shows that there exists a positive integer n such that X^{(n)} is close to the original matrix X. Since A is a bounded linear operator, this implies that A(X^{(n)}) is close to A(X). Conversely, the fact that A(X^{(n)}) is close to A(X) also implies that X^{(n)} is close to the original matrix X, by Theorem 1. On the other hand, the parameter μ in this theorem is a key parameter, and the performance of the algorithm improves as ‖X‖_F − μη approaches 0. Meanwhile, the performance of the algorithm also depends on the condition number of the original rank-r matrix X, similarly to Theorem 2.

Numerical Demonstration
In this section, we present some empirical observations for FBRT algorithms with two thresholding operators, FBRT-Atan and FBRT-firm, and compare them with some state-of-the-art methods: the singular value thresholding (SVT) algorithm [2], the singular value projection (SVP) algorithm [16], the half norm fixed point (Half) algorithm [22], and the Riemannian gradient descent (RGrad) algorithm [20]. We define two quantities to quantify the sampling regime: SR := d/(n_1 n_2), the number of measurements divided by the number of entries of the matrix (the sampling ratio), and OS := d/(r(n_1 + n_2 − r)), the oversampling ratio, i.e., the ratio between the number of sampled entries and the "true dimensionality" of an n_1 × n_2 matrix of rank r. In fact, if OS < 1, we cannot recover the original low-rank matrix, because there are always infinitely many matrices of rank r with the given entries [26]. We use the relative error (RE) in the Frobenius norm,

RE := ‖X_opt − M‖_F / ‖M‖_F,

to evaluate the closeness of X_opt to M, where X_opt is the "optimal" solution of the algorithm and M is the original low-rank matrix.
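The two sampling quantities and the relative error translate directly into code; a minimal sketch:

```python
import numpy as np

def sampling_ratio(d, n1, n2):
    """SR = d / (n1 * n2): measurements per matrix entry."""
    return d / (n1 * n2)

def oversampling_ratio(d, r, n1, n2):
    """OS = d / (r * (n1 + n2 - r)): measurements per degree of freedom
    of a rank-r n1 x n2 matrix."""
    return d / (r * (n1 + n2 - r))

def relative_error(X_opt, M):
    """RE = ||X_opt - M||_F / ||M||_F in the Frobenius norm."""
    return np.linalg.norm(X_opt - M) / np.linalg.norm(M)
```

For instance (an illustrative configuration, not one from the tables), n_1 = n_2 = 300, r = 10, and d = 36000 give SR = 0.4 and OS ≈ 6.1, i.e., about six measurements per degree of freedom.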

Empirical Phase Transition.
In this section, we test how many measurements are needed to recover a low-rank matrix. For simplicity, we set m = n and generate a rank-r matrix X = LR, where L ∈ R^{n×r} and R ∈ R^{r×n}, and the entries of L and R are sampled from the standard normal distribution. The stopping criterion is ‖A(X^{(k)}) − b‖_2 ≤ ε, as in Remark 2. Simulations of the FBRT algorithms are repeated 10 times, and the numerical results are reported in Table 2. We consider an algorithm to have successfully recovered the low-rank matrix X if its "optimal" solution attains RE ≤ 0.01. Furthermore, we denote by r_min the largest rank such that the corresponding maximum RE is ≤ 0.01, and by r_max the smallest rank such that the corresponding minimum RE is ≥ 0.01. OS_max is computed via r_min, and OS_min is computed via r_max. Figure 2 shows the empirical phase transition of the tested algorithms in the (SR, OS) plane, where we compute the oversampling ratio OS via r = (r_min + r_max)/2. Table 2 and Figure 2 show that these two FBRT algorithms can effectively recover rank-r matrices with OS close to 1, and that there is little difference between the two empirical phase transition curves.
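The test matrices and the success criterion of this section can be reproduced as follows (a sketch; the size shown is smaller than the m = n = 300 used for Table 2):

```python
import numpy as np

def random_low_rank(n, r, rng):
    """X = L @ R with i.i.d. standard normal factors, so rank(X) = r
    almost surely."""
    return rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

def is_success(X_opt, X, tol=0.01):
    """Recovery counts as a success when the relative error RE <= 0.01."""
    return np.linalg.norm(X_opt - X) / np.linalg.norm(X) <= tol

rng = np.random.default_rng(0)
X = random_low_rank(50, 5, rng)
```

Repeating this over a grid of ranks and sampling ratios, and recording the largest rank at which all trials succeed, yields the r_min/r_max bracketing used for the phase transition curves.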

Comparison with the State-of-the-Art Algorithms in Noise-Free Case.
In this section, we consider the noise-free case and test the performance of the FBRT algorithms on matrix completion problems. We generate a low-rank matrix in the same way as in Section 5.1, and the stopping criterion is ‖A(X^{(k)}) − b‖_2 ≤ ε, as in Remark 2. Simulations are repeated 10 times, and the numerical results are reported in Tables 3 and 4 and Figure 3.
In Table 3, we show comparison experiments with SVT, SVP, and Half. We run tests for n ∈ {400, 1000}, r = 50, and OS = 2, and record the average, maximum, and minimum relative errors and running times, respectively. Based on Table 3, we observe that the relative errors of the FBRT algorithms are smaller and the running time is shorter. Meanwhile, the running time of the FBRT algorithms increases slowly as n increases. We explore the trends between the relative error and the running time for the first run of each algorithm, displayed in Figure 3. In Table 4, we show comparison experiments with RGrad in two cases: n = 100, r = 19, SR = 0.4 and n = 1000, r = 50, OS = 2. Based on Table 4, the running times of the different algorithms are about the same, but the relative errors of the FBRT algorithms are smaller than that of RGrad. Therefore, the FBRT algorithms perform better than the others in the noise-free case.

Table 2: Numerical results of two FBRT algorithms, FBRT-Atan and FBRT-firm (a = 3), with fixed m = n = 300. For any SR, the maximum of the 10 relative errors is less than 0.01 when r ≤ r_min, and the minimum of the 10 relative errors is greater than or equal to 0.01 when r > r_max.

Comparison with the State-of-the-Art Algorithms in Noisy Case.
In this section, we consider the noisy case and compare the performance of the FBRT algorithms with the RGrad algorithm on image inpainting problems. We test these algorithms on a grayscale image: a 419 × 400 intracranial venous image (IVI), and we obtain an approximated low-rank image truncated from the IVI with rank r = 35. We generate the noised image with MATLAB's imnoise(image, 'gaussian', 0, σ²), where σ² is the variance of the normal distribution, and generate the sample image from the noised image with SR = 0.4. We consider the variances σ² ∈ {0.01, 0.001}, with 150 and 550 iterations, respectively. The original image, its approximated low-rank image, its noised image, and the images recovered by the different algorithms are displayed in Figure 4. Figure 4(a) is the original IVI with full rank. Figure 4(b) is the approximated low-rank image truncated from Figure 4(a) with rank r = 35. Figure 4(c) is a noised version of Figure 4(b) with σ² = 0.01. Figures 4(d)-4(f) are the images recovered by the different algorithms. Numerical results for image inpainting are reported in Table 5. Comparing these experiments, we find that the FBRT algorithms perform much better than the RGrad algorithm in the noisy image inpainting case.

Conclusions
In this paper, we proposed the FBRT algorithm and developed theoretical guarantees in both the noise-free and noisy cases. The numerical demonstrations showed that the algorithm is effective. However, it is important to estimate the error tolerance η, which has a great influence on the FBRT algorithm. In low SNR cases, more prior assumptions on the original low-rank matrix are necessary, and we will study this in future work. In addition, the condition number of the original matrix is also a significant parameter. As is well known, phase retrieval [5,6] and blind deconvolution [27,28] problems can be transformed into rank-one matrix recovery problems. Since the condition number of a rank-one matrix always equals one, it may be worth studying phase retrieval and blind deconvolution via the FBRT algorithm.

Proof of Theorems 2 and 3
Now, let us turn to the proofs of Theorems 2 and 3. Before proving them, we need to introduce some notations and lemmas.
We denote by I, I_n, and I_n^+ the subspaces of C^{n1×n2} described, respectively, as

I := {UV* : U ∈ C^{n1×r}},  I_n := {U V_n* : U ∈ C^{n1×r}},  I_n^+ := {U V_{n+}* : U ∈ C^{n1×(r+1)}},

where V ∈ C^{n2×r} contains the right singular vectors of X, V_n ∈ C^{n2×r} contains the right singular vectors of X^{(n)}, and V_{n+} ∈ C^{n2×(r+1)} contains the right singular vectors corresponding to the first r + 1 singular values of W^{(n)}. According to the FBRT algorithm, it holds that I_n ⊂ I_n^+. For any matrix M ∈ C^{n1×n2} and matrix subspaces I, J ⊂ C^{n1×n2}, M_I represents P_I(M), where P_I(·) is the orthogonal projection onto I; we also denote by I + J the sum of two subspaces and rewrite I ∩ (J)^⊥ as I\J.
Lemma A.1. Let σ^{(n)}_{r+1} be the (r + 1)th singular value of the matrix W^{(n)}. Then,

Proof. By the definitions of σ^{(n)}_{r+1} and I_n^+, it holds that

where the second equality is based on the definition of I_n^+. In the meantime, we project X onto the subspace I_n^+ and obtain X_{I_n^+}, described as

Thus, rank(X_{I_n^+}) ≤ rank(X) = r, and we obtain a subspace ker(X_{I_n^+}) with dim(ker(X_{I_n^+})) ≥ n_2 − r. Furthermore, it holds that

Lemma A.2. Let e ∈ C^d and S be a linear subspace of C^{n1×n2}.
Proof. In fact, it holds that

and dividing through by ‖(A*(e))_S‖_F completes the proof.
For the second inequality, we have

and dividing through by ‖((I_d − A*A)(M_2))_I‖_F completes the proof.
□ Lemma A.4 (see [20]). Let X^{(l)} = U_l Σ_l V_l* be a rank-r matrix and let S_l be the tangent space of the rank-r matrix manifold at X^{(l)}. Let X be another rank-r matrix. Then,

Lemma A.5 (see [20]). Let X^{(l)} = U_l Σ_l V_l* be a rank-r matrix with tangent space S_l. Let X be another rank-r matrix. Then, the Frobenius norm of P_{S_l}A*A(I − P_{S_l})(X) can be bounded as

Lemmas A.4 and A.5 can be found in [20], and their detailed proofs are omitted here. Based on the lemmas above, we now prove Theorems 2 and 3.
Proof of Theorem 2. To prove inequality (15), we just need to prove

First of all, we have the following inequality:

(A.14)

since the space (C^{n1×n2}, ‖·‖_F) is a Euclidean space and (X^{(n)})_{I\I_n} = 0. In the following, we bound J_1, J_2, and J_3 one by one.
By the property of the orthogonal projection, we obtain J_1 ≤ 2‖(W^{(n)} − X)_{I_n^+ + I}‖_F^2. Next, the bound of J_3 begins with the following equations:

Mathematical Problems in Engineering
By the definition of I_n, (W^{(n)})_{I_n} is the best rank-r approximation of W^{(n)}. Thus, it implies that

where X_{I_n\I} = 0. In addition, the thresholding algorithm acts on the singular values of the matrix, so it is easy to obtain that

(A.17)

Because the thresholding function is a BRT function with a band parameter c, it holds that

J_2 ≤ 2c²r (σ^{(n)}_{r+1})² ≤ 2c²r ‖(W^{(n)} − X)_{I_n^+ + I}‖_F²,

where the second inequality follows from Lemma A.1. Therefore, we have

In the meantime, we have W^{(n+1)} = X^{(n)} + P_{S_n}A*(b − A(X^{(n)})). Thus, it implies that

‖W^{(n+1)} − X‖_F = ‖X^{(n)} − X + P_{S_n}A*(b − A(X^{(n)}))‖_F
 = ‖X^{(n)} − X + P_{S_n}A*A(X − X^{(n)})‖_F
 ≤ ‖X^{(n)} − X − P_{S_n}A*AP_{S_n}(X^{(n)} − X)‖_F + ‖P_{S_n}A*A(I − P_{S_n})(X^{(n)} − X)‖_F
 ≤ ‖(P_{S_n} − P_{S_n}A*AP_{S_n})(X^{(n)} − X)‖_F + ⋯,

where the second equality follows from the fact that (I − P_{S_n})(X^{(n)}) = 0. Bound of J_1′: it is a fact that the RIC can be represented as