On Randomized Sampling Kaczmarz Method with Application in Compressed Sensing

We propose a randomized sampling Kaczmarz (RSK) algorithm for the solution of very large systems of linear equations by introducing a maximal sampling probability control criterion, which aims to grasp the largest entry of the absolute sampling residual vector at each iteration. This new method differs from the greedy randomized Kaczmarz algorithm in that it need not compute the residual vector of the whole linear system to determine the working rows. Numerical experiments show that the proposed algorithm is most effective when the number of selected rows, i.e., the sample size, is equal to the logarithm of the total number of rows. Finally, we extend the randomized sampling Kaczmarz to signal reconstruction problems in compressed sensing. Signal experiments show that the new extended algorithm is more effective than the randomized sparse Kaczmarz method for online compressed sensing.


Introduction
In this paper, we mainly consider the iterative solution of large-scale consistent linear systems of the form

Ax = b, (1)

where A ∈ C^(m×n), b ∈ C^m, and x ∈ C^n denotes the n-dimensional unknown vector. The Kaczmarz algorithm [1] is a typical row-action method [2-5] for solving such a linear system (1) and is known as the algebraic reconstruction technique in the fields of computerized tomography [6,7] and image reconstruction [5,8-10]; see also [11-13] for additional references. More specifically, the (t+1)th iterate x_{t+1} is generated by the following Kaczmarz update:

x_{t+1} = x_t + ((b^(i_t) − A^(i_t) x_t)/‖A^(i_t)‖_2^2) (A^(i_t))^*, t = 0, 1, 2, . . ., (2)

where A^(i_t) denotes the i_t th row of the matrix A, (·)^* denotes the conjugate transpose of the corresponding vector or matrix, b^(i_t) denotes the i_t th entry of the vector b, and i_t = (t mod m) + 1. Thus, the Kaczmarz algorithm is very suitable for the solution of very large linear systems, since the main computation at each step is an inner product; however, the algorithm sometimes converges very slowly; see [14,15] and the references therein. To improve its convergence, in 2009, Strohmer and Vershynin [16] proposed the randomized Kaczmarz (RK) algorithm with an expected exponential rate of convergence by using the rows of the coefficient matrix A in a random manner rather than in a fixed order, i.e., RK chooses the row i_t ∈ {1, 2, . . ., m} with probability

Pr(row = i_t) = ‖A^(i_t)‖_2^2 / ‖A‖_F^2. (3)

Recently, Bai and Wu proposed a different but more effective probability criterion and, based on it, constructed the greedy randomized Kaczmarz (GRK) algorithm for solving the linear system (1) in [17].
This probability criterion makes the corresponding GRK algorithm converge significantly faster than the RK method both in theory and in experiments. We refer to [15,18,19] for more related work.

The RK and GRK Algorithms
In this section, we briefly introduce the RK and GRK algorithms for solving systems of linear equations. First, let us fix some notation. In this paper, we use E_t to denote the expected value conditional on the first t iterations, that is,

E_t[·] = E[· | i_1, i_2, . . ., i_t],

where i_l (l = 1, 2, . . ., t) is the row index chosen at the lth iterate. Then, it is easy to see that E[E_t[·]] = E[·] (see, e.g., the work of Bai and Wu [17]). When the linear system (1) is consistent and its coefficient matrix A ∈ C^(m×n) (m ≥ n) is of full column rank, Strohmer and Vershynin [16] proposed the randomized Kaczmarz algorithm, as given in Algorithm 1.
The RK method converges to the unique least-norm solution of the consistent linear system (1) when the coefficient matrix A is of full column rank with m ≥ n [16]; later, in [23], Ma, Needell, and Ramdas obtained the same convergence rate for the RK algorithm in the case m < n. Compared with the Kaczmarz method, we see from probability criterion (3) that the RK method selects the action row with probability proportional to the squared norm of each row. This improves on the convergence of the cyclic Kaczmarz method discussed in [1].
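For illustration, the RK iteration of Algorithm 1 can be sketched in Python as follows (the paper's experiments use MATLAB; this is a minimal sketch, and the function name and interface are ours, not from [16]):

```python
import numpy as np

def rk_solve(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz: sample row i with probability ||A^(i)||^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = np.einsum("ij,ij->i", A, A)   # squared row norms
    p = row_norms2 / row_norms2.sum()          # RK sampling probabilities (3)
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        # Kaczmarz update (2): project x onto the hyperplane of row i
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x
```

For a consistent system with a full-column-rank A and x_0 = 0 ∈ R(A^*), the iterates converge in expectation to the least-norm solution.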
In [17], Bai and Wu proposed a new probability criterion (i.e., Steps 2-5 of Algorithm 2) for the RK algorithm and, based on it, constructed a greedy randomized Kaczmarz algorithm, given as Algorithm 2, which converges faster than the RK algorithm.
According to Algorithm 2, at the tth iterate the corresponding residual vector is r_t = b − A x_t; the GRK algorithm then selects the ith row with a larger probability than the jth row whenever its relative residual is larger. In fact, only those entries of r_t whose relative residuals exceed a threshold contribute nonzero selection probabilities. The convergence of the GRK method has been studied by Bai and Wu in [17]. In addition, from [17] we know that if the GRK method converges, it must converge to the least-norm solution x^* = A^†b of the linear system (1), where A^† denotes the Moore-Penrose pseudoinverse of the matrix A.
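One GRK step can be sketched as follows (a sketch of the greedy criterion as we read it from [17]; the threshold formula and function interface here are our rendering, and [17] should be consulted for the authoritative formulation):

```python
import numpy as np

def grk_step(A, b, x, row_norms2, frob2, rng):
    r = b - A @ x                       # residual of the WHOLE system (what RSK avoids)
    rr = r @ r
    if rr == 0.0:
        return x                        # already solved exactly
    rel = r**2 / row_norms2             # squared relative residuals
    # greedy threshold of [17]
    eps = 0.5 * (rel.max() / rr + 1.0 / frob2)
    # candidate set: rows whose relative residual exceeds the threshold
    U = np.flatnonzero(rel >= eps * rr)
    # sample within the candidate set proportionally to r_i^2
    p = r[U]**2 / (r[U]**2).sum()
    i = rng.choice(U, p=p)
    return x + (b[i] - A[i] @ x) / row_norms2[i] * A[i]
```

Note that the argmax row always satisfies the threshold, so the candidate set U is never empty.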

The Randomized Sampling Kaczmarz Method.
In this part, a new algorithm is given. As noted in the previous sections, each iteration of GRK is costly because it must compute the full residual vector r_t. If the dimension of the matrix is too large relative to the memory of the computer, the performance of the GRK method degrades greatly. In response to this situation, we propose a new method for solving the linear system (1). The purpose of our method is to make full use of the residual information, as the GRK method does, while avoiding computing the whole residual r_t at each iteration. Let A = (a_ij) ∈ C^(m×n) in the linear system (1). Our method proceeds as follows: first, according to the uniform distribution, randomly select k rows i_1, i_2, . . ., i_k from the m rows of the coefficient matrix A and collect them in a set V_t = {i_1, i_2, . . ., i_k}; second, calculate the relative residuals of all rows in the set V_t; then, select the row with the largest absolute relative residual value (i.e., max((|b^(i_l) − A^(i_l) x_t|)/‖A‖_F), i_l ∈ V_t) and apply the Kaczmarz update (2) to it. The resulting RSK algorithm is given as Algorithm 3.
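The three steps just described can be sketched in Python as a single RSK step (a minimal sketch; the function name and interface are ours). Since ‖A‖_F is the same constant for every sampled row, maximizing the relative residual reduces to maximizing |b^(i) − A^(i) x_t|:

```python
import numpy as np

def rsk_step(A, b, x, k, rng):
    m = A.shape[0]
    V = rng.choice(m, size=k, replace=False)   # step 1: uniform sample of k rows
    r = b[V] - A[V] @ x                        # step 2: residuals of sampled rows only
    j = V[np.argmax(np.abs(r))]                # step 3: largest |relative residual|
    a = A[j]
    return x + (b[j] - a @ x) / (a @ a) * a    # Kaczmarz update (2) on row j
```

Unlike a GRK step, this touches only k rows of A per iteration, which is the point of the method.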
According to Algorithm 3, the choice of the number of sampled rows k plays an important role in the RSK algorithm. More specifically, if k is too small, it is difficult to ensure that an effective action row is selected at each iteration; if k is too large, the RSK method places higher demands on the memory of the computer, especially when the matrix dimension is very large. In the numerical examples below, we discuss the selection of the parameter k in the RSK method.
In fact, according to Algorithms 1-3, the RSK method costs (2k + 2)n + 1 + k floating-point operations (flops) at each iteration, while the GRK method and the RK method cost 7m + 2(n + 1) and 4n + 1 flops, respectively [17]. We list the per-iteration flops of these three algorithms in Table 1. Obviously, when the row number m is extremely large, the RSK method costs fewer flops than the GRK method. When k = 1, the RSK method degenerates into the uniform-sampling RK method.

Convergence Analysis of the RSK Algorithm.
In this section, we study the convergence of our algorithm. The following theorem guarantees the convergence of the RSK method.
Theorem 1. Suppose that the linear system (1) is consistent and that the initial vector satisfies x_0 ∈ R(A^*). Then the iteration sequence {x_t} generated by the RSK algorithm converges in expectation to the least-norm solution x^* = A^†b of (1), and the expected squared error E‖x_t − x^*‖_2^2 decays linearly at a rate determined by λ_min(A^*A) and ‖A‖_F, where λ_min(A^*A) is the smallest positive eigenvalue of A^*A.

Algorithm 3: The randomized sampling Kaczmarz (RSK) algorithm.
Require: A, b, l, k, and x_0.
(1) for t = 0, 1, . . ., l do
(2) Randomly generate, by the uniform distribution, k different indices from 1 to m, denoted V_t = {i_1, i_2, . . ., i_k}
(3) Compute the entries r_t^(l) = |b^(i_l) − A^(i_l) x_t| / ‖A‖_F for l = 1, 2, . . ., k
(4) Select the row index j = i_l achieving max_l r_t^(l) for i_l ∈ V_t
(5) Set x_{t+1} = x_t + ((b^(j) − A^(j) x_t)/‖A^(j)‖_2^2)(A^(j))^*
(6) end for

Proof. The update rule of the RSK algorithm yields

x_{t+1} − x_t = ((b^(i_t) − A^(i_t) x_t)/‖A^(i_t)‖_2^2)(A^(i_t))^*,

so that A^(i_t) x_{t+1} = b^(i_t) = A^(i_t) x^*; in other words, x_{t+1} − x^* is orthogonal to (A^(i_t))^*. Consequently, by the Pythagorean theorem, we obtain

‖x_{t+1} − x^*‖_2^2 = ‖x_t − x^*‖_2^2 − ‖x_{t+1} − x_t‖_2^2.

Based on this equality, we have

E_t‖x_{t+1} − x^*‖_2^2 = ‖x_t − x^*‖_2^2 − E_t‖x_{t+1} − x_t‖_2^2.

Note that the population mean can be expressed as the expectation of the sample mean; thus the expected gain E_t‖x_{t+1} − x_t‖_2^2, which involves the maximum over the sampled rows, is bounded from below by the mean of the squared relative residuals over all m rows. By the assumption that x_0 ∈ R(A^*), it follows from the RSK update that x_t ∈ R(A^*) for each t. Since ‖Az‖_2^2 ≥ λ_min(A^*A)‖z‖_2^2 for every z ∈ R(A^*), the expected gain is bounded below by a fixed multiple of ‖x_t − x^*‖_2^2, so the conditional expectation E_t‖x_{t+1} − x^*‖_2^2 contracts by a fixed factor. Taking the full expectation on both sides and using the law of iterated expectation E[E_t[·]] = E[·], induction on the iteration index t finally yields the estimate of Theorem 1.

Actually, compared with the GRK algorithm in [17], the RSK algorithm cannot improve the convergence with respect to the number of iterations for the linear system (1). However, when the row number m of the coefficient matrix A is much larger than k, each iteration of the GRK algorithm must compute the residuals of all m rows, while the RSK algorithm computes the residuals of only k rows. This explains how and why the RSK algorithm needs less computing time to reach a given accuracy when compared with the RK and GRK algorithms.

Sensitivity Analysis.
The essence of the RSK algorithm is to solve a least squares problem. Consider the least squares problem

min_{x ∈ R^n} ‖Ax − b‖_2^2, (16)

given the matrix A ∈ R^(m×n) and the observation vector b ∈ R^m, where x ∈ R^n is the vector to be solved. Let A be exact and let the vector b be measured data; we then consider the sensitivity of the solution x of problem (16) with respect to each component b_j of the data. Define the sensitivity of x_i with respect to b_j as

S_ij = ∂x_i/∂b_j.

According to [24,25], we have the following conclusions.

Theorem 2.
In the least squares problem (16), the sensitivity of x with respect to b_j is

S_ij = Σ_{k=1}^n (u_ik v_jk)/λ_k,

where u_ik and v_jk are the elements of the orthogonal matrices U and V, respectively, which satisfy the singular value decomposition A = VΛU^T with Λ = diag(λ_1, λ_2, . . .) containing the singular values of the matrix A.
Proof. Without loss of generality, let rank(A) = n (m ≥ n). By the singular value decomposition of the matrix, we have

A = VΛU^T,

where the orthogonal matrices V ∈ R^(m×m) and U ∈ R^(n×n), and Λ ∈ R^(m×n) has diagonal entries λ_1 ≥ λ_2 ≥ . . . ≥ λ_n > 0. For any x and b, let

y = U^T x, β = V^T b;

then, since V and U are orthogonal, we get

‖Ax − b‖_2 = ‖Λy − β‖_2.

Partition β = (β_1; β_2) with β_1 ∈ R^n and β_2 ∈ R^(m−n). We have

‖Λy − β‖_2^2 = Σ_{k=1}^n (λ_k y_k − β_k)^2 + ‖β_2‖_2^2.

So the minimizer y should satisfy λ_k y_k = β_k, that is,

y_k = β_k/λ_k, k = 1, 2, . . ., n.

Thus, from x = Uy we get

x_i = Σ_{k=1}^n u_ik β_k/λ_k.

According to β = V^T b, i.e., β_k = Σ_{l=1}^m v_lk b_l, we have

x_i = Σ_{k=1}^n u_ik (Σ_{l=1}^m v_lk b_l)/λ_k.

So we obtain

S_ij = ∂x_i/∂b_j = Σ_{k=1}^n u_ik v_jk/λ_k.

In fact, S_ij reflects the rate of change of x_i with respect to the data b_j, which depends on the singular values of the matrix A and on the orthogonal factors U and V of its decomposition. In particular, when m = n, the least squares problem reduces to solving the system of equations.
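The formula of Theorem 2 can be checked numerically (a small Python sketch; note that, in the paper's notation A = VΛU^T, the left singular vectors form V and the right singular vectors form U, and the sensitivity matrix S coincides with the pseudoinverse A^† since x = A^†b is linear in b):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))

# SVD written as A = V diag(lam) U^T to match the paper's notation (V left, U right)
V, lam, Ut = np.linalg.svd(A, full_matrices=True)
U = Ut.T  # columns are the vectors u_k

# sensitivity matrix: S[i, j] = dx_i/db_j = sum_k u_ik * v_jk / lambda_k
S = sum(np.outer(U[:, k], V[:, k]) / lam[k] for k in range(n))
```

A finite-difference perturbation of one component b_j of the data reproduces the column S[:, j].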

Algorithm Comparison.
In this section, we use the RK, GRK, and RSK algorithms to solve equation (1) with different matrices A. The numerical behavior of these algorithms is evaluated in terms of the computing time in seconds (denoted "CPU") and the number of iteration steps (denoted "IT"); CPU and IT are the medians over 50 repeated runs of the corresponding method. In addition, we report the speed-up of RSK(k) (k is the number of rows sampled from the coefficient matrix A in the RSK algorithm) against GRK, defined as

speed-up = CPU of GRK / CPU of RSK(k).

All algorithms start from the initial vector x_0 = 0 and terminate once the relative solution error (RSE), defined as

RSE = ‖x_t − x^*‖_2^2 / ‖x^*‖_2^2

at the current iterate x_t, satisfies RSE < 10^−6, or once the number of iteration steps exceeds 200,000; the latter case is labeled "−" in the numerical tables. For part of the tested matrices, we give the Euclidean condition number, denoted cond(A), and the density, defined as

density = (number of nonzeros of an m × n matrix)/(m × n).
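The two quantities used throughout the experiments can be written as small helpers (a sketch in Python; the squared-norm form of RSE follows the convention of [17] and is our reading of the definition above):

```python
import numpy as np

def rse(x_t, x_star):
    # relative solution error: ||x_t - x*||_2^2 / ||x*||_2^2
    d = x_t - x_star
    return (d @ d) / (x_star @ x_star)

def density(A):
    # fraction of nonzero entries of an m-by-n matrix
    return np.count_nonzero(A) / A.size
```

In a solver loop, iteration stops as soon as `rse(x_t, x_star) < 1e-6`.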
All experiments are carried out using MATLAB (R2015b) on a personal laptop with 2.5 GHz (Intel(R) Core (TM) CPU i5-7300HQ), 8.00 GB memory, and Windows operating system (Windows 10).
Example 1. In this example, the test matrices A are randomly generated by the MATLAB function randn, which produces independent entries subject to the standard normal distribution N(0, 1). In our implementation, the random vector x ∈ R^n is generated by the MATLAB function randn, and b ∈ R^m is taken to be Ax; in addition, the solution vector x^* ∈ R^n is taken to be pinv(A)b.
In Tables 2 and 3, we report the iteration counts and CPU times of RK, GRK, RSK (2), and RSK (7) when the linear system (1) is consistent. The results in Table 2 show that RSK (7) vastly outperforms both RK and GRK in terms of CPU times, with significant speed-ups, when the corresponding linear system is fat (i.e., m < n); the minimum speed-up is 1.22 and the maximum reaches 2.08. From Table 3, we see that the CPU times of RSK (10) are considerably less than those of RK and GRK when the corresponding linear system is thin (i.e., m > n), with the speed-up being at least 2.32 and at most 3.49. In addition, we find that the "fatter" the matrix is, the less advantage the RSK algorithm shows, and the "thinner" the matrix is, the more advantage it shows. This is in line with our intuition: if the row number m is extremely large, the RSK algorithm can reduce more of the computational cost, because the per-iteration cost of the RSK algorithm is independent of m while that of the GRK algorithm is not.
Example 2. In this example, we select full-rank sparse matrices from [26], which originate in different applications such as linear programming, combinatorial optimization, a DNA electrophoresis model, and the Pajek world city network. They possess certain structures and properties, being square (m = n) (e.g., cage5), thin (m > n) (e.g., WorldCities), or fat (m < n) (e.g., bibd_16_8 and refine). There are also rank-deficient sparse matrices from [26], which come from applications such as combinatorial problems and the Pajek world soccer network, being square (e.g., football) or thin (e.g., relat6 and flower_5_1).
In Table 4, we list the numbers of iteration steps and the computing times of the RK, GRK, RSK (2), and RSK (5) algorithms. According to the test results in Table 4, the RSK algorithm performs better than the GRK algorithm in terms of CPU, whether the matrix is square or not, full-rank or not, sparse or not, and whether the condition number is finite or not. More specifically, for the RSK (2) algorithm, the speed-up is at least 1.59 and at most 2.76; for the RSK (5) algorithm, the speed-up is at least 1.84 and even attains 4.45.

The Choice of Parameter k.
There is no doubt that the value of the parameter k makes a huge difference to the performance of the RSK algorithm. In this section, we tentatively discuss the relationship between k and the matrix row number m. We simulate the relationship between the CPU time of the RSK algorithm (with RSE < 10^−6) and the size of k under different matrices A. The simulation results are shown in Figure 1.
Through a large number of numerical experiments, we find that k = [log_2(m)] may be a good choice.

The RaSSK Algorithm. Consider the linear system

Ax = b,

where x is a sparse n-dimensional vector and b is an m-dimensional vector. By solving the following regularized basis pursuit problem

min_{x ∈ R^n} λ‖x‖_1 + (1/2)‖x‖_2^2 subject to Ax = b,

we can find a least-Euclidean-norm solution that satisfies the sparsity requirement. In 2014, Lorenz et al. [22] proposed the randomized sparse Kaczmarz (RaSK) algorithm for solving the regularized basis pursuit problem, as given in Algorithm 4.
Here λ (> 0) is set according to the application and controls the sparsity of the solution. Lorenz et al. proved that, for a consistent linear system Ax = b, the RaSK algorithm converges to the unique solution of the regularized basis pursuit problem [22].
Similar to the construction of the RaSK algorithm, we can formulate the randomized sampling sparse Kaczmarz (RaSSK) algorithm, listed as Algorithm 5. The proof of convergence of RaSSK is similar to that of RaSK: it can be seen as a special case of the Bregman projections for split feasibility problems (BPSFP) algorithm in [22]; if we take the feasibility problem to be "Find x ∈ ∩_{k=1}^m A_k" (A_k being the hyperplane formed by the kth row of the matrix A) and define f(x) = λ‖x‖_1 + (1/2)‖x‖_2^2, then convergence follows easily from Theorem 2.8 in [22].
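The RaSSK iteration can be sketched by combining the RSK sampling rule with the soft-shrinkage step of the sparse Kaczmarz scheme of [22] (a minimal sketch: the helper names, the parameter defaults, and the solver interface are ours, not from [22]):

```python
import numpy as np

def shrink(z, lam):
    """Soft-shrinkage operator S_lambda(z) = sign(z) * max(|z| - lambda, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def rassk(A, b, k, lam=1.0, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    z = np.zeros(n)  # auxiliary variable of the sparse Kaczmarz scheme
    for _ in range(iters):
        V = rng.choice(m, size=k, replace=False)  # RSK-style uniform sample of k rows
        r = b[V] - A[V] @ x
        j = V[np.argmax(np.abs(r))]               # row with the largest |residual|
        a = A[j]
        z += (b[j] - a @ x) / (a @ a) * a         # Kaczmarz step on z
        x = shrink(z, lam)                        # sparsifying shrinkage
    return x
```

With k = 1 and norm-weighted sampling instead of the max-of-a-sample rule, this reduces to a RaSK sketch.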

Signal Experiments.
In this section, we show the efficiency of the RaSSK method through several numerical examples and compare it with the RaSK algorithm. The numerical experiments are implemented in MATLAB (R2015b) on a personal laptop with a 2.5 GHz Intel(R) Core(TM) i5-7300HQ CPU, 8.00 GB memory, and the Windows 10 operating system.
Example 3. In this example, the test signal is randomly generated with length 256 and its sparsity is limited to 10, that is, it has only 10 nonzero coefficients. The signal is reconstructed by RaSK and RaSSK, respectively. The recovered signal, the relative error, and the relative residual are shown in Figure 2. For the results in this figure, the parameter λ in both the RaSSK method and the RaSK method is 50. We can see that both algorithms reconstruct the original signal. It is worth mentioning that the RaSSK method is more efficient than the RaSK method.

Algorithm 4: The randomized sparse Kaczmarz (RaSK) algorithm [22].
Require: A, b, l, and x_0 = z_0 = 0.
(1) for t = 0, 1, . . ., l do
(2) Generate i_t randomly with Pr(row = i_t) = ‖A^(i_t)‖_2^2/‖A‖_F^2
(3) Set z_{t+1} = z_t + ((b^(i_t) − A^(i_t) x_t)/‖A^(i_t)‖_2^2)(A^(i_t))^*
(4) Set x_{t+1} = S_λ(z_{t+1}), where S_λ(·) is the soft-shrinkage operator
(5) end for
Algorithm 5: The randomized sampling sparse Kaczmarz (RaSSK) algorithm.
Require: A, b, l, k, and x_0 = z_0 = 0.
(1) for t = 0, 1, . . ., l do
(2) Randomly generate, by the uniform distribution, k different indices from 1 to m, denoted V_t = {i_1, i_2, . . ., i_k}
(3) Compute the entries r_t^(l) = |b^(i_l) − A^(i_l) x_t|/‖A‖_F for l = 1, 2, . . ., k
(4) Select the row index j = i_l achieving max_l r_t^(l) for i_l ∈ V_t
(5) Set z_{t+1} = z_t + ((b^(j) − A^(j) x_t)/‖A^(j)‖_2^2)(A^(j))^* and x_{t+1} = S_λ(z_{t+1})
(6) end for

The MATLAB function randn generates an m × n random matrix with entries independently drawn from the standard normal distribution N(0, 1) as the observation matrix. For the sparse representation of the image, we use the MATLAB function dwt, which performs a discrete wavelet transform. The error is defined as the relative reconstruction error, and the stopping criterion of the algorithm is error < 0.1 or reaching the maximum number of iteration steps, 50,000. The value of the parameter λ and the results are shown in Table 5.
In this table, "CPU" denotes the computing time and "IT" denotes the number of iteration steps. From Table 5, it is easy to see that the RaSSK algorithm performs significantly better than the RaSK algorithm: it greatly reduces both the number of iteration steps and the computing time. Meanwhile, from Figures 3 and 4, we find that, to the human eye, there is almost no perceptible loss even though some information is discarded.

Conclusion
Variants of the RK algorithm are effective iterative methods for large-scale linear systems. In this paper, based on the randomized Kaczmarz algorithm and the greedy randomized Kaczmarz algorithm, we propose a new algorithm which makes use of the residual information while not needing to calculate all the residuals.
This algorithm is faster than RK [16] and GRK [17] in experiments. Furthermore, after a large number of numerical experiments, we recommend taking the parameter k to be [log_2(m)]. As an application, we apply the algorithm to signal reconstruction in compressed sensing. The experiments show that our algorithm performs well.

Data Availability
The table and figure data used to support the findings of this study are included within the article. The software code used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.