ADMM-EM Method for L 1-Norm Regularized Weighted Least Squares PET Reconstruction

The L 1-norm regularization is usually used in positron emission tomography (PET) reconstruction to suppress noise artifacts while preserving edges. The alternating direction method of multipliers (ADMM) is proven to be effective for solving this problem. It sequentially updates the additional variables, image pixels, and Lagrangian multipliers. Difficulties lie in obtaining a nonnegative update of the image. And classic ADMM requires updating the image by greedy iteration to minimize the cost function, which is computationally expensive. In this paper, we consider a specific application of ADMM to the L 1-norm regularized weighted least squares PET reconstruction problem. Main contribution is derivation of a new approach to iteratively and monotonically update the image while self-constraining in the nonnegativity region and the absence of a predetermined step size. We give a rigorous convergence proof on the quadratic subproblem of the ADMM algorithm considered in the paper. A simplified version is also developed by replacing the minima of the image-related cost function by one iteration that only decreases it. The experimental results show that the proposed algorithm with greedy iterations provides a faster convergence than other commonly used methods. Furthermore, the simplified version gives a comparable reconstructed result with far lower computational costs.


Introduction
Positron emission tomography (PET) is an important imaging tool in modern medicine and provides noninvasive quantification of the biochemical and biological processes inside living subjects. Several reconstruction methods have been developed and applied in clinical practice. These methods can be roughly divided into two categories: analytical methods and iterative methods. Iterative methods have attracted more attention because they generally model imaging physics better and are more capable of suppressing noise artifacts than analytical methods. A basic target of PET reconstruction is to solve a system of the following form: where represents the biochemical activity distribution inside a subject, denotes the measured projections, is scatter and random events, and is a predetermined system matrix, where ∈ ×1 , ∈ ×1 , ∈ ×1 , and ∈ × . All of these are fully nonnegative.
A classical approach is to select such that the weighted least squares (WLS) error [1] between and + is minimized: Σ is a diagonal matrix with the diagonal element Σ = 1/ , where is the variance of the th measurement. In theory, the values in are larger than those in ; however, in practical applications, the latter one could exceed the former one when estimating .
Although an iterative algorithm can more effectively suppress noise propagation than conventional filtered backprojection, it may fail in special cases, such as increased amount of noise, sparse projections, or low dose (which results in high noise or poor SNR). In fact, the problem of reconstructing PET image data is an ill-posed inverse problem. Over the past twenty years, efforts have been made to solve these problems by employing regularization techniques [2][3][4][5][6][7][8]. A general method is to introduce a priori knowledge to constrain the solution space, which can be expressed as a regularization (or penalization) on the reconstructed image to reflect information on the properties of acceptable images. Tikhonov regularization [9,10] is a popular method that generally leads to a unique solution. There are many methods for solving such a quadratic programming problem [1,11]. However, they exhibit a weak ability to preserve the edges while smoothing the interior of the image.
Another viable regularization method is the 1 -norm regularization [12][13][14][15][16], in which one seeks to find the solution of the following optimization problem: where > 0 serves as a penalty parameter and is a linear operator (e.g., gradient operator and orthonormal transformation). The penalty parameter controls the tradeoff between data fidelity and resolution (image smoothness). Several linear operators have been proposed, such as the first-or second-order derivative or wavelet basis. Note that could include negative elements. The underlying philosophy in working with 1 -norm regularization is that most images have a sparse representation [17].
In recent years, the subgradient-based method [18] has been developed for solving convex and nonconvex optimization problems; this method takes a subgradient related surrogate function at each step to obtain the update. Another method is the alternating direction method of multipliers (ADMM) [19,20]. ADMM decomposes the original problem into three subproblems, and then it sequentially solves these subproblems at each iteration. For medical imaging, ADMM will distributively minimize the augmented Lagrangianrelated function to solve for the additional and primal variables (pixels), and then it updates the dual variables, which are associated with a coupling constraint.
One can reformulate the optimization problem in (3) by imposing the extra constraint = , which leads to the following optimization problem: Note that (3) and (4) have the same solution. The scaled augmented Lagrangian function [19] is introduced to overcome these difficulties, and it is defined as follows: where is the dual variable or Lagrange multiplier and > 0 is the penalty parameter. When = 0, the augmented Lagrangian can be reduced to the unaugmented (common) version.
By applying distribution optimization for and and dual ascent to , a unified framework can be introduced to solve the 1 -norm regularized WLS reconstruction problems. For the -update, one exploits the separability ( , , ) in , that is, ( , , ) = ∑ =1 ( , , ), to solve for each independently. A solution can be found in [17], which is the well-known shrinkage method. To update , a simple gradient-ascent method can be used. When not considering the details of the -update, the ADMM's framework can be formulated [19] as follows.
Algorithm 1 (ADMM general framework). One has (2) +1 = arg min ≥0 ( , +1 , ); Here, the -update is a difficult problem that minimizes the following function with the nonnegativity constraint: When 2 + is invertible, we may obtain a unique global solution, which has been used in [21,22].
where the truncation below zero is necessary for constraining the solution to the nonnegative space. This formulation is feasible in theory if the regularization process guarantees the nonsingularity of the matrix. A viable method is the gradient-based method. The steepest descent method is perhaps the simplest technique to implement, which takes the negative gradient as the descent direction: where the superscript denotes the th iteration and ( ) is the step size. Of course, negative pixel values still need to be truncated. The convergence of the gradient-based methods depends on the choice of step size, which is problematic for practical implementations. Let ( ) = be a constant, where 0 < < 2/‖∇ 2 ( , +1 , )‖ (‖ ⋅ ‖ denotes the maximum eigenvalue); then (8) becomes the projected Landweber method [23]: The conjugate gradient method is a popular approach, which is often implemented as an iterative algorithm applicable to sparse systems for large-scale problems.
Computational and Mathematical Methods in Medicine 3 However, for large-scale problems such as PET reconstruction, the inverse matrix method becomes very costly from a computational perspective. Furthermore, the "pure" steepest descent method and the "pure" conjugate gradient method do not meet the nonnegativity constraint, and consequently, negative pixel values need to be truncated. The truncation, however, leads to a divergent modification. The projected Landweber method, however, is able to meet the nonnegativity, but its convergence cannot be proven theoretically. In addition, the projected algorithms also destroy the monotonic decreasing properties of the decomposed cost function.
In fact, ADMM is a framework of distribution optimization, which is not limited to PET reconstruction with the nonnegativity constraints. With different constraints and distributed variables, ADMM can be applicable in many other fields, such as multiple-block convex programming [24], ADMM for tomography with nonlocal regularizers [25], linear classification [26], and optimal power flow problems [27].
In this paper, we consider a specific application of ADMM to the 1 -norm regularized WLS reconstruction problems. Here, we do not change the framework of ADMM; rather, we develop a new algorithm for PET image reconstruction. The proposed approach is applicable to several medical image reconstruction problems, such as TV regularized [13] and wavelet regularized [14] image reconstruction. We focus on a key subproblem: the -update. A multiplicative update rule is derived to iteratively and monotonically (in the sense of decreasing cost function) update the pixel values. Similar to the EM algorithm [28,29], the pixel-update algorithm also intrinsically satisfies the automatic satisfaction of the nonnegativity constraint without the need for an adjustable step size. We provide a rigorous convergence proof for the proposed -update, which shows that the algorithm will iteratively pursue a single global optimum. The -update that optimizes the subproblem inevitably leads to high computational costs, and we can replace it by a single iteration algorithm to decrease the decomposed cost function of the subproblem, which is an often used strategy in distributed optimization. The experimental results demonstrate that the proposed algorithm (with greedy reconstruction of the pixels) provides better performance compared to those of other commonly used methods with respect to image qualification and convergence speed. The results also show that the simplified version provides a comparable reconstructed result but at a considerably lower computational cost compared to the existing methods.

Methodology
For notational simplicity, we define = − +1 + as a constant vector, and then we define the cost function in (6) as follows: We will solve this optimization problem using a modified EM-type algorithm. As mentioned in many articles [30][31][32][33], a surrogate function, as defined below, is useful in algorithm derivation and convergence proof.
Clearly, Ψ( ) is decreasing under the update +1 = min ( | ) because of There are two important properties for the surrogate: additivity and transitivity. For the former, the sum of two surrogates is a surrogate of the sum of two original functions. For the latter, the surrogate of the surrogate of a function is a surrogate of this function. Following these properties, we will construct the surrogates for ( ) and ( ).

Surrogate for ( ).
We construct a surrogate ( | ) by the convexity. Let * = ( + ) , that satisfy * , ≥ 0 and * + ∑ =1 = 1. They can be the convex combination coefficients such that It can be verified that ( | ) = ( ). When considering Jensen's inequality and the convex combination coefficients , then ( | ) ≥ ( ) is proven by the following inequality: A similar derivation can be found in [34]. Note that * only relates to the constant term with regard to and can therefore be ignored when minimizing the function ( | ).

Surrogate for ( ).
Since there may be some negative values in the matrix , it is difficult to directly construct a surrogate as above. Some previous works, such as [35,36], are unable to solve the problem because they fail to guarantee nonnegativity during the iterations, for which we provided a counterexample in [11]. We will utilize an intermediate surrogate to solve the problem. Let = −̂and = −̂, where ,̂, , and̂are matrices or vectors with nonnegative entries. Subsequently, we can construct a surrogate for ( ) at .
where ( | ) It can be verified that mid ( | ) = ( ). By the convexity of ( ), we view 1/2 as the combination coefficients, leading to mid ( | ) Following the same process as in Section 2.1, we can construct surrogates ( | ) and̂( | ) for ( | ) and̂( | ), respectively. Let then Note that * and̂ * are relative to the constant terms and can safely be ignored. Now, we can obtain a surrogate for ( ) at .

Multiplicative Update Rule.
We minimize ( | ) = ( | ) + ( /2) ( | ) to obtain a new iteration. Taking the partial derivatives for ( | ), ( | ) and̂( | ) leads to Solving the one-dimensional equations ( | )/ = 0 leads to a multiplicative update rule, which is the main result in this paper. , The update rule results from the current image multiplied by a factor and is flexible and easy to implement. The derivation process can also be explained in terms of EM optimization: when considering the surrogate as the minimal conditional expectation, its minimization is equivalent to the maximization of the conditional expectation. The algorithm shows two important properties: the iterations are positive if the initial estimate is positive, and the cost function monotonically decreases. For this derivation process, the key step is to replace the minimization of the cost function by minimizing at each iteration the surrogate whose variables are separable. Moreover, the minimization of the surrogate ensures that the cost function decreases. Algorithm 3 (specific ADMM). Given > 0 and > 0, then (2) iteratively update by (22) until some stop rule is satisfied; Note that, in Algorithm 3, ADMM requires greedy iterations to obtain the optimal solution with respect to , which is an expensive operation. In general, an update that decreases the primal cost function is sufficient for use in practical applications; thus, we will provide the following simplified algorithm.

Convergence
There are many convergence proofs for both constrained and unconstrained ADMM [19,20]. Therefore, we only need to limit ourselves to discussing the convergence of theupdate. We will theoretically prove that the iteration sequence will converge to a global solution if we use it to pursue an accurate solution without considering the computational cost. In the appendix, we will prove that update (22) can iteratively and monotonically minimize the cost function (10) while observing the nonnegativity constraint along the lines of [35][36][37][38][39].

Simulated Head Phantom Data.
A simulated head phantom with 128 × 128 pixels (pixel width of 4 mm), as shown in Figure 1, is used in the following experiments. This phantom is modified to meet the needs of PET simulation because the original one is a CT phantom. There are many advantages to using simulated phantoms, including prior knowledge of the pixel values and the ability to control noise. The total detection counts are approximately 5 × 10 5 . An anisotropic TV regularization is used to test the algorithmic performance as follows: where ,right and ,below , respectively, represent pixels to the right and below .
We compare the performance of the proposed method with De Pierro's ISRA (image space reconstruction method) [34] and the PWLS-EM algorithm with a quadratic smoothing regularization [11]. These methods are selected to demonstrate the difference between regularized and nonregularized reconstruction algorithms. Moreover, the difference between 1 -norm regularization and squared 2 -norm regularization is examined. We also use several methods to pursueupdate, including the projected Landweber and the conjugate gradient methods. The projected Landweber method uses [40], where ‖⋅‖ 1 and ‖⋅‖ ∞ denote the 1-and ∞-norms of a matrix, respectively. The code of the conjugate gradient method comes from [41], which is slightly modified to meet our criteria.
In the following, the results of the ADMM-type algorithms are named ADMM---, where represents theupdate method, which can be EM (proposed), PL (projected Landweber), and CG (conjugate gradient); denotes the number of outer loops; and is the number of inner loops. For example, ADMM-EM-400-120 refers to using the proposed method to update with 120 inner iterations and update all of , , and with 400 outer iterations. In the unambiguous case, we will omit , , , or all of them for notational simplicity. The experiments are performed on a HP Compaq PC with a 3.00 GHz Core i5 CPU and 4 GB of memory. The algorithms are implemented in MATLAB 7.0. All of the algorithms are initiated using the same uniform image for a fair comparison.
The system matrix is obtained using the "angle of view" method [28]. The diagonal matrix Σ is computed using Fessler's "data-plugin" technique [1].
Using the system matrix, we project the phantom on the sinogram with 128 radial bins (bin size of 4 mm) and 128 angular views evenly spaced over . The noisy projections are obtained by Fessler's pseudorandom formulation.
where * is the noise-free projection, = 30% simulates the contribution of random events, and is the th detector efficiency. We select = 1 and thus ignore the influence of the detector efficiency. Furthermore, we ignore in (2) during the simulation process.
Mean absolute error (MAE) can be used to measure the proximity of the reconstructed image to the true image. The MAE value is calculated by taking the average of the absolute difference between the reconstructed pixel values and the real ones over the entire image. The best algorithm will provide the lowest MAE value.
The following criterion is applied to stop the iteration process: where is a difference tolerance. Contrast and variability are typically used to evaluate image quality. We compute these parameters using the method of NEMA [42]. Denote mean Ω ( ) and std Ω ( ) as the mean and standard deviation of the image on the region Ω; then where Ω 1 represents the ROI (region of interest) and Ω 2 denotes the background.
The running time of the algorithms is easily influenced by many factors, for example, coding level and running environment. Therefore, we also compare the computational complexity for the sake of fairness. Here, we use a simple and feasible method by counting the number of multiplications of the system matrix and any vector because the TV matrix ( ) is very sparse, and thus the computation load can be ignored. In fact, for all of ADMM-PL, ADMM-CG, and ADMM-EM, each inner loop ( -update) requires only two multiplications of the system matrix and a vector; consequently, the computation complexity can be further simplified to the total number of inner loops, which equals the multiplications of the number of outer loops and inner loops for each -update. In the following, we will take it to indicate the computational complexity.
In Figure 2, we find a suitable penalty parameter for PWLS-EM by comparing the change of the MAE at the 400th iteration with respect to . We also pursue the optimal and for ADMM-EM by comparing the change of MAE at the 400th outer iteration and the 120th inner iteration. Since the -update by EM can be proven in theory to converge to a single global optimum, ADMM-EM may be viewed as a golden standard. For PWLS-EM, the optimal MAE values are obtained at = 10 −4 . For ADMM-EM, the optimal penalty parameter is obtained at ( , ) = (10 −2 , 10 −4 ). In addition, the minimal MAE values for PWLS-EM and ADMM-EM are 41.91 and 13.05, respectively. This phenomenon shows a clear advantage of 1 -norm regularization compared to Tikhonovtype regularization.  Below, we focus on several special algorithms, including ISRA, PWLS-EM ( = 10 −4 ), and ADMM ( = 10 −2 and = 10 −4 ). Figure 3 shows the reconstructions corresponding to a greedy outer and inner iteration. As shown in this figure, ISRA's image suffers from serious noise artifacts, and PWLS-EM fails to preserve the edges. ADMM-PL-400-1 and ADMM-CG-400-1 provide an obvious blurred reconstruction. The remaining images exhibit smooth interiors and sharp edges, which are desirable results. These algorithms provide almost identical reconstructed results. In fact, since the subproblem to update the pixel values has a strictly convex cost function, every convergent algorithm will converge to the same result with a greedy iteration number. This is the reason for why ADMM-PL-400-120, ADMM-CG-400-120, and ADMM-EM-400-120 provide similar results. This experiment also indicates that EM is more suitable for the simplified ADMM algorithm than PL and CG. We believe that the reason is because the EM--update can ensure the monotonic decrease of the augmented Lagrangian function; however, the projected Landweber and CG have no such characteristic. Figure 4 presents a comparison of the MAE curves and cost function with increasing iteration numbers for ADMM-PL, ADMM-CG, and ADMM-EM, in which the curves of ISRA and PWLS-EM disappear because they clearly fail to preserve edges and suppress noise artifacts. As shown, for both curves, ADMM-PL-1 presents larger MAE and cost function values and ADMM-CG-1 provides the next-worse curve, demonstrating the consistent conclusion that ADMM-PL and ADMM-CG are ill-suited for the proposed simplified strategy. ADMM-EM-120 exhibits the fastest convergence rate; however, it is only slightly superior to ADMM-CG-120. ADMM-PL-120 and ADMM-EM-1 present similar curves. In conclusion, ADMM-EM-120 and ADMM-CG-120 perform better than ADMM-PL-120 and ADMM-EM-1 when only judging these curves, but eventually, all algorithms will approach the same target with increasing iteration numbers.
Note that ADMM-EM has been proven in theory to converge to a unique global solution of the corresponding subproblem with a greedy -update. This means that when ignoring rounding error, our algorithm can arrive at any desired accuracy given a large enough iteration count. Thus, in these experiments, ADMM-PL and ADMM-CG will also converge to that solution if they are indeed convergent. However, to date, there is no evidence to show the convergence when negative values are truncated during the iteration process. Table 1 presents the comparison of algorithmic performance, including MAE, cost function, contrast, variability, running time (second), and computational complexity. We only compare the greedy versions of ADMM-PL, ADMM-CG, and ADMM-EM with the simplified version of ADMM-EM because the simplified versions of ADMM-PL and ADMM-CG failed to obtain an acceptable image. Our findings show that either the greedy or simplified version of ADMM-EM always provides the best results. In addition, the greedy versions of ADMM with -update by PL, CG, and EM impose a similar time cost and have the same computational complexity, but the simplified version of ADMM-EM requires the least number of iterations. Figure 5 differs from Figure 4 because the former fixes the number of inner iterations and the latter fixes the number of outer iterations. This figure also presents a comparison of MAE curves and cost function. As shown, ADMM-EM always presents the best result, ADMM-CG provides the next best result, and ADMM-PL shows the worst result; however, all of them will be consistent after sufficient inner iterations. We can also observe that ADMM-EM's result does not actually depend on the number of inner iterations, which demonstrates a clear advantage of the simplified version over the other two.
The above experiments provide a comparison while fixing the number of outer or inner iterations. In practical applications, we generally do not require this many iterations to obtain an acceptable image. In Table 2, (26) will be used as a stopping condition for both cases; then, we obtain the acceptable images with = 10 −8 . As shown, when fixing the number of inner iterations, the greedy ADMM-EM requires a lower number of outer iterations, whereas the simplified ADMM-EM requires more outer iterations. When fixing the outer iteration count, ADMM-EM still requires the lowest number of inner iterations, while ADMM-PL requires the most. Table 3 also presents a comparison of algorithmic performance; however, the number of inner and outer iterations use those suggested by Table 2 to meet the stop rule = 10 −8 . ADMM-EM-68-11 is also included in the comparison. All presets present comparable evaluation parameters, but the simplified ADMM-EM requires very little running time and computational complexity. We observe that either the greedy or simplified version of ADMM-EM still provides the best results. In addition, the greedy version of ADMM-PL requires the largest amount of time and computational cost. The greedy versions of ADMM-EM and ADMM-CG have a similar computational cost, but the simplified version of ADMM-EM requires less computational cost. In fact, ADMM-EM-256-1 consumes 42.98% of the running time of ADMM-CG-71-11 and 9.37% of that of ADMM-PL-292-20, which corresponds to 32.78% and 4.38% of computation complexity, respectively.

PET Clinical Data.
Real clinical brain projections, which were obtained using Positron's mPower scanner with 10 Ci FDG preinjected into a patient, were also used for evaluation purposes. The acquisition lasted for 10 minutes, and 700 M counts in 61 slices were obtained. The normalization was performed by measurements obtained from a calibration scan with a 1 Ci Ge 68 rotating rod source. The attenuation coefficients of the attenuation media were computed from a transmission scan of a 5 Ci Ge 68 rod source. The raw data were 128 radial bins and 128 angles (bin size 4 mm), which are shown in Figure 6. The ADMM-type algorithms with ( , ) = (10 −2 , 10 −4 ) obtain the best reconstruction as above, which we also use on the real clinical data. We also executed the proposed iteration numbers in Table 3 for the algorithms. Figure 7 shows the reconstructions of the patient's projection data. For real clinical data, it is difficult to quantitatively evaluate the algorithms. The reconstructed grid is 128 × 128, with 4 mm pixels. This figure illustrates that the two proposed algorithms produce sharper images with greater contrast than those achieved with other algorithms. Moreover, PL-292-20 and CG-71-11 do not clearly resolve the ellipse. Among them, PL-292-20 results in a seriously degraded image, where the boundaries of the different regions are obscure. The large difference is due to the lower number of iterations. These experimental results further demonstrate that ADMM-EM leads to a superior result than the others by providing a fast convergence.
From the derivation of the proposed method, we make the following observation: the EM-type -update ensures that the cost function decreases at each iteration. However, it is difficult for the other algorithms to meet this condition, particularly for lower numbers of iterations. Thus, ADMM-EM with one inner loop can achieve a good result. Certainly, while running many inner loops, all of the algorithms will converge to the same solution; thus, they present some similar images.

Conclusion
We present a special application of ADMM to PET image reconstruction. Specifically, a new update rule is developed to iteratively update the pixel values, which exhibits desirable   properties, including monotonic decrease of the cost function, self-constraining to the feasible region, and no need to impose a step size. Such properties allow us to implement a simplified version of ADMM, requiring considerably less time and computational overhead. We provide a rigorous theoretical global convergence proof for the update step.
The simulation results demonstrate that the proposed greedy algorithm provides a stabler and faster convergence with similar computational cost as ADMM-PL and ADMM-CG. The results also indicate that the proposed simplified algorithm obtains a similar image quality while imposing lower computational costs. For our simplified algorithm, a theoretical convergence proof cannot be provided; rather, we use the experimental method to demonstrate the convergence. Proving the convergence for the simplified algorithm will be the focus of future work. In the following, we use a number of reasonable assumptions.
The first assumption forces the iterations to be positive, but the limit may be zero. The second condition is reasonable because ( Σ ) = 0 suggests that = 0 for any . Thus, the equation = is irrelevant to , and then in is removable. For the third requirement, one of the main reasons to add the regularization is to enforce the convexity.
First, we provide four useful lemmas.  The derivation process is omitted due to its simplicity. By the three theorems below, the global convergence will be proven.
Proof. When * > 0, by (21), it can be verified that Proof. According to Lemmas A.2, A.3, and A.4, the set of accumulation points of { } is connected and compact. If we can prove that the number of accumulation points is finite, then the desired result follows because a finite set can be connected only if it consists of a single point [38].
To prove the existence of a finite number of accumulation points, we consider any accumulation point * . Given an integer set Ω = {1, 2, . . . , }, where is the total number of components of , then Ω * = { : * = 0} is a subset of Ω. Let Φ Ω * be the restrictions of Φ to the set { : = 0, ∈ Ω * }, which is a strictly convex function of the reduced variables. It follows that Φ Ω * has a unique minimum (Theorem A.6: Φ( * )/ = 0 if * > 0). It means that an accumulation point must correspond to a subset of Ω. The number of subsets of Ω is finite; thus, the number of accumulation points is also finite.
In Theorem A.6, we prove that every accumulation point meets the first KKT condition, by which the full sequence convergence is provided in Theorem A.7. Naturally, the limit of { } satisfies the first KKT condition. In the following, we will show that the second KKT condition is also satisfied. Thus, we can obtain that +1 > , which is a contradiction to { } → 0.