A Model of Regularization Parameter Determination in Low-Dose X-Ray CT Reconstruction Based on Dictionary Learning

In recent years, X-ray computed tomography (CT) is becoming widely used to reveal patient's anatomical information. However, the side effect of radiation, relating to genetic or cancerous diseases, has caused great public concern. The problem is how to minimize radiation dose significantly while maintaining image quality. As a practical application of compressed sensing theory, one category of methods takes total variation (TV) minimization as the sparse constraint, which makes it possible and effective to get a reconstruction image of high quality in the undersampling situation. On the other hand, a preliminary attempt of low-dose CT reconstruction based on dictionary learning seems to be another effective choice. But some critical parameters, such as the regularization parameter, cannot be determined by detecting datasets. In this paper, we propose a reweighted objective function that contributes to a numerical calculation model of the regularization parameter. A number of experiments demonstrate that this strategy performs well with better reconstruction images and saving of a large amount of time.


Introduction
Nowadays, X-ray computed tomography (CT) is still an important part of biomedical imaging technologies for the reason that the reconstructed image is of high spatial resolution and quality. Nevertheless, it confirms that an overdose of radiation possibly increases the risk of genetic or cancerous diseases, making it urgent to develop creative and effective reconstruction techniques to fit low-dose CT scanning protocol. Obviously, the X-ray flux cannot be reduced much since the signal-to-noise ratio (SNR) of measured data declines with the reduction of dose. Another approach is to decrease the number of projection angles, which will lead to incomplete few-view data. In this case, analytic-based algorithms like FDK [1], which are derived from a continuous imaging model and in need of dense sampled projections, are sensitive to insufficient projection data and arrive at a terrible result. However, algebraic algorithms like the simultaneous algebraic reconstruction technique (SART) [2] solved the problem better by transforming it to a series of linear equations.
Recently, Candes et al. [3,4] have made compressed sensing theory popular in information theory field. This theory indicates that a variety of signals can be represented sparsely in a certain transform domain. Therefore, original signal can be recovered accurately by far fewer samples while there is no need to follow the Shannon/Nyquist sampling theorem. A principle called restricted isometry property (RIP) guarantees the perfect recovery of any sparse signal [5]. This novel theory has been applied to many regions, like information technology [6], signal and image processing [7], inverse filtering [8], and so on. It is said that the data acquisition 2 Computational and Mathematical Methods in Medicine process with compression is good for enhancing image quality because this method can increase imaging speed and suppress the artifacts caused by patients' movement [9]. For these benefits, many compressed sensing based algorithms are created to deal with few-view CT reconstruction problem. One major group is based on the total variation (TV), which takes the TV of the image as the sparse constraint. The image is determined by minimizing the TV term with the constraints of the linear projection equations. Sidky and Pan presented an improved TV-based algorithm named adaptive steepest descent projection onto convex sets (ASD-POCS) in circular cone-beam framework [10]. Another similar method called gradient projection Barzilari Borwein (GPBB) has a faster convergence speed [11]. Besides the TV minimization algorithms, dictionary learning is also helpful to sparse representation. During the reconstruction process, the image is divided into many overlapped patches, represented sparsely by overcomplete elements of a particular dictionary. Xu et al. combined statistical iterative reconstruction (SIR) with dictionary learning and got a better reconstruction result than TV-based methods in the low-dose CT condition [12]. According to Xu's paper, this method is robust to noise and obtains a better reconstructed image with more details than the TV-based methods do. Naturally, there are some parameters relevant to the final result. Some of them, like the sparse level, the scale of the dictionary, and so on, have less change due to different scanning data and then can be empirically selected. However, there is a special parameter changing according to the phantom, the scanning protocol, the noise level, and other factors. This parameter plays an important role in the reconstruction program to balance the data fidelity term and the regularization term while determining its value is time consuming with many attempts. Hence, there is no doubt that providing a model to select a proper value of this parameter according to the scanning data is essential for the algorithm based on dictionary learning, which leads to better result and time saving. This paper is organized as follows. In Section 2, the problem of low-dose CT reconstruction is stated and the algorithm based on dictionary learning is reviewed. In Section 3, the model of regularization parameter determination is proposed by function fitting method. In Section 4, a series of experiments are performed and corresponding discussions are given. Finally, there is the conclusion at the end of this paper.

Background and Notation
Interpretation. According to previous work by Xu et al. [12], SIR is united with dictionary learning to derive the algorithm. SIR assumes that the measured data can be regarded as the Poisson distribution where b = ( 1 , 2 , . . . , ) ∈ R ×1 is the entrance X-ray intensity, y = ( 1 , 2 , . . . , ) ∈ R ×1 is the exit X-ray intensity, l = ( 1 , 2 , . . . , ) ∈ R ×1 is the integral of the linear attenuation coefficient with , A = { } ∈ R × 2 is the system matrix, the reconstructed image = ( 1 , 2 , . . . , 2 ) is a linear attenuation coefficient distribution, which transforms the initial image of × pixels to a vector ∈ R 2 ×1 , and represents the read-out noise.
The objective function of SIR is as where ( ) = ∑ =1 ( /2)([A ] −̂) 2 is the data fidelity term, l = (̂1,̂2, . . . ,̂) ∈ R ×1 is the measured data of l calculated bŷ= ln( /( − )), = ( − ) 2 / is the statistical weight, and ( ) is the regularization term. The regularization term usually contains prior information of the image, like sparse constraint. When the sparse representation is acquired by dictionary learning theory, we can replace ( ) = ∑ ‖E − D ‖ 2 2 + ∑ ] ‖ ‖ 0 in the objective function. Therefore, the reconstruction problem is equivalent to the following minimization: where is an operator to extract patches with 0 × 0 pixels from the image, D = (d 1 , d 2 , . . . , d ) ∈ R 2 0 × is the training dictionary whose column d ∈ R is called an atom of the same size of a patch, ∈ R ×1 has few nonzero entries as a sparse representation of patches by the dictionary basis D, and the variables and ] are regularization parameters. In this optimization problem, , , and D are all unknown; hence, a practical plan of minimizing the object function is an alternating minimization scheme. The plan divides the primary problem into two recursive steps: update of the dictionary model and update of the image. The final result is acquired by operating the two steps alternately until reaching a stopping criterion.

Update of the Dictionary Model.
During this procedure, the image is supposed to be fixed, meaning that the data fidelity term is a constant. The optimization problem is simplified to the one as where is an intermediate image of the last updating step. In the adaptive dictionary based statistical iterative reconstruction (ADSIR), the dictionary is defined dynamically based on the unknown image while the dictionary in the global dictionary based statistical iterative reconstruction (GDSIR) is predefined beforehand [12]. Previous researches have proved that the K-SVD algorithm performs well at training the dictionary [13]. Once the dictionary is determined, the OMP algorithm is used to update the sparse coding [14] with a predetermined sparse level, instead of solving the 0 -norm problem as (4) directly.

Update of the Image.
While updating the image, the dictionary and sparse coding remain invariable. In other words, the problem transforms to the form as where is the regularization parameter balancing the data fidelity term ∑ =1 ( /2)([A ] −̂) 2 and the regularization term ∑ ‖E − D ‖ 2 2 . The regularization term is already a separable quadratic function. By replacing the data fidelity term with a separable paraboloid surrogate [15], the optimization can be iteratively solved by

Effect of the Regularization Parameter.
As mentioned above, the regularization parameter is of great importance during the update of image (5). We consider the optimizing problem of the form If there is It shows that a smaller makes the data fidelity term smaller and the regularization term bigger, which means that the sparse constraint has less effect on the optimizing process and more noise will appear in the final image. On the other hand, a bigger weakens the effect of the data fidelity term, generating a loss of some fine details in the image. For example, should be increased to suppress the noise increment in the projection domain since the data fidelity term is proportional to the noise standard deviation. In order to get an optimal result, previous work selects a great many values of and picks out the best one by comparing the final images. This testing strategy is of great time consuming, making the algorithm based on dictionary learning not friendly to the reconstruction task.

Morozov's Principle and the Balancing Principle.
The research on the choices of regularization parameters in linear inverse problems appears early in 1998 [16]. The original optimizing function of the inverse problem is like Take = 2, = 2 as an example. If the noise level of is known, one efficient tool for selecting the proper regularization parameter is the well-known Morozov discrepancy principle [16]. To adaptively determine the regularization parameter, a model function is brought in [17]: To find a solution of , (9) is rewritten as When → ∞, it obviously shows that the solution of the minimization problem is = 0, and then = (1/2)‖ ‖ 2 2 is obtained easily. The other two variables and can be determined by the equations ( ) = ( ), ( ) = ( ). To solve the parameter iteratively, the balancing principle [18] is introduced as where > 1 controls the relative weight of the two terms. Equation (12) can be written as which is a fixed point iteration. +1 is calculated by the formula Although the balancing principle behaves well in the inverse problem model, there is no direct way introducing the method to the dictionary based algorithm. The regularization 4 Computational and Mathematical Methods in Medicine term has a minimum value greater than zero, which leads to the derivation as follows: Therefore, the strategy that determining the regularization parameter adaptively accords to the last iterative result is not reasonable. We should look for a selecting strategy which can determine the proper value of the regularization parameter by making an analysis of the projection data.

Weight Modification of the Objective Function.
In order to find out an applicable selecting model of the regularization parameter, we reconsider the minimizing problem (5) and the updating formula (6). According to the former work, the data fidelity term is replaced with a separable surrogate [15] ( ; ) And one convenient choice is By making use of the surrogate, (5) becomes a separable form where ( ; ) and ( ) both can be represented as the sum of a series of quadratic functions, whose variables are ( = 1, 2, . . . , 2 ). After some variable exchanges, (19) can be rewritten as From the above, the image updating formula is just the same as (6); the quadratic term coefficient only depends on the system matrix A and the statistical weight . We make a weight modification on the regularization term By eliminating the constant term, the image reconstruction process is equivalent to solving the following optimization problem: = arg min ( ; ) +̃( ) which can be solved iteratively by As shown in (23), determines the relative impact on the updating image by the data fidelity term and the regularization term, respectively.

Evaluation Model of Regularization Parameter.
Before developing the evaluation model, some discussions about the reconstruction result are displayed firstly. Once a value of is selected randomly, by solving (22) iteratively, it infers that the relative error of the data fidelity term is as The relative error depends on the phantom image, the noise level, the regularization parameter, and so on. When it comes Computational and Mathematical Methods in Medicine 5

Image Reconstruction
Initialize 0 , D 0 , 0 , and = 0, is determined in former step. While the stopping criterion is not satisfied, do (1) Implement the OSC algorithm for acceleration; (2) Extract patches from the intermediate image ; (3) Update the dictionary D +1 by K-SVD algorithm; (4) Update the sparse coding +1 by OMP algorithm; (5) Update the image +1 by (23), = + 1; Output the final reconstruction. to the situation that the regularization parameter is infinite as → ∞, (23) and (24) will be like the following form: It is naturally derived that the relative error → ∞ increases with the increment of the noise level in projection domain. In addition, it has been mentioned above (in Section 3.1) that should be increased with the noise increment. Therefore, the proper has a monotonous relation with the parameter → ∞ . Since → ∞ can be easily determined by operating the reconstruction algorithm based on dictionary learning once with → ∞, the proper value of can be calculated if a reasonable function as * = ( → ∞ ) can be found.
With the help of a series of tests, the relationship between * and → ∞ is fitted by a piecewise quadratic function as follows: * = 1.
Finally, taking ADSIR as an example, the workflow of the developed algorithm is exhibited in Algorithm 1. In addition, the ordered subsets convex (OSC) algorithm [19] is utilized as an acceleration of the convergence.

Experimental Results and Discussion
To make the evaluation of the regularization parameter possible, the developed algorithm improves ADSIR with a modified weight of the regularization term while the weight is adaptive to the data fidelity term. So the proposed algorithm is named adaptive weight regularized ADSIR (AWR-ADSIR). In this section, a series of reconstruction experiments are exhibited to validate that the regularization selecting principle in AWR-ADSIR is practical. The simulation numerical phantoms are Shepp-Logan phantom, human head slice image, and human abdomen slice image. The Shepp-Logan phantom is a numeric phantom with pixels intensities ranging from 0 to 1. The sample images of human head slice and human abdomen slice are the FBP reconstruction results based on full-sampling scanning data, which are obtained from our collaborator. All of these phantom images are of 256 × 256 pixels presented in Figure 1. The scanning data are simulated as an undersampling situation with different noise levels. Firstly, different regularization parameters are selected to demonstrate that the one chosen by the algorithm leads to the best reconstruction result. Secondly, by comparing the quality of images reconstructed by diverse algorithms, which are SART, GPBB, ADSIR, and AWR-ADSIR, it can be confirmed that AWR-ADSIR is of remarkable performance among these algorithms. Finally, the selecting principle is used in the GDSIR model, proving that it also works. All the algorithms above are coded in MATLAB and run on a dualcore PC with 3.10 GHz Intel Core i5-2400 and 4 GB RAM.

Comparison of Different Regularization Parameters.
In the following experiments, all the parameters except the regularization parameter keep invariant for the same phantom with the same projection noise level. Three values of are selected, of which one is calculated by the proposed algorithm, another one is multiplied by 0.1, and the third one is multiplied by 10. The distance from the X-ray source to the center point of the phantom is twice the length of the image edge. The iteration of the algorithm is stopped when the relative error err = | − −1 |/ is less than a stopping value ( is calculated by (24)).
To compare the difference between different selections of the regularization parameter, the human abdomen slice image is tested as an example. The projection data are simulated by 180 views of 2 ∘ step length over a 360 ∘ range, and 512 detector elements are distributed in fan-beam geometry covering the phantom. The noise levels added to the projection data are 0.0% and 0.1% Gaussian random noise. The results are displayed in Figures 2 and 3. For the reason that biomedical images are often observed by a proper window to find more details, the images are displayed with a window [−160, 400] HU. The difference between the reconstructed image and the phantom image is displayed by a window [−90, 90] HU.
There are two criterions to evaluate the reconstructed image. One is the normalized mean absolute deviation (NMAD), defined as The other one is the signal-to-noise ratio (SNR), defined as The values of the two criterions are presented in Table 1. Comparing the results with the same noise level of the situation and the 10 situation, the NMAD of the situation is smaller and the SNR of the situation is larger mostly, which proves that the image reconstructed by choosing the regularization parameter as is more close to the sample image. When it comes to the 0.1 situation, although the values of the two criterions are a little better, there are some artifacts appearing in the reconstructed image, leading to a decline of the image quality. In the middle column of Figures 2 and 3, some horizontal line artifacts appear in the images (regions D, E, and F in Figure 2 and the ellipse regions in Figure 3). It seems that there are more horizontal artifacts in the 0.0% noise level image. The probable reason is the noise added to the projection data since the noise covers the inconspicuous artifact in the 0.1% noise level image. So what is the reason for the fact that the NMAD and SNR of the 0.1 situation are better? One reasonable explanation might be the smoothing effect of the dictionary learning algorithm. When the iterative image is updated by (6) or (23), the sparse constraint added by dictionary learning method smoothes not only the noise but also the margin details. This effect becomes more significant when the regularization parameter is becoming larger. Therefore, the difference between the reconstructed image and the sample image in the margin area becomes more obvious, which is displayed in the left and right columns of Figure 3. By discovering this effect making the criterion worse, future work should be devoted to improving the reconstruction algorithm based on dictionary learning in order to smooth the noise and preserve the margin details meanwhile.  Tables 2 and 3, the quality of all the reconstructed images is decreased with the noise level increasing. Among these four algorithms, SART generates the worst results. GPBB behaves well when the noise level is very low but when   ADSIR is empirically selected according to [12]. The fact that the image quality and comparing criterions of ADSIR and AWR-ADSIR are mostly the same proves that the parameter selecting model in AWR-ADSIR is practical and efficient. In addition, the marginal details appearing in Figures 5 and 7 of algorithms ADSIR and AWR-ADSIR indicate the smoothing effect on the margins again. Since the high-contrast edge is smoothed by the algorithm, the structural boundaries appear in the difference image in Figure 7. Obviously, when the empirical regularization parameter is unknown, the ADSIR algorithm costs a large amount of time to determine the proper value by repeatedly operating the iterative process (usually more than ten times) while the AWR-ADSIR algorithm only operates the iterative process twice as Algorithm 1 shows. The model indeed reduces the time consumption to determine the value of the regularization parameter.

Adaptive Weight Regularized GDSIR.
The last experiment is applying the parameter selecting strategy into the GDSIR model. The dictionary displayed in Figure 8 is learned from the overlapping patches of the original image in Figure 1. The reconstruction process of AWR-GDSIR is almost the same as AWR-ADSIR except that the dictionary has been constructed in advance and dose not change during the reconstruction process. The result shown in Figure 8 indicates that the proposed model is also suitable for the general dictionary condition.

Conclusion
In most optimization problems, the determination of the regularization parameter is still a problem. In this paper, aiming to determine the regularization parameter of the algorithm based on dictionary learning, one model function, whose independent variable → ∞ can be calculated by the known projection data, is proposed depending on some modification on the objective function. When compared to some other algorithms, the images reconstructed by AWR-ADSIR and ADSIR are of similar quality, better than the one reconstructed by SART, and the proposed algorithm is much more robust to noise than GPBB. This indicates that the modification of the objective function does not degrade the performance of ADSIR. What is more, the parameter selection model is demonstrated to be rational by the fact that the image quality of the situation is better than the ones of 0.1 and 10 situations. However, when some other parameters (like the scale of the dictionary, the scale of the patch, and so on) change, the model function might result in some difference. However, it still works to look for the function relationship between these two parameters since the monotonous relation between → ∞ and remains. By validating the proposed selecting principle, the smoothing effect on image margins is discovered. Our future work will focus on improving the dictionary learning method, with the expectation to maintain the smoothing