Convolutional Neural Network Combined with Half-Quadratic Splitting Method for Image Restoration



Introduction
Image restoration is a long-standing problem in low-level computer vision [1][2][3]. Its goal is to recover the latent clean image from a degraded observation, and different degradation matrices applied to the clean image give rise to different restoration tasks. Since image restoration is an ill-posed problem with many possible solutions, a prior (or regularization) is required to constrain the solution space [4,5]. The maximum a posteriori (MAP) approach takes the image prior into account and, from a Bayesian perspective, converts the original problem into that of finding the optimal solution $\hat{x}$:
$$\hat{x} = \arg\max_x \; \log p(y \mid x) + \log p(x), \tag{1}$$
where $p(y \mid x)$ is the likelihood and $p(x)$ is the prior probability of the clean image, which is independent of the degraded image. Equation (1) can be further rewritten as
$$\hat{x} = \arg\min_x \; \frac{1}{2}\|y - Hx\|_2^2 + \lambda \Phi(x), \tag{2}$$
where $\frac{1}{2}\|y - Hx\|_2^2$ is the fidelity term and $H$ is the degradation matrix, which describes the image degradation process. $\Phi(x)$ is the regularization term, which encodes the image prior used to constrain the final solution and turn the ill-posed problem into a well-posed one. $\lambda$ is the trade-off parameter between the fidelity term and the regularization term.
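The step from the MAP formulation to the fidelity-plus-regularization form can be sketched as follows. The sketch rests on two assumptions that are not stated in the text at this point: the observation model is $y = Hx + n$ with white Gaussian noise $n \sim \mathcal{N}(0, \sigma^2 I)$, and the prior has the form $p(x) \propto \exp(-\kappa \Phi(x))$, where $\kappa$ is a hypothetical scaling constant introduced only for this illustration.

```latex
% Sketch under two assumptions not stated in the text:
% (a) y = Hx + n with n ~ N(0, sigma^2 I);  (b) p(x) \propto \exp(-\kappa \Phi(x)).
\begin{align*}
\hat{x} &= \arg\max_x \; \log p(y \mid x) + \log p(x) \\
        &= \arg\min_x \; \frac{1}{2\sigma^2}\,\|y - Hx\|_2^2 + \kappa\,\Phi(x) \\
        &= \arg\min_x \; \frac{1}{2}\,\|y - Hx\|_2^2 + \lambda\,\Phi(x),
        \qquad \lambda = \sigma^2 \kappa .
\end{align*}
```

Under these assumptions the trade-off parameter grows with the noise variance, which matches the intuition that noisier observations should lean more heavily on the prior.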
Image restoration thus amounts to solving Equation (2), and existing methods generally fall into two categories: model-based optimization and discriminative learning. Model-based optimization methods solve Equation (2) directly, but they require a large number of iterations, which greatly reduces computational efficiency. Discriminative learning methods instead optimize a loss function over a training set of degraded and clean image pairs to learn the prior parameters $\Theta$, so that the distance between the network output and the target is minimized [6][7][8][9][10], i.e.,
$$\min_\Theta \; \ell(\hat{x}, x) \quad \text{s.t.} \quad \hat{x} = \arg\min_x \; \frac{1}{2}\|y - Hx\|_2^2 + \lambda \Phi(x; \Theta). \tag{3}$$
The most obvious difference between the two approaches is that model-based optimization can flexibly handle various image restoration tasks simply by changing the degradation matrix $H$, while discriminative learning requires a separate training set of degraded images for each restoration task. For example, NCSR (a model-based optimization method [11]) can perform image denoising, deblurring, and super-resolution alike, whereas discriminative learning methods must be designed separately for the three tasks, with MLP [12], DCNN [13], and SRCNN [14] each handling one of them. For a particular task such as deblurring, model-based optimization methods (e.g., IDDBM3D [15] and NCSR [11]) can handle images degraded by different blur kernels, while a discriminative learning method (e.g., MLP [16]) must train a different model for each blur kernel.
The discriminative learning approach sacrifices flexibility, but the trained network model has higher computational efficiency, and the computation time can be reduced further as GPU performance increases. The two families of methods thus have complementary merits and demerits, and it would be promising to combine their merits. Recent research suggests that variable splitting techniques, such as the alternating direction method of multipliers (ADMM [17]) and half-quadratic splitting (HQS [18]), can handle the fidelity term and the regularization term separately [7]. This makes it possible to combine a network model trained by discriminative learning with a model-based optimization method to better handle image restoration tasks. This paper aims to train two sets of fast and effective CNN denoising models (one for gray images and one for color images), adopting recent CNN techniques such as the ReLU activation function [19], batch normalization [20], residual learning [21], and dilated convolution [22] to obtain better restoration performance. The network model is combined with the HQS method so that it provides a powerful image prior to the model-based optimization method. The two sets of CNN denoising models are plugged as modules into the model-based optimization method to solve a wider range of image restoration problems (e.g., image deblurring and single image super-resolution). The experimental results show that our algorithm performs well in both restoration flexibility and computational efficiency, and that the restoration quality reaches or approaches that of the latest advanced algorithms.

Related Work
There have been many attempts to plug a denoiser into model-based optimization methods in order to solve other image restoration tasks. An iterative decoupled image deblurring method derived from a Nash equilibrium formulation (IDDBM3D) was proposed in [15]. A method for single image super-resolution (SISR) with a CBM3D denoiser prior was proposed in [23]; by alternating back-projection updates and CBM3D denoising steps, it achieves a higher PSNR than SRCNN [14]. The BM3D denoising algorithm has also been treated as a prior and integrated into an image deblurring scheme by means of the augmented Lagrangian method [24]. Later, a plug-and-play prior framework based on the ADMM method was proposed [25], which adopts an iterative scheme similar to that in Reference [15]. It is worth noting that a similar plug-and-play concept had already been proposed in Reference [5], in which the half-quadratic splitting (HQS) method is applied to a variety of image restoration subproblems. A denoising autoencoder based on a multichannel model has also been proposed and applied to the restoration of single-channel gray-scale infrared images [26]. In addition, the half-quadratic splitting method has been applied to image super-resolution with good results [27]. Almost every method mentioned above indicates that the fidelity and regularization terms can be decoupled, so that existing denoising models can solve a wider range of image restoration tasks.
As long as the fidelity term and the regularization term are well separated, a denoiser prior can be inserted into a model-based iterative optimization algorithm. For this reason, these iterative methods are usually decomposed into a denoising subproblem and other subproblems. In the next section, we describe in detail the variable splitting algorithm used in this paper, namely the half-quadratic splitting (HQS) method. HQS can be regarded as a general way to handle different image restoration problems, with the denoisers trained in advance and then applied to the different tasks.

HQS Method
Variable splitting techniques can combine the advantages of the two kinds of image restoration algorithms. They separate the fidelity term from the regularization term, and the separated regularization term corresponds only to an image denoising subproblem [28][29][30][31][32][33].
With the HQS method, a variable splitting technique, an auxiliary variable $z$ is introduced and Equation (2) can be rewritten as
$$\hat{x} = \arg\min_x \; \frac{1}{2}\|y - Hx\|_2^2 + \lambda \Phi(z). \tag{4}$$

Journal of Sensors
The constraint is $z = x$. HQS then seeks to minimize the following cost function:
$$\mathcal{L}_\mu(x, z) = \frac{1}{2}\|y - Hx\|_2^2 + \lambda \Phi(z) + \frac{\mu}{2}\|z - x\|_2^2, \tag{5}$$
where $\mu$ is the penalty parameter, which increases over the iterations so that the denoiser noise level $\sqrt{\lambda/\mu}$ used in the denoising step decreases. Equation (5) is solved by alternating between two subproblems: $z$ is held fixed while solving for $x$, and then $x$ is held fixed while solving for $z$:
$$x_{k+1} = \arg\min_x \; \frac{1}{2}\|y - Hx\|_2^2 + \frac{\mu}{2}\|x - z_k\|_2^2, \tag{6}$$
$$z_{k+1} = \arg\min_z \; \frac{\mu}{2}\|z - x_{k+1}\|_2^2 + \lambda \Phi(z). \tag{7}$$
As shown in Equations (6) and (7), the HQS method separates the fidelity term $\frac{1}{2}\|y - Hx\|_2^2$ from the regularization term $\Phi(x)$, dividing the original problem into two smaller independent problems. For Equation (6), setting the derivative with respect to $x$ to zero shows that $x_{k+1}$ is also the solution of
$$(H^T H + \mu I)\, x_{k+1} = H^T y + \mu z_k, \tag{8}$$
from which it is easy to get
$$x_{k+1} = (H^T H + \mu I)^{-1} (H^T y + \mu z_k). \tag{9}$$
Equation (7) can be rewritten as
$$z_{k+1} = \arg\min_z \; \frac{1}{2\left(\sqrt{\lambda/\mu}\right)^2}\|x_{k+1} - z\|_2^2 + \Phi(z). \tag{10}$$
From a Bayesian perspective, Equation (10) means that $z_{k+1}$ is the result of denoising the noisy image $x_{k+1}$ with a Gaussian denoiser at noise level $\sqrt{\lambda/\mu}$. A series of Gaussian denoisers trained with a CNN can therefore be used for other image restoration tasks. To make this point explicit, Equation (10) can be written as
$$z_{k+1} = \mathrm{Denoiser}\left(x_{k+1}, \sqrt{\lambda/\mu}\right). \tag{11}$$
Equations (9) and (11) show that the fidelity term and the regularization term have been separated and that the regularization term corresponds only to an image denoising subproblem. As a result, we can integrate the two trained sets of denoisers into the model-based optimization method to solve different kinds of image restoration problems.
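The alternation between the closed-form x-step and the denoising z-step can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the degradation matrix is a small dense array, `toy_denoiser` is a hypothetical stand-in for the trained CNN denoiser, and the list of penalty values `mus` is an arbitrary example.

```python
import numpy as np

def hqs_x_update(H, y, z, mu):
    """x-step: solve (H^T H + mu I) x = H^T y + mu z in closed form."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + mu * np.eye(n), H.T @ y + mu * z)

def toy_denoiser(x, sigma):
    """Hypothetical stand-in for the CNN Gaussian denoiser.

    Blends x with its 3-point moving average; a larger sigma gives
    stronger smoothing. The real method would call a trained network.
    """
    smooth = np.convolve(x, np.ones(3) / 3.0, mode="same")
    w = sigma**2 / (1.0 + sigma**2)
    return (1.0 - w) * x + w * smooth

def hqs(H, y, lam, mus):
    """Alternate the x-step and the z-step for each penalty value in mus."""
    z = H.T @ y  # simple initial guess (an assumption of this sketch)
    for mu in mus:
        x = hqs_x_update(H, y, z, mu)
        z = toy_denoiser(x, np.sqrt(lam / mu))  # noise level sqrt(lam / mu)
    return x, z
```

Passing `mus` in increasing order makes the noise level handed to the denoiser shrink from one iteration to the next.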

Key Technology in CNN.
Because of the strong ability of CNNs to extract image features, we have reason to believe that better results can be achieved when a CNN is used to remove image noise. However, several problems arise when a classic CNN structure (e.g., LeNet, AlexNet, or ZF-net) is used directly for image denoising. First, an activation function must be selected. Second, if pooling layers are included in the network, the image is compressed to a very small size and much information is lost, which makes restoration harder; the question is therefore how to enlarge the receptive field without changing the image size. Third, the network has many parameters and normally takes a long time to train, so the training process needs to be accelerated. The following subsections give details of the network design.
4.1.1. Selection of Activation Function.
An activation function is introduced to add nonlinearity. Although the sigmoid function has been applied successfully in many network structures, it saturates for large inputs, so the gradient flowing back through such a layer gradually vanishes, which greatly reduces the training speed. The ReLU activation function (rectified linear unit) solves this problem. It is defined as
$$f(x) = \max(0, x). \tag{12}$$
When the input is less than or equal to 0, the output is 0, which is equivalent to building a sparse matrix. This property removes redundancy in the data while retaining its characteristic features: over the course of the network computation, the data are represented by matrices that are mostly zero. Because of this sparsity, the method runs fast and effectively.
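As a small illustration of the sparsity argument, the following NumPy sketch applies ReLU to a toy vector and measures the fraction of zeros in the output. The input values are arbitrary and chosen only for demonstration.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: element-wise max(0, x)."""
    return np.maximum(x, 0.0)

a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # arbitrary example activations
out = relu(a)                              # negative entries are set to zero
sparsity = np.mean(out == 0)               # fraction of zero outputs
```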

Dilated Convolution.
Pooling layers shrink the image and discard much information, so dilated convolution is used instead. The basic idea of dilated convolution is to enlarge the receptive field while keeping the image size unchanged and without a large increase in computation. Specifically, the taps of the originally dense convolution kernel are spread apart: the number of points to be computed in the kernel does not change, and the positions in between are filled with 0. The receptive field can thus keep growing while the part of the kernel that actually has to be computed remains 3 × 3.
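The zero-filling idea and its effect on the receptive field can be sketched as follows. The helper names and the dilation factors in the usage note are hypothetical; the text does not state which dilation factors the network uses.

```python
import numpy as np

def dilate_kernel(kernel, d):
    """Spread a square kernel by inserting d - 1 zeros between its taps."""
    k = kernel.shape[0]
    size = d * (k - 1) + 1           # effective footprint of the dilated kernel
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::d, ::d] = kernel           # original taps land on a grid of step d
    return out

def receptive_field(dilations, k=3):
    """Receptive field of stacked stride-1 k x k convolutions.

    Each layer with dilation d widens the field by d * (k - 1).
    """
    return 1 + sum(d * (k - 1) for d in dilations)
```

For example, `dilate_kernel(np.ones((3, 3)), 2)` covers a 5 × 5 window but still has only nine nonzero taps, and `receptive_field([1, 2, 3, 4, 3, 2, 1])` evaluates to 33 for that illustrative list of factors, compared with 15 for seven undilated layers.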

Batch Normalization.
Adopting the ReLU activation function solves the vanishing gradient problem caused by saturation, but many factors that slow down training remain in practice. During training, the parameters of every layer change continually through backpropagation, which changes the distribution of the inputs to each subsequent layer. Each layer then has to adapt to the changed distribution again, which reduces training efficiency. Batch normalization (BN) is used in this paper to solve this problem. After the mean and variance of each minibatch are computed, the data are normalized so that each layer no longer has to adapt to changes in its input distribution, which greatly improves training efficiency.
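A minimal training-mode sketch of the normalization step is given below, assuming feature maps stored as an array of shape (N, C, H, W). The learnable scale `gamma` and shift `beta` and the constant `eps` follow the standard BN formulation rather than anything specified in this paper, and the running statistics that BN uses at inference time are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel with the minibatch mean and variance.

    x has shape (N, C, H, W); gamma and beta have shape (C,).
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # per-channel mean
    var = x.var(axis=(0, 2, 3), keepdims=True)     # per-channel variance
    x_hat = (x - mean) / np.sqrt(var + eps)        # zero mean, unit variance
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```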

Residual Learning.
A neural network can learn in two ways. One is to learn directly the mapping from the noisy image y to the latent clean image x; the other is to first learn the noise in the image and then recover the latent clean image indirectly. The second approach is called residual learning. If a mapping is close to the identity mapping, a residual network makes the optimization easier. Image denoising is clearly close to an identity mapping, particularly when the noise level is low. Residual learning is therefore added to our network model; together with the BN described above, it accelerates and stabilizes training and improves the denoising ability of the model.

4.2. Proposal of Network Model.
Practice has shown that CNN structures are effective for image feature extraction and powerful in performance. In particular, network training can be parallelized on a GPU, which greatly improves training efficiency. We therefore use a CNN to restore the image, and the ReLU activation function, dilated convolution, batch normalization, and residual learning described above are applied to the network model to obtain better restoration ability. The network model in this paper is shown in Figure 1.
As shown in Table 1, the layers between the input layer and the output layer are collectively called hidden layers. Each layer uses conventional 3 × 3 convolution kernels with a stride of 1. To handle boundary effects, zero padding is adopted. To improve training efficiency, the images are cropped to patches of size 35 × 35, so both the input layer and the output layer operate on 35 × 35 images.
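The reason zero padding keeps the 35 × 35 size can be checked with the standard convolution output-size formula. The function below is a hypothetical helper written for this check; the padding and dilation values in the usage note are illustrative, since the text does not list them per layer.

```python
def conv_output_size(n, kernel=3, stride=1, pad=0, dilation=1):
    """Spatial output size of a convolution on an input of size n."""
    effective = dilation * (kernel - 1) + 1      # footprint of the kernel
    return (n + 2 * pad - effective) // stride + 1
```

With a 3 × 3 kernel and stride 1, `conv_output_size(35, pad=1)` stays at 35, whereas `conv_output_size(35, pad=0)` shrinks to 33. For a dilated layer the padding has to grow with the dilation factor, e.g. `conv_output_size(35, pad=2, dilation=2)` is again 35.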

Image Denoising.
Since residual learning is used in the network, the loss function of the training model is
$$\ell(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| f(y_i; \Theta) - (y_i - x_i) \right\|_F^2, \tag{13}$$
where $\{(x_i, y_i)\}_{i=1}^{N}$ represents N pairs of clean and noisy images, $f$ is the output of the network model, the difference between the predicted value and the actual value is $f(y_i; \Theta) - (y_i - x_i)$, and $\Theta$ denotes the parameters to be trained. N is the number of input images in a minibatch.
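The residual target can be made concrete with a short NumPy sketch: the network is trained to predict the noise y − x, and the clean estimate is recovered by subtracting that prediction from the noisy input. The squared-error form and the 1/(2N) scaling used here are assumptions of this sketch, and `pred_residual` stands for whatever the network outputs.

```python
import numpy as np

def residual_loss(pred_residual, noisy, clean):
    """Squared error between predicted and true residual, scaled by 1/(2N).

    All arrays have shape (N, H, W); N is the minibatch size.
    """
    n = noisy.shape[0]
    target = noisy - clean                      # true residual (the noise)
    return np.sum((pred_residual - target) ** 2) / (2.0 * n)

def restore(noisy, pred_residual):
    """Recover the clean estimate by subtracting the predicted residual."""
    return noisy - pred_residual
```

A perfect residual prediction gives zero loss and `restore` returns the clean image exactly; predicting all zeros leaves the noisy image unchanged.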

With the loss function determined, we train the network. The training data consist of 400 images from the Berkeley segmentation dataset [34], 400 images from the ImageNet database [35], and 4744 images from the Waterloo Exploration Database [36]. We crop all the images into patches of 35 × 35 and randomly select 256 × 4000 patches for training. The solver is Adam (adaptive moment estimation) with its default hyperparameters, and the minibatch size is 256. To handle different noise levels, we train a series of denoising network models (also called denoisers), each targeting a different noise level. The noise level σ ranges from 0 to 50 in steps of 2, giving 25 denoisers in total. Our experiments are implemented in MATLAB R2017b with the MatConvNet package [37], running on a PC with an Intel Core i7-7700HQ CPU at 2.80 GHz and an NVIDIA GeForce GTX 1060 GPU. The operating system is Windows 10. It takes about six days to train the two groups of denoisers (one for gray images and one for color images).
To evaluate the performance of the algorithm, two model-based optimization methods (i.e., BM3D [38] and WNNM [39]) and five discriminative learning methods (i.e., TNRD [34], MLP [12], EPLL [5], DnCNN [40], and FFDNet [41]) are selected for comparison. For color image denoising, we compare the proposed method with the CBM3D algorithm and the DnCNN and FFDNet models. The experimental data are shown in Tables 2 and 3. Our method reaches the level of the best denoising model, FFDNet, in both gray and color image denoising, and its advantage becomes more obvious as the noise level increases. Moreover, our network has far fewer layers than FFDNet, which makes it easier to train and more efficient. In terms of computational efficiency, the BM3D and WNNM algorithms run on the CPU, while TNRD, MLP, and our algorithm use GPU parallel computing. Since our CNN structure was originally designed for image denoising, the algorithm is extremely efficient on the denoising subproblem: for both gray and color images, the running time is usually kept within 0.1 s. Overall, the proposed algorithm performs very well in both gray and color image denoising. Part of the experimental results are shown in Figures 2 and 3 to illustrate the performance of each algorithm visually. The locally enlarged images show that our method does not oversmooth the denoised image and that its restoration of detail and texture is closest to the original image.
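The data preparation described above can be sketched as follows. This is an illustration in Python and not the MATLAB/MatConvNet pipeline used in the paper. The list of noise levels assumes that the 25 denoisers correspond to σ = 2, 4, ..., 50, which is one reading of "0 to 50 in steps of 2"; the helper names are hypothetical, and noise is added on the same intensity scale as the input image.

```python
import numpy as np

PATCH = 35
# Assumption: 25 levels 2, 4, ..., 50 (the text gives the range 0 to 50, step 2).
noise_levels = np.arange(2, 51, 2)

def random_patches(image, num, rng, size=PATCH):
    """Crop `num` random size x size patches from a 2-D or 3-D image array."""
    h, w = image.shape[:2]
    tops = rng.integers(0, h - size + 1, size=num)
    lefts = rng.integers(0, w - size + 1, size=num)
    return np.stack([image[t:t + size, l:l + size] for t, l in zip(tops, lefts)])

def make_pair(clean, sigma, rng):
    """Build a (noisy, clean) training pair with white Gaussian noise."""
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    return noisy, clean
```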

Image Deblurring.
A blurred image can generally be modeled as a blur kernel applied to the original clean image, followed by the addition of white Gaussian noise of level σ. To evaluate the proposed method, three blur kernels are used in the experiments: one is a Gaussian blur kernel with a standard deviation of 1.6 and a size of 25 × 25; the other two are the first two of the eight real blur kernels proposed in [42] (with sizes of 19 × 19 and 17 × 17, respectively). In addition, white Gaussian noise is added at three levels, σ = 2, σ = 2.55, and σ = 7.65.
Using the Gaussian denoisers trained in the previous section, we design the image deblurring procedure shown in Algorithm 1.
During the iterations, the noise level of the denoiser decreases gradually. We set the number of iterations to 30. A geometric progression of 30 noise levels ranging from 50 to 0 (in descending order) is formed, and each of the 30 levels is then mapped to the nearest of the 25 trained denoisers (so some denoisers are used more than once); as a result, a new model does not have to be loaded at every iteration. In addition, since Equation (9) involves a matrix inverse, we use the FFT to speed up the solution.
The proposed method is compared with four other algorithms. IDDBM3D, NCSR, and EPLL [5] are model-based optimization methods, and MLP is a discriminative learning method. The test data sets are Set3G and Set3C (containing gray and color images, respectively). As shown in Table 4, the proposed method performs well under every blurring condition, particularly for color images. The IDDBM3D, NCSR, and MLP algorithms tend to oversmooth image edges and are prone to color artifacts, whereas our method better restores the sharpness and naturalness of the image. The deblurring results of each algorithm can be compared directly in Figures 4 and 5. We also recorded the running time of the different algorithms, and the proposed algorithm is the most competitive in terms of computational efficiency.
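The noise schedule, the nearest-denoiser mapping, and the FFT-based x-step can be sketched as follows. Several assumptions are made that the text does not state: the blur is treated as a circular convolution so that the matrix inverse becomes an element-wise division in the Fourier domain; the schedule ends at a small positive level (a geometric progression cannot reach exactly 0), here passed in as `stop`; and all function names are hypothetical.

```python
import numpy as np

def noise_schedule(start, stop, iters):
    """Geometrically decaying noise levels from start down to stop."""
    return np.geomspace(start, stop, iters)

def nearest_denoiser_levels(schedule, trained_levels):
    """Map every scheduled level to the closest trained denoiser level."""
    trained = np.asarray(trained_levels, dtype=float)
    idx = np.abs(schedule[:, None] - trained[None, :]).argmin(axis=1)
    return trained[idx]

def mu_from_sigma(lam, sigma):
    """Invert sigma = sqrt(lam / mu) to get the penalty parameter mu."""
    return lam / sigma**2

def psf_to_otf(kernel, shape):
    """Zero-pad the blur kernel, centre it at the origin, and take its FFT."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def x_update_fft(y, z, otf, mu):
    """Closed-form x-step under the circular-convolution assumption.

    x = F^-1[(conj(K) F(y) + mu F(z)) / (|K|^2 + mu)]
    """
    num = np.conj(otf) * np.fft.fft2(y) + mu * np.fft.fft2(z)
    return np.real(np.fft.ifft2(num / (np.abs(otf) ** 2 + mu)))
```

In one iteration of the deblurring loop, the scheduled level selects a denoiser, `mu_from_sigma` gives the matching μ, `x_update_fft` solves the fidelity subproblem, and the selected denoiser is then applied to the result.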

Single Image Super-Resolution.
Single image super-resolution (SISR) can be modeled as blurring a high-resolution image and then downsampling it to obtain the low-resolution image. Based on this degradation model, the super-resolution reconstruction of a single image can be divided into two subproblems [23,43], i.e., an iterative back-projection update and denoising. Combined with the HQS method described above, we can use the iterative back-projection method of Equation (14) to solve Equation (9):
$$x_{k+1} = x_k + \alpha \left( y - x_k \downarrow_{sf} \right) \uparrow_{sf}, \tag{14}$$
where $\downarrow_{sf}$ represents downsampling with scale factor sf, $\uparrow_{sf}$ represents bicubic interpolation with scale factor sf, and α is the step size. As before, the denoiser model is used for the denoising subproblem.
We design the single image super-resolution reconstruction procedure shown in Algorithm 2.
In the above procedure, n is the number of inner iterations, which are used to speed up convergence. In this paper, n is set to 5, m is 30, the step size α is 1.75, and the noise level of the denoiser decays exponentially from 12 × sf to sf.
We select six algorithms for comparison: one is the model-based optimization method NCSR, another is the denoising-prior-based method SRBM3D [40], and the other four are discriminative learning methods (i.e., SRCNN, VDSR [44], LapSRN [45], and SPMSR [46]). Three different degradations $\downarrow_{sf}$ are used in the experiments, i.e., bicubic downsampling with sf of 2 and 3 [14,47], and a Gaussian blur kernel with a standard deviation of 1.6 and a size of 7 × 7 followed by downsampling with sf of 3 [11]. The experimental data are shown in Table 5. The data sets are Set5 (containing five color images) and Set14 (containing 14 gray and color images) [48]. In terms of the numbers, the proposed method does not achieve the best performance in every setting, but its advantage lies elsewhere, as discussed next.
When $\downarrow_{sf}$ is bicubic, VDSR and LapSRN can be regarded as the state of the art, and the performance of the proposed algorithm is close to that of these two methods. When $\downarrow_{sf}$ involves the Gaussian kernel, as shown in Figure 6, the performance of the three discriminative learning methods is greatly limited because no model was trained for this degradation in advance; the whole model would have to be retrained to maintain good reconstruction results. According to Equation (14) and the experimental data, the proposed method adapts to different $\downarrow_{sf}$ models without retraining, and its performance is close to that of the most advanced algorithms. The results show that the proposed image restoration method based on a deep CNN denoiser prior can super-resolve degraded images simply by adjusting the blur kernel and scale factor, without any training, while SRCNN, VDSR, and LapSRN need additional training to deal with these situations. This demonstrates that the proposed method is more flexible than other discriminative learning methods. In addition, the proposed algorithm is still the fastest.
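One back-projection update can be sketched in NumPy as below, written in the usual gradient-descent form x + α(y − x↓)↑. To keep the example self-contained, block averaging and pixel repetition are used as stand-ins for the bicubic downsampling and interpolation named in the text; these substitutes and the function names are assumptions of this sketch.

```python
import numpy as np

def downsample(x, sf):
    """Stand-in for the downsampling operator: average over sf x sf blocks."""
    h, w = x.shape
    return x.reshape(h // sf, sf, w // sf, sf).mean(axis=(1, 3))

def upsample(x, sf):
    """Stand-in for bicubic interpolation: repeat every pixel sf x sf times."""
    return np.repeat(np.repeat(x, sf, axis=0), sf, axis=1)

def back_projection_step(x, y, sf, alpha):
    """Move x toward consistency with the low-resolution observation y."""
    return x + alpha * upsample(y - downsample(x, sf), sf)
```

With these particular operators, downsampling an upsampled image returns it unchanged, so each step scales the low-resolution residual by |1 − α|. For the step size α = 1.75 used in the paper this factor is 0.75, and the residual shrinks at every inner iteration.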

Conclusion
Algorithm 1. Image deblurring
1. Select a particular blur kernel, apply it to the original clean image x, and add Gaussian noise of a particular level σ to produce the blurred image y.
2. Determine the parameter λ according to the noise level σ.
3. for i = 1:n (n is the number of iterations)
4.   Determine the denoiser level for the current iteration and, from it, the value of the parameter μ.
5.   Solve Eq. (9) with the FFT to obtain the result z.
6.   Load the denoiser of that level, take the z obtained in the previous step as the model input to get the output residual, and update z = z − residual.
7. end
8. Compute the PSNR.
In this paper, a series of Gaussian denoisers are obtained through CNN learning, and the denoisers are integrated as modules into the model-based optimization method by means of variable splitting (i.e., the fidelity term and the regularization term of the original problem are separated), which greatly improves the flexibility of the discriminative learning method in solving different image restoration problems. To evaluate the proposed method, several state-of-the-art algorithms are selected for comparison. The experimental results show that the denoisers obtained with CNN learning provide a good image prior, which solves other image restoration problems well, i.e., image deblurring and single image super-resolution, when plugged into the model-based optimization method. Moreover, the proposed algorithm is competitive with the most advanced algorithms.
Although the proposed method integrates the advantages of the model-based optimization method and the discriminative learning method, many aspects remain worth studying. For example, this paper addresses nonblind image deblurring, i.e., the blur kernel is known, and blind deblurring can be studied further; during denoiser training, the number