X-Ray Breast Images Denoising Method Based on the Convolutional Autoencoder

Considering the potential risk of X-rays to patients, denoising of low-dose X-ray medical images is imperative. Inspired by deep learning, a convolutional autoencoder method for X-ray breast image denoising is proposed in this paper. First, image symmetry and flipping are used to increase the number of images in the public dataset; second, the number of samples is increased further by image cropping and segmentation, adding simulated noise, and producing the dataset. Finally, a convolutional autoencoder neural network model is constructed, and clean and noisy images are fed into it to complete the training. The results show that this method effectively removes noise while retaining image details in X-ray breast images, yielding higher peak signal-to-noise ratio and structural similarity index values than classical and novel denoising methods.


Introduction
Medical images are frequently utilized in modern clinical diagnosis and therapy to aid in disease diagnosis and treatment evaluation, among other things. They provide a crucial foundation for clinical diagnosis and treatment. Unlike natural photos, medical images generate a great deal of signal-related noise during the creation process; therefore, the contrast is lower and the noise is more visible [1]. As a type of medical imaging, X-ray is of great value in early breast cancer detection. Patients should receive mammography with the lowest possible radiation dose [2]. The most common way to reduce the radiation dose is to reduce the X-ray flux by decreasing the operating current and shortening the exposure time of the X-ray tube. However, the weaker the X-ray flux, the noisier the reconstructed image will be. Noise can blur the image, obscure important information, and make disease analysis and diagnosis more difficult. Therefore, it is important to study how to remove noise from X-ray images.
In the past few decades, many scholars have proposed new image denoising algorithms as image noise has been intensively studied. Traditional image denoising models can be classified into four categories, based on the spatial domain, the transform domain, sparse representation, and natural statistics. Among them, the representative methods are median filtering in the spatial domain [3], which ignores the characteristics of individual pixels, so the image is more seriously blurred after denoising; BLS-GSM [4] in the transform domain, which loses some useful information while denoising; and NLSC [5] based on sparse representation, which has a long computation time and low denoising efficiency. The natural-statistics-based BM3D [6] can only filter a specific noise. Although these algorithms are effective in removing noise, they often inevitably cause loss of texture information in the medical image and excessive smoothing of edges, which adversely affects diagnosis. It remains a challenging problem to remove the noise while retaining the detailed information in a medical image.
With the improvement of hardware computing power, the powerful learning and fitting capabilities of neural networks have shown great potential in image processing [7]. For example, convolutional neural networks (CNNs) have been applied to image classification, target detection, image segmentation, image denoising, and more. Currently, many scholars have used CNNs to denoise medical images such as low-dose CT images [8, 9], OCT images [10], MRI images [11, 12], and ultrasonography images [13]. Kim et al. improved the BM3D method by proposing a method to assign different weights to each block according to the degree of denoising [14]. Although the detailed information of the image can be well recovered, the Gibbs effect is produced after denoising, introducing artifacts that cannot be eliminated. Guo et al. proposed a median filtering method based on an adaptive two-level threshold to solve the problems of low contrast and blurred boundaries in traditional weighted median filters [15]. This method has a very good denoising effect on CT images of COVID-19, but it is not suitable for denoising mixed noise, and it easily causes image blur and discontinuity. Jia et al. proposed a pyramid dilated CNN [16], which uses dilated convolution to expand the network's receptive field and obtain more image details. This method has a good denoising effect on both grayscale and color images. However, the dilation rate needs to be adjusted according to the size of the object in the input image to avoid image discontinuity. Huang et al. proposed a denoising GAN based on a U-Net discriminator [17], which can not only give feedback on each pixel in the image but also use U-Net to focus on the global structure at the semantic level. This method has a very good denoising effect on low-dose CT images, but the model is complicated and difficult to train. However, there are few studies on the denoising of X-ray breast images.
This inspired us to use a CNN-based denoising model to remove noise from X-ray breast images and improve their quality. This paper proposes an X-ray breast image denoising method based on a convolutional autoencoder. First, the public breast dataset of the Mammographic Image Analysis Society MiniMammographic Database (MIAS) is expanded, and the key parts are centrally cropped to intercept the data containing key medical information while further expanding the number of samples; second, Gaussian noise and salt and pepper noise are added to the sample data to generate noisy images, and the clean and noisy images are combined into a dataset; finally, an autoencoder denoising model based on a CNN is built and trained. The experimental results show that the method can effectively remove various levels of mixed noise in medical images while retaining detailed image texture. The following are the main contributions of this paper: (1) We achieved better results than current denoising methods, with higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM); (2) we proposed a mathematical model to simulate mixed noise, implemented in code to add different levels of mixed noise to medical images; (3) on the MIAS dataset, the performance of this method is tested from the perspective of multiple indices, which fully verifies the efficiency and superiority of this method; and (4) we completed end-to-end modeling for medical image denoising.

Simulated Medical Image Noise.
In imaging, complex noise sources include Gaussian, impulse, pretzel, and scatter noise [18]. Salt and pepper noise, Gaussian noise, and other types of noise are common in medical images. Because noise-free images are not readily available in clinics, while the network model of the proposed method requires high-resolution noise-free images for training, an open dataset of clear, clean images free of noise is chosen, and simulated noise is added to these images to obtain paired training data for the noise removal network. In this paper, noise is added to an image by adding noise components to the values of the image's corresponding pixels. The noise model can be defined as follows:

N(h, w, c) = O(h, w, c) + G(h, w, c) + Y(h, w, c),

where (h, w, c) represents a pixel in the image, h represents the image height, w represents the image width, and c represents the number of channels (c = 1 denotes a single-channel gray image); O(h, w, c) represents the original image, Y(h, w, c) represents the salt and pepper noise data, G(h, w, c) represents the Gaussian noise data, and N(h, w, c) represents the resulting noisy image.

Generation of Gaussian Noise.
Gaussian noise is noise that obeys a Gaussian distribution G(h, w, c) ~ N(μ, σ²), where μ represents the expectation of the Gaussian noise data and σ² represents its variance. In this paper, Gaussian noise is added to a medical image as follows:

Z(h, w, c) = O(h, w, c) + k · G(h, w, c),

where k represents the noise intensity and Z(h, w, c) represents the noisy image. Considering that pixel values may exceed 255 after adding noise, the pixel values of the noisy image are limited to avoid data overflow in the computer; the restriction is as follows:

Z(h, w, c) = min(max(Z(h, w, c), 0), 255).
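As a minimal sketch, the additive Gaussian noise model and the [0, 255] clipping restriction above can be written in NumPy; the function name and the default values of k, μ, and σ are our own illustrative choices, not values taken from the paper:

```python
import numpy as np

def add_gaussian_noise(image, k=0.1, mu=0.0, sigma=25.0, seed=None):
    """Add scaled Gaussian noise to an 8-bit image and clip to [0, 255].

    k is the noise-intensity factor; the noise itself is drawn from
    N(mu, sigma^2), following the formulation in the text.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(mu, sigma, size=image.shape)
    noisy = image.astype(np.float64) + k * noise
    # Restrict pixel values to [0, 255] to avoid uint8 overflow.
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

The clip is applied after the addition, matching the restriction above, so the output stays a valid 8-bit image.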

Salt and Pepper Noise Generation.
Salt and pepper noise is also common in images; as the name suggests, pepper represents black and salt represents white, so salt and pepper noise appears as white or black points. It is generated as follows:

Y(h, w, c) = 255, if W < (1 − SNR) and Z = 1,
Y(h, w, c) = 0, if W < (1 − SNR) and Z = 0,
Y(h, w, c) = O(h, w, c), otherwise,

where SNR represents the signal-to-noise ratio, expressed in the range [0, 1]; W obeys the uniform distribution on [0, 1]; and Z obeys the 0-1 (Bernoulli) distribution with P(1) = 0.5. When W ∈ [0, (1 − SNR)], the pixel value in each channel of the original image has a 50% chance of becoming 255 or 0. When the pixel value is 255, it shows a white salt noise point, and when it is 0, it shows a black pepper noise point. In Figure 1, (a) represents the original image, (b) the image after adding Gaussian noise, (c) the image after adding salt and pepper noise, and (d) the mixed image after adding both Gaussian noise and salt and pepper noise.
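The salt and pepper generation rule above, and its combination with Gaussian noise into the mixed noise model, can be sketched as follows; the function names and default parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def add_salt_pepper_noise(image, snr=0.95, seed=None):
    """Corrupt a fraction (1 - snr) of pixels with salt (255) or pepper (0).

    For each pixel a uniform draw W ~ U(0, 1) decides corruption
    (W < 1 - snr); a fair coin then picks white salt or black pepper.
    """
    rng = np.random.default_rng(seed)
    noisy = image.copy()
    w = rng.uniform(size=image.shape)
    coin = rng.integers(0, 2, size=image.shape)   # Bernoulli(0.5)
    noisy[(w < 1 - snr) & (coin == 1)] = 255      # salt: white points
    noisy[(w < 1 - snr) & (coin == 0)] = 0        # pepper: black points
    return noisy

def add_mixed_noise(image, k=0.1, sigma=25.0, snr=0.95, seed=None):
    """Gaussian noise followed by salt and pepper, as in the noise model."""
    rng = np.random.default_rng(seed)
    g = np.clip(image + k * rng.normal(0.0, sigma, image.shape), 0, 255)
    return add_salt_pepper_noise(g.astype(np.uint8), snr=snr, seed=seed)
```

With snr = 0.95, roughly 5% of pixels are replaced, split evenly between white and black points.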

Convolutional Autoencoder
Model. The concept of the autoencoder was first proposed by Rumelhart et al. [19] and was originally applied to dimension reduction of complex data. The autoencoder includes encoding and decoding and trains the network through backpropagation so that the output equals the input [20]. The encoder compresses and reduces the dimension of the data, reducing the amount of data while retaining the most critical feature information. The function of the decoder is the opposite: it restores the compressed data, recovering the input data through decoding. The fully connected layers of a traditional autoencoder stretch the data into one dimension, thus losing the spatial information of two-dimensional image data. A CNN has strong performance in extracting spatial feature information from images, which can compensate for this loss of spatial information when an autoencoder extracts features. Based on the traditional autoencoder, the convolutional autoencoder combines the advantages of the CNN and realizes a deep neural network through the superposition of convolution, activation, and pooling layers to extract detailed image features [21]. This paper uses convolution, pooling, activation, and deconvolution layers from the CNN to construct an encoder and decoder. The encoder and decoder can be defined as

M = f(X) = σ(W_i * X + b_i),
N = h(M) = σ(W_i′ * M + b_i′),

where W_i and b_i represent the weight matrix and bias of each convolution layer, respectively; i indexes the convolution layers; f and h represent the encoder and decoder, respectively; X represents the input medical image data; M represents the feature information extracted from the input image data by the encoder; and N represents the medical image data generated after the feature information is decoded.
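To illustrate the encode/compress and decode/restore roles described above, here is a deliberately tiny sketch in NumPy: the "encoder" is a 2x2 mean-pooling that halves each spatial dimension, and the "decoder" is nearest-neighbour upsampling that restores the original size. This is a conceptual toy with no learned convolution weights, not the paper's trained model:

```python
import numpy as np

def encode(x):
    """Toy encoder: 2x2 mean-pooling halves each spatial dimension,
    compressing the image while keeping coarse structure."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(m):
    """Toy decoder: nearest-neighbour upsampling restores the input size
    from the compressed representation."""
    return np.repeat(np.repeat(m, 2, axis=0), 2, axis=1)
```

In the real model, the pooling/upsampling are replaced by learned convolution and deconvolution layers, so the decoder can recover detail rather than just repeat values.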

Loss Function and Optimization
Algorithm. The mean squared error loss function is used in this paper to compute the difference between the denoised noisy image and the clear image. The mean squared error of image reconstruction in this paper is

L(θ) = (1/M) Σ_{i=1}^{M} || f(y_i; θ) − x_i ||²,

where M represents the number of training samples; θ represents the parameters (weights w and biases b) in the network model; f(y_i; θ) represents the output image of the denoising model; and x_i denotes the noise-free image corresponding to the output image of the model. The smaller the mean squared error, the better the reconstruction. To make the model converge faster and better [22], the learning process is optimized using the Adam algorithm, which is based on gradient descent but differs from the traditional stochastic gradient descent algorithm. To update all the weights, stochastic gradient descent uses a single learning rate that does not change during training. In contrast, the Adam algorithm calculates first-order and second-order moment estimates of the gradient to design independent adaptive learning rates for different parameters, which has significant advantages for model optimization on large-scale datasets. The Adam algorithm's optimization procedure is as follows:

g_t = ∇_θ F(θ_{t−1}),
m_t = β_1 m_{t−1} + (1 − β_1) g_t,
v_t = β_2 v_{t−1} + (1 − β_2) g_t²,
m̂_t = m_t / (1 − β_1^t),  v̂_t = v_t / (1 − β_2^t),
θ_t = θ_{t−1} − α m̂_t / (√v̂_t + ε),  (6)

where t is the current time step; F is the optimization function of the model; g_t is the gradient of the optimization objective function F at time step t; θ is the model parameter vector and θ_{t,i} is the value of its ith element θ_i at time step t; α is the step length; m is the first-order moment estimate of the gradient; β_1 is the exponential decay rate of m; v is the second-order moment estimate of the gradient; β_2 is the exponential decay rate of v; β_1, β_2 ∈ [0, 1); and ε = 10⁻⁸. Equation (6) represents each update of the parameter vector θ.
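The Adam update rules above can be sketched directly in NumPy. The hyperparameter values and the toy quadratic problem are illustrative choices of ours, not settings from the paper:

```python
import numpy as np

def adam_minimize(grad_fn, theta0, alpha=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=3000):
    """Minimal Adam optimizer following the update rules in the text.

    grad_fn returns the gradient g_t at the current parameters; bias-
    corrected first- and second-moment estimates drive each update.
    """
    theta = np.asarray(theta0, dtype=np.float64)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy problem: minimize the MSE-style loss ||theta - target||^2.
target = np.array([3.0, -1.0])
theta = adam_minimize(lambda th: 2.0 * (th - target), np.zeros(2))
```

Because m̂/√v̂ normalizes the gradient magnitude, each parameter effectively gets its own adaptive step size of roughly α, which is the advantage over a single fixed learning rate.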
However, as the neural network is trained, the distribution of the inputs to each layer changes as the parameters of the previous layer change. This constant shift in distributions slows down training. This issue can be addressed by incorporating a normalization layer into the network model and performing normalization for each mini-batch of samples [23]. Suppose the input of a layer in the model is x = (x^(1), x^(2), ..., x^(n)) and the mini-batch of samples is B = (x_1, x_2, ..., x_m). The batch normalization method is as follows:

μ_B = (1/m) Σ_{i=1}^{m} x_i,
σ_B² = (1/m) Σ_{i=1}^{m} (x_i − μ_B)²,
x̂^(n) = (x^(n) − μ_B) / √(σ_B² + ε),
y^(n) = γ^(n) x̂^(n) + β^(n),

where x^(n) is the nth dimension of the input x; μ_B is the expectation of the sample set B; σ_B² is the variance of the sample set B; x̂^(n) is the normalization result of the input; y^(n) is the batch normalization result of x^(n); and γ^(n) and β^(n) are the parameters to be learned. The network model's parameters are then continuously updated using backpropagation and the optimization algorithm to produce the best denoising model. Figure 2 depicts the overall flow of the denoising method. To begin with, a public dataset is chosen for data preprocessing. The MIAS dataset, with 322 high-definition 1024 * 1024 mammographic images, is used in this paper. The MIAS images are then cropped, rotated, and flipped to increase the volume of training data to 3000 images with 336 * 336 resolution. Second, simulated noise is added to the cropped and expanded clear image data, and the noisy images and clear images are combined into a dataset that can be passed into the model, with the samples divided 8 : 1 : 1 into training, test, and validation sets. The convolutional autoencoder neural network model is then built, and the training set is fed into the model to complete training. The validation set is then fed into the trained model, which checks the model's denoising effect and outputs the model parameters.
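The batch normalization equations above can be sketched for a mini-batch of feature vectors; here γ and β are fixed scalars for illustration rather than learned parameters:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (m, n) per feature dimension.

    Each feature is shifted to zero mean and scaled to unit variance over
    the batch, then rescaled by the parameters gamma and beta (learned
    during training in the real model).
    """
    mu = x.mean(axis=0)                 # per-feature batch mean
    var = x.var(axis=0)                 # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

After this step, every feature in the batch has approximately zero mean and unit variance, which stabilizes the input distribution each layer sees.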
Finally, the model is loaded and the test set is fed into it to test its denoising effect. Figure 2 depicts the process of the denoising method used in this paper: first, the public dataset is converted into a dataset of paired images, which is then divided into a training set, a validation set, and a test set. Model training is completed with the training set. The validation set is used to determine whether the trained model is overfitting. The test set is used to validate the model's denoising effect. Finally, the model is saved.
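The 8 : 1 : 1 train/validation/test division described above can be sketched as a simple shuffled index split; the function name and seed are illustrative:

```python
import random

def split_indices(n, ratios=(8, 1, 1), seed=42):
    """Shuffle n sample indices and split them 8:1:1 into
    train, validation, and test subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For the 3000 expanded MIAS images this yields 2400 training, 300 validation, and 300 test samples, with every image assigned to exactly one subset.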

Model Structure.
The model's goal is to create an end-to-end mapping from a noisy image to a clear image; this conversion process is known as image denoising. This paper's convolutional autoencoder denoising network is divided into two parts: encoder and decoder. The encoder compresses and downscales the input noisy image through feature extraction via convolution before transforming it into an abstract mathematical description. By deconvolution, the decoder converts this abstract description back into an image. As the cost function is reduced, the decoder generates images that are increasingly similar to the target image. The structural model of the convolutional autoencoder proposed in this paper is shown in Figure 3, where the input is the noisy image and the output is the image after noise reduction. The model contains a total of 12 convolutional layers: the first 6 layers belong to the encoder structure and the last 6 layers belong to the decoder structure. The first four layers of the encoder contain 32 convolutional kernels each, the fifth layer contains 64 convolutional kernels, and the sixth layer contains 128 convolutional kernels. The first two layers of the decoder contain 64 convolutional kernels each, layers 3 and 4 contain 32 convolutional kernels, layer 5 contains 16 convolutional kernels, and the last layer contains 1 convolutional kernel. The size of all convolutional kernels in the model is 3 * 3. A 3 * 3 convolution kernel has, after three convolutions, the same receptive field as a 7 * 7 convolution kernel after one convolution; that is, a small-sized convolution kernel can obtain the same receptive field as a large-sized convolution kernel through multiple convolution operations, and the multiple convolution operations of a small-sized kernel increase the network depth and improve the network's nonlinear fitting ability [24].
Furthermore, the small convolution kernel reduces the number of parameters and improves the model's convergence speed. As a result, small-sized convolutional kernels of 3 * 3 are used in the models developed in this paper. A ReLU layer is added as the activation function to ensure the gradient descent speed and model convergence. After the encoder's third layer, the model adds a max pooling layer. The pooling layer can reduce redundant information while also expanding the receptive field. Figure 3 depicts the end-to-end medical image denoising model developed in this paper, with the noisy image as the input and the regenerated image after noise reduction as the output.
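The receptive-field equivalence claimed above (three stacked 3 * 3 convolutions matching one 7 * 7 convolution) can be checked with the standard recurrence r_i = r_{i−1} + (k_i − 1) · j, where j is the cumulative stride before layer i; the helper below is our own sketch:

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of a stack of conv layers, via the recurrence
    r_i = r_{i-1} + (k_i - 1) * jump, with jump the cumulative stride."""
    strides = strides or [1] * len(kernel_sizes)
    r, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        r += (k - 1) * jump
        jump *= s
    return r
```

With stride 1 everywhere, `receptive_field([3, 3, 3])` and `receptive_field([7])` both evaluate to 7, while the three 3 * 3 layers use 27 weights per channel pair against 49 for the single 7 * 7 layer, which is the parameter saving the text mentions.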

Experimental Set-Up
Peak signal-to-noise ratio (PSNR) is an objective measure based on the pixel-wise error between the denoised image and the reference image [25]. Structural similarity (SSIM), on the other hand, comprehensively takes into account nonstructural distortions such as brightness and contrast as well as structural distortions such as noise intensity and blurring degree [26]. This means that, in the context of visual perception, SSIM can better reflect image quality. Therefore, in this experiment, we considered both PSNR and SSIM to evaluate the denoising effect of the model. The calculation formulas are as follows:

PSNR = 10 · log10(MAX² / MSE),
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ,

where MAX is the maximum value of pixels in the image (all images in this paper have 8-bit color depth, so MAX = 255); MSE is the mean squared error; x and y are the input images; μ is the mean of an input image; σ is the variance of an input image; and α, β, γ are power exponents used to adjust the relative importance of the components. Since the components are equally important here, we take each exponent as 1; l represents brightness (luminance), c represents contrast, and s represents structure.
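The two metrics can be sketched in NumPy. Note the SSIM below is a simplified single-window version computed over the whole image (with the common stabilizing constants C1 = (0.01·MAX)² and C2 = (0.03·MAX)², which are our assumption), not the sliding-window implementation typically used for reported scores:

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio between two images, in dB."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """Simplified single-window SSIM with alpha = beta = gamma = 1,
    combining the luminance and contrast/structure terms."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Identical images give infinite PSNR and SSIM of 1.0; any added noise lowers both, which is the direction used in Tables 1 and 2.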

Results and Discussion.
To show the proposed method's effectiveness in removing noise from X-ray breast images and its benefits in preserving texture, we conduct an experimental comparison of the proposed method's denoising performance with that of novel denoising methods, namely those in literatures [14], [15], [16], and [17]. In our experiments, we denoise the dataset constructed from MIAS, and the experimental results are shown in Tables 1 and 2. All methods present visually well denoised results to some degree. When σ = 40, the PSNR of literature [16] and literature [15] are 0.98 dB and 0.12 dB greater than literature [14], respectively. The PSNR of literature [17] is 0.54 dB higher than literature [16], while the PSNR of the proposed approach is 0.04 dB higher than literature [17]. The average SSIM value of literature [16] is improved by 0.01 compared to literature [14] and literature [15], the SSIM of literature [17] is improved by 0.01 compared to literature [16], and the SSIM of the proposed method is improved by 0.01 compared to literature [17]. In addition, as the noise intensity increases to σ = 50, σ = 60, and σ = 70, it can be seen from the experimental data in Table 1 that the proposed method performs well on PSNR and SSIM. When SNR = 0.99, the PSNR of literature [16] and literature [14] are 0.42 dB and 0.33 dB greater than literature [15], respectively. Literature [17] is 0.05 dB greater than literature [16], while the PSNR of the proposed approach is 0.19 dB higher than literature [17]. The average SSIM values of literature [16], literature [17], and literature [14] are improved by 0.02, 0.02, and 0.01 compared to literature [15], respectively, and the SSIM of the proposed method has the same value as literature [17] and literature [16].
In addition, as the signal-to-noise ratio decreases to SNR = 0.97, SNR = 0.95, and SNR = 0.93, it can be seen from the experimental data in Table 2 that the proposed method performs well on PSNR and SSIM. We chose two images from the test set to display in Figures 4 and 5 to further confirm the visual effect of the X-ray image denoising technique described in this research. In Figure 4, the noise intensity factor is 0.1, the variance of the noisy image is 0.6, and the SNR is 0.99. In Figure 5, the noise intensity factor is 0.1, the variance of the noisy image is 0.6, and the SNR is 0.93. It is observed that literature [14] produces a significant blurring effect while removing noise and fails to recover the detailed texture information of the image; its weakness lies in balancing noise removal against preservation of edge features and texture information. Literature [15] is a highly competitive technique that has shown strong performance in both noise removal and texture information preservation. However, BM3D produces the Gibbs effect, which cannot be removed, since it executes the filtering operation in the transform domain, which is comparable to windowing the image signal. After denoising, the Gibbs effect produces pseudo-textures that resemble scratches, and such pseudo-texture might significantly impact clinical diagnosis. Literature [16] is a relatively novel method at present; its pyramid convolution expands the receptive field without increasing the number of parameters, so more image details can be obtained. Panel (d) in each figure shows the denoising results of the method in literature [16] for mixed noise, which not only removes the noise in the image but also retains the image detail information. The proposed method obtains a better denoising effect than literature [16], and the image detail texture is clearer.
Literature [17] is a U-Net based denoising method that has a simple structure and uses a completely different feature fusion approach, stitching features along channels to form thicker features. Panel (e) in each figure shows the denoising results of the method in literature [17] for mixed noise; the noise in the image is removed more cleanly. The proposed denoising method obtains denoising results comparable to those of literature [17].
To better show the detailed information about image denoising, we zoom in on the areas marked by red arrows in Figures 4 and 5, as shown in Figure 6. Figures 6(a1)-6(f1) correspond to the parts indicated by the red arrow in Figure 4.
Figures 6(a2)-6(f2) correspond to the parts indicated by the red arrow in Figure 5. From Figures 6(b1), 6(b2), 6(c1), and 6(c2), it can be found that literature [14, 15] cannot produce a good denoising effect on the mixed noise in X-ray breast images; the denoised images are blurred and contain artifacts. Figures 6(d1), 6(d2), 6(e1), and 6(e2) show the enlarged details after denoising the noisy images with the methods of literature [16, 17]. These two methods can remove the noise in X-ray breast images and retain the details better. Figures 6(f1) and 6(f2) show the detailed information of the X-ray breast image denoised by the proposed method. It can be seen that the denoising effect of the proposed method on X-ray breast images is slightly better than that of literature [16, 17]. The enlarged details show that, compared with current novel denoising methods, the proposed method has a better denoising effect on X-ray breast images.

Conclusions
In this paper, an image denoising method is described. In view of the shortcomings of current traditional denoising methods, a convolutional autoencoder neural network denoising model for medical images is proposed, using the powerful ability of the convolutional neural network to extract image information and the learning ability of the autoencoder network structure. A solution to mixed noise in medical images is presented. The proposed method has the following advantages: (1) the denoising effect is obvious, removing mixed noise in medical images while retaining detailed image texture information; (2) model denoising is fast and can complete a large number of medical image denoising tasks in a short time, which has clinical application value; and (3) end-to-end modeling is realized, so the denoised image can be obtained by inputting the noisy image without manually adjusting parameters. In addition, the experiments in this paper were limited by the memory of the graphics processing unit. In the future, we can consider increasing the memory of the graphics processing unit and further optimizing the model in terms of the total amount of sample data and the mini-batch size.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest with any financial organizations regarding the material reported in this manuscript.