Neural Architectures for Correlated Noise Removal in Image Processing

The paper proposes a new method that combines decorrelation and shrinkage techniques with neural network-based approaches for noise removal purposes. The images are represented as sequences of equal-sized blocks, each block being distorted by stationary, statistically correlated noise. A significant amount of the noise induced in the blocks is removed in a preprocessing step, using a decorrelation method combined with a standard shrinkage-based technique. The preprocessing step provides, for each initial image, a sequence of blocks that are further compressed at a certain rate, each component of the resulting sequence being supplied as input to a feed-forward neural architecture F_X → F_H → F_Y. The local memories of the neurons of the layers F_H and F_Y are generated through a supervised learning process based on the compressed versions of blocks of the same index value supplied as inputs and on the compressed versions of the means of their preprocessed variants used as targets. Finally, using the standard decompression technique, the sequence of decompressed blocks is the cleaned representation of the initial image. The performance of the proposed method is evaluated by a long series of tests, the results being very encouraging as compared to similar developments for noise removal purposes.


Introduction
A long series of digital image manipulation techniques have been proposed, both general-purpose and specially tailored to particular tasks. Digital image processing involves procedures including the acquisition and codification of images in digital files and the transmission of the resulting digital files over communication channels that are usually affected by noise [1,2]. Consequently, a significant part of digital image procedures are devoted to noise removal and image reconstruction, most of them being developed under the assumption that the superimposed noise is uncorrelated and normally distributed [3,4]. Our approach is somewhat different: it keeps the normality assumption but relaxes the uncorrelatedness constraint, allowing the superimposed noise to affect neighboring image pixels in a correlated way.
There are two basic mathematical characterizations of images, deterministic and statistical. In deterministic image representation, the image pixels are defined in terms of a certain function, possibly unknown, while in statistical image representation the images are specified in probabilistic terms as means, covariances, and higher-order moments [5-7]. In the past years, a series of techniques have been developed to involve neural architectures in image compression and denoising processes [8-13].
A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use [14]. The neural network methodology is of biological inspiration, a neural network resembling the biological brain in two respects: on one hand, the knowledge is acquired by the network from its environment through a learning process, and on the other hand, the interneuron connection strengths are used to store the acquired knowledge.
The "shrinkage" is a method for reducing the uncorrelated Gaussian noise affecting additively a signal image by soft 2 Mathematical Problems in Engineering thresholding applied to the sparse components [15][16][17].Its use in neural network-based approach is intuitively explained by the fact that when only a few of the neurons are simultaneously active, it makes sense to assume that the activities of neurons with small absolute values correspond to noise; therefore they should be set to zero, and only the neurons whose absolute values of their activities are relatively large contain relevant information about the signal.
Recently, a series of correlated noise removal techniques have been reported. Some approaches focus on estimating the spatial correlation characteristics of the noise for a given image, either when the noise type and statistics such as the variance are known [18] or when the noise variance and spatial spectrum have to be estimated [19], and then use a DCT-based method for noise removal. Wavelet-based approaches mainly include a noise prewhitening technique followed by wavelet-based thresholding [20], additive stationary correlated noise removal by modeling the noise-free coefficients using a multivariate Gaussian Scale Mixture [21], and image denoising using HMMs in the wavelet domain based on the concept of signal of interest [22,23]. Since the sparsity of signals can be exploited for noise removal purposes when different representations are used (Fourier, wavelet, principal components, independent components, etc.), a series of results concerning this property could be of interest in image denoising [24-26] and artifact (noise) removal in magnetic resonance imaging [27].
The outline of the paper is as follows. The general model of image transmission through a noise-corrupted channel is described in Section 2. Each image is transmitted several times as a sequence of equal-sized blocks, each block being disturbed by a correlated Gaussian noise whose statistical properties are not known. All variants of each block are submitted to a sequence of transforms that decorrelate, shrink, and average the pixel values.
A specially tailored family of feed-forward single-hidden-layer neural networks is described in Section 3, their memories being generated using a supervised learning algorithm of gradient descent type.
A suitable methodology for implementing a neural network-based noise removal method for image processing purposes is then described in the fourth section of the paper. The proposed methodology was applied to process images from different standard databases; the conclusions experimentally derived from the tests performed on two standard databases, the former containing images of human faces and the latter containing images of landscapes, are reported in the next section.
The final section of the paper contains a series of conclusive remarks.

Image Preprocessing Based on Decorrelation and Shrinkage Techniques
We assume that the images are transmitted through a noisy channel, each image I being transmitted as a sequence of n-dimensional blocks (B_1, B_2, ..., B_m). A first working assumption of our model is that the noise, modeled by the n-dimensional random vectors η_1, η_2, ..., η_N, affects the blocks in a similar way, where η_1, η_2, ..., η_N are independent and identically distributed, η_j ~ N(0, Σ), 1 ≤ j ≤ N. In case N images I_1, I_2, ..., I_N are transmitted sequentially through the channel, we denote by Ĩ_j = (B̃_1^j, B̃_2^j, ..., B̃_m^j) the received version of I_j = (B_1^j, B_2^j, ..., B_m^j). The working assumptions included in our model seem to be quite realistic according to the currently used information transmission frameworks. According to the second working assumption, for each index value i, the sequence of blocks (B_i^1, B_i^2, ..., B_i^N) could represent fragments of possibly different images taken at counterpart positions, as, for instance, the areas of the eyes or the mouth in the case of face images. Therefore, the assumption that each noise-free block B_i^0 is a random vector corresponds to a model for each particular block position, the parameters μ_i^0 and Σ_i^0 expressing the variability existing in the sequence of images I_1, I_2, ..., I_N at the level of the i-th block.
On one hand, the maximum likelihood estimates (MLE) of the parameters μ_i^0 and Σ_i^0 are given by the sample statistics

μ̂_i^0 = (1/N) sum_{j=1}^{N} B̃_i^j,   Ŝ_i^0 = (1/N) sum_{j=1}^{N} (B̃_i^j - μ̂_i^0)(B̃_i^j - μ̂_i^0)^T,

respectively. On the other hand, the values of the parameters μ_i^0 and Σ_i^0 are unknown and, moreover, it is quite inconvenient to estimate them before the transmission of the sequence of images is over.
The covariance matrix Σ corresponding to the noise component can be estimated before the transmission is performed by different methods, for instance, the white wall method; therefore, without loss of generality, the matrix Σ can be assumed to be known, and Σ̂_i^0 = Ŝ_i^0 - Σ can be taken as an estimate of Σ_i^0. Also, in case each sequence (B̃_i^1, B̃_i^2, ..., B̃_i^N) is processed separately, we can assume that the data are centered, that is, μ̂_i^0 = 0, 1 ≤ i ≤ m.
Consequently, the information available for developing a denoising procedure is represented by the sequences (B̃_i^1, B̃_i^2, ..., B̃_i^N), the estimates Σ̂_i^0, 1 ≤ i ≤ m, and Σ. In our work we consider the following shrinkage type denoising method.
For each 1 ≤ i ≤ m, we denote by W_i a matrix that simultaneously diagonalizes the matrices Σ̂_i^0 and Σ.
Let C_i, 1 ≤ i ≤ m, be the random vectors C_i = (W_i)^T B̃_i = (W_i)^T B_i^0 + (W_i)^T η. Note that the linear transform of matrix (W_i)^T allows obtaining the representation C_i in which most of the noise is contained in the second term; moreover, the linear transform of matrix (W_i)^T decorrelates the noise components. Let D_i, 1 ≤ i ≤ m, be the sequence of variants of C_i obtained by applying the code shrinkage method [16] entrywise. Obviously, from (3) we get ((W_i)^T)^(-1) = Σ̂_i^0 W_i; that is, the cleaned variant of each block can be computed as B̂_i = Σ̂_i^0 W_i D_i (8). Note that, although the eigenvalues of (Σ̂_i^0)^(-1) Σ are theoretically guaranteed to be positive numbers, in real-world applications situations frequently arise in which this matrix is ill conditioned. In order to overcome this difficulty, in our tests we implemented the code shrinkage method using a regularized variant in which only the eigenvalues exceeding a conventionally selected positive threshold value τ are retained. Also, instead of (8) we use B̂_i = ((W_i)^T)^+ D_i, where ((W_i)^T)^+ is the generalized inverse (Penrose pseudoinverse) of (W_i)^T [30].
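A minimal sketch of this decorrelation-and-shrinkage step, under the stated assumptions, might look as follows. SciPy's generalized eigensolver provides a matrix that simultaneously diagonalizes the two covariance estimates, the pseudoinverse mirrors the regularized reconstruction above, and the soft threshold t is a simplified stand-in for the shrinkage estimator of [16].

```python
import numpy as np
from scipy.linalg import eigh, pinv

def decorrelate_and_shrink(block, sigma0_hat, sigma, t=1.0):
    """Denoise one received block by simultaneous diagonalization + shrinkage.

    block      : received (noisy) block, shape (n,)
    sigma0_hat : estimated covariance of the clean block, shape (n, n)
    sigma      : known covariance of the channel noise, shape (n, n)
    t          : soft-threshold level (an assumed, simplified choice)
    """
    # W solves the generalized eigenproblem sigma v = lam * sigma0_hat v,
    # hence W^T sigma0_hat W = I and W^T sigma W is diagonal: W
    # simultaneously diagonalizes both matrices.
    _, W = eigh(sigma, sigma0_hat)
    c = W.T @ block                                    # decorrelated representation
    d = np.sign(c) * np.maximum(np.abs(c) - t, 0.0)    # shrinkage
    # Map back; the pseudoinverse guards against ill conditioning.
    return pinv(W.T) @ d

# Toy usage: 3-dimensional blocks with known covariances.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)); sigma0 = A @ A.T + np.eye(3)  # PD estimate
sigma = 0.1 * np.eye(3)                                    # channel noise
print(decorrelate_and_shrink(rng.normal(size=3), sigma0, sigma, t=0.2))
```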
In our approach we assumed that the source of noise (namely, the communication channel used to transmit the images) can be observed. This hypothesis is frequently used in image restoration techniques [26]. In the preprocessing and training stages, undisturbed original versions of the transmitted images are not available; instead, a series of perturbed versions are available, and the noise component characteristics may be estimated through the white wall technique. The working hypotheses include the fact that the images come from a common probability distribution (possibly a mixture), that is, they share the same statistical characteristics. This hypothesis is frequently used when sets of images are captured and processed [16]. The purpose of the method is, on one hand, to eliminate correlated noise and, on the other hand, to eliminate the noise from new images transmitted through the communication channel, provided they come from the same probability distribution as the images in the initially observed set.

Neural Network-Based Approach to Image Denoising
The aim of this section is to present an image denoising method, in the framework described in the previous section, implemented on a family of standard feed-forward neural architectures NN_i : (F_X)_i → (F_H)_i → (F_Y)_i, 1 ≤ i ≤ m. Let us assume that Ĩ = (B̃_1, B̃_2, ..., B̃_m) is the noisy received version of the image I = (B_1, B_2, ..., B_m) transmitted through the channel. The training process of the architectures NN_i, 1 ≤ i ≤ m, is organized such that the resulting memories encode associations of the type (input block, sample mean), the purpose being noise removal according to the method presented in the previous section.
In order to reduce to some extent the computational complexity, a preprocessing step aiming at dimensionality reduction is required. In our work we use a threshold-based PCA method to compress the blocks. Since the particular positions of the blocks correspond to different models, their compressed versions could be of different sizes. Indeed, according to (2), the estimates of the autocorrelation matrices differ across block positions; therefore, the numbers of the most significant directions are different for different values of the index i, that is, the sizes of the compressed variants of the blocks are, in general, different. Consequently, the sizes of (F_X)_i and (F_Y)_i depend on i, these sizes resulting in the preprocessing step by applying the threshold-based PCA method [31,32].
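The block compression step can be sketched as follows; this is a minimal, hypothetical rendition in which the retained principal directions are those whose eigenvalues exceed a fraction eps of the total variance, an assumed selection rule mirroring the role of the thresholds introduced later in the paper.

```python
import numpy as np

def pca_compression_filter(blocks, eps):
    """Build a linear compression filter from a sample of blocks.

    blocks : array of shape (N, n), the N received variants of one block
    eps    : threshold in (0, 1); directions whose eigenvalues fall below
             eps * trace are discarded (an assumed selection rule)

    Returns the matrix whose columns are the most significant unit
    eigenvectors of the sample autocorrelation matrix.
    """
    R = blocks.T @ blocks / len(blocks)      # autocorrelation estimate
    eigvals, eigvecs = np.linalg.eigh(R)     # ascending eigenvalues
    keep = eigvals > eps * eigvals.sum()     # most significant directions
    return eigvecs[:, keep]                  # shape (n, k), k <= n

# Compression: c = U.T @ b ; decompression (reconstruction): b_hat = U @ c.
```

Because the autocorrelation estimate differs from one block position to another, the number of retained columns, and hence the compressed size, varies with the block index, as noted above.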
The hidden neurons influence the error on the nodes to which their outputs are connected. The use of too many hidden neurons could cause the so-called overfitting effect, that is, an overestimate of the complexity of the target problem. Perhaps the most unpleasant consequence is that the generalization capability is decreased and, therefore, the prediction capacity is degraded too. On the other hand, at least in image processing, the use of fewer hidden neurons implies that less information extracted from the inputs is processed and, consequently, less accuracy should be expected. Therefore, determining the right size of the hidden layer results as a trade-off between accuracy and generalization capacity.
Several expressions have been proposed to compute the number of neurons in the hidden layers [33,34]. Denoting by |·| the number of elements of the argument, the sizes of the hidden layers (F_H)_i can be computed in many ways, two of the most frequent expressions being [34]

|(F_H)_i| = sqrt(|(F_X)_i| · |(F_Y)_i|), (12a)

|(F_H)_i| = (|(F_X)_i| + |(F_Y)_i|) / 2. (12b)
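Assuming (12a) and (12b) are the geometric-mean and arithmetic-mean rules as reconstructed above, the recommended hidden-layer sizes can be computed as in the following sketch; the sample sizes are the input/output layer sizes reported later in the tests.

```python
import math

def hidden_sizes(n_in, n_out):
    """Recommended hidden-layer sizes per the two rules assumed above."""
    return math.isqrt(n_in * n_out), (n_in + n_out) // 2

print(hidden_sizes(115, 30))  # -> (58, 72), both in the vicinity of 65
```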
The aim of the training is to obtain, for each value of the index i, on the output layer (F_Y)_i a compressed cleaned version of the input applied to the layer (F_X)_i, the target output being computed according to the method presented in the previous section. According to the approach described in the previous section, all blocks of the same index, say i, are processed by the same compression method, yielding compressed variants whose size is the same for all these blocks. The compressed variants corresponding to the blocks of index i are next fed as inputs to the i-th neural architecture. Consequently, the denoising process of an image consisting of m blocks is implemented on a family of m neural architectures operating in parallel (NN_i, 1 ≤ i ≤ m), where NN_i : (F_X)_i → (F_H)_i → (F_Y)_i, the sequence of denoised variants resulting as outputs of the layers (F_Y)_i being next decompressed. The cleaned variant of each input image is taken as the sequence of the decompressed cleaned variants of its blocks.
The preprocessing step producing the compressed variants fed as input blocks is described as follows. For each index value i, the sequence of compressed versions of the blocks (B̃_i^1, B̃_i^2, ..., B̃_i^N) is computed as C̃_i^j = (U_i)^T B̃_i^j, where the columns of the matrix U_i are the most significant unit eigenvectors of the corresponding autocorrelation matrix, selected according to a threshold value ε_1 ∈ (0, 1) as in (14). The compressed cleaned versions serving as targets are computed in a similar way, as in (15), using a possibly different threshold value ε_2 ∈ (0, 1). Note that, in tests, the threshold values ε_1, ε_2 are experimentally tuned to the particular sequence of images.
To summarize, the preprocessing scheme consists of applying the threshold-based PCA method both to the noisy sequence of blocks (B̃_i^1, B̃_i^2, ..., B̃_i^N) and to their cleaned versions (B̂_i^1, ..., B̂_i^N), producing the sequence of inputs applied to the input layer (F_X)_i together with their compressed cleaned versions. The aim of the training is to produce on each output layer (F_Y)_i the sequence of compressed cleaned versions. The training of each neural architecture NN_i is of supervised type, using a gradient descent approach, the local memories of (F_H)_i and (F_Y)_i being determined using the Levenberg-Marquardt variant of the backpropagation learning algorithm (LM-BP algorithm) [35].
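As an illustration of this training setup, the following sketch fits a single-hidden-layer network with logistic hidden units and identity outputs by generic Levenberg-Marquardt least squares, with SciPy's solver standing in for the LM-BP algorithm of [35]; all sizes and data here are illustrative placeholders, not the paper's actual configuration.

```python
import numpy as np
from scipy.optimize import least_squares

def make_net(n_in, n_hid, n_out, rng):
    """Randomly initialized single-hidden-layer network parameters."""
    return np.concatenate([
        rng.normal(0, 0.5, n_in * n_hid), np.zeros(n_hid),   # W1, b1
        rng.normal(0, 0.5, n_hid * n_out), np.zeros(n_out),  # W2, b2
    ])

def forward(params, X, n_in, n_hid, n_out):
    """Logistic hidden layer, linear (identity) output layer."""
    i = n_in * n_hid
    W1 = params[:i].reshape(n_in, n_hid); b1 = params[i:i + n_hid]
    j = i + n_hid
    W2 = params[j:j + n_hid * n_out].reshape(n_hid, n_out)
    b2 = params[j + n_hid * n_out:]
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return H @ W2 + b2

# Illustrative data: compressed noisy blocks -> compressed cleaned means.
rng = np.random.default_rng(1)
n_in, n_hid, n_out = 4, 3, 2
X = rng.normal(size=(50, n_in))   # stand-in for compressed input blocks
T = rng.normal(size=(50, n_out))  # stand-in for compressed target means

residuals = lambda p: (forward(p, X, n_in, n_hid, n_out) - T).ravel()
fit = least_squares(residuals, make_net(n_in, n_hid, n_out, rng), method="lm")
```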
We organized the training process for the m neural networks by transmitting each available image through the channel several times, say T times; the reason for doing so is that, in this way, better estimates of the covariance matrices Σ_i^0, 1 ≤ i ≤ m, of the proposed stochastic models are expected to be obtained.
Consequently, the whole available data is the collection of the received variants of all blocks of all transmitted images. The linear compression filter U_i is a matrix whose columns are the most significant unit eigenvectors of the corresponding autocorrelation matrix. The reason for using the means of the preprocessed blocks and their compressed versions, 1 ≤ i ≤ m, resides in the fact that, by taking the means and their compressed versions, some additional amount of noise is expected to be removed for each value of the index i; that is, the compressed versions of the means are expected to be better cleaned variants of the compressed blocks.
Summarizing, the memory of each neural architecture NN_i is computed by the Levenberg-Marquardt algorithm applied to the input/output sequence of pairs (compressed noisy block, compressed mean of the cleaned variants). Once the training phase is over, the family of NN_i's is used to remove the noise from a noisy version of an image received through the channel according to the following scheme. Let I = (B_1, B_2, ..., B_m) be the initial image transmitted through the channel and Ĩ = (B̃_1, B̃_2, ..., B̃_m) the received noisy version.
Step 1. Compress each block B̃_i of Ĩ using the filter U_i and get its compressed version C̃_i; that is, (C̃_1, C̃_2, ..., C̃_m) is a dynamically block-compressed version of Ĩ.
Step 2. Apply (C̃_1, C̃_2, ..., C̃_m) as inputs to the architectures NN_i, each C̃_i being applied as input to the layer (F_X)_i, 1 ≤ i ≤ m, and get the outputs O_i.

Step 3. Decompress each output O_i using the corresponding decompression filter, the resulting sequence of decompressed blocks being the cleaned version Î of the image.
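Putting the three steps together, a self-contained sketch of the per-block denoising scheme might look as follows; the filters U_in and U_out, the network net, and all sizes are hypothetical placeholders for one block index i.

```python
import numpy as np

def denoise_block(b_tilde, U_in, U_out, net):
    """Hypothetical end-to-end use of the scheme for one block index i."""
    c_tilde = U_in.T @ b_tilde   # Step 1: compress the received block
    o = net(c_tilde)             # Step 2: apply NN_i to the layer (F_X)_i
    return U_out @ o             # Step 3: decompress the network output

# Toy example with an identity "network".
U = np.eye(6)[:, :3]             # 6-dimensional blocks, 3 components kept
block = np.arange(6.0)
print(denoise_block(block, U, U, lambda c: c))
```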

Description of the Methodology Applied in the Implementation of the Proposed Method on Neural Architectures
The aim of this section is to describe the methodology followed in implementing the neural network-based noise removal method for image processing purposes. The proposed methodology was applied to process images from different standard databases; the conclusions experimentally derived from the tests performed on two standard databases, the former containing images of human faces and the latter containing images of landscapes, are reported in the next section.
We performed the experiments according to the following methodology.
(1) The quality of a certain test image Y = (Y(i, j)) versus a reference image X = (X(i, j)) of the same size (N_1, N_2) is evaluated in terms of the Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR), and Root Mean Squared Signal-to-Noise Ratio (SNR-RMS) indicators [36], and the Structural Similarity Metric (SSIM) [37], defined in (20) and (21). Let x and y be spatial patches extracted from the images X and Y, respectively, the two patches corresponding to the same spatial window of the two images. The original standard SSIM value computed for the patches x and y is defined by

SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2)),

where μ_x denotes the mean value of x, σ_x is the standard deviation of x, and σ_xy represents the cross-correlation of the mean-shifted patches x - μ_x and y - μ_y. The constants C_1 and C_2 are small positive numbers included to avoid instability when either μ_x^2 + μ_y^2 or σ_x^2 + σ_y^2 is very close to zero, respectively. The overall SSIM index for the images X and Y is computed as the mean value of the SSIM measures over all pairs of patches x and y of X and Y, respectively; a computation sketch is given after item (2) below. (2) The size of the blocks and the model of the noise affecting the transmitted data are selected for each database. The size of the blocks is established by taking into account the size of the available images, in order to assure reasonable complexity of the noise removal process. In our tests the size of the input blocks is about 150, and the sizes of the images are 135 × 100 in the case of the database containing images of human faces and 154 × 154 in the case of the database containing images of landscapes. We assumed that the components of the noise η induced by the channel are possibly correlated; in our tests, the noise model is of Gaussian type, η ~ N(0, Σ), where Σ is a symmetric positive definite matrix.
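Referring to the SSIM definition in item (1) above, a minimal computation might be sketched as follows; the stabilizing constants and the non-overlapping window strategy are assumed, conventional choices rather than the paper's exact settings.

```python
import numpy as np

def ssim_patch(x, y, c1=6.5025, c2=58.5225):
    """SSIM for two same-sized patches, per the definition above.

    c1, c2 are the usual (K1*L)^2, (K2*L)^2 stabilizers for 8-bit images
    (K1 = 0.01, K2 = 0.03, L = 255), an assumed conventional choice.
    """
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()   # cross-correlation term
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def ssim(X, Y, win=8):
    """Overall SSIM: mean of patch SSIMs over non-overlapping windows."""
    vals = [ssim_patch(X[i:i + win, j:j + win], Y[i:i + win, j:j + win])
            for i in range(0, X.shape[0] - win + 1, win)
            for j in range(0, X.shape[1] - win + 1, win)]
    return float(np.mean(vals))

rng = np.random.default_rng(0)
img = rng.random((32, 32)) * 255
print(ssim(img, img))  # identical images yield an SSIM of 1.0
```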
(3) The compression thresholds ε_1, ε_2 in (14) and (15) are established in order to assure some desired accuracy. In our tests we used ε_1 = k_1 · 10^-4 and ε_2 = k_2 · 10^-4, where k_1, k_2 are positive constants. The reason for selecting different magnitude orders for these thresholds stems from the fact that ε_1 is used in compressing noise-affected images, while ε_2 is used in compressing noise-cleaned images [32]. The sizes of the input and output layers (F_X)_i, (F_Y)_i of the neural network NN_i result from the established values of ε_1 and ε_2 accordingly.
(5) In order to implement the noise removal method on a family of neural networks NN_i : (F_X)_i → (F_H)_i → (F_Y)_i, 1 ≤ i ≤ m, the sizes of the input layers (F_X)_i and the output layers (F_Y)_i are determined by the threshold-based PCA compression/decompression method and the established values of ε_1, ε_2. The sizes of the hidden layers (F_H)_i are determined as approximations of the values recommended in the published literature, (12a) and (12b). In order to assure reasonable tractability of the data, in our tests we were forced to use fewer neurons on the hidden layers (F_H)_i than recommended.
For fixed values of ε_1, ε_2, the use of the recommended numbers of neurons given by (12a) and (12b) usually leads either to the impossibility of implementing the learning process or to overly lengthy training processes. Therefore, in such cases we are forced to reconsider the values of ε_1, ε_2 by increasing them, thus decreasing the numbers of neurons on the input and output layers and, consequently, the numbers of neurons on the hidden layers too. Obviously, reconsidering the values of ε_1, ε_2 in this way inherently implies that a larger amount of information about the data is lost. The effects of losing information are manifold, one of them being that the cleaned versions resulting from decompressing the outputs of the NN_i's yield poorer approximations Î of the initial image I.
This way we arrive at the conclusion that, in practice, we have to solve a trade-off problem between the magnitude of the compression rates and the number of neurons on the hidden layers (F_H)_i. In order to solve this trade-off, in our tests we used smaller numbers of neurons than recommended on the hidden layers and developed a comparative analysis of the quality of the resulting cleaned images.

(6) The activation functions of the neurons belonging to the hidden and output layers can be selected from a very large family. In our tests, we considered functions of logistic type to model the activation functions of the neurons belonging to the hidden layers and identity functions to model the outputs of the neurons belonging to the output layers. Also, the learning process involved the task of splitting the available data into training, validation, and test data. In our tests the sizes of the subcollections were 80%, 10%, and 10%, respectively.

(7) The evaluation of the overall quality of the noise removal process implemented on the set of neural networks, as previously described, is performed in terms of the indicators (20) and (21): on one hand, by comparing the initial images I = (B_1, B_2, ..., B_m) to the noisy images Ĩ = (B̃_1, B̃_2, ..., B̃_m) transmitted through the channel and, on the other hand, by comparing the initial images to their cleaned versions Î obtained by decompressing the network outputs.

(8) The comparative analysis between the performances corresponding to the decorrelation and shrinkage method and its implementation on neural networks is developed in terms of the indicators (20) and (21).

Experimentally Derived Conclusions on the Performance of the Proposed Method
In this section we present the results of evaluating both the quality of the proposed decorrelation and shrinkage method and the power of the neural network-based approach in simulating it for noise removal purposes. The tests were performed in a similar way on two standard databases, the former, referred to as Senthil, containing images of 5 human faces, with 16 images for each person [38], and the latter containing 42 images of landscapes [39]. In the case of the Senthil database, the preprocessing step used 75 images, 15 of the available versions of each human face being used; the tests performed in order to evaluate the quality of the trained family of neural networks used the remaining 5 images, one for each person. In the case of the database containing images of landscapes, we identified three types of quite similar images, and we used 13 images of each type in the training process, the tests being performed on the remaining three images.

The sizes of the hidden layers were set to smaller values than recommended by (12a) and (12b). For instance, when ε_1 ≈ 10^-4 and ε_2 ≈ 10^-4, the resulting sizes of the layers |(F_X)_i| and |(F_Y)_i| are about 115 and 30, respectively, the recommended sizes of the layers |(F_H)_i| being about 65. The results of a long series of tests pointed out that one can use hidden layers of smaller sizes than recommended without dramatically decreasing the accuracy; for instance, in this work, we used only half of the recommended sizes.

In our tests, the memory of each neural architecture is computed by the LM-BP algorithm, often the fastest variant of the backpropagation algorithm and one of the most commonly used in supervised learning. The available data was split into a training set, a validation set, and a test set, the sizes of the subcollections being 80%, 10%, and 10%, respectively. The main parameters of the LM-BP training process are specified in Table 1.

Table 1: The main parameters of the LM-BP training process.
- The maximum number of epochs: 1000
- The minimum value of the performance (Jacobian computation): 0
- The maximum validation failures: 5
- The minimum performance gradient: 10^-5
- The initial/maximum μ factor (in the LM adaptive learning rule): 10^-3 / 10^10
In order to experimentally establish the quality of the proposed method, a comparative analysis was conducted against three of the most widely used and suitable algorithms for correlated noise removal, namely, BM3D (block-matching and 3D filtering [25]), NLMF (Nonlocal Means Noise Filtering [40]), and ProbShrink (a correlated noise removal algorithm using the nondecimated wavelet transform and generalized Laplacian [22]). The reported results include both quantitative and qualitative comparisons.
In the following, we summarize some of our results.
(a) The quality evaluation of the preprocessing step in terms of the indicators (20) and (21) is as follows. (1) In Figure 1(a), a sample of five face images belonging to the Senthil database is presented, their cleaned versions resulting from applying the decorrelation and shrinkage method being shown in Figure 1(e), where each image was transmitted 30 times through the channel. In Figures 1(b), 1(c), and 1(d) the restored versions resulting from applying the NLMF algorithm, the ProbShrink algorithm, and the BM3D method, respectively, are depicted. Table 2 contains the values of the indicators (20) and (21) corresponding to these five pairs of noisy-cleaned versions of the images. Note that, on average, the best results were obtained when our method was used. (2) A sample of three images of landscapes, one from each class, is presented in Figure 2(a), together with their cleaned versions resulting from applying the decorrelation and shrinkage method, shown in Figure 2(e), where each image was transmitted 30 times through the channel.
In Figure 2(b) the restored variants produced by the NLMF algorithm are exhibited, while in Figure 2(c) the restored variants produced by the ProbShrink method are shown. The cleaned versions produced by the BM3D algorithm are presented in Figure 2(d). The values of the indicators (20) and (21) are given in Table 3. Note that, on average, the best results were obtained when our method was used.
(b) As previously described, the images resulting from the preprocessing step are used in the supervised training of the family of neural networks. Once the training process is over, the family of neural networks is used to remove noise from new, unseen-yet images. Obviously, it is impossible to guarantee that the new test images share the same statistical properties with the images used during the training process, the only criterion being that they are visually similar enough. In order to take this constraint into account, we split each of the two databases containing similar images into training and testing subsets, the sizes being 75/5 for the Senthil dataset and 39/3 for the second database.
(1) The test images from the Senthil database and their versions resulting from applying the preprocessing step are shown in Figures 3(a) and 3(b), respectively. Their cleaned versions computed by the resulting family of trained neural networks are shown in Figure 3(f), while their restored versions produced by the NLMF algorithm, the ProbShrink method, and the BM3D method are presented in Figures 3(c), 3(d), and 3(e), respectively. In terms of the indicators (20) and (21), the results are summarized in Table 4. Note that in this case our method, the ProbShrink algorithm, and the BM3D method produce similar results, according to both the SNR measure and the SSIM metric. (2) The corresponding evaluation for the test images of landscapes (Figure 4) in terms of the indicators (20) and (21) is summarized in Table 5.
Note that, in this case, the BM3D algorithm proved to smooth the results too much. Also, the images obtained when the NLMF algorithm was used are of poor visual quality. The ProbShrink algorithm performed better than BM3D and NLMF but, on average, the best results were obtained when our method was used.

Conclusive Remarks and Suggestions for Further Work
The proposed method combines the decorrelation and shrinkage techniques with neural network-based approaches for noise removal purposes. The images are assumed to be transmitted as sequences of blocks of equal sizes, each block being distorted by a stationary, statistically correlated noise, some amount of the noise being partially removed using the method that combines noise decorrelation and the standard shrinkage technique. The preprocessing step provides, for each initial image, a sequence of blocks that are further PCA-compressed at a certain rate, each component of the resulting sequence being supplied as input to a feed-forward neural architecture F_X → F_H → F_Y. Therefore, each indexed block is processed by a neural network corresponding to that index value. The local memories of the neurons of the layers F_H and F_Y are generated through a supervised learning process based on the compressed versions of blocks of the same index value supplied as inputs and on the compressed versions of the means of their preprocessed variants used as targets. Finally, using the standard PCA-decompression technique, the sequence of the decompressed blocks is the cleaned representation of the initial image. The performance of the proposed method is evaluated by a long series of tests, the results being very encouraging as compared to similar developments for noise removal purposes. The evaluation of the amount of noise removed is done in terms of some of the most frequently used similarity indicators: SNR, SNR-RMS, Peak SNR, and SSIM.
The results produced by applying the proposed method were compared to those produced by three of the most widely used algorithms for eliminating correlated noise. The NLMF algorithm consistently produces weaker results than the proposed method. Using ProbShrink or BM3D, the results are similar to or weaker than those yielded by the proposed method, in both quality and quantity.
The long series of tests proved the good results of the above-described methodology, entailing the hope that further and possibly more sophisticated extensions can be expected to improve it. Among several possible extensions, some work is still in progress concerning the use of different output functions for the hidden and output neurons and the use of more hidden layers in the neural architectures. Also, some other compression techniques combined with new techniques for feature extraction, as well as the use of other learning schemes to generate the local memories of the neurons, are expected to allow the removal of a larger amount of noise.

Professor Luminita State passed away in January 2016. The authors will always remember her amazing spirit, as well as her brilliant mind. May God rest her soul in peace!
