The paper proposes a new method that combines decorrelation and shrinkage techniques with neural network-based approaches for noise removal. The images are represented as sequences of equal-sized blocks, each block being distorted by stationary, statistically correlated noise. A significant amount of the induced noise is removed from the blocks in a preprocessing step that uses a decorrelation method combined with a standard shrinkage-based technique. For each initial image, the preprocessing step provides a sequence of blocks that are further compressed at a certain rate, each component of the resulting sequence being supplied as input to a feed-forward neural architecture FX→FH→FY. The local memories of the neurons of the layers FH and FY are generated through a supervised learning process based on the compressed versions of blocks of the same index value supplied as inputs and the compressed versions of them resulting as the mean of their preprocessed versions. Finally, using the standard decompression technique, the sequence of the decompressed blocks is taken as the cleaned representation of the initial image. The performance of the proposed method is evaluated by a long series of tests, the results being very encouraging as compared to similar developments for noise removal purposes.
1. Introduction
A long series of digital image manipulation techniques has been proposed, both general ones and ones specially tailored for particular purposes. Digital image processing involves procedures including the acquisition and codification of images in digital files and the transmission of the resulting digital files over communication channels, usually affected by noise [1, 2]. Consequently, a significant part of digital image procedures is devoted to noise removal and image reconstruction, most of them being developed under the assumptions that the superimposed noise is uncorrelated and normally distributed [3, 4]. Our approach is somewhat different: it keeps the normality assumption but relaxes the constraint that the noise is uncorrelated; that is, the superimposed noise is allowed to affect neighboring image pixels in a correlated way.
There are two basic mathematical characterizations of images, deterministic and statistical. In deterministic image representation, the image pixels are defined in terms of a certain function, possibly unknown, while, in statistical image representation, the images are specified in probabilistic terms as means, covariances, and higher degree moments [5–7]. In the past years, a series of techniques have been developed in order to involve neural architectures in image compression and denoising processes [8–13].
A neural network is a massively parallel-distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use [14]. The neural network methodology is of biological inspiration, a neural network resembling the biological brain in two respects: on one hand, the knowledge is acquired by the network from its environment through a learning process, and, on the other hand, the interneuron connection strengths are used to store the acquired knowledge.
Shrinkage is a method for reducing uncorrelated Gaussian noise that additively affects a signal image, by soft thresholding applied to the sparse components [15–17]. Its use in neural network-based approaches is intuitively explained by the fact that, when only a few of the neurons are simultaneously active, it makes sense to assume that the activities of neurons with small absolute values correspond to noise; therefore they should be set to zero, and only the neurons whose activities have relatively large absolute values contain relevant information about the signal.
Recently, a series of correlated noise removal techniques have been reported. Some approaches focus on estimating spatial correlation characteristics of noise for a given image either when noise type and statistics like variance are known [18] or in case the noise variance and spatial spectrum have to be estimated [19] and then use a DCT-based method for noise removal. Wavelet-based approaches mainly include noise prewhitening technique followed by the wavelet-based thresholding [20], additive stationary correlated noise removal by modeling the noise-free coefficients using a multivariate Gaussian Scale Mixture [21], and image denoising using HMM in the wavelet domain based on the concept of signal of interest [22, 23]. Since the sparsity of signals can be exploited for noise removal purpose when different representations are used (Fourier, wavelet, principal components, independent components, etc.), a series of results concerning this property could be of interest in image denoising [24–26] and artifact (noise) removal in magnetic resonance imaging [27].
The outline of the paper is as follows. The general model of image transmission through a noisy corrupted channel is described in Section 2. Each image is transmitted several times as a sequence of equal sized blocks, each block being disturbed by a correlated Gaussian noise whose statistical properties are not known. All variants of each block are submitted to a sequence of transforms that decorrelate, shrink, and average the pixel values.
A special tailored family of feed-forward single-hidden-layer neural networks is described in Section 3, their memories being generated using a supervised learning algorithm of gradient descent type.
A methodology for implementing the noise removal method on neural networks for image processing purposes is then described in Section 4. The proposed methodology was applied to images from different standard databases. The conclusions experimentally derived from the tests performed on two standard databases, the former containing images of human faces and the latter containing images of landscapes, are reported in Section 5.
The final section of the paper contains a series of concluding remarks.
2. Image Preprocessing Based on Decorrelation and Shrinkage Techniques
We assume that the images are transmitted through a noisy channel, each image $I$ being transmitted as a sequence of $m$ $d$-dimensional blocks $B_1, B_2, \ldots, B_m$, $I = (B_1, B_2, \ldots, B_m)$, and we denote by $\tilde{I} = (\tilde{B}_1, \tilde{B}_2, \ldots, \tilde{B}_m)$ the received image. A working assumption of our model is that the noise, modeled by the $d$-dimensional random vectors $\eta_i$, $1 \le i \le m$, affects the blocks in a similar way, where $\eta_1, \eta_2, \ldots, \eta_m$ are independent and identically distributed, $\eta_i \sim N(0, \Sigma)$, $1 \le i \le m$.
In case $N$ images, $I^1, I^2, \ldots, I^N$, are transmitted sequentially through the channel, we denote by $\tilde{I}^i = (\tilde{B}_1^i, \tilde{B}_2^i, \ldots, \tilde{B}_m^i)$, $1 \le i \le N$, the sequence of received disturbed variants. In our model, we adopt the additional working assumption that, for each $1 \le j \le m$, $\tilde{B}_j^1, \tilde{B}_j^2, \ldots, \tilde{B}_j^N$ is a realization of a $d$-dimensional random vector $\tilde{B}_j^0 = B_j^0 + \eta_j$, where $B_j^0$ is a random vector of mean $\mu_j^0$ and covariance matrix $\Sigma_j^0$, and that $B_j^0$ and $\eta_j$ are independent; therefore the covariance matrix of $\tilde{B}_j^0$ is $\tilde{\Sigma}_j^0 = \Sigma_j^0 + \Sigma$. The working assumptions included in our model seem to be quite realistic according to the currently used information transmission frameworks. According to the second working assumption, for each index value $j$, the sequence of blocks $B_j^1, B_j^2, \ldots, B_j^N$ could represent fragments of possibly different images taken at the counterpart positions, as, for instance, in case of face images, the areas of eyes or mouths and so on. Therefore the assumption that each $B_j^0$ is a random vector corresponds to a model for each particular block, the parameters $\mu_j^0$ and $\Sigma_j^0$ expressing the variability existing in the sequence of images $I^1, I^2, \ldots, I^N$ at the level of the $j$th block.
On one hand, the parameters $\mu_j^0$ and $\tilde{\Sigma}_j^0$ are estimated by the sample mean and the sample covariance matrix,
$$\hat{\mu}_j^0 = \frac{1}{N}\sum_{i=1}^{N}\tilde{B}_j^i, \tag{1}$$
$$\hat{S}_j^0 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\tilde{B}_j^i - \hat{\mu}_j^0\right)\left(\tilde{B}_j^i - \hat{\mu}_j^0\right)^T, \tag{2}$$
respectively. On the other hand, the values of the parameters $\mu_j^0$ and $\Sigma_j^0$ are also unknown and, moreover, it is quite inconvenient to estimate them before the transmission of the sequence of images is over.
The covariance matrix of the noise component can be estimated before the transmission is performed by different methods, for instance, the white wall method; therefore, without loss of generality, the matrix $\Sigma$ can be assumed to be known, and $\hat{\Sigma}_j^0 = \hat{S}_j^0 - \Sigma$ can be taken as an estimate of $\Sigma_j^0$.
Also, in case each sequence $\tilde{B}_j^1, \tilde{B}_j^2, \ldots, \tilde{B}_j^N$ is processed separately, we can assume that the data are centered; that is, $\hat{\mu}_j^0 = 0$, $1 \le j \le m$.
Consequently, the available information in developing a denoising procedure is represented by the sequences $\tilde{B}_j^1, \tilde{B}_j^2, \ldots, \tilde{B}_j^N$, the estimates $\hat{\Sigma}_j^0$, $1 \le j \le m$, and $\Sigma$.
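The estimates in (1) and (2), together with the estimate of the block covariance obtained by subtracting the known noise covariance, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `estimate_block_statistics` and the array layout (one received variant of block $j$ per row) are hypothetical and do not come from the paper.

```python
import numpy as np

def estimate_block_statistics(noisy_blocks, noise_cov):
    """Estimate the statistics of one block position j.

    noisy_blocks: array of shape (N, d), the N received variants of block j
                  (hypothetical layout, one variant per row).
    noise_cov:    array of shape (d, d), the noise covariance Sigma,
                  assumed known (for instance from a white wall measurement).
    """
    X = np.asarray(noisy_blocks, dtype=float)
    N = X.shape[0]
    mu_hat = X.mean(axis=0)                    # sample mean, equation (1)
    centered = X - mu_hat
    S_hat = centered.T @ centered / (N - 1)    # sample covariance, equation (2)
    Sigma_hat = S_hat - noise_cov              # estimate of the block covariance
    return mu_hat, S_hat, Sigma_hat
```

Note that the difference `S_hat - noise_cov` is not guaranteed to be positive definite for small samples, which is one reason a regularized variant of the subsequent steps may be needed in practice.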
In our work we consider the following shrinkage type denoising method.
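For each $1 \le j \le m$, we denote by $A_j$ a matrix that simultaneously diagonalizes the matrices $\hat{\Sigma}_j^0$ and $\Sigma$. According to the celebrated W theorem [28, 29], the columns of $A_j$ are eigenvectors of $(\hat{\Sigma}_j^0)^{-1}\Sigma$ and the following equations hold:
$$A_j^T \hat{\Sigma}_j^0 A_j = I_d, \tag{3}$$
$$A_j^T \Sigma A_j = \Lambda_j = \mathrm{diag}\left(\lambda_1^j, \lambda_2^j, \ldots, \lambda_d^j\right), \tag{4}$$
where $\lambda_1^j, \lambda_2^j, \ldots, \lambda_d^j$ are the eigenvalues of the matrix $(\hat{\Sigma}_j^0)^{-1}\Sigma$. Note that although $(\hat{\Sigma}_j^0)^{-1}\Sigma$ is not a symmetric matrix, its eigenvalues are proved to be real positive numbers [29].
Let $\tilde{C}_j^i$, $1 \le i \le N$, be the random vectors
$$\tilde{C}_j^i = A_j^T \tilde{B}_j^i = A_j^T B_j^i + A_j^T \eta_j. \tag{5}$$
Note that the linear transform of matrix $A_j^T$ yields the representation $\tilde{C}_j^i$, where most of the noise is contained in the second term. Moreover, since
$$\mathrm{Cov}\left(A_j^T \eta_j, \left(A_j^T \eta_j\right)^T\right) = A_j^T \Sigma A_j = \Lambda_j, \tag{6}$$
the linear transform of matrix $A_j^T$ decorrelates the noise components.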
Let $\tilde{D}_j^i$, $1 \le i \le N$, be the sequence of variants of $\tilde{C}_j^i$ obtained using the code shrinkage method [16], where each entry $p$, $1 \le p \le d$, of $\tilde{D}_j^i$ is
$$\tilde{D}_j^i(p) = \mathrm{sgn}\left(\tilde{C}_j^i(p)\right)\max\left\{0, \left|\tilde{C}_j^i(p)\right| - 2\lambda_p^j\right\}. \tag{7}$$
Then $\tilde{D}_j^i$ is a variant of $\tilde{C}_j^i$ where the noise distributed $N(0, \Lambda_j)$ is partially removed. Since $\tilde{B}_j^i = (A_j^T)^{-1}\tilde{C}_j^i$, a variant of $\tilde{B}_j^i$ where the noise was partially removed can be taken as
$$\hat{B}_j^i = \left(A_j^T\right)^{-1}\tilde{D}_j^i. \tag{8}$$
Obviously, from (3) we get $(A_j^T)^{-1} = \hat{\Sigma}_j^0 A_j$; that is,
$$\hat{B}_j^i = \hat{\Sigma}_j^0 A_j \tilde{D}_j^i. \tag{9}$$
Note that, although the eigenvalues of $(\hat{\Sigma}_j^0)^{-1}\Sigma$ are theoretically guaranteed to be positive numbers, in real-world applications situations frequently arise in which this matrix is ill conditioned. In order to overcome this difficulty, in our tests we implemented the code shrinkage method using
$$\tilde{D}_j^i(p) = \begin{cases} \mathrm{sgn}\left(\tilde{C}_j^i(p)\right)\max\left\{0, \left|\tilde{C}_j^i(p)\right| - 2\lambda_p^j\right\}, & \lambda_p^j > \varepsilon, \\ \tilde{C}_j^i(p), & \text{otherwise}, \end{cases} \tag{10}$$
where $\varepsilon$ is a conventionally selected positive threshold value. Also, instead of (8) we use
$$\hat{B}_j^i = \left(A_j^T\right)^{+}\tilde{D}_j^i, \tag{11}$$
where $(A_j^T)^{+}$ is the generalized inverse (Penrose pseudoinverse) of $A_j^T$ [30].
In our approach we assumed that the source of noise (namely, the communication channel used to transmit the image) can be observed. This hypothesis is frequently used in image restoration techniques [26]. In the preprocessing and training stages, undisturbed original versions of the transmitted images are not available; instead, a series of perturbed versions is available, and the characteristics of the noise component may be estimated through the white wall technique. The working hypothesis includes the fact that the images come from a common probability distribution (maybe a mixture); that is, they share the same statistical characteristics. This hypothesis is frequently used when sets of images are captured and processed [16]. The purpose of this method is, on one hand, to eliminate correlated noise and, on the other hand, to eliminate the noise from new images transmitted through a communication channel, when they come from the same probability distribution as the images in the initially observed set.
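The decorrelation and shrinkage chain of (3) to (11) can be illustrated with the following NumPy sketch. It is not the authors' implementation. The function names, the default threshold `eps`, and the `factor` argument (set to the constant 2 as it appears in (7)) are assumptions introduced here. The simultaneous diagonalizer is built through a Cholesky factorization, which is one standard construction and requires the estimated block covariance to be symmetric positive definite.

```python
import numpy as np

def simultaneous_diagonalizer(sigma_hat, sigma):
    """Return (A, lam) such that A.T @ sigma_hat @ A = I and
    A.T @ sigma @ A = diag(lam), as in equations (3) and (4).
    Assumes sigma_hat is symmetric positive definite."""
    L = np.linalg.cholesky(sigma_hat)      # sigma_hat = L @ L.T
    L_inv = np.linalg.inv(L)
    M = L_inv @ sigma @ L_inv.T            # symmetric matrix similar to inv(sigma_hat) @ sigma
    lam, U = np.linalg.eigh(M)             # real eigenvalues, orthonormal eigenvectors
    A = L_inv.T @ U                        # columns are eigenvectors of inv(sigma_hat) @ sigma
    return A, lam

def decorrelate_and_shrink(block, A, lam, eps=1e-8, factor=2.0):
    """Clean one noisy block following (5), (10), and (11).
    eps and factor are illustrative parameters, not values from the paper."""
    c = A.T @ block                                           # decorrelated representation, (5)
    shrunk = np.sign(c) * np.maximum(0.0, np.abs(c) - factor * lam)
    d = np.where(lam > eps, shrunk, c)                        # guarded shrinkage, (10)
    return np.linalg.pinv(A.T) @ d                            # back-projection, (11)
```

With `factor=0.0` no shrinkage takes place and the block is returned unchanged, which gives a quick sanity check of the forward and backward transforms.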
3. Neural Networks Based Approach to Image Denoising
The aim of this section is to present an image denoising method in the framework described in the previous section implemented on a family of standard feed-forward neural architectures NNj:FXj→FHj→FYj, 1≤j≤m, working in parallel.
Let us assume that I~=B~1,B~2,…,B~m is the noisy received version of the image I=B1,B2,…,Bm transmitted through the channel. The training process of the architectures NNj, 1≤j≤m, is organized such that the resulting memories encode the associations of the type (input block, sample mean), the purpose being the noise removal according to the method presented in the previous section.
In order to reduce the computational complexity to some extent, a preprocessing step aimed at dimensionality reduction is required. In our work we use the L2-PCA method to compress the blocks. Since the particular positions of the blocks correspond to different models, their compressed versions could be of different sizes. Indeed, according to (2), the estimates of the autocorrelation matrices $\hat{S}_j^0 + \hat{\mu}_j^0(\hat{\mu}_j^0)^T$, $1 \le j \le m$, are different for different values of the index $j$; therefore, the numbers of the most significant directions are different for different values of $j$; that is, the sizes of the compressed variants of blocks are, in general, different. Consequently, the sizes of $FX_j$ and $FY_j$ depend on $j$, these sizes resulting from applying the L2-PCA method in the preprocessing step [31, 32].
The hidden neurons influence the error on the nodes to which their output is connected. The use of too many hidden neurons could cause the so-called overfitting effect which means the overestimate of the complexity corresponding to the target problem. Maybe the most unpleasant consequence is that this way the generalization capability is decreased; therefore, the capacity of prediction is degraded too. On the other hand, at least in image processing, the use of fewer hidden neurons implies that less information extracted from the inputs is processed and consequently less accuracy should be expected. Consequently, the determining of the right size of the hidden layer results as a trade-off between accuracy and generalization capacity.
Several expressions have been proposed to compute the number of neurons in the hidden layers [33, 34]. Denoting by $|\cdot|$ the number of elements of the argument, the sizes of the hidden layers $FH_j$ can be computed in many ways, some of the most frequent expressions being [34]
$$|FH_j| = 2\sqrt{\left(|FY_j| + 2\right)|FX_j|}, \tag{12a}$$
$$|FH_j| = \sqrt{\left(|FY_j| + 2\right)|FX_j|} + 2\sqrt{\frac{|FX_j|}{|FY_j| + 2}}. \tag{12b}$$
The aim of the training is to obtain, for each value of the index $j$, on the output layer $FY_j$ a compressed cleaned version of the input applied to the layer $FX_j$, the output being computed according to the method presented in the previous section.
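As a quick numerical check of the hidden-layer sizing rules, the sketch below reads (12a) as $2\sqrt{(|FY_j|+2)|FX_j|}$ and (12b) as $\sqrt{(|FY_j|+2)|FX_j|} + 2\sqrt{|FX_j|/(|FY_j|+2)}$. This reading is an assumption about the typeset formulas, and the function names are hypothetical. It is consistent with the layer sizes quoted later in the paper, where input and output layers of about 115 and 30 neurons correspond to a recommended hidden layer of about 65 neurons.

```python
import math

def hidden_size_12a(n_in, n_out):
    """Hidden-layer size under the assumed reading of (12a)."""
    return 2.0 * math.sqrt((n_out + 2) * n_in)

def hidden_size_12b(n_in, n_out):
    """Hidden-layer size under the assumed reading of (12b)."""
    return math.sqrt((n_out + 2) * n_in) + 2.0 * math.sqrt(n_in / (n_out + 2))

# Layer sizes quoted in the experimental section: |FX_j| ~ 115, |FY_j| ~ 30.
recommended = hidden_size_12b(115, 30)   # roughly 64.5
halved = recommended / 2.0               # half of the recommended size
```

The values are real numbers, so an implementation would round them to an integer number of neurons.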
According to the approach described in the previous section, all blocks of the same index say j are processed by the same compression method yielding to compressed variants, the size of compressed variants being the same for all these blocks. The compressed variants corresponding to the blocks of index j are next fed as inputs to jth neural architecture. Consequently, the denoising process of an image consisting of m blocks is implemented on a family of m neural architectures operating in parallel (NNj, 1≤j≤m), where NNj:FXj→FHj→FYj; the sequence of denoised variants resulted as outputs of the layers FYj being next decompressed. The cleaned variant of each input image is taken as the sequence of the decompressed cleaned variants of its blocks.
The preprocessing step producing the compressed variants fed as input blocks is described as follows. For each index value $j$, the sequence of compressed versions of the blocks $\tilde{B}_j^1, \tilde{B}_j^2, \ldots, \tilde{B}_j^N$, denoted by $C\tilde{B}_j^1, C\tilde{B}_j^2, \ldots, C\tilde{B}_j^N$, is
$$C\tilde{B}_j^i = W_j^T \tilde{B}_j^i, \quad i = 1, \ldots, N, \tag{13}$$
where the columns of the matrix $W_j$ are the most significant unit eigenvectors of $\hat{S}_j^0 + \hat{\mu}_j^0(\hat{\mu}_j^0)^T$. These eigenvectors are computed as follows. Let $\theta_1^j \ge \theta_2^j \ge \cdots \ge \theta_d^j$ be the eigenvalues of $\hat{S}_j^0 + \hat{\mu}_j^0(\hat{\mu}_j^0)^T$ and $\varepsilon_1 \in (0, 1)$ a conventionally selected threshold value. If $t$ is the smallest value such that (14) holds, then the columns of $W_j$ are unit eigenvectors of $\hat{S}_j^0 + \hat{\mu}_j^0(\hat{\mu}_j^0)^T$ corresponding to the largest $t$ eigenvalues:
$$\frac{1}{\sum_{k=1}^{d}\theta_k^j}\sum_{k=t+1}^{d}\theta_k^j < \varepsilon_1; \tag{14}$$
therefore, $|FX_j| = t$.
Assuming that the blocks $\hat{B}_j^1, \ldots, \hat{B}_j^N$ are cleaned versions of $\tilde{B}_j^1, \tilde{B}_j^2, \ldots, \tilde{B}_j^N$ computed according to (11), we denote by $C\hat{B}_j^1, \ldots, C\hat{B}_j^N$ their compressed variants:
$$C\hat{B}_j^i = V_j^T \hat{B}_j^i, \quad i = 1, \ldots, N, \tag{15}$$
where the columns of the matrix $V_j$ are the most significant unit eigenvectors of the autocorrelation matrix $\frac{1}{N}\sum_{i=1}^{N}\hat{B}_j^i(\hat{B}_j^i)^T$. These eigenvectors are computed in a similar way as in the compression step applied to the input blocks, using a possibly different threshold value $\varepsilon_2 \in (0, 1)$. Note that, in tests, the threshold values $\varepsilon_1$, $\varepsilon_2$ are experimentally tuned to the particular sequence of images.
To summarize, the preprocessing scheme consists of applying the L2-PCA method to both the noisy sequence of blocks $\tilde{B}_j^1, \tilde{B}_j^2, \ldots, \tilde{B}_j^N$ and their cleaned versions $\hat{B}_j^1, \ldots, \hat{B}_j^N$, yielding the sequence of inputs applied to the input layer $FX_j$ and the compressed cleaned versions $C\hat{B}_j^1, \ldots, C\hat{B}_j^N$:
$$\tilde{B}_j^1, \tilde{B}_j^2, \ldots, \tilde{B}_j^N \xrightarrow{\;W_j^T\;} C\tilde{B}_j^1, C\tilde{B}_j^2, \ldots, C\tilde{B}_j^N \longrightarrow FX_j. \tag{16}$$
The aim of the training is to produce on each output layer the sequence $C\hat{B}_j^1, \ldots, C\hat{B}_j^N$, the decompressed versions of its blocks being $V_j C\hat{B}_j^1, \ldots, V_j C\hat{B}_j^N$:
$$FY_j \longrightarrow C\hat{B}_j^1, \ldots, C\hat{B}_j^N \xrightarrow{\;V_j\;} V_j C\hat{B}_j^1, \ldots, V_j C\hat{B}_j^N; \tag{17}$$
therefore, the blocks $V_j C\hat{B}_j^1, \ldots, V_j C\hat{B}_j^N$ are denoised versions of $\tilde{B}_j^1, \tilde{B}_j^2, \ldots, \tilde{B}_j^N$, respectively.
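The selection rule in (14) and the compression in (13) can be sketched as follows. The function names and the row-wise data layout are hypothetical, and the autocorrelation matrix is computed here as the plain average of outer products, which is an assumption that differs slightly from $\hat{S}_j^0 + \hat{\mu}_j^0(\hat{\mu}_j^0)^T$ because of the $1/(N-1)$ factor in (2).

```python
import numpy as np

def significant_eigenvectors(matrix, eps):
    """Return the unit eigenvectors for the largest t eigenvalues, where t is
    the smallest count whose discarded eigenvalue mass fraction is below eps,
    following rule (14). `matrix` is assumed symmetric positive semidefinite."""
    theta, vecs = np.linalg.eigh(matrix)           # eigenvalues in ascending order
    order = np.argsort(theta)[::-1]                # reorder to descending
    theta, vecs = theta[order], vecs[:, order]
    # tail[t] = sum of the eigenvalues discarded when t components are kept
    tail = np.concatenate((np.cumsum(theta[::-1])[::-1], [0.0]))
    t = int(np.argmax(tail / theta.sum() < eps))   # first t satisfying (14)
    return vecs[:, :t]

def compress_blocks(blocks, eps):
    """Compress N blocks (rows of `blocks`) as in (13)."""
    X = np.asarray(blocks, dtype=float)
    autocorr = X.T @ X / X.shape[0]                # sample autocorrelation matrix
    W = significant_eigenvectors(autocorr, eps)
    return X @ W, W                                # row i is W.T @ block_i
```

For eigenvalues 4, 3, 2, 1 and `eps = 0.2`, the discarded fractions for t = 0, ..., 4 are 1.0, 0.6, 0.3, 0.1, 0, so three components are kept.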
The training of each neural architecture NNj is of supervised type using a gradient descent approach, the local memories of FHj and FYj being determined using the Levenberg-Marquardt variant of the backpropagation learning algorithm (LM-BP algorithm) [35].
We organized the training process for the m neural networks by transmitting through the channel each available image several times, say p times; the reason of doing that is that this way better estimates of the covariance matrices Σj0, 1≤j≤m, of the proposed stochastic models are expected to be obtained.
Consequently, the whole available data is the collection $\tilde{I}^{i,l} = (\tilde{B}_1^{i,l}, \tilde{B}_2^{i,l}, \ldots, \tilde{B}_m^{i,l})$, $1 \le i \le N$, $1 \le l \le p$; therefore, for each index value $j$, the inputs applied to the $j$th neural network are the sequence $C\tilde{B}_j^{1,1}, C\tilde{B}_j^{1,2}, \ldots, C\tilde{B}_j^{1,p}, \ldots, C\tilde{B}_j^{N,1}, \ldots, C\tilde{B}_j^{N,p}$ of compressed versions of the blocks $\tilde{B}_j^{1,1}, \tilde{B}_j^{1,2}, \ldots, \tilde{B}_j^{1,p}, \ldots, \tilde{B}_j^{N,1}, \ldots, \tilde{B}_j^{N,p}$:
$$C\tilde{B}_j^{i,l} = W_j^T \tilde{B}_j^{i,l}, \quad i = 1, \ldots, N, \; l = 1, \ldots, p. \tag{18}$$
The linear compression filter $W_j$ is a matrix whose columns are the most significant unit eigenvectors of the matrix $\frac{1}{Np}\sum_{i=1}^{N}\sum_{l=1}^{p}\tilde{B}_j^{i,l}(\tilde{B}_j^{i,l})^T$.
Let $\hat{B}_j^{1,1}, \ldots, \hat{B}_j^{1,p}, \ldots, \hat{B}_j^{N,1}, \ldots, \hat{B}_j^{N,p}$ be the sequence of the cleaned variants of $\tilde{B}_j^{1,1}, \tilde{B}_j^{1,2}, \ldots, \tilde{B}_j^{1,p}, \ldots, \tilde{B}_j^{N,1}, \ldots, \tilde{B}_j^{N,p}$ computed using (11) and, for each $1 \le i \le N$, let $M_j^i$ be the sample mean of the cleaned blocks $\hat{B}_j^{i,1}, \ldots, \hat{B}_j^{i,p}$:
$$M_j^i = \frac{1}{p}\sum_{l=1}^{p}\hat{B}_j^{i,l}. \tag{19}$$
We denote by $V_j$ a linear compression filter whose columns are the most significant unit eigenvectors of the matrix $\frac{1}{N}\sum_{i=1}^{N}M_j^i(M_j^i)^T$, computed in a similar way as in (15) using a threshold value $\varepsilon_2 \in (0, 1)$, and let $CM_j^i = V_j^T M_j^i$.
The learning process for each neural architecture $NN_j$, $1 \le j \le m$, is developed to encode the associations $\{C\tilde{B}_j^{k,1}, \ldots, C\tilde{B}_j^{k,p}\} \to CM_j^k$, $1 \le k \le N$. The reason for using the means $M_j^k$, $1 \le k \le N$, and their corresponding compressed versions instead of the associations $\{C\tilde{B}_j^{k,1}, \ldots, C\tilde{B}_j^{k,p}\} \to \{C\hat{B}_j^{k,1}, \ldots, C\hat{B}_j^{k,p}\}$, $1 \le k \le N$, is that, by taking the means and their compressed versions, some amount of noise is expected to be removed, for each value of the index $j$; that is, the compressed versions of the means are expected to be better cleaned variants of the compressed blocks.
Summarizing, the memory of each neural architecture $NN_j$, $1 \le j \le m$, is computed by the Levenberg-Marquardt algorithm applied to the input/output sequence $(C\tilde{B}_j^{1,1}, CM_j^1), \ldots, (C\tilde{B}_j^{1,p}, CM_j^1), \ldots, (C\tilde{B}_j^{N,1}, CM_j^N), \ldots, (C\tilde{B}_j^{N,p}, CM_j^N)$.
Once the training phase is over, the family of networks $NN_j$ is used to remove the noise from the noisy version $\tilde{I} = (\tilde{B}_1, \tilde{B}_2, \ldots, \tilde{B}_m)$ of an image $I = (B_1, B_2, \ldots, B_m)$ received through the channel, according to the following scheme.
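The assembly of the input/target pairs for one network can be sketched as below. The function name and the three-dimensional array layout (image index, transmission index, block dimension) are assumptions made for illustration, and the training itself (Levenberg-Marquardt backpropagation) is deliberately left out.

```python
import numpy as np

def build_training_pairs(noisy, cleaned, W, V):
    """Assemble the training set of the j-th network.

    noisy, cleaned: arrays of shape (N, p, d) holding, for each of the N
                    images, the p received and the p preprocessed variants
                    of block j (hypothetical layout).
    W: (d, t_in) compression filter for inputs, as in (18).
    V: (d, t_out) compression filter for targets.
    Returns inputs of shape (N*p, t_in) and targets of shape (N*p, t_out).
    """
    N, p, _ = noisy.shape
    means = cleaned.mean(axis=1)                   # sample means of cleaned blocks, (19)
    compressed_means = means @ V                   # row i is V.T @ mean_i
    inputs = (noisy @ W).reshape(N * p, -1)        # row i*p + l is W.T @ noisy[i, l]
    targets = np.repeat(compressed_means, p, axis=0)  # same target for all p variants
    return inputs, targets
```

Each of the $p$ noisy variants of the same image block is paired with the same compressed mean, which matches the association of several inputs with a single target described above.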
Step 1. Compress each block $\tilde{B}_j$ of $\tilde{I}$ using the filter $W_j$ and get its compressed version $C\tilde{B}_j$; that is, $(C\tilde{B}_1, C\tilde{B}_2, \ldots, C\tilde{B}_m)$ is a dynamically block-compressed version of $\tilde{I}$.
Step 2. Apply $C\tilde{B}_1, C\tilde{B}_2, \ldots, C\tilde{B}_m$ as inputs to the architectures $NN_j$, $C\tilde{B}_j$ being applied as input to the layer $FX_j$, $1 \le j \le m$, and get the outputs $RB_j$.
Step 3. Decompress each block $RB_j$ using the decompression filter $V_j$, $1 \le j \le m$.
Step 4. Get $\hat{I} = (V_1 \cdot RB_1, \ldots, V_m \cdot RB_m)$, the cleaned version of $\tilde{I}$.
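Steps 1 to 4 can be sketched as a short loop over the blocks. The trained networks are represented here by arbitrary Python callables, which is an assumption made only to keep the example self-contained; the function name `denoise_image` is likewise hypothetical.

```python
import numpy as np

def denoise_image(noisy_blocks, W_list, V_list, networks):
    """Apply the four-step scheme to one received image.

    noisy_blocks: list of m arrays of shape (d,), the received blocks.
    W_list, V_list: per-block compression and decompression filters.
    networks: list of m callables standing in for the trained networks,
              each mapping a compressed input to a compressed output.
    """
    cleaned_blocks = []
    for block, W, V, net in zip(noisy_blocks, W_list, V_list, networks):
        compressed = W.T @ block            # Step 1: compress with W_j
        restored = net(compressed)          # Step 2: output RB_j of the j-th network
        cleaned_blocks.append(V @ restored) # Step 3: decompress with V_j
    return cleaned_blocks                   # Step 4: the cleaned image as a block sequence
```

Since the networks are independent of each other, the loop body could equally be run in parallel, one worker per block index.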
4. Description of the Methodology Applied in the Implementations of the Proposed Method on Neural Architectures
The aim of this section is to describe the methodology followed in implementing the neural network-based noise removal method for image processing purposes. The proposed methodology was applied to process images from different standard databases, the conclusions experimentally derived from the tests performed on two standard databases, the former containing images of human faces and the latter containing images of landscapes being reported in the next section.
We performed the experiments according to the following methodology.
(1) The quality of a certain test image $T = (t(x, y))$ versus a reference image $R = (r(x, y))$ of the same size $(n_x, n_y)$ is evaluated in terms of the Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR), and Root Mean Squared Signal-to-Noise Ratio (SNR_RMS) indicators [36], and the Structural Similarity Metric (SSIM) [37], where
$$\mathrm{SNR}(R, T) = 10\log_{10}\frac{\sum_{x=1}^{n_x}\sum_{y=1}^{n_y} r(x, y)^2}{\sum_{x=1}^{n_x}\sum_{y=1}^{n_y}\left(r(x, y) - t(x, y)\right)^2},$$
$$\mathrm{PSNR}(R, T) = 10\log_{10}\frac{\max r(x, y)^2}{\frac{1}{n_x n_y}\sum_{x=1}^{n_x}\sum_{y=1}^{n_y}\left(r(x, y) - t(x, y)\right)^2}, \tag{20}$$
$$\mathrm{SNR\_RMS}(R, T) = \sqrt{\frac{\sum_{x=1}^{n_x}\sum_{y=1}^{n_y} r(x, y)^2}{\sum_{x=1}^{n_x}\sum_{y=1}^{n_y}\left(r(x, y) - t(x, y)\right)^2}}.$$
Let $x$ and $y$ be spatial patches extracted from the images $R$ and $T$, respectively. The two patches correspond to the same spatial window of the images $R$ and $T$. The original standard SSIM value computed for the patches $x$ and $y$ is defined by
$$\mathrm{SSIM}(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}\cdot\frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \tag{21}$$
where $\mu_x$ denotes the mean value of $x$, $\sigma_x$ is the standard deviation of $x$, and $\sigma_{xy}$ represents the cross-correlation of the mean-shifted patches $x - \mu_x$ and $y - \mu_y$. The constants $C_1$ and $C_2$ are small positive numbers included to avoid instability when either $\mu_x^2 + \mu_y^2$ or $\sigma_x^2 + \sigma_y^2$ is very close to zero, respectively. The overall SSIM index for the images $R$ and $T$ is computed as the mean value of the SSIM measures computed for all pairs of patches $x$ and $y$ of $R$ and $T$, respectively.
(2) The size of the blocks and the model of noise in transmitting data through the channel are selected for each database. The size of the blocks is established by taking into account the size of the available images in order to assure reasonable complexity of the noise removal process. In our tests the size of input blocks is about 150 and the sizes of images are 135×100 in case of the database containing images of human faces and 154×154 in case of the database containing images of landscapes. We assumed that the components of the noise $\eta$ induced by the channel are possibly correlated; in our tests, the noise model is of Gaussian type, $\eta \sim N(0, \Sigma)$, where $\Sigma$ is a symmetric positive definite matrix.
(3) The compression thresholds $\varepsilon_1$, $\varepsilon_2$ in (14) and (15) are established in order to assure some desired accuracy. In our tests we used $\varepsilon_1 = c_1 \cdot 10^{-4}$, $\varepsilon_2 = c_2 \cdot 10^{-4}$, where $c_1$, $c_2$ are positive constants. The reason for selecting different magnitude orders of these thresholds stems from the fact that $\varepsilon_1$ is used in compressing noise-affected images, while $\varepsilon_2$ is used for compressing noise-cleaned images [32]. The sizes of the input and output layers $FX_j$, $FY_j$ of the neural network $NN_j$ result from the established values of $\varepsilon_1$ and $\varepsilon_2$ accordingly.
(4) The quality evaluation of the preprocessing step consisting in noise cleaning data is performed in terms of the indicators (20) and (21), by comparing the initial data I=B1,B2,…,Bm against the noisy transmitted images I~=B~1,B~2,…,B~m through the channel and I=B1,B2,…,Bm against their corresponding cleaned versions I^=V1·RB1,…,Vm·RBm, respectively.
(5) In order to implement the noise removal method on a family of neural networks $NN_j : FX_j \to FH_j \to FY_j$, $1 \le j \le m$, the sizes of the input layers $FX_j$ and the output layers $FY_j$ are determined by the L2-PCA compression/decompression method and the established values of $\varepsilon_1$, $\varepsilon_2$. The sizes of the layers $FH_j$ are determined as approximations of the recommended values cited in the published literature, (12a) and (12b). In order to assure a reasonable tractability of the data, in our tests we were forced to use fewer neurons on the hidden layers $FH_j$ than recommended.
For fixed values of $\varepsilon_1$, $\varepsilon_2$, the use of the recommended number of neurons as in (12a) and (12b) usually makes the learning process either impossible to implement or too lengthy. Therefore, in such cases we are forced to increase the values of $\varepsilon_1$, $\varepsilon_2$, thereby decreasing the numbers of neurons on the input and output layers and consequently the number of neurons on the hidden layers too. Obviously, increasing the values of $\varepsilon_1$, $\varepsilon_2$ in this way inherently implies that a larger amount of information about the data is lost. The effects of losing information are manifold, one of them being that the cleaned versions resulting from decompressing the outputs of the networks $NN_j$ yield a poorer approximation $\hat{I}$ of the initial image $I$.
This way we arrive at the conclusion that, in practice, we have to solve a trade-off problem between the magnitude of the compression rates and the number of neurons on the hidden layers FHj’s. In order to solve this trade-off, in our tests we used smaller numbers of neurons than recommended on the hidden layers and developed a comparative analysis on the quality of the resulting cleaned images.
(6) The activation functions of the neurons belonging to the hidden and output layers can be selected from a very large family. In our tests, we considered the logistic type to model the activation functions of the neurons belonging to the hidden layers and the unit functions to model the outputs of the neurons belonging to the output layers. Also, the learning process involved the task of splitting the available data into training, validation, and test data. In our tests the sizes of the subcollections were 80%, 10%, and 10% of the available data, respectively.
(7) The evaluation of the overall quality of the noise removal process implemented on the set of neural networks, as previously described, is performed in terms of the indicators (20) and (21), on one hand by comparing the initial data I=B1,B2,…,Bm to the noisy transmitted images I~=B~1,B~2,…,B~m through the channel and on the other hand by comparing I=B1,B2,…,Bm to their cleaned versions I^=V1·RB1,…,Vm·RBm.
(8) The comparative analysis between the performances corresponding to the decorrelation and shrinkage method and its implementation on neural networks is developed in terms of the indicators (20) and (21).
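The indicators (20) and (21) used throughout the methodology above can be sketched in NumPy as follows. The function names are hypothetical. The default constants in `ssim_patch` are the values commonly used for 8-bit images, namely $(0.01 \cdot 255)^2$ and $(0.03 \cdot 255)^2$, and are an assumption, since the text only requires them to be small positive numbers; the use of population (rather than unbiased) variances is also an illustrative choice.

```python
import numpy as np

def _as_float(r, t):
    return np.asarray(r, dtype=float), np.asarray(t, dtype=float)

def snr(r, t):
    """Signal-to-Noise Ratio of test image t against reference r, in dB."""
    r, t = _as_float(r, t)
    return 10.0 * np.log10(np.sum(r ** 2) / np.sum((r - t) ** 2))

def psnr(r, t):
    """Peak Signal-to-Noise Ratio, with the peak taken from the reference."""
    r, t = _as_float(r, t)
    return 10.0 * np.log10(np.max(r ** 2) / np.mean((r - t) ** 2))

def snr_rms(r, t):
    """Root mean squared SNR (a plain ratio, not in dB)."""
    r, t = _as_float(r, t)
    return np.sqrt(np.sum(r ** 2) / np.sum((r - t) ** 2))

def ssim_patch(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of two patches covering the same spatial window, as in (21)."""
    x, y = _as_float(x, y)
    x, y = x.ravel(), y.ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * structure
```

For a single image pair the definitions imply that SNR equals $20\log_{10}$ of SNR_RMS; this identity does not carry over exactly to values averaged over several images.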
5. Experimentally Derived Conclusions on the Performance of the Proposed Method
In this section we present the results in evaluating both the quality of the proposed decorrelation and shrinkage method and the power of the neural network-based approach in simulating it for noise removal purposes. The tests were performed in a similar way on two standard databases, the former, referred to as Senthil, containing images of 5 human faces and 16 images for each person [38] and the latter containing 42 images of landscapes [39]. In case of the Senthil database, the preprocessing step used 75 images; for each human face 15 of its available versions are being used. The tests performed in order to evaluate the quality of the trained family of neural networks used the rest of 5 images, one for each person. In case of the database containing images of landscapes, we identified three types of quite similar images, and we used 13 images of each type in the training process, the tests being performed on the rest of three ones.
The sizes of the hidden layers were set to smaller values than recommended by (12a) and (12b). For instance, when $\varepsilon_1 \approx 10^{-4}$ and $\varepsilon_2 \approx 10^{-4}$, the resulting sizes of the layers $FX_j$ and $FY_j$ are about 115 and 30, respectively, the recommended sizes of the layers $FH_j$ being about 65.
The results of a long series of tests pointed out that one can use hidden layers of smaller sizes than recommended without dramatically decreasing the accuracy. For instance, in this work, we used only half of the recommended sizes; that is,
$$|FH_j| = \frac{\sqrt{\left(|FY_j| + 2\right)|FX_j|} + 2\sqrt{|FX_j| / \left(|FY_j| + 2\right)}}{2}. \tag{22}$$
In our tests, the memory of each neural architecture is computed by the LM-BP algorithm, often the fastest variant of the backpropagation algorithm and one of the most commonly used in supervised learning. The available data was split into a training set, a validation set, and a test set, the sizes of the subcollections being 80%, 10%, and 10%, respectively. The main parameters of the LM-BP training process are specified in Table 1.
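Table 1: The main parameters of the LM-BP training process.

| Parameter | Value |
|---|---|
| The maximum number of epochs | 1000 |
| The minimum value of the performance (Jacobian computation) | 0 |
| The maximum validation failures | 5 |
| The minimum performance gradient | $10^{-5}$ |
| The initial/maximum μ factor (in the LM adaptive learning rule) | $10^{-3}$ / $10^{10}$ |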
In order to experimentally establish the quality of the proposed method, a comparative analysis against three of the most used and suitable algorithms for correlated noise removal, namely, BM3D (block-matching and 3D filtering [25]), NLMF (Nonlocal Means Noise Filtering [40]), and ProbShrink (correlated noise removal algorithm using nondecimated wavelet transform and generalized Laplacian [22]), was conducted. The reported results include both quantitative and qualitative comparisons.
In the following, we summarize some of our results.
The quality evaluation of the preprocessing step in terms of the indicators (20) and (21) is as follows:
In Figure 1(a), a sample of five face images belonging to the Senthil database is presented, their cleaned versions resulted from applying the decorrelation and shrinkage method being shown in Figure 1(e), where each image was transmitted 30 times through the channel. In Figures 1(b), 1(c), and 1(d) the restored versions resulting from applying the NLMF algorithm, ProbShrink algorithm, and BM3D method, respectively, are depicted. Table 2 contains the values of the indicators (20) and (21) corresponding to these five pairs of noisy-cleaned versions of these images.
Note that, on average, the best results were obtained when our method was used.
A sample of three images of landscapes, one from each class, is presented in Figure 2(a) together with their cleaned versions resulting from applying the decorrelation and shrinkage method shown in Figure 2(e), where each image was transmitted 30 times through the channel. In Figure 2(b) the restored variants using NLMF algorithm are exhibited, while in Figure 2(c) the restored variants using ProbShrink method are shown. The cleaned version using the BM3D algorithm is presented in Figure 2(d). The values of the indicators (20) and (21) are given in Table 3.
Note that, on average, the best results were obtained when our method was used.
As previously described, the images resulting from the preprocessing step are used in the supervised training of the family of neural networks. Once the training process is over, the family of neural networks is used to remove noise from new, previously unseen images. Obviously, it is impossible to guarantee that the new test images share the same statistical properties with the images used during the training process, the only criterion being that they are visually similar enough. In order to take this constraint into account, we split each of the two databases containing similar images into training and testing subsets, the sizes being 75/5 for the Senthil dataset and 39/3 for the second database.
The test images from the Senthil database and their versions resulting from applying the preprocessing step are shown in Figures 3(a) and 3(b), respectively. Their cleaned versions computed by the resulting family of trained neural networks are shown in Figure 3(f), while their restored versions when NLMF algorithm, ProbShrink method, and BM3D method are used are presented in Figures 3(c), 3(d), and 3(e), respectively. In terms of the indicators (20) and (21), the results are summarized in Table 4.
Note that in this case our method, ProbShrink algorithm, and BM3D method produce similar results, according to both SNR measure and SSIM metric.
Similar tests were performed on the database of landscape images, using three new images shown in Figure 4(a); the results of the preprocessing step are given in Figure 4(b). The cleaned versions computed by the resulting family of trained neural networks are shown in Figure 4(f), and the cleaned versions given by the BM3D algorithm are presented in Figure 4(e). The results obtained with the NLMF algorithm and the ProbShrink method are displayed in Figures 4(c) and 4(d), respectively. The numerical evaluation in terms of the indicators (20) and (21) is summarized in Table 5.
Note that, in this case, the BM3D algorithm oversmoothed the results, and the images obtained with the NLMF algorithm are of poor visual quality. The ProbShrink algorithm performed better than BM3D and NLMF but, on average, the best results were obtained with our method.
| Images compared | SNR-RMS (mean) | SNR (mean) | Mean Peak SNR (mean) | SSIM (mean) |
|---|---|---|---|---|
| Noisy images versus original images | 7.3128 | 17.2193 | 26.3217 | 0.5838 |
| Cleaned images versus original images (NLMF) | 12.1367 | 21.6369 | 30.8944 | 0.8377 |
| Cleaned images versus original images (ProbShrink) | 15.3787 | 23.7071 | 33.1043 | 0.8782 |
| Cleaned images versus original images (BM3D) | 15.0573 | 23.5242 | 33.3160 | 0.8867 |
| Cleaned images versus original images (the proposed method) | 19.5967 | 25.8067 | 35.0112 | 0.9163 |
| Images compared | SNR-RMS (mean) | SNR (mean) | Mean Peak SNR (mean) | SSIM (mean) |
|---|---|---|---|---|
| Noisy images versus original images | 7.1013 | 17.0083 | 24.6427 | 0.6868 |
| Cleaned images versus original images (NLMF) | 7.5050 | 17.4992 | 25.8548 | 0.6503 |
| Cleaned images versus original images (ProbShrink) | 9.6534 | 19.7858 | 27.7324 | 0.8002 |
| Cleaned images versus original images (BM3D) | 8.7304 | 18.8084 | 27.6967 | 0.7389 |
| Cleaned images versus original images (the proposed method) | 20.8267 | 26.2779 | 34.0215 | 0.9341 |
| Images compared | SNR-RMS/new image | SNR/new image | Mean Peak SNR/new image | SSIM/new image |
|---|---|---|---|---|
| Noisy images versus original images | 7.1352, 7.7979, 6.7452, 9.0387, 6.1524 | 17.0681, 17.8395, 16.5800, 19.1221, 15.7809 | 26.3356, 26.4804, 26.2165, 26.4122, 26.5486 | 0.5700, 0.5693, 0.5733, 0.6410, 0.5513 |
| Cleaned images (using the preprocessing step) versus original images | 8.4056, 9.6450, 8.3065, 9.5696, 7.4780 | 18.4914, 19.6860, 18.3883, 19.6179, 17.4758 | 27.8057, 28.3873, 28.0880, 26.9517, 28.2541 | 0.6328, 0.6411, 0.6392, 0.6643, 0.6308 |
| Cleaned images versus original images (NLMF) | 12.1067, 13.6918, 11.9228, 14.5670, 10.8248 | 21.6605, 22.7292, 21.5275, 23.2674, 20.6884 | 31.0955, 31.5051, 31.3547, 30.6666, 31.6597 | 0.8474, 0.8448, 0.8424, 0.8588, 0.8619 |
| Cleaned images versus original images (ProbShrink) | 15.6521, 15.5453, 14.6064, 17.5242, 13.2653 | 23.8914, 23.8320, 23.2909, 24.8728, 22.1486 | 33.2490, 32.5356, 33.0326, 32.2178, 33.5365 | 0.8834, 0.8648, 0.8705, 0.8836, 0.8812 |
| Cleaned images versus original images (BM3D) | 14.8333, 16.3388, 14.7108, 17.7965, 13.4046 | 23.4248, 24.2644, 23.3527, 25.0067, 22.4632 | 33.34, 33.59, 33.68, 33.00, 33.63 | 0.8824, 0.8854, 0.8991, 0.8525, 0.9074 |
| Cleaned images (using NN's) versus original images (the proposed method) | 13.7982, 16.9576, 16.5712, 12.6325, 13.4488 | 22.7964, 24.5873, 24.3871, 22.0298, 22.5737 | 32.2178, 33.0786, 33.8409, 29.6259, 33.6427 | 0.8850, 0.8781, 0.8968, 0.8454, 0.8939 |
| Images compared | SNR-RMS/new image | SNR/new image | Mean Peak SNR/new image | SSIM/new image |
|---|---|---|---|---|
| Noisy images versus original images | 6.7603, 6.5259, 7.7468 | 16.5994, 16.2928, 17.7824 | 24.6606, 24.6103, 24.7497 | 0.6564, 0.6796, 0.6157 |
| Cleaned images versus original images | 6.8194, 7.8383, 7.3444 | 16.6749, 17.8845, 17.3191 | 24.7947, 26.2810, 24.3228 | 0.6590, 0.7203, 0.6990 |
| Cleaned images versus original images (NLMF) | 7.7084, 6.9041, 8.2267 | 17.7393, 16.7822, 18.3045 | 26.0425, 25.3903, 25.4729 | 0.6773, 0.6025, 0.6830 |
| Cleaned images versus original images (ProbShrink) | 9.5398, 9.0598, 10.0007 | 19.6813, 19.1423, 20.0039 | 27.7907, 27.6171, 27.5228 | 0.8060, 0.7732, 0.8218 |
| Cleaned images versus original images (BM3D) | 9.0589, 7.8517, 9.7064 | 19.1415, 17.8993, 19.7412 | 27.95, 26.99, 27.89 | 0.7265, 0.6969, 0.7688 |
| Cleaned images (using NN's) versus original images (the proposed method) | 9.4599, 13.1507, 10.0606 | 19.5177, 22.3790, 20.0525 | 27.6939, 30.8164, 27.1269 | 0.8090, 0.8770, 0.8412 |
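The statement that the proposed method performs best on average on the three landscape test images can be checked directly from the per-image SNR-RMS values listed above. The short script below is only an illustrative check. The dictionary keys and the helper `rank_by_mean` are hypothetical names, and the numbers are transcribed from the table.

```python
from statistics import mean

# SNR-RMS per landscape test image, transcribed from the table above.
snr_rms = {
    "NLMF": [7.7084, 6.9041, 8.2267],
    "ProbShrink": [9.5398, 9.0598, 10.0007],
    "BM3D": [9.0589, 7.8517, 9.7064],
    "proposed": [9.4599, 13.1507, 10.0606],
}


def rank_by_mean(scores):
    """Return (method, mean score) pairs sorted from best to worst."""
    means = {method: mean(values) for method, values in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mean(snr_rms)
for method, value in ranking:
    print(f"{method:>10}: {value:.4f}")
```

The per-image values also show that the advantage of the proposed method is largest on the second test image, while on the first image ProbShrink scores slightly higher.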
6. Conclusive Remarks and Suggestions for Further Work
The proposed method combines decorrelation and shrinkage techniques with neural network-based approaches for noise removal. The images are assumed to be transmitted as sequences of equal-sized blocks, each block being distorted by stationary, statistically correlated noise; part of this noise is removed by a method that combines noise decorrelation with a standard shrinkage technique. The preprocessing step provides, for each initial image, a sequence of blocks that are further PCA-compressed at a certain rate, each component of the resulting sequence being supplied as input to a feed-forward neural architecture FX→FH→FY. Each indexed block is therefore processed by the neural network corresponding to that index value. The local memories of the neurons of the layers FH and FY are generated through a supervised learning process that uses, as inputs, the compressed versions of blocks with the same index value and, as targets, their compressed versions obtained as the mean of their preprocessed versions. Finally, using standard PCA decompression, the sequence of decompressed blocks forms the cleaned representation of the initial image. The performance of the proposed method was evaluated in a long series of tests, and the results are very encouraging compared with similar developments for noise removal. The amount of noise removed is evaluated in terms of some of the most frequently used similarity indicators: SNR, SNR-RMS, Peak SNR, and SSIM.
The results of the proposed method were compared with those of three of the most widely used algorithms for eliminating correlated noise. The NLMF algorithm consistently produces weaker results than the proposed method. With ProbShrink or BM3D, the results are similar to or weaker than those of the proposed method, both qualitatively and quantitatively.
The long series of tests gave good results for the methodology described above, which suggests that further, possibly more sophisticated, extensions can improve it. Among several possible extensions, work is in progress on the use of different output functions for the hidden and output neurons and on the use of more hidden layers in the neural architectures. In addition, other compression techniques combined with new feature extraction techniques, as well as other learning schemes for generating the local memories of the neurons, are expected to allow the removal of a larger amount of noise.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
A major contribution to the research work reported in this paper belongs to Mrs. Luminita State, a Professor and a Ph.D. Distinguished Member of the Romanian academic community; Professor Luminita State passed away in January 2016. The authors will always remember her amazing spirit, as well as her brilliant mind. May God rest her soul in peace!