Noise Attenuation of Seismic Data via Deep Multiscale Fusion Network

Convolutional neural network (CNN)-based deep learning (DL) architectures have achieved great success in many fields such as remote sensing, medical image processing, and computer vision. Recently, CNN-based models have also been applied to geophysical problems. This paper presents a noise attenuation method for seismic data based on a novel DL architecture, namely, the deep multiscale fusion network (MSFN). First, we design a multiscale fusion (MSF) block to adaptively exploit local signal features at different scales from seismic data. Then, a series of MSF blocks is stacked to form the MSFN, which can restore noisy seismic data effectively and preserve more useful signal information. Furthermore, our method is compared with other leading-edge methods on synthetic seismic records and the SEG/EAGE salt and overthrust models. The results show, qualitatively and quantitatively, that our method achieves higher peak signal-to-noise ratios (PSNRs) than the other methods while preserving more useful information. Finally, our method is applied to real seismic data processing, with satisfactory results.


Introduction
It is crucial to depict the underlying geological structures using the information contained in the seismic data acquired with various sensing equipment and networks [1][2][3][4][5][6][7]. However, the reliability of seismic analysis is degraded by the random noise in seismic data. Hence, noise attenuation plays a critical role in improving the signal-to-noise ratio (SNR) for geological interpretation based on seismic data.
In recent years, as seismic exploration has expanded to new areas, greater depths, and increasingly complex environments, the noise has also increased significantly and become more complex. This hinders high-precision seismic exploration, so markedly improving the SNR becomes the most important and basic task. However, conventional seismic data denoising methods struggle to meet the demands of high-precision seismic exploration. Therefore, a more effective technique is urgently needed.
In the present work, a noise attenuation method for seismic data based on a novel deep learning architecture is proposed. Our contributions in this paper are threefold:
(i) We propose the MSF block to adaptively exploit local signal features at different scales from seismic data.
(ii) A series of stacked MSF blocks forms the MSFN, which can restore noisy seismic data effectively and preserve more useful signal information.
(iii) The superiority of our method over other leading-edge methods is demonstrated on synthetic seismic records, the SEG/EAGE salt and overthrust models, and real seismic data.
The remainder of the paper is structured as follows. Section 2 reviews related work. Section 3 presents a detailed description of the proposed scheme, and Section 4 validates the proposed method. Finally, the conclusions of this paper are summarized in Section 5.

Related Work
At present, numerous seismic denoising approaches [8][9][10][11][12][13][14][15][16][17][30][31][32][33], including some new methods [10,13,29], have been suggested. Seismic random noise, which pervades the whole time domain, is the most common type of noise in seismic data. It can seriously interfere with effective seismic signals, thus resulting in signal perturbation. Various effective random noise attenuation approaches, e.g., empirical mode decomposition (EMD)-based methods and sparse transform-based approaches, have been proposed on the basis of the initial denoising method developed by Canales [26]. Chen and Ma [32] proposed using f-x EMD predictive filtering to remove random noise. Liu et al. [33] presented a random noise attenuation method based on variational mode decomposition to perform seismic time-frequency analysis. Chen and Fomel [12] suggested a novel random noise attenuation method based on an EMD-seislet transform. Neelamani et al. [9] presented a coherent and random noise attenuation method based on the curvelet transform. Zhang and Lu [8] proposed a wavelet transform-based denoising approach and achieved improved resolution of seismic data. Subsequently, some improved and/or combined transform domain-based methods were proposed [8][9][10][11][12][13][14][15], which achieve good results.
Compared with conventional superresolution (SR) methods, CNN-based schemes, from the first SRCNN [18] to the latest feedback network [29], remarkably improve SR quality. The shallow structure of SRCNN limits its performance. To overcome this drawback, deeper structures were adopted. For example, a deeper structure was used in the VDSR model proposed by Kim et al. [20], and several very deep models, e.g., RCAN [21], achieved outstanding SR performance. Besides, SR models that integrate dense connections, e.g., SRDenseNet [23] and MemNet [25], delivered better resolution. Moreover, by connecting identical signal feature extraction (SFE) modules throughout the network, CNN-based SR methods such as RDN [26], IDN [27], MSRN [28], and SRFBN [29] became more efficient, indicating that each block is crucial.

Proposed Method
This section presents the network architecture of the proposed seismic data denoising method (MSFN). The structure has two parts, namely, a shallow signal feature extraction (SSFE) module and a deep signal feature extraction (DSFE) module, as shown in Figure 1. Let us denote the clean data and the noisy data by $I_H$ and $I_L$, respectively. The network parameters are obtained by solving the problem
$$\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L\left(F_{\theta}\left(I_L^{i}\right), I_H^{i}\right),$$
where $F_{\theta}$ denotes the network mapping, $L$ denotes the loss function, which measures the discrepancy between the clean data $I_H$ and the data restored from the noisy data $I_L$, $N$ denotes the number of training samples, and $\theta = \{W_1, W_2, \cdots, W_p, b_1, b_2, \cdots, b_p\}$ is the set of weights and biases of the $p$ convolutional layers. The mean square error (MSE) function [26] and the L2 function are the two most popular objective optimization functions [23]. To reduce computations and avoid introducing unnecessary training tricks, a mean absolute error (MAE) function $L_1$ is used as a better alternative, given by
$$L_1(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| F_{\theta}\left(I_L^{i}\right) - I_H^{i} \right\|_1,$$
where $\|\cdot\|_1$ denotes the L1 norm. The shallow feature $F_0$ is then extracted by two convolution layers as follows:
$$F_0 = H_{SSFE2}\left(H_{SSFE1}\left(I_L\right)\right),$$
where $H_{SSFE1}$ and $H_{SSFE2}$ are the SSFE convolution operations of the two layers. After that, $F_0$ is fed to the DSFE module, which contains a cascaded set of MSF blocks. The output information is then adaptively controlled by a 1 × 1 convolutional layer (named feature fusion) as follows:
$$F_{GF} = H_{FF}\left(\left[F_1, F_2, \cdots, F_M\right]\right),$$
where $H_{FF}$ denotes the composite function of a 1 × 1 convolutional layer and $[F_1, F_2, \cdots, F_M]$ is the feature map set produced by all MSF blocks. By using global residual learning, we then get the feature maps $F_{DF}$.
Figure 2 shows the proposed MSF block. A three-bypass network with a different convolutional kernel size for each bypass is constructed in each MSF block. As a result, signal features at different scales can be detected, and information can be shared between these bypasses.
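The feature fusion step and the MAE loss described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the single-channel toy feature maps, the fusion weights, and the choice to add the shallow feature in the global residual step are all assumptions made for illustration.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel stored as H x W nested lists


def mae_loss(restored: FeatureMap, clean: FeatureMap) -> float:
    """Mean absolute error between a restored section and its clean target."""
    total = 0.0
    count = 0
    for row_r, row_c in zip(restored, clean):
        for r, c in zip(row_r, row_c):
            total += abs(r - c)
            count += 1
    return total / count


def fuse_1x1(block_outputs: List[FeatureMap], weights: List[float],
             bias: float = 0.0) -> FeatureMap:
    """1 x 1 convolution over channels: a per-pixel weighted sum of block outputs."""
    height, width = len(block_outputs[0]), len(block_outputs[0][0])
    return [
        [bias + sum(w * fm[i][j] for w, fm in zip(weights, block_outputs))
         for j in range(width)]
        for i in range(height)
    ]


def global_residual(fused: FeatureMap, shallow: FeatureMap) -> FeatureMap:
    """Skip connection: add the shallow features to the fused features (assumed form)."""
    return [[f + s for f, s in zip(row_f, row_s)]
            for row_f, row_s in zip(fused, shallow)]


# Toy example with M = 2 block outputs on a 2 x 2 grid (values are hypothetical).
f1 = [[1.0, 2.0], [3.0, 4.0]]
f2 = [[0.0, 1.0], [0.0, 1.0]]
shallow = [[0.5, 0.5], [0.5, 0.5]]
fused = fuse_1x1([f1, f2], weights=[0.5, 2.0])
deep = global_residual(fused, shallow)
print(fused, deep, mae_loss(deep, f1))
```

In a real network the fusion weights are learned and each block output has many channels; the per-pixel weighted sum is the part that carries over.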
Following [18], the operation of the MSF block is defined using the following notation: $W^{1}_{3\times3}$, $W^{1}_{5\times5}$, and $W^{1}_{7\times7}$ refer to the weights of the 3 × 3, 5 × 5, and 7 × 7 convolutional layers in Figure 2, respectively; $W^{2}_{1\times1}$ and $W^{3}_{1\times1}$ refer to the 1 × 1 convolutional weights of the second and third layers, respectively; $b$ denotes the bias; and $[A_1, A_2, A_3]$ denotes the feature map set produced by $A_1$, $A_2$, and $A_3$. $F_{d-1}$ and $F_d$ are the input and output of the $d$th MSF block, respectively, and $\sigma(\cdot)$ denotes the ReLU function [34].
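To make the idea of parallel bypasses with different kernel sizes concrete, here is a deliberately simplified one-dimensional, single-channel sketch. It does not reproduce the wiring of Figure 2: it assumes three independent branches with kernel lengths 3, 5, and 7, a ReLU after each branch, and a single 1 × 1 fusion of the concatenated outputs. The information sharing between bypasses and the second and third 1 × 1 layers are left out, and all names are hypothetical.

```python
from typing import List

Signal = List[float]


def relu(x: Signal) -> Signal:
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def conv1d_same(x: Signal, kernel: Signal, bias: float = 0.0) -> Signal:
    """Zero-padded 1D cross-correlation that keeps the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = bias
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # samples outside the trace count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def multiscale_block_sketch(f_prev: Signal, k3: Signal, k5: Signal, k7: Signal,
                            fuse_w: List[float]) -> Signal:
    """Three parallel branches at different scales, fused by a 1 x 1 weighting."""
    a1 = relu(conv1d_same(f_prev, k3))
    a2 = relu(conv1d_same(f_prev, k5))
    a3 = relu(conv1d_same(f_prev, k7))
    # A 1 x 1 convolution over the concatenation [a1, a2, a3] reduces to a
    # per-sample weighted sum of the three branch outputs.
    return [fuse_w[0] * p + fuse_w[1] * q + fuse_w[2] * r
            for p, q, r in zip(a1, a2, a3)]


# Hypothetical averaging kernels applied to a short trace.
trace = [0.0, 1.0, -2.0, 3.0, 0.0]
out = multiscale_block_sketch(trace, [1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7,
                              fuse_w=[1.0, 1.0, 1.0])
print(out)
```

Longer kernels respond to slower variations along the trace, so the fused output mixes fine and coarse structure; in the two-dimensional case the same role is played by the 3 × 3, 5 × 5, and 7 × 7 kernels.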

Experimental Results
Qualitative and quantitative experiments are conducted to evaluate the performance of our method. In this work, three traditional seismic denoising methods (wavelet-based threshold denoising (WTD), curvelet-based threshold denoising (CTD), and shearlet-based threshold denoising (STD)) and one deep learning-based method (information distillation network (IDN) [27]) are selected for the comparative study.
The basic data are synthesized as 24 seismic records including linear, curvilinear, fault, and various dip-angle events. The trace number is 150, and the sampling frequency is 1000 Hz. The selected seismic wavelet is the Ricker wavelet, defined in terms of f, which denotes the sampling frequency, and t, which denotes time. Figure 3(a) presents part of the synthetic seismic records. The SEG/EAGE salt and overthrust models [35] are used to obtain the migrated stack profile (Figure 3(b)). These two types of seismic data are also rotated by 45°, 90°, 135°, 180°, 270°, and 360°, respectively, following [25]. To obtain additional expanded versions, random noise of various levels is added to the original and rotated data; 80% of the resulting versions are used as the training set, and the rest as the test set.
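As an illustration of the wavelet used for the synthetic records, the sketch below samples a Ricker wavelet in its commonly used form, r(t) = (1 − 2π²f²t²)·exp(−π²f²t²). In that form the frequency parameter is the peak (dominant) frequency of the wavelet; this reading, the 30 Hz value, and the window length are assumptions, since the text does not state them. Only the 1 ms sample interval follows from the 1000 Hz sampling frequency given above.

```python
import math
from typing import List


def ricker(peak_freq_hz: float, dt_s: float, n_samples: int) -> List[float]:
    """Sample a zero-phase Ricker wavelet centred in the window."""
    centre = (n_samples - 1) / 2.0
    samples = []
    for n in range(n_samples):
        t = (n - centre) * dt_s                  # time relative to the wavelet peak
        arg = (math.pi * peak_freq_hz * t) ** 2  # (pi * f * t)^2
        samples.append((1.0 - 2.0 * arg) * math.exp(-arg))
    return samples


# dt = 1 / 1000 Hz; the peak frequency and window length are hypothetical.
wavelet = ricker(peak_freq_hz=30.0, dt_s=0.001, n_samples=201)
print(max(wavelet), min(wavelet))
```

Convolving such a wavelet with a reflectivity series for each trace is the usual way to build synthetic events, although the authors' exact procedure is not described.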
Our MSFN contains 12 MSF blocks, and all convolutional layers have 32 filters. The training seismic data are cropped into 48 × 48 patches with an overlap of 24 pixels. The batch size is set to 64. The learning rate is initialized to $10^{-4}$ for all layers and is halved every 50 epochs. Our model is trained on Tesla K80 GPUs, and training takes about 14 hours.
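The patch cropping and the learning-rate schedule stated above can be written down as a small sketch. The function names are hypothetical, and dropping windows that do not fit entirely inside the section is an assumption; the text does not say how borders are handled.

```python
from typing import List, Tuple


def patch_origins(height: int, width: int, patch: int = 48,
                  stride: int = 24) -> List[Tuple[int, int]]:
    """Top-left corners of patch x patch windows taken every `stride` samples.

    A stride of 24 with 48 x 48 patches gives the 24-pixel overlap between
    neighbouring patches. Windows that would cross the border are skipped.
    """
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


def learning_rate(epoch: int, initial: float = 1e-4, halve_every: int = 50) -> float:
    """Step schedule: start at `initial` and halve it every `halve_every` epochs."""
    return initial * 0.5 ** (epoch // halve_every)


# Hypothetical section with 96 time samples and 150 traces.
origins = patch_origins(96, 150)
print(len(origins), learning_rate(0), learning_rate(50), learning_rate(100))
```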
The denoising performance of our method is evaluated as follows. All models are trained with the same training set for fair comparison, and the codes of the compared methods are publicly released. The reconstruction results are assessed with the quantitative evaluation metric PSNR [36], which is calculated as follows:
$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}_I^{2}}{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( X(i,j) - X'(i,j) \right)^{2}} \right),$$
where $X$ denotes the clean data of size $M \times N$, $X'$ denotes the denoised seismic data, $X(i,j)$ and $X'(i,j)$ are the values of element $(i,j)$ of $X$ and $X'$, respectively, and $\mathrm{MAX}_I$ denotes the maximum possible signal intensity. First, synthetic seismic records are used to compare our method with the traditional WTD, CTD, and STD methods and the deep learning IDN model; the results are presented in Figure 4. Our method achieves a better result, with a higher PSNR value than the other methods. In addition, the performance of our method is quantitatively evaluated on the synthetic seismic records. Table 1 shows the PSNRs (dB), with the optimal values in bold. The comparison indicates that our method achieves much higher PSNRs than the others. Table 2 presents the PSNRs (dB) on the synthetic seismic records for the 1- to 3-scale fusion networks. It can be seen that the 3-scale fusion network achieves the best results.
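For reference, the PSNR metric can be computed as in the following sketch, which uses the standard definition (ten times the base-10 logarithm of the squared peak intensity over the mean squared error). The function name and the toy values are hypothetical, and the choice of the peak intensity `max_i` for seismic amplitudes is left to the caller because the text does not specify it.

```python
import math
from typing import List

Section = List[List[float]]


def psnr(clean: Section, denoised: Section, max_i: float) -> float:
    """Peak signal-to-noise ratio in dB between an M x N clean and denoised section."""
    m, n = len(clean), len(clean[0])
    mse = sum((clean[i][j] - denoised[i][j]) ** 2
              for i in range(m) for j in range(n)) / (m * n)
    if mse == 0.0:
        return math.inf  # identical sections: PSNR is unbounded
    return 10.0 * math.log10(max_i ** 2 / mse)


# Toy 2 x 2 example with amplitudes normalised to [0, 1].
clean = [[0.0, 1.0], [1.0, 0.0]]
denoised = [[0.1, 0.9], [1.0, 0.0]]
value = psnr(clean, denoised, max_i=1.0)
print(round(value, 2))  # about 23.01 dB (MSE = 0.005)
```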
Second, the SEG/EAGE salt and overthrust models are used to evaluate our method. Table 3 shows the PSNRs (dB), with the optimal values in bold. Our method again obtains significantly higher PSNR values than the other methods. In particular, the advantage of our method becomes more pronounced as the noise level in the seismic data increases. In Table 3, the higher the noise level is, the lower the SNR is. Table 4 presents the PSNRs (dB) on the SEG/EAGE salt and overthrust models for the 1- to 3-scale fusion networks. It can be seen that the 3-scale fusion network achieves the best results. In addition, a qualitative comparison between our model and a deep learning-based one is presented in Figure 5. Figure 5(b) is obtained by adding random noise to the clean seismic data (Figure 5(a)). Figures 5(c) and 5(d) are the denoised results. Our method clearly removes random noise while preserving coherent details.
Furthermore, we select field data examples (noisy seismic data from the Liaohe depression, China), acquired in the same work area with the same excitation and reception methods, to validate the processing result of the proposed method. We use the conventional random noise reduction module of a large processing system to roughly denoise these data while ensuring that no valid information is lost. The denoised data are viewed as the target clean data. Owing to the generalization ability of deep learning, we add random noise of various levels to the target data so that the network learns to distinguish noise from effective signals. Similarly, to obtain additional expanded versions, we rotate these real seismic data by 45°, 90°, 135°, 180°, 270°, and 360°, respectively. 80% of the versions are selected as the training set, and the rest as the test set. Figures 6(a) and 6(b) present the original noisy data and the result denoised by our method, respectively. Some effective signals are highlighted, especially in the region in the red ellipse; the interlayer structure is clearer; and the continuity of the events is also enhanced, as shown in Figure 6.
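The construction of training pairs from the target data can be sketched as follows. This is only an illustration under stated assumptions: the noise is taken to be zero-mean Gaussian, the noise levels are hypothetical standard deviations, the split is a random 80/20 shuffle, and the rotation step is omitted. None of these details are specified in the text.

```python
import random
from typing import List, Tuple

Section = List[List[float]]
Pair = Tuple[Section, Section]  # (noisy input, clean target)


def add_noise(section: Section, sigma: float, rng: random.Random) -> Section:
    """Return a copy of the section with zero-mean Gaussian noise added (assumed model)."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in section]


def make_pairs(targets: List[Section], sigmas: List[float],
               rng: random.Random) -> List[Pair]:
    """One (noisy, clean) pair per target section and noise level."""
    return [(add_noise(t, s, rng), t) for t in targets for s in sigmas]


def split_train_test(pairs: List[Pair], rng: random.Random,
                     train_fraction: float = 0.8) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle the pairs and keep `train_fraction` of them for training."""
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Five hypothetical 3 x 4 target sections and two hypothetical noise levels.
rng = random.Random(0)
targets = [[[float(i)] * 4 for _ in range(3)] for i in range(5)]
pairs = make_pairs(targets, sigmas=[0.1, 0.2], rng=rng)
train, test = split_train_test(pairs, rng)
print(len(pairs), len(train), len(test))
```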

Conclusions
We propose a novel CNN-based network, MSFN, to denoise seismic data, wherein a cascaded set of MSF blocks exploits seismic data features to perform noise attenuation. The results demonstrate, qualitatively and quantitatively, that our scheme is superior to other leading-edge methods, especially in its seismic data restoration ability.