A Nonlinear Gradient Domain-Guided Filter Optimized by Fractional-Order Gradient Descent with Momentum RBF Neural Network for Ship Image Dehazing

To avoid the blurred edges, noise, and halos caused by the guided image filtering algorithm, this paper proposes a nonlinear gradient domain-guided image filtering algorithm for image dehazing. To dynamically adjust the edge preservation and smoothness of dehazed images, a fractional-order gradient descent with momentum RBF neural network is proposed to optimize the nonlinear gradient domain-guided filtering (NGDGIF-FOGDMRBF), and its convergence is proved. To speed up the convergence process, an adaptive learning rate is used to adjust the training process. The experimental results verify the theoretical properties of the proposed algorithm, such as its monotonicity and convergence. The descending curve of error values under FOGDM is smoother than those of gradient descent and gradient descent with momentum. The influence of the regularization parameter is analyzed and compared. Compared with dark channel prior, histogram equalization, homomorphic filtering, and multiple exposure fusion, the proposed method significantly reduces halo and noise while achieving a higher peak signal-to-noise ratio and structural similarity index.


Introduction
Marine accidents caused by poor visibility account for a considerable proportion of marine meteorological accidents. Effective image defogging methods can protect the safety of ship navigation [1]. Based on dark channel theory, Dong proposed an improved algorithm to remove fog outside the ship cabin [2]. It used guided filtering instead of soft matting to optimize the transmittance and improved the estimation of atmospheric light intensity. However, when the fog is very thick, or the scene itself is white with no shadow coverage, dark channel theory cannot be applied. Wu et al. used the dark channel prior defogging algorithm to reduce the influence of cloud cover on target recognition and to improve the recognition of marine ships under complex weather conditions [3]. However, it has no obvious effect on large targets. Li et al. proposed a method based on the fusion of retinex and dark channel priors to enhance the defogging of sea cucumber images [4]. However, the disadvantage of this method is the increased processing time.
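As background for the dark channel prior cited above, the sketch below computes a dark channel: the per-pixel minimum over RGB channels followed by a local minimum filter. The window size and the toy test images are illustrative assumptions, not values from the cited works; it also illustrates the failure case noted above, where a near-white foggy scene has no dark pixels.

```python
import numpy as np

def dark_channel(image, patch=3):
    """Dark channel: darkest RGB value at each pixel, then the darkest
    pixel within each patch x patch neighbourhood (edge-padded)."""
    min_rgb = image.min(axis=2)            # darkest channel per pixel
    H, W = min_rgb.shape
    r = patch // 2
    padded = np.pad(min_rgb, r, mode='edge')
    out = np.empty_like(min_rgb)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# A shadowed/colorful haze-free region has a dark channel near zero;
# a uniformly bright (dense fog or white) region does not.
clear = np.zeros((32, 32, 3)); clear[..., 0] = 0.9   # red patch, dark G/B
foggy = np.full((32, 32, 3), 0.95)                   # uniformly bright
```

A large dark-channel value everywhere is precisely the situation in which the prior breaks down, as discussed above.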
Among image dehazing methods, guided image filtering (GIF) is a widely used image filtering technique. It filters the input target image based on a guidance image. However, GIF also has some disadvantages. For example, the regularization coefficient in its objective function is fixed and does not change with the spatial position of the image, and the detail layer adopts a fixed gain, which may cause additional noise in the background. Therefore, gradient identification has been used to improve the data filtering method [5]. Ding proposed maximum likelihood recursive identification for multivariate equation-error autoregressive moving average systems using data filtering [5]. Liu et al. proposed gradient domain-guided image filtering, which contains an explicit first-order edge-aware constraint for better preserving edges [6]. Khou et al. proposed a probability pansharpening method based on gradient domain-guided image filtering to fuse panchromatic and multispectral images [7]. Zhuang et al. proposed a new MRI reconstruction algorithm based on edge-preserving filtering [8]. Zhuang et al. introduced side window guided filtering [9]. Yin et al. proposed a novel edge-preserving smoothing algorithm for multiscale exposure fusion to preserve the detail information of the brightest or darkest regions [10].
Taking the above analysis into account, this paper proposes a fractional-order gradient descent with momentum RBF neural network to optimize gradient domain-guided filtering. Neural networks themselves have some shortcomings, such as slow learning speed, a tendency to fall into local extrema, and unstable learning and memory. The weights of a neural network are obtained by training the network. Gradient descent (GD) is a basic method for updating and optimizing the weights of neural networks [11,12]. Standard GD has two main shortcomings: slow training speed and a tendency to fall into local optima. On the basis of GD, stochastic gradient descent (SGD) divides the total data into several small batches according to the data distribution and updates the parameters with each small batch [13]. The calculation time of each update step does not depend on the number of training samples, so it can converge even with very large numbers of training samples. However, it is difficult to choose a suitable learning rate. To avoid the high-variance oscillation of SGD, the momentum method was proposed [14]. It simulates the inertia of moving objects by reinforcing past related training directions and weakening irrelevant ones. Momentum can accelerate learning in the relevant direction, restrain oscillation, and accelerate convergence, especially when dealing with high curvature, small but consistent gradients, or noisy gradients. However, it is still difficult to choose a good learning rate.
To solve this problem, gradient descent with momentum (GDM) was proposed [15]. By accumulating past gradient values, the fluctuation on the path to the optimal value is reduced and convergence is accelerated. GDM accelerates learning when the directions of the current and past gradients agree, while restraining oscillation when they are inconsistent.
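The accumulation described above can be written as a velocity term that sums past gradients with exponential decay. The following is a minimal toy sketch, not the paper's training procedure; the quadratic objective and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def gdm_minimize(grad, w0, lr=0.1, momentum=0.9, steps=500):
    """Gradient descent with momentum on a toy objective.

    Past gradients accumulate in the velocity v, so components that
    point in a consistent direction grow while oscillating ones cancel.
    """
    w = np.asarray(w0, float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = momentum * v - lr * grad(w)   # accumulate past gradients
        w = w + v
    return w

# High-curvature quadratic f(w) = 0.5*(100*w0^2 + w1^2): momentum damps
# the oscillation along the steep axis that slows plain GD.
grad = lambda w: np.array([100.0 * w[0], w[1]])
w_star = gdm_minimize(grad, [1.0, 1.0], lr=0.01, momentum=0.9)
```

On this ill-conditioned quadratic the iterates spiral into the minimum at the origin, illustrating the oscillation-damping behavior described above.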
Fractional-order calculus, which originated in 1695, is an important branch of mathematics and has been successfully demonstrated to have more advantages than integer-order calculus in the field of neural networks, with faster response speed and a smaller buffeting effect [16]. Kinh et al. proposed a fractional gradient descent method for training BP neural networks [17]. Wang et al. proposed a new fractional gradient descent learning algorithm for radial basis function neural networks [18]. Khan et al. also proposed a fractional gradient descent method for training BP neural networks [19].
However, there is still no fractional-order gradient descent with momentum algorithm for training neural networks. This paper proposes a fractional-order gradient descent with momentum method for training an RBF neural network. The aim of the paper is to improve the guided filter for defogging ship images. The algorithm is designed to avoid the blurred edges, noise, and halos caused by the original guided image filtering algorithm. The proposed method estimates the optimal parameters of the nonlinear gradient domain-guided image filter. The main contributions of this paper are as follows: (1) a nonlinear gradient domain-guided image filtering algorithm is proposed; (2) the optimal values of the model parameters are derived.

The atmospheric scattering model is [1]

I(x) = J(x)t(x) + A(1 − t(x)),

where I(x) denotes the observed hazy image, J(x) denotes the clean image to be recovered, t(x) denotes the transmittance, and A denotes the global atmospheric light. The transmittance t(x) is defined as [1]

t(x) = e^(−βd(x)),

where d(x) denotes the distance from the camera to the image scene and β denotes the scattering coefficient.
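The atmospheric scattering model I(x) = J(x)t(x) + A(1 − t(x)) with t(x) = exp(−βd(x)) can be inverted once t and A are estimated. The sketch below synthesizes haze on a toy image and recovers it; the values of A, β, the depth map, and the transmittance floor t0 are illustrative assumptions, not the paper's estimates.

```python
import numpy as np

def synthesize_haze(J, d, A=0.8, beta=1.0):
    """Apply the scattering model: I = J*t + A*(1 - t), t = exp(-beta*d)."""
    t = np.exp(-beta * d)
    return J * t + A * (1.0 - t), t

def recover_scene(I, t, A=0.8, t0=0.1):
    """Invert the model: J = (I - A) / max(t, t0) + A.

    The floor t0 avoids division by near-zero transmittance in dense fog."""
    return (I - A) / np.maximum(t, t0) + A

J = np.random.default_rng(0).random((8, 8))   # toy clean image
d = np.full((8, 8), 0.5)                      # toy constant depth map
I, t = synthesize_haze(J, d)
J_hat = recover_scene(I, t)
```

With the true transmittance (here above the floor t0), the recovery is exact; in practice the quality of the estimate of t is what the guided filtering stage below is meant to improve.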

Gradient Domain-Guided Filtering.
In the gradient domain-guided filtering algorithm, the filtered image q is assumed to be a linear transform of the guidance image I in the window Ω [2]:

q_i = a_k I_i + b_k,

where a, b are two constants in the window. The cost function is defined as [2]

E(a_k, b_k) = Σ_{i∈Ω_k} [(a_k I_i + b_k − p_i)² + ε(a_k − γ_k)²],

where p denotes the input image to be filtered. Denote the regularization parameter as ε = λ/Λ, where Λ is the edge-aware weighting parameter, λ is the regularization parameter, and γ is a coefficient for distinguishing edges from smooth areas, defined as [2]

γ = 1 − 1/(1 + e^(η(χ − μ))),

where χ is the edge-aware statistic of [2], μ is its mean value over all pixels, and η is calculated as [2]

η = 4/(μ − min(χ)).

RBF Neural Network.
In 1985, Powell proposed a radial basis function for multivariate interpolation. The RBF neural network is a three-layer neural network, which includes an input layer, a hidden layer, and an output layer. A radial basis function is used to form the hidden layer space, so that the input vector can be directly mapped to the hidden space without weight connections. Once the center point of an RBF is determined, the mapping relationship is determined. The mapping from the hidden layer space to the output space is linear: the output of the network is the linear-weighted sum of the outputs of the hidden units, and the weights are the adjustable parameters of the network.
The mapping from input to output of the network is nonlinear, while the output of the network is linear to the adjustable parameters. The weight of the network can be directly solved by the linear equations, thus greatly speeding up the learning speed and avoiding the local minimum problem.
The neural network is used to determine the optimal parameters of the nonlinear-guided filter, so as to minimize the difference between the input image p and the output image I. The output image retains the overall characteristics of the input image, and the filtering results can fully obtain the change details of the guide image.
The most frequently used radial basis function is the Gauss function:

φ_i(x) = exp(−‖x − x_i‖² / (2σ_i²)),  i = 1, ⋯, P,

y = Σ_{i=1}^{P} w_i φ_i(x),

where x is the input vector, ‖x‖ denotes the Euclidean norm of x, φ_i denotes the radial basis function, x_i denotes the central vector of the i-th basis function, σ_i denotes the width of the radial basis function, μ_i denotes the threshold vector, w_i denotes the output weight, P denotes the number of hidden layer nodes, N denotes the number of input training samples, and y denotes the output of the neural network.
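The Gaussian RBF forward pass above can be sketched in a few lines. The centers, widths, and weights below are toy values, not trained parameters; the point is that the output is linear in the weights, which is why they can be solved by linear equations as noted above.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """Gaussian RBF network: phi_i(x) = exp(-||x - x_i||^2 / (2*sigma_i^2)),
    output y = sum_i w_i * phi_i(x), linear in the weights."""
    phi = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * widths ** 2))
    return weights @ phi, phi

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # toy hidden-unit centers
widths  = np.array([1.0, 1.0])                # toy widths sigma_i
weights = np.array([2.0, -1.0])               # toy output weights w_i

y, phi = rbf_forward(np.zeros(2), centers, widths, weights)
```

At an input coinciding with a center, that unit's activation is exactly 1; given fixed centers and widths, fitting the weights to targets reduces to ordinary linear least squares on the activation matrix.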
Fractional-Order Calculus.
The Riemann-Liouville definition of fractional calculus is as follows:

Definition 1. For an absolutely integrable function x(t) on the interval [t_0, t], its Riemann-Liouville integral is

_{t_0}D_t^{−α} x(t) = (1/Γ(α)) ∫_{t_0}^{t} (t − τ)^{α−1} x(τ) dτ,

where the real part of α is positive and Γ(x) is the gamma function.
Definition 2. For an absolutely integrable function x(t) on the interval [t_0, t], its Riemann-Liouville differential is defined as

_{t_0}D_t^{α} x(t) = (1/Γ(m − α)) (d^m/dt^m) ∫_{t_0}^{t} (t − τ)^{m−α−1} x(τ) dτ,

where α ∈ [m − 1, m) and m is a positive integer.
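The definitions above can be checked numerically with the Grünwald-Letnikov discretization, which agrees with the Riemann-Liouville derivative for smooth functions supported on [0, t]. This is a generic numerical check, not part of the paper's algorithm; the step size and test function are illustrative choices.

```python
import math

def gl_fractional_derivative(f, t, alpha, h=1e-3):
    """Grunwald-Letnikov approximation of the order-alpha derivative:
    D^alpha f(t) ~ h^(-alpha) * sum_j (-1)^j C(alpha, j) f(t - j*h)."""
    n = round(t / h)
    acc, w = 0.0, 1.0                 # w_j = (-1)^j * C(alpha, j)
    for j in range(n + 1):
        acc += w * f(t - j * h)
        w *= (j - alpha) / (j + 1)    # recurrence for the next coefficient
    return acc / h ** alpha

# Known closed form: D^alpha t = t^(1-alpha) / Gamma(2 - alpha).
approx = gl_fractional_derivative(lambda s: s, 1.0, 0.5)
exact  = 1.0 / math.gamma(1.5)        # = 2 / sqrt(pi)
```

The half-order derivative of f(t) = t at t = 1 matches the closed form 2/√π to first order in the step size, confirming the discretization.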

Nonlinear Gradient Domain-Guided Filter.
We suppose the filtered image q is a nonlinear transform of the guidance image I:

q_i = a_k I_i^α + b_k,  (14)

where α is the exponent. In order to avoid gradient reversal, we also impose the restrictive conditions a_k ≥ 0 and α > 0, so that ∂q_i/∂I_i = a_k α I_i^(α−1) ≥ 0. When α = 1, (14) degenerates into (3); therefore, gradient domain-guided filtering is a special case of the newly proposed nonlinear gradient domain-guided filtering algorithm. The optimization of the guided filter is to search for the optimal coefficients a and b that minimize the difference between the input image p and the output image.

Theorem 4.
The optimal values of a_k and b_k are computed as

a_k = ( (1/|w_k|) Σ_{i∈w_k} I_i^α p_i − μ_k p̄_k + ε γ_k ) / ( σ_k² + ε ),

b_k = p̄_k − a_k μ_k,

where |w_k| is the number of pixels in the window, μ_k and σ_k² are the mean and variance of I^α in w_k, and p̄_k is the mean of p in w_k.

Proof. The noise is defined as

n_i = q_i − p_i.  (18)

Substituting (14) into (18) yields

n_i = a_k I_i^α + b_k − p_i.

The ultimate goal is to minimize this noise. Therefore, the cost function can be written as

E(a_k, b_k) = Σ_{i∈w_k} [ (a_k I_i^α + b_k − p_i)² + ε(a_k − γ_k)² ],

where w_k is the window centered at pixel k. Setting the partial derivatives of E with respect to a_k and b_k to zero and solving the resulting linear equations yields the stated expressions. The proof relies on the same principles as ordinary least squares. This completes the proof of Theorem 4.

Next, the above model is applied to the entire image filtering window. Each pixel is contained in multiple windows; for example, if a 3 × 3 window filter is used, every point except the edge area is included in nine windows. Therefore, for different windows, we obtain |w_k| values of q_i. Denote

ā_i = (1/|w_i|) Σ_{k: i∈w_k} a_k,  b̄_i = (1/|w_i|) Σ_{k: i∈w_k} b_k.

The final result is obtained by averaging all the q_i values: q_i = ā_i I_i^α + b̄_i.
Thus, the mapping from I to q for each pixel has been established.

Proposed FOGDM-RBF Algorithm
Define the objective function as

E(w_n) = (1/2) Σ_{j=1}^{N} (d_j − y_j(w_n))²,

where d_j denotes the desired output, and denote the gradient g_n = ∂E/∂w evaluated at w_n. According to the gradient descent with momentum algorithm, one can obtain

Δw_{n+1} = −η g_n + γ_n Δw_n,

where η > 0 is the learning rate and γ_n is the momentum coefficient designed as

γ_n = γ ‖g_n‖ / ‖Δw_n‖ if Δw_n ≠ 0, and γ_n = 0 otherwise,

where γ ∈ (0, η) is the momentum factor and ‖⋅‖ is the Euclidean norm. According to the definition of the Caputo fractional derivative, one can obtain the fractional-order update

Δw_{n+1} = −η D^α E(w_n) + γ_n Δw_n.

Convergence Analysis of FOGDM-RBF.
The following assumptions are given:

(A1) |φ|, |φ′|, |φ″|, |y|, |y′|, |y″| are uniformly bounded;

(A2) the weights w_n are uniformly bounded.

These conditions are easily satisfied, since the most commonly used Gauss function is uniformly bounded and differentiable.
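The update rule above can be illustrated on a toy objective. This is a hedged sketch, not the paper's algorithm: the Caputo fractional gradient is replaced by a common one-term approximation, grad(w)·|w − w_ref|^(1−α)/Γ(2−α), where the fixed reference point w_ref and all hyperparameter values are hypothetical choices.

```python
import math
import numpy as np

def fogdm_minimize(grad, w0, lr=0.1, momentum=0.5, alpha=0.8, steps=300):
    """Toy fractional-order gradient descent with momentum (FOGDM).

    The order-alpha Caputo gradient is approximated by scaling the
    ordinary gradient with |w - w_ref|**(1 - alpha) / Gamma(2 - alpha);
    w_ref is a fixed reference point offset from the initial weight so
    the fractional factor stays nonzero (an illustrative choice)."""
    w = np.asarray(w0, float)
    w_ref = w + 0.1
    v = np.zeros_like(w)
    c = 1.0 / math.gamma(2.0 - alpha)
    for _ in range(steps):
        frac_grad = grad(w) * c * np.abs(w - w_ref) ** (1.0 - alpha)
        v = momentum * v - lr * frac_grad   # momentum term gamma_n * dw_n
        w = w + v
    return w

# Minimize E(w) = w^2 (gradient 2w); the iterate should approach 0.
w_star = fogdm_minimize(lambda w: 2.0 * w, np.array([1.0]))
```

With α = 1 the fractional factor collapses to a constant and the update reduces to ordinary GDM, mirroring the degeneracy noted for the filter itself.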
Experimental Results.
The batch size is 256, the initial fractional order is 0.8, and the learning rate is 0.9. The algorithm is tested on a computer with an Intel(R) Core(TM) i3-4150T CPU @ 3.00 GHz, 4.00 GB of memory, and a 64-bit operating system on an x64-based processor. Figure 1 shows that, compared with other algorithms, NGDGIF-FOGDMRBF can significantly reduce halo and noise and obtain clear details. Table 1 shows the PSNR results of seven images dehazed by DCP, HE, the guided image filtering algorithm, the gradient domain-guided image filtering algorithm, and the neural network using the nonlinear gradient domain-guided image filtering algorithm, respectively.

Comparison of Different Guided Filter Algorithms.
In Table 1, we can see that NGDGIF-FOGDMRBF has the strongest ability to suppress noise among the different algorithms, which is consistent with the previous analysis. Table 2 shows the SSIM results of seven images dehazed by DCP, HE, the guided image filtering algorithm, the gradient domain-guided image filtering algorithm, and the neural network using the nonlinear gradient domain-guided image filtering algorithm, respectively.
In Table 2, the PSNR and SSIM values of the method in this paper are the largest. The other algorithms do not suppress noise well and produce obscure edges, so they cannot effectively maintain the actual structure of the original images, especially the details; hence their SSIM values are smaller. Table 2 verifies this analysis. Figure 2 shows the PSNR and SSIM results of four images dehazed by the nonlinear gradient domain-guided image filtering algorithm with different exponents α. With the increase of the exponent α, the PSNR and SSIM increase; however, when the exponent α exceeds a certain threshold, they begin to decrease. Since the linear model is a special case of the nonlinear model, the nonlinear model offers a wider parameter selection range and greater flexibility. Tables 3 and 4 show that NGDGIF-FOGDMRBF has the strongest ability to suppress noise among the different algorithms, consistent with the previous analysis. The other algorithms produce obscure edges that differ from the actual structure of the original images, especially in the details, so their SSIM values are smaller.

Impact Analysis of Parameters.
The PSNR with different regularization parameters for image No. 05 is listed in Table 5. With the increase of the regularization parameter, the PSNR improves; when the regularization parameter exceeds a certain threshold, the PSNR begins to decrease.

Impact Analysis of Exponent.
The optimal regularization parameter calculated by FOGDMRBF is also 0.04. With the neural network, the regularization parameter can be dynamically adjusted for different images and adapted to different detail layers. The neural network optimization offers a wider parameter selection range and greater flexibility.
In formula (4), a determines the gradient preservation of the final image and represents the image edge preservation. When a is small, the gradient is small, the image edge is blurred, and the smoothing is stronger, and vice versa. The regularization parameter ε is used to prevent a from becoming too large; thus it is less than 1. The smaller ε is, the weaker the superposition smoothing effect, and vice versa. Therefore, the guided filter uses a and ε to determine the edge preservation and smoothness of the output image.
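The shrinkage effect of ε on a described above is direct to verify: in a self-guided window, a = var(I)/(var(I) + ε), so larger ε pushes a toward 0 (more smoothing) and smaller ε keeps a near 1 (more edge preservation). The window contents and ε values below are toy assumptions.

```python
import numpy as np

# For a self-guided window, a = var(I) / (var(I) + eps): larger eps
# shrinks a toward 0 (stronger smoothing), smaller eps keeps a near 1
# (stronger edge preservation).
rng = np.random.default_rng(3)
I_win = rng.random(25)                 # one flattened 5x5 toy window
var = I_win.var()
a_values = [var / (var + eps) for eps in (1e-4, 1e-2, 1.0)]
```

The three a values decrease monotonically with ε while staying in (0, 1), matching the trade-off between edge preservation and smoothness stated above.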

Conclusion
The research work of this paper is summarized as follows. A nonlinear gradient domain-guided image filtering algorithm was proposed, and the optimal values of its model parameters were derived. Fractional-order calculus was applied to the gradient descent with momentum algorithm for training the neural network. The new algorithm adjusts the weights of the neural network to improve its learning speed and performance for optimizing gradient domain-guided filtering. The convergence of FOGDM-RBF is proved.
Future research will continue to improve the neural network for better accuracy and faster convergence, in order to dehaze more complex images.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare no conflicts of interest.