A Hybrid Fuzzy Wavelet Neural Network Model with Self-Adapted Fuzzy c-Means Clustering and Genetic Algorithm for Water Quality Prediction in Rivers

Water quality prediction is the basis of water environmental planning, evaluation, and management. In this


Introduction
As environmental pollution becomes increasingly serious, water quality (WQ) has attracted extensive attention. Many countries have put forward a variety of pollution prevention measures to improve WQ and to protect ecological water requirements [1]. Online analytic instruments provide important information about the state of the river system, which operators can use to apply real-time management strategies and address water pollution. However, because suitable online analytic instruments for measuring chemical oxygen demand (COD) and turbidity (TU) are lacking (existing ones suffer from lag and instability) [2], effective monitoring and control of WQ in river systems are hampered. To overcome these problems, it is necessary to develop a software sensor model that estimates hard-to-measure variables from other variables that can be measured online. Modeling the nonlinearity of WQ parameters and predicting WQ on the basis of soft-measurement theory are important issues in water resource management and environmental engineering [3]. Many mathematical modeling methods have been used to forecast long-term water demand [4,5]. Traditional mathematical modeling technologies, however, are of limited use for real, large-scale, complex, and changing processes. Such models can be set up, but they are complex, so it is difficult for them to handle the nonlinear characteristics of water quality [6,7].
To overcome the shortcomings of the traditional mathematical modeling methods, intelligent techniques such as the artificial neural network (ANN), fuzzy logic (FL), and the wavelet transform (WT) were developed to model the nonlinearity of the WQ management process [8-13]. The neural network (NN) can model complex relationships between input and output parameters without requiring a detailed mechanistic description of the phenomena [14], so it can be used to model the WQ management process. Faruk [15] used a backpropagation neural network to predict the WQ from 108 months of WQ observational data, including water temperature, boron, and dissolved oxygen, in the Büyük Menderes River, Turkey. The results indicated that the model captured the nonlinear nature of the complex time series and produced more accurate predictions. Ahmed [16] developed a feedforward NN model and a radial basis function NN model to predict the dissolved oxygen from the biochemical oxygen demand (BOD) and the chemical oxygen demand (COD) in the Surma River. Although the ANN has been used successfully for modeling the WQ management process, ANN schemes still suffer from a low convergence rate and a tendency to become trapped in local minima [17].
To overcome the shortcomings of the ANN, various types of novel integrated intelligent networks have been developed, such as fuzzy neural networks (FNNs) [2, 18-20] and wavelet neural networks (WNNs) [21]. The WNN is a new branch of ANN research that is constructed by taking the wavelet transform as the preprocessor of the ANN. The wavelet transform has a wide range of applications in the processing of complex systems because of its time-frequency localization characteristics. The WNN is better than the traditional ANN in modeling precision, convergence rate, learning and memory ability, and accuracy [22,23]. On the basis of various online input variables, an ensemble modeling approach based on a wavelet analysis model was designed to forecast reservoir inflow [24]. A wavelet-linear genetic programming model was proposed to monitor sodium (Na$^+$) concentration in rivers [25]. However, the WNN has difficulty overcoming the subjectivity of manually selected fuzzy rules, which is one of the characteristics of FL. Gharibi et al. developed a WQ index based on FL for WQ assessment, particularly for the analysis of human drinking water [26]. The results indicated that the method addresses the uncertainty associated with changes in WQ. A WQ index using a fuzzy inference system has also been proposed successfully, since it has proven advantageous in addressing uncertainty problems [27].
A novel hybrid intelligent technique, the fuzzy wavelet neural network (FWNN), is proposed by combining NN, FL, and WT. The FWNN makes effective use of the self-adaptability of the NN, the capacity of FL to handle uncertainty, and the local analysis ability of the WT, so that the network learns better and achieves higher speed and precision [28]. The FWNN combines the time-frequency localization ability of the wavelet, the fuzzy inference of FL, and the learning ability of the ANN to improve its ability to reach the global best results. The FWNN offers a much more elegant and powerful way of modeling and simulating the water management process than the traditional modeling approaches.
In this study, a hybrid intelligent software sensor model based on the FWNN is developed for the real-time estimation of the WQ of the Pearl River, China. A self-adapted fuzzy c-means clustering was used to extract the system's characteristics and to optimize the network space. A hybrid learning algorithm was presented to further improve the prediction capabilities of the neural network. The FWNN model was developed to predict and estimate the WQ based on the available historical data, such as COD, NH$_4^+$-N, dissolved oxygen (DO), electrical conductivity (EC), pH, water temperature (WT), and turbidity (TU), which could increase the safety and improve the operational performance of WQ management.

Materials and Methods
2.1. Study Area. The Pearl River rises in the Yunnan Plateau, flows through the Guizhou, Guangxi, Guangdong, Hunan, and Jiangxi Provinces of China and through the northern part of Vietnam, and finally empties into the South China Sea. The Pearl River, including the Xijiang, Beijiang, and Dongjiang Rivers, is the third longest river in China, with an overall length of 2320 km and a river basin area of 0.45 × 10$^6$ km$^2$. As the largest tributary of the Pearl River, the Xijiang River is a "gold belt" of concentrated resources in South China that connects the developed coastal areas with Southwest China. The accuracy and the efficiency of the developed prediction model were tested using observed water quality data at the Hengsha station of the Xijiang River in Zhongshan City (China) (see Figure 1).
The Hengsha station, which has complete daily sampling, was selected. Seven important WQ variables, COD, NH$_4^+$-N, DO, EC, WT, pH, and TU, were analyzed. COD is one of the most important parameters for the degree of water pollution. It is commonly referred to as an important indicator of organic substances in water and is one of the most important parameters in water monitoring. The TU is considered an integrative parameter and is an important index of water quality. TU is closely related to other parameters associated with water quality, such as the COD or the concentrations of different pollution-related substances (ammonium, sulphate, and nitrate). Because it is strongly affected by these other parameters, TU has become indicative of their comprehensive effect.
(i) Layer 1 was the input layer, which was used to accept the present signal of the system. It consisted of the influencing factors ($x_1, x_2, \ldots, x_n$), and its outputs were used as the inputs to the second layer. In this work, the input parameters were DO, pH, EC, WT, and NH$_4^+$-N, so n = 5.
(ii) Layer 2 was the fuzzy layer, where the input characteristic variables were translated into fuzzy variables. In this work, the distribution of the WQ time series was close to a Gaussian function, so Gaussian membership functions were used [29], which gave the network weight values a definite, interpretable meaning. The outputs of the layer were computed from these membership functions, where i and j index the input variables and fuzzy linguistic terms, respectively, and $c_{ij}$ and $\sigma_{ij}$ are the center and the width parameters of the Gaussian function, respectively. A self-adapted fuzzy c-means clustering was employed to determine the number of clusters for the sample data and then to establish the number of rules (48 fuzzy rules). The number of nodes in this layer was therefore 48 × 5.
(iii) Layer 3 was termed the fuzzy rule layer, which was used to realize the logical inference based on the fuzzy rules. Each node in layer 3 corresponded to one fuzzy logic rule, evaluated by multiplication, so there were 48 nodes in layer 3. In the expression for the outputs of this layer, n is the number of fuzzy rules.
(iv) Layer 4 was termed the wavelet layer. In this layer, WNNs were utilized as the consequent part of the network, where the wavelet function replaced the activation function of the hidden layer and was used for data denoising. The input to this layer came from the product of the outputs of layers 3 and 4. The output of the WNN at the jth wavelet neuron was expressed in terms of $a_{ij}$ and $b_{ij}$, the dilation and the translation parameters of the wavelet network, respectively, and $w_j$, the weight of the wavelet network. The Mexican hat wavelet is the second derivative of the Gaussian function and has excellent properties, particularly for measuring the instantaneous amplitude and frequency of oscillating signals. The Mexican hat wavelet was used as the mother wavelet, and the number of decomposition levels was the same as the number of fuzzy linguistic terms.
(v) Layer 5 was the output layer, which produced the total output of the network, y. In this work, the WQ indexes COD and TU were chosen as the outputs of the network.
The FWNN combines the time-frequency localization ability of the wavelet, the fuzzy inference, and the learning ability of the ANN, which improves its ability to reach the global best results [30].
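The layer equations did not survive in this copy of the text, so the following is only a minimal sketch of how the five layers described above could fit together for a single output. It assumes a commonly used FWNN formulation: Gaussian memberships of the form exp(-(x - c)^2 / sigma^2), product inference, one Mexican hat wavelet network per rule, and a firing-strength-weighted average at the output. The function names, the argument layout, and the normalisation in layer 5 are illustrative assumptions, not the authors' exact equations.

```python
import math


def mexican_hat(z):
    """Mexican hat mother wavelet, (1 - z^2) * exp(-z^2 / 2)."""
    return (1.0 - z * z) * math.exp(-0.5 * z * z)


def fwnn_forward(x, c, sigma, a, b, w):
    """Forward pass of the sketched FWNN for one sample.

    x: list of n inputs (layer 1).
    c, sigma, a, b: n-by-R nested lists indexed [input i][rule j].
    w: list of R wavelet-network weights.
    """
    n, n_rules = len(x), len(w)
    firing, rule_out = [], []
    for j in range(n_rules):
        # Layers 2 and 3: Gaussian memberships multiplied into one
        # firing strength per fuzzy rule.
        mu_j = 1.0
        for i in range(n):
            mu_j *= math.exp(-((x[i] - c[i][j]) ** 2) / sigma[i][j] ** 2)
        firing.append(mu_j)
        # Layer 4: wavelet network attached to rule j (assumed form).
        psi_j = sum(
            abs(a[i][j]) ** -0.5 * mexican_hat((x[i] - b[i][j]) / a[i][j])
            for i in range(n)
        )
        rule_out.append(w[j] * psi_j)
    # Layer 5: firing-strength-weighted average of the rule outputs
    # (assumed normalisation; the small constant avoids division by zero).
    total = sum(firing)
    return sum(f * y for f, y in zip(firing, rule_out)) / (total + 1e-12)
```

With five inputs and 48 rules, as in the text, `c`, `sigma`, `a`, and `b` would each hold 5 × 48 values and `w` would hold 48 values. One such network would be evaluated for COD and another for TU.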

Training Algorithm to Optimize the Proposed FWNN.
A novel hybrid algorithm that combines the global search capability of the genetic algorithm (GA) with the local search capability of the gradient descent algorithm (GDA) was proposed to train the proposed FWNN and to optimize the network parameters. The center ($c_{ij}$) and the width ($\sigma_{ij}$) parameters of the Gaussian functions, the dilation ($a_{ij}$) and translation ($b_{ij}$) parameters of the wavelet functions, and the weight ($w_j$) of the wavelet networks were determined.
The GA is a simple but efficient combinatorial optimization algorithm based on the principle of natural evolution [31]. The GDA, by contrast, easily falls into local optima and is sensitive to the initial values. Therefore, the GA was used to initialize the network, and the antecedent and consequent parameters of the FWNN were then simultaneously optimized with the GDA. In this work, an objective function was proposed to evaluate the minimum error (E) between the desired values ($y_d^L$) and the output values of the FWNN ($y_k$). The output of the FWNN for the s-th chromosome, $y_k^L$, was expressed with i and j indexing the input variables and wavelet neurons, respectively. Each chromosome was a real-coded set of parameters, with i and j indexing the input variables and fuzzy linguistic terms, respectively. The initial parameters of the FWNN were optimized through selection, crossover, and mutation. The initial population size $N_{pop}$ was 100, the crossover rate $P_c$ was 0.7, and the mutation rate $P_m$ was 0.01.
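As a rough illustration of the GA stage, the sketch below minimises a user-supplied error function over real-coded chromosomes using the population size, crossover rate, and mutation rate quoted above. The text does not specify the operators, so binary tournament selection, arithmetic crossover, Gaussian mutation, elitism, and the search bounds are all assumptions made here for the sake of a runnable example. `error_fn` is a hypothetical placeholder for the network error evaluated on the training data.

```python
import random


def genetic_search(error_fn, n_genes, n_pop=100, pc=0.7, pm=0.01,
                   n_gen=200, bounds=(-1.0, 1.0), seed=0):
    """Minimise error_fn over real-coded chromosomes of length n_genes."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(n_genes)] for _ in range(n_pop)]
    best = min(pop, key=error_fn)
    for _ in range(n_gen):
        scores = [error_fn(ch) for ch in pop]

        def pick():
            # Binary tournament selection (assumed operator).
            i, j = rng.randrange(n_pop), rng.randrange(n_pop)
            return pop[i] if scores[i] < scores[j] else pop[j]

        children = [best[:]]  # elitism: carry the best chromosome forward
        while len(children) < n_pop:
            p1, p2 = pick(), pick()
            if rng.random() < pc:
                # Arithmetic crossover (assumed operator).
                t = rng.random()
                child = [t * g1 + (1 - t) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = p1[:]
            for k in range(n_genes):
                # Per-gene Gaussian mutation with probability pm (assumed).
                if rng.random() < pm:
                    child[k] += rng.gauss(0.0, 0.1 * (hi - lo))
            children.append(child)
        pop = children
        best = min(pop + [best], key=error_fn)
    return best
```

In a hybrid scheme of the kind described, the chromosome returned here would serve as the starting point for the gradient descent stage rather than as the final parameter set.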

Updating Parameters through the Gradient Descent Algorithm.
After the proposed FWNN was initialized by the GA, the GDA was employed to update the parameters of the FWNN [29]. The center ($c_{ij}$) and the width ($\sigma_{ij}$) parameters of the Gaussian functions, the dilation ($a_{ij}$) and translation ($b_{ij}$) parameters of the wavelet functions, and the weight ($w_j$) of the wavelet networks were adjusted according to an error function of $y_d^t$ and $y^t$, the desired values and the output values of the FWNN at time t, respectively. The parameter values of the FWNN were then updated by formulas in which η and ξ are the learning rate and the momentum factor of the network, respectively.
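The update formulas are not reproduced in this copy, so the sketch below shows only the standard gradient descent rule with a learning rate η and a momentum factor ξ, which is the usual reading of such a description: each parameter moves against the error gradient and also keeps a fraction ξ of its previous step. The central-difference gradient is a stand-in for the analytic derivatives used in the paper, and `error_fn` is a hypothetical placeholder for the network error.

```python
def numerical_gradient(error_fn, theta, h=1e-6):
    """Central-difference estimate of dE/dtheta (stand-in for analytic derivatives)."""
    grad = []
    for k in range(len(theta)):
        up = theta[:k] + [theta[k] + h] + theta[k + 1:]
        down = theta[:k] + [theta[k] - h] + theta[k + 1:]
        grad.append((error_fn(up) - error_fn(down)) / (2 * h))
    return grad


def momentum_descent(error_fn, theta0, eta=0.01, xi=0.5, n_steps=500):
    """Apply theta(t+1) = theta(t) - eta * dE/dtheta + xi * (theta(t) - theta(t-1))."""
    theta = list(theta0)
    step = [0.0] * len(theta)  # previous update, theta(t) - theta(t-1)
    for _ in range(n_steps):
        grad = numerical_gradient(error_fn, theta)
        step = [-eta * g + xi * s for g, s in zip(grad, step)]
        theta = [t + s for t, s in zip(theta, step)]
    return theta
```

The defaults `eta=0.01` and `xi=0.5` match the learning rate and momentum factor reported for the simulations; the number of steps is an arbitrary choice for the example.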
The values of the derivatives in (11) and (15) were calculated from the corresponding formulas.
2.3. Self-Adapted Fuzzy c-Means Clustering. In order to optimize the FWNN's fuzzy rules automatically, a novel validity function, $B_K$, was proposed to solve problems regarding partial optimization and to determine the cluster number [32]. The numerator and the denominator of the validity function represented the sum of the distances between classes and the sum of the intradistances of all the clusters, respectively. Higher values of $B_K$ indicated better clustering results, and the best cluster number was obtained when $B_K$ reached its maximum value. In the validity function, K and n are the number of classes and the number of objects in the calibration data, $u_{ij}^m$ is the membership function value, and m is the weight coefficient that represents the fuzziness of the WQ classification. The symbol ‖·‖ denotes the Euclidean distance measure.
The FCM clustering algorithm worked as an unsupervised classification method based on the minimization of a criterion function. The objective function was based on the minimum sum of squares of the weighted Euclidean distances $d_{ij}$, where $d_{ij}$ stands for the Euclidean distance between an object and a cluster, $x_j$ and $v_i$ are the observed value and the cluster centroid, respectively, and ‖·‖ stands for the Euclidean distance measure. For the individual objects, (20) was used to compute the membership values, giving the membership function value $u_{ij}$ of an object in the i-th cluster at iteration k. After the membership values for all the calibration objects were computed, the cluster centers $v_i$ were updated. The minimization of (19) started after the initial values for the cluster centers were provided. Equations (20)-(23) were repeated successively in each iteration step.
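The iteration described above can be sketched with the standard fuzzy c-means updates: memberships $u_{ij} = 1 / \sum_k (d_{ij}/d_{kj})^{2/(m-1)}$ followed by centers $v_i = \sum_j u_{ij}^m x_j / \sum_j u_{ij}^m$. This is a minimal illustration of plain FCM for a fixed number of clusters; it does not reproduce the validity function $B_K$ used to select the cluster number, and the function name, the `init_centers` argument, the stopping tolerance, and the default m = 2 are assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6,
                  init_centers=None, seed=0):
    """Standard FCM on a list of equal-length feature lists.

    Returns (centers, u), where u[i][j] is the membership of object j in
    cluster i as computed in the final iteration.
    """
    rng = random.Random(seed)
    if init_centers is None:
        centers = [list(p) for p in rng.sample(data, n_clusters)]
    else:
        centers = [list(p) for p in init_centers]
    dim = len(data[0])
    u = []
    for _ in range(n_iter):
        # Membership update from the current distances to each center.
        u = [[0.0] * len(data) for _ in range(n_clusters)]
        for j, x in enumerate(data):
            d = [max(math.dist(x, v), 1e-12) for v in centers]
            for i in range(n_clusters):
                u[i][j] = 1.0 / sum((d[i] / dk) ** (2.0 / (m - 1.0)) for dk in d)
        # Center update as the u^m-weighted mean of the objects.
        new_centers = []
        for i in range(n_clusters):
            wts = [u[i][j] ** m for j in range(len(data))]
            total = sum(wts)
            new_centers.append(
                [sum(wt * x[k] for wt, x in zip(wts, data)) / total
                 for k in range(dim)]
            )
        shift = max(math.dist(v, nv) for v, nv in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

A self-adapted variant in the spirit of the text would run this routine for a range of cluster numbers and keep the one that maximises the chosen validity function.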

Results and Discussion
3.1. Data Collection and Preprocessing. The data between 2013 and 2014 were obtained from the China National Environmental Monitoring Centre and used to develop the software sensor, as seen in Figure 3. A total of 340 samples were collected from the Hengsha section of the Pearl River to form daily composite samples for analysis. Of these, 300 samples were used for training and 40 for testing (predicting).
A database that contains system performance information is a prerequisite for model development. Generally, it is built by regularly collecting monitored parameters. The quality of the training database is critical for the model to produce correct information about the system. The database should contain adequate and correct information about the system so that the process is described accurately. It remains common for a raw database to contain some redundant or conflicting data. Thus, it is necessary to pretreat the raw database by removing redundancies and resolving conflicts in the data.

FWNN Development.
In this study, the performance of the WQ software sensor system for forecasting the COD and the TU was simulated using the FWNN model. DO, pH, EC, WT, and NH$_4^+$-N were the inputs of the models, and the output variables were COD and TU. Through analyzing the data set with the self-adapted fuzzy c-means clustering, 48 fuzzy rule clusters were obtained. The self-adapted fuzzy c-means clustering algorithm was used to preprocess the raw training data in order to prepare a concise database that served as the actual training database. The software sensor system contained two FWNN models, the FWNNCOD and the FWNNTU, for predicting the COD and the TU. Each model had its own rule base, but they shared the same inputs. The structure of the FWNN software sensor system is shown in Figure 2.
After the initial structure and the parameters of the FWNN model were determined, a hybrid learning algorithm that integrated the GA and the GDA was applied to train and optimize the network parameters.After the structure and the parameters of the FWNN were optimized by the GA, the GDA was employed to update the parameters of the network.

Simulation of the Hybrid FWNN Model.
Two forecasting models based on the FWNN were implemented in MATLAB. The initial population size $N_{pop}$ was 100, the crossover rate $P_c$ was 0.7, the mutation rate $P_m$ was 0.01, the maximum generation number was 200, the learning rate η was 0.01, and the momentum factor ξ was 0.5. Figure 4 shows the training process of the FWNN (taking FWNNCOD as an example) and indicates that the hybrid algorithm converged rapidly and quickly met the target error. The centers and the widths of the membership functions of the fuzzification layer ($c_{ij}$ and $\sigma_{ij}$, respectively), the dilation and translation of the wavelet functions of the wavelet layer ($a_{ij}$ and $b_{ij}$, respectively), and the weight of the wavelet networks ($w_j$) were determined, as shown in Tables S1-S4 (Supplementary Information).
Figures 5 and 6 show the forecasting results of the FWNN-based software sensor model for the testing datasets. The observed values were consistent with the predictions made by the forecasting models. To evaluate the forecast accuracy of the proposed FWNN model, several performance indexes were used to assess the stability of the predicted values: the coefficient of determination ($R^2$), the correlation coefficient (R), the root mean square error (RMSE), the mean square error (MSE), and the mean absolute percentage error (MAPE). Table 1 lists the performance indexes of the proposed FWNN models for the testing datasets.
Table 1 shows that R values of 0.9499 and 0.9678 for the COD and the TU, respectively, were achieved by using the FWNN. The $R^2$ values for the COD and the TU were 0.9023 and 0.9366, respectively. The MAPE values for the COD and the TU were 19.2030 and 26.3203, respectively. The RMSE values were 0.3127 and 4.5452 for the COD and the TU, respectively. These results showed that the proposed FWNN model achieved a satisfactory performance for WQ prediction.
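For readers who want to reproduce such indexes on their own data, the following minimal sketch computes R, $R^2$, MSE, RMSE, and MAPE from paired observed and predicted values. It assumes that $R^2$ is the square of the Pearson correlation coefficient, which is consistent with the reported pairs (for example, 0.9499 squared is about 0.9023), and that MAPE is expressed in percent; the function name and the dictionary keys are illustrative.

```python
import math


def performance_indexes(observed, predicted):
    """Return R, R2, MSE, RMSE, and MAPE (in percent) for paired series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)  # Pearson correlation coefficient
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    # MAPE is undefined when an observed value is zero.
    mape = 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return {"R": r, "R2": r * r, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}
```

Because RMSE is the square root of MSE, the reported COD values of 0.3127 and 0.0978 are mutually consistent under this definition.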
As seen in Table 1, the proposed FWNN model demonstrated a very satisfactory performance in predicting the WQ, with high determination coefficient values ($R^2$) of about 0.90 and 0.94 for COD and TU, respectively. The high values of the determination coefficient indicated that only a small share ($1 - R^2$, roughly 10% and 6%) of the total variations for COD and TU, respectively, was not explained by the proposed FWNN model. According to the high R for the COD and the TU, the predicted values of the new model were close to the measured ones. According to the values of the other descriptive performance indexes (MAPE, RMSE, and MSE), the developed FWNN model showed a superior prediction performance. There was a small deviation produced by the developed FWNN model. Using the WQ prediction model, the unfavorable influence of users' water drainage on the river water function was estimated. This was attributed to the severe changes in the concentrations of the COD and the TU in the Zhongshan region.
3.4. Comparisons with FNN, WNN, and NN. The developed FWNN model was compared with the FNN, the WNN, and the NN models to demonstrate the correctness, the efficiency, and the advantages of the hybrid network. The FWNN model had smaller RMSE (or MSE) and MAPE values and higher $R^2$ and R values, as seen in Table 1. Taking the COD as an example, when the FWNN model was used for prediction, the R, $R^2$, MAPE, RMSE, and MSE values were 0.9499, 0.9023, 19.203, 0.3127, and 0.0978, respectively. When the FNN, the WNN, and the NN models were used to predict the COD in rivers, the R values were 0.8129, 0.7733, and 0.5639, respectively. The $R^2$ values were 0.6608, 0.598, and 0.3180, respectively. The MAPE values were 26.2873, 46.9316, and 34.0226, respectively.
The NN model was unable to extract the nonstationarity from the data set, and it showed weakness in extracting the nonlinearity when the length of the training data set was the same. This showed that the NN model had a limited capability to extract the nonstationarity from the data sets. The performances of the WNN and the FNN models were better than that of the NN model. This was because the wavelet transformation decomposed the time series data into different time-varying components, which together could represent the subprocesses associated with the original time series data set. These components helped the NN models that used the WT and fuzzy inference to extract nonlinearity and nonstationarity, making their performance superior to that of the NN model developed using the raw data sets. The performance of the FWNN model, which included the NN, the FL, the WT, and the GA, was found to be the best compared to those of the remaining models.
The FWNN model achieved better performances than the FNN, the WNN, and the NN models, which illustrated that the FWNN model was more accurate than these models for predicting the WQ. The results clearly indicated that the FWNN model had a high ability to extract the dynamic behavior and the complex interrelationships from various WQ variables, and they suggested that it could capture the dynamic changes of the water resource management system.
Considering the high level of complexity in WQ management, with a large quantity of variable information spread across the dataset and wide concentration ranges, the FWNN model showed good prediction performance for the parameters. The FWNN was a good choice for modeling WQ management. The simulated models, based on the FWNN model, can be effectively applied to WQ management in order to cope with water quality variations. The results indicated that, after hybrid learning, the proposed FWNN could perform better in the WQ management process than the FNN, the WNN, and the NN. With the environmental standards maintained, the FWNN model could effectively achieve both the environmental and the economic objectives of WQ management in real time.

Conclusion
WQ prediction is an important means of understanding water pollution tendencies. This study presented a WQ model based on the FWNN for estimating WQ from the data sets. The performance of the FWNN model was compared with the performances of three different models: the traditional ANN, the WNN, and the FNN models. The proposed WQ prediction model, based on the FWNN, produced better performance than the other three models. The descriptive performance indexes obtained for the FWNN model indicated that the FWNN could handle severely fluctuating WQ time series data with better accuracy. The proposed hybrid approach provided an effective and useful tool for modeling WQ time series, which enables engineers to monitor various WQ parameters for improving WQ management.

Figure 1: Map of the Guangzhou section of the Pearl River under consideration.

Figure 3: Variations of water quality parameters.

Table 1: Performances of FWNN, WNN, and NN in modeling water quality management.