A Fusion Water Quality Soft-Sensing Method Based on WASP Model and Its Application in Water Eutrophication Evaluation

Water environment protection is of great significance for both economic development and the improvement of people's livelihood, and modeling the evolution of the water environment is indispensable in water quality analysis. However, many water quality indexes related to the water quality model cannot be measured online, and some model parameters vary among different water areas. Thus, this paper proposes a water quality soft-sensing method based on the water quality mechanism model to simulate the evolution of water quality indexes online, where the unscented Kalman filter is utilized to estimate model parameters. Furthermore, a modified fuzzy comprehensive evaluation method is presented to evaluate the level of water eutrophication. Finally, water quality data collected from Taihu Lake and Beihai Lake are used to validate the effectiveness and generality of the proposed method. The results show that the proposed soft-sensing method is able to describe the variation of the related water quality indexes, with better accuracy than the nonlinear least squares based method and the traditional trial-and-error based method. On this basis, the water eutrophication condition can also be accurately evaluated.


Introduction
With the rapid development of modern society, the production of industrial and sanitary sewage is increasing daily, and the eutrophication of lakes and reservoirs is becoming much more serious [1,2]. Generally, the occurrence of eutrophication is related to excessive nitrogen, phosphorus, and other inorganic nutrients in water, where nitrogen and phosphorus are the main causes of eutrophication in slow-flowing water bodies such as lakes, reservoirs, and bays [3][4][5]. Currently, eutrophication exists in 54% of the lakes of the Asia-Pacific region [6]. Therefore, handling the eutrophication problem economically and effectively has become an urgent priority.
In order to evaluate or predict the eutrophication condition in a timely manner, the variation of water quality indexes should be measured or learned promptly [7,8]. However, many water quality indexes, such as biochemical oxygen demand (BOD) and total nitrogen (TN), cannot be measured online [9]. Thus, soft-sensing is introduced in this paper to overcome this limitation. Soft-sensing establishes a mathematical relation model between easily measured process variables and hard-to-measure process variables based on mechanism analysis and sensor data mining [10]. The existing soft-sensing modeling approaches can be classified into three types: mechanism modeling, identification modeling, and artificial intelligence-based modeling [11][12][13]. The mechanism modeling approach obtains a mathematical expression from an analysis of the system's internal relations, adopting basic physical and chemical laws such as the conservation of material, energy, or momentum [14]. The identification modeling approach establishes a mathematical model from the system input and output information by parameter identification, filtering, or regression analysis methods, without understanding the mechanism of the dynamic process [15]. The artificial intelligence-based modeling approach obtains an underlying model of the real-world system, or of a portion of it, based on artificial intelligence methods [16]. In addition, there also exist fusion methods derived from a combination of the above approaches; that is, the mechanism modeling approach is utilized to describe the partial behavior of the studied system with known mechanism, and the identification modeling approach or the artificial intelligence-based approach is used to handle the remaining part.
The Water Quality Analysis Simulation Program (WASP) mechanism model is a comprehensive water quality model that can be used to interpret the process of natural or artificial water quality deterioration [17]. It can simulate the migration and transformation of conventional water quality indexes (including dissolved oxygen (DO), BOD, and nutrients) and toxic contaminants (including organic chemicals, metals, and sediment) in water [18]. Currently, the WASP model has been widely applied to different water areas, such as Mobile Bay [19], Murderkill River [20], Lake Michigan [21], and Songhua River [22]. However, WASP also has a limitation: some model parameters vary among different water areas, and their values are always determined by the trial-and-error method [23]. This is insufficient for accurately modeling the water quality variation of a specific water area.
Therefore, this paper builds a fusion water quality soft-sensing method, where the WASP model is employed as the soft-sensing model and its unknown parameters are estimated by the unscented Kalman filter (UKF) [24]. Then, the variations of DO, BOD, nitrate nitrogen (NO3-N, related to TN), ammonia nitrogen (NH3-N), phytoplankton carbon (Phyt, related to chlorophyll a (Chl a)), and so forth can be simulated by the fusion water quality soft-sensing method. On this basis, a modified fuzzy comprehensive evaluation method is presented to evaluate the eutrophication condition of rivers and lakes, combining both the simulated values of DO, BOD, TN, and Chl a from the soft-sensing method and the online measured values of transparency (SD) and total phosphorus (TP) [25]. Finally, by taking Taihu Lake and Beihai Lake as examples, the effectiveness and generality of the proposed method are validated and the water eutrophication condition is evaluated. Comparative studies are also presented and discussed.
The remainder of this paper is organized as follows. Section 2 presents the methodology of the WASP based water quality soft-sensing method, where a simplified WASP model is presented and the procedure of unknown parameter estimation by UKF is listed. Section 3 introduces the modified fuzzy comprehensive evaluation method, where the modified methods of selecting water quality indexes and calculating the corresponding weights are presented. Section 4 presents two case studies of Taihu Lake and Beihai Lake to validate the effectiveness and generality of the proposed fusion soft-sensing method and the water eutrophication evaluation method. Section 5 gives the conclusion and indicates future development.

Dynamic Model of Water Quality Based on WASP.
The eutrophication module (abbreviated as EUTRO) is an essential part of the WASP model. This module describes the dynamic behavior of water quality indexes including DO, BOD, Phyt, NO3-N, NH3-N, organic nitrogen (ON), and organic phosphorus (OP). Their interactions can be represented by four reaction systems, namely, phytoplankton kinetics, phosphorus cycle, nitrogen cycle, and DO balance.
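Let $\theta = (k_1, k_2, k_3, k_4, k_5, k_6, k_7, k_8, k_9, k_{10})^T$ denote the vector of unknown model parameters. Here, assume that the unknown parameters are piecewise-constant. Then, the process equation can be written as

$$\dot{x} = f(x, \theta), \qquad \dot{\theta} = 0,$$

where $x$ is the vector of water quality indexes and $f$ denotes the EUTRO dynamics. Further let $X = [x^T, \theta^T]^T$ and the process noise item be added. An augmented process equation is obtained as [26]

$$\dot{X} = \begin{bmatrix} f(x, \theta) \\ 0 \end{bmatrix} + w,$$

where $w$ is the process noise, satisfying $w \sim N(0, Q)$, and $Q$ is the covariance matrix. Then, the observation equation is set as follows:

$$y = hX + v,$$

where $h$ is the observation matrix, describing the mapping from state indexes to observations; $v$ is the measurement noise, satisfying $v \sim N(0, R)$; $R$ is the covariance matrix.
Up to now, we obtain a continuous-time dynamic model of water quality indexes as follows:

$$\dot{X} = F(X) + w, \qquad y = hX + v, \qquad F(X) = \begin{bmatrix} f(x, \theta) \\ 0 \end{bmatrix}. \tag{5}$$

To obtain the discrete-time model, the fourth-order Runge-Kutta method is utilized, where the step length is $\Delta t$. The formula is as follows:

$$X_{k+1} = X_k + \frac{\Delta t}{6}\left(m_1 + 2m_2 + 2m_3 + m_4\right),$$
$$m_1 = F(X_k), \quad m_2 = F\!\left(X_k + \tfrac{\Delta t}{2} m_1\right), \quad m_3 = F\!\left(X_k + \tfrac{\Delta t}{2} m_2\right), \quad m_4 = F\!\left(X_k + \Delta t\, m_3\right).$$

Then, (5) is discretized as

$$X_{k+1} = F_d(X_k) + w_k, \qquad y_k = hX_k + v_k, \tag{7}$$

where $F_d$ denotes the Runge-Kutta map above.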
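The discretization step can be illustrated with a minimal Python sketch. The right-hand side used here is a hypothetical one-state toy model with a single unknown rate parameter, not the EUTRO equations; the names `rk4_step` and `augmented_rhs` are assumptions introduced only for illustration. The point is that the parameter component of the augmented state has zero derivative and is therefore carried through the Runge-Kutta step unchanged.

```python
import numpy as np

def rk4_step(F, X, dt):
    """One fourth-order Runge-Kutta step for dX/dt = F(X) with step length dt."""
    m1 = F(X)
    m2 = F(X + 0.5 * dt * m1)
    m3 = F(X + 0.5 * dt * m2)
    m4 = F(X + dt * m3)
    return X + dt / 6.0 * (m1 + 2.0 * m2 + 2.0 * m3 + m4)

def augmented_rhs(X):
    """Hypothetical augmented dynamics X = [x, k]: dx/dt = -k*x, dk/dt = 0."""
    x, k = X
    return np.array([-k * x, 0.0])

X0 = np.array([1.0, 0.5])          # initial state x = 1, parameter k = 0.5
X1 = rk4_step(augmented_rhs, X0, dt=0.1)
print(X1)                          # x decays, k is left unchanged
```

For this toy model the exact solution is known, so the step can be checked against `np.exp(-0.5 * 0.1)`.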
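Step 1 (UT transformation). In the original state distribution, some sampling points are selected according to certain rules, so that the mean and covariance of these sampling points are equal to the mean and covariance of the original state distribution. These points are substituted into the nonlinear function, and the corresponding set of nonlinear function values is obtained. Then, the mean and covariance of the nonlinear transformation are obtained from these sets of points.
First, compute the $2n + 1$ sigma points, namely, sampling points:

$$\chi^{(0)} = \hat{X}, \qquad \chi^{(i)} = \hat{X} + \left(\sqrt{(n+\lambda)P}\right)_i, \; i = 1, \ldots, n, \qquad \chi^{(i)} = \hat{X} - \left(\sqrt{(n+\lambda)P}\right)_{i-n}, \; i = n+1, \ldots, 2n,$$

where $n$ is the number of the state dimension.
Second, calculate the associated weights $W$ of the sampling points:

$$W_m^{(0)} = \frac{\lambda}{n+\lambda}, \qquad W_c^{(0)} = \frac{\lambda}{n+\lambda} + \left(1 - \alpha^2 + \beta\right), \qquad W_m^{(i)} = W_c^{(i)} = \frac{1}{2(n+\lambda)}, \; i = 1, \ldots, 2n,$$

with $\lambda = \alpha^2(n+\kappa) - n$, where $P$ is the estimated covariance, satisfying $(\sqrt{P})^T(\sqrt{P}) = P$; $(\sqrt{P})_i$ is the $i$th column of the matrix square root of $P$; $\lambda$ is the scaling parameter used to reduce the total prediction error; $\alpha$ controls the spread of the sampling points; $\kappa$ is a selected parameter whose value is not bounded in general, but it is usually necessary to ensure that the matrix $(n+\lambda)P$ is positive semidefinite. Under normal circumstances, $\alpha = 10^{-3}$ and $\kappa = 0$. $\beta$ is a nonnegative weighting index, and the optimal value is 2 for a Gaussian distribution of $x$.
Step 2. Compute the predicted state $\hat{X}_{k|k-1}$ and the predicted covariance $P_{k|k-1}$:

$$\chi^{(i)}_{k|k-1} = F_d\!\left(\chi^{(i)}_{k-1}\right), \qquad \hat{X}_{k|k-1} = \sum_{i=0}^{2n} W_m^{(i)} \chi^{(i)}_{k|k-1},$$
$$P_{k|k-1} = \sum_{i=0}^{2n} W_c^{(i)} \left(\chi^{(i)}_{k|k-1} - \hat{X}_{k|k-1}\right)\left(\chi^{(i)}_{k|k-1} - \hat{X}_{k|k-1}\right)^T + Q,$$

where $F_d$ denotes the discretized process model in (7).
Step 3. Compute the predicted measurement $\hat{y}_{k|k-1}$, the measurement covariance $P_{y_k y_k}$, and the cross-covariance of the state and measurement $P_{x_k y_k}$:

$$\gamma^{(i)}_{k|k-1} = h\chi^{(i)}_{k|k-1}, \qquad \hat{y}_{k|k-1} = \sum_{i=0}^{2n} W_m^{(i)} \gamma^{(i)}_{k|k-1},$$
$$P_{y_k y_k} = \sum_{i=0}^{2n} W_c^{(i)} \left(\gamma^{(i)}_{k|k-1} - \hat{y}_{k|k-1}\right)\left(\gamma^{(i)}_{k|k-1} - \hat{y}_{k|k-1}\right)^T + R,$$
$$P_{x_k y_k} = \sum_{i=0}^{2n} W_c^{(i)} \left(\chi^{(i)}_{k|k-1} - \hat{X}_{k|k-1}\right)\left(\gamma^{(i)}_{k|k-1} - \hat{y}_{k|k-1}\right)^T.$$

Step 4. Compute the gain $K_k$, the updated state $\hat{X}_{k|k}$, and the covariance $P_k$:

$$K_k = P_{x_k y_k} P_{y_k y_k}^{-1}, \qquad \hat{X}_{k|k} = \hat{X}_{k|k-1} + K_k\left(y_k - \hat{y}_{k|k-1}\right), \qquad P_k = P_{k|k-1} - K_k P_{y_k y_k} K_k^T.$$

From the estimate of the augmented variable $X$, the estimated values of the unknown parameters can be obtained as $\hat{\theta} = [\hat{k}_1, \hat{k}_2, \ldots, \hat{k}_{10}]$.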
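Steps 1 to 4 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in this paper: the process model is a hypothetical one-state toy system with one unknown parameter (discretized with a simple Euler step for brevity), the measurements are synthetic and noise-free, the noise covariances are arbitrary, and the sigma points are redrawn around the predicted state before the measurement step, which is one common variant. The function names `sigma_points` and `ukf_step` are introduced here for illustration.

```python
import numpy as np

def sigma_points(X, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Step 1: 2n+1 sigma points (rows of chi) and their mean/covariance weights."""
    n = X.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)      # columns of S are the offsets
    chi = np.vstack([X, X + S.T, X - S.T])     # shape (2n+1, n)
    Wm = np.full(2 * n + 1, 0.5 / (n + lam))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = Wm[0] + 1.0 - alpha**2 + beta
    return chi, Wm, Wc

def ukf_step(X, P, y, f, h, Q, R):
    """Steps 2-4 for additive noise and a linear observation matrix h."""
    chi, Wm, Wc = sigma_points(X, P)
    chi_pred = np.array([f(c) for c in chi])            # Step 2: propagate
    X_pred = Wm @ chi_pred
    dX = chi_pred - X_pred
    P_pred = dX.T @ (Wc[:, None] * dX) + Q
    chi2, Wm, Wc = sigma_points(X_pred, P_pred)         # redraw (variant choice)
    Y = chi2 @ h.T                                      # Step 3: predicted outputs
    y_pred = Wm @ Y
    dY, dX2 = Y - y_pred, chi2 - X_pred
    Pyy = dY.T @ (Wc[:, None] * dY) + R
    Pxy = dX2.T @ (Wc[:, None] * dY)
    K = Pxy @ np.linalg.inv(Pyy)                        # Step 4: gain and update
    X_new = X_pred + K @ (y - y_pred)
    P_new = P_pred - K @ Pyy @ K.T
    return X_new, 0.5 * (P_new + P_new.T)               # keep P symmetric

# Hypothetical toy model (NOT the EUTRO equations): dx/dt = -k*x + 1, dk/dt = 0.
dt, k_true = 0.1, 0.5
def f(X):
    x, k = X
    return np.array([x + dt * (-k * x + 1.0), k])       # Euler step for brevity

h = np.array([[1.0, 0.0]])                              # only x is measured
Q, R = np.diag([1e-6, 1e-6]), np.array([[1e-4]])        # assumed covariances
X_true = np.array([0.0, k_true])
X_est, P = np.array([0.0, 1.0]), np.diag([0.01, 0.25])  # wrong initial guess for k
for _ in range(300):
    X_true = f(X_true)
    y = h @ X_true                                      # noise-free synthetic data
    X_est, P = ukf_step(X_est, P, y, f, h, Q, R)
k_hat = X_est[1]
print(k_hat)                                            # estimate of the parameter
```

In the setting of this paper, `f` would be replaced by the Runge-Kutta discretization of the augmented WASP model and `h` by the observation matrix of the measured indexes.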

Eutrophication Condition Evaluation Based on Modified Fuzzy Comprehensive Evaluation Method
On the basis of the output from the proposed fusion water quality soft-sensing method and online measurements of water quality indexes, a modified fuzzy comprehensive evaluation method is used to evaluate the water eutrophication condition.
The fuzzy comprehensive evaluation method adopts fuzzy mathematical theory to obtain a quantitative evaluation result of an object in view of the complexity of the object and the fuzziness of the evaluation indexes. The procedure of the proposed fuzzy comprehensive evaluation method is presented as follows.
Step 1 (water evaluation index selection). In the traditional fuzzy comprehensive evaluation method, the key evaluation indexes, which have great influence on the water environment, are obtained by empirical measures. Although these methods are relatively convenient, they lack objectivity and a theoretical foundation. In order to compensate for this limitation, a cumulative frequency method is introduced, which calculates the cumulative frequency of the excessive multiple of each water quality index [29].
The evaluation set is a collection of criteria for evaluating the object [30]. Suppose the water eutrophication condition can be classified into $m$ levels, written as $V = \{v_1, v_2, \ldots, v_m\}$. Further suppose there exist $n$ evaluation indexes, written as $U = \{u_1, u_2, \ldots, u_n\}$. On this basis, the cumulative frequency can be calculated as follows: where $i$ is the label of the evaluation index; $j$ is the label of the eutrophication level; $c_i$ is the concentration value of the $i$th index; $s_{ij}$ is the standard value of the $i$th index in level $j$; $p_i$ is the excessive multiple value of the $i$th index; $F_q$ is the cumulative frequency of the first $q$ indexes. According to the statistical analysis requirements in the selection of evaluation indexes, generally take [31] $F_q \geq 85\%$.
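Step 2 (establishment of the fuzzy relation matrix $R: U \to V$). When fuzzy mathematical theory is applied to evaluation, the most critical issue is to establish the membership functions for the evaluation indexes. Triangular linear membership functions are commonly used in practice and are also selected for determining the fuzzy relation matrix $R$ in this paper [32]. The configuration of the membership function is as follows, written for indexes whose standard values increase with the eutrophication level. For $\forall u_i \in U$ ($i = 1, 2, \ldots, n$),
(1) when $j = 1$, the membership function is

$$r_{i1} = \begin{cases} 1, & c_i \leq s_{i1}, \\ \dfrac{s_{i2} - c_i}{s_{i2} - s_{i1}}, & s_{i1} < c_i < s_{i2}, \\ 0, & c_i \geq s_{i2}; \end{cases} \tag{16}$$

(2) when $1 < j < m$, the membership function is

$$r_{ij} = \begin{cases} \dfrac{c_i - s_{i,j-1}}{s_{ij} - s_{i,j-1}}, & s_{i,j-1} < c_i \leq s_{ij}, \\ \dfrac{s_{i,j+1} - c_i}{s_{i,j+1} - s_{ij}}, & s_{ij} < c_i < s_{i,j+1}, \\ 0, & \text{otherwise}; \end{cases} \tag{17}$$

(3) when $j = m$, the membership function is

$$r_{im} = \begin{cases} 0, & c_i \leq s_{i,m-1}, \\ \dfrac{c_i - s_{i,m-1}}{s_{im} - s_{i,m-1}}, & s_{i,m-1} < c_i < s_{im}, \\ 1, & c_i \geq s_{im}, \end{cases} \tag{18}$$

where $r_{ij}$ represents the membership degree of index $u_i$ in level $v_j$; $c_i$ is the concentration value of the $i$th index; $s_{i,j-1}$, $s_{ij}$, $s_{i,j+1}$ are the standard values of the $i$th index in levels $v_{j-1}$, $v_j$, $v_{j+1}$, respectively. When $c_i$ is given, the above membership functions can be applied to determine the membership degree of the evaluation index $u_i$ for each level of water eutrophication. Then, the fuzzy relation matrix $R: U \to V$ is constructed as

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{bmatrix},$$

where $\sum_{j=1}^{m} r_{ij} = 1$, $i = 1, 2, \ldots, n$.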
Step 3 (weight determination of evaluation index). The determination of the evaluation index weights is one of the most important factors that directly affect the final evaluation results. In this paper, the clustering weight method is used to determine the weight of each evaluation index; it combines the index concentration value with the standard values, which reflects the relative importance of each evaluation index among all the indexes more objectively. The method of calculating the index weight is as follows: where $w_{ij}$ is the weight of the $i$th water evaluation index in the $j$th eutrophication level. Therefore, the index weight matrix $W$ determined by the clustering method is where $W_j$ is the index weight matrix of the $j$th eutrophication level.
Step 4 (fuzzy synthesis operation). Combining the index weight $W$ with the fuzzy relation matrix $R$, the multiplication and addition method of the weighted average is chosen to obtain the fuzzy comprehensive evaluation result $B$ based on all indexes. The advantage of this method is that it can balance all indexes according to the weight values to reflect the comprehensive condition of water quality [33]. The specific formula is as follows:

$$B = (b_1, b_2, \ldots, b_m), \qquad b_j = \sum_{i=1}^{n} w_{ij} r_{ij}, \tag{22}$$

where $w_{ij}$ is the weight of the $i$th index in the $j$th level, $r_{ij}$ is the corresponding membership degree, and the element $b_j$ is the membership of the water object with regard to the $j$th water eutrophication level. The water eutrophication level can be obtained by the principle of maximum membership, where the specific formula is as follows:

$$j^{*} = \arg\max_{1 \leq j \leq m} b_j. \tag{23}$$
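The evaluation procedure of Steps 2 to 4 can be sketched in Python as follows. This is a hedged illustration under several assumptions: the standard values of each index are taken to increase with the eutrophication level, the clustering weight is assumed to be the ratio of concentration to standard value normalized over the indexes (this specific form is an assumption), and the numerical standard values in the example are hypothetical and are not those of Table 3. The function names are introduced only for this sketch.

```python
import numpy as np

def triangular_membership(c, s):
    """Membership of concentration c in each of m levels; s holds ascending standards."""
    s = np.asarray(s, dtype=float)
    r = np.zeros(s.size)
    if c <= s[0]:
        r[0] = 1.0
    elif c >= s[-1]:
        r[-1] = 1.0
    else:
        j = np.searchsorted(s, c, side="right") - 1    # s[j] <= c < s[j+1]
        r[j + 1] = (c - s[j]) / (s[j + 1] - s[j])
        r[j] = 1.0 - r[j + 1]
    return r

def evaluate(c, S):
    """c: (n,) index concentrations; S: (n, m) standard values per index and level."""
    c, S = np.asarray(c, dtype=float), np.asarray(S, dtype=float)
    R = np.array([triangular_membership(ci, si) for ci, si in zip(c, S)])
    ratio = c[:, None] / S
    W = ratio / ratio.sum(axis=0)       # ASSUMED weight form; each column sums to 1
    b = (W * R).sum(axis=0)             # weighted-average synthesis per level
    level = int(np.argmax(b)) + 1       # maximum-membership principle (1-based level)
    return R, W, b, level

# Hypothetical example: two indexes, three levels (values are illustrative only).
S_example = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
R, W, b, level = evaluate([2.5, 20.0], S_example)
print(R)
print(b, level)
```

For indexes whose standard values decrease with the level, such as SD, one possible treatment is to negate both the concentration and the standard values before calling `triangular_membership`, so that the standards become ascending.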

Case Study
In this part, water quality data of Taihu Lake (case 1) and Beihai Lake (case 2) are utilized to verify the effectiveness and generality of the proposed method. First, the UKF is used to estimate the unknown parameters in the soft-sensing method, and the estimated result is compared to the results obtained by the nonlinear least squares method and the trial-and-error method. Then, simulated values of the water quality indexes are deduced and compared to the real measured water quality data. On this basis, the water eutrophication evaluation is carried out by the modified fuzzy comprehensive evaluation method, depending on both the simulated values of DO, BOD, TN, and Chl a and the real measured values of SD and TP. Since the measured data include DO, BOD, Phyt, TN, and NH3-N, the observation matrix in (7) is set as

$$h = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.$$

Then the real-time estimated values of the ten unknown model parameters can be obtained, which are depicted in Figure 1.
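A selection matrix of this kind can be built programmatically, as in the short sketch below. It assumes that the augmented state has 17 components (seven water quality indexes followed by ten parameters) and that the five measured indexes occupy the first five positions of the state vector; the ordering is an assumption made for illustration.

```python
import numpy as np

n_states, n_params, n_meas = 7, 10, 5   # assumed dimensions of the augmented model
h = np.hstack([np.eye(n_meas),
               np.zeros((n_meas, n_states + n_params - n_meas))])

X = np.arange(1.0, 18.0)                # dummy augmented state with entries 1..17
y = h @ X                               # picks out the first five components
print(h.shape, y)
```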
The appearance of algal blooms is a feature of water eutrophication, and their formation process can be divided into three stages, namely, recovery, biomass increase and accumulation, and dormancy [34]. In this paper, January to March is taken as the recovery period of algal blooms, April to mid-October as the second stage, and the remaining part of the year as the dormancy stage. Following the multistage principle, the average values of the unknown model parameters of each stage are shown in Table 1.
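The stage-wise averaging of the real-time parameter estimates can be sketched as follows. The stage boundaries follow the text, with "mid-October" taken as 15 October (an assumption), and the dates and parameter values in the example are hypothetical. The names `bloom_stage` and `stage_means` are introduced only for this illustration.

```python
import numpy as np
from datetime import date

def bloom_stage(d):
    """Map a date to one of the three algal bloom stages described in the text."""
    if d.month <= 3:
        return "recovery"
    if d < date(d.year, 10, 16):        # "mid-October" assumed to end on 15 October
        return "growth"                 # biomass increase and accumulation
    return "dormancy"

def stage_means(dates, theta):
    """Average each parameter column of theta, shape (T, p), within each stage."""
    theta = np.asarray(theta, dtype=float)
    stages = np.array([bloom_stage(d) for d in dates])
    return {str(s): theta[stages == s].mean(axis=0) for s in np.unique(stages)}

# Hypothetical example with a single parameter and six time points.
dates = [date(2010, 1, 15), date(2010, 2, 15), date(2010, 6, 1),
         date(2010, 10, 15), date(2010, 10, 16), date(2010, 12, 1)]
theta = [[1.0], [3.0], [2.0], [4.0], [5.0], [7.0]]
means = stage_means(dates, theta)
print(means)
```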
Then, the estimated values of the unknown parameters in Table 1 are substituted into the water quality soft-sensing method, and the simulation process is carried out. For comparison, the simulated values of the water quality indexes are compared to the real measured values and to the simulated values obtained by the models whose parameters are estimated by the trial-and-error method and the nonlinear least squares method, respectively. Figure 2 depicts the results of DO concentration, BOD concentration, NH3-N concentration, TN concentration, and Chl a concentration. It should be noted that the amount of Chl a is indirectly expressed by the concentration of Phyt; that is to say, the concentration of Chl a is obtained through the ratio between Phyt and Chl a.
It can be seen from Figure 2 that the simulated values are in good agreement with the measured values of the water quality indexes, except for the TN concentration. There are two main reasons for this exception. One is that the TN data are used instead of NO3-N during the experiment, and the other is the inaccuracy caused by external influences. However, the overall experimental results verify the effectiveness of the fusion water quality soft-sensing method. Moreover, the accuracy of the model based on the UKF is better than that of the models based on the nonlinear least squares method and the trial-and-error method.
In order to quantitatively evaluate the error, the root mean square error (RMSE) is utilized to indicate the deviation between the simulated values and the measured values of each water quality index. The specific formula is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \hat{X}_i\right)^2}, \tag{25}$$

where $X_i$ is the measured value, $\hat{X}_i$ is the simulated value, and $N$ is the number of measurements. The result is shown in Table 2.
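The RMSE computation can be sketched in a few lines of Python. The function name `rmse` and the numbers in the example are illustrative only and are not taken from Table 2.

```python
import numpy as np

def rmse(measured, simulated):
    """Root mean square error between measured and simulated series of equal length."""
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((measured - simulated) ** 2)))

example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # hypothetical values
print(example)
```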

Water Eutrophication Evaluation Result and Analysis.
The modified fuzzy comprehensive evaluation method is used to evaluate the water eutrophication status of Taihu Lake, taking both the simulated values and the measured values of the water quality indexes into consideration. For comparison, the eutrophication evaluation result is compared to the result based merely on measured data.
Step 1 (water evaluation index selection). As the eutrophication mechanism is complex, scholars often choose different evaluation criteria to assess water quality status. By referring to [35,36], five levels are used to describe the eutrophication condition: I (none), II (mild), III (medium), IV (heavy), and V (extremely heavy); namely, $V = \{\mathrm{I}, \mathrm{II}, \mathrm{III}, \mathrm{IV}, \mathrm{V}\}$. Then, according to (14) and (15), SD, BOD, TN, TP, DO, and Chl a are selected as the evaluation indexes. By referring to the technological regulations for surface water resources quality assessment published by the Ministry of Water Resources, People's Republic of China, and related references [37,38], the standard values of the water quality indexes for lakes and reservoirs in each eutrophication level can be determined, which are shown in Table 3. On this basis, triangular linear membership functions of the water quality indexes are constructed as shown in Figure 3.
Step 2 (determine the fuzzy relation matrix $R$ and the weight matrix $W$). Given the data collected from a monitoring station in Taihu Lake, according to (16)-(18) and (20), the time-related fuzzy relation matrix $R(t)$ and weight matrix $W(t)$ are obtained, respectively.
Step 3 (fuzzy synthesis operation). The fuzzy comprehensive evaluation result $B(t)$ of the monitoring station in Taihu Lake is obtained by using (22). Figure 4 depicts the obtained memberships of level I, level II, level III, level IV, and level V based on the measured values and the simulated values of UKF, respectively.
According to (23), the water eutrophication level based on the measured values, the simulated values of the trial-and-error method, the simulated values of nonlinear least squares, and the simulated values of UKF is evaluated, and the results are shown in Figure 5.
It can be seen from Figure 5 that, during the recovery and dormancy stages of algal blooms, the water eutrophication status is basically in level I, while in the biomass increase and accumulation stage, the water eutrophication status is mainly in level IV. Then, with the increase of temperature and rainfall in June and July, algal blooms gradually accumulate so that the water eutrophication status reaches level V. However, deviations between the eutrophication levels based on measured values and those based on simulated values can be seen from Table 3 and Figure 5. The reason lies in two aspects: (1) Model approximation: the WASP model used is an approximation of the real water quality evolution. This approximation brings uncertainty and inaccuracy when simulating the evolution process, for example, during abrupt variations of the eutrophication degree. (2) Model parameter selection: in the simulation, the model parameters are selected as constant values, which can be viewed as a simplification, since the model parameters are time-variant in practice. This simplification introduces errors when simulating the water quality evolution process.
Here, it should be noted that the modified fuzzy comprehensive evaluation method can be applied to more monitoring stations in Taihu Lake, which will lead to a more comprehensive evaluation result of its eutrophication level. In order to quantitatively represent the accuracy of the evaluation results based on the simulated values of different methods, the consistency percentage of the evaluation results of each method is given in Table 4.
It can be drawn from Table 4 that the accuracy of the evaluation result based on simulated values of UKF is higher than those based on simulated values of trial-and-error method and simulated values of nonlinear least squares.
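The consistency percentage can be illustrated with the sketch below. The exact definition is not spelled out in the text, so this sketch assumes it to be the share of time points at which the level evaluated from simulated values equals the level evaluated from measured values; the level sequences in the example are hypothetical.

```python
import numpy as np

def consistency_percentage(levels_measured, levels_simulated):
    """Percentage of time points with identical evaluated levels (assumed definition)."""
    a = np.asarray(levels_measured)
    b = np.asarray(levels_simulated)
    return 100.0 * float(np.mean(a == b))

# Hypothetical level sequences (1 = level I, ..., 5 = level V).
pct = consistency_percentage([1, 1, 4, 4, 5], [1, 2, 4, 4, 4])
print(pct)
```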

Model Parameter Estimation Result and Analysis.
The UKF is also used to estimate the unknown model parameters of Beihai Lake shown in Notations, and the observation matrix is the same as that of Taihu Lake. Then the real-time estimated values of the ten unknown model parameters can be obtained, which are depicted in Figure 6.
The algal bloom formation process of Taihu Lake also applies to Beihai Lake. Following the multistage principle, the average values of the unknown model parameters of each stage are shown in Table 5.
Similarly, the estimated values of the unknown parameters in Table 5 are substituted into the water quality soft-sensing method, and the simulation process is carried out. Figure 7 depicts the results of DO concentration, BOD concentration, NH3-N concentration, TN concentration, and Chl a concentration based on the measured values and the simulated values of different methods.
From Figure 7, it can be seen that the simulated values are in good agreement with the measured values of each water quality index, which verifies the effectiveness and generality of the fusion water quality soft-sensing method. Moreover, the accuracy of the model based on the UKF is better than that of the models based on the nonlinear least squares method and the trial-and-error method. Similarly, RMSE is utilized to indicate the deviation between the simulated values and the measured values of each water quality index. By using (25), the specific result is shown in Table 6.

Water Eutrophication Evaluation Result and Analysis.
Following the proposed algorithm, the eutrophication evaluation result of Beihai Lake is shown in Figure 8. According to (23), the water eutrophication level based on the measured values, the simulated values of the trial-and-error method, the simulated values of nonlinear least squares, and the simulated values of UKF is evaluated, and the results are shown in Figure 9.
It can be seen from Figure 9 that, during the recovery and dormancy stages of algal blooms, the water eutrophication status is basically in level II, while in the biomass increase and accumulation stage, the water eutrophication status is mainly in level IV. Analogously, the modified fuzzy comprehensive evaluation method can be applied to more monitoring stations in Beihai Lake.
In order to quantitatively represent the accuracy of the evaluation results based on the simulated values of different methods, the consistency percentage is also calculated, and the results are shown in Table 7.
It can be seen from Table 7 that, for Beihai Lake as well, the accuracy of the evaluation result based on the simulated values of UKF is higher than that of the results based on the simulated values of the trial-and-error method and of nonlinear least squares.

Conclusions
The fusion water quality soft-sensing method is constructed by combining the WASP mechanism model with the UKF, and the modified fuzzy comprehensive evaluation method is presented to evaluate the level of water eutrophication. Taking Taihu Lake and Beihai Lake as examples, the results show that the simulated values of the water quality indexes are in good agreement with the measured values, which verifies the effectiveness and generality of the fusion water quality soft-sensing method. Unknown parameter estimation based on the UKF also yields a more accurate model than the nonlinear least squares method and the trial-and-error method. Besides, the modified fuzzy comprehensive evaluation method is used to assess the water eutrophication status, and the eutrophication levels evaluated from simulated values and from measured values are consistent in most cases. Moreover, the modified fuzzy comprehensive evaluation method can be applied to more monitoring stations in Taihu Lake and Beihai Lake, which will lead to a more comprehensive evaluation result of their eutrophication level and provide a scientific reference for water environment management.
In future research, more water quality indexes should be considered in the procedure of water eutrophication evaluation. Furthermore, some quantitative indicators, such as health degree [39,40], should be introduced to evaluate water eutrophication.

4.1. Case 1: Taihu Lake
4.1.1. Model Parameter Estimation Result and Analysis. The UKF is used to estimate the unknown model parameters of Taihu Lake shown in Notations. Besides, due to the limitation of measured data, the TN data are used instead of NO3-N.

Figure 2: Results of DO concentration, BOD concentration, NH3-N concentration, TN concentration, and Chl a concentration.

Figure 3: Triangular linear membership functions of the water quality indexes.

Figure 5: Evaluation result of the monitoring station.

Figure 6: Real-time estimated values of ten unknown model parameters.

Figure 8: Membership evaluation result of each level based on the measured values and simulated values of UKF.

Figure 9: Evaluation result of the monitoring station.

Table 1: Average values of unknown model parameters of each stage.

Table 2: RMSE values of each water quality index obtained by different methods.

Table 3: Standard values of water quality indexes for lakes and reservoirs.
Figure 4: Membership evaluation result of each level based on the measured values and simulated values of UKF.

Table 4: Consistency percentage of results based on simulated values of different methods.

Table 5: Average values of unknown model parameters of each stage.
Figure 7: Results of DO concentration, BOD concentration, NH3-N concentration, TN concentration, and Chl a concentration.

Table 6: RMSE values of each water quality index obtained by different methods.

Table 7: Consistency percentage of results based on simulated values of different methods.