To improve the detection rate and reduce the correction error for abnormal water quality data, an outlier detection and correction method is proposed based on the improved Variational Mode Decomposition (improved VMD) and Least Squares Support Vector Machine (LSSVM) algorithms. The correlation coefficient is introduced for solving the optimal parameter

Water is an important strategic resource for a country and has a major influence on economic and social development. In recent years, public awareness of environmental protection has gradually strengthened, and state supervision of water pollution has steadily increased. Water quality monitoring has become an important link in the process of water pollution control [

Carlson and Byer were the first to apply the Pauta criterion to outlier detection in water quality data. The criterion treats any value that deviates from the sample mean by more than three standard deviations (3σ) as an outlier [
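The Pauta (3σ) rule can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: the function name `pauta_outliers` is hypothetical, and the sketch assumes the sample standard deviation (`ddof=1`) and a single, non-iterative pass.

```python
import numpy as np


def pauta_outliers(x, k=3.0):
    """Indices of points deviating from the mean by more than k standard deviations."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std(ddof=1)  # sample standard deviation
    return np.flatnonzero(np.abs(x - mu) > k * sigma)


# Usage: a synthetic, nearly flat series with one gross error at index 10.
series = np.full(50, 8.0)
series[::2] += 0.1
series[10] = 20.0
flagged = pauta_outliers(series)
```

Because a large outlier also inflates σ, a single pass of this rule reliably catches only gross errors, which matches its use in this paper as a preprocessing step.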

Previous studies show that outlier detection for water quality data has made great progress, but the detection rate still needs improvement, and few scholars have addressed the correction of the detected outliers. Building on these studies, this paper proposes an outlier detection and correction method for water quality monitoring data based on improved VMD and LSSVM. The method achieves a higher detection rate and a lower error rate than the EMD and EEMD algorithms. In addition, the detected outliers are corrected with the LSSVM algorithm; compared with SVM and a BP neural network, LSSVM improves the fitting accuracy and yields smaller errors in the reconstructed data. Finally, the algorithms are packaged for engineering application through our independently developed software platform.

The Variational Mode Decomposition (VMD) algorithm [

To find the optimal solution of the constrained variational model, it must be converted into an unconstrained variational problem by introducing the quadratic penalty factor
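For reference, the standard VMD model introduced by Dragomiretskiy and Zosso is written below. The paper's own equations did not survive extraction, so this notation (modes u_k, center frequencies ω_k, input signal f, quadratic penalty factor α, Lagrange multiplier λ) is assumed. The constrained problem seeks K band-limited modes whose sum reconstructs the input; the quadratic penalty and the Lagrange multiplier turn it into the unconstrained augmented Lagrangian.

```latex
\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)

\mathcal{L}\left(\{u_k\},\{\omega_k\},\lambda\right) =
  \alpha \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2
  + \left\langle \lambda(t),\; f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle
```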

The VMD algorithm adopts the alternating direction method of multipliers (ADMM) to obtain the optimal solution of the variational problem by alternately updating the parameters

Initialize

Alternately update the parameters

Repeat step

Stop updating, and get
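The ADMM steps above can be sketched in NumPy as follows. This is a simplified sketch of standard VMD, not the paper's improved VMD: it omits the mirror extension used in the original VMD, works on the one-sided spectrum from `np.fft.rfft`, uses normalized frequencies (cycles per sample), initializes the center frequencies uniformly, and stops on the relative change of all modes. The function name `vmd` and its defaults (`alpha=2000`, `tau=0`) are assumptions.

```python
import numpy as np


def vmd(signal, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD solved with ADMM on the one-sided spectrum.

    Returns (modes, omega): K real modes sorted by center frequency,
    and their center frequencies in cycles per sample.
    """
    f = np.asarray(signal, dtype=float)
    n = f.size
    f_hat = np.fft.rfft(f)                      # one-sided spectrum of the input
    freqs = np.fft.rfftfreq(n)                  # normalized frequencies in [0, 0.5]
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * (np.arange(K) + 0.5) / K      # uniform initial center frequencies
    lam = np.zeros(freqs.size, dtype=complex)   # Lagrange multiplier (spectrum)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Sum of all other modes (modes < k already hold their new values).
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter update of mode k around its current center frequency.
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # New center frequency: power-weighted mean frequency of mode k.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-30)
        # Dual ascent on the reconstruction constraint (no-op when tau == 0).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-30)
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=n, axis=1)    # back to the time domain
    order = np.argsort(omega)
    return modes[order], omega[order]


# Usage: two tones placed exactly on FFT bins 100 and 360 of a 1024-sample signal.
n = np.arange(1024)
x = np.cos(2 * np.pi * 100 / 1024 * n) + 0.5 * np.cos(2 * np.pi * 360 / 1024 * n)
modes, omega = vmd(x, K=2)
```

Each mode update is a Wiener filter centered on ω_k, so α controls the bandwidth: a larger α gives narrower modes.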

The LSSVM (Least Squares Support Vector Machine) algorithm is an improved version of SVM (Support Vector Machines). LSSVM is grounded in statistical learning theory and adopts a least squares loss function, so that training reduces to solving a system of linear equations [

Suppose there is a nonlinear sample set S, and

The structural risk minimization principle is used to solve for the parameters
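For reference, the standard LSSVM regression problem is shown below in Suykens' notation. The notation is an assumption, because the paper's own equations were lost in extraction: φ(·) is the feature map, w the weight vector, b the bias, e_i the fitting errors, and γ the regularization parameter.

```latex
\min_{w,\,b,\,e}\; J(w, e) = \frac{1}{2}\, w^{\mathsf T} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2}
\quad \text{s.t.} \quad y_i = w^{\mathsf T} \varphi(x_i) + b + e_i, \quad i = 1, \dots, N
```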

According to the Kuhn-Tucker conditions [
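In the standard LSSVM formulation, the optimality (Kuhn-Tucker) conditions reduce training to one linear system in the bias b and the multipliers α: [[0, 1ᵀ], [1, Ω + I/γ]] [b; α] = [0; y], where Ω_ij = K(x_i, x_j). The prediction is then ŷ(x) = Σ α_i K(x, x_i) + b. The minimal NumPy sketch below assumes an RBF kernel K(x, z) = exp(−‖x − z‖²/(2σ²)). The kernel choice and the function names are assumptions, not details confirmed by the text.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM linear system for the bias b and multipliers alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # first row:    [0, 1^T]
    A[1:, 0] = 1.0                                   # first column: [0; 1]
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_train, alpha, b, X_new, sigma):
    """Prediction y(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Usage: fit a noiseless sine and evaluate midway between the training points.
X = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
y = np.sin(X[:, 0])
b, alpha = lssvm_fit(X, y, gamma=1000.0, sigma=0.5)
X_mid = (X[:-1] + X[1:]) / 2.0
y_mid = lssvm_predict(X, alpha, b, X_mid, sigma=0.5)
```

The first row of the system enforces Σ α_i = 0. The term I/γ plays the role of the regularization: a smaller γ tolerates larger fitting errors e_i = α_i / γ.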

The VMD algorithm searches for the optimal solution of the variational model by nonrecursive iteration in the frequency domain. It determines the center frequency and bandwidth of each amplitude-modulated and frequency-modulated component, and thereby achieves adaptive frequency partitioning and separation of the components [

The algorithm block diagram of Newton's method.

In the LSSVM algorithm, the regularization parameter

The flow chart of LSSVM algorithm optimizations.

The specific steps are as follows [

Parameter initialization: initialization of PSO parameters, including population size, learning factors, and inertia weight.

Calculating individual fitness: the fitness value of each particle is calculated with the LSSVM model and compared with the particle's own best fitness value to obtain the particle's best position.

Population update: the fitness at each particle's best position is compared with the fitness at the population's best position, the better one is selected as the population's best position, and the position and velocity of each particle are updated.

When the number of iterations reaches the maximum, the optimization terminates and the current best particle provides the parameters of the LSSVM algorithm. Otherwise, jump to step
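The PSO loop above can be sketched generically as follows. In the paper's setting the fitness would be an LSSVM error (for example, validation MSE) as a function of the regularization parameter and kernel width; here a simple quadratic stands in for it so the sketch stays self-contained. The constant inertia weight, the learning factors, the clipping of positions to the bounds, and the name `pso_minimize` are assumptions, not values from the paper.

```python
import numpy as np


def pso_minimize(fitness, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a box with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T       # bounds: [(lo, hi), ...]
    dim = lo.size
    # Parameter initialization: random positions, zero velocities.
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = pbest[np.argmin(pbest_fit)].copy()
    g_fit = pbest_fit.min()

    for _ in range(n_iter):
        # Velocity update: inertia + pull toward personal and global bests.
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        # Individual fitness: update each particle's best position.
        fit = np.array([fitness(p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        # Population update: keep the best position found so far.
        if pbest_fit.min() < g_fit:
            g_fit = pbest_fit.min()
            g = pbest[np.argmin(pbest_fit)].copy()
    return g, g_fit


# Usage: a stand-in fitness with its minimum at (1, -2).
best, best_fit = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                              bounds=[(-5, 5), (-5, 5)])
```

To tune LSSVM, `fitness` would train the model with the candidate parameters and return its validation error; searching over log-scaled parameters is a common choice.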

To improve the detection rate and reduce the correction error for abnormal water quality data, an outlier detection and correction method for water quality monitoring data is proposed based on improved VMD and LSSVM. First, the data from the water quality monitoring station are preprocessed with the Pauta criterion to eliminate obvious outliers. Then, the improved VMD algorithm decomposes the remaining monitoring data sequence into modes, and outliers are detected by superimposing the low-frequency modal components. Finally, the LSSVM algorithm is used to correct the outliers. The detailed block diagram of the water quality outlier detection and correction model is shown in Figure

The block diagram of water quality outlier detection and correction model.

To compare the signal decomposition performance of the EMD and improved VMD algorithms, both are used to decompose a simulated signal composed of three cosine signals at 55 Hz, 266 Hz, and 580 Hz and a noise sequence

In this experiment, the sampling frequency is 5120 Hz and the number of samples is 1024. The time domain graph and the corresponding spectrum graph of the simulation signal
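The simulated signal can be reproduced approximately as follows. The amplitudes and the noise model are not given in the extracted text, so unit amplitudes and Gaussian noise with standard deviation 0.1 are assumptions. The sketch also shows that the frequency resolution is fs / N = 5120 / 1024 = 5 Hz. The 55 Hz and 580 Hz tones therefore fall exactly on FFT bins, while the 266 Hz tone peaks in the nearest bin at 265 Hz.

```python
import numpy as np

fs = 5120.0          # sampling frequency in Hz (from the text)
N = 1024             # number of samples (from the text)
t = np.arange(N) / fs
rng = np.random.default_rng(0)

# Assumed: unit amplitudes and Gaussian noise with standard deviation 0.1.
tones_hz = (55.0, 266.0, 580.0)
x = sum(np.cos(2 * np.pi * f0 * t) for f0 in tones_hz) + 0.1 * rng.standard_normal(N)

# One-sided amplitude spectrum; bin spacing is fs / N = 5 Hz.
spectrum = np.abs(np.fft.rfft(x)) / N
freqs = np.fft.rfftfreq(N, d=1.0 / fs)
peaks_hz = np.sort(freqs[np.argsort(spectrum)[-3:]])
print(peaks_hz)
```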

The time domain graph of the simulation signal.

The corresponding spectrum graph of the simulation signal.

As can be seen from Figure

For the selection of

For the selection of

Results of signal decomposition by improved VMD.

Relevant spectra figures by improved VMD.

The EMD algorithm is used to decompose the simulation signals, and the decomposition results and corresponding spectra are shown in Figures

Results of signal decomposition by EMD.

Relevant spectra figures by EMD.

From Figures

The fourth IMF component in Figure

The time domain diagram of original signal and stacking signal.

The relative error between the original data sequence and the stacked data is calculated, with ±50% taken as the threshold. A data point is treated as an outlier when its relative error exceeds the threshold. In Figure
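The thresholding rule can be sketched as follows. The sketch assumes that the relative error is defined as (original − stacked) / stacked and that the stacked sequence is the sum of the retained low-frequency modes (for example, `modes[:2].sum(axis=0)`). The function name `detect_by_relative_error` is hypothetical.

```python
import numpy as np


def detect_by_relative_error(original, stacked, threshold=0.5):
    """Indices where |(original - stacked) / stacked| exceeds the threshold."""
    original = np.asarray(original, dtype=float)
    stacked = np.asarray(stacked, dtype=float)
    rel_err = (original - stacked) / stacked  # assumes stacked has no zeros
    return np.flatnonzero(np.abs(rel_err) > threshold)


# Usage: the stacked (low-frequency) sequence is flat at 1.0 here.
original = [1.0, 1.1, 2.0, 0.9, 0.3]
stacked = [1.0, 1.0, 1.0, 1.0, 1.0]
outliers = detect_by_relative_error(original, stacked)
```

The ±50% threshold flags both upward spikes (index 2, +100%) and downward drops (index 4, −70%).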

Outlier detection results by improved VMD for simulated data.

To verify the superiority of the improved VMD algorithm in outlier detection, two comparative experiments were designed, using the EMD and EEMD algorithms to detect the outliers. The detection results of the two algorithms are shown in Figures

Outlier detection results by EMD for simulated data.

Outlier detection results by EEMD for simulated data.

The detection results of EMD, EEMD, and improved VMD are shown in Table

Detection results of abnormal data.

Algorithm | Samples | Outliers | Outliers detected | Normal points falsely detected |
---|---|---|---|---|
EMD | 1024 | 38 | 33 | 7 |
EEMD | 1024 | 38 | 36 | 4 |
improved VMD | 1024 | 38 | 37 | 2 |

To calculate the accuracy of the three algorithms for outlier detection, the following counts are defined: TP is the number of normal data points classified as normal, FP is the number of abnormal data points classified as normal, FN is the number of normal data points classified as abnormal, and TN is the number of abnormal data points classified as abnormal. The detection rate Acc (accuracy) and the error rate Fal (false rate) for abnormal data are calculated as Acc = TN / (TN + FP) × 100% and Fal = FN / (TP + FN) × 100%.

According to formula (

Detection rate and error rate.

Algorithm | TP | FP | FN | TN | Acc (%) | Fal (%) |
---|---|---|---|---|---|---|
EMD | 979 | 5 | 7 | 33 | 86.84 | 0.71 |
EEMD | 982 | 2 | 4 | 36 | 94.74 | 0.41 |
improved VMD | 984 | 1 | 2 | 37 | 97.37 | 0.20 |
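The Acc and Fal columns can be reproduced from the counts by taking Acc = TN / (TN + FP) and Fal = FN / (TP + FN), both in percent. Acc is the share of true outliers that are caught, and Fal is the share of normal points wrongly flagged. The short check below is illustrative, and the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return (Acc, Fal) in percent, rounded to two decimals.

    Acc: share of abnormal points classified as abnormal, TN / (TN + FP).
    Fal: share of normal points classified as abnormal, FN / (TP + FN).
    """
    acc = 100.0 * tn / (tn + fp)
    fal = 100.0 * fn / (tp + fn)
    return round(acc, 2), round(fal, 2)


# Counts from the detection-rate table for the simulated data.
results = {
    "EMD": detection_metrics(979, 5, 7, 33),
    "EEMD": detection_metrics(982, 2, 4, 36),
    "improved VMD": detection_metrics(984, 1, 2, 37),
}
for name, (acc, fal) in results.items():
    print(f"{name}: Acc = {acc:.2f}%, Fal = {fal:.2f}%")
```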

As seen in Table

Remove the abnormal data detected in Figure

The parameters

The fitting result by LSSVM.

The correction result of outlier is shown in Figure

The correction result of outlier.

To make the comparison more informative, SVM and a BP neural network are also used to correct the detected outliers. The values of the outliers corrected by the LSSVM, SVM, and BP neural network algorithms are shown in Figure

The value of correcting outlier by three algorithms.

The MSE, MAE, and MAPE indicators are used to evaluate the outlier correction performance of each algorithm, and the results are shown in Table
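The three indicators follow their usual definitions, sketched below under the assumption that MAPE is expressed in percent. The function name `correction_errors` is hypothetical.

```python
import numpy as np


def correction_errors(y_true, y_pred):
    """Return (MSE, MAE, MAPE); MAPE is expressed in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                       # mean squared error
    mae = np.mean(np.abs(err))                    # mean absolute error
    mape = 100.0 * np.mean(np.abs(err / y_true))  # undefined if any y_true == 0
    return mse, mae, mape


mse, mae, mape = correction_errors([1.0, 2.0, 4.0], [1.1, 1.8, 4.0])
```

MSE penalizes large individual correction errors more heavily, MAE reflects the typical absolute error, and MAPE normalizes by the true value, which makes results comparable across stations with different DO levels.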

Comparison results of the algorithm performance.

Metric | SVM | BP neural network | LSSVM |
---|---|---|---|
MSE | 0.0098 | 0.0061 | 0.0013 |
MAE | 0.0722 | 0.0556 | 0.0260 |
MAPE | 10.319 | 7.8877 | 3.4540 |

According to the experimental results of Table

Take the dissolved oxygen (DO) monitoring data from a water quality monitoring station in Hangzhou (Wan Cun station, from January 1, 2018 to February 2, 2018) as an example, and denote it as

The DO concentration of the monitoring site.

To simplify the computation of the improved VMD algorithm, the raw DO data are preprocessed first. The preliminary outlier detection results obtained with the Pauta criterion from classical statistics are shown in Figure

The preliminary outlier detection results.

The preliminary detection identifies four outliers. In Figure

Decomposition result and corresponding spectrum by improved VMD.

The third IMF component is removed and the remaining two IMF components are superimposed. The resulting superimposed sequence is shown in Figure

The superimposed sequence.

With the threshold selected as in the simulation experiment, the detected outliers are shown in Figure

Outlier detection results for DO monitoring value.

As shown in Figure

Outlier correction results for DO monitoring value.

To verify the effectiveness of the method in practical engineering applications, an additional comparative experiment is conducted. A set of standard monitoring data was obtained from the Zhejiang Provincial Environmental Protection Department: 200 samples of DO content at the Jiu Xi monitoring station from April 1, 2018 to May 3, 2018. Of these, 20 samples were artificially modified to represent abnormal samples. The EMD, EEMD, and improved VMD algorithms are used respectively to detect the outliers, and the LSSVM algorithm is used to repair them. The results are shown in Figures

Detection rate and error rate for actual data.

Algorithm | TP | FP | FN | TN | Acc (%) | Fal (%) |
---|---|---|---|---|---|---|
EMD | 178 | 4 | 2 | 16 | 80.00 | 1.11 |
EEMD | 179 | 2 | 1 | 18 | 90.00 | 0.56 |
improved VMD | 179 | 1 | 1 | 19 | 95.00 | 0.56 |

Outlier detection and correction results by EMD

Outlier detection and correction results by EEMD

Outlier detection and correction results by improved VMD

The results show that the improved VMD algorithm is effective for outlier detection, with a high detection rate and a low error rate. Its detection rate is higher than that of the EMD and EEMD algorithms, and its error rate is lower than that of EMD and equal to that of EEMD, which is broadly consistent with the results on simulated data.

For the outlier correction, the comparison of algorithm performance among SVM, BP neural network, and LSSVM is shown in Table

Comparison of the algorithm performance for actual data.

Metric | SVM | BP neural network | LSSVM |
---|---|---|---|
MSE | 0.0108 | 0.0073 | 0.0021 |
MAE | 0.0785 | 0.0571 | 0.0280 |
MAPE | 4.704 | 3.1630 | 1.6667 |

The result in Table

The proposed method has been deployed as an algorithm package in the water quality parameter monitoring and trend forecasting system that we developed, and it has been applied at water quality monitoring stations in Zhejiang Province, China. The method mitigates data errors caused by equipment failure, external interference, and other factors. It also replaces traditional manual statistics and correction, improving the efficiency and service quality of environmental protection work. The locations of the water quality stations in the software platform are shown in Figure

The location of water quality monitoring stations.

Using the method of this paper, abnormal values of DO concentration in different monitoring stations are detected and corrected. Taking the Wan Cun monitoring site of Hangzhou as an example, the engineering implementation effect of the algorithm package is shown in Figure

The engineering implementation effect of algorithm package.

In the future, the algorithm will be applied to more water quality parameters, such as COD, pH, NH₃-N, and TP, to increase its practical value.

To improve the detection rate and reduce the error rate of outlier detection for water quality data, an outlier detection and correction method based on improved VMD and LSSVM is proposed and applied to Wan Cun, a water quality monitoring station in Hangzhou. The method avoids the shortcomings of the EMD algorithm in signal decomposition. The proposed method achieves a higher detection rate than the EMD and EEMD algorithms, with an error rate no higher than either. Based on the outlier detection, the outliers in the DO monitoring values are corrected; in terms of MSE, MAE, and MAPE, LSSVM outperforms SVM and the BP neural network. The proposed method can be applied to water quality monitoring and related fields, and it provides a useful reference for environmental enforcement and the implementation of environmental protection measures.

The real-time monitoring data used in the manuscript were obtained from the Drinking Water Quality Automatic Monitoring Station of the Zhejiang Environmental Protection Department and were collected from January 1, 2018 to May 31, 2018. Any researcher can see

The authors declare that they have no conflicts of interest.

This study is supported by International Science and Technology Cooperation Program of Zhejiang Province for Joint Research in High-Tech Industry (No. 2016C54007), National Natural Science Foundation of China and Zhejiang Joint Fund for Integrating of Informatization and Industrialization (No. U1509217), Provincial Key R&D Program of Zhejiang Province (No. 2017C03019), and the National Key R&D Program of China (No. 2016YFC0201400).

A graphical summary of the manuscript that lets readers quickly grasp the core content of the paper.