
Owing to the growing volume and complexity of data caused by an uncertain environment, the water environment monitoring system in the Three Gorges Reservoir Area is under considerable pressure in data handling. To identify water quality quickly and effectively, this paper presents a new big data processing algorithm for water quality analysis. The algorithm adopts a fast fuzzy C-means clustering algorithm to analyze water environment monitoring data. The fast clustering algorithm combines the fuzzy C-means clustering algorithm with the hard C-means clustering algorithm: the result of hard clustering is used as the initial value of fuzzy clustering, which speeds up convergence. The fast clustering analysis then identifies the quality of water samples. Both theoretical and simulation results show that the algorithm can analyze water quality in the Three Gorges Reservoir Area quickly and efficiently, significantly improving the efficiency of big data processing. Moreover, the proposed processing algorithm provides a reliable scientific basis for water pollution control in the Three Gorges Reservoir Area.

The Yangtze River Three Gorges Project is one of the world's best-known large water conservancy projects. Since its completion, the storage capacity of the reservoir has grown tremendously. Because the self-purification capacity of the now largely still water body has deteriorated, the reservoir water environment is gradually worsening, which has attracted widespread attention. The safety of the water environment in the Three Gorges Reservoir Area bears on the water security of the Yangtze River region, the smooth implementation of the water diversion project, and even the overall sustainable development of China [

To overcome the deficiencies of prior techniques, the project team funded by the Natural Science Foundation of Chongqing has carried out a series of studies. It has applied wireless sensor network (WSN) technology to water environment monitoring in the Three Gorges Reservoir Area. It has also studied techniques for expanding the WSN used for this monitoring and explored new ways to use WSNs for wireless, remote, real-time monitoring of the water environment. On the basis of the research results obtained [

Database and big data solutions have become increasingly efficient and are receiving more attention [

In water quality analysis, water quality is assessed with a variety of parameters according to surface water quality standards […]. Dissolved oxygen (DO), the permanganate index (CODMn), and ammonia nitrogen (NH₃–N) affect water quality in the Three Gorges Reservoir Area and should be analyzed together to determine water quality. The data collected from each node of the ultra-large-scale WSN in the Three Gorges Reservoir Area contain these three parameters (DO, CODMn, and NH₃–N). Analyzing and processing the multidimensional data from the WSN monitoring network is therefore a typical big data processing problem.

Currently, the key technologies for big data analysis include clustering, fuzzy logic, and evolutionary algorithms [

In this paper, building on the results of our project, we study existing big data analysis and processing methods in depth. We focus on methods suited to the massive data collected by the “ultra-large-scale WSN for water environment monitoring of Three Gorges Reservoir Area.”

The rest of this paper is organized as follows. In Section

Because the Three Gorges Reservoir Area is widely distributed, some places are still inaccessible. This makes the reservoir area unusual: the reservoir has a winding, tree-like structure [

Distribution of the Three Gorges Reservoir and structure model of WSN water environment monitoring system.

From Figure _{3}–N. Traditional data analysis methods struggle to analyze such large volumes of data in a short time and, at the same time, to determine the current water quality of a particular region. A fast and effective method for big data analysis and processing is therefore needed.

Clustering, fuzzy logic, and evolutionary algorithms are all effective approaches to big data problems. The concept of fuzzy clustering was first proposed by Ruspini. Fuzzy clustering applies fuzzy logic to clustering. Its result expresses the degree to which each sample belongs to each cluster. Hard clustering, by contrast, simply indicates which single cluster a sample belongs to [
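The sketch below makes the contrast between hard and fuzzy membership concrete. It computes both kinds of membership for a single sample. It assumes Euclidean distance and the common FCM membership formula with fuzzifier m = 2. The function names are hypothetical and not taken from the paper.

```python
import math


def hard_membership(x, centers):
    """Hard (HCM-style) assignment: 1 for the nearest center, 0 for all others."""
    dists = [math.dist(x, c) for c in centers]
    nearest = dists.index(min(dists))
    return [1.0 if i == nearest else 0.0 for i in range(len(centers))]


def fuzzy_membership(x, centers, m=2.0):
    """FCM-style membership: u_i = 1 / sum_k (d_i / d_k)^(2/(m-1))."""
    dists = [math.dist(x, c) for c in centers]
    # If the sample coincides with a center, give it full membership there.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == i else 0.0 for k in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in dists) for di in dists]


# Two cluster centers and a sample between them, closer to the first center.
centers = [(0.0, 0.0), (4.0, 0.0)]
sample = (1.0, 0.0)
print(hard_membership(sample, centers))   # [1.0, 0.0]
print(fuzzy_membership(sample, centers))  # approximately [0.9, 0.1]
```

The hard result only says "cluster 1". The fuzzy result says the sample belongs mostly (0.9) to cluster 1 and slightly (0.1) to cluster 2. This graded information is what fuzzy clustering preserves.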

Let

Bezdek summed up the fuzzy

In formula (

The Lagrange multiplier method is applied to solving the optimization problem of formula (
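The formulas this passage refers to did not survive extraction. For orientation only, the sketch below gives the standard FCM objective (in the form usually attributed to Bezdek) and its Lagrange-multiplier solution. The notation is ours and may differ from the paper's symbols and formula numbers: n samples $x_j$, c cluster centers $v_i$, memberships $u_{ij}$, fuzzifier $m > 1$, and distances $d_{ij} > 0$.

$$
\begin{aligned}
J_m(U,V) &= \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{m}\, d_{ij}^{2}, \quad d_{ij} = \lVert x_j - v_i \rVert, \quad \text{s.t. } \sum_{i=1}^{c} u_{ij} = 1,\ u_{ij} \in [0,1] \\
L &= J_m(U,V) + \sum_{j=1}^{n} \lambda_j \Bigl( \sum_{i=1}^{c} u_{ij} - 1 \Bigr) \\
\frac{\partial L}{\partial u_{ij}} &= m\, u_{ij}^{m-1} d_{ij}^{2} + \lambda_j = 0
  \;\Rightarrow\; u_{ij} = \Bigl( \frac{-\lambda_j}{m\, d_{ij}^{2}} \Bigr)^{1/(m-1)} \\
u_{ij} &= \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}} \quad \text{(eliminating } \lambda_j \text{ with } \textstyle\sum_{i} u_{ij} = 1\text{)} \\
\frac{\partial J_m}{\partial v_i} &= -2 \sum_{j=1}^{n} u_{ij}^{m} (x_j - v_i) = 0
  \;\Rightarrow\; v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}
\end{aligned}
$$

Alternating these two updates, memberships from the current centers and then centers from the current memberships, is the FCM iteration.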

In the FCM clustering algorithm, the initial cluster centers are set randomly. The memberships and cluster centers are then adjusted iteratively according to formula (

Further studies have shown the following. In the standard FCM clustering algorithm, when

First, the HCM clustering algorithm is used to quickly determine the hard cluster centers of the input data. These hard cluster centers then serve as the initial cluster centers for the FCM fuzzy clustering iteration. This speeds up the convergence of the algorithm and yields the fast FCM (FFCM) clustering algorithm. The iterative steps of the algorithm are as follows.
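The two-stage idea can be sketched in plain Python: HCM first, then FCM started from the hard centers. This is a minimal sketch under our own assumptions, because the source steps are truncated:

- Distances are Euclidean.
- m = 2, and FCM stops when no center moves by more than eps = 1e-5.
- HCM is initialized by sampling c data points with a fixed seed.

The helper names `hcm`, `fcm`, and `ffcm` are hypothetical.

```python
import math
import random


def hcm(data, c, max_iter=100, seed=0):
    """Hard C-means (k-means style): returns a list of c hard cluster centers."""
    centers = random.Random(seed).sample(data, c)
    for _ in range(max_iter):
        groups = [[] for _ in range(c)]
        for x in data:
            nearest = min(range(c), key=lambda i: math.dist(x, centers[i]))
            groups[nearest].append(x)
        # Each center becomes its group mean; keep the old center if a group is empty.
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def fcm(data, centers, m=2.0, eps=1e-5, max_iter=300):
    """FCM iterations from the given initial centers.

    Returns (centers, U, iterations), where U[i][j] is the membership of
    sample j in cluster i. Stops when no center moves by more than eps.
    """
    c, n, dim = len(centers), len(data), len(data[0])
    p = 2.0 / (m - 1.0)
    centers = [tuple(v) for v in centers]
    for it in range(1, max_iter + 1):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)).
        U = [[0.0] * n for _ in range(c)]
        for j, x in enumerate(data):
            d = [math.dist(x, v) for v in centers]
            if 0.0 in d:  # sample coincides with a center: crisp membership there
                U[d.index(0.0)][j] = 1.0
            else:
                for i in range(c):
                    U[i][j] = 1.0 / sum((d[i] / dk) ** p for dk in d)
        # Center update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m.
        new = []
        for row in U:
            w = [u ** m for u in row]
            total = sum(w)
            new.append(tuple(sum(wj * x[k] for wj, x in zip(w, data)) / total
                             for k in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < eps:
            break
    return centers, U, it


def ffcm(data, c, m=2.0, eps=1e-5, seed=0):
    """Fast FCM sketch: the HCM hard centers are the initial centers of FCM."""
    return fcm(data, hcm(data, c, seed=seed), m=m, eps=eps)


# Synthetic 2-D points forming two well-separated groups (made-up values).
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers, U, iters = ffcm(data, 2)
# Standard FCM for comparison: random data points as initial centers.
_, _, iters_random = fcm(data, random.Random(1).sample(data, 2))
print(iters, iters_random)
```

Comparing `iters` with `iters_random` mirrors the convergence comparison described in the text. A hard-clustering start that already lies near the fuzzy optimum usually needs fewer FCM iterations, but this sketch does not guarantee that for every data set or seed.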

Determine the number of clusters

Calculate the cluster center

For the

If

Set the iteration number

Calculate the membership matrix

Adjust the cluster center

If

The calculated membership matrix
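The step above is truncated in the source. A common way to read class labels off a final membership matrix is the maximum-membership principle, sketched below. Whether the paper uses exactly this rule is an assumption. The function name `defuzzify` and the example matrix are hypothetical.

```python
def defuzzify(U):
    """Maximum-membership rule: sample j gets the cluster i with the largest U[i][j].

    Ties go to the lowest cluster index, because max() returns the first maximum.
    """
    c, n = len(U), len(U[0])
    return [max(range(c), key=lambda i: U[i][j]) for j in range(n)]


# Hypothetical membership matrix: 3 clusters (rows) x 4 samples (columns sum to 1).
U = [
    [0.80, 0.10, 0.30, 0.05],
    [0.15, 0.70, 0.30, 0.15],
    [0.05, 0.20, 0.40, 0.80],
]
print(defuzzify(U))  # [0, 1, 2, 2]
```

Mapping cluster indices to water quality classes I to V would additionally require matching each cluster center against the class definitions. The sketch leaves that step out.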

To verify the feasibility of the big data processing algorithm and its application to water environment monitoring in the Three Gorges Reservoir Area, we use the massive data collected by the “ultra-large-scale WSN for water environment monitoring of Three Gorges Reservoir Area.” We also refer to monitoring data from the environmental inspection department of Chongqing [_{3}–N) in each group of data; the unit of each is mg/L.

The surface water environmental quality standard divides water quality into five classes, I, II, III, IV, and V. Class I is the best, class V is the worst, and classes II, III, and IV are intermediate. Then we select

First, the cluster centers of the five water quality classes are computed with the HCM and FCM clustering algorithms, respectively. The cluster centers obtained by the two algorithms in simulation are shown in Table

The cluster center of HCM clustering and FCM clustering.

| Class | HCM DO (mg/L) | HCM CODMn (mg/L) | HCM NH₃–N (mg/L) | FCM DO (mg/L) | FCM CODMn (mg/L) | FCM NH₃–N (mg/L) |
|---|---|---|---|---|---|---|
| I | 11.5950 | 1.8594 | 0.1976 | 11.6141 | 1.8361 | 0.1974 |
| II | 9.7224 | 1.8369 | 0.2000 | 9.7662 | 1.7530 | 0.2020 |
| III | 8.3351 | 1.7809 | 0.1760 | 8.3753 | 1.7475 | 0.1770 |
| IV | 8.2311 | 3.6315 | 0.1582 | 8.1945 | 3.2879 | 0.1570 |
| V | 7.0191 | 2.1462 | 0.1788 | 7.0756 | 2.1450 | 0.1794 |

We cluster the selected sample data with the FCM and FFCM clustering algorithms, respectively, and then analyze the convergence and iterative process of the two algorithms. These are shown in Figure

The convergence process of the objective function in FCM clustering and FFCM clustering.

The cluster centers of the five water quality classes are shown in Table

The cluster center of FFCM clustering.

| Class | DO (mg/L) | CODMn (mg/L) | NH₃–N (mg/L) |
|---|---|---|---|
| I | 11.6088 | 1.8371 | 0.1973 |
| II | 9.7600 | 1.7518 | 0.2019 |
| III | 8.3685 | 1.7477 | 0.1769 |
| IV | 8.1980 | 3.2885 | 0.1571 |
| V | 7.0729 | 2.1475 | 0.1794 |
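The sketch below is a quick numerical check on the two tables above. It finds the largest absolute difference between the cluster centers of each pair of methods, using only the values printed in the tables. The helper name `max_abs_diff` is hypothetical.

```python
# Cluster centers from the tables above: (DO, CODMn, NH3-N) in mg/L, classes I..V.
CLASSES = ["I", "II", "III", "IV", "V"]
HCM = [(11.5950, 1.8594, 0.1976), (9.7224, 1.8369, 0.2000), (8.3351, 1.7809, 0.1760),
       (8.2311, 3.6315, 0.1582), (7.0191, 2.1462, 0.1788)]
FCM = [(11.6141, 1.8361, 0.1974), (9.7662, 1.7530, 0.2020), (8.3753, 1.7475, 0.1770),
       (8.1945, 3.2879, 0.1570), (7.0756, 2.1450, 0.1794)]
FFCM = [(11.6088, 1.8371, 0.1973), (9.7600, 1.7518, 0.2019), (8.3685, 1.7477, 0.1769),
        (8.1980, 3.2885, 0.1571), (7.0729, 2.1475, 0.1794)]


def max_abs_diff(a, b):
    """Largest |a - b| over all classes and parameters, as (difference, class, parameter)."""
    params = ["DO", "CODMn", "NH3-N"]
    return max((abs(x - y), cls, p)
               for cls, ra, rb in zip(CLASSES, a, b)
               for p, x, y in zip(params, ra, rb))


print(max_abs_diff(FFCM, FCM))  # largest gap about 0.0068 mg/L (DO, class III)
print(max_abs_diff(HCM, FCM))   # largest gap about 0.3436 mg/L (CODMn, class IV)
```

By this measure the FFCM centers agree with the FCM centers to within about 0.007 mg/L. The HCM centers differ from them by up to about 0.34 mg/L.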

The distribution of water quality under the influence of CODMn, NH₃–N, and DO.

The distribution of water quality under the influence of CODMn and NH₃–N.

The distribution of water quality under the influence of DO and NH₃–N.

The distribution of water quality under the influence of DO and CODMn.

From Figure _{3}–N. By analyzing these three water quality indicators together, we can judge the water quality correctly. The fast fuzzy C-means clustering algorithm considers the three main indicators jointly, divides the 1024 selected sample points into five categories, and determines the water quality level of each.

In Figures _{3}–N is relatively less important. A preliminary water class can be determined mainly from DO and CODMn. Water with DO above 11 mg/L can be classified as class I. When CODMn is below 3 mg/L, the class depends on DO:

- DO between 9 mg/L and 10 mg/L: class II.
- DO between 8 mg/L and 9 mg/L: class III.
- DO below 8 mg/L: class V.

Water with CODMn above 3 mg/L belongs to class IV.
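These preliminary rules can be written as a small decision function. This is a sketch, and several choices in it are our own assumptions:

- The order in which the rules are checked.
- Values exactly on a boundary, for example CODMn = 3 mg/L or DO = 9 mg/L, are assigned to the first matching rule.
- `None` is returned for DO between 10 and 11 mg/L, which the text does not cover.

`preliminary_class` is a hypothetical name.

```python
def preliminary_class(do, codmn):
    """Rough water class from DO and CODMn (both in mg/L), per the thresholds above.

    Rule precedence and boundary handling are assumptions; returns None for
    combinations the text does not cover (DO between 10 and 11 mg/L, CODMn <= 3).
    """
    if do > 11:
        return "I"
    if codmn > 3:
        return "IV"
    if 9 <= do <= 10:
        return "II"
    if 8 <= do < 9:
        return "III"
    if do < 8:
        return "V"
    return None


# Apply the rules to the FFCM cluster centers (DO, CODMn) from the table above.
ffcm_centers = [(11.6088, 1.8371), (9.7600, 1.7518), (8.3685, 1.7477),
                (8.1980, 3.2885), (7.0729, 2.1475)]
print([preliminary_class(do, cod) for do, cod in ffcm_centers])
```

Applied to the FFCM cluster centers, these rules return classes I, II, III, IV, and V in order. This is a simple consistency check between the rules and the clustering result.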

In this paper, a new fast fuzzy C-means clustering algorithm is proposed for big data processing in water environment monitoring of the Three Gorges Reservoir Area. The result of hard clustering is used to initialize fuzzy clustering, which speeds up convergence and improves the efficiency of big data processing. Simulation results show that, compared with HCM clustering and the standard FCM clustering algorithm, this algorithm not only performs fuzzy clustering of the data effectively but also converges faster. The algorithm can quickly and efficiently discriminate water quality in the Three Gorges Reservoir Area. It improves the efficiency of big data processing in water quality testing there and provides a reliable scientific basis for water pollution control in the area. It is applicable to big data analysis and processing in complex ultra-large-scale WSNs and may also offer guidance for big data processing in other fields.

The authors declare that there is no conflict of interests regarding the publication of this paper.

The authors acknowledge the following foundation items: the Scientific and Technological Project of Chongqing (no. cstc2012gg-yyjs40010) and the Natural Science Foundation of Chongqing (no. CSTC, 2008BB2340).