Design of Computational Model for Cyanobacterial Pollutant Diffusion Change in Chaohu Lake of China Based on Large Data

on the normal life of and the economic development. It is urgent to control the cyanobacterial pollutants in Chaohu Lake. In order to improve the scientific soundness and feasibility of control measures, an important prerequisite is to grasp the change of cyanobacterial pollutant diffusion in Chaohu Lake. For this reason, a computational model for cyanobacterial pollutant diffusion in Chaohu Lake, China, was designed based on the relevant large data. The design of the model is divided into three parts: the first part builds an area calculation model to analyze the change of cyanobacterial pollutant diffusion area; the second part builds a concentration calculation model to analyze the change of cyanobacterial pollutant concentration; and the third part combines the previous two to build a diffusion change calculation model to analyze the rule of cyanobacterial pollutant diffusion change in Chaohu Lake. In order to verify the feasibility and validity of the model, simulation experiments were carried out. The results show that, when the model is applied to the large data related to cyanobacteria pollution in Chaohu Lake, China, from May to August 2017, the calculated diffusion changes are basically consistent with the actual situation, which proves the feasibility and validity of the model. This provides data support for cyanobacteria pollution control in Chaohu Lake and improves the efficiency and effect of the control.


Introduction
China is a vast land with many rivers and lakes, such as Poyang Lake in Jiangxi Province, Dongting Lake in Hunan Province, Taihu Lake and Hongze Lake in Jiangsu Province, and Chaohu Lake in Anhui Province, which are the five major freshwater lakes in China and an important part of its freshwater resources. Taking Chaohu Lake as an example, this paper analyzes the change of cyanobacterial pollutant diffusion. Located in central Anhui Province, between the Yangtze River and the Huaihe River, Chaohu Lake covers a total area of 769.5 square kilometers and stretches across 6 counties and cities (Hefei, Feixi, Shucheng, Lujiang, Chaohu, and Feidong), with multiple functions such as transportation, fish breeding, and agricultural irrigation [1]. However, as the economy develops rapidly and the urban population grows quickly, more and more pollutants are discharged into Chaohu Lake, especially industrial wastewater, which causes serious heavy metal pollution of the water. According to satellite monitoring data, the frequency and the maximum area of cyanobacterial outbreaks were higher than those of the same period last year; in the first half of this year, the total nitrogen concentration of Chaohu Lake was basically equal to that of the same period last year, and the total phosphorus concentration increased by 7.32%; the average algae density of the whole lake was 78.1% higher than that of the same period last year [2]. In addition, since this summer, there have been 29 cyanobacterial outbreaks in Chaohu Lake. The maximum outbreak area reached 130 square kilometers, accounting for 17.1% of the area of Chaohu Lake, which not only had a serious impact on the normal life of the surrounding residents but also caused serious losses to local transportation and fishery development. In this context, it is urgent to control the cyanobacterial pollutants in Chaohu Lake, China.
However, because the information on the change of cyanobacterial pollutant diffusion has not been sufficiently mastered, the treatment measures adopted so far could not be targeted at the actual situation. Therefore, based on big data, this paper designs a calculation model for cyanobacterial pollutant diffusion change in Chaohu Lake, China. In order to grasp the changing situation of cyanobacterial pollutant diffusion in Chaohu Lake comprehensively, this model evaluates both images and data and, by integrating all results, produces a comprehensive result that includes the change of the pollutant diffusion area and the concentration distribution in the water body over time [3]. Among them, the area results are obtained from remote sensing data, while the concentration results are obtained from sensor detection data. Finally, in order to test the validity of the calculation model in this paper, the relevant historical data from May to August 2015 were taken as sample data for a simulation test. Comparing the test result with the actual result recorded in that year showed that the data on the change of cyanobacterial pollutant diffusion in Chaohu Lake obtained by the calculation model are close to the actual results. This proves the validity and feasibility of the model, which provides important treatment data for cyanobacterial pollutants in Chaohu Lake, China, and helps improve the water environment of Chaohu Lake to some extent.

Design of Calculation Model for Cyanobacterial Pollutant Diffusion Change in Chaohu Lake of China
Cyanophyta, also known as cyanobacteria, is the most primitive and oldest group of algae, with an extremely wide distribution range and strong reproductive ability. Especially with suitable temperature and rich nutrients, Cyanophyta reproduces excessively, which reduces the oxygen in the water body, affects the growth of other aquatic plants, generates cyanobacterial toxins, harms aquaculture, and disturbs the normal life of surrounding residents with its stench [4]. As one of the five major freshwater lakes, Chaohu Lake has a Cyanophyta outbreak every year, which causes enormous economic loss. In view of this situation, this paper divides the calculation model for cyanobacterial pollutant diffusion change in Chaohu Lake, China, into three parts: the first is the calculation model for the pollutant diffusion area change; the second is the model of pollutant transportation and diffusion concentration; and the third combines the previous two to construct the mathematical model of water quality that describes the pollutant diffusion rule over space and time [5]. The specific content is shown in Figure 1.

Calculation Model for the Pollutant Diffusion Area Change.
Pollutant diffusion area is directly related to the input effort and layout of treatment measures for cyanobacterial pollutants in Chaohu Lake. Therefore, in this section, based on satellite remote sensing images, the distribution and area change of the cyanobacterial outbreak zone in Chaohu Lake are analyzed [6]. The analysis process of the model is divided into three parts: satellite remote sensing image processing of cyanobacterial pollutants, image identification, and area calculation.

Remote Sensing Image Processing.
Remote sensing is a science and technology that acquires information about a target object on the Earth's surface without touching it directly. A lot of information can be obtained from remote sensing images, such as water bodies, vegetation, land, and mountains. However, the original cyanobacterial pollutant images of Chaohu Lake collected by satellite remote sensing must be processed before image analysis, through steps such as image correction, image mosaic, and cutting [7].
(1) Image Correction. Affected by the characteristics of the sensor, ground light conditions (terrain and solar altitude angle), and atmospheric effects, the value measured by the remote sensing equipment is inconsistent with the actual ground spectral radiance; namely, radiometric distortion is generated. In addition, the geometric position, shape, size, location, and other characteristics of surface features on the original image deviate from the expression requirements in the reference system; namely, geometric distortion is generated. Both distortions shall be corrected, and the images shall be restored. Radiometric distortion correction includes sensor correction, topographic correction, and atmospheric correction. Geometric distortion correction mainly involves the selection of control points, the transformation of spatial location (transformation of coordinates), and the resampling of pixel brightness values [8].
(2) Image Mosaic. When the area to be researched cannot be covered by a single remote sensing image, two or more images shall be spliced together to form an image with a wider coverage, so that the collected imagery is more complete. To make an image mosaic, one image is taken as the base image, and the contrast matching, pixel size, and data type of the mosaic image are determined first; then, the other image is inserted into the base image through hue adjustment, removal of overlap, and other processing [9].
(3) Image Cutting. Among the collected remote sensing images of Chaohu Lake, the research target is the area covered by Chaohu Lake rather than the other areas. Therefore, it is necessary to cut away the excess part and keep only the area to be researched, so as to reduce the research difficulty. Because the spectral reflectivity of the water body differs greatly from that of the surrounding areas, region growing can be used directly to generate a polygon, so as to obtain the boundary of Chaohu Lake and cut the Chaohu Lake area out of the remote sensing image along that boundary.
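The region growing step in (3) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `region_grow`, the single-band toy image, the seed position, and the tolerance are all assumptions made for illustration. Starting from a seed pixel known to be water, the mask grows to every 4-connected neighbor whose reflectance is close to the seed value.

```python
from collections import deque

import numpy as np


def region_grow(image, seed, tolerance):
    """Return a boolean mask of pixels 4-connected to `seed` whose value
    differs from the seed value by at most `tolerance`."""
    rows, cols = image.shape
    seed_value = float(image[seed])
    mask = np.zeros(image.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr, nc]:
                if abs(float(image[nr, nc]) - seed_value) <= tolerance:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return mask


# Toy single-band reflectance image (hypothetical values):
# low values stand for water, high values for the surrounding land.
toy = np.array([
    [0.40, 0.42, 0.41, 0.39],
    [0.40, 0.05, 0.06, 0.38],
    [0.41, 0.04, 0.05, 0.40],
    [0.39, 0.40, 0.05, 0.41],
])
lake_mask = region_grow(toy, seed=(1, 1), tolerance=0.03)
lake_only = np.where(lake_mask, toy, np.nan)  # cut away everything outside the lake
```

The boundary of the grown mask plays the role of the lake polygon; on real imagery the seed and tolerance would have to be chosen from the actual water reflectance.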

Identification of Cyanobacterial Pollutants.
Because this paper aims to research the cyanobacterial pollutants in Chaohu Lake, the Chaohu water area image shall be segmented further to determine the distribution scope of cyanobacterial pollutants, so that the pollutant diffusion area can be calculated subsequently.
There are many methods for target identification, such as deep learning algorithms, SVM, and the decision-making tree. Here, the decision-making tree is adopted to identify the cyanobacterial pollutants in Chaohu Lake, China. The identification process is mainly divided into two stages.
(1) The First Stage. Characteristic analysis of cyanobacterial pollutants in Chaohu Lake: Because every substance has different properties and characteristics, substances differ in how they reflect the remote sensing beam, which makes images show different object characteristics. Therefore, the different reflectivity values on the remote sensing images of Chaohu Lake distinguish the water body from the various substances in it. Under this principle, apart from the Chaohu Lake water body itself, all influencing factors, including chlorophyll, suspended matter, and yellow substance, can be expressed through the inherent optical characteristic parameters of the water body. The relationship between these substances and the water surface reflectivity is described by formula (1), in which $Z_w$ represents the water surface reflectivity; $x_w$ refers to the backscattering coefficient of water; $x_s$ means the backscattering coefficient of inorganic suspended matter; $x_p$ represents the backscattering coefficient of algal substances; $y_w$ is the absorption coefficient of water; $y_s$ means the absorption coefficient of suspended matter; $y_p$ represents the absorption coefficient of algal substances; $y_y$ refers to the absorption coefficient of yellow substances.
Because it contains a lot of chlorophyll, Cyanophyta is also called blue-green algae. Therefore, in the event of the mass propagation of Cyanophyta in Chaohu Lake, the content of chlorophyll a in the water will also increase. The concentration of chlorophyll a correlates strongly with the spectral ranges of 550–580, 630–670, and 685–715 nm; namely, when the concentration of chlorophyll a in the water is high, the accumulation area of cyanobacteria will have spectral characteristics similar to those of plants with high reflectivity [10].
(2) The Second Stage. Classification and identification of Cyanophyta pollutants in Chaohu Lake: Based on the above spectral characteristics of Cyanophyta pollutants in Chaohu Lake, the decision-making tree algorithm is used to construct a classifier and identify Cyanophyta pollutants in Chaohu Lake. As a machine learning algorithm, the decision-making tree is a tree structure similar to a flow chart [11]. In a decision-making tree, the top node is called the root node; it contains all the data to be analyzed and is therefore a data set. Branch nodes in the internal structure of the tree represent the best test on an attribute, and each branch represents an outcome of that test; the tree structure also includes leaf nodes, which represent categories [12]. A decision-making tree can be constructed with algorithms such as ID3, CART, and C4.5. Here, the CART algorithm is adopted to construct the decision-making tree [13]. Known as the classification and regression tree, CART is a nonparametric classification and regression method [14]. Its mathematical definition is given in formulas (2) and (3). The basic flow of CART is shown in Algorithm 1.
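As an illustration of how a CART classifier chooses its splits, the following minimal sketch uses the standard Gini definitions, $\mathrm{Gini}(D) = 1 - \sum_k p_k^2$ for a sample set and the size-weighted sum of the Gini indices of the two subsets for a split. These are the textbook forms and are assumed, not confirmed, to match formulas (2) and (3). The two spectral features, the sample values, and the function names are hypothetical, and the split test uses a threshold (`<=`) on continuous features rather than an equality test.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Search every feature and threshold for the binary split with the
    smallest weighted Gini index. Returns (gini, feature_index, threshold)."""
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(rows, labels, min_samples=2, min_gini=1e-9):
    """Recursively build a binary tree; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(labels) < min_samples or gini(labels) <= min_gini:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no feature separates the samples any further
        return majority
    _, feature, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], min_samples, min_gini),
        "right": build_tree([r for r, _ in right], [y for _, y in right], min_samples, min_gini),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the class stored there."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical pixel features: (reflectance near 560 nm, reflectance near 700 nm).
pixels = [(0.03, 0.02), (0.04, 0.02), (0.05, 0.03), (0.09, 0.12), (0.10, 0.15), (0.08, 0.11)]
classes = ["water", "water", "water", "bloom", "bloom", "bloom"]
tree = build_tree(pixels, classes)
```

The stopping tests in `build_tree` mirror the usual CART stop conditions: too few samples in a node, a node that is already (almost) pure, or no remaining split.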

Calculation of Cyanobacterial Pollutant Diffusion Area.
Based on the distribution range of cyanobacterial pollutants in Chaohu Lake identified by the above CART decision-making tree [15], the cyanobacterial pollutant diffusion area can be calculated directly. The calculation proceeds as follows. Given that the density of cyanobacterial pollutants within the polluted area of Chaohu Lake is $\rho$ and the location of the central coordinate in the standard coordinate system is $(x, y)$, the distribution degree coefficient of cyanobacterial pollutants within the polluted area of Chaohu Lake is given by formula (4). In formula (4), $j$ represents the number of pixels on the cut boundary outline of the polluted area of Chaohu Lake; $v$ represents the number of all pixels of the polluted area of Chaohu Lake; $a$ refers to the convection coefficient of water flow; $b$ means the mobile coefficient of boats and ships on the water body. Then, the distribution degree coefficient is used to calculate the partitioning coefficient, as given in formula (5). The partitioning coefficient describes the partitioning between the cyanobacterial polluted area and the nonpolluted area.
Finally, the distribution degree coefficient and the partitioning coefficient of cyanobacterial pollutants within the polluted area of Chaohu Lake are used to calculate the pollutant diffusion area coefficient, as given in formula (6). The pollutant area can then be obtained from the pollutant diffusion area coefficient [16]. However, affected by nonlinear disordered factors such as water flow and the activities of various materials in the water body, the diffusion path is unpredictable. Therefore, error compensation is required [17]. The error is calculated by formula (7), in which $E$ is the diffusion distance offset parameter of pollutants; $t$ represents the spatial location parameters of pollutants; $d$ refers to the diffusion factor; $f$ means the coefficient of pollutant diffusion area; $g$ is the pollutant diffusion direction deviation parameter.
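For orientation only, a much simpler estimate than the coefficient-based calculation described above is to count the pixels classified as polluted and multiply by the ground area of one pixel. The sketch below shows that pixel-counting baseline. It is not the paper's formulas (4) to (7), and the mask layout, the 250 m pixel size, and the function names are assumptions.

```python
import numpy as np


def bloom_area_km2(bloom_mask, pixel_size_m):
    """Area covered by True pixels, in square kilometres."""
    pixel_area_km2 = (pixel_size_m / 1000.0) ** 2
    return float(np.count_nonzero(bloom_mask)) * pixel_area_km2


def bloom_fraction(bloom_mask, lake_mask):
    """Share of lake pixels that are classified as bloom."""
    lake_pixels = np.count_nonzero(lake_mask)
    if lake_pixels == 0:
        raise ValueError("lake mask is empty")
    return np.count_nonzero(bloom_mask & lake_mask) / lake_pixels


# Toy 4 x 4 scene with hypothetical 250 m pixels.
lake = np.ones((4, 4), dtype=bool)
bloom = np.zeros((4, 4), dtype=bool)
bloom[:2, :2] = True  # 4 bloom pixels
area = bloom_area_km2(bloom, pixel_size_m=250.0)  # 4 pixels * 0.0625 km^2 each
share = bloom_fraction(bloom, lake)               # 4 of 16 lake pixels
```

Comparing such a baseline across image dates gives a quick check on the trend of the diffusion area before any error compensation is applied.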

Computational Model for Pollutant Diffusion Concentration Change.
In the preceding section, the cyanobacterial pollutant diffusion range and area were calculated based on remote sensing image information. However, the diffusion range and area coefficients are only one part of the computational model for diffusion change; the other part, the concentration calculation, is also very important. The flow and diffusion of cyanobacterial pollutants in Chaohu Lake is a complicated process because the pollutants spread around, decrease, or disappear under the self-cleaning action of the water body. The distribution and calculation of pollutant concentration in the water body is therefore an important problem in the environmental engineering of Chaohu Lake. In this section, the model of pollution diffusion concentration change is established to calculate the concentration of pollution diffusion [18].
As an unsteady diffusion process, the flow and diffusion of cyanobacterial pollutants in Chaohu Lake can be described by a diffusion model [19]. However, if all factors are considered, the diffusion model will be complicated, and the calculation will be difficult. Therefore, it is necessary to set simplifying hypotheses. Although an exact result cannot be obtained, the result retains a certain accuracy and feasibility [20].
(1) Each section and depth of Chaohu Lake is consistent.
(2) The amount of cyanobacterial pollutants can be represented by a concentration indicator (such as BOD or COD).
(3) The self-cleaning action that reduces pollutants can be regarded as a first-order reaction.
(4) The diffusion movement of cyanobacterial pollutants has a constant diffusion coefficient.
According to Fick's law, the diffusion formula (8) for cyanobacterial pollutants can be obtained. In formula (8), $q$ is the pollutant concentration; $t$ means time; $u$ refers to the diffusion coefficient; $x$ represents the polluted area; $h$ is the water velocity of Chaohu Lake.
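Under hypotheses (1) to (4), a commonly used one-dimensional form of such a model is the advection-diffusion equation with first-order decay. The block below writes it with the symbols listed for formula (8). It is offered as a plausible reading, not as the authors' formula: the decay rate $k$ is an assumed symbol that does not appear in the text, and $x$ is interpreted here as the position along the polluted region.

```latex
\frac{\partial q}{\partial t} + h\,\frac{\partial q}{\partial x}
  = u\,\frac{\partial^{2} q}{\partial x^{2}} - k\,q
```

The diffusion term follows from Fick's law with a constant coefficient $u$ (hypothesis 4), the advection term carries the pollutant with the water velocity $h$, and the last term expresses self-cleaning as a first-order reaction (hypothesis 3).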

Computational Model for Cyanobacterial Pollutant Diffusion Change.
Based on the above calculation of the area and concentration of cyanobacterial pollutants in Chaohu Lake, a computational model for cyanobacterial pollutant diffusion change is constructed, which describes how pollutant diffusion varies over time and space.
The model is given by formula (9), in which $A$ is the area of the pollutant; $C$ is the average concentration of the pollutants; $U$ is the average flow rate; $D_x$ is the turbulent diffusion coefficient; $E_x$ is the longitudinal diffusion coefficient; $K$ is the pollutant degradation coefficient; $h$ is the average depth of the section; $S_r$ is the speed of releasing pollutants from riverbed sediment; $S$ is the amount of pollutants discharged per unit of riverbed per unit of time.
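For readers who want a concrete equation to hold against these symbols, one widely used one-dimensional water quality equation that contains exactly these quantities is shown below. It is an assumption that formula (9) has this form; only the symbol list above comes from the text.

```latex
\frac{\partial (AC)}{\partial t} + \frac{\partial (UAC)}{\partial x}
  = \frac{\partial}{\partial x}\!\left[(D_x + E_x)\,A\,\frac{\partial C}{\partial x}\right]
  - K A C + \frac{A}{h}\,S_r + S
```

If turbulent diffusion, sediment release, and other discharges are dropped ($D_x = 0$, $S_r = 0$, $S = 0$) and $A$ and $U$ are taken as constant, this form reduces to $\partial C/\partial t + U\,\partial C/\partial x = E_x\,\partial^2 C/\partial x^2 - KC$.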
According to experience, the longitudinal movement of water flow in the lake is the main factor leading to the variation of the pollution area and concentration, so the above model can be simplified on this basis. Neglecting the effects of turbulent diffusion, the release of pollutants from lake bed sediments, and the discharge of other pollutants, the computational model for pollutant diffusion change reduces to formula (10).
In this paper, the implicit difference scheme of the finite difference method is adopted to solve the above models. As a numerical method for differential equations, the finite difference method approximates derivatives by finite differences, so as to obtain an approximate solution of the differential equation. The principle diagram is shown in Figure 2. The finite difference method can be represented on the $x$-$t$ grid plane. As shown in Figure 2, $\Delta x$ represents the step size in direction $x$; $\Delta t$ is the step size in direction $t$; $(x_i, t_j)$ represents the grid node formed by the intersection of two grid lines; $(0, t_j)$ and $(x_i, 0)$ represent boundary nodes; $C(x_i, t_j)$ or $C_i^j$ represents the water quality concentration at a grid node.
The implicit difference equations are then used to solve the computational model for pollutant diffusion change: the boundary conditions are inserted first, and the resulting system is solved by the chasing method (the Thomas algorithm for tridiagonal systems).
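The solution procedure can be sketched as follows, under explicit assumptions. The equation solved is the simplified form $\partial C/\partial t + U\,\partial C/\partial x = E_x\,\partial^2 C/\partial x^2 - KC$, which is an assumed reading of the simplified model. Time stepping is backward Euler, advection uses upwind differences (valid for $U \ge 0$), the upstream boundary holds a fixed concentration, and the downstream boundary is zero-gradient. None of these discretization choices, parameter values, or function names are taken from the paper. Each step yields a tridiagonal system that the chasing method solves with one forward sweep and one back substitution.

```python
def thomas(lower, diag, upper, rhs):
    """Chasing method (Thomas algorithm) for a tridiagonal system.
    lower[0] and upper[-1] do not influence the solution."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward sweep: eliminate the lower diagonal
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def implicit_step(conc, velocity, diffusion, decay, dx, dt, inflow):
    """Advance the concentration profile by one implicit time step.
    conc[0] is the upstream boundary node; the last node is zero-gradient."""
    a = velocity * dt / dx        # advection number (upwind, velocity >= 0)
    d = diffusion * dt / dx ** 2  # diffusion number
    m = len(conc) - 2             # number of interior unknowns
    lower = [-(a + d)] * m
    diag = [1.0 + a + 2.0 * d + decay * dt] * m
    upper = [-d] * m
    rhs = list(conc[1:-1])
    rhs[0] += (a + d) * inflow    # known upstream value moves to the right-hand side
    diag[-1] -= d                 # zero-gradient outflow: last node equals its neighbor
    interior = thomas(lower, diag, upper, rhs)
    return [inflow] + interior + [interior[-1]]


# Hypothetical run: 21 nodes, initially clean water, constant upstream concentration 1.0.
profile = [0.0] * 21
for _ in range(50):
    profile = implicit_step(profile, velocity=0.05, diffusion=0.5, decay=0.01,
                            dx=100.0, dt=60.0, inflow=1.0)
```

Because the scheme is implicit, it remains stable for time steps that would make an explicit scheme diverge, which is the usual reason for accepting the cost of solving a tridiagonal system at every step.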

Simulation Test Analysis
After the model is built, simulation experiments are needed to verify its feasibility and effectiveness. The data used in this experiment are all cited from the website of Chaohu Lake China Resources Satellite Ground Application Center.

Experimental Data Set.
In this experiment, data were collected from May to September 2017, generally from 10:00 am to 2:00 pm, when the weather was fine, the wind was light, and the water surface was relatively calm. Water samples were collected twice a month, on the 15th and 30th, from 5 collection points. There are 20 spectrum collection points in total, which are densely arranged in the western half of the lake and sparsely arranged in the eastern half (Table 1).
(1) Sensing Image Set. The sensing image set contains 1600 images of the cyanobacterial pollutant diffusion in Chaohu Lake, as shown in Figure 3.
(2) Water Quality Related Data.

Test Software.
The experiment was carried out on the virtual simulation platform of smart city pollution control, a monitoring and disposal management platform independently developed by UNISOL and directed at various kinds of environmental pollution. By integrating virtual simulation, BIM models, GIS data, the Internet of Things, artificial intelligence, and other complex technologies, this platform is designed to improve real-time monitoring and preventive processing capabilities in wastewater treatment as well as solid waste prevention.
The platform is mainly composed of three modules: environmental monitoring and treatment, emergency simulation and disposal, and urban planning and reporting. It can display the integrated water quality monitoring and manual monitoring data of the surrounding lakes and the monitoring data of pollution sources in a visual way, drive the control value, and simulate the variation of water pollution under different data.

Algorithm 1: Construction of the CART decision-making tree.
Input: Training data set Y of the remote sensing images of cyanobacterial pollutants; conditions for stopping calculation
Output: CART decision-making tree
According to the training data set, conduct the following operations on each node recursively, starting from the root node, and construct a binary decision-making tree:
Step 1: Set the training data set of the node as D and calculate the Gini coefficient of the existing characteristics on this data set. For each characteristic A and each possible value a, divide D into D1 and D2 according to whether the test A = a holds for a sample point, and then calculate Gini(D, A).
Step 2: Among all possible characteristics A and segmentation points a, select the characteristic with the smallest Gini coefficient and the corresponding segmentation point as the optimal characteristic and segmentation point. According to the optimal characteristic and segmentation point, generate two child nodes from the existing node and distribute the training data set into the two child nodes according to the characteristic.
Step 3: Make recursive calls on the two child nodes until the stop conditions are met.
Step 4: Generate the CART decision-making tree.
Note: the algorithm stops when the number of samples in a node is less than a predefined threshold, when the Gini coefficient of the sample set is smaller than a predefined threshold (samples are basically of the same category), or when there are no more characteristics.

Discrete Dynamics in Nature and Society

Test Results.
As can be observed from Tables 2 and 3, the data and status of cyanobacterial pollutant change in Chaohu Lake during the period from May to September 2017 are calculated by using the proposed model. These data are almost consistent with those recorded on the website of Chaohu Lake China Resources Satellite Ground Application Center.
Regarding the analysis of pollution change, in May, the area of cyanobacterial pollutants in Chaohu Lake was small and the concentration was low. In the middle of June, the level of cyanobacterial pollutants in Chaohu Lake reached its peak, and the relevant departments started treatment from then on, so there were fewer pollutants in the middle of July. Then, the monitoring in August and September found that the area and concentration of cyanobacterial pollutants in Chaohu Lake began to increase again, but they remained below the peak in the middle of June.

Conclusions
To sum up, as one of the five largest freshwater lakes in China, Chaohu Lake provides important freshwater resources for transportation, agricultural irrigation, and domestic water. However, the massive and rapid reproduction of cyanobacteria in Chaohu Lake during summer has caused serious water pollution and seriously affected the surrounding residents as well as local economic development. Against this background, it is of great practical significance to study the law of cyanobacterial pollutant change in Chaohu Lake. To this end, a computational model for the cyanobacterial pollutant diffusion change in Chaohu Lake is designed based on big data, which can help grasp the diffusion of cyanobacterial pollutants in Chaohu Lake so as to provide data support for its governance and improve the feasibility as well as the scientific soundness of the treatment plan. Finally, a simulation test is carried out to test the validity of the model, which shows that the results obtained by the proposed computational model are almost consistent with the actual results. Therefore, the model proposed in this paper is shown to be effective, and the research purpose is achieved.

Data Availability
The authors used simulation data, and their model and related hyperparameters are provided in the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.