A Study on Water Pollution Source Localization in Sensor Networks

Water pollution source localization is of great significance to water environment protection. In this paper, a study on water pollution source localization is presented. Firstly, the source detection is discussed. Then, the coarse localization methods and the localization methods based on diffusion models are introduced and analyzed, respectively. In addition, a localization method based on the contour is proposed. Finally, the detection and localization methods are compared in experiments. The results show that the detection method using hypotheses testing is more stable. The performance of the coarse localization algorithm depends on the node density. The localization based on the diffusion model can yield precise localization results; however, the results are not stable. The localization method based on the contour is better than the other two localization methods when the concentration contours are axisymmetric. Thus, in water pollution source localization, the detection using hypotheses testing is preferable in the source detection step. If the concentration contours are axisymmetric, the localization method based on the contour is the first option. In case the nodes are dense and there is no explicit diffusion model, the coarse localization algorithm can be used; otherwise, the localization based on diffusion models is a good choice.


Introduction
Water pollution, one of the accident-prone man-made disasters, is attracting more and more attention. Pollution source localization in water is of great importance to water conservation. There are many existing water pollution source detection and localization methods, such as underwater robots and manual detection. However, underwater robots are expensive and prone to failure and thus cannot work continuously. Manual detection is time-consuming and vulnerable to water terrain and weather conditions. As a result, sensor networks are applied to pollution source localization to overcome the deficiencies of these two methods. The advantages of sensor networks include the following: the node distribution is relatively dense; the monitoring range is large; and the monitoring is not restricted by geographical locations [1,2].
The problem of water pollution source localization is how to locate the pollution source based on known parameters such as node locations, sampling times, and sensing values of nodes. Pollution source detection is the premise of source localization: only when the pollution source has been detected can the monitoring values of nodes be used to locate it.
In this paper, a study on water pollution source localization in sensor networks is presented. The localization problem is discussed theoretically and practically. Firstly, the source detection problem is studied. Then, different water pollution source localization methods are introduced and analyzed. Finally, different source detection and localization methods are tested and compared in experiments.

Pollution Diffusion in Water
In most cases, water pollution disasters are caused by a static source which discharges sewage clandestinely. Before choosing a source localization method, the physical processes of the pollution diffusion must be known.
The diffusion of pollutants in static water is slow, while in flowing water the pollutants migrate with the water and the diffusion is relatively faster.
The diffusion also differs with the background conditions. In this paper, three typical examples are displayed: the diffusion without boundary constraints in static water, the diffusion with a boundary constraint in static water, and the diffusion with water flow. The diffusion with a boundary constraint is different from the diffusion without boundary constraints. Figures 1 and 2 show diffusion simulations in MODFLOW [3,4], which is standard software for the hydrological simulation of pollution diffusion. As shown in the figures, when the diffusion is not affected by the boundary, the concentration contours in the diffusion field are approximate circles; as time goes on, the diffusion is influenced by the boundary, and the concentration contours deform.
An example of the case shown in Figure 1 is the following: in shallow water, there is an instantaneous source at $(x_0, y_0)$, the mass of the pollutant is $M$, and the diffusion coefficient is $D$. The concentration at node $(x_i, y_i)$ is given by diffusion model (1), where $t_0$ is the initial diffusion time and $t$ is the current time.

An example of the case shown in Figure 2 is the following: in shallow water, the water depth is $h$, and there is a continuous source at $(x_0, y_0)$ with the mass flow rate of the pollutant $q_0$; the concentration is given by diffusion model (2).

In dynamic water, the contaminants migrate with the flowing water. An example is that, in shallow water, the water flow is along the $x$ direction with the flow rate $u$, there is an instantaneous source at $(x_0, y_0)$, the mass of the pollutant is $M$, and the diffusion coefficient is $D$. The concentration at $(x_i, y_i)$ is given in [5]. In this case, as the diffusion is affected by the flow, the pollution is located on one side of the source, and the concentration field is as shown in Figure 3.

The cases introduced above are special ones in which the diffusion can be specified by explicit diffusion models. In most cases, the diffusion is influenced by many factors such as shearing flow, turbulent flow, and dispersion, and it is difficult to specify the diffusion processes by diffusion models with explicit expressions.
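As a point of reference for the instantaneous-source cases, the following minimal sketch evaluates the textbook depth-averaged solution of the two-dimensional advection-diffusion equation for an instantaneous point release. It is an assumption that this form corresponds to the explicit models used in this paper; the function name, the depth parameter `h`, and the default values are illustrative only.

```python
import math


def instantaneous_source_concentration(x, y, t, x0, y0, M, D,
                                       t0=0.0, h=1.0, u=0.0):
    """Textbook 2D solution for an instantaneous point release (assumed form).

    (x, y): observation point, (x0, y0): source location, M: pollutant mass,
    D: diffusion coefficient, t0: release time, h: water depth,
    u: flow rate along the x direction (u = 0 means static water).
    """
    tau = t - t0
    if tau <= 0:
        return 0.0  # nothing has been released yet
    dx = x - x0 - u * tau  # the pollutant cloud drifts with the flow
    dy = y - y0
    spread = 4.0 * D * tau
    return M / (math.pi * h * spread) * math.exp(-(dx * dx + dy * dy) / spread)
```

With `u = 0` the contours are circles centered on the source, as in Figure 1; with `u > 0` the cloud centre moves downstream, so the concentration field is symmetric only about the line through the source along the flow.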

Pollution Source Localization
In this part, the background information about the sensor network is given first, followed by the pollution detection; finally, the pollution source localization methods are presented.

Network Deployment.
The self-organizing sensor network is used to monitor the pollution in the water. $N$ ($N > 5$) sensor nodes are deployed uniformly in the monitoring area, and the type of the pollutant to be monitored is known in advance. The detection sensors, which extend into the water, are identified. The locations of the nodes are fixed. After the initialization, the sensor nodes know their own positions. All static nodes in the network sample and store the concentration values synchronously with the same time interval. The background information such as the diffusion coefficient, the water depth, and the sampling time interval is known in advance. The host computer is the data processing center, and the monitoring information is routed to the sink node and processed by the data center. The specific self-organizing scheme and data routing scheme follow [6,7].
(A) The Simple Detection Method. The detection method currently available is the simple detection method, for example, in water pollution monitoring applications using sensor networks [8–13]. In these efforts, the authors consider that if the nodes have monitoring values or the monitoring values are larger than a given threshold $C_T$, there is a pollution event.
Since there is an initial pollution concentration from normal production and daily life, the fact that the sensor nodes have monitored relevant information does not imply that there exists pollution generated by a pollution source. At the same time, in the water environment, there are plankton, garbage, aquatic animals and plants, and so forth, which interfere with water pollution monitoring and disturb the monitoring data. Therefore, it is difficult to determine an empirical threshold in the simple detection method. If the given value is less than the pollution concentration of normal production and domestic sewage, it will induce a high false alarm rate. If the given value is too large, the water area will already be heavily polluted when the network raises the alarm.
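The threshold dilemma described above can be illustrated with a minimal sketch of the simple detection rule. The function name and the sample values are hypothetical and chosen only for illustration; they are not data from the experiments.

```python
def simple_threshold_detection(samples, threshold):
    """Return the index of the first sample above the threshold, or None.

    samples: concentration readings of one node in sampling order.
    """
    for k, value in enumerate(samples):
        if value > threshold:
            return k
    return None


# Hypothetical readings: a background level near 0.05 followed by a real rise.
readings = [0.05, 0.06, 0.04, 0.20, 0.35]
too_small = simple_threshold_detection(readings, 0.03)   # alarms on background
reasonable = simple_threshold_detection(readings, 0.10)  # alarms at the rise
too_large = simple_threshold_detection(readings, 0.50)   # never alarms
```

A threshold below the background level fires on the very first sample, while a threshold that is too large misses the event in this short record.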
(B) The Detection Based on Monitoring Data. To overcome the defects of the simple detection method, a detection method based on hypothesis testing can be used.
The present author previously gave a simple detection method using hypothesis testing [14], in which it is assumed that the distribution of the noise $\varepsilon$ is known. However, in the practical environment, the distribution of $\varepsilon$ is often unknown. The work in this paper can handle this problem, and the specific method is as follows.
In hypotheses testing, there are some empirical values of the significance level [15]. As the detection based on sensing data tests the difference between sensing values, it does not depend on the initial pollution concentration from normal production and domestic sewage, and the detection accuracy is not influenced by a single sample.
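The idea of testing differences between sensing values without knowing the noise distribution can be sketched with a distribution-free test. The example below uses a one-sided sign test on successive differences of one node's readings; this is a swapped-in textbook technique chosen for illustration, not the specific test of this paper, and the function name and sample values are hypothetical. Successive differences of noisy readings are not strictly independent, so the computed p-value is only approximate.

```python
from math import comb


def sign_test_detection(samples, alpha=0.05):
    """One-sided sign test on successive differences of a node's readings.

    H0: increases and decreases are equally likely (no pollution trend).
    Returns True when H0 is rejected at significance level alpha.
    """
    diffs = [b - a for a, b in zip(samples, samples[1:]) if b != a]
    n = len(diffs)
    if n == 0:
        return False
    positives = sum(1 for d in diffs if d > 0)
    # P(X >= positives) for X ~ Binomial(n, 1/2) under H0.
    p_value = sum(comb(n, k) for k in range(positives, n + 1)) / 2 ** n
    return p_value <= alpha


rising = [0.05, 0.06, 0.08, 0.11, 0.15, 0.20, 0.26]
fluctuating = [0.05, 0.04, 0.06, 0.05, 0.07, 0.05, 0.06]
```

Because only the signs of the differences are used, a constant background concentration cancels out and a single outlying sample changes at most two signs.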

The Coarse Localization Algorithms and the Localization Based on Diffusion Models.
The coarse localization algorithms and the localization methods based on diffusion models are often used in water pollution source localization.

(A) The Coarse Localization Algorithms
(1) The Maximum Monitoring Value Point Approach (MPA) [16]. As the sensor node with the maximum monitoring value is assumed to be very close to the pollution source, the location of the sensor node with the maximum monitoring value in the network is taken as the source location.
(2) The Earliest Detection Point Approach (EPA). The source location is taken as the location of the sensor node which detects the pollution first.
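The two coarse rules can be sketched as follows. The data layout (`readings[i][k]` is the value of node `i` at sample index `k`), the function names, and the use of a detection threshold in EPA are assumptions made for illustration.

```python
def mpa(nodes, readings, k):
    """Maximum monitoring value point: node with the largest reading at index k."""
    best = max(range(len(nodes)), key=lambda i: readings[i][k])
    return nodes[best]


def epa(nodes, readings, threshold):
    """Earliest detection point: node whose reading first exceeds the threshold.

    Returns None when no node ever exceeds the threshold; ties are resolved
    in favour of the node listed first.
    """
    best_node, best_k = None, None
    for i, series in enumerate(readings):
        for k, value in enumerate(series):
            if value > threshold:
                if best_k is None or k < best_k:
                    best_node, best_k = i, k
                break
    return None if best_node is None else nodes[best_node]


# Hypothetical three-node example.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
readings = [[0.0, 0.0, 0.3], [0.0, 0.4, 0.9], [0.0, 0.0, 0.2]]
```

Both rules can only return a node position, which is why their accuracy is tied to the node density.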

(B) The Localization Algorithms Based on Diffusion Models.
The mathematical localization algorithms are based on diffusion models, such as [17–20].
If $C(x_i, y_i, t_k, x_0, y_0)$ is the theoretical concentration of node $(x_i, y_i)$ provided by the diffusion model, $\tilde{C}(x_i, y_i, t_k) = C(x_i, y_i, t_k, x_0, y_0) + \varepsilon$ is the corresponding monitoring value with noise $\varepsilon$, and $g_j(x_0, y_0)$, $j = 1, 2, 3, \ldots$, are the related constraints on $(x_0, y_0)$, then, depending on whether the distribution of the measurement noise is known and whether it is normal, many estimation methods are available, such as Maximum Likelihood estimation [21], Bayesian estimation [22,23], the Extended Kalman filter [24], and Least Squares. The commonly used one is the Least Squares as follows:

$$(\hat{x}_0, \hat{y}_0) = \arg\min_{(x_0, y_0)} \sum_{i} \sum_{k} \left[\tilde{C}(x_i, y_i, t_k) - C(x_i, y_i, t_k, x_0, y_0)\right]^2 \quad (8)$$

subject to the constraints $g_j(x_0, y_0)$, $j = 1, 2, 3, \ldots$. The advantage is that this method is simple and can be applied in practical applications when the distribution of $\varepsilon$ is unknown.
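A minimal sketch of the unconstrained Least Squares localization is given below. Two assumptions are made loudly: the theoretical concentration is the textbook instantaneous-source solution (not necessarily the model of this paper), and the minimization is done by an exhaustive grid search rather than by an iterative solver, so that the example has no dependence on initial values. All names and parameter values are illustrative.

```python
import math


def model_concentration(x, y, t, x0, y0, M=1.0, D=1.0, h=1.0):
    """Assumed textbook model: instantaneous release at (x0, y0) at time 0."""
    if t <= 0:
        return 0.0
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return M / (4.0 * math.pi * h * D * t) * math.exp(-r2 / (4.0 * D * t))


def least_squares_grid(nodes, times, measured, x_range, y_range, step):
    """Minimize the sum of squared residuals over a grid of candidate sources.

    measured[i][k] is the reading of node i at times[k].
    Returns (x0_hat, y0_hat, sum_of_squared_errors).
    """
    nx = int(round((x_range[1] - x_range[0]) / step))
    ny = int(round((y_range[1] - y_range[0]) / step))
    best = None
    for a in range(nx + 1):
        x0 = x_range[0] + a * step
        for b in range(ny + 1):
            y0 = y_range[0] + b * step
            sse = 0.0
            for i, (xi, yi) in enumerate(nodes):
                for k, t in enumerate(times):
                    err = measured[i][k] - model_concentration(xi, yi, t, x0, y0)
                    sse += err * err
            if best is None or sse < best[0]:
                best = (sse, x0, y0)
    return best[1], best[2], best[0]
```

The grid search trades speed for robustness: it cannot diverge, but its resolution is limited by `step`, and the cost grows quickly if the mass flow rate or the initial diffusion time must be estimated as well.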

Analyses on the Localization Methods Above
(A) The Coarse Localization Algorithms. The premise of the coarse localization algorithms is that the sampling errors of the MPA node and the EPA node are small. The localization accuracy depends on the density of nodes. Theoretically, if the nodes are dense enough that there is a sensor node at any location in the water area, the pollution source localization would be very accurate.
For the coarse localization algorithms, when the pollution source is in the monitoring area (as shown in Figure 4), the location error is $0 \sim \sqrt{2}d$, where $d$ is the distance between the two farthest neighbor nodes. When all the nodes are far from the pollution source (as shown in Figure 5), the coarse localization algorithms cannot give an accurate estimate. Thus, the location errors are related to the distance of the source from the monitoring area.
(B) The Algorithms Based on Diffusion Models. In the localization algorithms based on diffusion models, there are two key points.
(1) Determining the Diffusion Model. In practical applications, the diffusion is complicated. In many cases, there are no explicit mathematical models of the diffusion.
In fact, one reason for estimation errors in the localization based on diffusion models is that the theoretical diffusion models rest on ideal hypotheses and are not accurate.
(2) How to Solve the Mathematical Problem of the Localization. For example, if the localization problem is a nonlinear Least Squares problem, there are many solving algorithms, such as the interior point trust-region method [25], the Levenberg-Marquardt method [26], and the Reflective Newton method [27]. The results often differ when different solving algorithms are used or when the number of iterations in the numerical calculation is different. In most cases, the unknown parameters are not only the source position but also the mass flow rate and the initial diffusion time, which bring about coupling interference in the estimation.

The Localization Algorithms Based on the Contour.
As analyzed above, both the coarse localization algorithms and the localization based on diffusion models have shortcomings. In this paper, a localization algorithm based on the concentration contour is proposed for the case in which the concentration contours are axisymmetric, like the contours shown in Figures 1 and 2. The localization method is independent of the diffusion models and is discussed in the cases below.

(A) The Source Localization Based on the Contour in Static Water.
The rectangular coordinate system is as shown in Figures 1 and 2; if there is a bank, the direction along the bank is $y$. Under the rectangular coordinate system, the symmetry axis is $y = y_0$. The location of the diffusion source $(x_0, y_0)$ is on the axis of symmetry.
First, if there are two nodes $(x_1, y_1)$ and $(x_1, y_2)$ with the same $x$-coordinate value on the same contour, it can be obtained that

$$y_0 = \frac{y_1 + y_2}{2}. \quad (9)$$

Second, even if there is a concentration superposition effect, the points far from the bank on the contour are still on a circle. Choose any $n$ points marked as $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ to locate the pollution source; one can obtain

$$(x_i - x_0)^2 + (y_i - y_0)^2 = r^2, \quad i = 1, 2, \ldots, n,$$

where $r$ is the circle radius. This can be written in linear form, from which the least squares estimate of the source location is obtained together with the residual $\Lambda$. Based on formulas (9)∼(12), the whole localization algorithm is as follows.
Assumptions. The rectangular coordinate system is as shown in Figures 1 and 2. If there is a bank, the direction along the bank is $y$. There are some nodes with the same unidirectional coordinate value.
Step 1. Give a threshold $\delta$ and let the counting marks be $k = 1$ and $j = 1$.
Step 3. If the number of connected nodes is larger than 4, it can be deduced that the nodes are on the same contour; go to Step 4. Otherwise, $k$ is adjusted to $k = k + 1$ and return to Step 2.
Step 4. Let $n$ be the number of nodes which are on the same contour and obtained in Step 3. Among the $n$ nodes, if there are two nodes $(x_1, y_1)$ and $(x_1, y_2)$ with the same $x$ position, the estimation of $y_0$ can be calculated as (9).
Step 6. Search the minimum value in $\{\Lambda^{(j)}\}$ and let the corresponding location estimation of the minimum value be the ultimate estimation of $x_0$.
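The geometric core of the contour-based localization can be sketched as follows: the symmetric-pair estimate of the axis position, and a least squares circle fit through nodes on one contour with its residual. The fit below is the standard algebraic (Kåsa) circle fit, written as $x^2 + y^2 = 2ax + 2by + c$ with $c = r^2 - a^2 - b^2$; it is an assumption that this linearization resembles the one used in this paper, and all function names are illustrative.

```python
import math


def symmetric_axis_estimate(p1, p2):
    """Two nodes with the same x on one contour mirror each other about y = y0."""
    return 0.5 * (p1[1] + p2[1])


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("points are collinear or too few")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        s = M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x


def fit_circle(points):
    """Algebraic least squares circle fit; returns (center, radius, residual)."""
    rows = [(2.0 * x, 2.0 * y, 1.0) for x, y in points]
    rhs = [x * x + y * y for x, y in points]
    # Normal equations (A^T A) p = A^T b for p = (a, b, c).
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    a, b, c = solve3(AtA, Atb)
    residual = sum((2 * a * x + 2 * b * y + c - (x * x + y * y)) ** 2
                   for x, y in points)
    radius = math.sqrt(max(c + a * a + b * b, 0.0))
    return (a, b), radius, residual
```

Running `fit_circle` on several candidate node subsets and keeping the estimate with the smallest residual mirrors the selection in Step 6; nodes distorted by the bank inflate the residual and are thus less likely to determine the final estimate.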

(B) The Source Localization Based on the Contour in Dynamic Water
Assumptions. The rectangular coordinate system is as shown in Figure 3 and the $y$ direction is perpendicular to the bank. The monitoring area is densely covered by a mesh of nodes. Under the rectangular coordinate system, the symmetry axis is $y = y_0$. As the diffusion is affected by the water flow, the diffusion is unidirectional. Along the flow, the straight line between two nodes on the same contour is parallel to the $y$-axis. In the innermost contour, if there are two nodes $(x_1, y_1)$ and $(x_1, y_2)$, one has $y_0 = (y_1 + y_2)/2$.

The premise of the localization method based on the contour is that the concentration contours are axisymmetric and there are enough nodes on the same contour. For the localization in static water, the selected nodes are on the outer contour and away from the bank boundary. For the localization in dynamic water, if the pollution source is in the monitoring area as shown in Figure 4, under the mesh deployment of nodes, the location error is $0 \sim d$, where $d$ is the distance between neighbor nodes in the same row (line). If the pollution source is out of the monitoring area as shown in Figure 5, the location accuracy is related to how far the pollution source is from the monitoring area.

Experiments
Experiment 1 (the source localization in the concentration field without boundary constraints in static water).
Background. In shallow water, of which the size is 200 cm × 200 cm and the average depth is $h = 100$ cm, there is a continuous source at the center. Starting from $t_0 = 0$, a solution of MgSO4 is discharged into the water. The background of the experiment is shown in Figure 6. The locations of the source and the sampling nodes are shown in Figure 7. In the initial state, the pollutant migrates with the solution flow. After some time, the diffusion becomes stable, and the whole contaminated area can be deemed a point source.
The diffusion process can be depicted by the diffusion model (1). The monitoring concentration values of different sensor nodes are shown in Table 1. The initial observation is at 5 s.
The Detection Using Hypotheses Testing. At different significance levels, the detection results are listed in Table 2.
The Simple Method to Detect the Source. For different thresholds, the detection results are listed in Table 3.
Localization Using Different Methods. The localization problem is to estimate the source location $(x_0, y_0)$ based on the known information such as the node locations, the sampling times, the concentration samples of nodes, and the water depth. The localization results of different methods are shown in Table 4.
In Table 4, the localization based on the diffusion model is the Least Squares method as (8) with no constraint, and the data used are the monitoring values at 20 s. The coarse localization result is the MPA point at 20 s. In the localization based on the contour, the threshold $\delta$ which connects the nodes on the same circle is 0.02 g/L, and the result is the average of the localization result using nodes 0, 2, 4, and 6 and the localization result using nodes 1, 3, 5, and 7. In the experiment, it can be seen that the performance of the localization based on the contour is the best and that of the coarse localization algorithm is the worst.
The Results of Experiment 1. The detection method using hypotheses testing is more stable. In the simple detection, in order to detect the pollution source in a timely manner, the threshold should be as small as possible. The performance of the localization method based on the contour is better than that of the coarse localization algorithm and the localization based on the diffusion model. The results of the localization based on the diffusion models vary with different initial values.
Experiment 2 (the source localization in the concentration field with a boundary constraint in static water).
Background. In shallow water, of which the size is 10 m × 10 m and the average water depth is $h = 10$ m, apart from the impermeable bank, there is a continuous source at $(x_0, y_0) = (1.05, 6.05)$ (m). The pollution solution is discharged into the water from $t_0 = 0$. The mass flow rate $q_0$ is 100 kg/h. The diffusion coefficient is $D = 1$ m²/h.
The diffusion can be depicted by the diffusion model (2). The experiment is studied in a MODFLOW simulation, and the simulation values are shown in Table 5.
The Detection Using Hypotheses Testing. At different significance levels, the detection results are listed in Table 6.
The Simple Method to Detect the Source. For different thresholds, the detection results are listed in Table 7.
Comparing Table 6 with Table 7, the same conclusions as Experiment 1 can be obtained.
Localization Using Different Methods. The localization problem is to estimate the source location $(x_0, y_0)$ based on the known information such as the node locations, the sampling times, the concentration samples of nodes, the water depth, and the diffusion coefficient. The localization results of different methods are as follows.
The Coarse Localization Algorithm. In this experiment, the EPA point and the MPA point are the same. If the nodes are     9.
The Results of Experiment 2. The detection method using hypotheses testing is more stable, and the detection accuracy of the simple detection method depends on the precision of the given threshold. The accuracy of the coarse localization depends on the node density. The results of the localization based on the diffusion model vary with initial values and are not stable. In the localization based on the contour, the effect-influenced points bring about a larger location error; the more effect-influenced points there are, the worse the localization accuracy is.
The Source Detection. The same conclusions as Experiments 1 and 2 can be obtained.
Localization Using Different Methods. The localization problem is to estimate the source location $(x_0, y_0)$ based on the known information including the node locations, the sampling times, and the concentration samples of nodes. As there is no specific diffusion model in this case, only the coarse localization and the localization algorithm based on the contour are tested. The experiment results are shown in Table 10.
In the experiment, the EPA point is the same as the MPA point, which is (9, 0) m. In the localization based on contours, the threshold $\delta$ which connects the nodes on the same circle is 0.02 g/L.
The Result of Experiment 3. The performance of the localization based on the contour is better than that of the coarse localization algorithm.

Performance Analyses Based on the Experiments
All of the above experiment results show that the detection method using hypotheses testing is more stable, and the detection accuracy of the simple detection method is related to the given threshold. The simple detection method can be more timely, but only if the decision threshold is small. However, if the noise in practical applications is considered, small thresholds may bring about large false alarm rates.
The simple localization methods can only be used when the nodes are deployed densely; otherwise, large localization errors may occur. The results of the localization based on the diffusion models vary with different initial values and are not stable. In the numerical calculations, the bounds on the variables are set in advance to ensure convergence of the iterations. The performance of the localization method based on the contour is better than that of the coarse localization algorithms and the localization based on diffusion models when the concentration contours are axisymmetric and most of the nodes participating in the localization are at the same distance from the source.

Figure 1: The concentration field without boundary constraints.

Figure 2: The concentration field with a boundary constraint.

Figure 3: The concentration field in dynamic water.

Figure 4: The pollution source in the monitoring area.

Figure 5: The pollution source off the monitoring area.

Experiment 3 (the source localization in dynamic water).
Background. In the water area, of which the size is [0, 10] m × [0, 20] m, there is a continuous source at the location $(x_0, y_0) = (10, 0)$ (m). The pollution solution is discharged into the water with the mass flow rate $q_0 = 100$ kg/h from time $t = 0$. The interval of sampling time is 1 h. The water flow is along the $x$ direction with the flow rate $u = 1$ m/s. The diffusion coefficient is $D = 0.5$ m²/h. The diffusion model [27] is a time integration and not an explicit model. The simulation tool is MATLAB. The sample nodes are deployed on a mesh grid in the area [0, 10] m × [0, 20] m with an average distance between neighbor nodes of 1 m.

Table 1: The observations of different nodes in Experiment 1.

Table 2: The detection results at different significance levels in Experiment 1.

Table 3: The detection times for different decision thresholds in Experiment 1.

Table 4: The localization using different localization methods in Experiment 1.

Table 8 .
The Localization Based on the Contour. The threshold which connects the nodes on the same circle is 0.01 g/L. At 4.0 h, in the same contour, there are $n_1$ points with the same distance to the source and $n_2$ superimposed effect-influenced points with different distances to the source, and $n_1 + n_2 = 5$.

Table 5: The observations of different nodes in Experiment 2.