A Novel Statistical Model for Water Age Estimation in Water Distribution Networks

The water retention time in a water distribution network is an important indicator of water quality. Water age fluctuates with system demand, and the residual chlorine concentration varies with water age; to first order, the residual chlorine concentration is linearly dependent on the water demand. This paper puts forward a novel statistical model that uses monitoring data of residual chlorine to estimate the nodal water age in water distribution networks, together with a simplified two-step procedure to solve it. The model is verified on two virtual systems and applied to the water distribution system of Hangzhou city, China. The results agree well with those from EPANET. The model provides a low-cost and reliable way to evaluate the water retention time.


Introduction
Water quality deteriorates as the retention time in the water distribution system increases, leading to problems such as disinfection by-product formation, disinfectant decay, corrosion, and taste and odor complaints. Water age is therefore very important for the water quality of a water distribution system, and it depends primarily on the system design and its demands. Brandt et al. [1] reviewed several tools for estimating the retention time and presented examples, but concluded that there is no low-cost, effective, and reliable way to estimate it in all circumstances: a tool that is appropriate in some situations is not necessarily appropriate in others.
There are two types of tools for estimating water age: tracer studies and numerical models. In a tracer study, a chemical is injected into the water distribution system for a fixed period, and sensors at downstream nodes record how long it takes for the water containing the chemical to pass each monitoring station. This method has been applied to calculate the water age throughout water distribution systems and to calibrate water quality and hydraulic models [2][3][4]. Although tracer studies are useful for validating hydraulic and water quality models, they are seldom applied in water distribution networks because of their drawbacks, reported in [1,5]: tracer chemical stability, continuous regulatory compliance, customer perceptions, a lack of studies on larger distribution systems, and high operational cost. Numerical models offer the other way to estimate water age. Steady-state travel time models were proposed by Males et al. [6] and subsequently extended to dynamic representations that determine the time-varying water age throughout distribution systems [7]. A simplified model of water age in tanks and reservoirs was developed in the early 1990s [8]. Many hydraulic network modeling packages incorporate algorithms to calculate the water age at any node in the network [9][10][11][12][13][14].
Water quality is directly related to the operating conditions of the water distribution system, so accurate estimation of water age requires careful hydraulic calibration under varying demand. Unfortunately, numerical models may be limited in their ability to predict water age accurately for the following reasons [1,5]: (1) skeletonization: skeletonization is necessary when the water distribution system contains more pipe segments than the model can handle, which is almost always the case, and its effect on the accuracy of water age estimation differs from system to system; (2) insufficient calibration: in most cases, pipe roughness cannot be estimated accurately; if the overall demand is miscalculated, the model will simulate more or less source and reservoir operation than actually occurs, and a misestimated demand allocation may lead to incorrect flow directions; (3) water storage tanks: most models treat tanks as completely mixed reactors, which leads to misestimation of water age.
Mathematical Problems in Engineering
You et al. [15] showed that the residual chlorine decay per unit length differs with time: in Shenzhen city, for example, it is 0.175 mg/(L⋅km) at peak-demand time, whereas it is 0.459 mg/(L⋅km) at minimum-demand time. They concluded that the retention time is one of the most important factors behind residual chlorine fluctuation in long-distance water distribution systems. Our monitoring data in Hangzhou city show the same behavior. Some water distribution systems have SCADA (supervisory control and data acquisition), through which the residual chlorine concentration in the system can be monitored. In the present paper, a novel statistical model is proposed to estimate the water age in water distribution systems from the SCADA time series of residual chlorine concentration. The model is discussed theoretically and numerically, and it is applied to predict the water age in the water distribution system of Hangzhou city. The statistical model is in good agreement with EPANET 2.0 numerical results.

Governing Equations for Water Quality
A water distribution system consists of pipes, pumps, valves, fittings, and storage facilities that convey water from the source to consumers. A dissolved substance travels along a pipe with the same average velocity as the carrier fluid while reacting (either growing or decaying) at a certain rate. The equations governing water quality are based on the principle of conservation of mass coupled with reaction kinetics [10][11][12]. Longitudinal dispersion is usually negligible, so the conservation of mass during transport within a pipe is described by the classical one-dimensional advection-reaction equation. Advective transport within pipe $i$ is represented as
$$\frac{\partial C_i}{\partial t} = -u_i \frac{\partial C_i}{\partial x} + r(C_i), \tag{1}$$
where $C_i$ is the concentration (mass/volume) in pipe $i$ as a function of distance $x$ and time $t$, $u_i$ is the flow velocity (m/s) in pipe $i$, and $r(C_i)$ denotes the rate of reaction (mass/volume/time) as a function of concentration. When a junction receives inflow from two or more pipes, the fluid is assumed to mix completely and instantaneously. The concentration of a substance in the water leaving the junction is then simply the flow-weighted sum of the concentrations from the inflowing pipes. For a specific node $k$, the concentration is
$$C_i\big|_{x=0} = \frac{\sum_{j \in \Omega_k} Q_j \, C_j\big|_{x=L_j} + Q_{k,\mathrm{ext}} \, C_{k,\mathrm{ext}}}{\sum_{j \in \Omega_k} Q_j + Q_{k,\mathrm{ext}}}, \tag{2}$$
where $\Omega_k$ is the set of pipes with flow into node $k$, $Q_j$ is the flow (m³/s) in pipe $j$, $Q_{k,\mathrm{ext}}$ is the external source flow entering the network at node $k$, and $C_{k,\mathrm{ext}}$ is the concentration of the external flow entering at node $k$; $C_i|_{x=0}$ denotes the concentration at the start of pipe $i$ leaving node $k$, while $C_j|_{x=L_j}$ is the concentration at the tail of pipe $j$ at node $k$.
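The nodal mixing rule (2) is a flow-weighted average. The sketch below, a minimal illustration with a hypothetical function name and made-up flows, computes the concentration leaving a junction from its inflowing pipes and an optional external source.

```python
def junction_concentration(inflows, ext_flow=0.0, ext_conc=0.0):
    """Eq. (2): complete, instantaneous, flow-weighted mixing at node k.

    inflows: (Q_j, C_j at x = L_j) for every pipe j in Omega_k (pipes flowing into the node).
    ext_flow, ext_conc: external source flow Q_k,ext and its concentration C_k,ext.
    Returns C_i at x = 0, the concentration entering every pipe i that leaves the node.
    """
    total_flow = sum(q for q, _ in inflows) + ext_flow
    if total_flow <= 0.0:
        raise ValueError("node receives no inflow")
    mass_rate = sum(q * c for q, c in inflows) + ext_flow * ext_conc
    return mass_rate / total_flow


# Hypothetical junction: 0.03 m3/s at 1.2 mg/L mixed with 0.01 m3/s at 0.4 mg/L
print(junction_concentration([(0.03, 1.2), (0.01, 0.4)]))  # approximately 1.0 mg/L
```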

Water Age Estimation Model
Although more complicated models are available for chlorine decay (e.g., [16]), the first-order decay model is popular because it is convenient to implement. The rate of reaction $r(C_i)$ is
$$r(C_i) = -k_i C_i, \tag{3}$$
where $k_i$ is the decay coefficient in pipe $i$.
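Following the water along its flow path, (1) can be rewritten as
$$\frac{dC_i}{dt} = -k_i C_i. \tag{4}$$
The solution of (4) is
$$C_i = C_0 e^{-k_i t_i}, \tag{5}$$
where $C_0$ is the concentration (mass/volume) at the source, $k_i$ denotes the decay coefficient in pipe $i$, and $t_i$ is the travel time in pipe $i$.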
Water travels from the water station to the consumer through many pipes. In most skeletal pipes, the influence of mixing is neglected. Applying (5) pipe by pipe along the path, the concentration at node $j$ is
$$C_j = C_0 \exp\left(-\sum_{i=1}^{n} k_i t_i\right), \tag{6}$$
where $C_j$ denotes the concentration at node $j$ and $n$ is the number of pipes through which water travels from the source to node $j$.
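Because mixing along the path is neglected, the outlet concentration of each pipe is the inlet concentration of the next, and (6) follows from chaining (5). The sketch below checks this numerically; the path data are hypothetical `(k_i, t_i)` pairs in 1/h and h.

```python
import math


def march_along_path(c0, pipes):
    """Apply Eq. (5) pipe by pipe: the outlet of each pipe feeds the next (no mixing)."""
    c = c0
    for k_i, t_i in pipes:
        c *= math.exp(-k_i * t_i)
    return c


def node_concentration(c0, pipes):
    """Closed form, Eq. (6): C_j = C0 * exp(-sum(k_i * t_i))."""
    return c0 * math.exp(-sum(k_i * t_i for k_i, t_i in pipes))


# Hypothetical path of three pipes: (decay coefficient 1/h, travel time h)
path = [(0.08, 2.0), (0.05, 6.0), (0.06, 4.0)]
print(march_along_path(2.0, path), node_concentration(2.0, path))
```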
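The water age at node $j$ is $T_j = \sum_{i=1}^{n} t_i$, and the average decay coefficient $k$ can be expressed as $k = \sum_{i=1}^{n} k_i t_i / T_j$. Consequently, (6) simplifies to
$$C_j = C_0 e^{-k T_j}. \tag{7}$$
The residual chlorine concentration varies with water age, so variations in residual chlorine reflect fluctuations in water age. Writing $T_{j,t} = L_j / V_{j,t}$, the first-order expansion of (7) near the average water age (i.e., near the average velocity) is
$$C_{j,t} \approx \bar{C}_j + \frac{k L_j \bar{C}_j}{\bar{V}_j^{2}} \left(V_{j,t} - \bar{V}_j\right), \tag{8}$$
where $C_{j,t}$ is the concentration at node $j$ at time $t$, $\bar{C}_j$ is the average concentration at node $j$, $T_{j,t}$ is the water age of node $j$ at time $t$, and $L_j$ is the distance from the source to node $j$. $V_{j,t} = L_j / T_{j,t}$ denotes the average velocity from the source to node $j$ at time $t$, and $\bar{V}_j$ is the average velocity from the source to node $j$, related to $V_{j,t}$ by $\bar{V}_j = (1/N) \sum_{t=1}^{N} V_{j,t}$, where $N$ is the number of samples. Assume that the average velocity is linearly dependent on the water demand of the distribution system. Then
$$C_{j,t} \approx \bar{C}_j + \frac{k \bar{T}_j \bar{C}_j}{\bar{Q}_j} \left(Q_{j,t} - \bar{Q}_j\right), \qquad \bar{T}_j = \frac{L_j}{\bar{V}_j}, \tag{9}$$
where $Q_{j,t}$ denotes the average water demand of the whole water distribution system during the water age of node $j$ at time $t$,
$$Q_{j,t} = \frac{1}{T_{j,t}} \int_{t-T_{j,t}}^{t} Q(\tau)\, d\tau,$$
with $Q(\tau)$ the total system demand, and $\bar{Q}_j$ is the mean value of $Q_{j,t}$, $\bar{Q}_j = (1/N) \sum_{t=1}^{N} Q_{j,t}$. This derivation shows that the residual chlorine concentration at node $j$ is linearly dependent on the average demand during the water retention time, which means that the correlation coefficient between the residual chlorine concentration and the average demand should be close to unity. Because the length from the source to the monitoring point is constant, $T_{j,1} V_{j,1} = T_{j,2} V_{j,2} = \cdots = T_{j,N} V_{j,N}$. Since the average velocity is linearly dependent on the water demand, this can be rewritten as $T_{j,1} Q_{j,1} = T_{j,2} Q_{j,2} = \cdots = T_{j,N} Q_{j,N}$. Equation (9) can thus help us estimate the water age at monitoring points.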
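How good is the first-order expansion in velocity? The sketch below compares (7), written with $T = L/V$, against its linearization (8) for hypothetical values (2 mg/L at the source, $k = 0.06$ h⁻¹, a 10 km path, and a mean velocity of 2 km/h, i.e. a mean water age of 5 h); none of these numbers come from the paper.

```python
import math


def exact_conc(c0, k, length, v):
    """Eq. (7) with the water age written as T = L / V."""
    return c0 * math.exp(-k * length / v)


def linearized_conc(c0, k, length, v, v_bar):
    """Eq. (8): first-order expansion of Eq. (7) in V around V_bar."""
    c_bar = exact_conc(c0, k, length, v_bar)
    return c_bar + k * length * c_bar / v_bar ** 2 * (v - v_bar)


# Hypothetical values: mg/L, 1/h, km, km/h
c0, k, length, v_bar = 2.0, 0.06, 10.0, 2.0
for v in (1.6, 1.8, 2.0, 2.2, 2.4):
    e = exact_conc(c0, k, length, v)
    lin = linearized_conc(c0, k, length, v, v_bar)
    print(f"V = {v:.1f} km/h  exact = {e:.4f}  linear = {lin:.4f}  rel. err = {abs(lin - e) / e:.2%}")
```

The neglected term is second order in $V_{j,t} - \bar{V}_j$, so the linear relation degrades as the demand swings grow.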
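Assuming that the standard deviations of $C_{j,t}$ and $Q_{j,t}$ are constants $\sigma_C$ and $\sigma_Q$, respectively, a statistical model is built to estimate the water age at monitoring node $j$ from the SCADA monitoring data:
$$\max_{T_{j,1},\dots,T_{j,N}} \ \rho_j = \frac{1}{N \sigma_C \sigma_Q} \sum_{t=1}^{N} \left(C_{j,t} - \bar{C}_j\right)\left(Q_{j,t} - \bar{Q}_j\right) \quad \text{s.t.} \quad T_{j,1} Q_{j,1} = T_{j,2} Q_{j,2} = \cdots = T_{j,N} Q_{j,N}, \tag{10}$$
where $Q_{j,t} = (1/T_{j,t}) \int_{t-T_{j,t}}^{t} Q(\tau)\, d\tau$ and $\bar{Q}_j = (1/N) \sum_{t=1}^{N} Q_{j,t}$. The objective function of this model is the correlation coefficient between the residual chlorine concentration and the water demand.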
3.1. Solution Procedure. Equation (10) defines the model for estimating the water age at monitoring nodes. It is not easy to solve directly, because it involves many variables and its constraints are difficult to handle. In this section, a procedure is put forward to solve this optimization problem. It consists of two steps.
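Step 1. Assuming $T_{j,1} = T_{j,2} = \cdots = T_{j,N} = \bar{T}_j$, the objective function in (10) becomes
$$\max_{\bar{T}_j} \ \rho_j(\bar{T}_j) = \frac{1}{N \sigma_C \sigma_Q} \sum_{t=1}^{N} \left(C_{j,t} - \bar{C}_j\right)\left(Q_{j,t} - \bar{Q}_j\right), \tag{11}$$
where
$$Q_{j,t} = \frac{1}{\bar{T}_j} \int_{t-\bar{T}_j}^{t} Q(\tau)\, d\tau.$$
Equation (11) is an unconstrained optimization problem in a single variable, the average water age $\bar{T}_j$ at monitoring node $j$, so $\bar{T}_j$ is easy to estimate.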
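Step 2. Because the distance from the source to monitoring node $j$ is constant, $T_{j,t} Q_{j,t} = \bar{T}_j \bar{Q}_j$, and the water age at node $j$ at any time can be calculated as
$$T_{j,t} = \frac{\bar{T}_j \bar{Q}_j}{Q_{j,t}}. \tag{12}$$
Since the monitoring data from SCADA are discrete, the model is transformed into a discrete model. With sampling period $\Delta t$, the water age at monitoring node $j$ is $T_j = m_j \Delta t$, where $m_j$ is the number of sampling periods. The objective function of (11) can then be expressed as
$$\max_{m_j} \ \rho_j(m_j) = \frac{\sum_{t} \left(C_{j,t} - \bar{C}_j\right)\left(Q_{j,t} - \bar{Q}_j\right)}{\sqrt{\sum_{t} \left(C_{j,t} - \bar{C}_j\right)^2}\,\sqrt{\sum_{t} \left(Q_{j,t} - \bar{Q}_j\right)^2}}, \qquad Q_{j,t} = \frac{1}{m_j} \sum_{i=0}^{m_j-1} Q_{t-i}, \tag{13}$$
where $Q_{t-i}$ is the sampled total system demand. According to (13), the average water age at node $j$ is $\bar{T}_j = \hat{m}_j \Delta t$, where $\hat{m}_j$ maximizes $\rho_j$. Equation (12) can then be transformed to
$$T_{j,t} = \frac{\hat{m}_j \Delta t \, \bar{Q}_j}{Q_{j,t}}. \tag{14}$$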
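The discrete two-step procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the demand window is taken as the $m$ samples ending at $t$, every candidate $m$ is scored on the same range of samples so that the correlations are comparable, and the function names (`pearson`, `trailing_mean`, `estimate_water_age`) and the synthetic series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def trailing_mean(q, t, m):
    """Q_{j,t} in Eq. (13): mean system demand over the m samples ending at index t."""
    if t - m + 1 < 0:
        raise ValueError("not enough demand history for this window")
    return sum(q[t - m + 1 : t + 1]) / m


def estimate_water_age(c, q, dt, m_max):
    """Two-step estimate from a chlorine series c and a total-demand series q (same dt).

    Returns (m_hat, rho_max, ages); ages holds T_{j,t} from Eq. (14) for t = m_max - 1 ... len(c) - 1.
    """
    ts = range(m_max - 1, len(c))  # common range so every candidate m uses the same samples
    c_obs = [c[t] for t in ts]
    # Step 1 (Eq. 13): one-dimensional search over the number of sampling periods m.
    best_m, best_rho = None, -math.inf
    for m in range(1, m_max + 1):
        rho = pearson(c_obs, [trailing_mean(q, t, m) for t in ts])
        if rho > best_rho:  # nan never compares greater, so undefined correlations are skipped
            best_m, best_rho = m, rho
    if best_m is None:
        raise ValueError("correlation undefined for every candidate window")
    # Step 2 (Eq. 14): T_{j,t} = m_hat * dt * Qbar_j / Q_{j,t}.
    q_avg = [trailing_mean(q, t, best_m) for t in ts]
    q_bar = sum(q_avg) / len(q_avg)
    return best_m, best_rho, [best_m * dt * q_bar / qa for qa in q_avg]


# Synthetic check: chlorine built as an exact linear function of the 20-sample (5 h) mean demand.
dt = 0.25  # h, 15-minute sampling
q = [1.0 + 0.3 * math.sin(2 * math.pi * t / 96) for t in range(5 * 96)]
true_m = 20
c = [0.4 + 0.5 * trailing_mean(q, t, true_m) if t >= true_m - 1 else 0.0  # warm-up samples are unused
     for t in range(len(q))]
m_hat, rho, ages = estimate_water_age(c, q, dt, m_max=48)
print(m_hat, m_hat * dt, round(rho, 4))  # expected: 20 5.0 1.0
```

The search in Step 1 is exhaustive over integer $m$, which is cheap because each candidate needs only one correlation; with real SCADA data the maximum will be below 1 and may be flat, so inspecting the whole $\rho_j(m)$ curve is advisable.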

Verification of the Model
To verify the proposed model, two virtual water distribution systems are modeled: the simplest system, consisting of one pipeline, and a complicated multisource water system. Both are also modeled in EPANET 2.0 [14] for comparison.
Scenario 1 (one-pipeline system). The simplest system, consisting of a single pipe 10 km long, is shown in Figure 1. Two demand patterns (Figure 2) are tested. Because the residual chlorine concentration at the source fluctuates under real conditions, white noise at a level of 5% or 20% of the residual chlorine concentration is added in the simulations.
The correlation coefficients between the residual chlorine concentration and the mean demand are shown in Figures 3 and 4. The maximum objective function (correlation coefficient) is very close to 1.0. For the first demand pattern, the maximum objective function is 0.96, the average water age is 4.46 hours, and the maximum relative error is 3%. For pattern 2, the maximum objective function and the average water age are 0.97 and 4.7 h, respectively. Figures 5 and 6 show the water age computed by the proposed model and by EPANET 2.0. The maximum error is less than 0.3 hours for the first pattern and 0.5 hours for the second. Figures 5 and 6 indicate that the statistical model agrees well with EPANET 2.0 and that the white noise has little influence on the results.
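A Scenario 1 style experiment can be imitated without EPANET by treating the pipe as plug flow: the outlet water age at time $t$ is the time needed, looking backwards, for the travelled distance to reach the pipe length. The sketch below assumes velocity proportional to demand and multiplicative 5% white noise on the source concentration; the demand pattern, decay coefficient, and proportionality constant are hypothetical, not the paper's Figure 2 patterns.

```python
import math
import random


def outlet_water_age(v, t, length, dt):
    """Plug-flow water age (h) at the pipe outlet at sample t.

    v[i] is the velocity (km/h) over the sampling interval ending at sample i.
    """
    travelled, age = 0.0, 0.0
    for i in range(t, -1, -1):
        step = v[i] * dt
        if travelled + step >= length:
            return age + (length - travelled) / v[i]  # interpolate within the last interval
        travelled += step
        age += dt
    raise ValueError("velocity history too short for this pipe length")


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


random.seed(1)
dt, length, a = 0.25, 10.0, 2.0  # h, km, km/h per unit demand (hypothetical)
k, c0, noise = 0.06, 2.0, 0.05   # 1/h, mg/L, 5 % white noise on the source concentration
n = 10 * 96                      # ten days of 15-minute samples
q = [1.0 + 0.4 * math.sin(2 * math.pi * i * dt / 24.0) for i in range(n)]  # daily demand pattern
v = [a * qi for qi in q]         # velocity assumed proportional to demand
ts = range(96, n)                # skip day 1 so every sample has enough history
ages = [outlet_water_age(v, t, length, dt) for t in ts]
c = [c0 * (1.0 + random.gauss(0.0, noise)) * math.exp(-k * age) for age in ages]

m = round(sum(ages) / len(ages) / dt)  # demand window close to the mean water age, in samples
q_avg = [sum(q[t - m + 1 : t + 1]) / m for t in ts]
print(f"mean age {sum(ages) / len(ages):.2f} h, range {min(ages):.2f}-{max(ages):.2f} h, "
      f"corr(C, Q_avg) = {pearson(c, q_avg):.3f}")
```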
Scenario 2 (multisource system). A multisource system is shown in Figure 7. It consists of 91 junctions, 112 pipelines, and two reservoirs as water sources. The water age is estimated at nodes 193 and 211. Node 211 is supplied by the source RIVER. At node 193, 4% to 30% of the water is supplied by the source LAKE, and the rest by the source RIVER.
It is assumed that the residual chlorine concentration at both sources is 2.0 mg/L and that the decay coefficient is 2.0 d⁻¹. Figure 8 shows the water age at node 211. The statistical model predicts an average water age of 26 hours at node 211, with a maximum relative error of 7%. At node 193, the water age is about 25 hours by the statistical model and about 30 hours by EPANET (Figure 9). Thus, our model agrees well with EPANET at node 211 but is less accurate at node 193. There are two possible reasons for the error. First, the system is nonlinear, whereas the statistical model is based on a linear hypothesis. Second, the water mixes: the water flowing into node 193 comes from two sources, and the routes from the two sources to node 193 are complicated. The statistical model should be used with great care at nodes supplied by multiple sources, because the mixed water and complicated routes may reduce its accuracy. At nodes supplied by a single source, the statistical model makes good predictions.

Application in Hangzhou City
Hangzhou city, the capital of Zhejiang province, is located in eastern China. Its water distribution system consists of 2,639 km of pipelines and 5 sources, as shown in Figure 10.
It supplies over 10⁶ m³ of water per day. Figure 10 illustrates the water distribution network. Fourteen water quality monitors distributed across the system record the residual chlorine concentration and turbidity every 15 minutes. Five monitors (S1, S2, S3, S4, and S5) are installed at water stations. Two monitors (M7 and M8) lie near the division line between different sources, like node 193 in Scenario 2, so their residual chlorine concentration changes in a complicated way; Scenario 2 shows that the statistical model is not very accurate for this type of monitor. Two nodes (M2 and M9) are close to the stations, where the residual chlorine concentration follows the fluctuation at the source. For example, the residual chlorine concentration at monitor M2 fluctuates with that at S3, as shown in Figure 14. Long-term monitoring data indicate that the residual chlorine concentrations at monitors M1, M3, M4, M5, and M6 fluctuate periodically. Thus, five nodes (M1, M3, M4, M5, and M6) satisfy the requirements of the statistical model, and the model is used to estimate the water age at these five monitoring points. Details of the modeling for these five nodes are given below.
Figure 11 shows the total network demand and the residual chlorine concentrations at water stations and monitoring points. Red lines are data filtered by a wavelet filter. The filtered data reveal two properties of the residual chlorine concentration. (1) Periodicity: the monitored residual chlorine data are very noisy, which makes this feature difficult to see directly. The wavelet filter is an excellent tool for removing the noise, and the filtered data clearly show a 24-hour periodicity. The fluctuation range varies from day to day. The residual chlorine concentration is always higher in the late evening and lower in the early morning. (2) Detention: changes in the residual chlorine concentration lag behind changes in system demand. For example, owing to storage in the water distribution system, the highest residual chlorine concentration at M4 appears between 4:00 PM and 7:00 PM, whereas the maximum demand occurs at about 8:00 AM; the lowest residual chlorine concentration appears between 6:00 AM and 8:00 AM, whereas the minimum demand occurs at about 4:00 AM. One may argue that the fluctuation of residual chlorine at a monitor comes from the source. According to (8), however, a concentration fluctuation at the source decays by the time the water arrives at the monitor. Figure 11 also shows that the amplitude of fluctuation at the water stations is lower than that at the monitoring points, as described by You et al. [15], and most monitoring data in Hangzhou show that the fluctuation in the water distribution network is larger than that at the water source. Thus, the concentration fluctuations at the monitors are caused by the detention time.
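The text does not specify which wavelet or thresholding rule the filter uses. As one possible instance, the sketch below denoises a synthetic 24-hour chlorine cycle with a multilevel Haar transform and soft thresholding of the detail coefficients; the wavelet, the number of levels, the threshold (three times an assumed-known noise standard deviation), and the signal itself are all assumptions for illustration. A wavelet library such as PyWavelets would normally be used in practice.

```python
import math
import random


def haar_denoise(x, levels, threshold):
    """Multilevel Haar wavelet denoising: forward transform, soft-threshold the detail
    coefficients, inverse transform. len(x) must be divisible by 2**levels."""
    if len(x) % (2 ** levels):
        raise ValueError("series length must be divisible by 2**levels")
    s = 1.0 / math.sqrt(2.0)
    approx, details = list(x), []
    for _ in range(levels):  # forward transform, finest level first
        pairs = range(len(approx) // 2)
        details.append([(approx[2 * i] - approx[2 * i + 1]) * s for i in pairs])
        approx = [(approx[2 * i] + approx[2 * i + 1]) * s for i in pairs]
    for d in reversed(details):  # shrink details, then invert, coarsest level first
        d = [math.copysign(max(abs(v) - threshold, 0.0), v) for v in d]
        rec = []
        for a_i, d_i in zip(approx, d):
            rec.extend(((a_i + d_i) * s, (a_i - d_i) * s))
        approx = rec
    return approx


def rmse(a, b):
    return math.sqrt(sum((u - w) ** 2 for u, w in zip(a, b)) / len(a))


random.seed(0)
n = 4 * 96  # four days at 15-minute sampling
clean = [0.8 + 0.2 * math.sin(2 * math.pi * t / 96) for t in range(n)]  # hypothetical 24 h cycle, mg/L
noisy = [c + random.gauss(0.0, 0.05) for c in clean]
smooth = haar_denoise(noisy, levels=3, threshold=0.15)  # 3 x the (assumed known) noise std
print(f"RMSE noisy {rmse(noisy, clean):.3f}  filtered {rmse(smooth, clean):.3f}")
```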
The statistical model needs a large number of monitoring samples. The decay coefficient of residual chlorine varies with water temperature. The decay coefficient is different … There are 1440 samples for each monitoring point. Figure 12 shows the objective functions at different water ages for the five monitors, from which the optimal average water age for each monitoring point can be read directly. Figure 13 shows the water age at different times. Table 1 lists the average water age at the five monitors estimated by the model; the results agree well with those estimated by the EPANET model. The fluctuation range of the water age estimated by EPANET is larger than that of the present model: about 15% to 25% in the present model versus about 20% to 35% in the EPANET model. There are currently no measurement data to determine which model is more reliable; nevertheless, the present model provides an acceptable estimate of the water age.
Table 1 also shows that the simpler the route from the source to a monitor, the larger the objective function. M4 has the simplest route, and its correlation coefficient between residual chlorine concentration and total demand reaches 0.95, very close to 1. M5 has the second simplest, with a correlation coefficient of 0.90. The routes from the sources to M1, M6, and M3 are more complicated, so their correlation coefficients are lower. Mixing of water in the distribution network may therefore reduce the accuracy of the statistical model.
Besides the EPANET model, another method is used to verify the model. If a monitor is very close to a source, the residual chlorine concentration at the monitoring point fluctuates with that at the source, and the water age can be determined directly from the monitoring data sequences at the monitoring point and at the source. For example, if the residual chlorine at the source rises to a peak of 2 mg/L at 9 o'clock and the concentration at the downstream monitoring point rises to a peak of 1.5 mg/L at 10 o'clock, the water age at the monitoring point is 1 hour, and the decay coefficient can be calculated directly from (7). If the present statistical model is reliable, the decay coefficients estimated by the two methods should agree. The water at M2 and M3 is supplied by S3. Since M2 is close to S3, the residual chlorine concentration at M2 varies with that at S3, as shown in Figure 11. From the impulses of the residual chlorine concentration at S3 and M2, the water age at M2 is about 2.5 hours (about 3 hours by EPANET). The average residual chlorine concentration is 1.479 mg/L at S3 and 1.267 mg/L at M2. Thus, the average decay coefficient is $k = \ln(C_0/C)/T = \ln(1.479/1.267)/2.5 \approx 0.0619\ \mathrm{h^{-1}}$. The average residual chlorine concentration at M3 is 0.456 mg/L, and its average water age estimated by the statistical model is about 20.75 hours, giving an average decay coefficient of $k = \ln(1.479/0.456)/20.75 \approx 0.0567\ \mathrm{h^{-1}}$. The relative difference between the decay coefficients from the two methods is less than 10%, which further illustrates the reliability of the statistical model.
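The cross-check above is a direct application of (7), $k = \ln(C_0/C)/T$. The snippet below evaluates it for the illustrative 2 mg/L to 1.5 mg/L example and for the S3/M2 and S3/M3 concentrations and water ages quoted in the text; note that a first-order decay coefficient has units of inverse time (here h⁻¹). The function name is hypothetical.

```python
import math


def decay_coefficient(c_upstream, c_downstream, travel_time_h):
    """First-order decay coefficient from Eq. (7): k = ln(C0 / C) / T, in 1/h."""
    return math.log(c_upstream / c_downstream) / travel_time_h


print(f"{decay_coefficient(2.0, 1.5, 1.0):.4f}")  # illustrative peak-to-peak example: 0.2877
k_m2 = decay_coefficient(1.479, 1.267, 2.5)       # S3 -> M2, age from the impulse lag
k_m3 = decay_coefficient(1.479, 0.456, 20.75)     # S3 -> M3, age from the statistical model
print(f"{k_m2:.4f} {k_m3:.4f} {abs(k_m2 - k_m3) / k_m2:.1%}")  # 0.0619 0.0567 8.4%
```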

Limitations
Like the other tools for estimating water age summarized by Brandt et al. [1], the present statistical model has limitations. The application in Hangzhou city shows that the model does not work well when the monitor is located near the division line between different sources, such as M8. The data from monitor M8 are periodic, but the correlation coefficient between its residual chlorine concentration and the demand is negative. This is because the node lies on the division line between sources S4 and S5 and is supplied alternately by the two sources. If the water at a monitoring point is supplied by two or more sources, the model's predictions will be less accurate or the model will fail, and engineers should handle such cases carefully. The second limitation arises when a monitor (such as M2 or M9) is too close to the source. In that case, the effect of the retention time becomes unimportant: the periodic fluctuation is masked by noise, or the concentration at the monitor simply follows that at the source, so the model fails. It is suggested that the statistical model be applied only at nodes where the water age exceeds 5 hours.
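These limitations can be turned into a simple screening check before trusting the estimate at a monitor. The sketch below is a hypothetical helper, not part of the paper's method: the 5-hour age limit follows the suggestion above, the rule on non-positive correlation reflects the M8 case, and the `min_rho` cut-off is an assumed value.

```python
def screen_monitor(rho_max, mean_age_h, n_sources, min_age_h=5.0, min_rho=0.5):
    """Return reasons the statistical model may be unreliable at a monitor (empty list = no flag).

    min_age_h follows the 5-hour suggestion in the text; min_rho is a hypothetical cut-off.
    """
    reasons = []
    if rho_max <= 0.0:
        reasons.append("correlation with demand is not positive (e.g. alternating supply)")
    elif rho_max < min_rho:
        reasons.append(f"weak correlation with demand ({rho_max:.2f} < {min_rho})")
    if n_sources > 1:
        reasons.append("node is supplied by more than one source")
    if mean_age_h < min_age_h:
        reasons.append(f"mean water age {mean_age_h:.1f} h is below {min_age_h} h")
    return reasons


print(screen_monitor(0.95, 12.0, 1))  # []
print(screen_monitor(-0.3, 15.0, 2))  # two reasons: negative correlation, two sources
print(screen_monitor(0.6, 2.5, 1))    # one reason: too close to the source
```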

Conclusions
A statistical model to estimate the water age of water distribution networks has been proposed in the present paper. The model is based on the solution of the advection transport equation governing residual chlorine, and a simple two-step solution procedure has been given. The model was tested on the simplest one-pipeline system and on a multisource system. The numerical tests indicate that the model agrees well with the EPANET model when the monitoring points are not near the water division line. The model was also applied to the water distribution system of Hangzhou city, and the results agree well with those estimated by the EPANET model. A comparison with decay coefficients estimated by another method confirms that the results obtained by the statistical model are reliable. The method requires neither the complicated calibration of numerical models nor the high operating cost of tracer studies. If a water distribution system has SCADA to monitor the residual chlorine concentration, this statistical model is a simple and effective tool for estimating the system's water age at no additional cost.

Figure 7: Water distribution network for case 2.

Table 1: Water age at monitors.
Another serious problem is the drift error of chlorine sensors. Drift appears in chlorine sensors that have been working for a long time and becomes gradually more serious, and calibration breaks the continuity of the monitoring series. The more monitoring samples there are, the more unpredicted factors affect the model, so samples should be selected from periods when the water temperature and the sensors are stable. It is believed that 15 to 30 days of samples are enough for this statistical model. In the present paper, 15 days of samples are used to estimate the water age, with a sampling period $\Delta t$ of 15 minutes.
Figure 11: Residual chlorine concentration at water stations and monitoring points, and the total water demand. Red solid lines are filtered data.
Nomenclature
$Q_{j,t}$: Average total water demand in the whole system during the water age of node $j$ at time $t$
$T_{j,t}$: Water age at node $j$ at time $t$
$r(C)$: Rate of reaction (mass/volume/time), function of concentration
$V_{j,t}$: Average velocity from the source to node $j$ at time $t$
$\bar{V}_j$: Averaged velocity for all time
$u_i$: Flow velocity (length/time) in pipe $i$
$\Omega_k$: Set of pipes with flow into node $k$
$\sigma$: Standard deviation of …
Figure 12: Objective function for different water age.
Figure 14: Residual chlorine concentration at M2, S3, and M3. Solid line is at S3, dashed line is at M2, and dotted line is at M3.