Applying Hybrid Heuristic Approach to Identify Contaminant Source Information in Transient Groundwater Flow Systems

Simultaneous identification of the source location and release history in aquifers is complicated and time-consuming if the release of groundwater contaminant source varies in time. This paper presents an approach called SATSO-GWT to solve complicated source release problems which contain the unknowns of three location coordinates and several irregular release periods and concentrations. The SATSO-GWT combines with ordinal optimization algorithm (OOA), roulette wheel approach, and a source identification algorithm called SATS-GWT.The SATS-GWT was developed based on simulated annealing, tabu search, and threedimensional groundwater flow and solute transport model MD2K-GWT.The OOA and roulette wheel method are utilized mainly to reduce the size of feasible solution domain and accelerate the identification of the source information. A hypothetic site with one contaminant source location and two release periods is designed to assess the applicability of the present approach.The results indicate that the performance of SATSO-GWT is superior to that of SATS-GWT. In addition, the present approach works very effectively in dealingwith the cases which have different initial guesses of source location andmeasurement errors in themonitoring points as well as problems with large suspicious areas and several source release periods and concentrations.


Introduction
The issues of identifying the source location and/or recovering the release history of a groundwater contaminant plume are getting more and more public concern recently.In some countries, groundwater is an important source for drinking water and agricultural use.If a site is found to have groundwater contamination, the source information including the location as well as release concentration and period should be determined before taking remedial actions.Site remediation is usually quite costly, so the responsible parties for the contamination should be recognized via the source identification works.In addition, incorrect information on contaminant source may confuse or mislead remediation plan.The technique for identifying contaminant source location and its release history is therefore important in dealing with the groundwater contamination problem.Moreover, if the source release varies in time, the estimation of the actual source information becomes rather complicated and difficult.Thus, there is a need to develop an effective approach for identifying the contaminant source location and its release history based on the concentration measurements.
Identification of unknown contaminant sources in groundwater is an inverse problem.Mathematically, the processes of contaminant transport in groundwater are irreversible and their inversion solutions are sensitive to the errors in the observation data, especially when data are sparse or missing (e.g., [1][2][3][4][5][6]).Atmadja and Bagtzoglou [1] pointed out that the groundwater source identification is an ill-posed problem because the solution may not be unique and stable.They reviewed the available methods for source location identification and the release history reconstruction and classified them into four categories: optimization approaches, probabilistic and geostatistical (GS) simulation approaches, analytical solution and regression approaches, and direct approaches.Tracking the contaminant source location usually needs to run forward simulations with an initial guess solution and then to search the best-fitted solution via an optimization approach.Probabilistic and GS simulation approaches employ several probabilistic and statistical techniques to assess the probability of the location of the sources (Sun [7]).Atmadja and Bagtzoglou [1] indicated that this approach is applicable only when 2 Mathematical Problems in Engineering the location of the potential source is known in advance.Regression approaches along with analytical solution can estimate all the parameters simultaneously but work well only for simple aquifer geometry and flow conditions.Direct approaches reconstruct the release history by solving governing equation directly.
For the third-type problem, Aral and Gaun [35] proposed an approach called improved genetic algorithm (IGA) to determine the information of source location, leak rate, and release period.They showed that the results obtained from the IGA match with those obtained from linear and nonlinear programming approaches.Later, they further developed an iterative approach based on genetic algorithm (GA) algorithm, defined as progressive genetic algorithm (PGA), in conjunction with a finite element groundwater flow model (Aral et al. [43]) to identify the coordinates of source location and release history in a two-dimensional steadystate groundwater flow system.In their case study, the source release history was assumed to be over 180 months and four monitor wells were installed at the downstream of the source location.The contaminant concentration was sampled at each well every 3 months.Therefore, there were totally 240 observation data used in identifying the source location and reconstructing the release history.Mahar and Datta [2] used an optimal source identification methodology based on embedded optimization models to estimate the release concentrations of multiple hypothetical contaminant sources over discrete time intervals using breakthrough curve data.Their study successfully identified the source information for flow in both steady and transient states.Bagtzoglou [36] modified the reversible-time particle tracking method by introducing a variance minimization procedure to backtrack groundwater solute concentration profiles and identify the most probable source location.Bagtzoglou and Atmadja [37] presented a comprehensive literature review on the mathematical methods for hydrologic inversion and the identifications of the contaminant source location and timerelease history.Neupauer and Lin [38] extended the work of Neupauer and Wilson [44] by conditioning the backward probability density functions of source location to measured concentration data.Sun et al. [39] employed a constrained robust least squares (CRLS) method to recover the release history of a single source, and the results of CRLS in their hypothetic example are better than several classic methods (i.e., ordinary least squares (LS), standard total least squares (TLS), and nonnegative least squares (NNLS)).In addition, they further employed the CRLS combined with a branchand-bound global optimization solver for identifying source locations and release histories (Sun et al. [40]).In their numerical examples, a two-dimensional and steady-state flow field was developed.Totally 57 sampled observation data from eleven observation wells were utilized for release history identification.Later, Sun [7] developed a robust version of the GS approach, namely, the robust geostatistical (RGS) approach, to explicitly illustrate the contaminant release history identification in a 2D heterogeneous aquifer with the hydraulic conductivity exhibiting spatially lognormal distribution.Yeh et al. [15] constructed a source identification model, SATS-GWT, by combining simulated annealing (SA), tabu search (TS), and MF2K-GWT, to identify the constant source release problem.Their model can determine the information of contaminant source with a constant release rate in a three-dimensional (3D) transient groundwater flow system.Ababou et al. [41] presented a new and stable methodology for pollutant source identification in terms of unknown initial position and past history based on the reverse antidiffusive random walk scheme.Butera et al. [42] introduced the simultaneous release function and source location identification (SRSI), which was capable of simultaneously identifying the source location and release history of the contaminant in 2D confined aquifers with strongly nonuniform flow fields.
Ho et al. [45] presented an approach called ordinal optimization algorithm (OOA) which can solve complex optimization problems, which usually require huge amount of computing time in obtaining the optimal solution, effectively and accurately.The OOA is suitable for solving optimization problems with sifting the most possible solution for further evaluations (e.g., [46][47][48][49]).
This study aims at developing a novel approach called SATSO-GWT to identify the source location and release history in a 3D, heterogeneous, and transient groundwater flow system.The approach combines the model MD2K-GWT for simulating the groundwater flow and pollutant transport with heuristic optimization techniques such as SA, TS, OOA, and roulette wheel method.This new approach has the advantages of avoiding possible trap in a local optimum and improving the computational efficiency when the searching space of problem becomes very large.A hypothetic case is design to assess the applicability of the SATSO-GWT.In addition, three cases are considered to assess the performance of SATSO-GWT.They are (1) different initial guesses of source location, (2) various measurement errors in the monitoring points, (3) a large suspicious area with six release periods and concentrations.

Methodology
2.1.Groundwater Flow and Transport Simulation.Darcy's law can be written as (Konikow et al. [50]) where   is a vector of the average linear velocity of groundwater flow [L/T],  is the effective porosity (dimensionless),   is the hydraulic conductivity tensor of the porous media [L/T], ℎ is the hydraulic head [L], and   are the Cartesian coordinates.Combining Darcy's law with the continuity equation, the 3D groundwater flow equation can be expressed as (Konikow et al. [50]) Mathematical Problems in Engineering 3 where   is the specific storage and  is the volumetric flux per unit volume (positive for inflow and negative for outflow [1/T]).Equation ( 2) can be used to predict the hydraulic head distribution in a 3D groundwater flow system.The governing equation for 3D solute transport in groundwater can be written as (Konikow et al. [50]) where  is the contaminant concentration [M/L 3 ],   is a second-order tensor of the dispersion coefficient [L 2 /T], and   is the concentration of the source or sink fluid [M/L 3 ].The average linear velocity   can be determined by ( 1).
The computer model MF2K-GWT developed based on ( 2) and ( 3) by the United State Geological Survey can be used to simulate the groundwater flow and contaminant transport simultaneously.This model combines the modular 3D finite-difference ground-water flow model, MODFLOW-2000 (Harbaugh et al. [51]), and the 3D method-ofcharacteristics solute-transport model (MOC3D) (Konikow et al. [50]) to simulate groundwater flow field and spatial and temporal plume distribution.

Simulated Annealing.
Press et al. [52] mentioned that SA is a technique suitable for solving large-scale optimization problems.The concept of SA is based on an analogy to crystallization of a solid annealing from a high temperature state.If the temperature is cooled properly, a most stable crystalline structure of the solid will be obtained with minimum energy state.The possible solution spaces for a problem to be solved looks like different crystalline structures and the optimal solution of the problem is equivalent to the most stable crystalline structure.
In the SA, the Metropolis mechanism (Metropolis et al. [53]) is employed to determine the acceptance of adjacent solution.This mechanism dictates that the SA is capable of accepting bad trial solution to avoid the problem of being trapped in the local optimal solution.More details of the introduction of SA are available in Metropolis et al. [53] or Yeh et al. [15].The SA has been successfully applied to various types of problems such as aquifer parameter estimation (e.g., [54][55][56]), pipe wall surface reaction rate (e.g., [57]), and pumping source information (e.g., [58]).[59] proposed two main concepts of TS: memory and learning.Through memory and learning, the TS is of more intensification and diversification in algorithm.Memory means to memorize the past solutions to avoid the repetition of evaluations.During the process of learning, the result of next experiment infers from the memorized prior result.A better result may encourage the next trial to increase the accuracy of the obtained solution.Then through the learning process, the succeeding search can focus on better solutions but not wasting time on worse solutions.According to these two ideas, TS utilizes the tabu list and aspiration criterion to interdict or to encourage some trial solutions during the iterative process.The utility of the tabu list is to memorize some previously evaluated trial solutions.The goal of the aspiration criteria is to release some of the solutions memorized in the tabu list to avoid the iteration cycling and may finally be trapped in a local optimal solution.

Tabu Search. Glover
The TS has been successfully applied to solve groundwater problems such as the identification of optimal parameter structure (Zheng and Wang [60]) and the determination of spatial pattern of groundwater pumping rates (Tung and Chou [61]).The iterative procedure of TS in Yeh et al. [15], which contains the components of initial guess, candidate solution and movement, tabu list, and aspiration criterion, is adopted in this study.

Ordinal Optimization.
Recently, the OOA has been applied to many areas related to simulation-based complex optimization problems.The OOA has two major tenets: ordinal comparison and goal softening procedures.The first procedure is to see if there is a relative relationship between each solution because it is much easier to find better solutions.The second procedure is to determine a reliable and good enough solution instead of directly evaluating the optimal solution in a complex optimization model.Therefore, this procedure reduces the consumption of computation and obtains the optimum solution from the feasible solution space.To get top proportion solutions is much easier than to find out the best one.Lau and Ho [47] showed that the OOA ensures that top 5% solutions can be regarded as good enough solutions and are of very high probability (≧ 0.95) to be reliable.
According to the OOA, all the possible trials are estimated roughly and ranked quickly.The feasible solution domain is divided into several different parts, and the possible optimum solution located in subdomain might be effortlessly recognized.The optimum solution can then be obtained while all the calculation efforts are focused on searching the possible subdomain.A crude model should first be employed to estimate and rank the solution, and the good solutions can be separated from the bad ones.The goal softening procedure then concentrates on the top proportion solutions in order to find the optimum solution.Accordingly, the simulation time can be considerably reduced.The OOA has been successfully applied to many areas such as power system planning and operation (Guan et al. [62]; Lin et al. [63]), electricity network planning (Liu et al. [64]), and wafer testing (Lin and Horng [65]).

Roulette Wheel Method.
The roulette wheel method is an important part of GA.The concept of GA is based on the survival of the fittest by natural selection.Better solutions have larger areas occupied on roulette wheel and the corresponding solutions will have higher chance to be selected.Through the procedure of not evaluating the bad solutions, computer time can be considerably reduced.wheel selection method.The objective function in SATSO-GWT is to minimize the sum of square errors between the simulated and observed concentrations and defined as

SATSO-GWT
where  is the total number of monitoring wells,  is the number of observed concentration measured in a monitoring well,  ,sim is the simulated concentration at th terminated time period in th monitoring well, and  ,obs is the observed concentration sampled at th terminated time period in th monitoring well.The value  ×  is generally greater than the number of unknowns (Yeh et al. [15]).Equation ( 4) is used to calculate the objective function value (OFV) of the trial solution generated by the present approach.
To use MF2K-GWT, the problem domain has to be discretized into block-centered finite difference meshes.A block including several finite difference meshes is then chosen as a suspicious area which includes the contaminant source.Figure 1 displays the flowchart of SATSO-GWT while Figures 2 and 3 show the flowcharts of TS process and OOA in SATSO-GWT, respectively.All meshes in the suspicious area are called candidate source locations (CALOs).The first step of SATSO-GWT is to calculate the initial OFV based on the initial guesses of the source location, release periods, and release concentrations.The initial guess of source location is considered as the current location (CULO) and the initial OFV is set as the current global optimal objective function value (OFV GO ).Then SATSO-GWT generates one CALO and  trial solutions for the source release periods and concentrations.For each set of trial solutions, MF2K-GWT is employed to predict the simulated concentrations at  the monitoring points using (3) and the OFV corresponding to each set of trial solution is then computed using (4).Each CALO is regarded as one subdomain and the best combination of the source location and the release periods and concentrations in the monitoring points, i.e., the least objective function value at each CULO (hereinafter referred to as OFV CULO ), is recorded at each subdomain.The TS process (Figure 2) is then applied to determine whether the CULO is in tabu list.If not, the CULO is moved to tabu list and the OFV GO is replaced by OFV CULO if OFV CULO < OFV GO .Otherwise, check the aspiration criterion (i.e., OFV CULO < OFV GO ).If OFV CULO < OFV GO , the OFV GO is replaced by OFV CULO and the CULO is set as new CALO.On the other hand, the global optimal location (GOLO) is assigned as new CALO based on Metropolis criterion defined as where   is the acceptance probability of the trial location and  is the current temperature defined by SA.A random number ranging from zero to one is generated to compare with   .The GOLO will be rejected if   is less than the random number.Totally,  CALOs are generated at each temperature; therefore,  sets of the combinations are obtained.After generating 3 times of CALOs, the top 5% best subdomains can be sifted by the OOA as demonstrated in Figure 3 The roulette wheel method is then applied so that the better combinations (lower OFVs) regarding source release information have larger probability to be chosen.In reality, the real source location falls in the top 5% best combinations.The algorithm is terminated when the OFVs are less than 10 −6 four times successively.Finally, the latest updated solution, including the estimated location and the release concentrations and time periods, is considered as the final solution.

Hypothetic Contamination Site and Identification Results.
A hypothetic site shown in Figure 4 is designed to test the applicability of SATSO-GWT for a source identification problem.The domain of the site is divided into 27 × 27 × 4 finite difference meshes in -, -, and -directions.Both the grid width and length are 20 m and the grid height is 6 m.Thus, the total length and width of the site are both 540 m, and the aquifer thickness is 24 m.The site is heterogeneous and divided into three different areas with the hydraulic conductivities of 20 m/day, 10 m/day, and 30 m/day in areas I, II, and III, respectively.The aquifer porosity, specific storage, and hydraulic gradient are 0.3, 10 −4 m −1 , and 0.009, respectively.The recharge rates are assumed to be 120 mm/year, 80 mm/year, and 100 mm/year in areas I, II, and III, respectively, in the first 180 days.The contaminant is assumed to be no decay and not adsorbed by the aquifer media.The dispersion coefficients in -, -, and -directions are 40 m 2 /day, 10 m 2 /day, and 1 m 2 /day, respectively.
The boundary conditions for the flow system are illustrated in Figure 4.The slash grids represent no flow boundary.The origin of the vertical coordinate is taken at the land surface.The source S1 is located at coordinates (110 m, 270 m, −9 m) and the rate of source release () is 1 m 3 /day with the concentrations of 100 ppm and 50 ppm over first and second 180 days, respectively.There are seven unknowns to be determined including three coordinates of the source location and two release periods and release concentrations.Yeh et al. [15] mentioned that the number of sampling points should be greater than the number of unknowns.Accordingly, eight sampling points, i.e., points A to H shown in Figure 4, with various depths are considered.The A2 represents that the concentration measurement is sampled from second layer below the ground surface at point A.
The concentration measurements at these sampling points are listed in Table 1.The groundwater transport model MF2K-GWT is utilized to generate the concentrations at these monitoring wells and the SATSO-GWT is used to determine the source information.Before the source is identified, a block with 3 × 3 × 4 meshes is delineated as a suspicious area which contains the contaminant source.Thus, there are 36 candidate sources within the block and one of the candidates is the real source.The lower and upper bounds of the release period are taken as 0 day and 400 days, respectively, and the lower and upper bounds of the release concentration are considered 0 ppm and 200 ppm, respectively.If the measures of release period and concentration have the accuracy to the first decimal place, the feasible solution domain will be 36 × 4000 2 × 2000 2 .Such a solution space is very huge and poses a large computational burden to find the contaminant source information.Therefore, the OOA is adopted in SATSO-GWT in the identification process.Once the SATSO-GWT generates the CALOs for 3 times, the top 5% combinations with different source locations can be sifted.To state more specifically, there are 2 best locations (36 × 0.05 = 1.8 ≈ 2) that can be sifted.The obtained results of the sifted locations from eight different initial locations are listed in Table 2 indicating that the real source location (110 m, 270 m, −9 m) already falls within the top 2 best locations and, thus, the solution space is largely reduced.Note that the parameters , , initial temperature, and reduce temperature factor are taken as 20, 10, 0.5, and 0.7, respectively, in this case study.Table 3 shows the identification results of the case study using SATS-GWT and SATSO-GWT.The same SA parameter values and initial location, i.e., at coordinates (290 m, 130 m, −21 m) are used for these two approaches.The SATS-GWT obtains correct source location but has deviated results for the release period and concentration.In contrast, the information of contaminant source is accurately identified by SATSO-GWT.Moreover, SATS-GWT takes about 1 day and 10 hours to obtain the result while SATSO-GWT only consumes about 12 hours and 36 minutes performing on a personal computer with Intel 3.3 G E3-1230v2 CPU and 16 GB RAM.From this table, the performance of SATSO-GWT is much superior to that of SATS-GWT.
To further assess the performance of SATSO-GWT, the following three cases are considered: (1) different initial guesses of source location; (2) various measurement errors in the monitoring points; and (3) a large suspicious area with various source release periods and concentrations.

Different Initial Guesses of Source Location.
In this case, eight scenarios with different initial locations are considered.Eight different source locations, situated at the corners of the area, are chosen to investigate the influence of different initial locations.Table 4 shows the identified results of the source location as well as two release periods and concentrations.data within 360 days.In these eight cases, the estimated source locations are all correct.In addition, the estimated release periods and concentrations are fairly good when compared with the real release data.

Measurement Errors in the Monitoring
Points.The second case is to assess the performance of SATSO-GWT when the simulated sampling concentrations contain measurement errors.The disturbed observed concentrations are expressed as (Mahar and Datta [2]) where   ,obs is the disturbed observed concentration, Er is defined as the level of measurement error, and RD 1 is a random standard normal deviate generated by the routine RNNOF of IMSL [66].Three different values of Er, 1%, 5%, and 10%, are considered for this problem.
The predicted results shown in Table 5 indicate that the source locations of those three scenarios are all correctly identified.When Er = 1%, the optimal OFV is 0.490 × 10 −7 .As Er = 10%, the OFV is 13.04 × 10 −7 .The predicted release periods and concentrations are slightly deviated from the target values in scenario 3 but still are acceptable.Figure 6 shows the temporal concentration at monitoring well A2 predicted by SATSO-GWT.The results indicate that even though the sampling concentrations contain measurement errors whose level is up to 10%, the proposed SATSO-GWT    still gives fairly good results in reconstructing the release history and identifying the source location.

Larger Suspicious
Area with Six Release Periods.In Section 3.1, it has been shown that SATSO-GWT can reduce the feasible solution domain based on OOA for a complex source information identification problem.Thus, the SATSO-GWT is applied to the case of a large suspicious area which has 100 candidate source locations (5 rows × 5 columns × 4 layers) delineated by the dashed lines as shown in Figure 7.This case considers six source release periods within a year and each period has an interval of two months.The concentrations in those six release periods are assumed to be 100 ppm, 200 ppm, 150 ppm, 50 ppm, 100 ppm, and 70 ppm.Therefore, there are fifteen unknowns involved in this case (i.e., three coordinates of the source location, six release periods, and six release concentrations).Thus, sixteen sampling data from wells A to H at 360 days and 390 days are considered and listed in Table 6.
The parameter  associated with the generated locations by TS process at each temperature is taken as 25 to accommodate larger candidate locations.Because the number of the total CALOs is modified as 100, there are 5 best locations (5%) chosen by OOA.The upper part of Table 7 displays the top 5 best locations and the rank number 1 of sifted location exactly matches the real one.In Table 7, SATSO-GWT gives correct identification of source location and the predicted source release history is very close to the target one.This case has fifteen unknowns and the SATSO-GWT takes about 12 hours to obtain the results when performing on a personal computer with Intel 3.3 G E3-1230v2 CPU and 16 GB RAM.Obviously, the SATSO-GWT works very well in identifying the source information even when the suspicious area is large and the release pattern is rather irregular.

Concluding Remarks
A novel identification approach, SATSO-GWT, is developed to combine the model MD2K-GWT for simulating the groundwater flow and pollutant transport MD2K-GWT with heuristic optimization approaches such as SA, TS, OOA, and roulette wheel method.This new approach is capable of simultaneously identifying the pollution source location and release history in a 3D, heterogeneous, and transient groundwater flow system.A hypothetic contamination site consisted of 27 × 27 × 4 finite difference meshes along with a suspicious area of having 3 × 3 × 4 meshes is designed to assess the capability of the present approach.The site is divided into three different areas and each area has different hydraulic conductivity and surface recharge rate.The contaminant source is continuously released over two periods with different concentrations.The present approach successfully identifies the source location and corresponding release periods and concentrations.Moreover, the results obtained from the same problem indicate that the performance of SATSO-GWT is much superior to that of SATS-GWT.Three cases are designed to further assess the performance of SATSO-GWT.These are (1) different initial guesses of source location; (2) various measurement errors in the monitoring points; and (3) a large suspicious area with various source release periods and concentrations.The SATSO-GWT gives exact identification in source location and accurate predictions of release periods and concentrations in eight scenarios with different initial guess locations.In addition, the SATSO-GWT gives fairly good results when the sampling concentration having measurement errors, even when the error level is up to 10%.For a large suspicious area with six release periods and concentrations, the SATSO-GWT can also give excellent results which demonstrate its capability of dealing with complex optimization problems.The SATSO-GWT has been shown to be an efficient tool in solving the complicated groundwater source identification problems.

Figure 1 :
Figure 1: Flowchart of SATSO-GWT.The OFV represents the objective function value, CALO represents the candidate location, and OFV CULO represents the optimal OFV at current location.

Figure 4 :
Figure 4: The groundwater flow system has an area of 540 m by 540 m and the domain is divided into three areas with different hydraulic conductivities and recharge rates.The real source is located at S1 and A to H represent the monitoring wells.The slash grids represent no flow boundary.

Figure 5 1 Figure 5 :
Figure 5: The temporal concentration distribution at monitor well A2 with 8 different initial guesses of source location.

Figure 6 :
Figure 6: The temporal concentration distribution at monitor well A2 with 3 measurement error levels.

Table 1 :
The sampling points and concentration measurements.

Table 2 :
Results of 8 scenarios for sifting the top two locations.

Table 4 :
Results of 8 scenarios with different initial guesses of source location.

Table 5 :
Results of three scenarios when observed concentrations have measurement errors with different levels.
Note that the real source is located at (110 m, 270 m, −9 m); real release concentration is 100 ppm over the first 180 days and 50 ppm over the second 180 days.

Table 6 :
The sampling points and concentration measurements in third case.