Performance Evaluation of Two ANFIS Models for Predicting Water Quality Index of River Satluj (India)

Water quality index is the most convenient way of communicating the water quality status of water bodies, but its evaluation involves subjectivity in terms of user involvement and requires dealing with uncertainty. Recently, artificial intelligence algorithms that are suited to nonlinear forecasting and to handling uncertainty have been applied to various domains of water quality forecasting. This paper focuses on the development of a data-driven adaptive neuro-fuzzy system for the water quality index using a real data set obtained from eight monitoring stations across River Satluj in northern India. The novelty of the paper lies in estimating the water quality index with ANFIS models based on two different clustering techniques, fuzzy C-means (FCM) and subtractive clustering (SC), and in assessing their predictive accuracy. Each model was used to train, validate, and test the index, which was obtained from seven water quality parameters: pH, conductivity, chlorides, total dissolved solids, nitrates, ammonia, and fecal coliforms. The models were evaluated on the basis of statistical performance criteria. Based on the evaluations, it was found that the SC-ANFIS method gave more accurate results than the FCM-ANFIS method. The SC-ANFIS model was further used to identify the sensitive parameters across the monitoring stations that were capable of changing the existing water quality index value.


Introduction
Surface water quality evaluation is an issue that repeatedly draws the attention of regulatory agencies for the purpose of safeguarding various intended uses. In this regard, continuous water quality monitoring is undertaken so as to assess the water quality and thereby propose adequate measures for its management. One of the several ways in which the large amount of data generated from the monitoring stations is assimilated for easy communication to the stakeholders is the water quality index (WQI). The water quality index is a single number that expresses water quality by aggregating the measurements of several water quality parameters. Its output ranges from 0 to 100. A value of 100 represents excellent water quality, while zero indicates water not suitable for the intended use.
Several advancements have been made in evaluating WQI ever since Horton [1] proposed the first index in 1965. These advancements have mainly come in the form of soft computing tools such as data mining techniques, artificial intelligence (AI), and fuzzy modeling systems. These techniques can handle the uncertainties that often accumulate in the traditional way of evaluating WQI [2,3]. Babbar and Babbar [3] employed different data mining techniques such as support vector machines, k-nearest neighbor, decision trees, and naïve Bayes to develop a predictive environment that classifies water quality into understandable terms based on the overall index of pollution. Various researchers have successfully employed AI techniques to evaluate the water quality status [4-13]. However, the choice of method depends on the quality and quantity of the data available. Fuzzy logic methods, on the other hand, are heuristic system descriptions that use if-then rules to establish quantitative relationships among the input and output variables [14]. However, the main problem with the fuzzy model is that there is no systematic procedure to define the membership function parameters, which must be predetermined by expert knowledge about the modeled system. At the same time, an artificial neural network (ANN) has the ability to learn from input-output pairs and adapt to them in an interactive way. Based on this understanding, a hybrid technique, the adaptive neuro-fuzzy inference system (ANFIS), is gaining popularity in dealing with ill-defined and uncertain domains such as water quality prediction. ANFIS is a technique that embeds the fuzzy inference system into the framework of adaptive networks. ANFIS thus draws on the benefits of both ANN and fuzzy techniques in a single framework. One of the major advantages of the ANFIS method over fuzzy systems is that it eliminates the basic problem of defining the membership function parameters and obtaining a set of fuzzy if-then rules. The learning capability of ANN is used for automatic fuzzy if-then rule generation and parameter optimization in ANFIS [15]. Jang [16] introduced the concept of ANFIS, and since then it has been successfully implemented in various water quality problems. Yan et al. [17] developed an ANFIS model for classifying the water quality status of rivers and compared its performance with ANN. In that study, different types of membership functions such as generalized bell, Gaussian, trapezoidal, and triangular were compared and tested on training and testing data. The best-fit model was obtained using the Gaussian membership function. Sahu et al. [18] employed ANFIS to predict the WQI of groundwater.
Rahimzadeh et al. [19] suggested an ANFIS approach for the prediction of oily wastewater microfiltration permeate volume. Similarly, Talebizadeh and Moridnejad [20] compared ANN with ANFIS in forecasting lake level fluctuations, where ANFIS turned out to be superior to ANN in terms of efficiency. For water quality prediction, ANFIS has been applied to biochemical oxygen demand estimation with other water quality parameters as inputs [21]. Despite improvements in prediction models, increasing model accuracy while avoiding overfitting is still a challenge for most researchers. According to recent research, model performance can be significantly improved if an appropriate hybrid of multiple models, rather than a single model, is used for forecasting and prediction [22].
As is apparent from the above studies, ANFIS is popular in terms of its applications and is well established in the literature, but the choice of algorithm for automatically generating membership functions has not been given due consideration. Therefore, this study presents the automatic generation of membership functions for WQI using fuzzy C-means (FCM) and subtractive clustering (SC) based ANFIS models, which yield homogeneous clusters (or classes) of WQI. Clustering is one of the important methods for data analysis and decision making; it allows retrieval of useful information by grouping or categorizing multidimensional data into clusters. Since water quality data are multidimensional, it is expected that model performance can be evaluated and compared by applying two different clustering techniques.
Thus, the present study aims to develop ANFIS models based on two different clustering methods and to statistically identify the better of the two. The second objective is to employ the identified best model to find the most sensitive water quality parameter that can cause a change in the predicted water quality.

Study Area.
River Satluj is one of the five rivers of the Indus River system and joins the Indus River on its eastern side. It originates from the Manasarovar-Rakas Lakes in western Tibet at a height of 4,570 m, within 80 km of the source of the Indus. Along its course, it flows through the Himalayan range in the Indian state of Himachal Pradesh and enters the plains of Punjab from the Shivalik hills near Nangal, India. The river carries historical importance because its water allocation is a part of the famous Indus Water Treaty between India and Pakistan and because the world's highest gravity dam, Bhakra Nangal Dam, is built across the river at the point where it enters Punjab (India). Satluj River is extensively used for irrigation as well as drinking purposes. The hydrology of the river is controlled by snowmelt from the Himalayas and by the South Asian monsoons.
This study focuses on a 238 km stretch of the Satluj River flowing through the Punjab state of India. The watershed area of the river in this stretch is about 10,880 km² and is geographically bounded between 31°45′ N, 74°57′ E and 30°45′ N, 76°50′ E. The river is regularly monitored at 8 monitoring stations by Thapar University, Patiala (India), under the river monitoring program initiated by the Government of India in 1996. These monitoring stations are strategically located for a comprehensive study of the water quality of the river in Punjab (India). The locations of the monitoring stations along the main river are shown in Figure 1.
Studies have shown that the water quality upstream (U/S) of Headwork Nangal is of "A" class, or pristine quality. Further downstream, however, the river quality deteriorates within the study stretch. This deterioration is caused by the release of toxic effluents from industries and by untreated sewage discharges, which make the river water unfit for any beneficial use.

Water Quality Parameters.
The water quality parameters that are routinely monitored at the eight monitoring sites are pH, conductivity, chlorides, dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD$_5$), total dissolved solids (TDS), suspended solids (SS), ammoniacal-N, nitrates, total phosphorus (TP), and fecal coliform (FC). All these parameters are monitored on a monthly basis and analyzed according to the APHA standard methods [23]. For the purpose of this study, the historic water quality data from 1996 to 2012 are taken into consideration. For each station and for each parameter, the month-wise 16-year average concentration was computed, as given in Table 1. From Table 1, it is visible that water quality deteriorates in terms of DO (% sat.), BOD, and fecal coliform from station 6 onward. This is because the confluence of two small rivulets, Budha Nallah and Chitti Bein, occurs after station 6. Both rivulets discharge a large quantity of industrial and domestic effluent into River Satluj.

Water Quality Index.
The formulation of the water quality index for River Satluj is given in the study of Sharma and Reddy [24]. Mathematically, the WQI in (1) is expressed in terms of $Q_i$, the subindex for the $i$th water quality parameter; $W_i$, the weighting factor associated with each water quality parameter; and $n$, the number of water quality parameters. In this study, the WQI was computed for municipal and domestic water use. The parameters considered in this category were pH, conductivity, chlorides, TDS, ammonia-N, nitrates, and fecal coliform. These parameters were selected because they are considered useful for characterizing municipal and industrial waste and are routinely monitored for any polluted river; they therefore form a small set of parameters that sufficiently conveys information about the overall water quality for the designated uses. More parameters can be used for estimating the WQI; however, the larger the number of parameters, the greater the logistic burden of monitoring and analyzing them. Since every parameter has its own importance with regard to an expression of water quality, the parameters need to be ranked by relative importance by assigning each a weighting factor. When the index was formulated, these weights were obtained from expert opinion analysis, described in detail in the study of Sharma and Reddy [24]. The transformation equations that were used to develop the WQI for the 16-year data in this study are given in Table 2.
These transformation equations represent parameter rating curves that are meant to bring the different units of the parameters onto a single scale for the purpose of direct aggregation in (1). These equations were developed on the basis of water quality standards and Indian effluent standards. In the equations given in Table 2, x represents the concentration range and y represents the scaled value of the parameter under consideration.
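As a rough illustration of the aggregation step, the sketch below combines subindex scores into a single WQI. It assumes a weighted arithmetic mean, which is only one common aggregation form and is not confirmed here as the formulation of Sharma and Reddy [24]; the function name `weighted_wqi` and all scores and weights are hypothetical.

```python
def weighted_wqi(subindices, weights):
    """Aggregate subindex scores Q_i (0-100 scale) with weights W_i.

    Assumption: weighted arithmetic mean. Weights are normalized by their
    sum so that the result stays on the 0-100 scale.
    """
    if len(subindices) != len(weights):
        raise ValueError("need exactly one weight per subindex")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * q for w, q in zip(weights, subindices)) / total_weight


# Hypothetical scaled scores and weights, in the order: pH, conductivity,
# chlorides, TDS, ammonia-N, nitrates, fecal coliform.
q = [90.0, 80.0, 85.0, 75.0, 60.0, 95.0, 40.0]
w = [1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 3.0]
print(round(weighted_wqi(q, w), 2))
```

With this form, a heavily weighted parameter such as the hypothetical fecal coliform score pulls the index down more strongly than a lightly weighted one.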

Description of ANFIS
ANFIS is a multilayer feed-forward network that uses neural network learning algorithms and fuzzy logic to map an input space to an output space [25]. Jang [16] proposed the adaptive neuro-fuzzy inference system (ANFIS) to construct an input-output mapping based on an initial fuzzy system and the available input-output data pairs by using learning procedures. This system can achieve a highly nonlinear mapping and is superior to common linear methods in reproducing nonlinear time series [26]. In mapping an input space to an output space, two fuzzy inference systems (FIS) are commonly employed in various applications.
These are the Mamdani inference system and the Sugeno inference system, which are described in the literature [27,28]. The consequents of the fuzzy rules differ between these two inference models, and thus their aggregation and defuzzification procedures also differ accordingly.
The Sugeno system is, however, considered more compact and computationally efficient than the Mamdani system [27]. The consequent in a Sugeno FIS is either a linear equation, giving a first-order Sugeno FIS, or a constant coefficient, giving a zero-order Sugeno FIS [26].
If it is assumed that the system includes two inputs, $x_1$ and $x_2$, and one output $y$, and that the rule base contains two fuzzy if-then rules, then the rules for the first-order Sugeno FIS can be expressed as

$$\text{Rule } i:\ \text{if } x_1 \text{ is } A_i \text{ and } x_2 \text{ is } B_i,\ \text{then } y_i = p_i x_1 + q_i x_2 + r_i, \quad i = 1, 2,$$

where $p_i$, $q_i$, and $r_i$ are the linear parameters in the consequent part of the Sugeno fuzzy inference system. The architecture of ANFIS consists of five layers. Each layer contains several nodes described by a node function. Adaptive nodes, denoted by squares, represent parameter sets that are adjustable, whereas fixed nodes, denoted by circles, represent parameter sets that are fixed in the system. The output data from the nodes in the previous layer are the input to the present layer. The description of each layer in the ANFIS architecture is given below.
Layer 1 is the fuzzification layer, in which each node represents the membership grade of a crisp input, and each node's output $\theta^1_i$ is computed by

$$\theta^1_i = \mu_{A_i}(x_1), \quad i = 1, 2; \qquad \theta^1_i = \mu_{B_{i-2}}(x_2), \quad i = 3, 4,$$

where $x_1$ and $x_2$ are the crisp inputs to node $i$, and $A_i$ and $B_i$ are the linguistic labels characterized by the membership functions $\mu_A$ and $\mu_B$, respectively. The Gaussian membership function is defined by the parameter set $\{a_i, b_i\}$ in the premise part of the fuzzy if-then rules, which modifies the shape of the membership function. The parameters in this input layer are called the premise parameters.
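To make the role of the premise parameters concrete, the following sketch evaluates a Gaussian membership grade. It assumes one common parameterization in which one premise parameter acts as the center and the other as the width of the curve; which of $a_i$ and $b_i$ plays which role is an assumption here, and the names `gaussian_mf`, `center`, and `width` are hypothetical.

```python
import math


def gaussian_mf(x, center, width):
    """Gaussian membership grade in (0, 1]; equals 1.0 when x == center.

    Assumed form: exp(-(x - center)^2 / (2 * width^2)).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


# Hypothetical fuzzy set "near-neutral pH" centered at 7.0 with width 0.5.
for ph in (6.0, 6.5, 7.0, 7.5, 8.0):
    print(ph, round(gaussian_mf(ph, center=7.0, width=0.5), 4))
```

Shifting the center moves the fuzzy set along the input axis, while a larger width makes the set more tolerant of deviations from the center.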
In layer 2, each node provides the strength of a rule by means of the multiplication operator given in (5). The output of this layer is the firing strength $\theta^2_i$, the product of the corresponding membership degrees obtained from layer 1. The membership values $\mu_{A_i}(x_1)$ and $\mu_{B_i}(x_2)$ are multiplied in order to find the strength of the rule:

$$\theta^2_i = \omega_i = \mu_{A_i}(x_1)\,\mu_{B_i}(x_2), \quad i = 1, 2. \tag{5}$$

Layer 3 is the normalization layer, which normalizes the strength of all rules according to

$$\theta^3_i = \bar{\omega}_i = \frac{\omega_i}{\omega_1 + \omega_2}, \quad i = 1, 2.$$

Layer 4 is a layer of adaptive nodes; every node in this layer computes the contribution of the $i$th rule towards the overall output, with the node function defined as

$$\theta^4_i = \bar{\omega}_i \left(p_i x_1 + q_i x_2 + r_i\right), \quad i = 1, 2,$$

where $\bar{\omega}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set. Parameters in this layer are referred to as consequent parameters.
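Layer 5 is the output layer, in which a single node computes the overall output by summing the contributions of all rules from the previous layer. Accordingly, the defuzzification process transforms each rule's fuzzy results into a crisp output in this layer. The output, $\theta^5$, is computed as in (7):

$$\theta^5 = \sum_i \bar{\omega}_i \left(p_i x_1 + q_i x_2 + r_i\right) = \frac{\sum_i \omega_i \left(p_i x_1 + q_i x_2 + r_i\right)}{\sum_i \omega_i}, \tag{7}$$

where $\omega_i$ is the firing strength of the $i$th rule and $\bar{\omega}_i$ is its normalized value from layer 3.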
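The five layers described above can be traced in a few lines for the two-input case. This is a minimal sketch of a first-order Sugeno forward pass, not the MATLAB implementation used in the study; it assumes Gaussian membership functions parameterized by a center and a width, and the names `anfis_forward`, `premise`, and `consequent` as well as all numeric values are hypothetical.

```python
import math


def gaussian_mf(x, center, width):
    """Assumed Gaussian membership function (center/width form)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def anfis_forward(x1, x2, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS with two inputs.

    premise:    one ((center_A, width_A), (center_B, width_B)) pair per rule
    consequent: one (p, q, r) triple per rule
    Returns (crisp output, normalized firing strengths).
    """
    # Layers 1 and 2: membership grades and their product (firing strength).
    firing = [gaussian_mf(x1, *a) * gaussian_mf(x2, *b) for a, b in premise]
    total = sum(firing)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    # Layer 3: normalize the firing strengths.
    normalized = [w / total for w in firing]
    # Layer 4: weighted linear consequent of each rule.
    contributions = [
        wn * (p * x1 + q * x2 + r)
        for wn, (p, q, r) in zip(normalized, consequent)
    ]
    # Layer 5: sum of all rule contributions.
    return sum(contributions), normalized


# Hypothetical two-rule system on inputs scaled to [0, 1].
premise = [((0.2, 0.3), (0.2, 0.3)), ((0.8, 0.3), (0.8, 0.3))]
consequent = [(1.0, 2.0, 0.5), (-1.0, 0.5, 3.0)]
output, strengths = anfis_forward(0.4, 0.7, premise, consequent)
print(output, strengths)
```

Because the normalized strengths sum to one, the output is a convex combination of the individual rule outputs, which is why the consequent parameters enter the output linearly.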
ANFIS applies a hybrid learning algorithm, which combines the "gradient descent" and "least-squares" methods to update the model parameters. Each epoch of this hybrid learning procedure is composed of a forward pass and a backward pass. In the forward pass, the node outputs go forward until layer 4 and the consequent parameters are identified by the least-squares method. In the backward pass, the error signal propagates backwards and the premise parameters are updated by gradient descent. The detailed description of this algorithm is given in Jang and Sun [25].
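The forward-pass half of the hybrid procedure can be illustrated as follows: with the premise parameters held fixed, the network output is linear in the consequent parameters, so they can be estimated by ordinary least squares. The sketch below is only an illustration under stated assumptions, not the toolbox routine: it solves the normal equations with a small Gaussian elimination in pure Python, assumes center/width Gaussian membership functions, and uses hypothetical names (`fit_consequents`, `solve_linear`, `normalized_strengths`) and synthetic data.

```python
import math


def gaussian_mf(x, center, width):
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def normalized_strengths(x1, x2, premise):
    """Layers 1-3: normalized firing strength of each rule."""
    firing = [gaussian_mf(x1, *a) * gaussian_mf(x2, *b) for a, b in premise]
    total = sum(firing)
    return [w / total for w in firing]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_consequents(samples, targets, premise):
    """Least-squares estimate of (p_i, q_i, r_i), premise parameters fixed."""
    rows = []
    for x1, x2 in samples:
        row = []
        for wn in normalized_strengths(x1, x2, premise):
            row += [wn * x1, wn * x2, wn]  # columns for p_i, q_i, r_i
        rows.append(row)
    k = len(rows[0])
    # Normal equations: (A^T A) theta = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    theta = solve_linear(ata, aty)
    return [tuple(theta[i:i + 3]) for i in range(0, k, 3)]


# Synthetic check: generate targets from known consequents, then recover them.
premise = [((0.2, 0.3), (0.2, 0.3)), ((0.8, 0.3), (0.8, 0.3))]
true_params = [(1.0, 2.0, 0.5), (-1.0, 0.5, 3.0)]
samples = [(i / 5.0, j / 5.0) for i in range(6) for j in range(6)]
targets = [
    sum(wn * (p * x1 + q * x2 + r)
        for wn, (p, q, r) in zip(normalized_strengths(x1, x2, premise), true_params))
    for x1, x2 in samples
]
estimated = fit_consequents(samples, targets, premise)
print(estimated)
```

In a full hybrid epoch this step would be followed by a gradient-descent update of the premise parameters; that backward pass is omitted here.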

Modeling with ANFIS
The proposed methodology for applying ANFIS to water quality evaluation is shown as a flow chart in Figure 2. In the current study, the input parameters of the ANFIS are pH, conductivity, TDS, chlorides, nitrates, ammonia, and fecal coliforms, and the output is the WQI. The data for the selected water quality parameters were first converted into principal components and then loaded as input into the ANFIS, with Gaussian membership functions for the inputs.
The relationship between the water quality index and the input variables that ANFIS was used to generalize was of the form

$$\mathrm{WQI} = f(\text{pH}, \text{conductivity}, \text{TDS}, \text{chlorides}, \text{nitrates}, \text{ammonia-N}, \text{fecal coliform}).$$

For the purpose of modeling, the water quality data obtained from the eight monitoring stations, a total of 204 observations, were divided into three sets: training data, checking data, and testing data. The training data were used for training the ANFIS, while the checking data were used for verifying the identified ANFIS. The testing data were used to evaluate the model performance. The first data set, containing 70% of the records, was used as the training data; the second, containing 15% of the records, was used as the checking data; and the remaining 15% were used as the testing data. The target values were the WQI computed using (1).
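The 70/15/15 partition of the 204 observations can be sketched as below. The source does not state how records were assigned to the three sets, so the random shuffle, the rounding of set sizes, and the names `split_indices` and `seed` are all assumptions made for illustration.

```python
import random


def split_indices(n_records, train_frac=0.70, check_frac=0.15, seed=0):
    """Split record indices into training, checking, and testing sets.

    Assumption: records are shuffled at random before splitting; the
    remainder after the training and checking sets forms the testing set.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = round(n_records * train_frac)
    n_check = round(n_records * check_frac)
    train = indices[:n_train]
    check = indices[n_train:n_train + n_check]
    test = indices[n_train + n_check:]
    return train, check, test


train, check, test = split_indices(204)
print(len(train), len(check), len(test))
```

Fixing the seed keeps the split reproducible, so that both clustering-based models can be trained and tested on identical records.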
In this study, the MATLAB Fuzzy Logic Toolbox ANFIS GUI was used as the modeling tool. Two separate clustering algorithms were used to automatically generate Gaussian-shaped membership functions: fuzzy c-means (FCM) and subtractive clustering (SC). Both techniques generate fuzzy if-then rules but differ in their implementation. FCM is a data clustering technique in which each data point belongs to a cluster to a degree that is specified by a membership grade. Originally introduced by Bezdek [29], it provides a method for grouping data points that populate some multidimensional space into a specified number of clusters, here 12.
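The idea of graded cluster membership can be illustrated with a compact fuzzy c-means loop. This is a plain-Python sketch of the standard alternating update of centers and memberships, not the toolbox routine used in the study; the fuzzifier `m = 2.0`, the random initialization, the stopping tolerance, and the toy data are assumptions.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length tuples."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Membership update from relative distances to all centers.
        new_u = []
        for p in points:
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            new_u.append([
                1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for dk in dists
            ])
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centers, u


# Toy data: two well-separated groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
print(centers)
```

Each row of the membership matrix sums to one, so a point lying between two centers is shared between the corresponding clusters rather than being forced into one of them.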
On the other hand, if there is no clear idea of how many clusters there should be for a given set of data, subtractive clustering is useful. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers in a set of data [30]. The cluster estimates obtained can be used to initialize iterative optimization-based clustering methods and model identification methods such as ANFIS. In this study, 15 cluster centers were determined for the given 204 observations. The number of fuzzy rules equals the number of cluster centers, each rule representing the characteristics of one cluster.
The details of the various parameters and the values taken for modeling with the two clustering methods are given in Table 3.
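How subtractive clustering arrives at the number of clusters without being told can be seen in the simplified sketch below. It follows the usual potential-based scheme (each point is scored by the density of its neighbors, the highest-scoring point becomes a center, and the potential around it is then subtracted), but it uses a single stopping ratio instead of the full accept/reject criteria. The parameter names and default values (`radius`, `squash`, `stop_ratio`) are assumptions and are not the settings of Table 3; inputs are assumed to be scaled to the unit hypercube.

```python
import math


def subtractive_clustering(points, radius=0.5, squash=1.5, stop_ratio=0.15):
    """Simplified subtractive clustering; returns the chosen cluster centers."""
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (squash * radius) ** 2

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Potential of each point: a measure of how many neighbors lie close by.
    potential = [
        sum(math.exp(-alpha * sqdist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)
    centers = []
    while True:
        idx = max(range(len(points)), key=potential.__getitem__)
        peak = potential[idx]
        # Simplified stopping rule: remaining peak is small relative to the first.
        if peak < stop_ratio * first_peak:
            break
        center = points[idx]
        centers.append(center)
        # Subtract the influence of the new center from every potential.
        potential = [
            pot - peak * math.exp(-beta * sqdist(p, center))
            for pot, p in zip(potential, points)
        ]
    return centers


# Toy data in the unit square with two dense groups.
points = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
          (0.9, 0.9), (0.88, 0.9), (0.9, 0.88)]
centers = subtractive_clustering(points)
print(len(centers), centers)
```

A smaller radius generally produces more centers and therefore more fuzzy rules, which is the main lever for trading model complexity against fit.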
For each algorithm defined above, ANFIS works on the model and tunes it by means of a hybrid technique combining gradient descent backpropagation and least-squares optimization. At each epoch, an error measure, the sum of the squared differences between actual and desired outputs, is reduced. Once the values of the premise parameters were learned, the overall WQI was obtained as a linear combination of the consequent parameters. It is evident from Figure 3 that the measured WQI values agree more closely with the SC-ANFIS predictions than with the FCM-ANFIS predictions.
In order to further verify the result, the performances of the two ANFIS models were evaluated according to different statistical criteria. The correlation coefficient ($R^2$), root mean square error (RMSE), and mean square error (MSE) were obtained using (10), where $X_{\mathrm{obs}}$, $X_{\mathrm{model}}$, and $\bar{X}_{\mathrm{model}}$ are the observed value, the modeled value, and the average value, respectively. The evaluated values are given in Table 4.
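The three criteria can be computed as in the sketch below. MSE and RMSE follow their usual definitions; for $R^2$ the sketch assumes the squared Pearson correlation between observed and modeled values, which may differ from the exact expression in (10). The function names and the sample values are hypothetical.

```python
import math


def mse(observed, modeled):
    """Mean square error between paired observed and modeled values."""
    return sum((o - m) ** 2 for o, m in zip(observed, modeled)) / len(observed)


def rmse(observed, modeled):
    """Root mean square error."""
    return math.sqrt(mse(observed, modeled))


def r_squared(observed, modeled):
    """Assumed definition: squared Pearson correlation coefficient."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modeled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modeled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    return cov ** 2 / (var_o * var_m)


# Hypothetical measured and predicted WQI values.
observed = [70.0, 75.0, 80.0, 85.0]
modeled = [71.0, 74.0, 81.0, 84.0]
print(mse(observed, modeled), rmse(observed, modeled), r_squared(observed, modeled))
```

RMSE is reported in the units of the WQI itself, which makes it easier to judge against the width of a water quality class than MSE.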
From Table 4, the $R^2$ values for the training and testing data are 0.9919 and 0.98271, respectively. The RMSE and MSE values in Table 4 also indicate that the SC-ANFIS model predictions are closer to the experimental values than those of the FCM-ANFIS model.
Figure 4 shows the overall fit of the SC-ANFIS method, with the predicted WQI values plotted against the measured ones.
The WQI values lie around a straight line passing through the origin, which implies a very close agreement between the two.

Sensitivity Analysis
Sensitivity analysis is a process that helps to find out how model output values are affected by changes in model input values. The purpose of performing sensitivity analysis here was to determine those parameters that can change the output value (WQI) to the extent that the water quality class shifts from its existing class. For this, each input water quality parameter was perturbed by ±2 times its standard deviation, and the changes in WQI were noted using the SC-ANFIS predictive model. The perturbation of ±2 times the standard deviation was chosen to account for the variability of the parameter and its associated influence on WQI. Finally, the resulting WQI was compared to the reference values. Figures 5 and 6 show the sensitivity effect of the −2d and +2d variation (d being the standard deviation) of each input on WQI, respectively, for the eight monitoring stations.
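The one-at-a-time perturbation described above can be sketched as follows. The trained SC-ANFIS model is replaced here by a hypothetical stand-in function, the sample standard deviation is assumed (the source does not say whether sample or population values were used), and the names `sensitivity_shift`, `baseline`, and `history` are illustrative only.

```python
import statistics


def sensitivity_shift(model, baseline, history, k=2.0):
    """Perturb one input at a time by -k and +k standard deviations.

    model:    callable mapping a dict of inputs to a WQI value
    baseline: dict of reference input values
    history:  dict mapping each input name to its observed values
    Returns {name: (wqi_at_minus_k_sd, wqi_at_plus_k_sd)}.
    """
    results = {}
    for name, values in history.items():
        sd = statistics.stdev(values)  # assumption: sample standard deviation
        low = dict(baseline)
        high = dict(baseline)
        low[name] = baseline[name] - k * sd
        high[name] = baseline[name] + k * sd
        results[name] = (model(low), model(high))
    return results


# Hypothetical stand-in for the trained model: WQI falls as either input rises.
def toy_model(inputs):
    return 100.0 - 10.0 * inputs["ammonia"] - 0.01 * inputs["fecal_coliform"]


baseline = {"ammonia": 2.0, "fecal_coliform": 1000.0}
history = {"ammonia": [1.0, 2.0, 3.0], "fecal_coliform": [500.0, 1000.0, 1500.0]}
print(sensitivity_shift(toy_model, baseline, history))
```

Comparing each perturbed WQI with the reference value shows which parameters are able to move the index across a class boundary.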
The WQI changes its existing class when the perturbed parameter value shifts the WQI trend in a particular group of observations shown on the x-axis. In Figure 5, stations 6, 7, and 8 are visibly sensitive to different parameters. Station 6 is sensitive to ammonia, station 7 to TDS and chloride, and station 8 to ammonia and fecal coliforms. The change in WQI is large enough that the existing water quality class shifts from good to poor, good to poor, and good to very poor at stations 6, 7, and 8, respectively.
Similarly, from the pattern of change in Figure 6, it is observed that fecal coliform is the most sensitive parameter and that any perturbation in this parameter changes the WQI at all stations. At station 8, the WQI changes from good to poor because of the +2d change in fecal coliform, whereas at the other stations the maximum change in water quality class is from good to fair. Chlorides and TDS are the most sensitive parameters at station 7, and ammonia is the most sensitive parameter at stations 3 and 5.
The water quality parameters ammonia, chlorides, TDS, and fecal coliforms are indicator parameters that are strongly linked to municipal or domestic waste. The sensitivity of Satluj River to these four parameters across its monitoring stations indicates that the river is strongly influenced by municipal waste, and fluctuations in these parameters should be given due consideration when communicating the water quality class.

Conclusions
In this study, two different clustering algorithms were used to develop ANFIS models for water quality index prediction of River Satluj in northern India. The formulation of the WQI for River Satluj is established in the literature, and the index was computed from seven water quality parameters that are crucial for municipal use. A data set spanning 16 years across eight monitoring stations on the river was used.
The two ANFIS models, based on subtractive clustering and fuzzy c-means, were trained, validated, and tested for modeling WQI. Based on the statistical evaluations, it was found that the SC-ANFIS model predictions at the training and testing stages were closer to the experimental values than those of the FCM-ANFIS model.
The SC-ANFIS model, because of its better predictive capability than the FCM-ANFIS model, was further used to perform sensitivity analysis.
The effect of perturbation in each water quality parameter was modeled and analyzed.
The analyses showed that ammonia, chlorides, and fecal coliform were the most sensitive parameters, capable of switching the existing water quality class to a poorer class, and hence warrant more attention in terms of their analysis for WQI computation.
The study reveals that ANFIS modeling with SC-ANFIS can be a useful approach to characterizing water quality in the form of a water quality index. Since the approach obviates the otherwise lengthy computations of WQI, the present study is important for developing a model and employing it for faster dissemination of information as well as for identifying the critical water quality parameters affecting WQI. The future scope of the work lies in using a combination of hybrid SC-FCM and ANN (neurodynamic fuzzy expert system) to evaluate water quality.

In the study of Sharma and Reddy [24], the calculations for the water quality index were performed taking into consideration three different water uses: ecological, irrigation, and municipal and domestic use. The steps involved were (a) selection of parameters, (b) assignment of weights to the selected parameters, (c) transformation of the monitored parameter values into common environmental scale units through the use of parameter rating curves/equations (PRC/E) as given in Table 2, and (d) aggregation of the parameter values into the final score. On the basis of the final score obtained, the water quality is classified into six classes, with the highest quality having the maximum score: very poor (<45), poor (45-60), fair (61-69), good (70-79), very good (80-90), and excellent (91-100).
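The six-class scheme can be written as a small lookup. The published ranges are stated for whole-number scores, so the handling of fractional scores that fall between two ranges (for example 60.5) is an assumption of this sketch, as is the function name `classify_wqi`.

```python
def classify_wqi(score):
    """Map a WQI score (0-100) to its water quality class.

    Assumption: fractional scores between two published ranges are
    assigned to the higher class (e.g. 60.5 is treated as "fair").
    """
    if not 0 <= score <= 100:
        raise ValueError("WQI must lie between 0 and 100")
    if score < 45:
        return "very poor"
    if score <= 60:
        return "poor"
    if score < 70:
        return "fair"
    if score < 80:
        return "good"
    if score <= 90:
        return "very good"
    return "excellent"


for value in (30, 45, 65, 75, 85, 95):
    print(value, classify_wqi(value))
```

A class shift in the sensitivity analysis corresponds to a perturbed score crossing one of these boundaries.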

Figure 2: Flow chart depicting the proposed methodology.

Figure 5: Sensitivity effect of −2d variation of each input on WQI.

Figure 6: Sensitivity effect of +2d variation of each input on WQI.

Table 1: Average values of water quality parameters along River Satluj.
In the fuzzy if-then rules, the variable $x_1$ has linguistic value A and $x_2$ has linguistic value B.

Table 3: Parameter values for different clustering-based ANFIS algorithms used.

Table 4: Water quality performance based on FCM and SC methods.