Statistical Assessment of Water Quality Parameters for Pollution Source Identification in Sukhnag Stream: An Inflow Stream of Lake Wular (Ramsar Site), Kashmir Himalaya

The precursors of deterioration of the immaculate Kashmir Himalaya water bodies are apparent. This study statistically analyzes the deteriorating water quality of the Sukhnag stream, one of the major inflow streams of Lake Wular. Statistical techniques, such as principal component analysis (PCA), regression analysis, and cluster analysis, were applied to 26 water quality parameters. PCA identified a reduced set of 2 varifactors, which together account for about 96% of the temporal and spatial variance in the water quality of this stream. The first factor from factor analysis explained 66% of the total variance and was loaded mainly by velocity, total-P, NO₃-N, Ca, Na, TS, TSS, and TDS. Bray-Curtis cluster analysis showed a similarity of 96% between sites IV and V and 94% between sites II and III. The dendrogram of seasonal similarity showed a maximum similarity of 97% between spring and autumn and 82% between winter and summer clusters. For nitrate, nitrite, and chloride, the trend in accumulation factor (AF) showed that the downstream concentrations were about 2.0, 2.0, and 2.9 times greater, respectively, than the upstream concentrations.


Introduction
River water quality is of great environmental concern since it is one of the major available fresh water resources for human consumption [1,2]. Throughout the history of human civilization, rivers have always been heavily exposed to pollution, due to their easy accessibility for the disposal of wastes. However, after the industrial revolution the carrying capacity of the rivers to process wastes reduced tremendously [3,4]. Anthropogenic activities (urban, industrial, and agricultural) as well as natural processes, such as precipitation inputs, erosion, and weathering of crustal materials, affect river water quality and determine its use for various purposes [1][2][3][4][5]. The usage also depends upon the linkages (channels) in the river system, as inland waterways play a major role in the assimilation and transportation of contaminants from a number of sources [6][7][8]. Besides linkages, the seasonal variation in precipitation, surface runoff, interflow, groundwater flow, and pumped in and out flows also has a strong effect on the concentration of pollutants in rivers [9][10][11][12]. In view of the limited stock of freshwater worldwide and the role that anthropogenic activities play in the deterioration of water quality, the protection of these water resources has been given topmost priority in the 21st century [13][14][15]. Research-wise, one of the important stages in the protection and conservation of these resources is the spatiotemporal analysis of water and sediment quality of the aquatic systems [16]. The nonlinear nature of environmental data often makes spatio-temporal variations of water quality difficult to interpret, and for this reason statistical approaches are used to provide representative and reliable analysis of the water quality [17]. Multivariate statistical techniques such as cluster analysis (CA) and factor analysis (FA) have been widely used as unbiased methods in the analysis of water quality data for drawing meaningful conclusions [18,19]. They have also been widely used to characterize and evaluate water quality and to analyze spatio-temporal variations caused by natural and anthropogenic processes [20][21][22]. In this paper we present a methodology for examining the impact of all the sources of pollution in Sukhnag stream (Kashmir Himalayas) and for identifying the parameters responsible for spatiotemporal variability in water quality using CA and FA.

Materials
2.1.1. Study Area. The present study was carried out on the Sukhnag stream in Kashmir Himalaya. It is among the five major inflows of Lake Wular. This lake is the largest fresh water lake of the Indian subcontinent and was designated as a Ramsar site in 1990 under the Ramsar convention of 1975. The Sukhnag, a torrential stream, flows through Budgam district, in the state of Jammu and Kashmir (Figure 1). It flows from the mountain reaches of the Pir Panjal mountain range located in the southwest of Beerwah town. The Sukhnag stream drains the famous Toshmaidan region in the higher locales of the Pir Panjal range. It has a glacial origin and covers a distance of about 51 km from head to mouth. Descending from the mountains, the Sukhnag passes through a sand-choked bed across the Karewas, finally merging with the outlet of Hokersar wetland (Ramsar site). The Sukhnag drainage system spreads over an area of about 395.03 km² and about 1551 streams cascade the waters of the whole watershed into this stream. During flash rains the water in this stream flows with tremendous velocity in the upper reaches, causing soil wastage of the left and right embankments of the stream and greatly damaging standing crops, plantations, houses, and road communication. The stream passes through a large area of high socioeconomic importance to North Kashmir. These areas include Rangzabal, Zagu, Bras, Arizal, Chill, Zanigam, Sail, Kangund, Goaripora, Siedpora, Beerwah, Aarwah, Aripanthan, Rathson, Makhama, Nawpora, Check-kawosa, Botacheck, and Narbal. The stream is a lifeline of this vast area, as it is a source of water for both domestic and agricultural purposes. The current study is therefore a step forward in addressing the deteriorating conditions of the stream so as to recommend concrete measures for its sustainable management.

Sampling and Analysis.
Samples were taken flow proportionately (i.e., more frequently during peak flow periods than during low flow periods) to capture nutrient pulses during runoff events from February 2011 to January 2012. The surface water samples were collected at midchannel points between 10.00 and 12.00 hours from each of the sampling sites and placed in prerinsed, acid-washed polyethylene bottles for the laboratory investigations. Parameters such as depth, transparency, temperature, pH, and conductivity were determined on the spot while the rest of the parameters were determined in the laboratory. These include orthophosphorus, total phosphorus, ammoniacal nitrogen, nitrite-nitrogen, nitrate-nitrogen, organic nitrogen (Kjeldahl nitrogen minus ammoniacal nitrogen), alkalinity, free CO₂, conductivity, chloride, total hardness, calcium hardness, magnesium hardness, sodium, and potassium. They were determined in the laboratory within 24 hours of sampling by adopting the standard methods of Golterman and Clymo (1969) and APHA (1998) [23][24][25].

Statistical Analysis.
Data for physicochemical parameters of water samples were presented as mean values and analyzed using descriptive analysis. We used the coefficient of variation (CV) and the t-test for describing the temporal variations of the observed water quality parameters. Prior to investigating the seasonal effect on water quality parameters, we divided the whole observation period into four fixed seasons: spring (March, April, and May), summer (June, July, and August), autumn (September, October, and November), and winter (December, January, and February). Regression analysis (RA) was carried out in order to know the nature and magnitude of the relationship among various physicochemical parameters. First, we determined the best-fit model (the largest r²) to explore whether there was any significant relationship among water quality parameters.
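The seasonal grouping and the CV described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the monthly conductivity readings are hypothetical and are not data from the study, and the CV is taken as the sample standard deviation divided by the mean, expressed in percent.

```python
from statistics import mean, stdev

# Fixed seasons as defined in the text (month number -> season).
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0


def group_by_season(monthly):
    """Group {month: value} readings into {season: [values]}."""
    seasons = {}
    for month, value in monthly.items():
        seasons.setdefault(SEASON_BY_MONTH[month], []).append(value)
    return seasons


# Hypothetical monthly conductivity readings (illustration only).
monthly_ec = {1: 210.0, 2: 215.0, 3: 380.0, 4: 395.0, 5: 400.0, 6: 300.0,
              7: 290.0, 8: 310.0, 9: 350.0, 10: 340.0, 11: 330.0, 12: 205.0}

seasonal = group_by_season(monthly_ec)
cv_by_season = {name: coefficient_of_variation(vals)
                for name, vals in seasonal.items()}
```

A larger CV for a given season indicates that the readings within that season are more variable relative to their mean.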
Accumulation factor (AF), the ratio of the average level of a given parameter downstream (following source discharge) to the corresponding average level upstream (prior to the source discharge) [26], was used to estimate the degree of contamination due to anthropogenic inputs.
The degree of river recovery capacity (RRC) for this stream was calculated using the mathematical equation by Ernestova and Semenova [27] as modified by Fakayode [26]; that is,

$$\mathrm{RRC} = \frac{S_0 - S_1}{S_0} \times 100,$$

where $S_0$ is the level of a parameter downstream (i.e., immediately after the discharge point) and $S_1$ is the corresponding average level upstream where the water is relatively unpolluted.
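As a rough illustration of the two indices, the sketch below computes AF as the downstream-to-upstream ratio defined above and assumes that RRC takes the form (S0 − S1)/S0 × 100, with S0 the downstream level and S1 the upstream level. The function names and the chloride concentrations are hypothetical. Under this assumed form the two indices are linked by RRC = (1 − 1/AF) × 100.

```python
def accumulation_factor(downstream_mean, upstream_mean):
    """AF: average downstream level divided by average upstream level."""
    return downstream_mean / upstream_mean


def river_recovery_capacity(downstream, upstream):
    """RRC in percent, assuming RRC = (S0 - S1) / S0 * 100.

    S0 is the downstream level and S1 the upstream level.
    """
    return (downstream - upstream) / downstream * 100.0


# Hypothetical chloride levels in mg/L (illustration only).
upstream_cl = 4.0
downstream_cl = 11.6

af = accumulation_factor(downstream_cl, upstream_cl)
rrc = river_recovery_capacity(downstream_cl, upstream_cl)

# Under the assumed form, RRC can also be written in terms of AF.
rrc_from_af = (1.0 - 1.0 / af) * 100.0
```

With these illustrative inputs the AF is about 2.9 and the RRC about 66%, which matches the pairing of values reported for chloride in the abstract and in the results.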

Multivariate Statistical Methods.
With the objective of evaluating significant differences among the sites for all water quality variables, data were analyzed using one-way analysis of variance (ANOVA) at the 0.05 level of significance [28]. Stream water quality was subjected to two multivariate techniques: cluster analysis (CA) and principal component analysis (PCA) [29]. CA and PCA explore groups and sets of variables with similar properties, thus potentially simplifying the description of observations by revealing structure or patterns in otherwise chaotic or confusing data [30]. All statistical analyses were performed using the SPSS (v.16) and PAST (v.1.93) software applications.
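For readers who want to see what the one-way ANOVA computes, the following is a minimal sketch of the F statistic in plain Python. It is not the SPSS procedure used in the study; the function name and the sample groups are hypothetical. The p-value is not computed here, so the resulting F would still have to be compared with the critical value of the F distribution for the given degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical readings of one parameter at three sites (illustration only).
site_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_b, df_w = one_way_anova_f(site_groups)
```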
Cluster Analysis (CA). Cluster analysis is a multivariate statistical technique that groups objects based on their similarity. CA classifies objects so that each object is similar to the others in its cluster with respect to a predetermined selection criterion. Bray-Curtis cluster analysis is the most common approach of CA; it provides intuitive similarity relationships between any one sample and the entire dataset and is typically illustrated by a dendrogram (tree diagram). The dendrogram provides a visual summary of the clustering processes, presenting a picture of the groups and their proximity with a dramatic reduction in dimensionality of the original data [31].
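The similarity measure behind a Bray-Curtis dendrogram can be sketched as follows. This is an illustration, not the PAST implementation used in the study: the function names and the three site profiles are hypothetical, and the percent similarity is taken as 100 × (1 − Σ|xᵢ − yᵢ| / Σ(xᵢ + yᵢ)), which requires non-negative values. The agglomerative linkage step that turns pairwise similarities into a dendrogram is not shown.

```python
from itertools import combinations


def bray_curtis_similarity(x, y):
    """Percent similarity between two non-negative profiles of equal length."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return 100.0 * (1.0 - numerator / denominator)


def most_similar_pair(profiles):
    """Return ((name_a, name_b), similarity) for the closest pair of profiles."""
    pairs = combinations(sorted(profiles), 2)
    scored = [((a, b), bray_curtis_similarity(profiles[a], profiles[b]))
              for a, b in pairs]
    return max(scored, key=lambda item: item[1])


# Hypothetical site profiles with three parameters each (illustration only).
site_profiles = {
    "I": [2.0, 200.0, 0.10],
    "IV": [6.0, 380.0, 0.50],
    "V": [6.5, 395.0, 0.55],
}

closest_pair, closest_similarity = most_similar_pair(site_profiles)
```

Because the measure sums raw differences, parameters with large numerical values dominate the result unless the data are scaled or transformed first.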
Factor Analysis/Principal Component Analysis (PCA). Factor analysis is applied to reduce the dimensionality of a data set consisting of a large number of interrelated variables. This reduction is achieved by transforming the data set into a new set of variables, the principal components (PCs), which are orthogonal (noncorrelated) and are arranged in decreasing order of importance. In this study we used principal component analysis (PCA) as the factor analysis method. PCA is a data reduction technique and suggests how many variates are important to explain the observed variance in the data. Mathematically, PCs are computed from covariance or other cross-product matrices, which describe the dispersion of the multiple measured parameters, to obtain eigenvalues and eigenvectors. Moreover, the PCs are linear combinations of the original variables, weighted by the eigenvectors [32]. PCA can be used to reduce the number of variables and explain the same amount of variance with fewer variables (principal components) [33]. PCA also attempts to explain the correlation between the observations in terms of underlying factors, which are not directly observable [34].
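In compact form, the computation described above can be written as follows. The notation is an assumption introduced here for illustration: $x_{ij}$ is the value of variable $j$ in observation $i$, $n$ is the number of observations, $p$ the number of variables, and $R$ the correlation matrix of the standardized data $Z$.

```latex
\begin{aligned}
z_{ij} &= \frac{x_{ij} - \bar{x}_j}{s_j}, \\
R &= \frac{1}{n-1}\, Z^{\top} Z, \\
R\, v_k &= \lambda_k v_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \\
\mathrm{PC}_k &= Z v_k, \\
\text{share of variance explained by } \mathrm{PC}_k
  &= \frac{\lambda_k}{\sum_{m=1}^{p} \lambda_m} = \frac{\lambda_k}{p}.
\end{aligned}
```

The last equality holds only when the components are extracted from a correlation matrix, because its trace, and therefore the sum of its eigenvalues, equals the number of variables $p$.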
Prior to modeling, all the nutrient concentrations were log-transformed to bring their distributions closer to normal. Statistical conclusions and tests were made on the basis of a multiparametric model. We used CV, t-test, ANOVA, RA, CA, and PCA to evaluate the impact of anthropogenic activities and spatio-temporal variations on the physicochemical characteristics of Sukhnag stream.

Results and Discussion
The mean values of physicochemical parameters at different sampling sites in Sukhnag stream during the period of 12 months (February 2011-January 2012) are presented in Table 1. Water temperature, pH, and DO demonstrated a seasonal cycle during the period of study. High temperature values (24.33 ± 2.52 °C) were recorded in the summer season at site V and low values (2.33 ± 1.23 °C) were recorded in winter at site I. pH ranged from 7.26 (±0.07) to 8.07 (±0.21), with the highest values in winter and the lowest in summer at most of the study sites. The pH values at the tail site (site V) of the stream showed a decreasing trend from wet to dry season while upstream values were higher during the dry season. DO values were generally highest at upstream sites and lowest at downstream sites. There was however a progressive ) in winter at site I. The overall values of CV showed a significant difference in concentrations from head to tail. On the basis of molar concentrations, among the cations, Ca²⁺ and Na⁺ were dominant and Mg²⁺ and K⁺ were found in minor concentrations. Chloride was the dominant anion observed. Overall we observed a significant degree of spatial and temporal variation in the concentration of water quality parameters using ANOVA (P < 0.05) and t-test (P < 0.01) analysis. Domestic wastewaters, particularly those containing detergents, and fertilizer runoff contribute to the higher levels of phosphates in the water column. Phosphate concentrations indicate the presence of anthropogenic pollutants [35]. The nitrate-N and organic nitrogen concentrations increased from upstream to downstream, mainly due to the contributions of agricultural runoff and sewage discharge [36].
The accumulation factor (AF) and river recovery capacity (RRC) of the physicochemical parameters during the sampling period are presented in Table 2. The accumulation factors revealed that the nitrate and nitrite levels of downstream water were about 2.2 and 2.9 times, respectively, higher than those observed upstream. Other parameters showed an average accumulation factor of 1.6. Chloride, NO₃-N, NO₂-N, and free CO₂ showed the highest percentage recoveries of about 66%, 50%, 51%, and 50%, respectively, while conductivity showed the lowest recovery of about 28% in water downstream. Recovery values for water quality parameters indicated that there was little or no change in values at the tail site compared to the headstream site. The accumulation factors of the physicochemical parameters clearly indicated higher values downstream compared to the reference point upstream, which is a clear indication of anthropogenic impact. The low recovery values observed for most of the parameters suggest that these substances are being released into the river in quantities that exceed the removal carrying capacity of the Sukhnag stream [37]. The higher levels of these nutrients clearly surpass the river recovery capacity.

Regression Analysis (RA).
To explain the nature and magnitude of relationships among various physicochemical parameters, we plotted concentrations of all dependent variables against independent variables. The observed relationships between dependent variable and independent variable concentrations [log(x)] differed among parameters and were not significant for all of them. Concentrations of most variables increased with increasing independent variable (Table 3). The results of the statistical analysis with the general linear regression model (Figure 2) showed strong significant positive relationships (P < 0.0001) of water temperature with organic-N, free CO₂, and NO₂-N; depth with velocity, total-P, NO₃-N, TS, TSS, and TDS; conductivity with total hardness; velocity with total-P, NO₃-N, TS, TSS, and TDS; total-P with NO₃-N, Ca²⁺, and TS; alkalinity with NO₃-N, organic-N, and NO₂-N; and TDS with total-P and NO₃-N. Strong significant negative relationships (P < 0.0001) were shown by water temperature with DO and by DO with free CO₂ and organic-N. Concentrations of NH₄-N and alkalinity responded moderately to water temperature and DO, respectively. Concentrations of K⁺, Mg²⁺, and Cl⁻ showed a reasonable relationship with stream flow. Other variables have positive relationships as shown in Table 3.
Depth increases with higher runoff, which in turn brings a higher load of nitrate from this agriculture-dominated watershed in the spring and summer seasons. Nitrate is mostly associated with the use of organic and inorganic fertilizers [38,39]. The significant variation of total-P with depth and velocity could be due to agricultural activity, since farmers use phosphate as a fertilizer. The relation of suspended solids with depth and velocity indicated agricultural runoff. The linear curve fitting model of both minerals and nutrients reflected that their origin in river runoff is from agricultural fields along with waste disposal activity.
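The kind of fit summarized in Table 3 can be illustrated with an ordinary least-squares sketch. It assumes a semi-log model of the form y = a + b·log10(x); the base of the logarithm, the function name, and the sample values are assumptions made for illustration and are not taken from the study.

```python
import math


def fit_semilog(x, y):
    """Least-squares fit of y = a + b * log10(x); returns (a, b, r_squared)."""
    log_x = [math.log10(v) for v in x]
    n = len(log_x)
    mean_lx = sum(log_x) / n
    mean_y = sum(y) / n

    sxx = sum((u - mean_lx) ** 2 for u in log_x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((u - mean_lx) * (v - mean_y) for u, v in zip(log_x, y))

    b = sxy / sxx                   # slope on the log10(x) scale
    a = mean_y - b * mean_lx        # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Hypothetical predictor and response values (illustration only).
predictor = [1.0, 10.0, 100.0, 1000.0]
response = [2.0, 5.0, 8.0, 11.0]
a_hat, b_hat, r2 = fit_semilog(predictor, response)
```

Comparing r² across candidate predictors is one simple way to pick the best-fit relationship, in the spirit of the "largest r²" criterion mentioned in the statistical analysis section.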

Cluster Analysis (CA).
Cluster analysis is useful in solving classification problems, where the objective is to place factors or variables into groups such that the degree of association is strong between members of the same cluster and weak between members of different clusters [40]. In this study, CA showed strong spatial and temporal association on the basis of variations of principal pollution factors and indicated that the effects of human activities on water quality vary spatially as well as temporally. The dendrogram indicates pollution status as well as the effect of contamination at the sampling sites. It provides a visual summary of the clustering processes, presenting a picture of the groups and their proximity. Cluster analysis (CA) was used to detect similarity between the five sampling sites and four seasons. CA generated a dendrogram, grouping the sampling sites and the months on the basis of percentage of similarity and dissimilarity of water quality parameters. The dendrogram of percentage similarity of the five study sites, on the basis of physicochemical factors, is presented in Figure 3. Similarities between study sites, ranging from 82% to 100%, were used to indicate the strength of the relationship between sites within clusters. The Bray-Curtis similarity analysis confirmed that there is a similarity of 96% between sites IV and V and 94% between sites II and III. In contrast to these sites, site I showed maximum dissimilarity with the other sites during the entire study period, as it is located on the head portion of the stream. Hence, the impact of human activities on the stream ecosystem at site I is relatively low. The dendrogram of percentage seasonal similarity (Figure 4) shows that there is a maximum similarity of 97% between spring and autumn and 82% between winter and summer clusters. Summer and winter clusters showed only 82% similarity with spring and autumn clusters. The generated dendrogram grouped the sampling sites and seasons into three groups. Using this analysis we could categorize the study sites into three groups: low pollution (site I), moderate pollution (sites II and III), and high pollution (sites IV and V). Seasonal grouping showed higher inorganic and organic loads during the spring and autumn seasons.

Principal Component Analysis (PCA).
Principal component analysis was carried out to extract the most important factors and physicochemical parameters affecting the water quality. Due to the complex relationships among the variables, it was difficult to draw clear conclusions directly. However, principal component analysis could not only extract the information to some extent and explain the structure of the data in detail, including the temporal characteristics obtained by clustering the samples, but could also describe their different characteristics and help to elucidate the relationships between different variables through the variable lines. SPSS 16.0 and PAST software were used to carry out principal component analysis to determine the main principal components from the original variables [41][42][43]. Based on the eigenvalue scree plot (Figure 5), the 26 physicochemical parameters were reduced to 2 main factors (factors 1 and 2) from the leveling-off point(s) in the scree plot [44]. The first factor, corresponding to the largest eigenvalue (17.16), accounts for approximately 66.00% of the total variance. The second factor, corresponding to the second eigenvalue (7.96), accounts for approximately 30.63% of the total variance. The remaining 24 factors have eigenvalues of less than unity. Any factor with an eigenvalue greater than 1 is considered significant [44,45]. A correlation matrix of these variables was computed, and factor loading was defined to explore the nature of variation and principal patterns among them [46]. Further analysis of factor loadings showed that velocity, total-P, NO₃-N, NO₂-N, organic-N, Ca²⁺, Na⁺, TS, TSS, TDS, and free CO₂ were the 11 major factors affecting the water quality of Sukhnag stream (Table 4). For factor 1, velocity, total-P, NO₃-N, Ca²⁺, Na⁺, TS, TSS, and TDS have the highest factor loading values (>0.96), showing that these are the most influential variables for the first factor or principal component. It also reflects that overloadings of total-P, NO₃-N, Ca²⁺, Na⁺, TS, TSS, and TDS are responsible for the heaviest pollution problem in the stream. For factor 2, NO₂-N, free CO₂, and organic-N have the highest factor loading values (>0.96), suggesting that organic-N pollution is also a major environmental pollutant in rivers. Factor loadings can be interpreted as the correlation between the factors and the variables, that is, the physicochemical parameters.
[Figure panel labels: Velocity, r² = 0.9; Total-P, r² = 0.9; Ortho-P, r² = 0.6; NO₃-N, r² = 0.9; Total-P, r² = 0.9; Ortho-P, r² = 0.7; NO₃-N, r² = 0.9; Na⁺, r² = 0.7; TS, r² = 0.9; TDS, r² = 0.9; Cl⁻, r² = 0.5; Mg²⁺, r² = 0.5; TSS, r² = 0.9]
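The link between the reported eigenvalues and the percentages of variance can be checked with a short calculation. The sketch assumes the components were extracted from the correlation matrix of the 26 variables, so that the total variance equals the number of variables; the variable and function names are hypothetical.

```python
# Eigenvalues of the two retained factors as reported in the text.
reported_eigenvalues = [17.16, 7.96]
n_variables = 26  # with a correlation matrix, total variance = number of variables


def explained_variance_percent(eigenvalue, total_variance):
    """Share of the total variance carried by one component, in percent."""
    return 100.0 * eigenvalue / total_variance


# Eigenvalue-greater-than-one rule described in the text.
retained = [ev for ev in reported_eigenvalues if ev > 1.0]

percents = [explained_variance_percent(ev, n_variables) for ev in retained]
cumulative_percent = sum(percents)
```

Under this assumption the two factors carry roughly 66% and a little under 31% of the variance, or about 96.6% together, in line with the figures quoted in the text and the abstract.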
To determine which sampling variables were closely related, a plot of factor coordinates for all significant observations (cases) was constructed using the factors obtained from factor loading analysis (Figure 6). Figure 7 shows the cluster of seasons affected by all 26 physicochemical parameters. The variable lines were obtained from the factor loadings of the original variables. They stand for the contribution of the variables to the samples. The closer two variable lines are, the stronger their mutual correlation [47]. The sampling sites (cases) that are clustered near each other have similar characteristics with respect to the factors. The positive loading of temperature is associated with seasonal variation [48]. The inverse relationship between temperature and dissolved oxygen is a natural process in water [49], because warm water easily becomes saturated with oxygen and thus holds less DO [50]. At higher temperature, the solubility of calcium decreases [51]. Positive loading of NO₃-N, total-N, alkalinity, free CO₂, Ca²⁺, and Na⁺ indicates organic matter from domestic wastewaters [52]. Positive loading of Ca²⁺ and Na⁺ is attributed to agricultural runoff [53], whereas seasonal variations of Ca, Mg, and K are reported to be linked with parent rock materials in the catchment area [54]. Positive loading of NO₃-N and total-P in the spring and summer seasons has also been associated with agricultural runoff [55]. During these seasons, farmers use fertilizers and pesticides recklessly, which represents point and nonpoint source pollution from orchard and agricultural areas. Negative loading of pH and DO (−0.9) in factor 1 with positive scores of TN, TP, and inorganic nutrients represented anthropogenic pollution sources, as high levels of dissolved organic matter consume large amounts of oxygen for decomposition, leading to the formation of organic acids, CO₂, and ammonia. Hydrolysis of these acids, dissolution of CO₂ in the water column, and/or oxidation of NH₄ ions under oxic conditions by nitrification processes decreased the water pH [56][57][58]. From the physicochemical data matrix, it was found that the TSS load was highest at
[Figure legend, partially recovered: E = electrical conductivity, F = velocity, G = free CO₂ (mg L⁻¹), H = alkalinity (mg L⁻¹), I = orthophosphorus (mg L⁻¹), J = total phosphorus (mg L⁻¹), K = NO₃-N (mg L⁻¹), L = NO₂-N (mg L⁻¹), M = NH₄-N (mg L⁻¹), N = organic nitrogen (mg L⁻¹), O = total nitrogen (mg L⁻¹), P = Cl⁻ (mg L⁻¹), Q = total hardness (mg L⁻¹), R = Ca hardness (mg L⁻¹), S = Mg hardness (mg L⁻¹), T = Ca²⁺ (mg L⁻¹), U = Mg²⁺ (mg L⁻¹), V = Na⁺ (mg L⁻¹), W = K⁺ (mg L⁻¹), X = total solids (mg L⁻¹), Y = total suspended solids (mg L⁻¹), and Z = total dissolved solids (mg L⁻¹).]
[Map annotation: Sukhnag stream draining into the Hokersar wetland outlet, which finally drains into the Wular Lake.]

Figure 1: Location of the study area with respect to Jhelum basin watersheds and surface water quality monitoring stations in the Sukhnag watershed.

Figure 5: Scree plot of eigenvalues versus components along with % variance of components in Sukhnag stream.

Figure 6: Factor score for factors 1 and 2 in the Sukhnag stream.

Table 1: Physicochemical characteristics of water of Sukhnag stream (February 2011 to January 2012).
increase in DO (6.46 ± 0.31 to 13.4 ± 0.45) at all the sampling sites during the transition to the rainy winter season. Variation in EC was significant among seasons (CV = 5.2-18.5%, P < 0.01) and among sampling sites (P < 0.05). Higher values of EC were recorded in spring (396.6 ± 20.8 µS/cm) at the tail site (site V) and lower values in winter (208.0 ± 8.54 µS/cm) at the headstream site (site I). The higher EC is attributed to the high degree of anthropogenic activities such as waste disposal and agricultural runoff. The seasonal variations in depth during one year of study showed that it was highest in the spring season. By autumn the depth starts to decrease and it is lowest in the winter. The water depth at all the sites varied both spatially (P < 0.05) and temporally (CV = 4.6-60.3%, P < 0.01). Maximum surface water velocities for the five sites were recorded in the spring season (peak flow season) and minimum velocities were recorded in the winter season (snowy season). in winter at site I. Total hardness, calcium hardness, and magnesium hardness were highest in spring and lowest in autumn and winter at all the sites. Lower levels of total hardness, calcium hardness, and magnesium hardness were observed at upstream sites compared to the downstream sites; among the seasons, the lowest values were recorded in the winter at all the sites. The highest values of Ca²⁺, Mg²⁺, Na⁺, K⁺, and TDS (46.18 ± 4.41, 9.82 ± 0.93, 15.39 ± 1.47, 3.56 ± 0.26, and 353.3 ± 50.5 mg/L, resp.) were recorded at the tail site (site V) in the spring season and the lowest (16.83 ± 5.28, 3.91 ± 1.22, 6.23 ± 1.95, 1.55 ± 0.48, and 116.6 ± 27.3 mg/L, resp.) at the headstream site (site I) in the winter season. TDS and TSS values were highest (487.0 ± 93.3 and 145.66 ± 43.13 mg/L, resp.) in spring at site IV and lowest (139.3 ± 40.7 and 22.66 ± 13.42 mg/L, resp.

Table 2: Accumulation factor and river recovery capacity for physicochemical parameters of Sukhnag stream.

Table 3: The results of the statistical analysis with the general linear regression model to delineate the nature and magnitude of relationship among physicochemical parameters of Sukhnag stream. The regression parameters, a and b, were estimated from y = a + b log(x). r² refers to the regression analysis.

Table 4: Loadings of 26 experimental variables on principal components for Sukhnag stream.
ward sampling sites due to possible contribution from nonpoint sources, most probably from land drainage [59].