Correlation, Regression Analysis, and Spatial Distribution Mapping of WQI for an Urban Lake in Noyyal River Basin in the Textile Capital of India

Department of Civil Engineering, Erode Sengunthar Engineering College, Perundurai, Erode, Tamil Nadu 638057, India
Department of Civil Engineering, K. S. Rangasamy College of Technology, Tiruchengode, Tamil Nadu 637215, India
Department of Civil Engineering, Knowledge Institute of Technology, Salem, Tamil Nadu 637504, India
Department of Civil Engineering, M. Kumarasamy College of Engineering, Karur, Tamil Nadu 639113, India
Department of Business Administration, LRG Government Arts College for Women, Tiruppur, Tamil Nadu 641604, India
Department of Civil Engineering, Bannari Amman Institute of Technology, Erode, Tamil Nadu 638401, India
Department of Geotech Design, Pavai Infra Geotech, Coimbatore, Tamil Nadu 641004, India
Department of Mathematics and Physics, Rumbek University of Science and Technology, Rumbek, South Sudan


Introduction
Water is one of the greatest sources of life on Earth. Freshwater sources are limited, and population growth and human activity are accelerating the deterioration of water quality. Many studies show that water exploitation and water theft have also become major contributors to water scarcity [1]. Many methods and components have been developed for assessing water quality and water pollution, and the development of new methods of assessing pollutant load remains a broad field of study. Various methods and mathematical tools are employed and checked for their fit in assessing water pollutants. The quality of water for urban, rural, and industrial purposes can be determined best by calculating the water quality index (WQI). In one study, groundwater samples were taken from wells in the southern part of the Varanasi district of Uttar Pradesh, India. A total of 16 groundwater wells were chosen to assess 22 water parameters for a potability study, and the samples were collected during the premonsoon season in May 2015. According to that study, 20% of the drinking water in the region is unsuitable, while the remaining 80% is classified as good, moderate, bad, or extremely poor according to the WQI. The findings of that study will aid in the effective planning and management of available water resources [2]. In another study, a drinking water quality index (DWQI) development model, parameter classification, and subindex construction were carried out with the help of a regression equation and an aggregation function with the Min-Max operator. For the quality assessment, twenty-two water quality variables were used from 24 groundwater well samples [3]. The WQI of Obulavaripalli Mandal in the YSR district shows that 40% of samples are good for drinking and 30% of samples are very poor for drinking. Ten water parameters analysed from 20 groundwater samples led to the overall conclusion that the groundwater quality is unfit for human consumption [4].
Water quality must be protected in order to provide safe drinking water to the public. The WQI of the Oros Reservoir in the Northeast of Brazil was studied using Principal Component Analysis (PCA) to assess its suitability for human consumption. The entry points P1, P3, P4, and P5 of the reservoir show the lowest WQI values, while P6 and P7 at the exit point of the reservoir show the highest WQI [5]. In the Lake Taihu Basin, 96 locations along major rivers were sampled four times, over four seasons, from September 2014 to January 2016. The findings of that study are useful for water quality management and might be implemented in the Lake Taihu Basin for immediate and low-cost assessment of water quality [6]. Groundwater samples collected from Tumkur taluk have a WQI ranging from 89.21 to 660.56. According to the findings, the groundwater in the region requires some type of treatment before consumption, as well as protection from pollution. Thus, the study demonstrates that the WQI is a fundamental tool for assessing the potability of water [7]. A WQI was developed from nine physicochemical properties that were periodically monitored at eighteen sampling locations (January-November 2000) to assess how surface water quality in the watershed varied geographically and over time.
To decrease the expenses involved with its adoption, modifications to the original WQI were made using Principal Component Analysis (PCA) [8]. To provide safe drinking water on a long-term basis, it is essential to monitor and safeguard its quality. Before evaluating the suitability of groundwater for various purposes, it is necessary to understand its chemical composition. Groundwater contains seven major chemical constituents in a dissolved state. They are Ca²⁺, Mg²⁺, Cl⁻, HCO₃⁻, Na⁺, K⁺, and SO₄²⁻ [9]. Mathematical tools are most helpful for assessing the quality of water. Assessing pollutants and the level of pollution requires large datasets built from many in situ and laboratory tests. A minimum of 1000 results must be analysed in order to improve water conservation. Environmentalists use these datasets to develop indexes for analysing water quality.
This type of indexing uses mathematical tools that compress the dataset and provide a clear view of the water quality values. In one case study, two WQIs, one subjective (WQI_sub) and one objective (WQI_obj), each taking twenty variables into consideration, were used to analyse the spatial and seasonal variations in water quality of the River Suquía in Cordoba City, Argentina. The city's rapid urbanisation harms water quality, particularly in the areas where sewage is discharged. The quality of the water appears to be poor during the dry season [10]. For the past few years, the quality of water in lakes has been measured using the WQI method. An analysis of fifteen water parameters in fifty groundwater samples and thirty-five surface water samples from the Brahmaputra plain in the Jia-Bharali River basin indicates that multivariate statistical analysis is a great tool for assessing water quality. A complex and highly variable dataset can be interpreted with the use of PCA and other multivariate tools. Pollution load can be easily identified using the varimax factors from PCA [11].
For decades, methods have been developed and modified to check the suitability of water for drinking. The WQI is a significant rating that describes the water quality characteristics in a single line and assists in the selection of appropriate treatment options to solve the problems. The WQI represents the combined impact of a wide range of criteria for water quality analysis and communicates water quality data to the public and authorities [12]. A "modified DWQI" for Iran's metropolitan regions was studied using the Canadian DWQI to assess the quality of fresh drinking water. According to the revised DWQI value, approximately 95% of groundwater flow rates were in good condition, while water quality was found to be fair in 3% of the samples and marginal in the remaining 2% [13].
Water quality can be assessed readily by applying mathematical and software tools. Water quality evaluation has grown into a large topic of study in water analysis, and model analysis has become more significant in the assessment of water quality.
To analyse water quality, several computer techniques and mathematical models have been created. One study focuses on creating an estimation model for four samples collected from Periyakulam Lake, Ukkadam, located in the Manchester City of South India. The estimation model results show that the sunflower optimization algorithm (SFO) is an efficient computing tool for estimation models [14].
The WQI for Loktak Lake shows that increased human activity causes stress on the wetland. Eleven parameters were utilised to generate the WQI for the analysis of water quality at five sample stations. Based on its importance, each characteristic was given a relative weight ranging from 1.46 to 4.09 [15]. In the Mettur river basin, Salem, Tamil Nadu, a high WQI determined from 21 water quality parameters indicates contamination of the river water. After the likely sources of contamination of the river basin are identified using factor analysis, preventative measures and monitoring programmes can be implemented to limit future contamination levels. Multivariate statistics are effective in converting large datasets into simple interpretations by displaying geographical variations, whereas cluster analysis is used to split the areas into high, medium, and low contaminated areas [16]. Pearson's correlation analysis using SPSS was used for the analysis of surface water, and contour analysis was used for groundwater quality, in rivers from the region of Sukinda, Odisha. The mining activity near the area supplies 98% of the chromium for the entire country.
This region was highly contaminated with chromium ore in both the surface water and the groundwater, and the people residing in this area were affected by diseases such as respiratory tract problems, skin problems, and weakened immune systems [17].
The overall water quality and any possible risks due to the development of socioeconomic activities around the Parbati River, Himachal Pradesh, were identified using various analysis techniques such as PCA, along with multivariate analysis, cluster analysis, and some graphical visualisation techniques. Statistical studies were one of the tools employed to identify any traces of pollutants in the river basin and also to understand the factors influencing the chemical characteristics of the water and other sources of contamination [18]. A logistic regression model was used to assess the quality of water in public pools in Medellin, Colombia, by reviewing their microbial and physicochemical characteristics. More specifically, Pearson's coefficient was used to identify linear relationships, factor analysis was used to evaluate the water quality parameters, and a regression model was used to relate the microbial and water quality parameters. In pools, hypochlorite added as a disinfectant tends to increase the EC of the water, and some microorganisms are affected by particular parameters of the water. Regular monitoring should be done to maintain the adequate condition of the pool [19].
In the Yamuna River basin, a fuzzy-based WQI was derived to determine the natural condition of the water and showed that the sensitivity of the parameters was used in defining the surface water quality. Data from 22 sampling stations cover eight characteristics of the water samples and rank them as very good, good, fair, poor, or very poor. These five classes of water quality were assigned as inputs to the fuzzy models. As that research is limited to freshwater studies, an extended version of it can be applied to groundwater studies [20]. For the water quality metrics in Lake Prashar, Himachal Pradesh, Pearson's correlation, principal component analysis, and cluster analysis were used. Because there was a significant difference across seasons and months, WQI, PCA, and cluster analysis were reported on a season-by-season basis. PCA identified water temperature, TDS, conductivity, turbidity, phosphate, BOD, hardness, calcium, sodium, and potassium as key characteristics that lowered water quality during the summer and monsoon seasons. The findings show that the main variables affecting water quality during the monsoon were the garbage generated by tourists that entered the system and the surrounding erosion resulting from overgrazing in the Prashar Lake area [21]. A relatively small number of wells in the study area show exceptionally high values of conductivity and chloride due to the use of fertiliser in agriculture [22]. The main objective of the present study is to evaluate the quality of Mooli Kulam lake water in Tiruppur District to check its suitability for drinking and irrigation.

Study Area.
Mooli Kulam is an urban lake situated within the city limits of Tiruppur, with an area of about 0.223 km². The lake receives its water from rainfall and the Noyyal river. It is fed by a 2 km long canal from the Noyyal river during floods and is one of the 32 irrigation tanks of the Noyyal river basin. The lake is situated at 11°07′17.6″ N and 77°22′59.9″ E. The depth of the lake varies from 3 m to 6 m. The lake's water looks dark green in color and has a well-grown ecosystem with aquatic plants and fishes. The south bank of the river contains many colored dyes and effluents. Tiruppur has a hot semiarid climate, with a mean maximum temperature of 35°C (95.0°F) and a mean minimum temperature of 22°C (71.6°F). The average annual rainfall is around 700 mm (28 in), with 47% falling during the northeast monsoon and 28% during the southwest monsoon. The lake holds water to its maximum level during floods in the Noyyal River. The shape of the lake is undefined, and the bed of the lake consists of colluvial sediments, providing water throughout the year to the nearby areas for irrigation. The location of the study area is shown in Figure 1.

Methodology
Water samples were collected from Mooli Kulam Lake in October 2021, and parameters such as pH, total hardness, TDS, chlorides, sulphates, calcium, magnesium, sodium, fluoride, bicarbonate, and electrical conductivity were analysed in accordance with APHA, 2004. The results are combined to form a water quality index. Three methods of calculating the WQI have been adopted, and the resulting WQI values are plotted as a spatial distribution map for data integrity. Correlation analysis has been applied to the dataset to identify similarities and interrelationships between the water quality variables. A regression model has been created with the highly correlated parameters to assess the best-fit regression models.
Advances in Materials Science and Engineering

Weighted Arithmetic Method.
The weighted arithmetic index method is a universally adopted method of classifying water quality. It was formulated by Brown in 1972, and many later methods follow it in slightly modified forms. The following equation calculates the water quality index using this method:

$$\mathrm{WQI} = \frac{\sum_{i} Q_i W_i}{\sum_{i} W_i},$$

where $Q_i$ is the rating scale of quality and $W_i$ is the unit weight.

Canadian Council of Ministers of the Environment.
A consistent method of reporting water quality to the public was formulated by Canadian jurisdictions by establishing a committee. This committee developed a WQI that many water agencies can apply with slight modifications [22,23]. The sampling procedure requires that a minimum of four parameters be sampled a minimum of four times each [24,25]. The CCME WQI index score can be obtained by

$$\mathrm{CCME\ WQI} = 100 - \frac{\sqrt{F_1^2 + F_2^2 + F_3^2}}{1.732},$$

where $F_1$ is scope (the ratio of the number of failed variables to the total number of variables, in percentage), $F_2$ is frequency (the number of times the objectives are not met), and $F_3$ is amplitude (the amount by which the objectives are not met). Based on the WQI values from the weighted arithmetic method and the CCME method, water quality for drinking can be rated as represented in Table 1.
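The CCME calculation can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation. It assumes the standard CCME formulation, WQI = 100 − sqrt(F1² + F2² + F3²)/1.732, with F2 expressed as the percentage of failed tests and F3 derived from the normalized sum of excursions. The function name `ccme_wqi`, the data layout, and the objective bounds are hypothetical.

```python
import math


def ccme_wqi(measurements, objectives):
    """Return the CCME WQI for a set of repeated measurements.

    measurements: dict mapping parameter name -> list of measured values.
    objectives:   dict mapping parameter name -> (low, high) bounds;
                  use None for a bound that does not apply.
    Assumes measured values are positive (needed for the lower-bound excursion).
    """
    total_vars = len(measurements)
    total_tests = sum(len(values) for values in measurements.values())
    failed_vars = 0
    failed_tests = 0
    excursion_sum = 0.0

    for name, values in measurements.items():
        low, high = objectives[name]
        variable_failed = False
        for value in values:
            if high is not None and value > high:
                excursion_sum += value / high - 1.0   # exceeds the upper objective
            elif low is not None and value < low:
                excursion_sum += low / value - 1.0    # falls below the lower objective
            else:
                continue                              # test passed
            failed_tests += 1
            variable_failed = True
        if variable_failed:
            failed_vars += 1

    f1 = 100.0 * failed_vars / total_vars             # scope
    f2 = 100.0 * failed_tests / total_tests           # frequency
    nse = excursion_sum / total_tests                 # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                    # amplitude, scaled to 0-100
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732


# Hypothetical data: four parameters, four samples each (the stated minimum).
example_measurements = {
    "pH": [7.0, 9.0, 7.5, 8.0],
    "TDS": [400.0, 450.0, 480.0, 490.0],
    "Cl": [100.0, 120.0, 110.0, 90.0],
    "SO4": [50.0, 60.0, 70.0, 80.0],
}
example_objectives = {
    "pH": (6.5, 8.5),
    "TDS": (None, 500.0),
    "Cl": (None, 250.0),
    "SO4": (None, 200.0),
}
example_score = ccme_wqi(example_measurements, example_objectives)
```

Dividing by 1.732 (approximately the square root of 3) keeps the index between 0 and 100, because each of the three factors can reach at most 100.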

Horton Equation.
The WQI of the study area is calculated using Horton's method (Horton, 1965) by considering ten parameters, namely, TDS, pH, sulphates, total hardness, chlorides, calcium, bicarbonate, sodium, magnesium, and fluoride, and comparing them with WHO, ICMR, and ISI standards using the following equation:

$$\mathrm{WQI} = \frac{\sum_{n} q_n W_n}{\sum_{n} W_n},$$

where $q_n$ is the quality rating for the nth water quality parameter and $W_n$ is the unit weight factor. The quality rating is

$$q_n = 100 \times \frac{V_n - V_{id}}{S_n - V_{id}},$$

where $V_n$ is the estimated value of the nth water quality parameter at a given sampling location, $V_{id}$ is the value of the nth water quality parameter in pure water, and $S_n$ is the standard permissible value of the nth water quality parameter. The ideal values ($V_{id}$) are taken as seven for pH and zero for all the other parameters.

Unit weight is expressed as follows:

$$W_n = \frac{K}{S_n},$$

where $K = 1 / \sum_{n} (1/S_n)$, with the sum running over the parameters $n = 1, 2, 3, \ldots$, and $S_n$ is the standard permissible value of the nth water quality parameter. Based on the WQI values, water quality for drinking and irrigation can be rated as represented in Table 2.
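The index described in this section can be sketched in Python as below. This is a minimal sketch, not the authors' code. It assumes the usual form of the index, WQI = Σ q_n W_n / Σ W_n, with q_n = 100 (V_n − V_id)/(S_n − V_id), W_n = K/S_n, and K = 1/Σ(1/S_n). The function names and the permissible values in the example are hypothetical placeholders.

```python
def unit_weights(standards):
    """W_n = K / S_n with K = 1 / sum(1 / S_n), so the weights sum to 1."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return {name: k / s for name, s in standards.items()}


def quality_rating(value, standard, ideal=0.0):
    """q_n = 100 * (V_n - V_id) / (S_n - V_id)."""
    return 100.0 * (value - ideal) / (standard - ideal)


def water_quality_index(sample, standards, ideals=None):
    """Weighted index: sum(q_n * W_n) / sum(W_n) over all parameters in `standards`."""
    ideals = ideals or {}
    weights = unit_weights(standards)
    weighted_sum = sum(
        quality_rating(sample[name], standards[name], ideals.get(name, 0.0)) * weights[name]
        for name in standards
    )
    return weighted_sum / sum(weights.values())


# Hypothetical permissible values, for illustration only.
example_standards = {"pH": 8.5, "TDS": 500.0, "Cl": 250.0}
example_ideals = {"pH": 7.0}  # ideal value is 7 for pH and 0 for everything else
example_sample = {"pH": 8.3, "TDS": 420.0, "Cl": 110.0}
example_wqi = water_quality_index(example_sample, example_standards, example_ideals)
```

Because each weight is inversely proportional to its permissible value, parameters with strict limits dominate the index. In the example above, pH carries most of the weight.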

Descriptive Analysis.
Descriptive statistics is concerned with quantitatively describing the characteristics of a particular individual or a group. It summarizes data from a sample using measures of central tendency and variability, such as the mean, the standard deviation, and the minimum and maximum values.
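As a small illustration, the summary measures named above can be computed with Python's standard library. The function name `describe` and the sample values are hypothetical. The sample (n − 1) standard deviation is used here as an assumption, since the text does not say which form was applied.

```python
import statistics


def describe(values):
    """Return minimum, maximum, mean, and sample standard deviation of a list."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Hypothetical readings for one parameter across eight samples.
example_summary = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```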

Correlation Analysis.
Correlation is a measure of the degree of association between two sets of quantitative data. Two variables are considered to be correlated if a change in one is accompanied by a change in the other. If the two variables vary in the same direction, that is, if an increase in one variable is accompanied by an increase in the other, and vice versa, the correlation is positive. Karl Pearson's coefficient of correlation can be worked out by using the equation below:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}},$$

where $x$ and $y$ are the paired values of the two variables and $\bar{x}$ and $\bar{y}$ are their means.

Water quality analysis depends on several parameters. These parameters include calcium, potassium, sodium, carbonate, bicarbonate, chloride, magnesium, sulphate, EC, TDS, fluoride, and pH. A systematic statistical analysis of the correlation coefficients between water quality measures aids in the assessment of overall water quality. A correlation coefficient (r) of up to 0.5 indicates no significant linear association between two parameters. A significant linear correlation exists when r is between 0.5 and 0.8, and a strong linear correlation exists when r is greater than 0.8.
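The coefficient and the thresholds above can be sketched in Python as follows. This is an illustrative sketch using the standard Pearson formula. The function names are hypothetical, and applying the thresholds to the absolute value of r is an assumption, since the text states them for r only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_correlation(r):
    """Label r using the 0.5 and 0.8 thresholds (applied to |r| by assumption)."""
    magnitude = abs(r)
    if magnitude > 0.8:
        return "strong"
    if magnitude > 0.5:
        return "significant"
    return "not significant"


# Hypothetical paired readings with an exact linear relationship.
example_r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```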

Regression Analysis.
Multiple Linear Regression (MLR) is a statistical technique that uses several explanatory variables to estimate the outcome of a response variable. The linear relationship between the explanatory (independent) variables and the response (dependent) variable can be modelled using MLR. Multiple regression is simply an extension of ordinary least squares (OLS) regression that incorporates more than one explanatory variable.
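A minimal sketch of an MLR fit is given below, assuming ordinary least squares solved through the normal equations. It is written in plain Python to stay self-contained. The function names and the data are hypothetical, and in practice a library routine such as `numpy.linalg.lstsq` would usually be preferred for numerical stability.

```python
def fit_mlr(rows, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(row) for row in rows]          # prepend intercept column
    p = len(design[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * target for r, target in zip(design, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]      # [b0, b1, ..., bk]


def r_squared(rows, y, coefficients):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    predicted = [
        coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))
        for row in rows
    ]
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, predicted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
example_rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
example_y = [1.0 + 2.0 * a + 3.0 * b for a, b in example_rows]
example_coefficients = fit_mlr(example_rows, example_y)
example_r2 = r_squared(example_rows, example_y, example_coefficients)
```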

Results and Discussion
3.1. Water Quality Index.
In total, 11 parameters, namely, pH, TDS, total hardness, chlorides, sulphates, calcium, magnesium, sodium, fluoride, bicarbonate, and electrical conductivity, were used to calculate the WQI of the lake water for drinking. For irrigation, all of these parameters except bicarbonate and fluoride were used. Table 3 shows the variations in the WQI of samples 1 to 10. Between drinking and irrigation, only a few parameters change, and the method of calculating the WQI remains the same. Nine samples are rated as good water quality for drinking, and only one sample is rated poor. For irrigation, all samples are found to be fit, with values in the range between 86 and 100. Table 3 shows the variations in the WQI of water samples from the Mooli Kulam lake using the Horton method. The WQI for irrigation appears to be excellent for the parameters included, and the quality of the water against drinking water standards appears to be good for 9 samples and fair for one sample. Table 4 shows the variations in the WQI of the Mooli Kulam lake samples as assessed by the Canadian Council (CCME) method. The samples do not meet the standards. The WQI obtained is 68.62, which corresponds to fair water quality. The spatial distribution of WQI for drinking and irrigation is presented in Figure 2.

Table 5 shows the descriptive analysis of the lake water samples. The pH has a maximum of 8.42 and a minimum of 8.26, with a low standard deviation of 0.1702. EC has the highest standard deviation of all the variables. The maximum value of EC is 1479, which is close to the permissible limit. Turbidity has a maximum value of 84 and a minimum value of 13; neither the maximum nor the minimum is within permissible limits. Iron has a maximum of 3.5 and a minimum of 1.1, with a standard deviation of 0.733. The maximum level of potassium is 24 and the minimum is 10, with a standard deviation of 5.06.

Correlation Analysis.
The values of the correlation coefficient for the northeast season are given in Table 6. TDS has a strong linear correlation with electrical conductivity, with a correlation coefficient of 0.999. Total hardness has a strong linear correlation with EC and TDS, with correlation coefficients of 0.809 and 0.808. Sulphate has a strong linear correlation with EC, TH, and TDS, with correlation coefficients of 0.819, 0.918, and 0.817.
Chloride has a strong linear correlation with TH and SO4, with correlation coefficients of 0.965 and 0.87. Chloride has a negative linear correlation with pH, EC, TDS, and hardness, with correlation coefficients of −0.40, −0.23, −0.23, and −0.38. Its value is independent of pH, EC, TDS, and hardness. Magnesium has a strong linear correlation with five parameters, namely, EC, TDS, TH, SO4, and Cl (0.874, 0.872, 0.974, 0.927, and 0.882).
Iron has a strong linear correlation with turbidity and HNO3.

The R² value of 1 in the present model indicates that the water parameters turbidity, electrical conductivity, pH, bicarbonate alkalinity, total alkalinity, carbonate hardness, total hardness, and calcium explain 100% of the variability of TDS. The best-fit MLR equation for predicting the TDS is given below.
$$\mathrm{TDS} = 11.1127 - 0.0151\,\mathrm{Turbidity} + 0.6957\,\mathrm{EC} - 0.7365\,\mathrm{pH} + 0.0053\,\mathrm{HCO_3\,Alk.} + 0.0155\,\mathrm{TA} + 0.02\,\mathrm{CO_3\,Hardness} - 0.0438\,\mathrm{TH} + 0.0484\,\mathrm{Ca} \tag{8}$$

The larger the R² value, the bigger the F-ratio, which indicates that the relationship between the dependent and the independent variables is stronger. The overall significance of the regression model is determined by the larger value of the F-statistic, as given in the ANOVA in Figure 3. The P-value of 0.0013 (P < 0.05 level) validates that the data are a good fit for the regression model.
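For illustration, Equation (8) can be applied directly as a small Python function. The coefficients are those reported in the equation. The function name, the argument names, and the input values used below are hypothetical, and the units are assumed to match those of the study's dataset.

```python
def predict_tds(turbidity, ec, ph, hco3_alk, total_alk, co3_hardness, total_hardness, calcium):
    """Evaluate the fitted MLR model of Equation (8) for one water sample."""
    return (
        11.1127
        - 0.0151 * turbidity
        + 0.6957 * ec
        - 0.7365 * ph
        + 0.0053 * hco3_alk
        + 0.0155 * total_alk
        + 0.02 * co3_hardness
        - 0.0438 * total_hardness
        + 0.0484 * calcium
    )


# With every predictor set to zero, only the intercept remains.
intercept_only = predict_tds(0, 0, 0, 0, 0, 0, 0, 0)
# A hypothetical sample in which only EC is nonzero isolates the EC term.
ec_only = predict_tds(0, 1000, 0, 0, 0, 0, 0, 0)
```

The EC coefficient of 0.6957 dominates the prediction, which is consistent with the correlation coefficient of 0.999 reported between TDS and EC.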

Conclusion
This research applies a variety of WQI methods to the assessment of lake water quality. The results obtained from the weighted arithmetic and Horton's methods show that station 8 is rated poor and fair, respectively, and cannot be used for drinking. The water quality at the remaining stations is found to be good and suitable for drinking. The CCME method rates the water quality as fair. Thus, the results indicate that the water is safe for drinking except at station 8. For irrigation, the water from all stations can be used. The IDW interpolation method of spatial distribution analysis shows the variations in the water quality parameters by location. The correlation analysis shows that many variables have a linear correlation with other variables, and TDS and EC are found to be very highly correlated. The regression analysis, a multiple linear regression fit for TDS, shows that the majority of variables are related to total dissolved solids. The lake's water turbidity is a major threat to human consumption. Organic pollutants are highly concentrated in the lake water. Strategies for water conservation and water body conservation should be adopted for the wellness of the water body and human health.

Data Availability
The data used to support the findings of this study are included within the article.