Land Use Land Cover Changes in Detection of Water Quality: A Study Based on Remote Sensing and Multivariate Statistics
Journal of Environmental and Public Health

The water quality of the Malacca River is affected by rapid urban development. The present study related Land Use Land Cover (LULC) changes to water quality detection in the Malacca River. The methods used were LULC change detection, principal component analysis (PCA), canonical correlation analysis (CCA), hierarchical cluster analysis (HCA), nonhierarchical cluster analysis (NHCA), and analysis of variance (ANOVA). PCA identified dissolved solids (DS), electrical conductivity (EC), salinity, turbidity, total suspended solids (TSS), dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD), As, Hg, Zn, Fe, E. coli, and total coliform as the significant water quality variables. CCA grouped these 14 variables into two canonical variates: the first is associated with residential and industrial activities, and the second with agriculture, sewage treatment plants, and animal husbandry. HCA and NHCA show that cluster 1 occurs in the urban area, with Hg, Fe, total coliform, and DO pollution; cluster 3 occurs in the suburban area, with salinity, EC, and DS; and cluster 2 occurs in the rural area, with salinity and EC. ANOVA between LULC and water quality data indicates that built-up area significantly polluted the river through E. coli, total coliform, EC, BOD, COD, TSS, Hg, Zn, and Fe; agricultural activities caused EC, TSS, salinity, E. coli, total coliform, arsenic, and iron pollution; and open space contributed turbidity, salinity, EC, and TSS. The findings provide useful information for identifying pollution sources and for understanding the link between LULC and river water quality, as a reference for policy makers in the proper management of Land Use.


Introduction
Land Use Land Cover (LULC) combines two separate terms that are often used interchangeably [1,2]. Land Cover can be defined as the physical characteristics of the earth's surface, including vegetation, water, soil, and other physical features, some of which are created through human activities such as settlements, while Land Use refers to the way humans use land for habitation and economic activities [1]. LULC patterns depend on human use of land, shaped by natural and socioeconomic development through space and time. In other words, Land Use changes can affect Land Cover and vice versa. Land Use for social and economic activities can have negative impacts that change Land Cover, especially in biodiversity, water and radiation balances, trace gas emissions, and other processes that together affect the climate and biosphere [2,3]. The size and pattern of these changes are attributed mainly to one factor, namely, "population growth." Population growth contributes directly and indirectly to LULC changes, especially through demand for built-up area, agricultural land, and water resources. Ecologists are particularly concerned with LULC changes that affect biodiversity and aquatic ecosystems [4]. LULC changes in a watershed affect water quality by increasing surface runoff, reducing groundwater discharge, and transferring pollutants [2,4]. Therefore, LULC information at the watershed level is important for the selection, planning, monitoring, and management of water resources, so that changes in Land Use meet the increasing demand for human needs and welfare without compromising water quality.
Various studies have analyzed changes in watersheds, which is important for developing effective management strategies to protect water resources [1,5-7]. Watershed management is necessary because a watershed is not only a hydrological unit [8] but also plays an important socioecological role by providing economic, food, and social security as well as life support services to local residents [9]. LULC changes in a watershed due to urbanization and deforestation have continuous negative impacts on water quality and indirectly affect the watershed ecosystem. Hence, understanding the spatial and temporal variations that occur in a watershed over time, as well as the interactions between its hydrological components, will allow better water conservation strategies to be formulated [5]. Remote sensing in particular has been widely used to classify and map LULC changes with different techniques and data sets, such as Landsat images, which allow different landscape components to be classified at a large scale [10]. Several change detection techniques have been developed for remotely sensed images, with continuing debate on the advantages and disadvantages of each. These include unsupervised classification or clustering, supervised classification, PCA, hybrid classification, and fuzzy classification, all of which are commonly applied [1,2,11,12]. Although various classification techniques have been proposed, supervised classification methods are considered favorable for change detection analysis, and researchers have recently applied supervised classification to LULC change detection for a variety of research aims [1-3,13].
The Malacca River watershed was selected for a change detection study because of uncontrolled urbanization, unmanaged sewage discharge, active soil erosion, and tree cutting. In addition, pesticide residues and animal husbandry waste are suspected to be major concerns in the watershed due to increasing agricultural and poultry farm activities [14]. Rapid urban development in the study area has led to several problems, such as fragmentation of aquatic animal habitats, soil erosion, and river pollution due to deforestation and the discharge of municipal garbage and industrial waste [15]. This study used remote sensing to determine the extent of changes in subbasins of the Malacca River watershed, including the Panchor, Kampung Pulau, Kampung Sungai Petai, Kampung Tamah Merah, and Kampung Tualang subbasins. Only 7 of the 13 subbasins were selected, with 9 sampling stations along the river (Figure 1). Malacca state has a reservoir located between Alor Gajah and Malacca Central, the Durian Tunggal Reservoir, with a catchment of 20 km², which supplies water to Malacca residents. The growing local population has led to an increase in public facilities such as transport, healthcare, accommodation, sewage, and water supply services [14-16]. Due to this drastic population growth, urban development in the Strait of Malacca has also increased rapidly, especially from a Land Use perspective. Most residents are concentrated in the city, which extends about 10 km to the west, 10 km to the east, and 20 km to the north. Land Use has continued to change up to the present, in line with the vision and mission of the sustainable tourism sector. These changes contribute indirectly to economic growth and political change and strengthen cultural and social relationships, but they also affect environmental quality, especially the water quality of the Malacca River.

Data Collection
Nine sampling stations were chosen along the Malacca River. River water quality data for 2015 were collected in this study and analyzed according to APHA [17], while river water quality data for 2001 and 2009 were obtained from the Department of Environment (DOE), Malaysia. Primary data were collected in 2015 to obtain the current water quality status and for field data verification. Two types of measurement were involved: in situ analysis and laboratory analysis. River water quality was analyzed for physicochemical parameters, that is, pH, temperature, electrical conductivity (EC), salinity, turbidity, total suspended solids (TSS), dissolved solids (DS), dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD), and ammoniacal nitrogen (NH3-N); trace elements (i.e., mercury, cadmium, chromium, arsenic, zinc, lead, and iron); and biological parameters (i.e., Escherichia coli and total coliform), as shown in Table 5.

River Water Data
4.1.1. Water Quality Analysis. Water samples were analyzed through in situ measurement and laboratory analysis. In situ measurement involved pH testing using a SevenGo Duo pro probe (Mettler Toledo AG); turbidity testing using a portable turbidity meter (Handheld Turbidimeter Hach 2100); and a multiparameter probe (Orion Star Series Portable Meter) for temperature, EC, DS, salinity, and DO. Laboratory analysis involved measurement of NH3-N using a spectrophotometer based on Hach Method 8038; COD using the APHA 5220B open reflux technique; BOD using APHA 5210B (Hach Method 8043); TSS using the APHA 2540D method; E. coli and total coliform using the membrane filtration method based on APHA 9221B; and trace metals using inductively coupled plasma mass spectrometry (ICP-MS, ELAN DRC-e, Perkin Elmer). Each sample was tested in triplicate before the mean value was calculated, and the standard deviation (SD), which was below 20% for each parameter measured, was used as an indication of precision.
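As a small illustration of the triplicate precision check, the following Python sketch computes the mean and sample SD of three replicate readings. It assumes, as an interpretation rather than a statement of the authors' exact procedure, that the 20% limit applies to the relative SD (SD divided by the mean); the function names and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def summarize_triplicate(values):
    """Return (mean, SD, relative SD in %) for one parameter measured in triplicate."""
    if len(values) != 3:
        raise ValueError("expected exactly three replicate measurements")
    m = mean(values)
    sd = stdev(values)  # sample SD with an n - 1 denominator
    rsd = 100.0 * sd / m if m != 0 else float("inf")
    return m, sd, rsd


def is_precise(values, limit_percent=20.0):
    """Assumed acceptance rule: relative SD strictly below the limit."""
    _, _, rsd = summarize_triplicate(values)
    return rsd < limit_percent


if __name__ == "__main__":
    # Hypothetical BOD replicates in mg/l (not data from this study)
    replicates = [4.1, 4.3, 3.9]
    m, sd, rsd = summarize_triplicate(replicates)
    print(f"mean={m:.2f} mg/l, SD={sd:.2f}, RSD={rsd:.1f}%, precise={is_precise(replicates)}")
```

If the limit was instead meant as an absolute SD, only the comparison in `is_precise` would need to change.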

Statistical Analysis
The analysis results were entered into the Statistical Package for the Social Sciences (SPSS) version 23 for statistical analysis using principal component analysis (PCA), canonical correlation analysis (CCA), hierarchical cluster analysis (HCA), nonhierarchical cluster analysis (NHCA), and analysis of variance (ANOVA). In general, PCA involves (1) reducing the original data to dominant components, or factors (sources of variation), that influence the observed data variance and (2) extracting eigenvalues and eigenvectors from the whole data set [18]. Only eigenvalues greater than 1 are considered significant [19] and are used to form new grouped variables, the varimax factors (VFs). A VF coefficient of 0.6 is considered "moderate" and was taken into account as a factor loading. PCA is applied in this study to identify possible pollutant sources in the Malacca River. The PCA components are then carried forward into CCA for further analysis. CCA investigates the relationship between two groups of variables: (1) CCA seeks vectors $a$ and $b$ for the random vectors $X$ and $Y$ that maximize the correlation $\rho = \operatorname{corr}(a^{\top}X, b^{\top}Y)$; (2) the random variables $U = a^{\top}X$ and $V = b^{\top}Y$ form the first pair of canonical variates, which are linear combinations of the original variables, with $\rho$ being the simple correlation between $U$ and $V$; (3) further vectors $a_2$ and $b_2$ that maximize the correlation, subject to being uncorrelated with the first canonical variates, then produce the second canonical variates [20]. CCA is applied in this study to identify pollutant sources in the river more precisely.
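The PCA steps described above (eigen-decomposition, retaining eigenvalues greater than 1, and varimax rotation) can be sketched in Python with NumPy. This is not the SPSS procedure used in the study; it assumes PCA on the correlation matrix of the variables (equivalent to PCA on z-scores) and Kaiser's varimax algorithm, and the helper names and the |loading| ≥ 0.6 screening rule are illustrative assumptions.

```python
import numpy as np


def pca_kaiser(X):
    """PCA on the correlation matrix of X (samples x variables).

    Returns all eigenvalues (descending), the % of total variance per
    component, and unrotated loadings for components with eigenvalue > 1.
    """
    R = np.corrcoef(X, rowvar=False)          # same as PCA on z-scores
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                      # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = 100.0 * eigvals / eigvals.sum()
    return eigvals, explained, loadings


def varimax(L, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation of a loading matrix L (variables x factors)."""
    p, k = L.shape
    rot = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ rot
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        )
        rot = u @ vt                          # nearest orthogonal rotation
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ rot


def strong_loadings(rotated, threshold=0.6):
    """Boolean mask of |VF loading| >= threshold (assumed screening rule)."""
    return np.abs(rotated) >= threshold
```

For example, `vf = varimax(pca_kaiser(X)[2])` gives the rotated VFs, and `strong_loadings(vf)` flags the variables that would be interpreted for each factor. Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged.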
HCA sorts objects into groups based on the similarity between them, which involves (1) Ward's method, which uses an analysis-of-variance approach to minimize the increase in within-cluster sum of squares when any two clusters are merged [18,21]; (2) measuring similarity through the Euclidean distance between two samples [18,21]; and (3) a dendrogram that displays the results, with high similarity indicated by small distances between clusters in a group [12]. This study employed HCA to determine the areas that possibly contribute to pollution in the study area. Unlike HCA, NHCA using the k-means method is used to obtain the correct classification of pollutant sources based on the PCA components. Lastly, ANOVA is used to analyze the relationship between the Land Use classes from the LULC change analysis and the water quality variables identified from the PCA factor loadings. The main purpose of using ANOVA is to determine whether LULC classes act as pollutant sources that affect water quality and cause contamination in the Malacca River.
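A minimal sketch of Ward's agglomerative clustering with Euclidean distance follows, written in plain NumPy for transparency rather than reproducing SPSS. At each step it merges the pair of clusters whose union gives the smallest increase in total within-cluster sum of squares, $\frac{n_a n_b}{n_a+n_b}\lVert \bar{x}_a-\bar{x}_b\rVert^2$. It assumes the station data are already standardized (for example, as z-scores); `ward_clusters` is a hypothetical helper name.

```python
import numpy as np


def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's criterion (Euclidean distance).

    X: stations x variables, assumed already standardized (e.g., z-scores).
    Returns a list of clusters, each a sorted list of row indices of X.
    """
    X = np.asarray(X, dtype=float)
    if not 1 <= n_clusters <= len(X):
        raise ValueError("n_clusters must be between 1 and the number of rows")
    clusters = [[i] for i in range(len(X))]   # start: every station on its own
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # Increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]                        # b > a, so index a stays valid
    return clusters
```

For nine standardized stations, `ward_clusters(Z, 3)` returns three groups of station indices. In practice, `scipy.cluster.hierarchy.linkage(Z, method="ward")` builds the full merge tree needed to draw a dendrogram.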

Image Preprocessing, LULC Classification, and Change Detection Analysis
Satellite images required preprocessing to establish a more direct relationship between the acquired data and biophysical phenomena [1]. Preprocessing was carried out in ArcGIS version 10.0 for georeferencing, mosaicking, and subsetting of the image to the Area of Interest (AOI). Landsat 8 images underwent spatial sharpening using the panchromatic band, which resulted in images with 15 m resolution, while the Landsat 5 TM images for 2001 and 2009 retained their original 30 m resolution. Further image processing analysis was carried out using ENVI 5.0. The image was displayed in natural color composite using a band

Table 1: Classes delineated on the basis of supervised classification.
Class name | Description
Vegetation | All agricultural and forest lands.
Built-up area | All residential, commercial, industrial, and transportation areas.
Water | All water bodies (river, lakes, gravels, stream, canals, and reservoirs).
Open space | All land areas of exposed soil and barren land influenced by humans.

In performing LULC change detection, a post-classification change detection method was applied in ENVI 5.0, in which two independently classified images are compared to produce change information on a pixel basis. Comparing the images provides "from-to" change information. Classified images from two different dates were compared using cross-tabulation to determine the qualitative and quantitative aspects of change for the periods 2001 to 2009 and 2009 to 2015. The magnitude and percentage of change can be expressed as
$$K = F - R, \qquad P = \frac{K}{R} \times 100,$$
where $K$ is the magnitude of change, $P$ is the percentage of change, $F$ is the first data, and $R$ is the reference data [11].
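The post-classification comparison and the change formulas can be sketched with NumPy as below. The sketch assumes that both classified images have already been resampled to a common grid (the 2001 and 2009 images are 30 m and the 2015 image is 15 m), that classes are coded 0 to 3 in the order of Table 1, and that the change formula takes the common form $K = F - R$ and $P = K/R \times 100$ with $F$ the first data and $R$ the reference data. `cross_tabulate` and `change_stats` are hypothetical helper names, not ENVI functions.

```python
import numpy as np

# Class codes follow the order of Table 1 (assumed coding)
CLASSES = ["Vegetation", "Built-up area", "Water", "Open space"]


def cross_tabulate(before, after, n_classes=len(CLASSES)):
    """'From-to' matrix: entry [i, j] counts pixels of class i at the earlier
    date that are class j at the later date."""
    before = np.asarray(before, dtype=int).ravel()
    after = np.asarray(after, dtype=int).ravel()
    if before.shape != after.shape:
        raise ValueError("classified images must share the same grid")
    table = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(table, (before, after), 1)  # unbuffered: repeated pairs all count
    return table


def change_stats(area_first, area_reference):
    """Magnitude K = F - R and percentage P = K / R * 100 for each class."""
    F = np.asarray(area_first, dtype=float)
    R = np.asarray(area_reference, dtype=float)
    K = F - R
    # Percentage is undefined (NaN) for classes with zero reference area
    P = np.divide(100.0 * K, R, out=np.full_like(K, np.nan), where=R != 0)
    return K, P
```

Row sums of the from-to matrix give class areas at the earlier date and column sums give those at the later date, in pixels; multiplying by the pixel area (900 m² for a 30 m pixel) converts them to m² before calling `change_stats`.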

Accuracy Assessment
Classification accuracy was assessed for the 2001, 2009, and 2015 images to determine the quality of the information derived from the data. If classified data are to be used for change detection analysis, accuracy must be assessed for each individual classification [1]. The kappa test is used to measure classification accuracy because it accounts for all elements of the confusion matrix rather than only the diagonal elements [22]. The kappa coefficient is calculated from predefined producer and user ratings and can be expressed as
$$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)},$$
where $\Pr(a)$ is the observed proportion of agreement between the raters and $\Pr(e)$ is the proportion of agreement expected by chance alone [1,23]. User's accuracy can be defined as the probability that a pixel classified in the image actually represents that class on the ground, while producer's accuracy indicates the probability of a reference pixel being correctly classified and is mainly used to determine how well an area can be classified [23]. As described previously, each of the four delineated class categories should have a minimum of 50 points to increase the reliability of the accuracy assessment [1]. In this study, the overall classification accuracies for 2001, 2009, and 2015 are 89.51%, 88.49%, and 92.21%, with kappa statistics of 0.87, 0.85, and 0.90, respectively. According to Weng [24], the minimum accuracy for identifying Land Use and Land Cover classes in remote sensing data should be at least 85%, so all three classifications meet this threshold.

Table 3. In other words, the majority of the water body area was reduced and converted into open space and agricultural land, including certain areas that had already been transformed into built-up area (Figures 3(a) and 3(b)). Meanwhile, Table 4
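Overall accuracy, kappa, and producer's and user's accuracies can all be computed from a single confusion matrix. The NumPy sketch below assumes that rows hold the reference (ground truth) classes and columns the classified classes; `accuracy_report` is a hypothetical helper name, and the matrix used in the example is invented for illustration, not taken from this study.

```python
import numpy as np


def accuracy_report(cm):
    """Accuracy measures from a confusion matrix.

    cm[i, j] = number of reference points of class i that were classified as
    class j (rows: reference data, columns: classified image).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    correct = np.diag(cm)
    row_tot = cm.sum(axis=1)   # reference totals per class
    col_tot = cm.sum(axis=0)   # classified totals per class
    overall = correct.sum() / n                 # Pr(a): observed agreement
    chance = (row_tot @ col_tot) / n ** 2       # Pr(e): chance agreement
    kappa = (overall - chance) / (1.0 - chance)
    producer = correct / row_tot   # share of reference points classified correctly
    user = correct / col_tot       # share of classified points correct on the ground
    return overall, kappa, producer, user
```

For example, `accuracy_report([[40, 10], [0, 50]])` gives an overall accuracy of 0.9 and a kappa of 0.8, because the chance agreement is (40 × 50 + 60 × 50) / 100² = 0.5.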

Water Quality Assessment Based on Determination of Pollutant Sources
PCA was applied to compare composition patterns among water quality parameters and to determine the factors influencing the identified regions in Malacca state. According to Table 6, 7 PCs were identified with eigenvalues larger than 1, together accounting for 69% of the total variance. Principal component (PC) 1, with 15.3% of the total variance, has positive loadings for dissolved solids, electrical conductivity, and salinity, which are connected to agricultural activities and contribute to nonpoint source pollution through surface runoff [18]. Salinity pollution arises from pesticide use in oil palm and rubber plantations as well as from animal husbandry (chickens, cows, and goats) carried out by some local residents along the Malacca River. In addition, riverbank erosion due to dredging in the river contributes to electrical conductivity pollution. PC 2, with 10.3% of the total variance, has positive loadings for turbidity and total suspended solids. This can occur when human activities, in the form of hydrologic modifications such as dredging, water diversion, and channelization, disturb the Malacca River [16].
In addition, population growth has increased land clearing for urban development [18,19], and surface runoff causes road edge erosion [19] within residential areas adjacent to the river. PC 3, with 10.1% of the total variance, shows positive loadings for BOD and COD, which can be related to anthropogenic sources, most likely sewage treatment plants acting as point sources of pollution [19]. PC 4, with 10% of the total variance, has positive loadings for zinc and iron. Zinc pollution arises from the large number of houses and buildings in urban and rural areas with zinc-coated metal roofs, from which zinc can be mobilized into the atmosphere and waterways on contact with acid rain or smog [19], while iron pollution originates from agricultural activities in most parts of the rural area [18] and from industrial effluents in the urban area [19]. PC 5, with 8.5% of the total variance, shows a positive loading for arsenic, suggesting a strong association with agricultural land [25]. PC 6, with 8.0% of the total variance, has positive loadings for E. coli and total coliform and a negative loading for dissolved oxygen. E. coli and total coliform pollution in the river is strongly connected with raw and municipal sewage from domestic sources and poultry farms, mainly in rural and urban areas, together with surface runoff and discharge from wastewater treatment plants in urban areas. Dissolved oxygen depletion may be caused by high levels of dissolved organic matter, which consume large amounts of oxygen [19] and are suspected to come from agricultural activities and forest areas, the dominant Land Use types in rural regions.
Lastly, PC 7, with 6.8% of the total variance, shows a positive loading for mercury, which is strongly suspected to be linked with chemical industrial wastewater [25], most of which is discharged in the middle and lower reaches of the Malacca River. Therefore, the most likely sources of physicochemical and biological pollutants are agriculture, residential activities, septic tanks and sewage treatment plants, animal husbandry, industrial activities, and open space activities, which play an important role in characterizing LULC changes.
Subsequently, CCA was carried out on the data sets obtained from the 7 PCs. There are 14 variables, namely, the biological parameters E. coli and total coliform (predictor set) and the physicochemical parameters turbidity, DS, EC, salinity, DO, BOD, COD, TSS, As, Hg, Zn, and Fe (response set) (Table 7). Table 7 presents the CCA results for the biological and physicochemical variables. The correlation coefficients for canonical variates 1 and 2 were 0.841 and 0.660, respectively, and both are statistically significant (p < 0.001). The test statistics for canonical variates 1 and 2 are $\chi^2_1 = 620$ with 24 degrees of freedom and $\chi^2_2 = 311$ with 11 degrees of freedom. This result indicates that both variates show a strong relationship, with high correlation between the response and predictor sets, although the correlation for variate 1 is higher than that for variate 2. The dominant variable in the first canonical variate for the biological variables ($U_1$) is E. coli, while the dominant variables in the first canonical variate for the physicochemical parameters ($V_1$) are DS, EC, DO, BOD, COD, Hg, and Zn. In the second canonical variates, the dominant predictor variables are E. coli and total coliform, while the dominant response variables are turbidity, EC, salinity, TSS, As, and Fe. These results show a regular pattern. The first canonical variate indicates that residential and industrial activities are likely pollutant sources, while the second canonical variate indicates that agriculture, sewage treatment plants (including septic tanks), and animal husbandry act as pollutant sources, contributing nonpoint source pollution to the river.
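The canonical correlations and chi-square statistics discussed above can be sketched in NumPy. Here the canonical correlations are computed as the singular values of $Q_X^{\top} Q_Y$, where $Q_X$ and $Q_Y$ are orthonormal bases of the centered variable sets, and significance is assessed with Bartlett's chi-square approximation to Wilks' lambda. Both are assumptions about the computation rather than a reproduction of the study's SPSS output, and the helper names are hypothetical. With $p = 2$ biological and $q = 12$ physicochemical variables, the degrees of freedom $(p-k)(q-k)$ come out as 24 and 11, matching the values reported above.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q), descending."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the centered column spaces (full column rank assumed)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


def bartlett_tests(rho, n, p, q):
    """Bartlett's chi-square tests for k = 0, 1, ...

    H0 for step k: the canonical correlations from variate k+1 onward are all
    zero. Returns a list of (chi2, df) pairs.
    """
    rho = np.asarray(rho, dtype=float)
    out = []
    for k in range(len(rho)):
        wilks = np.prod(1.0 - rho[k:] ** 2)     # Wilks' lambda for the remaining variates
        chi2 = -(n - 1 - (p + q + 1) / 2.0) * np.log(wilks)
        out.append((chi2, (p - k) * (q - k)))
    return out
```

The sample size `n` must be the number of observations used in the CCA; the chi-square values returned for any other `n` will differ from those in Table 7.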
Further analysis was carried out using HCA and NHCA, as well as ANOVA between the LULC class changes and river water quality data. HCA using Ward's method yields three clusters: C1 with S7, S8, and S9; C2 with S1 and S2; and C3 with S3, S4, S5, and S6 (Figure 2(a)). These results were further analyzed using NHCA to obtain the correct classification of pollutant sources based on the PCA components at each location. According to Table 8, NHCA shows that cluster 1 (275 cases) is characterized by four variables, Hg, Fe, total coliform, and DO; cluster 2 (only 5 cases) by two variables, salinity and EC; and cluster 3 (44 cases) by three variables, salinity, EC, and DS. In other words, cluster 1 is mainly associated with industrial and residential activities as well as sewage treatment plants [19]; cluster 3 is suspected to be affected by agriculture, sewage treatment plants, and animal husbandry; and cluster 2 shows only a minor impact from agriculture and animal husbandry [18] (Figure 2(b)). Therefore, cluster 1 is likely to occur in the urban area, cluster 3 in the suburban area, and cluster 2 in the rural area.
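NHCA with k-means can be sketched as Lloyd's algorithm in NumPy. Seeding the k-means centers with the mean vectors of the groups from a hierarchical (Ward) solution is a common two-stage practice but an assumption here, as are the helper names `centers_from_groups` and `kmeans`; the sketch does not reproduce SPSS's K-Means Cluster procedure.

```python
import numpy as np


def centers_from_groups(X, groups):
    """Mean vector of each group of row indices (e.g., clusters from HCA)."""
    X = np.asarray(X, dtype=float)
    return np.array([X[g].mean(axis=0) for g in groups])


def kmeans(X, init_centers, max_iter=100):
    """Lloyd's k-means starting from the given centers.

    Returns (labels, centers), where labels[i] is the cluster of row i.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Squared Euclidean distance from every case to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Unlike HCA, the k-means step lets individual cases move between clusters, which is why the cluster sizes (275, 5, and 44 cases) can be reported per cluster.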
Lastly, as described in the statistical analysis section, ANOVA was carried out to test the relationship between LULC classes and river water quality over the 15 years. Among the LULC classes, built-up area has the largest number of significant water quality variables (9), vegetation is second with 8 significant variables, and open space has the fewest, with only 4 significant variables (Table 9). Built-up area is associated with E. coli, total coliform, EC, BOD, COD, TSS, Hg, Zn, and Fe pollution. Residential activities (BOD, COD, E. coli, total coliform, and Zn), industrial activities (Hg, Zn, and Fe), sewage treatment plants (BOD, COD, E. coli, and total coliform), and animal husbandry (E. coli and total coliform) are suspected to be the main pollutant sources contaminating the Malacca River, as most of these occur in urban and suburban areas. Meanwhile, the vegetation class, which involves agricultural activities and forest land, is also suspected to pollute the river. Heavy pesticide use in agriculture may cause salinization, and heavy fertilizer use may cause E. coli, total coliform, arsenic, and iron pollution. Agricultural activities can also indirectly disrupt soil structure and increase EC and TSS in the river. These activities result in nonpoint source pollution.
Although DO is suspected to be affected in the vegetation area, this variable was not considered because it was not significant in the analysis (F = 1.38, df = 2, p > 0.16). Any minor DO pollution may be connected with forest land activities. The open space class can be described as a transition area, consisting of land converted from agriculture to built-up area as well as some forest land converted to agriculture. In addition, hydrologic modifications such as dredging, water diversion, and channelization cause riverbank erosion, increasing turbidity, salinity, EC, and TSS.
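The one-way ANOVA F statistic behind results such as F = 1.38 with df = 2 is the ratio of between-group to within-group mean squares, sketched below in plain NumPy. The grouping of observations (for example, by LULC class) and the helper name are assumptions for illustration; `scipy.stats.f_oneway` returns the same F statistic together with its p-value.

```python
import numpy as np


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_b, df_w = k - 1, n - k
    F = (ss_between / df_b) / (ss_within / df_w)
    return F, df_b, df_w
```

With three groups, `df_between` is 3 − 1 = 2, consistent with the degrees of freedom reported for DO.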

Conclusion
Remote sensing was used as a tool to detect the magnitude of LULC change in the Malacca River watershed and relate it to river water quality over a span of 15 years, divided into two periods: 2001 to 2009 and 2009 to 2015. This study has highlighted the application of remote sensing to map LULC changes over time and relate them to river water pollution through pollutant sources. PCA identified 7 PCs involving DS, EC, salinity, turbidity, TSS, DO, BOD, COD, As, Hg, Zn, Fe, E. coli, and total coliform, pointing to agricultural activities, residential activities, industrial activities, septic tanks and sewage treatment plants, and animal husbandry as possible pollutant sources. The variables selected from PCA were then entered into CCA to examine the relationship between the physicochemical parameters (response data) and biological parameters (predictor data); the results showed a strong relationship and high correlation. The first canonical variate comprised E. coli, DS, EC, DO, BOD, COD, Hg, and Zn, indicating residential and industrial activities. The second canonical variate comprised E. coli, total coliform, turbidity, EC, salinity, TSS, As, and Fe, indicating that agriculture, sewage treatment plants and septic tanks, and animal husbandry are carried out in the Malacca River watershed.
HCA was then applied to determine the areas affected by pollution, yielding three clusters: C1 with S7, S8, and S9; C2 with S1 and S2; and C3 with S3, S4, S5, and S6. NHCA was used to obtain the correct classification of pollutant sources based on the HCA clusters and PCA components, showing that cluster 1 is characterized by Hg, Fe, total coliform, and DO; cluster 2 by salinity and EC; and cluster 3 by salinity, EC, and DS. Together, HCA and NHCA indicate that cluster 1 occurs in the urban area, cluster 3 in the suburban area, and cluster 2 in the rural area. Finally, ANOVA between LULC and water quality data showed that built-up area is associated with E. coli, total coliform, EC, BOD, COD, TSS, Hg, Zn, and Fe contamination, reflecting residential activities, industrial activities, sewage treatment plants, and animal husbandry in urban and suburban areas. Agricultural activities in the vegetation class are suspected to cause EC, TSS, salinity, E. coli, total coliform, arsenic, and iron pollution, while forest land has a minor impact through DO pollution. Most of the vegetation area occurs in suburban and rural areas. Open space is associated with turbidity, salinity, EC, and TSS pollution due to hydrologic modifications such as dredging, water diversion, and channelization. Overall, these findings offer a useful approach to water quality management when large, complex water quality data sets are involved, providing information for identifying pollution sources and understanding river water quality alongside LULC change detection, as a reference for policy makers in the proper management of Land Use.

Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this paper.