Runoff Potentiality of a Watershed through SCS and Functional Data Analysis Technique

The runoff potentiality of a watershed was assessed by identifying the curve number (CN) and applying the soil conservation service (SCS) and functional data analysis (FDA) techniques. Daily discrete rainfall data were collected from weather stations in the study area and analyzed through the lowess method to obtain a smooth curve. As runoff data follow a periodic pattern in each watershed, a Fourier series was introduced to fit the smooth curves of the eight watersheds. Seven terms of the Fourier series were used for watersheds 5 and 8, while eight terms were used for the rest of the watersheds for the best fit of the data. Bootstrapping analysis of the smooth curves reveals that watersheds 1, 2, 3, 6, 7, and 8 have monthly mean runoffs of 29, 24, 22, 23, 26, and 27 mm, respectively, and these watersheds would likely contribute to surface runoff in the study area. The purpose of this study was to transform runoff data into a smooth curve representing the surface runoff pattern and mean runoff of each watershed through statistical methods. This study provides information on the runoff potentiality of each watershed and also provides input data for hydrological modeling.


Introduction
Runoff from a watershed depends on rainfall, infiltration, and watershed characteristics, and it can be measured daily, monthly, or annually. Watershed runoff is a major concern because of its environmental and agricultural impacts and its flood potential. For any watershed, runoff volume and peak flow depend directly on the characteristics of the watershed [1][2][3]. To assess environmental impact or flood potential, it is necessary to know how much runoff the watershed contributes to rivers or streams after rainfall. At the same time, surface runoff information may be used for groundwater resource modeling by incorporating information on rainfall infiltration [4]. The increasing rate of urbanization and its adverse effects cause more surface runoff and deterioration of water resources [5][6][7][8][9]. Similarly, agricultural practices and land use patterns have changed over time due to economic benefits [10][11][12], and these changes also contribute to runoff [13]. For example, in a hilly or steeply sloping landscape, greater runoff velocity reduces the infiltration rate and thus generates a higher runoff volume. Similarly, land cover or vegetation may contribute to evapotranspiration losses and affect the infiltration rate, and thus may affect the runoff quantity in the watershed area [14][15][16][17].
Runoff of a watershed depends on rainfall intensity and typically it varies seasonally. As a result, the impact of watershed runoff might not be the same throughout the year. It is necessary to know the runoff volume for a watershed. There are several methods for determining watershed runoff. The soil conservation service (SCS) method is mainly applied to estimate the direct runoff due to its flexibility, simplicity, and versatility [18,19]. The SCS method [20] is used for interpreting water resources management and for planning the catchment area [21][22][23][24][25], and this method assesses the runoff volume for a particular rainfall depth of an agricultural watershed [26]. The hydrologic soil group (HSG) method defines a particular curve number (CN) that can be used in the SCS method. This method is analyzed to clarify its theoretical and experimental basis [27] and to predict the land use changing effect on runoff in urban hydrology [28]. The CN relies on several factors (e.g., evaporation, adsorption, transpiration, surface storage, etc.) and it represents runoff potentiality of a watershed [29][30][31][32]. The higher the CN value, the higher the runoff potential.
As mentioned before, runoff of a watershed depends on rainfall amount and varies seasonally. Actual physical measurements are often labor intensive and not readily available. In this paper, functional data analysis (FDA) has been used as a tool for converting runoff values into a computable function that describes the runoff pattern of the watershed. FDA represents the continuous runoff process at each watershed with a smooth curve, creating a functional data object from the runoff observations. FDA can provide information on the pattern and its variations [33]. Derivatives of the function also give information on the slopes and curvatures of the curve. In the process, the smooth curve is used to interpret the runoff patterns and their variability over time. The FDA technique is free from distributional assumptions and it contributes to the scientific fields of research. Currently, the FDA method is used for projecting surface runoff in support of water management strategies as well as for predicting flash floods and future climatic events.
Functional principal component analysis, canonical correlation, and linear models with their application can contribute to diverse field data analyses [34,35]. As the data changes with time and it is always found to be seasonal, Fourier series is applied to describe the smoothing parameters for smoothing curve [36][37][38][39][40]. The observed data is assumed to follow a particular distribution which will vary with the time. The fitted smooth curves represent the harmonic numbers of Fourier series that compare the observed rainfall data pattern between different regions [36,41]. From the fitted smooth curve, a bootstrap statistical method can be adopted to estimate the mean value of surface runoff. Bootstrapping is a nonparametric technique to assess the mean sample distribution from an empirical data set without using normal theory [42,43]. Bootstrapping based model forecasts the prediction interval of the mean runoff and is used for analysis in hydrologic models.
The main aim of this study is to apply the SCS method for identifying the surface runoff of watersheds and thenceforward use functional data analysis for predicting the surface runoff pattern of each watershed in the form of a fitted smooth curve. At the end of this study, the mean runoff of the watersheds, with its confidence interval, was estimated by the bootstrapping technique. The findings are important in the field of hydrological modeling for obtaining input data under different conditions among the catchments. Therefore, mean runoff data is important to acquire the correct inputs for the hydrological modeling process.
Figure 2: Watersheds and drainage network at Alor Gajah and Jasin.

Methodology
The study areas are within the Alor Gajah and Jasin basin, which consists of eight watersheds in the state of Melaka, Peninsular Malaysia (Figure 1). Peninsular Malaysia lies between 1° and 7° north of the equator and between longitudes 100° and 103° east. The climate of Malaysia is influenced by monsoons. The southwest monsoon occurs from May to August, while the northeast monsoon occurs from November to February. The southwest monsoon period is drier for the country, while most of the rainfall occurs during the northeast monsoon. Sixteen rainfall stations are located in the eight proposed watershed areas (Figure 2). Daily rainfall data were collected from the rainfall stations for the period 2007-2012 to analyze the runoff by the SCS method. Elevation in the study area varies from 20 to 480 m, and the eight watersheds are mainly elongated in shape (Figure 2). Ten land use patterns are practiced in this watershed area, and runoff is influenced by land use and management. Because the watershed area is large, a total weighted CN is necessary for estimating runoff accurately. The land use patterns of the study area belong to different hydrologic soil groups, and each group has a particular CN value for each land use. According to the hydrologic soil groups, most of the watersheds fall under soil groups C and D. Curve numbers for soil groups C and D imply greater runoff on the basis of soil textural pattern. The area under soil groups C and D is therefore the first indicator of rapid runoff from the watershed area, in contrast to the area under soil groups A and B. On the basis of soil group and CN value, the weighted CN was estimated for each watershed. These values were incorporated in the SCS method for estimating daily runoff from daily rainfall data. Rainfall data of the different watersheds were used to generate the runoff distribution pattern for each watershed (identified hereafter as 1, 2, 3, 4, 5, 6, 7, and 8) using the SCS method.
This study was divided into four subsections. First, a curve number of each soil group was assigned for measuring the weighted curve number of each watershed. Second, runoff was calculated using the SCS method based on daily rainfall data. Third, the FDA method was used for building the discrete runoff data function and the lowess method for smoothing parameter. Finally, the functional mean of a smooth curve with confidence interval was estimated through statistical bootstrapping method. All of these are described in the subsequent sections.

Curve Number Determination.
As per soil characteristics, soils in the study area fall under the hydrological soil groups A, B, C, and D as listed in Table 1. In general, soil group A (sandy soil) has the highest rate of infiltration and the lowest runoff potential. Conversely, soil group D (clay soils) has the lowest rate of infiltration and produces the highest runoff potential. The other soil groups fall in between. Geographical Information System (GIS) software was used for preparing the land use map and soil group map of the watershed areas. Based on the soil type and surface condition of a watershed, a CN was assigned to each area of the watershed and a weighted CN was calculated. Where a particular land use pattern of a watershed belonged to both hydrological soil groups, the area under each group was estimated and used to calculate the individual curve numbers for soil groups C and D. The total curve number of a catchment was computed by weighting the CNs of the different subareas in proportion to the land cover associated with each CN value, following the procedures of Gumbo et al. [45] and Wong et al. [46]. The total CN for a particular watershed was expressed as

$$\mathrm{CN}_w = \frac{\mathrm{CN}_C A_C + \mathrm{CN}_D A_D}{A_C + A_D}, \tag{1}$$

where $\mathrm{CN}_w$ = total curve number for a particular watershed; $\mathrm{CN}_C$ = curve number of the watershed for soil group C; $\mathrm{CN}_D$ = curve number of the watershed for soil group D; and $A_C$ and $A_D$ = land use pattern percentages for soil groups C and D.
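The area weighting described above can be illustrated with a minimal Python sketch. The function name, the curve numbers, and the area shares below are hypothetical values chosen for illustration and are not taken from the study; the sketch assumes the watershed is split only between soil groups C and D.

```python
def weighted_curve_number(cn_c, cn_d, area_c_pct, area_d_pct):
    """Area-weighted curve number for a watershed split between soil groups C and D.

    cn_c, cn_d: curve numbers for the parts of the watershed on groups C and D.
    area_c_pct, area_d_pct: land use shares (percent) on groups C and D.
    """
    total = area_c_pct + area_d_pct
    if total <= 0:
        raise ValueError("area shares must sum to a positive value")
    # Each curve number is weighted by the share of the area it applies to.
    return (cn_c * area_c_pct + cn_d * area_d_pct) / total


# Hypothetical watershed: 60% of the area on group C (CN 79), 40% on group D (CN 84).
cn_w = weighted_curve_number(79.0, 84.0, 60.0, 40.0)
print(round(cn_w, 1))
```

Dividing by the sum of the two shares keeps the result a proper weighted average even if the percentages do not add up to exactly 100.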

Runoff Measurement Using SCS Method.
Based on CN, runoff quantity was calculated using the SCS method [20]. This method was applied to identify the runoff of the different watersheds in Melaka state. The surface runoff equation (2) was used to estimate the excess rainfall depth, or direct runoff, in any watershed area. Consider

$$Q_i = \frac{\left(P_i - (I_a)_i\right)^2}{P_i - (I_a)_i + S_i}, \tag{2}$$

where $i$ indicates the different watershed numbers, $Q_i$ = runoff, $P_i$ = rainfall, $S_i$ = potential maximum retention after runoff begins, and $(I_a)_i$ = initial abstraction. The water losses before surface runoff begins are termed initial abstraction. Water retained in surface depressions, water that infiltrates, and water intercepted by vegetation are included in the initial abstraction. $(I_a)_i$ is variable, but it is generally correlated with soil and land cover parameters. $(I_a)_i$ can be estimated through experiments in watershed areas, but it is approximated by the following empirical equation:

$$(I_a)_i = 0.2\,S_i. \tag{3}$$

The Scientific World Journal

Substituting (3) into (2), the runoff equation may be simplified as

$$Q_i = \frac{(P_i - 0.2\,S_i)^2}{P_i + 0.8\,S_i}. \tag{4}$$

$S_i$ is closely related to the soil and land cover conditions of a particular watershed through the weighted curve number ($\mathrm{CN}_w$). The potential maximum retention can be estimated by the SCS method through empirical studies and, with $S_i$ in millimetres, is expressed as

$$S_i = \frac{25400}{\mathrm{CN}_w} - 254, \tag{5}$$

where $\mathrm{CN}_w$ is the total curve number of the watershed, a function of land use, land cover, and other factors. These factors affect the surface runoff and water retention in a particular watershed. CN is a dimensionless number and is defined as 0 ≤ CN ≤ 100. For impervious surfaces and water surfaces, CN is equal to 100, and for natural surfaces the CN value is less than 100. A higher CN value indicates a greater runoff factor or runoff potential of a watershed and vice versa. It is suggested by Michel et al. [47] that $S$ is an intrinsic model parameter that is independent of the initial moisture condition, that $I_a$ and $S$ are independent of each other, and that $I_a$ is not an intrinsic parameter.
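The runoff calculation in this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes the metric (millimetre) form of the retention relation, S = 25400/CN - 254, and the standard initial abstraction ratio of 0.2. The function names and the example rainfall depths are hypothetical.

```python
def potential_retention_mm(cn):
    """Potential maximum retention S (mm) from a curve number (assumed metric form)."""
    if not 0 < cn <= 100:
        raise ValueError("curve number must lie in (0, 100]")
    return 25400.0 / cn - 254.0


def scs_runoff_mm(rain_mm, cn, ia_ratio=0.2):
    """Direct runoff depth Q (mm) for a rainfall depth P (mm) using the SCS relation."""
    s = potential_retention_mm(cn)
    ia = ia_ratio * s  # initial abstraction, I_a = 0.2 * S by default
    if rain_mm <= ia:
        # Rainfall does not exceed the initial abstraction, so no runoff occurs.
        return 0.0
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


# Hypothetical daily rainfall depths (mm) for a watershed with a weighted CN of 80.
for rain in (5.0, 20.0, 50.0):
    print(rain, round(scs_runoff_mm(rain, 80.0), 2))
```

Returning zero when rainfall does not exceed the initial abstraction reflects the validity condition of the simplified equation, which only applies when P > 0.2S.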

Lowess and Fourier Series.
Once the discrete runoff has been estimated, it can be transformed into a function using lowess and Fourier series. An interpolation method may be employed if the data are assumed to be errorless; otherwise, a smoothing process is used to remove observational errors. Initially, the observed data can be plotted as a scatter plot. As the main intention was to convert the discrete runoff data to a smooth function, the first step was to build a set of functions representing the functional data. The functional data form was defined by a linear combination of basis functions and is expressed as

$$x(t) = \sum_{k=1}^{K} c_k \phi_k(t), \tag{6}$$

where $c_k$ is the coefficient, $\phi_k$ is the known function, and $K$ is the size of the maximum basis required. As the data set is periodic, the function may be represented by the Fourier series, which can be written in terms of sine and cosine functions:

$$x(t) = c_0 + c_1 \sin \omega t + c_2 \cos \omega t + \cdots. \tag{7}$$

This equation was defined by the basis $\phi_0(t) = 1$, $\phi_{2r-1}(t) = \sin r\omega t$, $\phi_{2r}(t) = \cos r\omega t$ with $t = t_1, \ldots, t_n$. The constant $\omega$ is related to the period $T$ of the periodic basis by the relation $\omega = 2\pi/T$. A lowess linear fit model was applied to smooth the data set and create a smooth line. This fitting method calculates the residuals, which indicate the fitting quality of the curve. The lowess procedure computes regression weights for the data points within the span and performs a weighted regression using a first-degree polynomial. The regression weight for each data point in the span is given by the tricube function:

$$w_i = \left(1 - \left|\frac{x - x_i}{d(x)}\right|^3\right)^3, \tag{8}$$

where $x$ is the predictor value associated with the response value to be smoothed, $x_i$ are the nearest neighbors of $x$ as defined by the span, and $d(x)$ is the distance along the abscissa from $x$ to the most distant predictor value within the span.
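The lowess step can be illustrated with a small pure-Python sketch: a tricube-weighted, first-degree local regression evaluated at each data point. It is a simplified illustration, not the software routine used in the study. It assumes the span is given as a number of nearest points, that the predictor values are distinct, and that no robustness iterations are applied. The function names and the sample series are hypothetical.

```python
def tricube(u):
    """Tricube weight for a scaled distance u; zero at and beyond |u| = 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess(xs, ys, span):
    """Smooth ys at each x with a tricube-weighted local linear fit over `span` points."""
    n = len(xs)
    if len(ys) != n or not 2 <= span <= n:
        raise ValueError("span must be between 2 and the number of points")
    smoothed = []
    for x0 in xs:
        # Indices of the `span` points nearest to x0 along the abscissa.
        nearest = sorted(range(n), key=lambda j: abs(xs[j] - x0))[:span]
        d = max(abs(xs[j] - x0) for j in nearest)  # distance to the farthest neighbor
        w = [tricube((x0 - xs[j]) / d) if d > 0 else 1.0 for j in nearest]
        sw = sum(w)
        mx = sum(wi * xs[j] for wi, j in zip(w, nearest)) / sw
        my = sum(wi * ys[j] for wi, j in zip(w, nearest)) / sw
        sxx = sum(wi * (xs[j] - mx) ** 2 for wi, j in zip(w, nearest))
        sxy = sum(wi * (xs[j] - mx) * (ys[j] - my) for wi, j in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Value of the weighted regression line at x0.
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Hypothetical monthly runoff values (mm) with one unusually high month.
months = list(range(12))
runoff = [30.0, 28.0, 18.0, 15.0, 12.0, 10.0, 11.0, 60.0, 14.0, 20.0, 32.0, 35.0]
smooth = lowess(months, runoff, span=5)
residuals = [y - s for y, s in zip(runoff, smooth)]
print([round(s, 1) for s in smooth])
```

Because the farthest point in each window receives zero weight, a span of five points effectively uses the value itself and its closest neighbors, so a small span follows the data closely while a larger span produces a smoother line.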
In this smoothing procedure, the smoothed values neighboring an outlier reflect the bulk of the data. As the data are periodic in nature, the smooth line is then fitted with the Fourier series (7), which serves as the general model. The fit provides goodness-of-fit statistics and regression values that indicate the fitting accuracy of the smooth curve.
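A Fourier series fit with a goodness-of-fit value can be sketched as an ordinary least-squares problem. The sketch below assumes that the period is known and fixed in advance, which may differ from the fitting routine used in the study (some curve fitting tools also estimate the fundamental frequency). The function names, the synthetic series, and the choice of two harmonics are assumptions made for illustration.

```python
import math


def fourier_row(t, n_harmonics, omega):
    """Basis values [1, sin(wt), cos(wt), sin(2wt), cos(2wt), ...] at time t."""
    row = [1.0]
    for r in range(1, n_harmonics + 1):
        row.append(math.sin(r * omega * t))
        row.append(math.cos(r * omega * t))
    return row


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; use fewer harmonics or more data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_fourier(ts, ys, n_harmonics, period):
    """Least-squares Fourier fit; returns (coefficients, fitted values, R squared)."""
    omega = 2.0 * math.pi / period
    rows = [fourier_row(t, n_harmonics, omega) for t in ts]
    p = 2 * n_harmonics + 1
    # Normal equations (A^T A) c = A^T y for the basis matrix A.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coeffs = solve_linear(ata, aty)
    fitted = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return coeffs, fitted, 1.0 - ss_res / ss_tot


# Synthetic 72-month series with an annual cycle (hypothetical, not study data).
ts = list(range(72))
w = 2.0 * math.pi / 12.0
ys = [20.0 + 8.0 * math.sin(w * t) + 3.0 * math.cos(2.0 * w * t) for t in ts]
coeffs, fitted, r2 = fit_fourier(ts, ys, n_harmonics=2, period=12.0)
print([round(c, 3) for c in coeffs], round(r2, 4))
```

Adding harmonics always reduces the residual sum of squares, so the number of terms is usually chosen by comparing goodness-of-fit values and inspecting the residuals rather than by the fit error alone.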

Bootstrapping Method.
Bootstrapping is a statistical technique under the broader heading of resampling. It gives a good idea of the sampling distribution of a particular statistic. The resampling procedure draws repeatedly from the independent observations to estimate the distribution of the statistic. It is completely automatic and requires no theoretical calculations. Therefore, this technique was used to create a series of randomly selected events from an empirical data set. In this study, the sample was resampled 500 times to obtain an empirical bootstrap distribution of the sample mean, from which a 95% confidence interval was derived.
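A bootstrap confidence interval for a mean can be sketched in a few lines of Python. The sketch assumes the percentile method, which is one common way to form the interval; the text does not state which variant was used. The function name, the fixed random seed, and the sample values are hypothetical.

```python
import random


def bootstrap_mean_ci(values, n_resamples=500, level=0.95, seed=0):
    """Sample mean with a percentile bootstrap confidence interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Each resample draws n values with replacement and records its mean.
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return sum(values) / n, lower, upper


# Hypothetical monthly runoff values (mm) taken from a smoothed curve.
runoff = [31.0, 27.0, 19.0, 15.0, 12.0, 11.0, 13.0, 16.0, 18.0, 24.0, 33.0, 36.0]
mean, lower, upper = bootstrap_mean_ci(runoff, n_resamples=500)
print(round(mean, 1), round(lower, 1), round(upper, 1))
```

The interval limits are read directly from the sorted resample means, so no normality assumption is needed.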

Total CN of Different Watersheds.
Eighteen soil series are observed in the Alor Gajah and Jasin basins (Figure 3). Each soil series is classified based on soil characteristics and texture, and they fall under the C and D hydrologic soil groups (Figure 4). The soil series, soil texture, and corresponding hydrological soil group of the area are shown in Table 2. The potential maximum retention ($S$) was defined by using (5). Two thousand one hundred ninety-two (2192) daily rainfall data from sixteen rainfall stations in Alor Gajah and Jasin during 2007 to 2012 were used for runoff analysis. Putting the rainfall data and $S$ values in (4), the depth of runoff ($Q$) was calculated for each watershed of the area. This equation is valid only for the condition $P > 0.2S$. Every watershed followed this condition. The total curve number ($\mathrm{CN}_w$), potential maximum retention ($S$), and mean runoff of each watershed of Alor Gajah and Jasin are provided in Table 3. It is evident from Table 3 that no significant differences in the maximum retention after runoff were observed, but mean runoff was higher for watersheds 1, 2, 6, 7, and 8. This means that runoff from watersheds 1, 2, 6, 7, and 8 might have environmental impact and flood potential for this basin.

Identifying Smooth Curve from a Scattered Plot.
After calculating the daily runoff from the daily rainfall data for the study period, monthly runoff was estimated by summing the daily runoff data. Seventy-two data sets were prepared for monthly runoff analysis. The functional data analysis technique was applied to create a function from the observation data.
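The aggregation from daily to monthly runoff can be sketched as follows. The function name and the placeholder daily values are hypothetical; only the date range (2007 to 2012) comes from the text.

```python
from datetime import date, timedelta


def monthly_totals(start, daily_values):
    """Sum consecutive daily values into (year, month) totals, starting at `start`."""
    totals = {}
    day = start
    for value in daily_values:
        key = (day.year, day.month)
        totals[key] = totals.get(key, 0.0) + value
        day += timedelta(days=1)
    return totals


# Number of days from 1 January 2007 to 31 December 2012, inclusive.
n_days = (date(2012, 12, 31) - date(2007, 1, 1)).days + 1

# Placeholder daily runoff of 1 mm per day (hypothetical, for illustration only).
daily_runoff = [1.0] * n_days
monthly = monthly_totals(date(2007, 1, 1), daily_runoff)
print(n_days, len(monthly))
```

With this date range the daily series has 2192 entries and collapses to 72 monthly totals, which matches the counts reported in the text.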
Runoff of smooth and fitted smooth curve of eight watersheds is presented in Figure 7. Residuals are also projected for justification of each watershed. Due to differences from the influence of monsoon season over the time period, runoff values fluctuate and differ among various watersheds. From the discrete data set, a smooth curve of lowess method was applied for the best representation of the data set. On the basis of data distribution, five spans of lowess method match for all the data sets. As the runoff data varies, Fourier series of fitting method was considered for fitted smooth curves. Different terms of Fourier series were adopted for different data series for the best fit and goodness of fit for each watershed is presented in Table 4. After multiple iterations, it is found that 7 terms of Fourier series fit the best for the watersheds 5 and 7 while 8 terms of Fourier fit the remaining watersheds. At the same time residual plots justify the best fit of the smooth curve. These residuals are randomly scattered near zero forming a good fit for data set. Therefore, validation of smooth curves represents their specification. Fitted curve, prediction bound, and smooth curve point of eight watersheds are shown in Figure 8. The dashed line presents the 95% prediction bound of smooth curve data set. It is observed that most of the runoff data of watersheds is under this prediction bound representing the fitting justification.

Mean Runoff through Bootstrapping Technique.
Runoff analysis was conducted based on the daily discrete rainfall data of the watersheds. The bootstrapping technique indicated the mean runoff range for each watershed. Using the bootstrapping technique with 500 resamples, the mean runoff of each watershed was calculated together with the lower and upper limits of its 95% confidence interval. Figure 9 shows the smooth curve and 95% confidence interval of mean runoff for the eight watersheds. Watershed 1 displays a smooth runoff curve with seven runoff peaks. All peaks are similar and the highest peak is about 50 mm. Typical runoff ranges from 10 mm to 50 mm for this watershed. All peak runoffs were observed during the months of November to February, when most of the rainfall occurred. One large peak is found in watershed 2, whose mean runoff ranges from 21 mm to 27 mm. The runoff values vary from one peak to another, but all peaks occurred during the months of November to February. Watershed 3 exhibits four dominant runoff peaks over the study period. The maximum peak runoff is observed in December 2010, and the remaining high runoff values show the same pattern. The mean runoff ranged from 19 mm to 24 mm. In watersheds 4 and 5, a similar trend with one high runoff peak was observed, occurring between November and February. These two watersheds have low mean runoff ranges of 13-18 mm and 11-16 mm, respectively. Watersheds 6 and 7 also exhibit a similar trend. Watershed 6 shows more subpeaks than watershed 7, and the mean runoff for watersheds 6 and 7 ranged from 20 to 26 mm and from 23 to 29 mm, respectively. The runoff pattern in watershed 8 displays one runoff peak, with mean runoff ranging from 22 mm to 31 mm. Subpeaks are found in this watershed in the months of November to February, and another three are found in the months of July to August.
From this analysis, the runoff reflects the watershed characteristics, which affect the study area to different degrees. Therefore, the runoff of a watershed varies with CN values in the study area. Watersheds 1, 2, 6, 7, and 8 in the Alor Gajah and Jasin area contributed more runoff than the other watersheds, which is likely due to permanent cover from impervious surfaces and palm trees (Table 5). In this context, watersheds 1, 3, 6, 7, and 8 contributed surface runoff of 5, 7, 5, 5, and 4 Mm³, respectively. Runoff analysis revealed that watersheds 2, 4, and 5 contributed significantly to groundwater recharge compared to the other watersheds in the study area. Figure 10 shows the estimated rainfall and runoff values. A strong relationship ($R^2 = 0.99$) existed between the rainfall and runoff for the study area. This analysis implies that the surface runoff due to rainfall in the Alor Gajah and Jasin basins may be predicted using the CN. When the runoffs of the watersheds in the Alor Gajah and Jasin basins were compared, they varied with the seasonal monsoon, and most of the peak runoffs were observed during the months of November to February. The runoff characteristics of watersheds 1, 3, 6, 7, and 8 are very important since they produce significant runoff. This runoff may contribute to the river or stream, causing flooding and sediment erosion.

Conclusion
The SCS method was applied to assess the surface runoff of eight watersheds. Curve numbers were identified for the different hydrologic soil groups in each watershed in the Alor Gajah and Jasin basins, and they fall mostly under the C and D soil groups. Different land use patterns and cover crops were identified for the region. The area-averaged weighted curve number was computed for each entire watershed based on its land use pattern and curve numbers. The FDA technique was applied to build a function from the discrete runoff data and to provide information for smoothing the curves. Eight smooth curves, one for each watershed, represent the nature of the surface runoff pattern, and the runoff patterns of the smooth curves were compared among the watersheds. The fluctuation of the smooth curve indicates the seasonal variation of runoff due to monsoonal rainfall. Most of the curves show that the highest peak was observed during November to February, when most of the rainfall occurred. Based on the bootstrapping technique, the mean of the smooth curve was identified with a 95% confidence interval providing the upper and lower limits of the mean. The overall findings from the smooth curves indicated a significant difference among the mean values of the eight watersheds. Watersheds 1, 3, 6, 7, and 8 had most of the surface runoff of this region and were likely to contribute runoff water to the river. Surface runoff volume provides firsthand information on rainwater distribution and contribution. It may be useful to account for runoff information when planning surface water management. It also indicates the rate of infiltration of the area and the contribution and potentiality of groundwater recharge. Instead of discrete runoff data, a functional form of the data set could be analyzed to predict runoff over any time interval.
The nature of the smooth curves describes the runoff potentiality of each watershed; the curves also provide information for water management in the agricultural sector and input data for hydrological modeling.