Deriving Design Flood Hydrograph Based on Conditional Distribution: A Case Study of Danjiangkou Reservoir in Hanjiang Basin

Design flood hydrograph (DFH) for a dam is the flood of suitable probability and magnitude adopted to ensure safety of the dam in accordance with appropriate design standards. Estimated quantiles of peak discharge and flood volumes are necessary for deriving the DFH; these variables are mutually correlated and need to be described by multivariate analysis methods. The joint probability distributions of peak discharge and flood volumes were established using copula functions. Then the general formulae of the conditional most likely composition (CMLC) and conditional expectation composition (CEC) methods, which consider the inherent relationship between flood peak and volumes, were derived for estimating DFH. The Danjiangkou reservoir in Hanjiang basin was selected as a case study. The design values of flood volumes and 90% confidence intervals with different peak discharges were estimated by the proposed methods. The performance of the CMLC and CEC methods was also compared with conventional flood frequency analysis, and the results show that the CMLC method performs best for both bivariate and trivariate distributions, with the smallest relative error and root mean square error. The proposed CMLC method has a strong statistical basis with a unique design flood composition scheme and provides an alternative way for deriving DFH.


Introduction
A reservoir dam can be designed with the design flood hydrograph (DFH), which is a hydrograph adopted according to design standards to ensure the safety of a structure. DFH generally refers to the computed flood hydrograph at the dam site that may be estimated from precipitation or flow records, depending on engineering judgment after consideration of all the pertinent data, including the extent and reliability of the flow data and rainfall data [1]. The DFH is characterized by the joint behavior of several correlated random variables such as flood peak, volume, and duration. Knowledge of the flood peak or volumes alone is not sufficient to design the dam spillway; the entire flood hydrograph must be utilized. The quantiles of the flood peak and volumes corresponding to a particular design return period are necessary for deriving DFH. However, the conventional flood frequency analysis methods for deriving DFH recommended by many countries are based on the univariate distribution and concentrate mainly on the analysis of annual peak discharge or flood volume series, without analyzing the inherent relationship between flood peak and volumes [2]. For example, the Chinese univariate flood frequency analysis method for deriving DFH implicitly assumes that flood peak and flood volumes are independent and amplified with the same frequency, which has shortcomings in practice [3,4]. The univariate frequency analysis method can over- or underestimate risk because it leaves the relationship between the correlated hydrological variables unclear [5,6].
To solve this problem, many authors have used statistical methods to derive the DFH by examining the dependence of flood peak, flood volume, and hydrograph shape. Xiao et al. [2] proposed a multicharacteristic synthesis index method based on the MSI quantile and derived the DFH. Pramanik et al. [7] used probability density functions to fit the shape of hydrographs and subsequently to develop design flood hydrographs for various return periods. Mediero et al. [8] estimated flood volume for a given peak discharge by fitting a regional log-log regression equation over the observed pairs, and a Monte Carlo experiment was conducted to generate an ensemble of DFHs that maintain the statistical properties of the marginal distributions of the peaks, volumes, and durations. Serinaldi and Grimaldi [9] studied the advantages and shortcomings of using simple distribution functions with finite support (namely, beta and generalized standard two-sided power distributions) to represent and synthesize direct runoff hydrographs. In their study, the dependence among peak discharge, volume, and duration was explored on a few flood events selected by a recursive digital filter algorithm and an over-threshold approach. Domínguez and Arganis [10] introduced and used the IINGEN method to estimate the design floods for Malpaso Dam; the method makes it possible to consider flood peak, volume, and shape simultaneously. Óscar et al. [11] proposed a method to select the time base of the DFH using spectral analysis and defined the number of days necessary for taking into account the complete dynamics of the hydrological floods.
Recently, copula functions have been increasingly applied in exploring the inherent correlation between flood peak and volumes for deriving DFH. Copula functions allow for more flexibility in the marginal distributions and the dependence between peaks and volumes. Chowdhary et al. [12] discussed the identification of suitable copulas for bivariate frequency analysis of flood peak and flood volume data, and the Gumbel-Hougaard copula of the Archimedean family was found to be the most suitable dependence model for flood peak and volume. Bačová and Halmová [13] explored the analysis and statistical evaluation of the joint probability of the occurrence of peak discharge and volume; they also calculated joint return periods and conditional return periods for the hydrological pair. There have been numerous studies on the degree of the dependence between peaks and volumes that is needed to estimate the multivariate quantiles (e.g., [5,6]) and on the choice of the copula function (e.g., [12,14]). Gaál et al. [15] studied the peak-volume relationships for the region of Austria, which featured a diverse spectrum of hydrological flood processes, and discussed whether flood peak-volume relationships can be typified by comparative hydrology. Sraj et al. [16] compared different bivariate copulas from three families and carried out a bivariate flood frequency analysis for the Litija station on the Sava River. Previous studies [12-16] derived the multivariate quantile curve under one joint return period, and infinitely many combinations on the isoline or isosurface could be selected to evaluate the effects of different hydrological loads on the structure [17]. However, to derive DFH one has to select a single quantile combination out of events which all share the same return period [14]. How to solve this problem has attracted the attention of many researchers (e.g., [6,14]).
Li et al. [6] proposed a copula identical frequency method and used it to estimate bivariate joint flood quantiles in Three Gorges reservoir. Gräler et al. [14] gave a critical and practical review focusing on synthetic design hydrograph estimation. Their approaches were based on regression analysis, bivariate conditional distributions, bivariate joint distributions, and the Kendall distribution function, highlighting theoretical and practical issues of multivariate frequency analysis. Chebana and Ouarda [18] proposed the decomposition of the level curve into a naïve part (tail) and a proper part (central); they assumed that the naïve part was composed of two segments starting at the end of each extremity of the proper part. Salvadori et al. [19] introduced two basic design realizations, that is, the componentwise excess design realization and the most likely design realization. Volpi and Fiori [20] identified a subset of the critical combinations set that includes a fixed and arbitrarily chosen percentage in probability of the events, on the basis of their probability of occurrence.
In the multivariate domain for deriving DFH, the return period must be defined in terms of the acceptable risk for the reservoir. Different return periods estimated by copula functions have been developed for multivariate flood frequency analysis. Eight types of possible joint events were presented by Yue and Rasmussen [21] and Shiau [22], in which the OR, AND, and conditional cases are of greatest interest in hydrological applications. Recently, another definition of the multivariate return period, called the Kendall return period, was given by Salvadori and De Michele [23]. This return period was introduced to identify a univariate critical threshold in a multivariate context. Until now, only a very limited number of studies have actually applied this kind of return period (e.g., [24]). Volpi and Fiori [25] proposed a general, structure-based return period for the design and risk assessment of hydrological structures in a bivariate environment. Their work draws the attention of practitioners to the importance of considering the structure in hydraulic design and/or risk assessment problems in a multivariate environment, thus advising against the uncritical use of design event-based approaches which neglect the interplay between the structure and the hydrological loads acting on it. More information about the return periods can be found in Shiau [22] and Volpi and Fiori [25]. A clear assessment of the use of return periods is still a matter of debate [26,27].
In this paper, we study the peak-volume relationships by copula functions for Danjiangkou reservoir in the Hanjiang basin. Specifically, we investigate the inherent relationship between flood peak and volumes on the basis of their probability of occurrence. Different copula functions from the Archimedean family were applied and compared. Then the conditional most likely composition (CMLC) and conditional expectation composition (CEC) methods were proposed, inspired by the ideas of Gräler et al. [14] and Salvadori et al. [19]. The main aim of this study is to provide a simple tool for identifying the relationship between flood peak and volumes and for deriving DFH.
To achieve these objectives, the paper is organized as follows: introduction of the Chinese method of deriving DFH is illustrated in Section 2. The methodologies used in this study are presented in Section 3. In Section 4, results and discussions of case study are shown, followed by the Conclusions.

Chinese Method of Deriving DFH
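Pearson type III (P3) distribution is recommended by the Ministry of Water Resources of China for deriving DFH [2,3]. For a given sample series $\{x : \delta < x < +\infty\}$, the probability density function (PDF) of the P3 distribution is expressed as
$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,(x-\delta)^{\alpha-1}\,e^{-\beta(x-\delta)}, \tag{1}$$
where $\alpha$, $\beta$, and $\delta$ are the shape, scale, and location parameters, respectively, and $\Gamma(\cdot)$ is the gamma function.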
The L-moment (LM) method is preferred for parameter estimation because of its robustness in the presence of unusually small or large values (outliers) [28,29]. The univariate flood frequency analysis model (P3/LM), that is, the P3 distribution fitted by the L-moment method, is used to estimate flood quantiles for given return periods.
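As an illustration of the first step of the L-moment method, the sketch below computes the first three sample L-moments from the unbiased probability-weighted moments. The function name `sample_l_moments` is an assumption made for this example; the subsequent mapping from L-moments to the P3 parameters is not shown here.

```python
def sample_l_moments(data):
    """Return the first three sample L-moments (l1, l2, l3) of a series."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    # Unbiased probability-weighted moments b0, b1, b2 (j is the 1-based rank
    # of each observation in ascending order).
    b0 = sum(x) / n
    b1 = sum((j - 1) / (n - 1) * x[j - 1] for j in range(2, n + 1)) / n
    b2 = sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x[j - 1]
             for j in range(3, n + 1)) / n
    # L-moments are linear combinations of the probability-weighted moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return l1, l2, l3
```

The ratio `l3 / l2` (L-skewness) is the quantity usually matched against the theoretical L-skewness of the candidate distribution when fitting a three-parameter model.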
The peak and volume amplitude (PVA) method implicitly assumes that flood peak and flood volumes are amplified with the same frequency (return period). First, the values of peak discharge and flood volumes are estimated individually from each marginal distribution under the given return period. Then the DFH is constructed by multiplying each discharge ordinate of the typical flood hydrograph (TFH) by an amplifier.
The amplifier is either the ratio of the design peak with a given return period to the TFH peak or the ratio of the design volume with a given return period to the TFH volume. Suppose that the characteristics of the annual maximum flood hydrograph consist of flood peak $Q_{\max}$, 1-day maximum flood volume $W_{1\mathrm{d}}$, 3-day maximum flood volume $W_{3\mathrm{d}}$, 7-day maximum flood volume $W_{7\mathrm{d}}$, and 15-day maximum flood volume $W_{15\mathrm{d}}$; the quantiles with design return period $T$ are estimated by P3/LM and denoted by $Q_{\max}^{T}$, $W_{1}^{T}$, $W_{3}^{T}$, $W_{7}^{T}$, and $W_{15}^{T}$, respectively. The corresponding characteristics of the selected TFH are denoted by $Q_{\mathrm{TFH}}$, $W_{1\mathrm{TFH}}$, $W_{3\mathrm{TFH}}$, $W_{7\mathrm{TFH}}$, and $W_{15\mathrm{TFH}}$, respectively. These amplifiers are calculated as follows.
Amplifier $K_Q$ for flood peak discharge is as follows:
$$K_Q = \frac{Q_{\max}^{T}}{Q_{\mathrm{TFH}}}. \tag{2}$$
Amplifier $K_1$ for the 1-day maximum flood volume (excluding flood peak) is as follows:
$$K_1 = \frac{W_{1}^{T}}{W_{1\mathrm{TFH}}}. \tag{3}$$
Amplifier $K_{3-1}$ for the 3-day maximum flood volume except for the 1-day maximum flood volume is as follows:
$$K_{3-1} = \frac{W_{3}^{T} - W_{1}^{T}}{W_{3\mathrm{TFH}} - W_{1\mathrm{TFH}}}. \tag{4}$$
Amplifier $K_{7-3}$ for the 7-day maximum flood volume except for the 3-day maximum flood volume is as follows:
$$K_{7-3} = \frac{W_{7}^{T} - W_{3}^{T}}{W_{7\mathrm{TFH}} - W_{3\mathrm{TFH}}}. \tag{5}$$
Amplifier $K_{15-7}$ for the 15-day maximum flood volume except for the 7-day maximum flood volume is as follows:
$$K_{15-7} = \frac{W_{15}^{T} - W_{7}^{T}}{W_{15\mathrm{TFH}} - W_{7\mathrm{TFH}}}. \tag{6}$$
The junction between two contiguous segments of the constructed DFH is not continuous and is usually smoothed by hand in engineering practice [2,3].
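The amplifier calculation can be sketched in a few lines. The code below assumes the usual same-frequency form of the PVA ratios (design value over TFH value for the peak and the 1-day volume, and differences of nested volumes for the longer durations); the function name, the dictionary keys, and the numbers in the usage note are hypothetical and serve only as an illustration.

```python
def pva_amplifiers(design, typical):
    """Amplifiers of the peak and volume amplitude (PVA) method.

    design, typical: dicts with keys "Q", "W1", "W3", "W7", "W15" holding the
    design quantiles and the typical flood hydrograph (TFH) values.
    """
    amps = {
        "Q": design["Q"] / typical["Q"],    # peak discharge
        "1": design["W1"] / typical["W1"],  # 1-day volume
    }
    # Each longer duration is scaled on the volume that remains after
    # removing the next shorter duration nested inside it.
    for short, long_ in (("W1", "W3"), ("W3", "W7"), ("W7", "W15")):
        key = f"{long_[1:]}-{short[1:]}"    # "3-1", "7-3", "15-7"
        amps[key] = ((design[long_] - design[short])
                     / (typical[long_] - typical[short]))
    return amps
```

For example, with hypothetical design values `{"Q": 30000, "W1": 20, "W3": 50, "W7": 90, "W15": 150}` and TFH values `{"Q": 20000, "W1": 10, "W3": 30, "W7": 50, "W15": 80}`, the peak amplifier is 1.5 and the 3-day-minus-1-day amplifier is 30/20 = 1.5.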

Derivation of DFH Based on Copula Function

Joint Distribution Based on Copula Function.
A copula is a function that describes and models the dependence structure between random variables, independent of the marginal distributions involved (e.g., [26,28,30]). Let $F_{X_i}(x_i)$ ($i = 1, 2, \ldots, n$) be the cumulative distribution function (CDF) of $X_i$. The objective is to determine the multivariate distribution, denoted as $F_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n)$ or simply $F$. By Sklar's theorem, the multivariate probability distribution $F$ is expressed in terms of its marginals and the associated dependence function as
$$F(x_1, x_2, \ldots, x_n) = C\big(F_{X_1}(x_1), F_{X_2}(x_2), \ldots, F_{X_n}(x_n)\big), \tag{7}$$
where $C$, called the copula function, is uniquely determined whenever the $F_{X_i}(x_i)$ are continuous and captures the essential features of the dependence among the random variables.

Conditional Most Likely Composition (CMLC) Method.
The P3 distribution is selected as the marginal distribution of peak discharge ($Q$) and flood volumes ($W_i$, $i = 1, 2, \ldots, n$), denoted as $F_Q(q)$ and $F_{W_i}(w_i)$. The corresponding density functions are denoted as $f_Q(q)$ and $f_{W_i}(w_i)$, in which $n$ represents the number of flood volumes needed for estimating DFH. For a given design standard (return period), it is necessary to find an appropriate combination of flood quantiles since different combinations would result in different DFH [18,20]. How to select appropriate combinations (i.e., the flood quantiles composition method) is extremely important in practice. The conventional univariate hydrological frequency analysis method for deriving DFH implicitly assumes that flood peak and volumes are independent and amplified with the same frequency, and so cannot explore the inherent correlation of hydrological variables [2]. Under a given peak discharge, the combinations differ in terms of their probability of occurrence.

Mathematical Problems in Engineering
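The corresponding conditional probability distribution is
$$F_{W_n \mid Q, W_1, \ldots, W_{n-1}}(w_n) = P(W_n \le w_n \mid Q = q, W_1 = w_1, \ldots, W_{n-1} = w_{n-1}) = \frac{\partial^{n} C(u, v_1, \ldots, v_n) / \partial u\, \partial v_1 \cdots \partial v_{n-1}}{\partial^{n} C(u, v_1, \ldots, v_{n-1}) / \partial u\, \partial v_1 \cdots \partial v_{n-1}}. \tag{8}$$
The density function of the conditional probability distribution is
$$f_{W_n \mid Q, W_1, \ldots, W_{n-1}}(w_n) = \frac{c(u, v_1, \ldots, v_n)}{c(u, v_1, \ldots, v_{n-1})}\, f_{W_n}(w_n), \tag{9}$$
where
$$c(u, v_1, \ldots, v_n) = \frac{\partial^{n+1} C(u, v_1, \ldots, v_n)}{\partial u\, \partial v_1 \cdots \partial v_n}, \tag{10}$$
in which
$$u = F_Q(q), \qquad v_i = F_{W_i}(w_i), \quad i = 1, 2, \ldots, n. \tag{11}$$
To maximize $f_{W_n \mid Q, W_1, \ldots, W_{n-1}}(w_n)$, the following equation should be satisfied:
$$c_1\, f_{W_n}(w_n)^2 + c(u, v_1, \ldots, v_n)\, f'_{W_n}(w_n) = 0, \qquad c_1 = \frac{\partial c(u, v_1, \ldots, v_n)}{\partial v_n}. \tag{12}$$
If $W_n$ follows the P3 distribution as in (1), then the following equation should be satisfied:
$$f'_{W_n}(w_n) = f_{W_n}(w_n) \left[ \frac{\alpha - 1}{w_n - \delta} - \beta \right], \tag{13}$$
where $\alpha$, $\beta$, and $\delta$ are the shape, scale, and location parameters of the marginal distribution function $F_{W_n}(w_n)$, respectively. Substituting (13) into (12) and simplifying gives
$$c_1\, f_{W_n}(w_n) + c(u, v_1, \ldots, v_n) \left[ \frac{\alpha - 1}{w_n - \delta} - \beta \right] = 0. \tag{14}$$
Equation (14) is the general formula of the CMLC method. If the peak discharge and flood volumes are independent, then the value of $c_1$ would be 0 and the value of $c(\cdot)$ would be 1. Substituting these values of $c_1$ and $c(\cdot)$ into (14), it simplifies to $w_n = (\alpha - 1)/\beta + \delta$, which is the mode of the P3 marginal. However, independence is only a special case in practice.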
The CMLC method is an approach to describe the compositions of flood peak and volumes by using the conditional density function to measure the occurrence likelihood of flood events. The CMLC estimate is the mode of the conditional density $f_{W_n \mid Q, W_1, \ldots, W_{n-1}}(w_n)$. Based on the principle of maximizing the conditional probability density function, (14) can be solved by numerical methods, such as harmonic mean Newton's method [31].
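The bivariate case of the CMLC idea can be sketched as follows, assuming a Gumbel-Hougaard copula and a P3 marginal for the volume. Instead of solving the stationarity condition with harmonic mean Newton's method, this sketch maximizes the conditional density directly by golden-section search, which assumes the density is unimodal on the search interval. All function names, the series evaluation of the P3 CDF, and the search bounds are illustrative assumptions, not the authors' implementation.

```python
import math

def p3_pdf(w, alpha, beta, delta):
    """Pearson type III density for w > delta."""
    t = w - delta
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(t) - beta * t)

def p3_cdf(w, alpha, beta, delta, terms=2000):
    """P3 CDF via the series of the regularized lower incomplete gamma."""
    x = beta * (w - delta)
    term = total = 1.0 / alpha
    for k in range(1, terms):
        term *= x / (alpha + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(alpha * math.log(x) - x - math.lgamma(alpha))

def gh_density(u, v, theta):
    """Density c(u, v) of the bivariate Gumbel-Hougaard copula (theta >= 1)."""
    x, y = -math.log(u), -math.log(v)
    s = x ** theta + y ** theta
    cop = math.exp(-s ** (1.0 / theta))
    return (cop / (u * v) * (x * y) ** (theta - 1.0)
            * s ** (1.0 / theta - 2.0) * (s ** (1.0 / theta) + theta - 1.0))

def cmlc_volume(u, alpha, beta, delta, theta, span=10.0, tol=1e-9):
    """Mode of the conditional density of W given F_Q(q) = u."""
    def objective(w):
        # c(u, v) * f_W(w); the denominator of the conditional density
        # does not depend on w and is dropped.
        v = min(max(p3_cdf(w, alpha, beta, delta), 1e-12), 1.0 - 1e-12)
        return gh_density(u, v, theta) * p3_pdf(w, alpha, beta, delta)
    lo = delta + 1e-9 / beta
    hi = delta + (alpha + span * math.sqrt(alpha)) / beta  # assumed upper bound
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol * (1.0 + abs(hi)):
        if objective(a) < objective(b):   # maximum lies in [a, hi]
            lo, a = a, b
            b = lo + g * (hi - lo)
        else:                             # maximum lies in [lo, b]
            hi, b = b, a
            a = hi - g * (hi - lo)
    return 0.5 * (lo + hi)
```

With `theta = 1` (independence) the copula density equals 1 and the result reduces to the P3 mode; for hypothetical parameters `alpha = 4`, `beta = 0.002`, `delta = 1000`, the call `cmlc_volume(0.99, 4.0, 0.002, 1000.0, 1.0)` is close to 2500, while a larger `theta` shifts the estimate upward for a high peak probability.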

Conditional Expectation Composition (CEC) Method.
The CEC method is also proposed to estimate the volumes under given hydrological variables, which is another way to analyze the inherent dependence between correlated flood peak and volumes. If the variables $q, w_1, w_2, \ldots, w_{n-1}$ are known, then the conditional expectation $E(W_n \mid q, w_1, w_2, \ldots, w_{n-1})$ is used to estimate the flood volume; that is,
$$E(W_n \mid q, w_1, w_2, \ldots, w_{n-1}) = \int_{\delta}^{+\infty} w_n\, f_{W_n \mid Q, W_1, \ldots, W_{n-1}}(w_n)\, dw_n, \tag{15}$$
where the conditional density is given by (9). Equation (15) is the general formula of the conditional expectation composition (CEC) method, which can be solved by the Gauss-Legendre quadrature rule [32].
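A minimal numerical sketch of the bivariate conditional expectation is given below, again assuming a Gumbel-Hougaard copula and a P3 marginal. For simplicity it replaces the Gauss-Legendre rule with a midpoint rule on a truncated interval, accumulates the marginal CDF along the grid, and normalizes by the sum of the weights to absorb quadrature error. Function names, the truncation bound, and the grid size are assumptions made for illustration.

```python
import math

def p3_pdf(w, alpha, beta, delta):
    """Pearson type III density for w > delta."""
    t = w - delta
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(t) - beta * t)

def gh_density(u, v, theta):
    """Density c(u, v) of the bivariate Gumbel-Hougaard copula (theta >= 1)."""
    x, y = -math.log(u), -math.log(v)
    s = x ** theta + y ** theta
    cop = math.exp(-s ** (1.0 / theta))
    return (cop / (u * v) * (x * y) ** (theta - 1.0)
            * s ** (1.0 / theta - 2.0) * (s ** (1.0 / theta) + theta - 1.0))

def cec_volume(u, alpha, beta, delta, theta, span=12.0, n=20000):
    """Conditional expectation E(W | F_Q(q) = u) by a midpoint rule."""
    upper = delta + (alpha + span * math.sqrt(alpha)) / beta  # assumed cutoff
    h = (upper - delta) / n
    cdf = num = den = 0.0
    for k in range(n):
        w = delta + (k + 0.5) * h
        f = p3_pdf(w, alpha, beta, delta)
        # Running approximation of F_W at the cell midpoint.
        v = min(max(cdf + 0.5 * f * h, 1e-12), 1.0 - 1e-12)
        cdf += f * h
        weight = gh_density(u, v, theta) * f   # conditional density up to a constant
        num += w * weight
        den += weight
    return num / den
```

Under independence (`theta = 1`) the estimate reduces to the P3 mean $\alpha/\beta + \delta$, which offers a quick sanity check of the quadrature.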

Confidence Interval of Conditional Probability
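For a given significance level $\xi$, the lower and upper limits $w_n^{L}$ and $w_n^{U}$ identifying the confidence interval of flood volume $w_n$ are derived by
$$F_{W_n \mid Q, W_1, \ldots, W_{n-1}}(w_n^{L}) = \frac{\xi}{2}, \qquad F_{W_n \mid Q, W_1, \ldots, W_{n-1}}(w_n^{U}) = 1 - \frac{\xi}{2}. \tag{17}$$
Specifically, if $\xi = 0.10$, then the 5% and 95% quantiles compose the 90% confidence interval of the conditional probability distribution.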
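For the bivariate case, the conditional limits can be illustrated in probability space. The sketch below assumes a Gumbel-Hougaard copula, for which the conditional CDF of $V$ given $U = u$ is the partial derivative of the copula with respect to $u$, and solves for the two limits by bisection. It returns non-exceedance probabilities of the volume; converting them to volumes would additionally require the inverse of the P3 marginal, which is omitted. The function names are hypothetical.

```python
import math

def gh_conditional_cdf(v, u, theta):
    """P(V <= v | U = u) for the bivariate Gumbel-Hougaard copula."""
    x, y = -math.log(u), -math.log(v)
    s = x ** theta + y ** theta
    return (math.exp(-s ** (1.0 / theta)) * s ** (1.0 / theta - 1.0)
            * x ** (theta - 1.0) / u)

def conditional_limits(u, theta, level=0.10, tol=1e-12):
    """Lower and upper conditional limits of V (in probability space)."""
    def solve(p):
        lo, hi = 1e-15, 1.0 - 1e-15
        # The conditional CDF increases with v, so bisection applies.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if gh_conditional_cdf(mid, u, theta) < p:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    return solve(level / 2.0), solve(1.0 - level / 2.0)
```

With `theta = 1` the limits are simply 0.05 and 0.95; with positive dependence and a high peak probability both limits move toward 1.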

Evaluation Indexes.
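To assess and compare different methods, the root mean square error (RMSE) and relative error (Bias) were selected as evaluation indexes:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2}, \qquad \mathrm{Bias} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{x}_i - x_i}{x_i} \right| \times 100\%,$$
where $x_i$ and $\hat{x}_i$ are the observed and estimated values and $n$ is the length of the sample series. The smaller these indexes are, the better the fitting or estimating method is.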
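The two indexes are straightforward to compute. In the sketch below, the RMSE follows its standard definition, while the form of Bias (mean absolute relative error, in percent) is an assumption about how the relative error is averaged; the function names are illustrative.

```python
import math

def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def relative_bias(observed, estimated):
    """Mean absolute relative error in percent (assumed form of Bias)."""
    n = len(observed)
    return 100.0 * sum(abs(e - o) / o for o, e in zip(observed, estimated)) / n
```

For observed values `[100, 200]` and estimates `[110, 180]`, the squared errors are 100 and 400, so the RMSE is the square root of 250, and both relative errors are 10%, so the Bias is 10%.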

Case Study
4.1. Danjiangkou Reservoir. The Hanjiang River is the largest tributary of the Yangtze River; it passes through the provinces of Shaanxi and Hubei of China and merges into the Yangtze River at Wuhan city. The river's length is 1570 km and the basin area is 159,000 km² with a subtropical monsoon climate. The regional precipitation and flow are characterized by high seasonal variability, with 75% of rainfall occurring from June to October; the rare rainstorms in the early summer and long-lasting rains in the autumn often result in great floods.
The Danjiangkou reservoir, located in the middle reach of the basin (Figure 1), is the source of water for the middle route of the South-North Water Diversion Project in China. The available water resources in Hanjiang basin and the impact of water diversion have been discussed by many authors [33]. Since it is a multipurpose reservoir, there is serious conflict between flood prevention, water supply, and hydropower generation. The normal pool level is 170 m and the corresponding storage capacity is 29.05 billion m³. The annual maximum peak discharges and flood volumes of the inflow of Danjiangkou reservoir are available with a systematic record of 61 years. The flood data series was provided by the Bureau of Danjiangkou Water Resources Management. This data series has been restored and checked for possible long-term trends and abrupt changes, and the test results demonstrate that the data set is stationary and representative. The sample statistics of the flood peak and volume series are listed in Table 1.

Flood Quantiles Estimated by P3/LM. The univariate hydrological frequency analysis model (P3/LM) was used to estimate flood quantiles for given return periods. The estimated parameters of the P3 distribution for the peak discharge and flood volume series are listed in Table 2. A chi-square ($\chi^2$) goodness-of-fit test was performed to test the assumption, $H_0$, that the flood magnitudes follow the P3 distribution. Table 2 shows that the assumption could not be rejected at the 5% significance level.
For the given return periods ($T$ = 5, 10, 20, 50, 100, 200, 500, and 1000 years), the corresponding quantiles of flood peaks and volumes were estimated based on the univariate P3/LM model, and the results are listed in Table 3.
The empirical and P3 distribution frequency curves of flood peaks and flood volumes were drawn in Figure 2, on which the Weibull plotting position formula [34] was used.It is shown that theoretical values can fit the observed values very well.
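The empirical frequencies used in such a plot can be computed with the Weibull plotting position $m/(n+1)$, where $m$ is the rank of an observation in descending order and $n$ is the sample size. The short sketch below pairs each value with its empirical exceedance probability; the function name is illustrative.

```python
def weibull_plotting_positions(series):
    """Pair each value with its empirical exceedance probability m / (n + 1)."""
    ranked = sorted(series, reverse=True)   # rank m = 1 is the largest flood
    n = len(ranked)
    return [(value, m / (n + 1)) for m, value in enumerate(ranked, start=1)]
```

For a series of three values, the largest is assigned an exceedance probability of 1/4, the middle one 2/4, and the smallest 3/4.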

Joint Distribution of Flood Peak and Volumes Based on Copulas. Different families of copula functions have been proposed and described by Nelsen [30]. The Archimedean copula family is more desirable for hydrologic analyses, because most of the family can easily be constructed and some of the family can be applied whether the correlation among the hydrological variables is positive or negative [16,26]. The copula functions (Table 4) were applied to construct the bivariate joint distributions between flood peak ($Q_{\max}$) and flood volumes ($W_{1\mathrm{d}}$, $W_{3\mathrm{d}}$, $W_{7\mathrm{d}}$, and $W_{15\mathrm{d}}$), respectively. The parameters of the bivariate copulas were estimated by the method of the inversion of Kendall's $\tau$. Based on the results of Kendall's $\tau$, the AMH copula could not be used because it is only suitable when Kendall's coefficient is below 0.34 [30]. The G-H, Clayton, and Frank copula functions were compared with different statistical tests. The $p$ values were calculated based on the parametric bootstrap or multipliers [35] procedure with 10 000 runs. The results of the Cramér-von Mises test and the Kolmogorov-Smirnov test are listed in Table 5.
The comparison between simulated (sample size 10 000) and observed values for the three chosen copula functions is plotted in Figure 3. On the basis of statistical tests and graphical goodness-of-fit tests, the G-H copula is the most suitable for constructing the bivariate joint distribution of peak discharge and flood volumes. This finding is in accordance with Li et al. [6] and Sraj et al. [16].
Therefore, the G-H copula was used to model the dependence between the flood peak and flood volumes in this study.The linear correlation between empirical and theoretical frequency values of bivariate G-H copulas for flood peak and volumes was plotted in Figure 4, on which the linear correlation coefficients are 0.9994, 0.9991, 0.9991, and 0.9991, respectively.The comparison of empirical plots and theoretic frequency curves of bivariate G-H copulas was also plotted in Figure 5, which shows that the theoretical frequency line (or curve) fit empirical values very well with high correlation coefficients.
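The inversion of Kendall's $\tau$ used for the bivariate parameter estimation can be sketched as follows. The closed-form relations $\theta = 1/(1-\tau)$ for the G-H copula and $\theta = 2\tau/(1-\tau)$ for the Clayton copula are standard results; the Frank copula requires a numerical inversion and is left out. The function names and the tie handling (ties contribute zero) are assumptions of this example.

```python
def kendall_tau(x, y):
    """Sample Kendall's tau from concordant and discordant pairs."""
    n = len(x)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)

def theta_from_tau(tau, family):
    """Copula parameter obtained by inverting Kendall's tau."""
    if family == "gh":
        return 1.0 / (1.0 - tau)
    if family == "clayton":
        return 2.0 * tau / (1.0 - tau)
    raise ValueError("closed-form inversion is only given for 'gh' and 'clayton'")
```

For the toy series `x = [1, 2, 3, 4]` and `y = [1, 3, 2, 4]` there are five concordant pairs and one discordant pair, so $\tau = 4/6$, which gives $\theta = 3$ for the G-H copula and $\theta = 4$ for the Clayton copula.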
The G-H, Clayton, and Frank copulas from the Archimedean family were also used to establish trivariate symmetric and asymmetric joint distributions of $Q_{\max}$-$W_{1\mathrm{d}}$-$W_{3\mathrm{d}}$, $Q_{\max}$-$W_{3\mathrm{d}}$-$W_{7\mathrm{d}}$, and $Q_{\max}$-$W_{7\mathrm{d}}$-$W_{15\mathrm{d}}$, respectively. Table 6 shows the selected trivariate copula functions, where the parameter of the symmetric copulas is $\theta$ and the parameters of the asymmetric copulas are $\theta_1$ and $\theta_2$ ($\theta_2 > \theta_1$). Parameters of the trivariate copula functions were estimated by the maximum likelihood method [26]. To assess and compare the six trivariate copula functions, the Kolmogorov-Smirnov test statistic and the Akaike information criterion (AIC) [29] were selected, and the results are listed in Table 7. The AIC and Kolmogorov-Smirnov values of the asymmetric G-H copulas are lower than those of the other copulas. Therefore, the asymmetric G-H copulas are selected to establish the 3-dimensional joint distribution of flood peak and volumes in this study. Similar to the bivariate joint distribution, the linear correlation relationship and the comparison of the empirical plots and theoretical frequency line (or curve) of the asymmetric trivariate G-H copulas are plotted in Figures 6 and 7, which show that the theoretical frequency lines (or curves) fit the empirical values very well with high correlation coefficients.
The RMSE and Bias were selected as evaluation indexes to assess the performance of the different methods. Table 8 shows that the CMLC method has the smallest RMSE and Bias while the CEC method has the largest RMSE and Bias values. For a significance level of 0.10, the 90% confidence intervals of flood volumes were evaluated by (17). Figure 8 shows the 7-day and 15-day flood volumes as well as the confidence intervals (CI) estimated by the CMLC and CEC methods; most of the observations are within the confidence intervals. It can be concluded that the CMLC method is reasonable and performs best for the bivariate distribution.
To compare the performance of different methods for the trivariate distribution, 15-day volumes were estimated by the CMLC and CEC methods, respectively, based on peak discharges and 7-day volumes. The CMLC method performs best with the smallest RMSE and Bias values, while the CEC method has the largest values of both indexes and the univariate P3/LM model performs reasonably well.

Derivation of Design Flood Hydrograph.
All of these results demonstrate that the CMLC method performs best for analyzing the inherent relationship between flood peak and volumes. The proposed method was used to derive the DFH in Danjiangkou reservoir. There are many definitions of return period, such as the AND, OR, and Kendall return periods [23,24]. The conditional return period [15,27] was selected just as an example to demonstrate the proposed method to derive DFH. In Danjiangkou reservoir, the peak discharge and the 7-day and 15-day volumes are the control hydrological variables for deriving DFH. For the different return periods (5, 10, 20, 50, 100, 200, 500, and 1000 years), the first step is to estimate the peak discharge from its marginal distribution. The second step is to estimate the 7-day volume under the calculated peak discharge by the bivariate CMLC method. The third step is to estimate the 15-day volume under the calculated peak discharge and 7-day volume by the trivariate CMLC method. The design flood volumes estimated by the CMLC and Chinese methods are listed in Table 3. The 7-day and 15-day flood volumes estimated by the CMLC method are less than those of the P3/LM method for the different return periods.
After getting the values of flood peak and volumes under each return period, the DFH can be derived by amplifying the typical flood hydrographs (TFH). The observed annual maximum flood hydrographs of 1975 and 1983 at the Danjiangkou reservoir, which have high peaks and large volumes with a posterior-peak shape, were selected as TFH, respectively. The amplifiers of flood peak and volumes were calculated by (2) to (6), respectively. The DFH is derived by multiplying each discharge ordinate of the TFH by the corresponding amplification factor. As an illustrative example, only the 1000-year DFH derived by the proposed CMLC and conventional Chinese methods are plotted in Figure 9, based on the 1975 and 1983 TFH. The flood quantiles estimated by the Chinese method are larger than those estimated by the proposed CMLC method.

Conclusions
The derivation of the design flood hydrograph is very important in multivariate flood frequency analysis frameworks. Unique design floods under a given standard are necessary for engineering practice. The simple regression method may be preferred due to its ease of application. To avoid the arbitrariness of design floods, the proposed conditional methods, which have a statistical basis, might be one of the possible options for deriving flood quantiles and DFH. In this study, the general formulae of the CMLC and CEC methods were proposed and developed to consider the relationship between the flood peaks and volumes. The proposed methods were applied in the Danjiangkou reservoir and compared with the conventional Chinese univariate flood frequency analysis method. The following conclusions were drawn from this study.
(1) The Chinese method for deriving DFH implicitly assumes that flood peak and flood volumes are independent and amplified with the same frequency, which might produce over- or underestimation of risk.
(2) The Gumbel-Hougaard copula is more suitable for describing the correlation between flood peak and volumes. The results of the case study in Danjiangkou reservoir demonstrate that the CMLC method performs best while the CEC method performs worst.
(3) The proposed CMLC method not only has strong statistical basis, but also has rational results with unique composition scheme and will provide a new approach for derivation of design flood hydrograph.

Figure 1: Location of Danjiangkou reservoir in the Hanjiang basin.

Figure 2: The fit of P3 probability curves with observed flood peak and flood volumes.

Figure 4: Linear correlation between empirical and theoretical frequency values of bivariate G-H copulas for flood peak and volumes.

4.4. CMLC and CEC Estimation and Assessment. Flood volumes were estimated by the CMLC and CEC methods and compared with the P3/LM model. For the bivariate distribution, the flood volumes ($W_{1\mathrm{d}}$, $W_{3\mathrm{d}}$, $W_{7\mathrm{d}}$, and $W_{15\mathrm{d}}$) based on peak discharge were estimated by (14) and the results were denoted as Prob($W_{3\mathrm{d}} \mid Q_{\max}$), Prob($W_{7\mathrm{d}} \mid Q_{\max}$), and Prob($W_{15\mathrm{d}} \mid Q_{\max}$), respectively. Similarly, the CEC method was used to estimate these flood volumes by (15).

Figure 5: Comparison of the empirical plots and theoretical frequency curves of bivariate G-H copula for flood peak and volumes.

Figure 7: Comparison of the empirical plots and theoretical frequency curves of trivariate asymmetric G-H copulas for flood peaks and volumes.

Figure 8: Estimated volumes and confidence intervals for given peak discharges.

Figure 9: 1000-year DFH derived by the CMLC and Chinese methods based on 1975 and 1983 TFH.

Table 2: Estimated parameters of the P3 distribution and hypothesis test.

Table 3: Comparison of design floods estimated by P3/LM and CMLC methods.

Table 4: Bivariate copula functions and the relationship between their parameter and Kendall correlation coefficient.

Table 5: Estimated parameters of copulas and results of statistical test.

Table 8: Comparison of flood volumes estimated by univariate and bivariate distributions.

Table 9: Comparison of flood volumes estimated by univariate and trivariate distributions.