Sediment Deposition Risk Analysis and PLSR Model Research for Cascade Reservoirs Upstream of the Yellow River

It is difficult to identify and eliminate the influence of multiple correlation among independent factors with least-squares regression. To address this shortcoming, this paper studies the sediment deposition risk of cascade reservoirs and a fitting model for the sediment flux into the reservoirs. The partial least-squares regression (PLSR) method is adopted for the modeling analysis. Model fitting is combined with non-model-style data content analysis, so that regression modeling, data structure simplification, and analysis of the multiple correlations among factors are achieved together, while the accuracy of the model is ensured through cross-validation. The modeling analysis of the sediment flux into the cascade reservoirs of the Long-Liu section upstream of the Yellow River indicates that partial least-squares regression can effectively overcome the multiple correlation among factors, and that the separated factor variables better explain the physical causes of the measured results.


Introduction
Sediment deposition and the balance between erosion and deposition [1,2] are unavoidable issues in the development and construction of reservoir projects. They are also key technical problems that must be properly solved during project planning, design, construction, and operation management. The Yellow River is a heavily sediment-laden river. Two conditions are basic for the river to carry its sediment to the sea: a rational prediction of sediment production and a sufficient supply of water for sediment transport. Otherwise sedimentation may occur, raising the riverbed and aggravating the flood threat. For this reason, researchers have attached great importance to the sedimentation analysis of the Yellow River and to fitting and forecasting sediment deposition, sediment production, and the water demand for sediment transport. Compared with a single reservoir, in the basin cascade development mode the river flow is affected by the storage and discharge of every reservoir in the cascade. The risk of reservoir sedimentation therefore becomes more serious, and the relation between erosion and deposition becomes more complex. After a reservoir is completed, sediment begins to accumulate in the reservoir area from the initial impoundment onward, especially near the dam. Bed load blocked by the dam gradually accumulates in the reservoir area, while suspended load is gradually deposited as the flow velocity decreases [3]. An improper desilting design may also cause severe sedimentation after impoundment [4]. Downstream of the reservoir, driven by economic returns, the discharged volume may be much lower than the demand for sediment transport, which creates a risk of sediment deposition.
Therefore, research on sediment deposition risk, sediment amount, and the erosion-deposition balance in the basin cascade development mode has several benefits. It can optimize the allocation of sediment, protect the usable capacity of existing reservoirs, extend their service life, and guide the correct operation of cascade reservoirs. This is of far-reaching significance for the overall efficiency of cascade development.
Researchers in China and abroad have made progress on models for predicting changes in sediment erosion and deposition. These prediction models can be roughly divided into three categories: first, conceptual sediment deposition models, such as regression analysis and mathematical statistics methods [5]; second, numerical sediment models, such as hydrodynamic, hydrological, and sediment mathematical models [6]; and third, black-box sediment models, such as fuzzy mathematics methods, the grey system method, and artificial neural network (ANN) models [7]. Each model is applicable to some degree. Each also has unavoidable limitations arising from its mathematical method, its optimization solution, and its practical application. The multiple regression analyses now widely used in modeling reservoir sediment storage assume that the model factors are not closely correlated with one another. In practice, however, the meteorological, engineering, and social and economic factors that affect reservoir sediment usually exhibit multiple correlation and varying degrees of uncertainty. Conventional least-squares regression cannot effectively overcome the influence of this multiple correlation and the associated uncertainty among model factors. Even when the multiple correlation coefficient is high, it only shows that the model fits the measured data closely. It does not show that the model correctly captures the influence of each factor. Using such model results directly to manage the sediment flux into a reservoir may therefore carry a great risk.
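The problem described above can be made concrete with variance inflation factors (VIFs), a standard multicollinearity diagnostic that this paper does not itself use. The following is a minimal sketch on hypothetical synthetic data; the series `runoff`, `transport`, and `temperature` are invented for illustration.

When one factor is nearly a linear function of another, both receive large VIFs. This means their least-squares coefficients are poorly determined even if the overall fit (the multiple correlation coefficient) is high.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n samples x p factors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression (with intercept) of column j on all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical monthly factors: transport rate tracks runoff almost exactly.
rng = np.random.default_rng(0)
runoff = rng.normal(100.0, 20.0, 72)
transport = 0.5 * runoff + rng.normal(0.0, 1.0, 72)   # nearly collinear with runoff
temperature = rng.normal(10.0, 3.0, 72)               # roughly independent
print(vif(np.column_stack([runoff, transport, temperature])).round(1))
```

A common rule of thumb treats VIF values well above 10 as a sign that least-squares coefficients for those factors should not be interpreted individually.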
Partial least-squares regression (PLSR) is a relatively new multivariate data analysis method that grew out of practical applications. It is mainly used for regression modeling of one or more dependent variables on multiple independent variables, and it can solve many problems that ordinary regression cannot. PLSR combines the basic functions of regression modeling, principal component analysis, and canonical correlation analysis. As a result, a single algorithm achieves regression modeling, data structure simplification, and correlation analysis between two groups of variables. PLSR has been widely applied in many areas. Chun and Keleş used PLSR for high-dimensional genomic data analysis [8]. They proposed an efficient implementation of sparse partial least-squares regression and compared it with well-known variable selection and dimension reduction approaches in simulation experiments. They also demonstrated its practical utility in a joint analysis of gene expression and genome-wide binding data. P. P. Roy and K. Roy investigated the optimum variable selection strategy for PLSR using a model dataset of cytoprotection data [9]. Carrascal et al. proposed PLSR as an alternative to current regression methods in ecological studies [10]. They pointed out that PLSR was more reliable than other techniques in identifying relevant variables and the magnitudes of their influence.
This paper builds on an analysis of the complex causal relationships between sediment amount and its influencing factors. It adopts the PLSR method to eliminate the interference caused by the multiple correlation and uncertainty of the factors that influence sediment flux into the reservoir. The goal is to fit and predict the sediment amount of reservoirs in the cascade development mode. This provides reliable technical support for joint dispatching, optimized sediment allocation, and extension of reservoir service life.
The main contents of this paper are as follows. The second section analyzes the risk of sediment deposition under the mode of drainage basin cascade development. The third section introduces the modeling ideas and procedures of the PLSR model for the sediment flux into the reservoir. The fourth section applies the model to the most representative cascade reservoirs upstream of the Yellow River and analyzes the results. Finally, the results are discussed.

Analysis of Influencing Factors.
An important means of preventing siltation of the Yellow River is to strengthen the scientific use and management of its water. This guarantees the ecological base flow and the flow needed for sediment transport. The main factors influencing sediment deposition in the Yellow River are as follows [11,12].
(1) Natural factors: weather, vegetation, terrain, and geological conditions, and the incoming water and sediment conditions.
(2) Engineering factors: dam impoundment, project construction, and water storage in the cascade reservoirs.
(3) Social and economic factors: during social and economic development, irrational use of the river's water resources may lead to a lack of base flow, flow interruption, and river flow below the sediment transport demand. These are important causes of sediment deposition.

Risk Analysis of Sediment Deposition.
Comprehensive analysis shows that the sedimentation risks of basin cascade development lie mainly in the following aspects [13,14].
(1) Environmental risks: reduction of arable land, soil desertification, deterioration of soil fertility, and deterioration of air quality.
(2) Project benefit risks: cascade reservoirs cut off the original natural river, changing its flow pattern and turning it into a multilevel, discontinuous river morphology. Because reservoir sedimentation alters erosion and deposition in the river, the original sediment balance is destroyed. Sediment deposition reduces flood control capacity and storage capacity and shortens the operating life of the reservoir [15].
(3) Aggravated flood risks: during floods, sedimentation raises the flood level. At the same discharge, this significantly increases the probability that the water level exceeds its limit and that the flow exceeds the channel capacity. Meanwhile, sedimentation in flood storage areas aggravates flood losses.
(4) Other risks: clogged or silted channels, navigation difficulties, increased turbine wear, threats to the safe operation of the hydropower complex, and an extended siltation range. Further risks are increased upstream flood losses, erosion and deformation of the downstream river caused by reservoir releases, and deterioration of reservoir water quality by pollutants attached to sediment.

Partial Least-Squares Regression
In PLSR, a component $t_1$ is extracted from the independent variable set $X$ and a component $u_1$ from the dependent variable set $Y$. When the two components are extracted, each should carry as much of the variation information of its own data table as possible, while the correlation between them is maximized [16,17]. After the first components $t_1$ and $u_1$ are extracted, $X$ and $Y$ are each regressed on $t_1$. The residual information is then used to extract the components for the second round, and this is repeated until a satisfactory accuracy is achieved [18]. Suppose $m$ components $t_1, t_2, \ldots, t_m$ are finally extracted from $X$. The regression of each $y_k$ on $t_1, t_2, \ldots, t_m$ can then be transformed into a regression equation of $y_k$ ($k = 1, 2, \ldots, q$) on the original variables $x_1, \ldots, x_p$, which completes the partial least-squares regression modeling.
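The stopping rule "until a satisfactory accuracy is achieved" is often made operational with leave-one-out cross-validation. This sketch assumes the criterion $Q_h^2 = 1 - \mathrm{PRESS}_h / SS_{h-1}$ with the commonly used threshold 0.0975, that is, $1 - 0.95^2$. The threshold, the helper names (`pls1_fit`, `press`, `choose_components`), and the synthetic data are assumptions for illustration, not the authors' settings.

The sketch handles a single dependent variable and re-standardizes the data inside each fold.

```python
import numpy as np

def pls1_fit(X, y, m):
    """Single-response PLSR with m components; returns a prediction function."""
    xm, xs = X.mean(axis=0), X.std(axis=0, ddof=1)
    ym, ys = y.mean(), y.std(ddof=1)
    E, f = (X - xm) / xs, (y - ym) / ys
    W, P, R = [], [], []
    for _ in range(m):
        w = E.T @ f
        w /= np.linalg.norm(w)                  # unit weight vector w_h
        t = E @ w                               # component t_h
        p = E.T @ t / (t @ t)                   # regression of E on t_h
        r = f @ t / (t @ t)                     # regression of f on t_h
        E, f = E - np.outer(t, p), f - r * t    # keep only residual information
        W.append(w); P.append(p); R.append(r)
    W, P = np.array(W).T, np.array(P).T
    beta = W @ np.linalg.solve(P.T @ W, np.array(R))   # coefficients on standardized x
    return lambda Xn: ((Xn - xm) / xs) @ beta * ys + ym

def press(X, y, h):
    """Leave-one-out prediction error sum of squares with h components."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        predict = pls1_fit(X[keep], y[keep], h)
        total += (y[i] - predict(X[i:i + 1])[0]) ** 2
    return total

def choose_components(X, y, limit=0.0975):
    """Add components while Q2_h = 1 - PRESS_h / SS_{h-1} stays >= limit."""
    ss_prev = np.sum((y - y.mean()) ** 2)              # SS_0
    h = 0
    while h < X.shape[1]:
        q2 = 1.0 - press(X, y, h + 1) / ss_prev
        if q2 < limit:
            break
        h += 1
        ss_prev = np.sum((y - pls1_fit(X, y, h)(X)) ** 2)   # SS_h on all samples
    return h

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, 0.5, 0.0, -0.7]) + 0.1 * rng.normal(size=30)
print(choose_components(X, y))
```

With all $p$ components retained, the fit coincides with ordinary least squares. The cross-validation step is what keeps the model from simply reproducing the collinearity problems of least squares.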

PLSR Model of Sediment Storage Volume in Cascade Development Reservoirs.
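Denote the dependent variable (sediment storage volume) by $y \in \mathbb{R}^n$ and the independent variable set by $X = [x_1, x_2, \ldots, x_p]$, $x_j \in \mathbb{R}^n$. The standardized independent variable matrix of $X$ is denoted by $E_0$, and the standardized dependent variable vector of $y$ by $F_0$:

$$E_0 = [x_{ij}^*]_{n\times p},\quad x_{ij}^* = \frac{x_{ij}-\bar{x}_j}{s_j},\qquad F_0 = [y_i^*]_{n\times 1},\quad y_i^* = \frac{y_i-\bar{y}}{s_y},$$

where $\bar{x}_j$ is the mean value of $x_j$ and $s_j$ is the standard deviation of $x_j$; $\bar{y}$ and $s_y$ are defined likewise for $y$.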
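The standardization of $X$ and $y$ can be sketched as follows. The block assumes that $s_j$ is the sample standard deviation with divisor $n - 1$, which the text does not specify. The five monthly observations are hypothetical.

```python
import numpy as np

def standardize(A):
    """Column-wise z-scores (a_ij - mean_j) / s_j using the sample std (n - 1)."""
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0)
    std = A.std(axis=0, ddof=1)
    return (A - mean) / std, mean, std

# Hypothetical data: 5 months x 2 factors, plus sediment storage y.
X = np.array([[120.0, 3.1], [150.0, 4.0], [90.0, 2.2], [200.0, 5.5], [110.0, 2.9]])
y = np.array([0.8, 1.1, 0.5, 1.6, 0.7])
E0, x_mean, x_std = standardize(X)
F0, y_mean, y_std = standardize(y.reshape(-1, 1))
```

Keeping `x_mean`, `x_std`, `y_mean`, and `y_std` is what later allows coefficients fitted on $E_0$ and $F_0$ to be expressed in original units.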
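In the first step, the unit vectors $w_1$ and $c_1$ are sought. For the single dependent variable $y$, the component $u_1$ equals $F_0$, and then

$$w_1 = \frac{E_0^{T}F_0}{\lVert E_0^{T}F_0\rVert} = \frac{1}{\sqrt{\sum_{j=1}^{p} r^2(x_j, y)}}\,\bigl[r(x_1, y), r(x_2, y), \ldots, r(x_p, y)\bigr]^{T},$$

where $r(x_j, y)$ is the simple correlation coefficient between $x_j$ and $y$.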
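The columns of $E_0$ and $F_0$ are z-scores, so $E_0^{T}F_0$ is proportional to the vector of simple correlation coefficients $r(x_j, y)$, and normalizing either vector gives the same $w_1$. The sketch below checks this numerically on hypothetical random data, again assuming the $n - 1$ divisor. It also computes the first component $t_1 = E_0 w_1$ and the objective value $w_1^{T}E_0^{T}F_0$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.3, size=40)

# Standardize with the sample standard deviation (n - 1).
E0 = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
F0 = (y - y.mean()) / y.std(ddof=1)

# First weight vector: w1 = E0' F0 / ||E0' F0||.
w1 = E0.T @ F0 / np.linalg.norm(E0.T @ F0)

# The same vector built from simple correlation coefficients r(x_j, y).
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
w1_from_r = r / np.sqrt(np.sum(r ** 2))

t1 = E0 @ w1                  # first component t1 = E0 w1
theta1 = w1 @ E0.T @ F0       # objective value at the optimum, equal to ||E0' F0||
```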
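The first component is then $t_1 = E_0 w_1$, and the optimal value of the objective function is $\theta_1 = w_1^{T}E_0^{T}F_0 = \lVert E_0^{T}F_0\rVert$. The regressions of $E_0$ and $F_0$ on $t_1$ are calculated:

$$E_0 = t_1 p_1^{T} + E_1,\qquad F_0 = t_1 r_1 + F_1,\qquad p_1 = \frac{E_0^{T}t_1}{\lVert t_1\rVert^{2}},\quad r_1 = \frac{F_0^{T}t_1}{\lVert t_1\rVert^{2}},$$

where $E_1$ and $F_1$ are the residual matrices and $p_1$ and $r_1$ are the regression coefficients (for a single dependent variable, $r_1$ is a scalar). In the second step, $E_0$ and $F_0$ are replaced by $E_1$ and $F_1$, respectively. The first step is then repeated to regress $E_1$ and $F_1$ on the second component $t_2$, giving the regression coefficients $p_2$ and $r_2$. Continuing in this way, after $m$ components have been extracted,

$$\hat{F}_0 = r_1 t_1 + r_2 t_2 + \cdots + r_m t_m.$$

Each $t_h$ is a linear combination of the columns of $E_0$, so this can be rewritten as a regression of $y^*$ on the standardized variables $x_1^*, \ldots, x_p^*$.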
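The step-by-step procedure above (extract a component, regress on it, then continue with the residuals) can be sketched for a single dependent variable. The function name `pls1_components` and the synthetic data are illustrative assumptions; the data include a deliberately collinear pair of factors. By construction the components are mutually orthogonal, and $F_0 = r_1 t_1 + \cdots + r_m t_m + F_m$ holds exactly.

```python
import numpy as np

def pls1_components(E0, F0, m):
    """Extract m PLS components for one standardized response.

    Step h: w_h = E'F/||E'F||, t_h = E w_h, p_h = E't_h/||t_h||^2,
    r_h = F't_h/||t_h||^2, then E <- E - t_h p_h' and F <- F - r_h t_h.
    """
    E, F = E0.copy(), F0.copy()
    T, W, P, R = [], [], [], []
    for _ in range(m):
        w = E.T @ F
        w /= np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        r = F @ t / (t @ t)
        E = E - np.outer(t, p)   # residual matrix E_h
        F = F - r * t            # residual vector F_h
        T.append(t); W.append(w); P.append(p); R.append(r)
    return np.array(T).T, np.array(W).T, np.array(P).T, np.array(R), F

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=50)   # strongly collinear pair of factors
y = X @ np.array([0.8, 0.8, -0.4, 0.1]) + 0.2 * rng.normal(size=50)
E0 = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
F0 = (y - y.mean()) / y.std(ddof=1)

T, W, P, R, F_m = pls1_components(E0, F0, m=2)
```

Because each new component is built from residuals, it carries only information that the earlier components did not explain. This is how PLSR sidesteps the collinear pair instead of splitting its effect unstably between the two columns.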

Factors Multicorrelation Analysis.
Following the idea of principal component analysis, the dimension of a high-dimensional data system can be reduced with minimal loss of sediment storage information. When the original $p$-dimensional data are reduced to a two-dimensional system, all sample points can be plotted on a plane. The structure of the $p$-dimensional space can then be observed directly from this plane. After the PLSR model is established, the extracted components $t_1$ and $u_1$ are used directly to draw a $t_1/u_1$ plot. Each sample point $i$ is plotted at the coordinates $(t_1(i), u_1(i))$, and the resulting plot is used to analyze the distribution and structural similarity of the sample points in the high-dimensional space.
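For two factor sets, such as a rainfall set and a sediment transport rate set, the first component pair can be obtained from a singular value decomposition of $E_0^{T}F_0$. This is a standard way to compute the first PLS weight vectors. The sketch assumes that both sets are standardized with the $n - 1$ divisor, and it uses synthetic series that share a hypothetical 12-month cycle. Plotting `u1` against `t1` gives the $t_1/u_1$ view, and a strong linear trend indicates correlated factor sets.

The program described in the text was written in MATLAB; this Python version is only an illustration.

```python
import numpy as np

def first_pls_pair(X, Y):
    """First PLS component pair (t1, u1) for two factor blocks X and Y.

    w1 and c1 are the leading left/right singular vectors of E0'F0; they
    maximize the covariance <E0 w, F0 c> subject to ||w|| = ||c|| = 1.
    """
    E0 = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    F0 = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(E0.T @ F0)
    w1, c1 = U[:, 0], Vt[0]
    return E0 @ w1, F0 @ c1

# Hypothetical rainfall block (3 gauges) and transport-rate block (2 stations),
# both driven by a common seasonal signal.
rng = np.random.default_rng(3)
season = np.sin(np.arange(72) * 2 * np.pi / 12)
rain = season[:, None] + 0.3 * rng.normal(size=(72, 3))
transport = 0.8 * season[:, None] + 0.3 * rng.normal(size=(72, 2))
t1, u1 = first_pls_pair(rain, transport)
corr = np.corrcoef(t1, u1)[0, 1]
```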
Following the theory and the specific steps of PLSR modeling, the PLSR analysis program for sediment storage under cascade development can be written in MATLAB.

Basic Information.
The section of the upper Yellow River between Longyangxia and Liujiaxia contains six large cascade hydropower stations, as shown in Figure 1. From upstream to downstream, they are Longyangxia, Laxiwa, Lijiaxia, Gongboxia, Jishixia, and Liujiaxia. Longyangxia hydropower station is the leading hydropower station on the main stream of the Yellow River. It has a total reservoir capacity of 274 × 10⁸ m³ and a low sediment concentration, and it is located 1687 km from the source of the Yellow River. Its controlled basin is one of the few large regions with stable runoff. Jishixia hydropower station is the fifth large cascade hydropower station in the Long-Liu section of the Yellow River. Its dam is located at the outlet of Jishixia canyon, about 55 km downstream of Gongboxia hydropower station and about 93 km upstream of Liujiaxia hydropower station. It is a daily regulating reservoir with a total capacity of 2.94 × 10⁸ m³ [19]. Basic statistics of the other cascade hydropower stations in the Long-Liu section of the Yellow River are shown in Table 1.
The cascade hydropower stations of the Long-Liu section lie between the Tangnaihai and Xunhua hydrometric stations. Tangnaihai Hydrometric Station is located on the eastern edge of the Qinghai-Tibet Plateau, at the boundary between natural runoff and artificially regulated runoff [20]. Xunhua Hydrometric Station is a control station in Qinghai Province and serves as the inflow station of Jishixia Reservoir, with a control area of 145,459 km². Before 1985 the annual runoff was not affected by human activities. After 1986, however, reservoirs such as Longyangxia were built, and the measured data include and reflect the influence of the upstream cascade regulating reservoirs. The measured water and sediment statistics of the Tangnaihai and Xunhua hydrometric stations are shown in Tables 2 to 4.
Mathematical Problems in Engineering

Factor Selection and PLSR Model Establishment.
An accurate fit of the amount of sediment in the reservoir is important for reasonably predicting reservoir sedimentation and for scientifically determining the water demand for sediment transport. Many factors influence the sediment flux into each cascade reservoir: rainfall, natural runoff, sediment transport rate, discharge from the upstream reservoir, water levels of the upstream and downstream reservoirs, and water temperature. These factors are multiply correlated, so the PLSR model established above is adopted for fitting. The main factors affecting sediment storage are:
- rainfall within the drainage area and runoff;
- the water level of, and discharge from, the upstream reservoir;
- the backwater level of the downstream reservoir and the discharge from the reservoir at this level;
- natural river runoff, water temperature, and sediment transport rate.

Therefore, in the cascade development mode, the PLSR model of sediment quantity consists of the following components:
- a runoff factor component $y_R$;
- a sediment discharge factor component $y_S$;
- a rainfall factor component $y_P$;
- a water level factor component $y_H$;
- a water temperature factor component $y_T$;
- a discharge water factor component $y_D$;
- a constant $a_0$.

That is,

$$y = y_R + y_S + y_P + y_H + y_T + y_D + a_0.$$
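Once a linear model has been fitted, each factor component can be computed as the sum of coefficient × variable over the columns that belong to that factor. The components are runoff, sediment discharge, rainfall, water level, water temperature, and discharge water. The sketch below uses hypothetical column groupings, for example two water level columns for the upstream and downstream reservoirs, and arbitrary coefficients. It shows only the bookkeeping, not fitted values.

```python
import numpy as np

def factor_components(X, coef, a0, groups):
    """Split y_hat = a0 + X @ coef into one additive component per factor group.

    groups maps a component name to the column indices of X that belong to it;
    every column is assumed to appear in exactly one group.
    """
    parts = {name: X[:, idx] @ coef[idx] for name, idx in groups.items()}
    y_hat = a0 + sum(parts.values())
    return parts, y_hat

# Hypothetical grouping of seven monthly predictors into the six components.
groups = {
    "runoff": [0],
    "sediment_discharge": [1],
    "rainfall": [2],
    "water_level": [3, 4],      # upstream and downstream reservoir levels
    "water_temperature": [5],
    "discharge_water": [6],
}
rng = np.random.default_rng(6)
X = rng.normal(size=(12, 7))
coef = np.array([0.4, 0.9, 0.1, 0.3, -0.2, 0.05, 0.35])   # illustrative values only
parts, y_hat = factor_components(X, coef, a0=1.2, groups=groups)
```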
Rewriting the above equation according to (8) in the form of formula (9) gives the PLSR model of sediment storage in the Long-Liu section of the Yellow River under the cascade development mode:

$$\hat{y}^* = a_0 + \sum_{i=1}^{k} a_i x_i^*, \qquad (9)$$

where $\hat{y}^*$ is the fitted sediment storage, $x_i^*$ are the factors affecting sediment storage, $a_i$ is the partial least-squares regression coefficient corresponding to each $x_i^*$, $a_0$ is a constant, and $k$ is the number of sediment storage factors. The factors should be selected according to the actual conditions. For different time periods and for cascade reservoirs at different locations, the selected model factors may differ.
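Coefficients may first be obtained on standardized data, as in the derivation of $E_0$ and $F_0$. They can then be mapped back to original units, which is where a nonzero constant $a_0$ appears. Starting from $\hat{y} = \bar{y} + s_y \sum_j \alpha_j (x_j - \bar{x}_j)/s_j$, one gets $a_j = s_y \alpha_j / s_j$ and $a_0 = \bar{y} - \sum_j a_j \bar{x}_j$. The sketch below assumes hypothetical standardized coefficients `alpha` and checks that both forms give the same predictions.

```python
import numpy as np

def to_original_units(alpha, x_mean, x_std, y_mean, y_std):
    """Convert coefficients on standardized data to a0 + sum(a_j * x_j) form."""
    a = y_std * alpha / x_std
    a0 = y_mean - a @ x_mean
    return a0, a

rng = np.random.default_rng(4)
X = rng.normal(loc=[50.0, 5.0, 12.0], scale=[10.0, 1.0, 3.0], size=(20, 3))
x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
y_mean, y_std = 2.5, 0.4                    # hypothetical response statistics
alpha = np.array([0.6, -0.2, 0.1])          # hypothetical standardized coefficients
a0, a = to_original_units(alpha, x_mean, x_std, y_mean, y_std)

y_from_standardized = y_mean + y_std * ((X - x_mean) / x_std) @ alpha
y_from_original = a0 + X @ a
```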

Fitting Results and Comparative Analysis of PLSR Model.
Water storage in Longyangxia Reservoir influences the downstream channel and sediment transport into the reservoirs. This paper therefore takes the Longyangxia and Liujiaxia reservoirs as examples for building the PLSR fitting model of sediment storage. The monthly average sediment storage from January 1982 to December 1987 is selected as the fitting period. This period includes the initial impoundment of Longyangxia Reservoir, during which the water level gradually rose to the normal water level. It therefore allows a better analysis of how water storage in Longyangxia Reservoir affects the sediment flux into the downstream reservoir. The PLSR fitting models for sediment storage of the Longyangxia and Liujiaxia reservoirs are shown in Figures 2 and 3. The rainfall factor set and the transport rate factor set are taken as an example for the multicorrelation analysis of factors. The first components $t_1(i)$ and $u_1(i)$ are extracted and plotted in the $t_1$-$u_1$ plane, as shown in Figure 4. The components $t_1(i)$ and $u_1(i)$ vary in a roughly linear relationship. This suggests multiple correlation between the rainfall factor and the transport rate factor that influence reservoir sediment. In this case, conventional least-squares regression can hardly separate the rainfall, temperature, and other factor components. Using PLSR for the modeling analysis is therefore more rational and effective.
Figures 2 and 3 show the following.
(1) The PLSR model of sediment storage for the cascade reservoirs in the Long-Liu section largely overcomes the multicorrelation among the model factors and can effectively separate the various influencing factors. The model also fits with high precision. It therefore better reflects the actual changes in sediment storage and explains the influence of each factor on the amount of sediment stored in the reservoir.
(2) The fitting sequence consists of the monthly average sediment storage of the Longyangxia and Liujiaxia reservoirs. Sediment storage is large in the first 60 months. In the last 60 months (corresponding to 1987), the sediment inflow to the reservoirs decreases because of the impoundment of Longyangxia Reservoir. This indicates that impoundment by the leading reservoir has an important influence on river sediment transport under cascade development.
(3) Sediment storage is larger in the flood season each year and decreases markedly in the dry season. It shows a pronounced cyclical pattern, consistent with the general rules of sediment yield and transport.
(4) The main factor affecting the amount of sediment storage is the change in sediment transport rate, whose model factor has a large influence on the maximum sediment storage. The runoff factor, the water level factor, and the discharge water factor also have relatively large influences. The rainfall and temperature factors have relatively small influences.
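The paper does not state how the relative influence of each factor was ranked. One common PLSR measure that could support such a ranking is the variable importance in projection (VIP). The sketch below computes VIP for a single dependent variable. The factor names and synthetic data are hypothetical, and the output should not be read as reproducing the ranking reported above.

A VIP above 1 is often taken as a sign of an influential variable. The squared VIP scores always sum to the number of factors.

```python
import numpy as np

def pls1_vip(X, y, m):
    """VIP scores of the columns of X for a single-response PLS model with m components."""
    E = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    f = (y - y.mean()) / y.std(ddof=1)
    W, ss = [], []
    for _ in range(m):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        r = f @ t / (t @ t)
        ss.append(r ** 2 * (t @ t))               # part of y's sum of squares explained by t_h
        E = E - np.outer(t, E.T @ t / (t @ t))    # deflate X
        f = f - r * t                             # deflate y
        W.append(w)
    W, ss = np.array(W), np.array(ss)             # W: (m, p), ss: (m,)
    p = X.shape[1]
    return np.sqrt(p * (ss @ W ** 2) / ss.sum())

names = ["transport_rate", "runoff", "water_level", "discharge", "rainfall", "temperature"]
rng = np.random.default_rng(5)
X = rng.normal(size=(72, 6))
y = X @ np.array([2.0, 0.8, 0.6, 0.6, 0.1, 0.1]) + 0.3 * rng.normal(size=72)
vip = pls1_vip(X, y, m=2)
for name, score in sorted(zip(names, vip), key=lambda pair: -pair[1]):
    print(f"{name:15s} {score:.2f}")
```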

Conclusions
Under the cascade development mode, river flow is influenced by the impoundment and discharge of each reservoir in the cascade. Reservoir sedimentation therefore faces greater risks, and the relation between erosion and deposition becomes more complicated. For this reason, this paper first systematically analyzed the sediment deposition risk of cascade development on the upper Yellow River. On that basis, it studied the model and fitting method for the sediment flux into reservoirs under the cascade development mode.
Least-squares regression has difficulty identifying and eliminating the influence of multiple correlation among independent factors, so the partial least-squares regression (PLSR) method was adopted for the modeling analysis. Model fitting was combined with non-model-style data content analysis to achieve regression modeling, data structure simplification, and analysis of the multiple correlations among factors. The accuracy of the model was ensured through cross-validation.
The modeling analysis of sediment flux into the cascade reservoirs of Long-Liu section upstream of the Yellow River indicates that partial least-squares regression can effectively overcome the multiple correlation influence among factors, and the isolated factor variables have better ability to explain the physical cause of measured results.