An application of Genetic Programming (an evolutionary computational tool), with and without data standardization, is presented with the aim of modeling the behavior of river water temperature in terms of easily measured meteorological variables, exploring their explanatory power, and emphasizing the utility of standardizing variables to reduce the effect of those with large variance. Recorded water temperature data from the Ebro River, Spain, are used as a case study; the developed model performs better when the data are standardized, as reflected in a reduction of the mean square error. Finally, the models obtained in this document were applied to estimate the water temperature in 2004, in order to provide evidence of their applicability for forecasting purposes.
Evolutionary computing has been widely used in hydraulics and hydrology, for example, in the studies of Savic et al. [
Models that represent water temperature behavior year after year are of interest because an abnormal increase in this parameter has numerous consequences for the physical and chemical properties of water, with corresponding effects on aquatic life. Some models have been applied to maximum water temperatures by means of nonlinear relationships between air temperature and water temperature (Caissie et al. [
The field data used in this study were taken from the lower Ebro River, Spain. This river has a basin of 85 000 km² and an average annual inflow of 17 000 hm³ in natural regime. Three dams are located along the river (Figure
Station locations at the Ebro River, Spain.
Evolutionary algorithms, also known as Evolutionary Computation (EC), the optimization tool used in this work, use computational models of evolutionary processes in the design and implementation of computer-based problem solving. A general definition and classification of these evolutionary techniques is given in Bäck [
In a similar way to that of natural evolution and heredity, these algorithms work on a population of
A typical genetic programming algorithm consists of a set of functions, which can involve arithmetic operators
A mathematical expression represented hierarchically by its parse tree.
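As an illustration of the parse-tree representation, the following minimal sketch stores an expression as nested tuples and evaluates it recursively. The tuple encoding, the variable names `x1` and `x2`, the example expression, and the protected division are assumptions made for this sketch; they are not taken from the implementation used in the study.

```python
import operator


def protected_div(a, b):
    # Protected division is a common GP convention and an assumption here:
    # it returns 1.0 instead of failing when the divisor is (nearly) zero.
    return a / b if abs(b) > 1e-12 else 1.0


# Function set restricted to four arithmetic operators.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}


def evaluate(tree, variables):
    """Evaluate a parse tree.

    A tree is either a terminal (a variable name or a numeric constant)
    or a tuple (operator_symbol, left_subtree, right_subtree).
    """
    if isinstance(tree, tuple):
        symbol, left, right = tree
        return FUNCTIONS[symbol](evaluate(left, variables),
                                 evaluate(right, variables))
    if isinstance(tree, str):
        return variables[tree]
    return float(tree)


def count_nodes(tree):
    """Count operators and terminals, e.g. to enforce a maximum tree size."""
    if isinstance(tree, tuple):
        return 1 + count_nodes(tree[1]) + count_nodes(tree[2])
    return 1


# Hypothetical expression: (x1 + 2.0) * (x2 - x1)
example_tree = ("*", ("+", "x1", 2.0), ("-", "x2", "x1"))
example_value = evaluate(example_tree, {"x1": 1.0, "x2": 4.0})
example_size = count_nodes(example_tree)
```

In this encoding, crossover and mutation amount to swapping or replacing sub-tuples, and `count_nodes` gives the quantity that a node limit would bound.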
River water is in constant heat exchange with its surroundings: the atmosphere and the river bed. This process may reach equilibrium, so that the heat lost by the water equals the heat absorbed. Normally, in a natural state, the water temperature increases along the river as the altitude decreases. Superimposed on this spatial variation is a double temporal variation: within a river reach, temperature follows both a daily and an annual cycle.
In the study performed by Val [
The heat stored by a water mass as it moves along a river stretch of length
On the other hand, through an analysis of the historical behavior of the time variation of water temperature during consecutive years, similar results were observed, both in the cyclical variation and in the tendency to increase or decrease. This leads to an expectation of a correlation between the temperature variation in year
The background described above led to the choice of the measured variables used in the prediction model.
Additionally, when physical variables are fitted by means of genetic programming, questions may arise about the dimensional consistency of the resulting expressions. This issue can be addressed by allowing the constants of the calculated model to carry dimensions. Analyzing the model terms can also suggest new physical interpretations of the related variables.
In this document, for simplicity, only four arithmetic operators were considered:
Twelve independent variables, one dependent variable, and a vector of real constants were selected. Thus, in the nonstandardized case the terminal set is
Tests were made with hourly, daily, and weekly averaged water temperatures.
In the standardized case, all of the above variables are dimensionless.
The objective function considered in this problem was defined as the minimization of the mean square error between calculated and measured data:
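A standard way to write this objective is shown next. The symbols are assumptions chosen for illustration: $T_i^{\mathrm{calc}}$ and $T_i^{\mathrm{meas}}$ denote the calculated and measured water temperatures for record $i$, and $n$ is the number of records.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( T_i^{\mathrm{calc}} - T_i^{\mathrm{meas}} \right)^2
```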
The genetic programming algorithm was implemented in MATLAB (The MathWorks [
The variables were standardized by subtracting the mean and dividing by the standard deviation:
Variables with large variances tend to have a larger effect on the resulting model than those with small variances, which may be just as relevant. Standardization is therefore advantageous: every standardized variable has zero mean and unit variance.
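The standardization step, and the inverse step needed to express model output in physical units again, can be sketched as follows. This is a minimal example under stated assumptions: the function names and the sample values are hypothetical, and the use of the sample standard deviation (rather than the population one) is a choice made here, not something stated in the study.

```python
import statistics


def standardize(values):
    """Return the z-scores of values plus the mean and standard deviation used."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / std for v in values], mean, std


def inverse_standardize(z_values, mean, std):
    """Map standardized values back to the original units."""
    return [z * std + mean for z in z_values]


# Made-up values with very different variances.
air_temperature = [12.0, 15.0, 21.0, 18.0, 9.0]
solar_radiation = [150.0, 420.0, 810.0, 600.0, 90.0]

z_air, mean_air, std_air = standardize(air_temperature)
z_rad, mean_rad, std_rad = standardize(solar_radiation)

# After standardization both series have zero mean and unit variance,
# so neither dominates the fit merely because of its scale.
restored_air = inverse_standardize(z_air, mean_air, std_air)
```

The mean and standard deviation returned by `standardize` have to be kept, since the same pair is needed to convert a standardized model output back to a temperature.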
Meteorological and water temperature data were taken in gauging stations installed in the Ebro River. Data consist of 10-minute averages of measurements taken every minute. Water temperatures were measured just downstream of the hydroelectric power plant of Flix. The meteorological variables were measured at the measuring station located on the Ribarroja Dam. The hourly average was calculated for all the variables and taken as input data: relative humidity (
The first experiment was carried out with the original data, and the second one with the standardized ones. GP parameter settings for both experiments are shown in Table
GP parameter settings.
Parameter | Value |
---|---|
Number of individuals | 250 |
Maximum number of nodes | 30 |
Maximum number of generations | 3000 |
Cross probability | 0.9 |
Mutation probability | 0.09 |
Node mutation probability | 0.03 |
In order to validate the applicability of the method, the correlation coefficient between measured and calculated data was obtained:
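One common choice for this statistic is the Pearson correlation coefficient; that the study used exactly this form is an assumption. The sketch below computes it together with the mean square error for two short, made-up series. The function names and the data are hypothetical.

```python
import math


def mean_square_error(measured, calculated):
    """Average of the squared differences between calculated and measured data."""
    n = len(measured)
    return sum((c - m) ** 2 for m, c in zip(measured, calculated)) / n


def correlation_coefficient(measured, calculated):
    """Pearson correlation coefficient between two equally long series."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_c = sum(calculated) / n
    cov = sum((m - mean_m) * (c - mean_c) for m, c in zip(measured, calculated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    return cov / math.sqrt(var_m * var_c)


# Made-up water temperatures, used only to exercise the functions.
measured = [10.0, 12.0, 15.0, 19.0, 22.0]
calculated = [11.0, 12.5, 14.0, 18.0, 23.5]

mse = mean_square_error(measured, calculated)
r = correlation_coefficient(measured, calculated)
```

The two statistics are complementary: the correlation coefficient is insensitive to a constant offset or scale error in the calculated series, whereas the mean square error penalizes both.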
The genetic programming algorithm tends to produce relatively simple models. The equations produced in both experiments were
In order to get
For forecasting purposes, mean and standard deviations were estimated as follows:
Mean square error values.
Equation | MSE, °C² |
---|---|
Without standardization | 9.4336 |
With standardization | 6.4763 |
The mean (
Statistics of residuals.
Equation | Mean | Standard deviation |
---|---|---|
Without standardization | | 3.0716 |
With standardization | 0.0230 | 2.5449 |
Water temperature values and residuals, experiment without standardization (hourly average values).
Water temperature values and residuals, experiment with standardization (hourly average values).
In Figures
Comparison between measured and estimated data (
Comparison between measured and estimated data (
In this case, the equations obtained without and with standardization were as follows:
By applying an inverse standardization process,
In (
Mean square error values. Daily average data.
Equation | MSE, °C² |
---|---|
Without standardization | 8.279 |
With standardization | 4.978 |
Statistics of residuals. Daily average data.
Equation | Mean | Standard deviation |
---|---|---|
Without standardization | 0.0762 | 2.8802 |
With standardization | 0.0213 | 2.2342 |
Water temperature variations against time and the obtained differences are plotted on Figures
Water temperature values and residuals, experiment without standardization (daily average values).
Water temperature values and residuals, experiment with standardization (daily average values).
Comparison between measured and estimated data (
Comparison between measured and estimated data (
Convergence of a genetic programming run.
In this last experiment, the equations obtained without and with standardization were
Equation (
Mean square errors and statistics of residuals appear in Tables
Mean square error values. Weekly average data.
Equation | MSE, °C² |
---|---|
Without standardization | 4.538 |
With standardization | 2.176 |
Statistics of the residuals. Weekly average data.
Equation | Mean | Standard deviation |
---|---|---|
Without standardization | 0.0186 | 2.1509 |
With standardization | 0.0239 | 1.4892 |
Water temperature values and residuals, experiment without standardization (weekly average values).
Water temperature values and residuals, experiment with standardization (weekly average values).
Comparison between measured and estimated data (
Comparison between measured and estimated data (
The results of the weekly analysis show a 52% reduction in the mean square error and a reduction of about 31% in the standard deviation of residuals when the data are standardized beforehand. The correlation coefficient is also close to one.
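The percentages quoted for the weekly data can be reproduced from the tabulated values, and the same arithmetic can be applied to the hourly and daily experiments. The short sketch below does this, reading the first row of each table as the model without standardization and the second as the model with standardization; the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return 100.0 * (before - after) / before


# (without standardization, with standardization), taken from the tables.
mse = {
    "hourly": (9.4336, 6.4763),
    "daily": (8.279, 4.978),
    "weekly": (4.538, 2.176),
}
residual_std = {
    "hourly": (3.0716, 2.5449),
    "daily": (2.8802, 2.2342),
    "weekly": (2.1509, 1.4892),
}

mse_reduction = {k: percent_reduction(*v) for k, v in mse.items()}
std_reduction = {k: percent_reduction(*v) for k, v in residual_std.items()}

for period in ("hourly", "daily", "weekly"):
    print(f"{period}: MSE -{mse_reduction[period]:.1f}%, "
          f"residual std -{std_reduction[period]:.1f}%")
```

With these inputs, the reduction in mean square error is about 31% for hourly data, about 40% for daily data, and about 52% for weekly data, so the benefit of standardization grows as the averaging interval lengthens.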
The daily climatic data measured from 2002 to 2003 at the Flix and Miravet stations were used to estimate water temperature in the year 2004, in order to check the accuracy of the models given by (
A mean square error of 49.549 and a correlation coefficient of 0.6744 were obtained by applying (
Measured and predicted water temperature for 2004, model without standardization (daily average values).
Comparison between measured and estimated data (
Measured and predicted water temperature for 2004, model with standardization (daily average values).
Comparison between measured and estimated data (
Both equations produced very large water temperature residuals on some days of the estimated year.
Different models that estimate water temperature in the Ebro River in a given year were obtained. They take into account climatic variables measured in the same year and also their variability in the two previous years, in an attempt to explain the possible evolution of the water temperature behavior.
The GP algorithm was run with hourly, daily, and weekly averages of the measured data, with and without standardization, in order to analyze how the resulting equations change with the averaging interval of the input data.
Intrinsically, measured data of water temperature and climatic variables have more oscillations in hourly average data than in daily or weekly average data. Particularly, in the experiment using hourly data, the GP algorithm amplifies the water temperature oscillations, probably because in the actual physical process, the oscillations of the climatic variables are filtered. Nevertheless, by using standardized data, mean square errors were lower than those without standardization, and a lower dispersion in data could be obtained. Similar situations occurred in the case of daily data.
According to the mean square errors, the standard deviation of residuals, and the correlation coefficient, the GP algorithm produced models better able to follow the behavior of water temperature when weekly data were considered. This was particularly true for the models obtained with standardized data.
Therefore equations such as those obtained herein can be used as a first approximation to predict changes in water temperature when changes occur in climatic variables such as air temperature, wind speed, relative humidity, and solar radiation, all of which affect the water temperature as well as the physical and chemical water conditions, including the flora and fauna of a river.
When the models for daily data were applied to another year, lower correlations between measured and predicted data were obtained, particularly with the model that does not use standardized variables.
According to these results, incorporating the standardization process can improve water temperature models generated by means of genetic programming.
The results also show the limits of the models developed herein: the models produced oscillations in the water temperature that do not correspond to the measured data, and the forecasting results for 2004 are only fair. This is probably because some variables involved in the physical phenomena are omitted and the filtering that occurs in nature is not reproduced. Nevertheless, these results are considered useful as a first-order explanation of a complex process. Future work is suggested to compare the proposed method with physically based ones.