Energy consumption forecasting (ECF) is an important policy issue in today’s economies. Accurate ECF greatly benefits electric utilities, since both negative and positive forecast errors lead to increased operating costs. This paper proposes a semantic-based genetic programming framework to address the ECF problem. In particular, we propose a system that finds (quasi)perfect solutions with high probability and that generates models able to produce near-optimal predictions on unseen data as well. The framework blends a recently developed version of genetic programming that integrates semantic genetic operators with a local search method. The main idea in combining semantic genetic programming and a local searcher is to couple the exploration ability of the former with the exploitation ability of the latter. Experimental results confirm the suitability of the proposed method for predicting energy consumption. In particular, the system produces a lower error than the existing state-of-the-art techniques used on the same dataset. More importantly, this case study shows that including a local searcher in the geometric semantic genetic programming system can speed up the search process and can result in fitter models that produce accurate forecasts on unseen data as well.
As reported in [
Given the high complexity of ECF, the iterative process of stepwise improvement of solutions that characterizes many CI methods often gets stuck in local optima, stagnating the search for better solutions.
Several existing forecasting methods cannot deal with nonlinearity and other difficulties in modelling time series. Hence, the performance of these models on test data is not as good as that achievable on the training data.
The objective of the work presented in this paper is to fill this gap by developing a groundbreaking Genetic Programming (GP) system that (i) finds (quasi)perfect solutions with high probability (no error on training data) and (ii) generates models able to produce near-optimal predictions on unseen data (test instances) as well. GP is one of the most successful existing CI methods. In recent years, it has obtained excellent results on a large number of complex real-life applications [
The paper is organized as follows. Section
This section describes the components of the proposed computational intelligence system designed for the ECF problem. In particular, Section
Despite the large number of human-competitive results achieved by GP [
In this section, we briefly describe the definition of the geometric semantic operators (GSOs) proposed by Moraglio and coauthors [
Given two parent functions
Given a parent (as in [
The interested reader is referred to [
In this work, we integrate a local search (LS) strategy within GSGP. In particular, we include a local searcher within the GSM mutation operator, since previous works have shown that GSGP achieves its best performance using only mutation [
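As an illustration, the sketch below works directly on semantics vectors (the outputs of a program on the training instances). It assumes the usual geometric semantic mutation form T + ms * (R1 - R2) and, for the local search variant, assumes that the searcher fits the coefficients of a0 + a1 * T + a2 * (R1 - R2) by ordinary least squares. Both forms, and all function names, are assumptions made for this example rather than a transcription of the system used in the experiments.

```python
def gs_mutation(parent, r1, r2, ms):
    """Geometric semantic mutation on semantics vectors: T + ms * (R1 - R2)."""
    return [t + ms * (a - b) for t, a, b in zip(parent, r1, r2)]


def solve_linear(a, b):
    """Solve a small dense system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gs_mutation_ls(parent, r1, r2, target):
    """Assumed local-search variant: fit a0 + a1*T + a2*(R1 - R2) to the
    training targets by least squares (normal equations)."""
    diff = [a - b for a, b in zip(r1, r2)]
    cols = [[1.0] * len(target), list(parent), diff]
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(x * y for x, y in zip(ci, target)) for ci in cols]
    a0, a1, a2 = solve_linear(gram, rhs)
    child = [a0 + a1 * t + a2 * d for t, d in zip(parent, diff)]
    return child, (a0, a1, a2)
```

In this reading, plain mutation perturbs the parent by a fixed step ms, whereas the local search variant chooses the step (and a rescaling of the parent) that minimizes the squared training error for the sampled random programs.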
This section describes the data, the experimental settings, and the obtained results for the ECF problem.
Historical energy consumption data and weather information in Italy in the years between
Four different GP systems were compared: standard GP (STGP) that uses the standard syntax-based genetic operators also considered in [
Regarding the four GP systems, all the runs used populations of
For all the considered techniques we studied the obtained performance over two different measures of error. In particular, these two measures are the mean absolute error (MAE) and the mean square error (MSE). The definition of these error measures is as follows:
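Under the standard definitions, MAE is the average of |t_i - y_i| and MSE is the average of (t_i - y_i)^2 over the N instances, where t_i is the target value and y_i the predicted value (the symbol names are assumptions of this example). A minimal Python sketch of both measures:

```python
def mean_absolute_error(targets, predictions):
    """MAE: average absolute difference between targets and predictions."""
    n = len(targets)
    return sum(abs(t - y) for t, y in zip(targets, predictions)) / n


def mean_squared_error(targets, predictions):
    """MSE: average squared difference between targets and predictions."""
    n = len(targets)
    return sum((t - y) ** 2 for t, y in zip(targets, predictions)) / n
```

Because MSE squares each residual, it penalizes large individual errors more heavily than MAE does, which is one reason to report both.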
In the next section, the experimental results are reported using curves of the median error on the training and test sets. In particular, at each generation the best individual in the population (i.e., the one with the smallest training error) was chosen, and its error on the training and test sets was stored. The reported curves contain the median of these values over all runs at each generation. The median was preferred over the mean in the reported plots because of its higher robustness to outliers. The results discussed in the next section have been obtained using the GSGP implementation freely available at
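The reporting procedure described above can be sketched as follows. The data layout (one list of per-generation errors per run, and individuals given as pairs of training and test error) and the function names are assumptions made for this example.

```python
from statistics import median


def best_individual_errors(population):
    """population: list of (train_error, test_error) pairs.
    Returns the pair of the individual with the smallest training error."""
    return min(population, key=lambda individual: individual[0])


def median_curve(runs):
    """runs[r][g] is the stored error of run r at generation g.
    Returns, for each generation, the median over all runs."""
    return [median(errors_at_generation) for errors_at_generation in zip(*runs)]
```

For instance, with three runs whose errors at some generation are 1, 2 and 100, the median is 2 while the mean is about 34.3, which shows why the median is less sensitive to a single anomalous run.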
Figure
Training (plots (a) and (c)) and test (plots (b) and (d)) error for MAE (plots (a) and (b)) and MSE (plots (c) and (d)). The plots show the median over
To analyze the statistical significance of these results, a set of tests has been performed on the median errors. In particular, we want to assess whether the final results (generation
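Measure | System | Training: LSGP | Training: STGP | Training: HYBRID | Test: LSGP | Test: STGP | Test: HYBRID
MAE | GSGP |  |  |  |  |  |
MAE | HYBRID |  |  | - |  |  | -
MAE | STGP |  | - | - |  | - | -
MSE | GSGP |  |  |  |  |  |
MSE | HYBRID |  |  | - |  |  | -
MSE | STGP |  | - | - |  | - | -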
To conclude the analysis of the experimental results, Table
Execution time (seconds) of the considered GP systems. Median and standard deviation calculated over
Statistic | MAE: GSGP | MAE: HYBRID | MAE: LSGP | MAE: STGP | MSE: GSGP | MSE: HYBRID | MSE: LSGP | MSE: STGP
Execution time (s) | 2.22 | 2.32 | 2.35 | 3.94 | 2.19 | 2.2 | 2.36 | 4.2
Standard dev. | 0.12 | 0.11 | 0.12 | 0.83 | 0.14 | 0.13 | 0.13 | 0.78
Electricity consumption forecasting (ECF) is important for the power industry, especially in the context of the ongoing deregulation of the electricity market. Proper demand forecasts help market participants to maximize their profits and/or reduce their possible losses by preparing an appropriate bidding strategy. In this study, the ECF problem has been considered and a computational intelligence technique has been proposed to address it. The proposed system is based on a variant of the Genetic Programming (GP) algorithm. In particular, the GP system uses genetic operators that, unlike the standard genetic operators used in GP, work on the semantics of the solutions. While the use of semantic methods in GP has been successfully investigated and applied, several important problems that prevent these methods from being used efficiently remain open. In particular, the GP system that uses the semantic operators (GSGP) requires a large number of generations to converge towards optimal solutions and, moreover, by producing a (quasi) optimal fit of the training data, it often generates solutions that do not generalize well to unseen instances. In this light, the contribution of this work consists in integrating the GSGP framework with a local search optimizer. The use of a local searcher is motivated by the need to improve the convergence speed of GSGP towards good-quality solutions. Thus, by combining the exploration ability of GSGP with the exploitation ability of a local search method, we expect to find good-quality solutions in a small number of generations, hence avoiding the excessive specialization of a model on the training instances and, consequently, overfitting.
To validate the proposed system, called LSGP, an extensive experimental analysis has been performed, considering electricity consumption data that cover the period 1999–2010 in the Italian territory. We tested three semantic-based GP systems (GSGP, HYBRID, and LSGP) and a standard, syntax-based, GP system (STGP). GSGP is the GP system that uses the geometric semantic mutation defined in [
To summarize, the paper provides two contributions: from the point of view of energy consumption forecasting, a system that is able to outperform the existing state-of-the-art technique has been defined; from the machine learning perspective, this case study has shown that including a local searcher in the geometric semantic GP system can speed up the convergence of the search process without a corresponding overfitting of the training data. We hope that this contribution will pave the way for further research on these topics.
Genetic Programming (GP) is one of the techniques that belong to the computational intelligence research area called evolutionary computation. GP consists in the automated learning of computer programs by means of a process inspired by biological evolution [
The GP algorithm.
Hence, the recipe for solving a problem with GP is as follows.
Choose a representation space in which candidate solutions can be specified. This consists of deciding on the primitives of the programming language that will be used to construct programs. A program is built up from a terminal set (the variables in the problem and, optionally, a set of constant values) and a function set (the primitive operators).
Design the fitness criteria for evaluating the quality of a solution. This involves the execution of a candidate solution on a suite of test cases, reminiscent of the process of black-box testing. In the case of supervised learning, a distance-based function is employed to quantify the divergence of a candidate’s behavior from the desired one.
Design a parent selection and replacement policy. Central to every evolutionary algorithm is the concept of fitness-driven selection, which exerts an evolutionary pressure towards promising areas of the program space. The replacement policy determines the way in which newly created offspring programs replace their parents in the population.
Design a variation mechanism for generating offspring from a parent or a set of parents. Standard GP uses two main variation operators: crossover and mutation. Crossover recombines parts of the structure of two individuals, whereas mutation stochastically alters a portion of the structure of an individual.
After a random initialization of a population of computer programs, an iterative application of selection, variation, and replacement is employed to improve the quality of the programs by stepwise refinement.
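The recipe above can be sketched as a small, self-contained loop. Everything specific in this example is an assumption chosen for brevity: the primitive set (add, sub, mul, one variable and two constants), tournament selection, generational replacement with one elite, mutation as the only variation operator, and all parameter values.

```python
import math
import random

FUNCTIONS = {"add": lambda a, b: a + b,
             "sub": lambda a, b: a - b,
             "mul": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]  # illustrative terminal set


def random_tree(depth, rng):
    """Grow a random program as nested tuples (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant


def fitness(tree, cases):
    """Distance-based fitness: mean absolute error over the fitness cases."""
    error = sum(abs(evaluate(tree, x) - y) for x, y in cases) / len(cases)
    return error if math.isfinite(error) else float("inf")


def mutate(tree, rng, max_depth=2):
    """Replace a randomly reached subtree with a new random tree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(max_depth, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, max_depth), right)
    return (op, left, mutate(right, rng, max_depth))


def tournament(population, scores, rng, size=3):
    """Fitness-driven selection: best of a few randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), size)
    return population[min(contenders, key=lambda i: scores[i])]


def evolve(cases, pop_size=50, generations=20, seed=0):
    rng = random.Random(seed)
    population = [random_tree(3, rng) for _ in range(pop_size)]
    history = []  # best training error at each generation
    for _ in range(generations):
        scores = [fitness(t, cases) for t in population]
        best = population[min(range(pop_size), key=lambda i: scores[i])]
        history.append(min(scores))
        offspring = [mutate(tournament(population, scores, rng), rng)
                     for _ in range(pop_size - 1)]
        population = [best] + offspring  # replacement with one elite
    return min(population, key=lambda t: fitness(t, cases)), history
```

A call such as `evolve([(x, x * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)])` returns the best program found and the per-generation best error; because the elite is always copied into the next generation, that error never increases.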
In order to transform a population, GP uses genetic operators. Considering the common tree representation of GP individuals, the standard genetic operators (crossover and mutation) act on the structure of the trees that represent the candidate solutions. In other terms, standard genetic operators act on the syntax of the programs. In this paper, we used genetic operators that, unlike the standard ones, are able to act at the semantic level. The definition of semantics used in this work is the one also proposed in [
However, to understand the differences between the genetic operators used in this work and the ones used in the standard GP algorithm, the latter are briefly described. The standard crossover operator is traditionally used to combine the genetic material of two parents by swapping a part of one parent with a part of the other. In more detail, after choosing two individuals based on their fitness, the crossover operator performs the following operations: (1) it selects a random subtree in each parent and (2) it swaps the selected subtrees between the two parents (the resulting individuals are the children). The mutation operator introduces random changes in the structures of the individuals in the population. The best-known mutation operator, called subtree mutation, works as follows: (1) it randomly selects a point in a tree, (2) it removes whatever is currently at the selected point and whatever is below the selected point, and (3) it inserts a randomly generated tree at that point. This operation is controlled by a parameter that specifies the maximum size (usually measured in terms of tree depth) for the newly created subtree that is to be inserted.
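The two standard operators described above can be sketched on trees stored as nested tuples (op, left, right). The representation, the primitive names, and the uniform choice of the crossover and mutation points are assumptions of this example.

```python
import random


def random_tree(max_depth, rng):
    """Random tree whose depth does not exceed max_depth."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", "y", 1.0])
    return (rng.choice(["add", "sub", "mul"]),
            random_tree(max_depth - 1, rng),
            random_tree(max_depth - 1, rng))


def paths(tree, prefix=()):
    """Paths (tuples of child indices) to every node; () is the root."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i in (1, 2):
            found.extend(paths(tree[i], prefix + (i,)))
    return found


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Copy of tree with the node at path, and all below it, replaced."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)


def subtree_crossover(p1, p2, rng):
    """(1) Select a random subtree in each parent, (2) swap them."""
    path1 = rng.choice(paths(p1))
    path2 = rng.choice(paths(p2))
    child1 = replace_subtree(p1, path1, get_subtree(p2, path2))
    child2 = replace_subtree(p2, path2, get_subtree(p1, path1))
    return child1, child2


def subtree_mutation(parent, rng, max_depth=2):
    """Select a point, drop what is at and below it, insert a random tree."""
    path = rng.choice(paths(parent))
    return replace_subtree(parent, path, random_tree(max_depth, rng))


def size(tree):
    return len(paths(tree))
```

Since crossover only exchanges material between the two parents, the total number of nodes in the two children equals that of the two parents, whereas subtree mutation can change the size of an individual, bounded by the `max_depth` parameter of the inserted subtree.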
The authors declare that there is no conflict of interest regarding the publication of this paper.
The authors acknowledge Project MassGP (PTDC/EEICTP/2975/2012), FCT, Portugal.