Application of Fuzzy and Conventional Forecasting Techniques to Predict Energy Consumption in Buildings

This paper presents the implementation and analysis of two approaches (fuzzy and conventional). Using hourly data from buildings at the University of Granada, we have examined their electricity demand and designed a model to predict energy consumption. Our proposal was conducted with the aid of time series techniques as well as the combination of artificial neural networks and clustering algorithms. Both approaches proved to be suitable for energy modelling, although nonfuzzy models provided more variability and less robustness than fuzzy ones. Despite the relatively small difference between fuzzy and nonfuzzy estimates, the results reported in this study show that the fuzzy solution may be useful to enhance and enrich energy predictions.


Introduction
Electricity is one of the most important inventions science has conferred on humanity. It has become an essential aspect of people's work and day-to-day life. Today, electricity is a pivotal source of energy, and its growing usage worldwide is bringing new challenges in the energy efficiency field. Besides, the recent advances in technology are providing us with a vast amount of information that is not easily treatable owing to its heterogeneity [1]. Nonetheless, even though our society tends towards more sustainable development, it is not a trivial task to create tools for the accurate treatment and monitoring of energy [2,3]. Thus, being prepared for the future may be key to addressing energy waste and achieving adequate energy efficiency in our buildings.
Energy consumption forecasting is a critical feature for environmentally friendly buildings as well as an effective strategy to decrease energy consumption and its associated gas emissions, along with the resulting economic impact [4,5]. As a result, energy demand forecasting has been addressed in many scenarios so far [3,6-9]. Since this problem is historical in nature, i.e., we always attempt to find dependencies between past values to model future ones, most authors employ time-series techniques to handle it. In addition, a variant that is gaining in popularity is the combination of fuzzy logic with time-series methods [2,10-16].
What makes fuzzy time series suitable for these sorts of problems is their capability to improve the comprehensibility of the models. That is to say, fuzzy logic provides us with a description of the data in linguistic variables, i.e., in words instead of numbers. Nonetheless, the definition of the fuzzy sets requires introducing a new parameter, the number of intervals, and therefore more complexity to the solution. Originally, some authors stated that the best number of intervals was seven, with a constant length [2,3]. However, researchers soon realized that this choice affected the predictive capacity of the model [17].
Today, these intervals are defined mainly by optimization algorithms. In [18], the authors presented a fuzzy solution for big data using time-series techniques. The authors implemented an automatic clustering algorithm to group the historical data into intervals of different lengths. Their models outperformed classical methods and brought several advantages: ease of implementation, accuracy, and interpretability. Additionally, some authors have incorporated neural networks into their proposals, further enhancing their estimates. Bas et al. [19] employed an artificial neural network to determine fuzzy relationships and improve forecasting accuracy. Cagcag Yolcu and Lam [20] contributed a robust approach to fuzzy time series by analysing how the prediction performance of the models is affected by outliers. Their results were more accurate and robust. It is important to note that the authors of the previous two studies predicted directly using neural networks. This approach will be followed in our study in order to compare our results with the reference series.
Many other approaches have been suggested in the literature to solve energy demand prediction [17], starting with the traditional ARIMA approach [21-24] and moving towards more advanced deep learning techniques [25-31]. Other research works include that of Pérez-Chacón et al. [32], with their algorithm to predict big data time series based on a pattern sequence method. They used data from Uruguay's electricity demand to validate their solution. An interesting COVID-inspired algorithm was proposed by Martínez-Álvarez et al. [33], who used electricity load time series as an application case, showing outstanding performance. Other hybrid algorithms have been proposed by Ruiz et al. [34], in which the authors combine a memetic algorithm with recurrent neural networks to predict energy consumption in public buildings. An ensemble of several predictive models was introduced in [35], where three machine learning algorithms were used (decision trees, gradient boosted trees, and random forest). Their combination successfully outperformed other big data time-series solutions.
We can also mention some applications of fuzzy logic to time series [10, 17-20, 36, 37]. Research combining deep learning and fuzzy time series is proposed in [18]. The authors implemented an LSTM-based forecasting model to predict energy consumption. They utilised the fuzzy rules to create preliminary estimates that were used to support the final prediction and to modify the learning process. In [17], we can find another hybrid forecasting system based on fuzzy time series for wind speed estimation. Here, the fuzzy time-series method was used to optimise a multiobjective algorithm that balances the conflict between accuracy and stability. Other similar studies can be cited, like the convolutional neural networks of Sadaei et al. [36] or the integration of heuristics for renewable energy forecasting in [38]. Nonetheless, all the authors agree on the same point: the accuracy of models using fuzzy logic alone is not good enough, and they use it only as a complement to their solutions.
Following the philosophy of the previous studies, the present work aims to implement and compare several forecasting techniques to predict energy consumption in public buildings, more specifically, at the University of Granada. What motivates our study is the lack of approaches that exploit fuzzy logic to predict energy consumption. The main advantage of applying fuzzy logic is that it provides us with extra information which can be interpreted as justifying changes in consumption. It may give rules such as «on Monday mornings, the consumption in summer is low»; conventional approaches, by contrast, cannot provide such information. Nonetheless, these fuzzy-oriented models have the drawback of being less accurate than the numerical ones. This fact is somewhat understandable, as the fuzzy rules attempt to aggregate information. Therefore, our first goal is to implement a solution using fuzzy systems and optimise it so as to achieve comparable precision. To do so, we propose a hybrid method of fuzzy time series and clustering algorithms. The rest of the paper is structured as follows. Section 2 describes the proposed methodology, the data used and their treatment, along with the methods applied. Section 3 introduces the experiments conducted. Section 4 gathers the main results obtained in this study. And Section 5 summarises the conclusions attained.

Methodology
This section is pivotal to properly understand the rest of the study along with the decisions made throughout this research. As a general overview of the steps followed, we can examine Figure 1. First, we obtained the energy information directly from the meters, which may present missing values, errors, or other problems. After that, we cleaned, processed, and selected the data we planned to use for comparison. Since our first aim was to compare the fuzzy implementations with the conventional ones, we carried out the nonfuzzy predictions and tested several parameters and granularities so as to get a preview of the estimates' behaviour. Then, we selected the appropriate granularity, established several considerations, and applied the fuzzy approach. Both fuzzy and nonfuzzy results were stored for a final comparison and analysis. Finally, we drew some interesting conclusions from our results altogether.

Dataset.
First of all, prior to defining the bulk of our methodology, it is important to know the data we are training with. The time series in hand belong to meters from the University of Granada. The measurements were taken on an hourly basis and are expressed in kWh. Most of the series comprise 5 years of data, from 2013 to 2018. Some are slightly smaller owing to the installation date of the metering systems and therefore the start of the sample collection.
Our data consist of a set of meters, each device measuring several buildings. Accordingly, the time series and building distribution are shown in Table 1. For privacy reasons, the names of the buildings are not shown.
We selected the series according to the quality of the data, utilising 10 series out of the 20 available. What motivated this decision was the quality of the data and the comparability of the series. In other words, the sheer number of missing values was excessively high in some cases, and as an objective criterion, we discarded the series with the highest proportion of missing data. Besides, this also prevented results from being biased by the imputation of missing values.
In many cases, empty registers can be found in the middle of the series, with more than 6 months of empty records, with some exceptions such as the one displayed in Figure 2. The figure represents the energy consumption of one of the buildings during the morning period. It can easily be seen that the motive for our choice is that, from 2013 to 2017, we do not have any relevant information except for a little more than a year. It is rather straightforward to see how the outlier frontier was not adjusted properly, as most of the consumption lies outside it. Also, the lack of correctly recorded registers places the mean and median well below the actual value. Opposite to this, we have Figure 3.
As we mentioned before, our data are collected hourly, and consequently we split the series into three time slots: morning, between 8 h and 15 h; evening, from 16 h to 22 h; and night, starting at 22 h until 6 h. Those intervals were chosen according to the morning and afternoon shifts, taking class schedules into account. An example has already been presented in Figure 2 with the morning series of the building. Another representative series we selected is depicted in Figure 3. In this case, in contrast to Figure 2, we can appreciate a more normalised behaviour in consumption. Most of the records were properly collected, with some exceptions (see the first month of 2017).
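The split above can be sketched as a small helper. The boundary hours are taken from the text; the handling of the hour not covered by any slot (7 h) is our own assumption:

```python
def time_slot(hour):
    """Map an hour of day (0-23) to the slots used in the study:
    morning 8h-15h, evening 16h-22h, night 22h-6h.
    Hours not covered by the text (e.g., 7h) default to night here."""
    if 8 <= hour <= 15:
        return "morning"
    if 16 <= hour < 22:
        return "evening"
    return "night"
```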

Data Preparation.
After describing the data we are using, we must mention how we preprocessed them for our models. The data used in our workflow are made of 10 energy time series on an hourly basis from meters at the University of Granada. These data were then transformed as described below.

In doing this, we created a 5-element decomposition of each series and thus a set of 50 data series. We applied a fuzzy treatment to each of these new series with the clustering algorithms, and they were used in the forecasting stage as well. Furthermore, each clustering method generates a fuzzy time series. In other words, 150 fuzzy series are added to the 50 previously created. As a result, we obtained 200 time series for each forecasting method.
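The bookkeeping behind these figures can be verified in a few lines (a sketch of the counts stated in the text):

```python
n_series = 10        # meters selected for the study
decomposition = 5    # 5-element decomposition of each series
clusterers = 3       # kM, DBSCAN, and hierarchical clustering

numerical = n_series * decomposition    # 50 nonfuzzy series
fuzzy = numerical * clusterers          # 150 fuzzy series
total = numerical + fuzzy               # 200 series per forecasting method
```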

Methods Applied.
The current section introduces all the methods we applied to the aforementioned data. In this work, we implemented two well-known machine learning techniques, multilayer perceptron (MLP) and long short-term memory (LSTM) neural networks, along with three clustering methods, namely, k-Means (kM), density-based spatial clustering of applications with noise (DBSCAN, DB in short), and hierarchical clustering (HC). The first two methods were used as predictors in both fuzzy and conventional approaches, and the other three were used to define the number of intervals for the fuzzy sets using the triangular membership function.
MLP is a type of feed-forward neural network. Its structure is mainly composed of three kinds of layers: input, hidden, and output. The input layer takes the data to be processed, and the hidden layers receive the results from the previous layer and pass the information on to the output layer. Each layer has several neurons that use an activation function so as to move the computations onward through weighted connections between neurons. In this case, the information is processed in a forward fashion, i.e., from input to output [3].
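Since the study implements the MLP with Scikit-Learn, a minimal regression sketch could look as follows; the layer sizes, the synthetic data, and all hyperparameters are illustrative, not the ones used in the study:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 24))                     # e.g. 24 lagged consumption values
y = X.sum(axis=1) + rng.normal(0, 0.1, 200)   # toy target

# Two hidden layers; information flows forward from input to output.
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                   max_iter=2000, random_state=0)
mlp.fit(X, y)
pred = mlp.predict(X[:5])                     # one estimate per input row
```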
On the other hand, we adopted the LSTM. In contrast to the MLP, where the data flow only from input to output, LSTM has recurrent connections allowing it to carry information across time steps. In this way, feedback from previous states is provided [17]. The choice to employ this model was based not only on its wide range of successful applications [26,34,39,40] but also on its great flexibility and adaptability when solving problems.
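The recurrence can be illustrated with a single LSTM cell step written in plain NumPy (a didactic sketch, not the Keras implementation used in the study):

```python
import numpy as np

def lstm_step(x, h, c, W, b):
    """One LSTM cell step: the gates depend on the current input x and the
    previous hidden state h, while the cell state c carries information
    across time steps (the recurrent connection)."""
    z = W @ np.concatenate([h, x]) + b
    n = h.size
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    f, i, o = sigmoid(z[:n]), sigmoid(z[n:2*n]), sigmoid(z[2*n:3*n])
    g = np.tanh(z[3*n:])                  # candidate cell update
    c_new = f * c + i * g                 # forget old memory, add new
    h_new = o * np.tanh(c_new)            # exposed hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
hidden, n_in = 4, 3
W = rng.normal(scale=0.5, size=(4 * hidden, hidden + n_in))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, n_in)):      # unroll over 5 time steps
    h, c = lstm_step(x, h, c, W, b)
```

Because the output gate and tanh both saturate below 1, the hidden state stays bounded regardless of the input magnitude.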
The first algorithm we implemented for procuring the fuzzy variables was kM [18]. This is one of the most popular techniques, based on dividing the data into k groups. An iterative procedure randomly assigns k points as centres (or centroids). Then, each sample is linked to the group that minimises the error. Once all the points have been associated with a cluster, new centroids are recalculated as the mean of the member points. This process is repeated until certain stop criteria are fulfilled.
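With Scikit-Learn, which the study uses, clustering a consumption series with kM reduces to a few lines; the synthetic data and the choice k = 3 are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy hourly consumption with three clear levels (low/medium/high).
consumption = np.concatenate([rng.normal(10, 2, 100),
                              rng.normal(50, 4, 100),
                              rng.normal(90, 4, 100)]).reshape(-1, 1)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(consumption)
centres = sorted(km.cluster_centers_.ravel())   # one centre per fuzzy set
```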
The second clustering algorithm is DB [36], which, as its name indicates, is based on the detection of communities via density features. The definition of a community has two parameters: the number of instances and ϵ, the latter being the distance needed to be considered in the vicinity of a cluster. This feature is rather interesting, as points far enough from all the centres are considered outliers.
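A sketch with Scikit-Learn's DBSCAN shows both parameters and the outlier behaviour just described; the eps and min_samples values and the synthetic data are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(10, 1, 100),     # dense low-consumption group
                       rng.normal(50, 1, 100),     # dense high-consumption group
                       [200.0]]).reshape(-1, 1)    # isolated extreme reading

db = DBSCAN(eps=2.0, min_samples=5).fit(data)      # eps is the vicinity distance
labels = db.labels_                                # label -1 marks outliers
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```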
Lastly, we implemented agglomerative or hierarchical clustering (HC) [38]. In this technique, a distance metric is defined first of all; this metric may operate on a cluster-to-cluster or cluster-to-sample basis. Initially, all the points are isolated, and they progressively merge with the closest cluster/sample, creating new groups. Conceptually, it builds a tree-based structure where the leaves represent the initial data and each branch a specific cluster.
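The same idea in Scikit-Learn, using AgglomerativeClustering; the linkage choice (here, average) and the toy data are illustrative:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(10, 1, 50),
                       rng.normal(90, 1, 50)]).reshape(-1, 1)

# Each point starts isolated; the closest pairs merge until 2 groups remain.
hc = AgglomerativeClustering(n_clusters=2, linkage="average").fit(data)
labels = hc.labels_
```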
For the fuzzification process, we implemented a Sugeno-based inference system. It has multiple inputs and just one output. In our implementation of the inference system, the ANNs act as a black box, generating functions that relate inputs to outputs instead of directly using the interpretable rules. Besides, the ANNs allow us to avoid an explicit defuzzification phase, which leads us to a more flexible approach. We utilised a triangular membership function like the one displayed in Figure 4. This function allows us to obtain the membership degree to each of the fuzzy sets. To this end, we need to know the limits and the central point that define each fuzzy set. For instance, the limits of the red set are 0 and 60 kWh and the central point is 20.
Then, having a time series t1,n of n values and a set C of m centroids, we can define a triangular function for each c ∈ C. In doing so, we obtain a matrix tm,n with the membership degrees for each centroid.
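Using the red set from the text (feet at 0 and 60 kWh, peak at 20) as one example, this fuzzification of a series into a membership matrix can be sketched as follows; the two additional sets are hypothetical:

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b (degree 1 at x=b)."""
    x = np.asarray(x, dtype=float)
    rising = np.where(b > a, (x - a) / (b - a), 1.0)
    falling = np.where(c > b, (c - x) / (c - b), 1.0)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

series = np.array([0.0, 20.0, 40.0, 60.0, 80.0])   # t_{1,n} with n = 5 values
sets = [(0, 20, 60),      # the red set described in the text
        (20, 60, 100),    # hypothetical medium set
        (60, 100, 140)]   # hypothetical high set
degrees = np.stack([triangular(series, *s) for s in sets])   # t_{m,n} matrix
```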
We should carefully choose the number of lags l (previous values used for prediction) of the time series, as this turns the original series into l · m columns to be predicted, thereby increasing the complexity of the problem.
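The lagged design matrix can be built as below; the helper name and the lag count are illustrative:

```python
import numpy as np

def lagged_design(series, lags):
    """Turn a univariate series into rows of `lags` previous values plus the
    target; with m membership degrees per value, each row grows to l*m columns."""
    X = np.stack([series[i:len(series) - lags + i] for i in range(lags)], axis=1)
    y = series[lags:]
    return X, y

t = np.arange(10.0)
X, y = lagged_design(t, lags=3)   # X[0] = [0, 1, 2] predicts y[0] = 3
```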
These functions are defined by the centroid (e.g., k-Means) or by the mean of each cluster (e.g., DBSCAN). Points outside the minimum and maximum of the distribution were classified as outliers. The edges were built as follows. The cluster with the smallest values starts at 0 or at the minimum value of the cluster minus one standard deviation. The cluster with the biggest values ends at its biggest value plus one standard deviation. In doing this, we prevented the appearance of undetected outliers whose membership degrees were 0 despite having a very high (or low) consumption. The outliers amongst clusters were eliminated by maximising the Silhouette coefficient. The outliers were processed in the same way in all the implemented algorithms: they were detected by enlarging the edges of the distribution function or when they fell into two different membership functions.
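Our reading of this edge construction can be sketched as follows; the helper name and the toy clusters are hypothetical:

```python
import numpy as np

def extended_edges(clusters):
    """Extend the outermost feet by one standard deviation (and clamp the lower
    edge at 0) so extreme readings still obtain a nonzero membership degree."""
    lo = min(clusters, key=np.mean)     # cluster with the smallest values
    hi = max(clusters, key=np.mean)     # cluster with the biggest values
    lower = max(0.0, float(np.min(lo) - np.std(lo)))
    upper = float(np.max(hi) + np.std(hi))
    return lower, upper

clusters = [np.array([5.0, 8.0, 10.0]), np.array([40.0, 50.0, 60.0])]
lower, upper = extended_edges(clusters)
```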
Finally, the representation of the proposed workflow is depicted in Figure 5. First, the data are treated by the clustering algorithm in the fuzzification process, providing the membership degrees; then the new information is given to the forecasting model so as to obtain the predicted value after defuzzification. The proposed forecasting models allow us to decide which output will be provided, the fuzzy representation or the numerical one, as shown in the figure. Bearing in mind that the defuzzification phase is not as important as the fuzzification part in the implementation of our Sugeno-type inference system, this stage is implicitly included in the ANNs, which translate the membership values into a direct prediction.

International Journal of Intelligent Systems

Experiments
The experiments conducted in this study are detailed in the following paragraphs to provide a proper understanding of the technologies adopted and the design of the trials. This research project has been entirely developed in Python 3.7. We employed four widely used libraries, namely, Scikit-Learn, Keras, Pandas, and Bokeh. Scikit-Learn was used to implement MLP and the clustering algorithms, DB, HC, and kM. LSTM was implemented in Keras. Pandas was used to manipulate our data and process the information. Finally, Bokeh was utilised as a means for depicting our results.
We designed our experiments in several stages, as can be seen in Figure 6. The first is data collection and clustering analysis based on the Silhouette coefficient. Second, we predicted the entire series and analysed the nonfuzzy approaches and their results; both daily and hourly granularities were examined. Third, we studied the performance of the fuzzy-based solutions. Finally, we contrasted both fuzzy and conventional solutions.

Results
The results obtained from the prediction of the 200 time series generated are presented in this section and discussed in the next one. They will be analysed according to the forecasting method used. For simplicity, we will introduce a summary of the most remarkable outcomes; otherwise, it would be difficult for the reader to follow the discussion.
As we can deduce from Figure 5, the first experiments we designed concerned the fuzzification part. We had to set a number of clusters so as to determine how many membership functions we would use in the next stage. We selected the Silhouette coefficient and picked the best values accordingly. Interestingly, in nearly all of the tests, the best number of clusters did not surpass three. We skip these results, considering them of no great significance, as they do not provide much further information to the reader. We will discuss this fact in the next section. Having said that, let us introduce the most remarkable experiments we designed.
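The cluster-count selection described here can be sketched with Scikit-Learn's silhouette_score; the candidate range and the synthetic data are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(10, 1, 60),
                       rng.normal(50, 1, 60),
                       rng.normal(90, 1, 60)]).reshape(-1, 1)

scores = {}
for k in range(2, 7):                              # candidate cluster counts
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)     # higher is better
best_k = max(scores, key=scores.get)
```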
Since we intend to compare two approaches, a fuzzy-based solution and the numerical one (or the nonfuzzy approach), we present Table 2 first, which gathers the results for the entire series on both a daily and an hourly basis.
In view of the fact that the fuzzy table with all the models and experiments would be too large, we are going to split its content into different tables and highlight the most significant results. Table 3 compares the clustering methods we applied when using the MLP neural network. We implemented three clustering algorithms, DBSCAN (DB), hierarchical clustering (HC), and k-Means (kM).
As an example of the prediction performed by one of the models, we can see Figure 7. We do not plot all the series together (the two models, LSTM and MLP) in order to make it easy to discern what is happening. Although some of them follow the trend of the original time series, others cannot fit it that well. In this piece of the series, it is interesting to see that, although it is hard to tell from the figure whether DB is the algorithm with the highest error, mathematically it is.
Similarly, we conducted the same experiments but using LSTM.We can see the metrics obtained in Table 4.
Table 5 presents the results obtained after fitting our models to the hourly time series. We again compared the three clustering algorithms, DB, HC, and kM.
As we mentioned before, we do not want to make it difficult for readers to follow our study; for this reason, we will skip some results and go straight to the comparison table between fuzzy and nonfuzzy approaches. Since MLP has shown better error, Table 6 gathers the results with this model and compares the best results of the fuzzy solution versus the one without fuzzy logic applied.
As an illustrative example of the predictions, Figure 8 shows both conventional (blue series) and fuzzy solutions in the same graph. This chart presents the evolution of consumption on a daily basis.

Discussion
This section brings together the most significant results we achieved from our experiments. As we mentioned in the second paragraph of the previous section, the number of clusters was chosen by maximizing the Silhouette coefficient and selecting the most appropriate value accordingly. Each cluster stands for a membership function of its own. Most of the experiments we carried out provided three as the optimal value, although there were some experiments in which the number of clusters exceeded three according to our selection criteria. Taking this into consideration, the tags in our univariate time series define a linguistic variable. For the three-cluster case, we would have «low», «medium», and «high» consumption. Similarly, given five clusters, we tagged them as «very low», «low», «medium», «high», and «very high» consumption. Since the linguistic variables given to each electricity meter were based on that meter's own consumption, and the use of each building differs, it was reasonable to assume that the variables would behave uniformly in our scenario, their names alone not adding much information.
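The tagging scheme can be expressed as a small helper; the fallback for cluster counts other than three and five is our own extrapolation:

```python
def linguistic_labels(n_clusters):
    """Assign consumption labels to clusters ordered from lowest to highest."""
    if n_clusters == 3:
        return ["low", "medium", "high"]
    if n_clusters == 5:
        return ["very low", "low", "medium", "high", "very high"]
    # Other counts are not covered by the text; fall back to generic tags.
    return [f"level {i + 1}" for i in range(n_clusters)]
```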
Then, we have to comment on some important aspects of Table 2. The table puts together the results of our models for both daily and hourly granularities. From the table, we can draw some conclusions. The first one is that virtually all the experiments revealed MLP as the best predictor in both granularities. Nevertheless, according to the MAE, only 6 out of 20 LSTM runs turned out to be better than MLP in the daily set of experiments and 9 in the other. However, we can see how the R2 metric provides a much worse value in those cases. Additionally, the difference between the RMSE and the MAE in such tests turned out to be higher than MLP's, which hints that MLP is producing more robust estimates. It is remarkable that both models enhanced their predictions in terms of the R2 metric when working with the hourly data. This can be a result of the amount of data available in that case. Taking into account the errors made by the predictive models at both granularities, the next step is to implement the clustering methods. Thus, Tables 3 and 4 compare each clustering algorithm using MLP and LSTM, respectively. From Table 3, we might discard the use of DB, as it did not attain the best adjustment in any case. Furthermore, DB came third according to all the metrics we used. Regarding HC and kM, they achieved very similar results. However, HC outperforms kM in 60% of the cases for the RMSE and R2 metrics, and only for the MAE did they obtain 50% each. What stands out in Table 4 is that DB maintains its last position and only in M6 outperformed its rivals, which had a quite low R2. In this case, 5 out of 10 tests give HC the best performance by RMSE and R2, and the remaining 4 went to the kM algorithm. The most interesting aspect of Table 4 is that 7 out of 10 of kM's MAE values were better than HC's, of which 4 repeated their behaviour from the MLP model (meters 2, 5, 7, and 8; see Table 3).
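The robustness hint drawn from the RMSE-MAE gap can be reproduced on toy numbers (the values below are illustrative, not taken from the study):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([10.0, 12.0, 15.0, 11.0, 30.0])   # last value: atypical day
y_pred = np.array([11.0, 12.0, 14.0, 11.0, 20.0])   # one large miss

mae = mean_absolute_error(y_true, y_pred)           # 2.4: averages the misses
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # ~4.52: squares amplify it
r2 = r2_score(y_true, y_pred)
# The wide RMSE-MAE gap signals occasional large errors, i.e. less robustness.
```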
Next, we have to pay attention to Table 5, which contains the metrics once our predictive models were trained using the hourly time series. A closer inspection of the table shows a significant improvement in the DB algorithm. In addition, all the metrics have been enhanced in all the cases. In fact, DB now has the best scores in 8 out of 10 predictions according to RMSE and R2 and 5 out of 10 with the MAE metric. The rest are distributed as follows: HC obtained the best RMSE and R2 in M3 and the best MAE in M5; kM was the best for M2 according to the three metrics, and it ranked best in M7, M8, and M10 for the MAE. These results reveal not only DB as the best clustering algorithm but also a general improvement in the accuracy of the predictions and better robustness when using more data. Bear in mind that this table uses the hourly time series, so the models have more information to work with, which must be the reason for this improvement. The results in Table 6 indicate that both approaches turned out to be very accurate, as virtually all of them achieved over 0.9 in R2. Further analysis of these predictions shows that the fuzzy approach gives better results in 50% of the cases by MAE and in 4 cases by RMSE and R2. Nonetheless, it is interesting to note that only slight differences can be found between them. The biggest difference in R2 was 0.006 for M2, the smallest 0.00004 for M4, and the average 0.002.
Finally, as an illustrative example of the predictions, we may take a look at Figure 8. As we can see, there is a weekly pattern. The two lowest points correspond to the weekends, and therefore it is normal that they were lower than the rest. In any case, what should attract our attention is that certain patterns in this series are well predicted and the models follow the evolution adequately. However, we intentionally show two weeks (the last part of the graph) where the consumption is slightly different. On this occasion, the models struggled to follow the trend at first, but they promptly adjusted the prediction. The most significant aspect of Figure 8 we would like to highlight is that the fuzzy-oriented solutions managed to fit the curves properly, which is essential for our solution.
Having discussed the results obtained, it can be said that both approaches are suitable for solving this problem, as each of them received the best score in half of the cases.

Conclusions
In the course of this research, we implemented and compared several fuzzy and nonfuzzy time-series methods. Apart from the effectiveness shown in previous studies, fuzzy time series offer extra utility, as they provide us not only with a single value but with an interval in which the objective value is expected to lie. This can be translated into a piece of enriching information, as there are many scenarios where absolute certainties are uncommon but trends and approximations have higher importance, as is the case of energy efficiency. The methods implemented have shown great flexibility during the whole process, from the creation of the fuzzy sets to the final prediction. One of the limitations we should mention is that this flexibility turns into a higher computational cost, as several tests must be done prior to deciding some of the parameters. The second drawback is that the fuzzy representation of the series leads to a loss in terms of accuracy, and we should balance whether this loss pays off or not depending on the problem to solve. Although the fuzzy models did not achieve the lowest error in all the cases, they managed to maintain higher robustness compared with the numerical ones. Furthermore, the cases in which the fuzzy approaches ranked below presented only a slight difference in terms of error, i.e., both predictions were, in all cases, quite similar.
Finally, we would like to highlight the overall performance of the DBSCAN algorithm against the other clustering algorithms. It is a surprising finding that this method achieved such good results in spite of not being mentioned in the literature by previous authors.
The studied models have demonstrated the capability to predict energy consumption at the University of Granada and its buildings. However, as future work, we propose a method for optimal parameter search along with the modification of and experimentation with different membership functions. Additionally, we propose the use of density clustering and mean-shift algorithms, which may offer good results in these kinds of problems.

DB: Density-based spatial clustering of applications with noise
HC: Hierarchical clustering
kM: k-Means
LSTM: Long short-term memory
MAE: Mean absolute error
MLP: Multilayer perceptron
RMSE: Root mean square error.

Data Availability
The data used in this study are not available for public access due to privacy policies. We are committed to safeguarding the privacy and confidentiality of the organizations involved in this research. Sharing the data, even in an anonymized form, would risk violating these privacy commitments and could compromise the trust and confidentiality of our data sources.

Figure 1: General scheme of the proposed methodology.

Figure 2: Example of a discarded time series, representing the mean (red), median (green), and the outlier region (dotted orange) of the consumption (blue).

Figure 4: Triangular membership function for the three energy consumption levels (high, medium, and low).

Figure 5: General overview of the hybrid fuzzy logic and clustering model.

Figure 7: Example of the hourly prediction performed by the MLP for one of the meters. Comparison of the different clustering methods along with the conventional model.

Figure 8: Illustrative example of a daily prediction using MLP. Comparison between fuzzy and nonfuzzy models.

Table 1: Description of the buildings in the database.

Table 2: Comparison between the LSTM and MLP models without the fuzzy approach, using the entire series. Bold values represent the best value between the two rows of each model.

Table 3: Comparison of the different clustering techniques for the fuzzy-oriented approach with MLP on a daily basis. Bold values represent the best value among the three rows of each method. DB is DBSCAN, HC is hierarchical clustering, and kM is k-Means.

Table 4: Comparison of the different clustering techniques for the fuzzy-oriented approach with LSTM on a daily basis. Bold values represent the best value among the three rows of each method. DB is DBSCAN, HC is hierarchical clustering, and kM is k-Means.

Table 5: Comparison of the different clustering techniques for the fuzzy-oriented approach with MLP on an hourly basis. Bold values represent the best value among the three rows of each method. DB is DBSCAN, HC is hierarchical clustering, and kM is k-Means.

Table 6: Comparison between fuzzy and nonfuzzy approaches. Bold values represent the best approach for each metric; that is, for each column (metric), we compare the two approaches (fuzzy and nonfuzzy) and highlight the best in bold.