Interpretable Short-Term Electrical Load Forecasting Scheme Using Cubist

Daily peak load forecasting (DPLF) and total daily load forecasting (TDLF) are essential for optimal power system operation over horizons from one day to one week ahead. This study develops a Cubist-based incremental learning model to perform accurate and interpretable DPLF and TDLF. To this end, we employ time-series cross-validation to effectively reflect recent electrical load trends and patterns when constructing the model. We also analyze variable importance to identify the most crucial factors in the Cubist model. In the experiments, we used two publicly available building datasets and three educational building cluster datasets. The results showed that the proposed model yielded averages of 7.77 and 10.06 for the mean absolute percentage error and the coefficient of variation of the root mean square error, respectively. We also confirmed that temperature and holiday information are significant external factors and that the electrical loads of one day and one week earlier are significant internal factors.


Introduction
The increase in urban population has caused various problems, such as resource depletion, traffic congestion, and environmental pollution [1]. For the effective operation of complex urban systems, many municipalities and governments have been trying to transform existing cities into smart cities [2]. The smart city concept aims to improve the efficiency and security of urban infrastructure as well as the quality of life of its citizens [3]. For instance, smart cities can reduce greenhouse gas (GHG) emissions by reducing traffic congestion and energy consumption and by introducing technologies such as electric vehicles, energy storage systems (ESSs), and renewable energy (RE) [4]. In particular, improving the energy efficiency of buildings with ESSs and RE is an important issue for cities because building energy consumption is one of the main sources of GHG emissions [5]. Most smart city systems use recent technologies, such as the Internet of Things and big data, to implement various city services [6]. For instance, the energy efficiency of existing buildings can be improved using building energy management systems (BEMSs) [7].
A BEMS is a computer-aided tool that improves energy efficiency between the grid operator and consumers through bidirectional interaction [7,8]. It collects and analyzes data related to electrical energy consumption to establish operational plans for building energy use [8]. On the demand side, a BEMS provides ways for customers to reduce or shift peak energy consumption and trade the remaining energy [9]. On the supply side, it serves as a tool for optimal allocation of RE, ESS, and demand response in electric utility grids [10]. Here, short-term load forecasting (STLF) has been widely used to determine the amount of power necessary for the reliable operation of electric utility grids from the next hour to the next week [9,11]. It includes daily peak load forecasting (DPLF), total daily load forecasting (TDLF), hourly electrical load forecasting, and very short-term load forecasting (VSTLF) [12]. DPLF and TDLF are used to predict from one day to one week later as an essential procedure for unit commitment, energy trading, security analysis, and tight scheduling of outages and fuel supplies in power systems [13,14].
Accurate STLF is challenging because typical electrical energy consumption exhibits various patterns accompanied by uncertainties due to unforeseeable external factors [15]. Furthermore, when predicting electrical loads, it is necessary to adequately consider complex correlations between historical and current data [9]. Many studies have been conducted to achieve accurate STLF based on machine learning (ML) methods because they can properly extract implicit nonlinear relationships between input and output variables [16][17][18][19]. Table 1 briefly summarizes recent STLF models based on ML methods. For instance, Lee and Han [20] developed a day-ahead (DA) DPLF model using multiple linear regression (MLR). They collected daily peak electrical load data in South Korea from 2012 to 2016 through the Korea Power Exchange (KPX) and constructed an MLR model using the days of the week, seasons, average temperature, and historical loads from one day to one year before the prediction time point. The model achieved better predictive performance than the forecasting model of KPX, an extreme learning machine, and an autoregressive moving average (ARMA) model. Fan et al. [21] proposed an STLF model based on weighted K-nearest neighbors (KNN), called W-KNN. When constructing the KNN model, they used the inverse of the Euclidean distance to assign appropriate weights to each data point after selecting a value for K. In experiments using electrical load data from the National Electricity Market (Australia), their model outperformed KNN, ARMA, and an artificial neural network (ANN) in prediction performance.
On the other hand, Dong et al. [22] developed an hourly electrical load forecasting model based on bootstrap aggregating (Bagging) for Chinese special days. They collected three years of hourly electrical load data in Qingdao, China. To construct their STLF model, they defined a holiday variable, with 2 representing statutory holidays, 1 representing working days, and 0 representing bridging days, proximity days, and weekends. Their Bagging model showed better prediction performance than an ANN and a Bagging model trained with a holiday variable that used 1 for working days and 0 for holidays. Sun et al. [23] proposed an hourly electrical load forecasting model based on ANNs. They first collected hourly electrical load data of Tai'an City, Shandong Province, China, from 2016 to 2018. They then generated various input variables, considering the timestamp, temperature, and historical electricity consumption, to construct their STLF model. Their model showed a mean absolute percentage error (MAPE) of 4.11 and a mean absolute error of 33.88. Truong et al. [24] developed an additive ANN (AANN) model to predict building electrical energy consumption. They collected hourly electrical load data for one year from a residential building with an RE system and configured five input variables, namely, days of the week, hours of the day, insolation, temperature, and historical electricity consumption, to train the AANN model. Their concept was based on the gradient boosting machine (GBM). Unlike GBM, which generally uses decision trees (DTs) as weak learners, the AANN trains iteratively by estimating an ANN as one weak learner and passing the remaining residuals back to other ANNs. The AANN model outperformed MLR, DT, ANN, and support vector regression (SVR) in predictive performance.
Recently, various hybrid STLF models based on two or more ML techniques have been proposed to achieve better prediction performance than single ML-based STLF models. For instance, Fan et al. [25] proposed an SVR-based STLF model, namely, EMD-SVR-PSO-AR-GARCH, by hybridizing SVR with empirical mode decomposition (EMD), particle swarm optimization (PSO), and autoregressive-generalized autoregressive conditional heteroscedasticity (AR-GARCH). They first decomposed the original electrical load data sequence into conventional intrinsic mode functions (IMFs) and residuals using EMD. Then, they used PSO-optimized SVR and AR-GARCH to fit IMF1 and the other IMF components and residuals, respectively. Finally, they obtained the final prediction value by integrating and fitting the prediction values of the models. Their model outperformed ARMA, AR-GARCH, SVR, and others in predictive performance. Zhang et al. [26] developed a hybrid STLF model, called VMD-SR-SVRCBCS, using variational mode decomposition (VMD) and self-recurrent (SR)-SVR, optimizing the parameters through the cuckoo bird search process of the cuckoo search algorithm (CBCS). They performed data preprocessing using VMD to obtain more accurate IMFs and applied SR-SVRCBCS to model each decomposed IMF for more accurate forecasting results. Here, the SR mechanism, inspired by a combination of Jordan's and Elman's recurrent neural networks, was used to learn more recurrent information from the hidden layer values at the previous time point concerning the outcomes of the SVRCBCS models. Their model outperformed single ML models such as autoregressive integrated moving average (ARIMA), seasonal ARIMA, ANN, and SVR in predictive performance.
However, because the decision-making process inside most of the models mentioned above is opaque (i.e., a black box), forecasting results derived from these models cannot be entirely accepted and utilized. Therefore, their interpretation has been another challenging task [32]. Recently, interpretability methods in ML have attracted increasing attention for constructing accurate and interpretable forecasting models [33,34]. Here, "interpretable" means that the user can understand how the model employs the input variables to make predictions [33]. The variable importance (also referred to as feature importance) measure is the basis for enhancing the interpretability of a model [34]. Several studies have used the variable importance measure to identify the most significant factors in STLF. Bouktif et al. [27] proposed a long short-term memory (LSTM) network-based STLF model using Metropolitan France's electrical load data. They used a genetic algorithm to optimize the hyperparameter tuning of the LSTM model. They also confirmed, using the variable importance of extra trees (ETs), that the historical load was the most significant input variable for model training. Their LSTM model outperformed ridge regression, KNN, random forest (RF), GBM, ANN, and ET. Wang et al. [28] proposed an hourly electrical load forecasting model based on RF. They configured ten input variables representing weather, occupancy, and time-related data and constructed a forecasting model for each academic semester. They predicted the electrical load of two educational buildings and identified the most influential variables, which differed by semester. The RF model performed better than the DT and SVR models. Ruiz-Abellón et al. [29] developed four 48-hour-ahead electrical load forecasting models using hourly electrical load data collected from the Technical University of Cartagena in Spain. They used Bagging, RF, conditional RF (CRF), and XGBoost (XGB) as tree-based ensemble methods to construct the forecasting models.
They also described the essential input variables for training each forecasting model through variable importance. In the experiments, the XGB model achieved better prediction performance than Bagging, RF, and CRF. Abbasi et al. [30] proposed a 30-minute-interval electrical load forecasting model based on XGB. They used variable importance to extract input variables from the historical load during a week and confirmed that the historical loads close to the prediction time point and from a week before the prediction time point had high importance for the model construction. Subsequently, they constructed the XGB-based forecasting model using the extracted input variables. The XGB model exhibited a MAPE of 10% and an accuracy of 97%. Zhang et al. [31] proposed a TDLF model based on K-means clustering and categorical boosting (CatBoost). They collected total daily electrical load data from Yangzhong High-Tech Zone in China and executed K-means clustering to group industrial customers with similar load features into the same cluster. They then constructed a CatBoost-based TDLF model for each cluster and showed the variable importance of the CatBoost model. Their proposed model outperformed ARMA, LSTM, GBM, and CatBoost in predictive performance.
Despite these efforts, there are still limitations. For instance, when a forecasting model based on these methods is constructed using conventional time-series forecasting model evaluation [35], we can determine the essential input variables by analyzing the variable importance in the model constructed from the training dataset. However, conventional time-series forecasting model evaluation performs unsatisfactorily when there is a significant gap between the training set period and the test set period [14]. This makes it difficult to ensure confidence in the decision-making process of the model. In addition, most of the studies were conducted mainly on STLF with high time resolution, such as hourly or subhourly intervals, for DA energy planning.
Therefore, further studies are needed on quantitative DPLF and TDLF for DA and week-ahead (WA) energy planning. Furthermore, although the Cubist regression model has shown excellent performance in time-series forecasting [36][37][38], its use for STLF has rarely been reported [38].
To address these issues, this study proposes a robust interpretable short-term electrical load forecasting model for accurate DPLF and TDLF. To this end, we collected five electrical load datasets from two commercial buildings and three educational building clusters. We configured various input variables that highly correlate with DPLF and TDLF. Then, we constructed STLF models using Cubist and time-series cross-validation (TSCV) to achieve high accuracy. The main contributions of this paper are as follows: (1) We use electrical load data from two public buildings and three building clusters with different purposes to predict building-level electrical energy consumption, which has more complex patterns. (2) We configure different input variables to predict both DPLF and TDLF by considering both DA and WA energy planning for effective BEMS operation. (3) When a forecasting model is constructed for WA forecasting, we obtain multistep-ahead forecasts for all prediction time points (from one day to seven days ahead) to compensate for uncertainty. (4) To address the data shortage problem and reflect current electrical load trends and patterns, we use a cross-validation procedure based on a rolling forecasting origin. The remainder of this paper is organized as follows. Section 2 describes the data preprocessing used to configure different input variables for each forecasting type and presents the process of constructing the proposed model. In Section 3, we analyze the experimental results to demonstrate the superiority of the proposed model. Finally, we conclude our study and present directions for future research in Section 4.

Materials and Methods
In this section, we describe in detail the data preprocessing and forecasting model construction. Figure 1 illustrates the overall flowchart of the proposed method.

Data Collection and Preprocessing.
In this section, we first present a data resampling process to construct data suitable for forecasting purposes. Then, we describe the process of configuring the input variables according to the purpose (i.e., DA and WA forecasting) using timestamps, weather, and historical load information. Figure 2 illustrates the framework of the data preprocessing for constructing a Cubist model.
We collected electrical load datasets from five different types of buildings or building clusters, as summarized in Table 2. We first collected publicly available datasets from two buildings in Richland, WA, USA [39,40]. The datasets consist of three years' worth of information, including the hourly electrical load, the hourly outdoor temperature, and the corresponding timestamps. We filled in the missing values in both datasets (caused by daylight saving time in North America) using linear interpolation. Because each dataset contains 24 rows of hourly electrical load data per day, we calculated the maximum load value and the sum of all load values for each day for DPLF and TDLF, respectively.
In addition, we collected typical 15-minute-interval electrical load datasets from three clusters of buildings at a private university in Seoul, South Korea [41]. The dataset collection period was three years. The first cluster comprised 16 residential buildings, and its electrical load had a residential pattern. The second cluster consisted of 32 academic buildings, including the main hall, library, classrooms, and offices. The third cluster contained five science and engineering buildings.
This cluster exhibited much higher electrical loads per building than the other clusters, mainly due to the various experimental equipment and devices in the laboratories. In South Korea, the daily peak load of a building is calculated by multiplying the highest 15-minute-interval electrical load of the day by 4. We took this into account when calculating the daily peak electrical load for DPLF. Likewise, we took the sum of the 96 values per day to perform TDLF. Tables 3 and 4 provide statistics on the daily peak loads and total daily loads for the five datasets, respectively. Weather conditions and holiday information are closely related to the electrical load [42]. Therefore, we used these data to configure the input variables for Cubist modeling. Because we performed two types of forecasting, namely, DA and WA forecasting, we used a different input variable configuration for each type.
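The daily aggregation just described can be sketched in a few lines of pandas; the column names and synthetic data below are illustrative, not the authors' code:

```python
import numpy as np
import pandas as pd

# Illustrative hourly load series for one building (names/values are assumptions)
rng = pd.date_range("2016-01-01", periods=24 * 14, freq="h")
hourly = pd.DataFrame(
    {"load_kwh": np.random.default_rng(0).uniform(50, 120, len(rng))}, index=rng
)

# Daily peak load (DPLF target) and total daily load (TDLF target)
daily_peak = hourly["load_kwh"].resample("D").max()
daily_total = hourly["load_kwh"].resample("D").sum()

# For 15-minute-interval data, the daily peak (in kW) is the maximum
# 15-minute energy value multiplied by 4, as described for Clusters 1-3;
# here we fabricate a toy 15-minute series by upsampling the hourly one.
quarter = hourly["load_kwh"].resample("15min").ffill() / 4
peak_kw = quarter.resample("D").max() * 4
```

A usage note: `resample("D").max()` and `.sum()` give one row per calendar day, so the result aligns directly with the daily targets used by the DPLF and TDLF models.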
Time is another important factor for electrical loads. We considered various temporal variables, such as the month, the day, and the day of the week. However, these variables cannot reflect periodic information when applied directly to forecasting models. For instance, 31 December and 1 January are temporally contiguous, yet their day and month values differ by 30 and 11, respectively, when represented as plain sequences. Hence, we represented them as continuous data in two-dimensional (2D) space to reflect their periodicity, using equations (1)-(6) [41,42]. Here, the variable for the seven days of the week is defined as 1 to 7 from Monday to Sunday, according to the international standard ISO 8601. LDM_month represents the last day of the month to which the day belongs (e.g., January: 31; February: 28 or 29; March: 31; and so on). Consequently, we used six input variables to describe the date and time of the prediction time points.

Figure 1: Architecture of the interpretable short-term electrical load forecasting model (DA: day-ahead, WA: week-ahead, DPLF: daily peak load forecasting, and TDLF: total daily load forecasting).

Day of the Week_x = sin((360/7) × Day of the week),
Day of the Week_y = cos((360/7) × Day of the week).
To verify the validity and applicability of the 2D representation, we computed several regression statistics on the electrical loads in one-dimensional (1D) space, consisting of three temporal variables (i.e., months, days, and days of the week), and in 2D space, as shown in Tables 5 and 6. Here, the residual standard error (RSE), multiple R-squared, and adjusted R-squared were calculated using equations (7)-(9):

RSE = sqrt( Σ_{t=1}^{n} (y_t − ŷ_t)² / (n − p − 1) ),
R² = 1 − Σ_{t=1}^{n} (y_t − ŷ_t)² / Σ_{t=1}^{n} (y_t − ȳ)²,
Adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1),

where y_t and ŷ_t represent the actual and estimated values, respectively, at time t; ȳ represents the mean of the actual values; and n and p indicate the numbers of observations and variables, respectively. From the tables, we can observe that the 2D representation captures the correlations more effectively than the 1D representation. Typical electrical load patterns differ between weekdays and holidays, depending on the type of building [16,40]. To reflect this, holiday information was also used as an input variable of the forecasting model. We collected holiday information by country from https://www.timeanddate.com [43]. The holidays included Saturdays, Sundays, and public holidays. We used one-hot encoding (i.e., "1" for the relevant data and "0" otherwise) as a nominal scale. Hence, we used seven time factors as input variables at the prediction time point.
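As a concrete illustration, the 2D cyclical encoding of the temporal variables can be sketched as follows (using radians, which is equivalent to the 360-degree form in the text; the helper name is ours, not from the paper):

```python
import calendar

import numpy as np
import pandas as pd


def cyclic_encode(value, period):
    """Map a 1-based periodic value onto the unit circle (2D representation)."""
    angle = 2 * np.pi * value / period
    return np.sin(angle), np.cos(angle)


# 31 December and 1 January: far apart as raw numbers, adjacent on the circle
for d in pd.to_datetime(["2018-12-31", "2019-01-01"]):
    month_x, month_y = cyclic_encode(d.month, 12)
    last_day = calendar.monthrange(d.year, d.month)[1]  # LDM_month
    day_x, day_y = cyclic_encode(d.day, last_day)
    dow_x, dow_y = cyclic_encode(d.isoweekday(), 7)  # ISO 8601: Mon=1..Sun=7
```

With this encoding, the 2D points for day 31 of December and day 1 of January lie close together on the unit circle, which is exactly the periodicity the raw day/month numbers fail to capture.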
In general, the electrical load increases in the summer and winter due to the heavy use of air-conditioning and electrical heating appliances, respectively [13,44]. Here, the most influential factor in the electrical load is temperature [13]. In this study, we focused on temperature-related variables, such as the daily maximum, average, and minimum temperatures [44]. In the case of Buildings 1 and 2, because the datasets we collected also included the outdoor temperature along with the electrical load, we calculated the daily maximum, average, and minimum temperatures (on the Fahrenheit scale). For Clusters 1 to 3, we first collected hourly temperature data using regional synoptic meteorological data provided by the Korea Meteorological Administration (KMA) and collected by the Seoul Meteorological Observatory, located about 6 km from the university campus. As for Buildings 1 and 2, we calculated the daily maximum, average, and minimum temperatures (on the Celsius scale). In South Korea, the KMA provides weather forecasts, including short- and mid-term forecasts, for the most significant regions. The KMA mid-term forecast service provides the maximum and minimum temperatures from day 3 to day 10, as shown in Figure 3.
Historical electrical load data are particularly important in electrical load forecasting because they exhibit the trend of recent electrical loads [17,45]. To reflect the recent trend in the prediction, we used the daily peak and total daily loads of the previous seven days as input variables for the DPLF and TDLF models, respectively. Because the electrical load trends of weekdays and holidays can differ, we added a holiday indicator to indicate whether each day was a holiday [46]. We used a total of 24 input variables to build the DA-DPLF and DA-TDLF models. Now, we describe the configuration of the input variables for the WA-DPLF and WA-TDLF models. First, we used the same time and temperature variables as in the DA forecasting models. However, the daily peak and total daily loads from the previous day to the sixth day before the forecast date are unknown when considering WA forecasting. To compensate for this, we configured the input variables using the daily peak and total daily loads and holiday indicators of the same day of the previous four weeks for the WA-DPLF and WA-TDLF models, respectively [46]. Therefore, we used a total of 18 input variables to construct our WA electrical load forecasting models. Table 7 presents all the input variables that we considered for DA and WA forecasts.
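A minimal sketch of the lagged-load configuration described above, assuming a daily pandas DataFrame with illustrative column names (the data are synthetic):

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2018-01-01", periods=60, freq="D")
df = pd.DataFrame(
    {
        "load": np.random.default_rng(1).uniform(800, 1500, len(rng)),
        # toy holiday flag: weekends only (real data would add public holidays)
        "holiday": (pd.Series(rng).dt.dayofweek >= 5).astype(int).values,
    },
    index=rng,
)

# Day-ahead models: loads and holiday flags of the previous 1..7 days
for lag in range(1, 8):
    df[f"load_d{lag}"] = df["load"].shift(lag)
    df[f"holiday_d{lag}"] = df["holiday"].shift(lag)

# Week-ahead models: same weekday of the previous 1..4 weeks
for week in range(1, 5):
    df[f"load_w{week}"] = df["load"].shift(7 * week)
    df[f"holiday_w{week}"] = df["holiday"].shift(7 * week)

df = df.dropna()  # drop warm-up rows that lack a full lag history
```

The `shift` calls simply realign the load series, so each row carries its own recent history; the first 28 rows are dropped because the four-week lags are undefined there.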

Forecasting Model Construction.
Variable importance is a technique that assigns scores to input variables based on the relative importance of each variable for accurate prediction [34]. Variable importance scores can be evaluated for both classification and regression problems. Variable importance scores play an essential role in constructing forecasting models because they can provide insight into the dataset or the model supported by the dataset [47]. For instance, relative scores can highlight which input variables have the greatest or least effect on the output variable. These scores can be interpreted by domain experts and used as a reference for collecting more or different data. Because most variable importance scores are calculated by forecasting models that fit the dataset, these scores can be calculated for the model interpretation. In addition, analyzing variable importance can offer suggestions to improve the efficiency and effectiveness of the forecasting model through dimensionality reduction and feature selection [48].
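To make the idea concrete, the following sketch computes a variable-importance ranking with permutation importance on a tree ensemble. This is only an analogue of the usage-based importance that Cubist reports in R; the scikit-learn model, synthetic data, and feature roles are our assumptions, not the authors' setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Three synthetic predictors, e.g. standing in for temperature,
# a holiday flag, and a lagged load (purely illustrative)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
```

Here the first feature dominates the target by construction, so any sensible importance measure should rank it first; in the paper's setting the same kind of ranking is what identifies temperature, holidays, and lagged loads as the crucial factors.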
To date, the R language has been widely used for data cleansing, preparation, and analysis [49]. To facilitate accessibility, we also adopted multiple R packages, including variable importance evaluation functions. To calculate variable importance measures in the R environment, we used Cubist, a regression tree-based model, because it provides a balance between interpretability and predictive power [36]. Figure 4 exhibits the flowchart of interpretable electrical load forecasting based on Cubist modeling. Cubist was developed based on Quinlan's M5 model tree [48,50].
The Cubist method generates a series of "if-then" rules. Each rule holds a linked multivariate linear model. As long as the covariate set meets the rule conditions, the corresponding model is used to compute the predicted value. The Cubist output includes variable usage statistics and provides the percentage of times each variable was adopted in a condition and/or a linear model. As the tree grows, many leaves and branches are produced. The Cubist model sequentially develops a series of trees with adjusted weights and strengthens them with training committees (usually one or more), similar to the "boosting" method. The number of neighbors in the Cubist model is used to correct the rule-based predicted values, and the final predicted value is a function of all the linear models from the initial node to the terminal node. The percentages displayed in the Cubist output reflect all the models involved in the predicted value rather than only the terminal models displayed in the output. The variable importance used here is a linear combination of rule condition usage and model usage [50].
When constructing a forecasting model, datasets are usually divided into a training set and a test set, and then the forecasting model is built using the training set and verified using the test set [35]. However, conventional time-series forecasting model evaluation could exhibit unsatisfactory prediction performance when there is a significant gap between the training set period and the test set period. Additionally, when the dataset is insufficient, it is challenging to obtain satisfactory prediction performance using a small amount of training data [19].
To solve these problems, we utilized TSCV based on a rolling forecasting origin [51]. TSCV focuses on a single or several forecast horizons for each test set. In this study, we used several different training sets, each containing one or more observations not used in the previous training set, depending on the scheduling period. To perform DPLF and TDLF, we took the test sets one day after the current time and a week after the current time, as shown in Figure 5. We evaluated the forecasting model performance by calculating the prediction accuracy at each time point and then calculated their average value. Therefore, it is possible to solve the data shortage problem because more data can be used over time than in the conventional time-series forecasting model evaluation. We can also expect satisfactory prediction performance because it can adequately reflect recent electrical load patterns and adjust the weights of the input variables in the forecasting model. Here, we presented interpretable electrical load forecasting results by calculating the importance of the variables for each training set in the model.
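A bare-bones sketch of TSCV with a rolling forecasting origin, under the simplifying assumptions of a linear model, synthetic data, and a one-step (day-ahead) horizon:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=n)

horizon, start = 1, 100  # forecast horizon and first forecasting origin
errors = []
for origin in range(start, n - horizon + 1):
    # Refit on all observations available so far (the expanding window),
    # then predict the next `horizon` points beyond the origin
    model = LinearRegression().fit(X[:origin], y[:origin])
    pred = model.predict(X[origin:origin + horizon])
    errors.append(np.abs(pred - y[origin:origin + horizon]).mean())

mae = float(np.mean(errors))  # accuracy averaged over all origins
```

Each iteration rolls the origin forward by one day, so later models always see the most recent observations; the week-ahead case would simply set `horizon = 7` and score all seven predicted days.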

Results and Discussion
In this section, we first introduce the metrics used to compare the prediction performance of the forecasting models and describe the experimental design and results in detail. We also present the results of several statistical tests to prove the validity of our experiment. We then exhibit interpretable short-term electrical load forecasting using the proposed model. Finally, we discuss the experimental procedures.

Experimental Design.
Table 7 (continued): input variables (No.: day-ahead variable / week-ahead variable, type).
11: Holiday (seven days before) / Holiday (four weeks before), binary
12: Electrical load (seven days before) / Electrical load (four weeks before), continuous
13: Holiday (six days before) / Holiday (three weeks before), binary
14: Electrical load (six days before) / Electrical load (three weeks before), continuous
15: Holiday (five days before) / Holiday (two weeks before), binary
16: Electrical load (five days before) / Electrical load (two weeks before), continuous
17: Holiday (four days before) / Holiday (one week before), binary
18: Electrical load (four days before) / Electrical load (one week before), continuous
19: Holiday (three days before) / -, binary
20: Electrical load (three days before) / -, continuous
21: Holiday (two days before) / -, binary
22: Electrical load (two days before) / -, continuous
23: Holiday (one day before) / -, binary
24: Electrical load (one day before) / -, continuous

The quantitative experiments were conducted on an Intel® Core™ i7-8700K CPU with 32 GB DDR4 RAM. We performed the input variable configuration and forecasting model construction in RStudio (v. 1.1.453) with R (v. 3.5.1). We used three years of electrical load data, from 2009 to 2011 for Buildings 1 and 2 and from 2016 to 2018 for Clusters 1 to 3, respectively. We divided each dataset into training (in-sample) and test (out-of-sample) sets in an approximate proportion of 2:1. For Buildings 1 and 2, we confirmed that the electrical load from November to December 2011, the out-of-sample period, was higher than that for the remaining days. The R random number generator seed was set to 1234 for all methods.
To evaluate the predictive performance of the forecasting models, we used the MAPE and the coefficient of variation of the root mean square error (CVRMSE). MAPE and CVRMSE express accuracy as a relative percentage error. Hence, they are easier to understand than other well-known metrics, such as the mean absolute error, mean square error, and root mean square error [40,52]. The MAPE measures the prediction accuracy of fitted time-series values in statistics, specifically in trend estimation. The CVRMSE aggregates the residuals into a single measure of predictive ability and is more useful when significant errors are particularly undesirable. The lower the MAPE and CVRMSE values, the better the forecasting model's predictive performance. However, it is known that the MAPE and CVRMSE increase significantly when the actual values tend to zero [35,52]. The MAPE and CVRMSE are calculated using equations (10) and (11), respectively:

MAPE = (100/n) Σ_{t=1}^{n} |y_t − ŷ_t| / y_t,
CVRMSE = (100/ȳ) sqrt( Σ_{t=1}^{n} (y_t − ŷ_t)² / n ),

where y_t and ŷ_t are the actual and forecasted values at time t, respectively, ȳ is the average of the actual values, and n is the number of observations. To demonstrate the validity of the proposed model, we considered a total of 12 machine learning methods, including MLR, partial least squares (PLS), multivariate adaptive regression splines (MARS), KNN, SVR, DT, Bagging, RF, GBM, XGB, and CatBoost. Most machine learning methods have hyperparameters that can influence model performance. To further improve their performance, it is critical to tune the hyperparameters effectively. To find the optimal hyperparameters, we performed 10-fold cross-validation on the training set. Table 8 describes the hyperparameters for each method, the R package used, the references where details about their optimal values can be found, and the hyperparameter ranges.
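The two metrics in equations (10) and (11) can be transcribed directly as plain NumPy functions (the function names are ours):

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (equation (10))."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))


def cvrmse(actual, forecast):
    """Coefficient of variation of the RMSE, in percent (equation (11))."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    rmse = np.sqrt(np.mean((actual - forecast) ** 2))
    return 100.0 * rmse / np.mean(actual)


# mape([100, 200], [110, 190]) -> 7.5
```

Note that both functions divide by the actual values (MAPE pointwise, CVRMSE by their mean), which is why, as the text warns, both metrics blow up when the actual load approaches zero.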
For DA forecasting, we also considered the MLR model in [20], which achieved better prediction performance than KPX's forecasting model, as a baseline model in the evaluation of the proposed model. The MLR model adopts a rolling procedure using a dataset from one day to one year before the prediction time point and is specified in the following equation:

Y_D = β_0 + β_1 Y_{D−1} + β_2 Y_{D−2} + β_3 Y_{D−7} + β_4 W_D + β_5 S_D + β_6 T_D + β_7 S_D T_D + ϵ,

where Y_D is the expected daily peak or total daily electrical load at day D; Y_{D−1}, Y_{D−2}, and Y_{D−7} are the historical loads one day, two days, and one week before, respectively; W_D, which represents the day of the week at day D, is a categorical variable consisting of seven categories from 1 (Monday) to 7 (Sunday); S_D is a categorical variable for the season, consisting of six categories from 1 (Jan. and Feb.) to 6 (Nov. and Dec.); T_D is the average temperature at day D; S_D T_D is a quadratic variable to reflect appropriate weather information according to the season; and β_0, β_i, and ϵ are the constant term, the slope coefficient of the ith independent variable, and the error term, respectively. […] holdout, except for PLS, where the input variable weight was adjusted in the forecasting model to reflect the recent electrical load pattern. We made the following five observations from the experiments:

(1) The day-ahead electrical load forecasting models exhibited better prediction performance than the week-ahead electrical load forecasting models.
(2) The week-ahead electrical load forecasting models exhibited lower prediction performance as the prediction time moved further from the current time.
(3) For the same time point, the day-ahead electrical load forecasting models showed better prediction performance than the first prediction time point of the week-ahead electrical load forecasting models.
(4) The total electrical load forecasting models displayed higher prediction performance than the peak electrical load forecasting models.
(5) The prediction accuracy for Clusters 1-3 was higher than the prediction accuracy for Buildings 1 and 2.
In general, the electrical load pattern of buildings or building clusters can change for a variety of reasons; hence, the further apart the current and forecast times, the higher the uncertainty [9]. Therefore, both the DA models and the first prediction time point of the WA electrical load forecasting models yielded more accurate predictions because they modeled the electrical load patterns of the previous day using TSCV. In addition, because the DA electrical load forecasting models utilized the electrical load of the previous day as an input variable, they exhibited more accurate prediction performance than the first prediction time point of the WA models. Hence, we confirmed that the electrical loads from the immediately preceding days were more significant input variables than the electrical loads from the same days of the previous week. Overall, we also confirmed that the forecasting models displayed low prediction accuracy when the electrical load was close to zero. From November to December 2011 (out-of-sample), Buildings 1 and 2 showed suddenly higher electrical loads than on other days. Thus, the electrical load forecasting models had large prediction errors because they were not trained on these electrical loads.
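The incremental TSCV scheme described above can be sketched as an expanding-window generator: each cycle trains on all observations before the forecast origin and tests one or seven days ahead, so recent load patterns are always included. This is a simplified sketch; the paper's exact cycle boundaries may differ:

```python
def time_series_cv(n_obs, initial_train, horizon):
    """Expanding-window time-series cross-validation.

    Yields (train_indices, test_index) pairs: each cycle trains on every
    observation before the forecast origin and tests `horizon` steps
    ahead (horizon=1 for DA forecasts, horizon=7 for WA forecasts).
    """
    for origin in range(initial_train, n_obs - horizon + 1):
        yield list(range(origin)), origin + horizon - 1
```

Because the training set grows by one observation per cycle, the model is refit on the most recent loads before every prediction, which is what allows it to track changing load patterns.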
To find the best forecasting method, we ranked the methods by considering all the TSCV performance metrics for each building or cluster. We then calculated the average rank across all buildings and clusters for DA-DPLF, DA-TDLF, WA-DPLF, and WA-TDLF for each method. Table 17 presents the ranks for each method for each performance metric along with the average ranks. We confirmed that the Cubist method exhibited the best rank in the table. In addition, we demonstrated that the proposed Cubist model outperformed the MLR model in all aspects of DA forecasting, as shown in Table 18.
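The ranking procedure can be sketched as follows (an illustrative helper, not the authors' code; ties are ignored for brevity):

```python
from collections import defaultdict

def average_ranks(results):
    """Average each method's rank across datasets.

    `results` maps each dataset name to a {method: error} dict, where
    lower error is better. Methods are ranked within each dataset
    (rank 1 = lowest error), then ranks are averaged across datasets.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for scores in results.values():
        ordered = sorted(scores, key=scores.get)   # ascending error
        for rank, method in enumerate(ordered, start=1):
            totals[method] += rank
            counts[method] += 1
    return {m: totals[m] / counts[m] for m in totals}
```

The same averaging is applied per forecasting task (DA/WA, DPLF/TDLF) and per metric (MAPE, CVRMSE) before the overall comparison.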

Statistical Verification.
To demonstrate the validity of the proposed method, we conducted three statistical tests. The paired sample t-test was used to demonstrate the effectiveness of TSCV, and the Wilcoxon signed-rank and Friedman tests were used to confirm the difference in prediction performance between the proposed model and the other models. Here, the p value gives the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. Table 19 shows that the MAPE and CVRMSE p values were both less than 0.05. Therefore, we can confirm the validity of the TSCV used in this study.
The Wilcoxon signed-rank test [58,59] was used to test the null hypothesis that there is no significant difference in prediction performance between the proposed model and each of the other models; the results are presented in Table 20. Because the p value in all cases is below the significance level, the proposed model is superior to the other models.
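With per-cycle error values in hand, the three tests can be run with SciPy. The numbers below are illustrative, not the paper's results:

```python
import numpy as np
from scipy import stats

# Illustrative per-TSCV-cycle MAPE values for three models
# (made-up numbers, not taken from the paper's tables).
proposed = np.array([7.1, 7.8, 6.9, 8.2, 7.5, 7.0, 8.0, 7.3])
baseline = np.array([8.4, 9.1, 8.0, 9.6, 8.8, 8.5, 9.3, 8.7])
third    = np.array([9.0, 9.5, 8.6, 10.1, 9.2, 9.0, 9.8, 9.1])

# Paired sample t-test: do the two models differ in mean error?
t_stat, t_p = stats.ttest_rel(proposed, baseline)

# Wilcoxon signed-rank test: non-parametric paired alternative.
w_stat, w_p = stats.wilcoxon(proposed, baseline)

# Friedman test: do three or more models differ across the same cycles?
f_stat, f_p = stats.friedmanchisquare(proposed, baseline, third)
```

A p value below 0.05 for each test supports rejecting the corresponding null hypothesis of no difference.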

Model Interpretation.
We determined the variable importance of the proposed model at each test point (one day or one week ahead), according to the TSCV cycle. Figures 6 and 7 present heat maps for the DA forecasts, and Figures 8 and 9 present heat maps for the WA forecasts, showing the importance of the input variables listed in Table 7. The analytical results shown in the figures can be summarized as follows. Among the calendar data, the holiday indicator was the most significant variable for the forecasting model, and the day-of-the-week variables were highly important, whereas the day variables did not significantly affect model performance. The temperature data were, overall, essential for model performance, with the average temperature being a key input variable of the proposed model. Among the historical load data, the previous day's electrical load was the most important variable for the DA forecasting model, and the load one week prior was also crucial; in the WA forecasting models, the load one week prior was the most significant input variable among the historical load data. These results show that adjacent autocorrelation variables (e.g., historical loads) are essential for model performance. We also presume that the DA forecasting models performed better than the WA forecasting models because they could use the crucial historical load variables from both the previous day and the previous week.
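Cubist derives its 0-100 importance scores from how often each variable appears in rule conditions and model equations. A model-agnostic stand-in (permutation importance, our assumption rather than the paper's exact method) produces similarly scaled scores and works with any fitted predictor:

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Model-agnostic stand-in for Cubist's variable-importance scores.

    For each column of X, measures the increase in mean absolute error
    when that column is shuffled, then rescales the scores to 0-100
    (100 = most important variable).
    """
    rng = np.random.default_rng(seed)
    base = np.mean(np.abs(predict(X) - y))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # break the column's relationship to y
            drops.append(np.mean(np.abs(predict(Xp) - y)) - base)
        scores[j] = max(np.mean(drops), 0.0)
    return scores / scores.max() * 100.0 if scores.max() > 0 else scores
```

Stacking one such score vector per TSCV cycle yields exactly the (cycles x variables) matrix visualized in the heat maps.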

Discussion.
The experimental results showed that PLS exhibited poor prediction performance in TSCV. PLS is a popular method for dealing with multicollinear relationships between output and input variables [48]. Even though we set the PLS hyperparameter automatically, it performed poorly: although the historical load was highly correlated with the actual electrical load, PLS did not adequately reflect the historical load in TSCV, which we conclude caused its low prediction accuracy. XGB and CatBoost are state-of-the-art techniques. XGB performed satisfactorily, but CatBoost did not: because most input variables are continuous, CatBoost could not exploit its advantages, such as ordered target statistics. Therefore, we concluded that XGB is better suited for DPLF and TDLF than CatBoost. The main advantage of the Cubist method is the addition of multiple training committees and "reinforcement" to better balance the weights. Therefore, we presume that Cubist achieved satisfactory prediction performance because it predicts the next time point by better adjusting the input variable weights through TSCV. The light GBM (LightGBM) and neural network methods have also performed well in STLF but were not considered here. These methods require a sufficient amount of data [18,60]; however, it takes a significant amount of time to collect enough data because only one record of daily peak load and total daily load is generated per day. Moreover, these methods are better configured for the Python environment [60]. In particular, neural network methods require high-performance computing resources [9]. In this paper, we considered only datasets collected over three years, and a little over 700 tuples were used for the first model training. Therefore, we did not expect these methods to produce a robust forecasting model with such small datasets.
In the future, we will apply these methods to perform interpretable hourly electrical load forecasting or VSTLF.

Conclusions
In this paper, we developed a novel forecasting model for interpretable short-term electrical load forecasting. To do this, we collected five different electrical load datasets with temperature and holiday information. We constructed different input variables by considering four forecasting types: day-ahead DPLF and TDLF and week-ahead DPLF and TDLF. We built the proposed model based on Cubist, a rule-based model, and applied TSCV to address the lack of data and reflect recent electrical load trends. The experimental results demonstrated that the proposed model showed excellent prediction performance. In addition, we conducted interpretable electrical load forecasting for each building or building cluster using the variable importance produced by the proposed model.
We found that applying the TSCV method can improve prediction performance for every method except PLS and that the Cubist method performed satisfactorily on a small dataset. It was challenging for CatBoost, a state-of-the-art technique, to produce excellent prediction performance because almost all input variables were continuous. Overall, we confirmed that the higher the electrical load, the higher the prediction accuracy. TDLF and the day-ahead forecasting models achieved better prediction performance than DPLF and the week-ahead forecasting models, respectively. However, it was difficult to adequately train the forecasting models on sudden electrical load fluctuations because the amount of data was smaller than that available for hourly load forecasting or VSTLF.
We plan to perform interpretable VSTLF, such as 10-minute- or 15-minute-interval load forecasting, using neural network interpretation methods such as activation maps or an attention mechanism. In addition, we will make an effort to develop various methodologies for explainable forecasting. We also plan to find variables that can reflect building characteristics and include them in the forecasting model.
Figure 9: Example of variable importance for WA-TDLF (Building 1).
Data Availability
The data that support the findings of this study are available from https://www.dropbox.com/s/y0bcrulfqywra1x/Datasets.zip?dl=0, but restrictions apply to the availability of these data.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.