A Load Forecasting Algorithm for Power Internet of Things Using Training Data Dimension Expansion and Ensemble Learning Technique

With the development of the power internet of things (IOT), load forecasting will play an important role the power system. It can optimize the power generation planning and improve the economical operation of power IOT. In this paper, a new loading forecasting algorithm for power IOT is proposed using training data dimension expansion and ensemble learning. In the offline phase, the obtained meteorological data and time information is normalized to remove the unit effect at first. Then, the Hampel filter is used to cope with the outliers of the meteorological data from sensors. Through the preprocessing, the fingerprint of the training data is constructed. Next, the matrix multiplication method is proposed to expand the dimension of training data fingerprint information. Finally, the ensemble learning combining multiple long short-term memory (LSTM) networks are proposed to obtain multiple power load forecasting models and corresponding weight coefficients. In the online phase, the obtained meteorological data and time information are preprocessed to form the input of and power load forecasting models. The final power load forecasting is obtained by linear weighted sum method with intermediate forecasting result. In the proposed algorithm, more features of training data can be obtained by the data dimension expansion. Moreover, the ensemble learning using LSTM can make fully use of the timing sequence of training and improve the generalization performance of offline training. Experiment results illustrate that the proposed algorithm has better forecasting performance than existing methods.


Introduction
With the increasing requirement of energy consumption, a clean, low-carbon, safe, and shared power system has received many attentions for both academic and industry [1,2]. The new energy revolution called "energy + Internet" becomes a research hotspot [3]. In China, the State Grid Corporation began to develop the ubiquitous power internet of things (IOT) for smart grid in 2019 [3]. The power IOT connects people and devices that are related to the power grid. By collecting device-related data using sensor, the data is sent to the server which is processed by big data analysis, cloud computing, and artificial intelligence technologies. It effectively integrates communication infrastructure resources and power system infrastructure resources which can realize the interconnection of all things in the power system. Now, the power IOT can promote the efficiency of energy services and new energy consumption dramatically. In [4], an architecture for ubiquitous power IOT is proposed which brings great convenience to the collection, transmission, processing, and storage of data. In [5], the characteristics of power IOT, its application, and the needs of smart grid for communication coverage and data acquisition are proposed.
For the requirement of power IOT construction, power load forecasting plays an important role in the operation planning of power system. The authors of [6] proposed a load forecasting method by two groups of features. The half-hour electrical load variables obtained from the smart meters through the IOT technology together with the lagged load and calendar variables are used for short-term load forecasting by multilayer perceptron neural networks. In [7], an ultrashort term load forecasting method for the power IOT is proposed by federated learning. It learns the model parameters from the data distributed in multiple edge nodes. Since rapid and accurate power load forecasting can optimize the power generation planning and improve the economical operation of power IOT, it is urgent to study the load forecasting technique for power IOT [8].
Recently, the load forecasting is performed with different kinds of techniques in the literature which can be broadly divided into two groups: traditional and artificial intelligence-based techniques.
For traditional technique, the statistical method is mostly used for load forecasting. In [9], considering the load forecast is a conditional expectation of load given the time, weather conditions, and other explanatory variables; load forecasting can be calculated directly from given parameters. In [10], the general exponential smoothing is used to develop an adaptive load forecasting system using observed values of integrated hourly demand. In [11], a modified autoregressive moving average-(ARMA-) based method with non-Gaussian process is used to improve the     In order to cope with the limits of traditional technique, artificial intelligence-based technique is developed. First, some machine learning-based methods are proposed. In [12], a kernel-based support vector regression (SVR) method is used for load forecasting by choosing kernel function. In [13], the artificial neural network (ANN) is used to obtain the load forecasting model for weekdays and weekends, respectively. Now, with the development of machine learning theory, some deep learning methods are used for load forecasting, because it has greater number of hidden layers to deal with the complicated nonlinear patterns. In [14], a multiscale convolutional neural network with time cognition is proposed for load forecasting. Since the recurrent neural network (RNN) is effective to capture non-stationary training data, a pooling-based deep RNN in [15] is proposed to solve the overfitting problem in load forecasting. Note that vanishing gradient and exploding gradient problem arise in RNN can reduce the prediction accuracy. Thus, the long short-term memory (LSTM) network has been taken into account to solve these problems. In [16], an effective methodology using the LSTM network is developed to make a precise forecasting under more complex time series load data condition.
According to the above discussion, we continue to study deep learning-based load forecasting algorithm. Since the time and meteorological data, such as wind speed, temperature, and pressure, are critical affect the power consumption [17], in this paper, a deep learning technique is used to training the relationship between the above factors and power load. The contributions of this paper can be summarized as follows: (1) A deep learning-based load forecasting algorithm is proposed by LSTM. It can effectively avoid the insensitivity of training data time series in the prediction process (2) For data preprocessing, the Hampel filter can eliminate the outlier caused by sensor noise. Moreover, the feature extraction network using random matrix multiplication can obtain better feature representation which can improve the offline training performance (3) For regression learning, the ensemble learning framework which combines multiple LSTM networks is used to train the relationship between the Output gate Input gate h t Figure 6: The description of LSTM architecture.   Journal of Sensors the optimal parameters in the global sense can be obtained, and the final load forecasting performance can be improved The remainder of this paper is organized as follows. Section 2 gives the algorithm framework. The offline phase description and the online phase description of the proposed algorithm are explained in Sections 3 and 4, respectively. Field test and performance analysis are illustrated in Section 5, and conclusions are drawn in Section 6.

Algorithm Framework
According the block diagram shown in Figure 1, the proposed algorithm contains two main phases: offline training phase and online forecasting phase. For offline training phase, it contains three important steps: (1) training data preprocessing, (2) feature extraction using dimension expansion, and (3) regression learning using ensemble learning. For another, in the online forecasting phase, after the similar preprocessing of the obtained data, the trained feature extraction network and forecasting model are used to estimate the final load forecasting result. In the following, each step of the proposed algorithm will be described in detail.

Offline Phase Description
3.1. Training Data Preprocessing. In this step, data normalization and outlier detection are used for data preprocessing. First, as we know, some measurement environments and the    5 Journal of Sensors sensor noise may lead some measurements become outlier in the data collection. In the paper, the Hampel filter is used to remove these outliers [18]. For a given vector x = fx 1 , x 2 , ⋯, x n g, the observation window of each measurement is defined at first. Assume the half-length of the window is k; the total length of the defined window is calculated as 2k + 1 (including the given measurement). Then, the median of all measurement data in the window is calculated as The standard deviation of each measurement for the absolute value of the median is written as After filter, the measurement can be described as If the measurements exceed three times of the obtained standard deviation, the median is used to substitute the measurement data.
Taking the temperature and pressure information as an example, Figures 2 and 3 describe the measurement data preprocessing results. It can be seen that after Hampel filter process, some outliers are removed. Thus, Hampel filter has effects on outlier deletion.
Next, the collected meteorological data and time information are scaled in order to lead them fall into a special range. In this paper, the min-max method is used to make the obtained data information into [0 1] range which can be given by [19] x n ′ = where x n , x n ′ describe the original and normalized data, respectively, and x max and x min are the maximum and minimum values of this kind of given data.   Journal of Sensors The purpose of data normalization is to remove the unit effects for the following training. Through this process, different kinds of data are at the same level. And the comprehensive evaluation and analysis can be carried out.
Thus, through the above preprocessing, the fingerprints of training data are described in Figure 4, where the fingerprint contains time information (month, hour) and meteorological data (wind speed, temperature, and pressure). The size of fingerprint is five dimensions. The label of training data is actual load.

Feature Extraction Using Training Data Dimension
Expansion. After the data preprocessing, in the next, the training data expansion is used to obtain more features for training. Traditional data dimension expansion for deep learning contains random augmenter and lower-bound cropper [20]. The idea of random augmenter is to randomly generate a binary mask that can be multiplied by the data vector. The lower-bound cropper augmentation technique leverages this observation to increase the training data size. In particular, for a given collected data vector, any entry whose value is less than a certain threshold is a candidate to be removed (set to zero). All combinations of these entries can then be added to form the new training data. As shown in Figure 5, in this step, the random matrix is proposed to expand the dimension of the fingerprint.
Assume the original fingerprint is defined as A with size 1 × m; in order to expand the dimension of a vector, a random matrix B is defined for multiplication operation. In order to meet the requirement of calculation rule and dimension expansion, the size should be m × n (where n > m). The optimal element can be obtained from the training with following regression learning.
At last, the expanded fingerprint can be written as where the size of C is n. Each element is obtained by multiplying vector with matrix. Note that since the size of expanded fingerprint is larger than that of the original fingerprint and the element can be determined with the offline training, more detailed feature for training data can be obtained through above dimension expansion.

Regression Learning Using Ensemble
Learning. In this section, in order to improve the training performance of the offline phase, ensemble learning framework is proposed for regression learning [21]. Moreover, the LSTM network is chosen as the sublearner which can make fully use of the time series of the training data.
First, some basic knowledge of LSTM network is described [22]. According to the LSTM architecture shown in Figure 6, it contains four basic components: cell, input gate, output gate, and forget gate. The function of each component is described as follows. Cell is the core of the compute node. The information can be transferred over random time intervals through the cell. The gate traces the flow of the input and output data from the cell. The input gate and output gate are used to control the input and output. The forget gate is used to control the retention degree of historical information. Moreover, the sigmoid activation function is introduced to make the output between [0, 1]. When the output is 0, it means that all the information in the previous state is discarded. When the input gate is 1, all information in the previous state is retained.
A unit of LSTM is defined as Input is defined as Cell update is defined as  where σ is sigmoid activation function; ∘ is the Hadamard product; x t and h t are described as the input and output of the current node; W xi , W hi , W xf , and W hf represent weight matrix; b i , b f , b o , and b c are the biased values; i t , f t , and o t describe the results of input gate, output gate, and forget gate, respectively; g t is the update condition; and x t is the data input.
Second, the regression learning using ensemble learning framework is proposed. As we all know, an ensemble with a  Journal of Sensors number of sublearners has much stronger generalization ability than that of sublearners. Sublearners are usually generated from training data by a basic learning algorithm such as neural network, SVM, or other machine learning approaches. The diversity of each sublearner is vital to training performance of ensemble learning. Therefore, most ensemble methods use some effective strategies to improve the diversity of sublear-ners [23,24]. Three main kinds of ensemble learning contain Adaboost, bagging, and boosting. As shown in Figure 7, by the data normalization and Hampel filter preprocessing in Section 3.1, the obtained fingerprint is used for data dimension expansion and regression learning. For each sublearner, the random matrix is generated at first and then the LSTM is used for regression learning. The task of learning is to obtain the 9 Journal of Sensors optimal element of random matrix and the parameters of LSTM network. At last, the feature extraction network with optimal element-based random matrix and the load forecasting model is given. In the offline phase, more sublearners are used for regression learning. Figure 8 shows the proposed framework of LSTM network. It can be seen that it contains two LSTM layers, one dropout layer, one full connection (FC) layer, and RELU layer. Both function and parameters of each layer are described as follows: (1) LSTM layer: for the first layer, the length of the state vector unit is 8, the time stride is 1, and the activation function is tanh. For the second LSTM layer, the length of the state vector unit is 16 and the time stride is 1 (2) Dropout layer: the aim of this layer is to reduce the number of actual training parameters by disconnecting neural network randomly. In this paper, the parameter is chosen as 0.1 (3) FC layer: in this layer, it can map the feature from one space to another. The activation function is chosen as RELU (4) RELU layer: the aim of this layer is to mitigate the effect of overfitting At last, the weight parameter is chosen for online forecasting data fusion. In this paper, the weight parameter is determined with the offline training error. If the offline training error of the sublearner is small, the corresponding weight parameter is larger. Otherwise, the weight parameter is small. For ith sublearner, the weight parameter can be described as where e i is the training error of the ith sublearner. Note that in the proposed offline training, the random matrix is chosen as the feature extraction network. The feature extraction network and regression learning network are combined with each other for offline training. Thus, the performance of offline phase can be improved.

Online Phase Description
In this section, when the meteorological data and time information are obtained, the load forecasting model is used for load forecasting. According to the block diagram shown in Figure 9, the load forecasting process is described as follows: First, based on the obtained time information and meteorological data, the data preprocessing proposed in offline phase is used to construct the fingerprint by normalization and Hampel filtering. Then, the fingerprint is chosen as the input data for each load forecasting model. At last, the final load is estimated by the linear weighting method which is given byĝ where g i is the intermediate load forecasting of the ith load forecasting model.

Experiment Parameter and Environment.
In the experiment, the training data are chosen for a community of Suzhou, Jiangsu Province. All the measurement data are collected with the time interval of 15 min in 24 hours. The procedures are based on Windows 10 operating system Python 3.7. The pandas, numpy, and matplotlib library routines in Anaconda and Keras training framework are chosen for offline training.

Performance Evaluation Index.
In this paper, the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and determinate coefficient (R 2 ) are chosen for performance evaluation which are shown in (11)- (14). The above performance evaluation indexes describe the forecasting performance from different aspects. The MAE is defined as the average absolute error between the predicted value and the observed value. The MAPE is the percentage value description of the MAE. The RMSE represents the standard deviation of the difference between the predicted value and the observed value. The determination coefficient is used to evaluate the fitting degree of regression model coefficients. The higher the value is, the better the model is. 10 Journal of Sensors where q n andq n describe the true and forecasting of the load, q is the mean of the load forecasting, and N is the number of load to be forecasting.  Figure 10 shows the training performance of the proposed algorithm, where the parameter of data dimension expansion is 16. We can find that when the number of iterations increases, the MSE of load forecasting will gradually decrease. Taking the iteration number is 80 as an example, the MSE approaches the minimum value of 0.032 which means the loss function converges. Therefore, under this iterative condition, the load forecasting model obtained from offline learning can be used for online prediction estimation.

Performance Description of the Proposed Algorithm.
In this section, the forecasting performances of the proposed algorithm with different parameters are described. Taking the number of training data as 18000 as an example, Figure 11 illustrates the true and forecasting load under different times, when the parameter of data dimension expansion is 16 and the number of LSTM is 2. It can be seen that under different time and meteorological data conditions, the load is changed dramatically, but the load forecasting can be also close to the actual value no matter how the actual load changes. In order to describe the algorithm performance more clearly, Figures 12 and 13 show the statistical error of the load forecasting. When the number of training data increases, the forecasting error will decrease and the forecasting performance can be improved. From the figures, when the number of training data is 18000, the MAE and RMSE are only 94.87 and 146.19, respectively. If the number of training data is more than 9000, the parameter R 2 is close to or larger than 0.8. Therefore, through the statistical analysis, it can be concluded that the proposed algorithm can fully satisfy for practical application. Table 1 describes the statistical error analysis with different numbers of LSTM, when the number of training data is 18000. As expected, when the number of LSTM increases, the load forecasting performance can be improved dramatically. Thus, without considering the training time cost, it is a better method to increase number of LSTM for performance improvement.

Performance Comparison.
In this section, there are two existing load algorithms: (1) the training data is straightly used for training with LSTM (no data dimension expan-sion+no ensemble learning), and (2) the training data dimension is expanded at first, and then the LSTM is used for training (data dimension expansion+no ensemble learning) chosen for algorithm comparison. Figures 14-17 describe the statistical error comparison of different Figure 19: The description of testing data import.

11
Journal of Sensors algorithms, respectively. From the experiment results, it can be seen that the load forecasting performance of all three algorithms becomes better, when the number of training data increases. Taking 15000 training data as an example, the MAE, RMSE, MAPE, and R 2 of the algorithm are 107.76, 162.73, 0.09, and 0.87, respectively. However, the four error statistical parameters of the data dimension expansion+no ensemble learning method, which the perfor-mance is more close to the proposed algorithm, are 161.34, 213.94, 0.13, and 0.78. Thus, according to the statistical error analysis, the proposed algorithm has the best load forecasting performance among three methods. The reason can be concluded as follows. The data dimension expansion technology can describe the feature of training data more clearly. Moreover, the ensemble learning can improve the efficiency of offline learning.   Journal of Sensors 5.5. Hardware Transplantation Experiment. In order to show the performance of the proposed algorithm for practical application, in this paper, the raspberry pi shown in Figure 18 is used for hardware transplantation [25]. First, the deep learning tools, such as Tensorflow and Keras, are installed, and then, all the libraries for the experiment are configured. In the following, the transplantation are described by the trained load forecasting model and testing data. According to Figure 19, the testing data set "history_ weather_15min.csv" is imported in raspberry pi. The month, hour, wind speed, temperature, and pressure in the testing data are chosen for the load forecasting. Then, the obtained load forecasting model in the offline phase is loaded in the hardware platform which is shown in Figure 20. At last, the actual load and load forecasting are written in files "W.xlsx." From the output shown in Figures 21 and 22, columns B and C describe the actual load and load forecasting, respectively.

Conclusions
In this article, a deep learning-based loading forecasting algorithm for power IOT is proposed. In the proposed algorithm, two data preprocessing, min-max normalization and hampel filter, are used to construct the fingerprint of the training data. Then, matrix multiplication method is proposed to extract the fingerprint feature by dimension expansion. Finally, the ensemble learning using multiple LSTM networks is proposed for offline training and obtain power load forecasting models. In the proposed algorithm, the data dimension expansion can obtain more features for training data. The ensemble learning using LSTM can make fully use of the timing sequence of training and improve the generalization performance. The experiment is carried out to evaluate the forecasting performance. Through the experiment results, it can be seen that the proposed algorithm has better load forecasting performance than chosen existing approaches.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.