Data-Driven Approach for the Short-Term Business Climate Forecasting Based on Power Consumption

With the fast development of intelligent data-mining technologies, some advanced artificial intelligence approaches are widely developed and employed to help the decision-making of enterprises and government. The application of advanced and intelligent approaches successfully helps the enterprises and government find out the valuable information hidden in the massive economic data. This study presents a novel data-driven approach to forecast the short-term business climate using the electric power consumption data of large enterprises. In addition, the climate conditions, interactions between different industries, the business cycle, and some other related variables are also considered and included in the developed forecasting model. To be specific, the business climate prediction model based on support vector machine (SVM) is proposed firstly, and the human-simulated particle swarm optimization algorithm (HSPSO) in our previous work is employed to optimize the parameters of the developed forecasting model. Secondly, a novel power-consumption-based business climate index (BCI) system is developed and comprehensively analyzed. The developed BCI system that contains the index for each separate industry (BCI-I), the index for tertiary industry (BCI-T), the index for secondary industry (BCI-S), and the index for the entire province (BCI-P) is proposed. In addition, the developed BCI system is employed to normalize the output of SVM-based forecasting model to directly indicate the business climate, which is very important to the decision-making of enterprises and government under the background of smart cities. Finally, the real data of Guangdong province in China, including the gross output values (GOV) and detailed power consumptions of more than 38000 enterprises, are employed to test the proposed approach. Experimental results show that the GOV of each industry and the whole society predicted by HSPSO-SVM matches the real data well. Moreover, the predicted BCI can directly indicate the business climate in advance, which is of great value for economic-decision and policy-making of both enterprises and government.


Introduction
Prediction of economic indices is of great value for economic-decision and policy-making of both enterprises and government. Scholars worldwide have studied different methods to predict the gross domestic product (GDP), energy demand, power load, stock market, and so on [1][2][3][4][5][6]. For example, Sozen and Arcaklioglu predicted the net energy consumption of Turkey with the gross national product (GNP) and GDP [7]. Peng et al. studied a method to predict China's stock market with the nonferrous metal price volatility [8]. Wu and Wang predicted the railway freight traffic time series with the maximum Lyapunov exponent [9]. Kaur et al. proposed a reforecasting method for enhanced power load prediction [10]. Azadeh et al. predicted the electrical energy consumption with the artificial neural network (ANN) and genetic algorithm (GA) [11]. Liu and Fang forecasted the industrial output value using the collected power consumption data [12].
The electric power, which appears everywhere in the modern society as the commonest production input, can directly reflect the production activity. With the development and installation of smart meters, the detailed data of electric power consumption can be collected, and much valuable information can also be achieved by analyzing these data. In this paper, the detailed power consumption data of large enterprises and some other related variables, e.g., the climate conditions, the interactions between different industries, the characteristics of business cycle, and the law of time series, are employed to predict the gross output values (GOVs). In addition, the business climate indices (BCI) are also proposed to directly indicate the business climate of each industry and the whole society. Considering the promising performance of support vector machine (SVM) on data fitting and predicting, SVM is employed in the proposed GOV prediction model. Moreover, to improve the generalization ability of the proposed model, the human-simulated particle swarm optimization algorithm (HSPSO) in our previous work [13] is used to optimize the parameters of SVM.
Guangdong, located in the south of China, is one of the most developed provinces. In 2013, the GDP of Guangdong reached more than 6.2 trillion RMB and exceeded that of Korea. In recent years, Guangdong has become the biggest manufacturing base of China and will certainly play a significant role in promoting the development of China and the world. Some statistical data of Guangdong province for the past 4 years, including the detailed power consumptions and GOVs of more than 38000 large enterprises and the monthly temperatures, are employed to test the proposed approach via numerical experiments.
The rest of this paper is organized as follows. Section 2 presents an overview of the related works. Section 3 presents the HSPSO-SVM-based GOV prediction model. In Section 4, the BCI system including BCI-I, BCI-S, BCI-T, and BCI-P is proposed. Section 5 gives the experimental setup, and the experimental results are reported and analyzed in Section 6. Finally, this paper is concluded in Section 7.

Economic Indices Prediction.
With the development of data mining approaches in the past decades, a lot of valuable information has been obtained by collecting and analyzing the massive data generated in different fields. In recent years, with the emergence of the concept of "big data," different kinds of data in various forms (e.g., text, audio, and even video) and generated in any area are employed to create valuable information. In the economic field, prediction of economic indices attracts the attention of scholars worldwide. For example, Long and Wang studied the prediction model of large-scale curves to predict the GDP of multiple regions of China [14]. Marjanovic et al. predicted the GDP growth rate based on carbon dioxide emissions. In their study, the relationship between economic growth and carbon dioxide emissions is considered as one of the most important empirical relationships, and the extreme learning machine (ELM) is employed to establish the prediction model [15]. Camacho and Garcia-Serrador employed an extension of the Euro-Sting single-index dynamic factor model to forecast the short-term quarterly GDP growth of the euro area [16]. Qiao and Chu examined the ability of fine wine price in forecasting GDP for major developed countries. According to the provided experimental results, the fine wine price contains useful information as an input variable to forecast the GDP of the US, UK, and Australia [17]. Barnett et al. forecasted the GDP growth and inflation of the UK, and a comparison of different models with time-varying parameters is also conducted in their study [18]. Sokolov-Mladenovic et al. proposed an economic growth forecasting method based on ANN and ELM. In their prediction model, the details of trade in services, exports of goods and services, and imports of goods and services are employed to predict the prospective economic growth [19]. Suganthi and Samuel proposed a modified econometric model to illustrate the interaction between energy consumption, economy, technology, and environment. According to the experimental results provided, the performance of their approach compared favorably against the traditional series regression model [20].
2.2. Support Vector Machine. SVM [21,22] can be divided into support vector classification and support vector regression. As a computationally powerful tool for supervised learning, SVM is widely used in classification and regression of various real-world problems [23,24]. SVM is principled and implements structural risk minimization, which minimizes the upper bound of the generalization error. The classical SVM tries to find a function that has at most ε deviation from the actually obtained targets for all the training data and at the same time is as flat as possible. The SVM does not care about errors as long as they are less than ε; however, it will not accept any deviation larger than this. Compared with other data-driven algorithms, e.g., the ANN [25], SVM has better prediction ability and a wider application in a variety of real-world problems.
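Assuming $\{(x_1, y_1), \cdots, (x_l, y_l)\}$ is a set of given training samples, where $x_i \in R^n$ $(i = 1, 2, \cdots, l)$ represents the input variables of each sample, $y_i \in R^m$ $(i = 1, 2, \cdots, l)$ is the corresponding output variable of $x_i$, and $l$ represents the number of training samples, SVM applied to regression solves an optimization problem described as
$$\min_{\omega, b, \xi, \xi^*} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)$$
subject to
$$y_i - \langle \omega, x_i \rangle - b \le \varepsilon + \xi_i, \qquad \langle \omega, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0,$$
with the training error given by the term $\sum_{i=1}^{l} (\xi_i + \xi_i^*)$, where $C$ is a weight parameter for balancing the model complexity and the training error, called the penalty factor; $b$ is a threshold value; $\varepsilon$ is the parameter of the insensitive loss function; $\xi_i$ and $\xi_i^*$ are the upper and lower training errors subject to the $\varepsilon$-insensitive tube $|y_i - \langle \omega, x_i \rangle - b| \le \varepsilon$. If $x_i$ is not in the tube, there is an error $\xi_i$ or $\xi_i^*$, which is to be minimized in the objective function. SVM avoids underfitting and overfitting the training data by minimizing the regularization term $\frac{1}{2}\|\omega\|^2$ and training error as shown in Equation (3) [26].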

Wireless Communications and Mobile Computing
By solving the dual problem in Equation (2), the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ can be obtained, so the regression equation coefficient $\omega$ is
$$\omega = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i.$$
The SVM regression equation is as follows:
$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) + b,$$
where $K(x_i, x)$ is the SVM kernel function. The radial basis function (RBF), which can be formulized as Equation (6), is one of the most widely used kernel functions.
In the RBF-based SVM model, the parameters penalty factor C and width of kernel γ affect the performance of SVM significantly. As a result, different optimization algorithms are used to optimize these two parameters to improve the performance of SVM, e.g., SVM optimized by PSO [27,28], the genetic algorithm (GA) based SVM, and so on [29].
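To make the roles of $\gamma$, $\varepsilon$, and the regression equation concrete, the following is a minimal pure-Python sketch. It assumes the common RBF form $\exp(-\gamma\|x_i - x\|^2)$ and assumes that the coefficients $(\alpha_i - \alpha_i^*)$ have already been obtained from the dual problem; the function names `rbf_kernel`, `svr_predict`, and `eps_insensitive_loss` are illustrative and not taken from the paper.

```python
import math

def rbf_kernel(x_i, x, gamma):
    """RBF kernel exp(-gamma * ||x_i - x||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, coeffs, b, gamma):
    """Evaluate f(x) = sum_i coeffs[i] * K(x_i, x) + b.

    coeffs[i] stands for (alpha_i - alpha_i^*); this sketch assumes the
    dual problem has been solved elsewhere and does not solve it.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs)) + b

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the epsilon tube, linear in the excess deviation outside."""
    return max(0.0, abs(y_true - y_pred) - eps)
```

A larger $\gamma$ makes the kernel decay faster with distance, so each support vector influences a narrower neighbourhood; this is one reason the pair $(C, \gamma)$ has to be tuned together.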

GOV Prediction Model Based on HSPSO-SVM
The existing studies about GOV prediction are mainly based on the GOV time series [8], financial variables [16], and some other economic indicators [30]. As electric power is the most direct production input in the modern society, there must exist interaction between electric power consumption and economic output. In addition, unlike other kinds of energy, electric power cannot be held as inventory. As a result, the variation in power consumption is always regarded as an indicator of the economic situation. Based on the idea that any change in economic output is preceded by a change in power consumption, the electric power consumption data is employed to predict the GOV based on the SVM and HSPSO algorithms. Moreover, some other variables that may affect the economic output are also employed in the proposed model, e.g., the climate conditions, the interactions between different industries, the characteristics of business cycle, the law of time series, and so on.

Structure of GOV Prediction Model.
First of all, the economy is classified into several industries for detailed analysis. To predict the monthly GOV of a certain industry, the former monthly GOVs of this industry for a period are required to reflect the law of time series. Then, considering the relationship between electric power consumption and economic output, the former monthly power consumptions of this industry are employed as the input variables. The former monthly GOVs and power consumptions of other industries, which represent the recent macroeconomic situation and the interactions between different industries, are also included in the model. Moreover, considering that the GOV from January to December in different years may have a similar trend (for example, the GOV of February is always the lowest because of the Chinese spring festival), the monthly sequence number (from 1 to 12) is also regarded as an input variable. The introduced sequence number is used to reflect the aforementioned annual business cycle. Finally, the monthly temperature, which affects the electric power consumption significantly, is also included in the model. As discussed above, to predict the GOV of industry $I_1$ in month $N$, the structure of the prediction model is shown in Figure 1, in which PC denotes the electric power consumption data. Obviously, to predict the GOVs of $n$ industries, $n$ independent models are required.

HSPSO-SVM-Based GOV Prediction.
As shown in Figure 1, the SVM is employed in the GOV prediction model. 11 kinds of variables are used as the input of SVM when predicting the GOV of industry $I_1$, and the output of SVM is the corresponding predicted value. In the proposed model, the RBF is employed as the kernel function because RBF has been shown to outperform the other kernel functions for a lot of problems [31][32][33]. The RBF kernel function is formulized as
$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right),$$
where $\gamma$ represents the width of kernel and varies from 0 to 1.
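The input structure described in this section can be illustrated with a small sketch that assembles one training sample. The number of former months (`lags`), the use of the target month's temperature, and the assumption that index 0 corresponds to January are hypothetical choices made for illustration; the actual layout is fixed by Figure 1.

```python
def build_sample(gov, pc, temperature, target, month_index, lags=2):
    """Assemble one (inputs, output) pair for `target` in month `month_index`.

    gov, pc: dicts mapping industry name -> list of monthly values.
    temperature: list of monthly temperatures.
    lags: number of former months used (hypothetical value).
    """
    if month_index < lags:
        raise ValueError("not enough history for the requested lags")
    past = range(month_index - lags, month_index)
    features = []
    # Former GOVs and power consumptions of the target industry.
    features += [gov[target][m] for m in past]
    features += [pc[target][m] for m in past]
    # Former GOVs and power consumptions of the other industries.
    for name in sorted(gov):
        if name != target:
            features += [gov[name][m] for m in past]
            features += [pc[name][m] for m in past]
    # Month sequence number 1..12, assuming index 0 is a January.
    features.append(month_index % 12 + 1)
    # Temperature of the month being predicted (an assumption of this sketch).
    features.append(temperature[month_index])
    return features, gov[target][month_index]
```

Calling `build_sample` once per industry and per month yields the per-industry training sets, which matches the statement that $n$ industries require $n$ independent models.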
As the penalty factor C and the width of kernel γ affect the regression quality of SVM significantly, these two parameters need to be optimized to improve the generalization ability. The HSPSO algorithm proposed in our previous work has been shown to be effective on real-valued optimization problems [13]. As a result, HSPSO is used to optimize C and γ in order to obtain promising forecasting performance. Specifically, each particle of HSPSO contains 2 dimensions, which represent the parameters C and γ, respectively. Then, the fitness function is defined to evaluate a given pair of values of C and γ by testing a set of samples. Finally, HSPSO is run to find the global optimum, which represents the best values of C and γ under the tested samples.
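The two-dimensional particle encoding can be sketched as follows. This block deliberately uses a plain global-best PSO rather than HSPSO, whose update rules are given in [13] and are not reproduced here; the inertia and acceleration constants and the search range for C are assumptions chosen only for illustration.

```python
import random

def pso_minimize(fitness, bounds, n_particles=20, n_iter=100, seed=0,
                 inertia=0.7, c1=1.5, c2=1.5):
    """Plain global-best PSO (not HSPSO); returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each coordinate to its search interval.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In use, `bounds` would hold one interval for C and one for γ (the text gives 0 to 1 for γ), and `fitness` would be the cross-validation error of an SVM trained with the particle's (C, γ).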
The K-fold cross validation (K-CV) method [34] is used to define the fitness function of HSPSO. To be specific, for given C and γ values, the training samples are divided into K groups. Then, the output of each group is predicted by training the SVM model with the remaining groups. Finally, the mean squared value of all the prediction errors is regarded as the error of the given C and γ values. The fitness function of HSPSO with the K-CV method is formularized as
$$\mathrm{fitness} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i=1}^{N_k} \left(\mathrm{GOV}_i - \widetilde{\mathrm{GOV}}_i\right)^2,$$
where $K$ denotes the number of groups in K-CV; $N_k$ denotes the number of samples within the $k$th group; and $\mathrm{GOV}_i$ and $\widetilde{\mathrm{GOV}}_i$ denote the real value and predicted value of the $i$th GOV. As mentioned above, the flow chart of the GOV prediction model based on HSPSO-SVM is shown in Figure 2.
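The K-CV procedure described above can be sketched in a few lines. The interleaved fold split and the `train_fn` callback are assumptions of this sketch; in the paper's setting `train_fn` would train an SVM with fixed C and γ and return its prediction function.

```python
def kfold_fitness(samples, k, train_fn):
    """Mean over folds of the per-fold mean squared prediction error.

    samples: list of (inputs, target) pairs, with len(samples) >= k.
    train_fn: callable taking a list of training pairs and returning
              a predictor mapping inputs to a predicted value.
    """
    folds = [samples[i::k] for i in range(k)]  # simple interleaved split
    total = 0.0
    for held_out_idx, held_out in enumerate(folds):
        training = [s for j, fold in enumerate(folds)
                    if j != held_out_idx for s in fold]
        predict = train_fn(training)
        sq_err = sum((y - predict(x)) ** 2 for x, y in held_out)
        total += sq_err / len(held_out)
    return total / k
```

Because every sample is held out exactly once, the returned value estimates how a given (C, γ) pair generalizes rather than how well it fits the training data.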

Business Climate Index System
Economic indices are used to directly indicate the economic situation in advance. In the modern society, many economic indices are proposed and applied in the economic analysis, business decision, and policy-making. For example, the purchasing managers' index (PMI) [35] indicates the development or recession of economy. The PMI system, which covers manufacturing industry, tertiary industry, or even construction industry, has been established in many countries. The consumer price index (CPI) [36] reflects the changes in price level of goods and services and can directly indicate the currency inflation or deflation.
In this section, the BCI system is studied to normalize the output of GOV prediction model. The BCI system is reported quarterly and contains the following parts: BCI of each industry (BCI-I), BCI of the secondary industry (BCI-S), BCI of the tertiary industry (BCI-T), and BCI of the whole province (BCI-P). BCI-I and BCI-T are directly computed based on the output of HSPSO-SVM model. BCI-S is a weighted sum of BCI-I, and the weight of each BCI-I is equal to the proportion of the corresponding industry. BCI-P is a weighted sum of BCI-S and BCI-T, and the weights of BCI-S and BCI-T are equal to the proportions of the secondary industry and the tertiary industry, respectively. The aforementioned BCI system can be formularized as Equations (10) to (13).
The two aggregate indices take the form
$$\text{BCI-S} = \sum_{i=1}^{n} \rho_i \, \text{BCI-I}_i, \qquad \text{BCI-P} = \rho_s \, \text{BCI-S} + \rho_t \, \text{BCI-T},$$
where $\rho_i$ represents the weight of the $i$th industry; $\rho_s$ and $\rho_t$ represent the weights of the secondary industry and the tertiary industry, respectively. Note that the primary industry (agriculture) is not yet included in the BCI system. As shown in Equations (10) to (13), the BCI system indicates the business climate from the perspective of year-on-year growth rate. 100.0 is the warning line of BCI, i.e., BCI above 100.0 indicates a growth in GOV and BCI below 100.0 indicates the opposite. Moreover, the farther the BCI is from the warning line, the greater the change that may occur in the business climate.
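The weighted aggregation of BCI-I into BCI-S, and of BCI-S and BCI-T into BCI-P, can be sketched as below. The function names are illustrative, and the "flat" label for an index exactly on the warning line is an assumption, since the text only defines the cases above and below 100.0.

```python
def weighted_index(values, weights):
    """Weighted sum of index values; weights are expected to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    return sum(v * w for v, w in zip(values, weights))

def bci_system(bci_i, rho_i, bci_t, rho_s, rho_t):
    """Return (BCI-S, BCI-P) from the industry indices and their weights."""
    bci_s = weighted_index(bci_i, rho_i)
    bci_p = weighted_index([bci_s, bci_t], [rho_s, rho_t])
    return bci_s, bci_p

def climate_signal(bci, warning_line=100.0):
    """Read an index against the 100.0 warning line."""
    if bci > warning_line:
        return "growth"
    if bci < warning_line:
        return "decline"
    return "flat"  # not defined in the text; assumed here
```

For example, two equally weighted industries with BCI-I of 102.0 and 98.0 give a BCI-S of exactly 100.0, which shows how growth in one industry can be offset by decline in another at the aggregate level.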

Experimental Setup
In this section, the real data of Guangdong province is used to test the proposed prediction approach and the BCI system. Guangdong has more than 38000 industrial enterprises above designated size (IEADS), and about 87% of the GOV of the entire secondary industry is contributed by these enterprises. The detailed GOV and power consumption data of all these IEADSs from 2010 to 2014 are used in the following experiments.

Industries Classification and Data Preprocessing.
According to the economic features of Guangdong province, the secondary industry of Guangdong is classified into the following 11 industries: electronic information, electrical machinery, petrochemical, textile and apparel, food and drinks, construction material, paper and printing, pharmaceuticals, auto, mining and metallurgy, and others. According to this classification, the detailed data of 38000 IEADSs are preprocessed as Figure 3, in which each of the 11 industries has a separate GOV time series and a power consumption time series after preprocessing.

Normalization and Antinormalization.
Based on the prediction model described in Figure 1, the GOV and power consumption data from 2010 to 2014 are transformed into 48 samples. As the order of magnitude of the original GOV and power consumption data is about $10^8$, which exceeds the appropriate input of the RBF kernel, all the original data need to be normalized with [37]
$$y = \frac{(y_{\max} - y_{\min})(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}, \quad (14)$$
where $x$ represents the original data and $y$ represents the data after normalization; $x_{\min}$ and $x_{\max}$ represent the minimum and maximum values of the original data, respectively; and $y_{\min}$ and $y_{\max}$ represent the lower and upper bound of the target range, which are set to 1.0 and 2.0, respectively. After the normalization, all the GOV and power consumption data are normalized into the interval [1, 2]. As a result, the output of the prediction model needs to be antinormalized as
$$\tilde{x} = \frac{(\tilde{y} - y_{\min})(x_{\max} - x_{\min})}{y_{\max} - y_{\min}} + x_{\min}, \quad (15)$$
where $\tilde{y}$ represents the output value of the prediction model, and $\tilde{x}$ represents the absolute predicted GOV after antinormalization. $x_{\min}$, $x_{\max}$, $y_{\min}$, and $y_{\max}$ are the same as in Equation (14).  Table 1.
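The min-max mapping into the target range and its inverse can be written as a pair of small functions. This is a sketch assuming the standard linear min-max form with the target bounds 1.0 and 2.0 stated above; the function names are illustrative.

```python
def normalize(x, x_min, x_max, y_min=1.0, y_max=2.0):
    """Map x linearly from [x_min, x_max] to [y_min, y_max]."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

def antinormalize(y, x_min, x_max, y_min=1.0, y_max=2.0):
    """Inverse mapping: recover the absolute value from a normalized one."""
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min
```

Note that `x_min` and `x_max` must be taken from the same data used for normalization; otherwise the antinormalized prediction is shifted and scaled incorrectly.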

Experimental Result and Analysis
Experimental results are shown in Figure 4, in which the blue line represents the real data, the solid red line represents the fitting data, and the dotted red line represents the predicting data. It can be concluded from Figure 4 that the GOV of each industry can be effectively predicted by HSPSO-SVM. Although the deviations still exist, e.g., 3-2013 of petrochemical industry, 2-2013 of paper and printing industry, and 2-2013 and 7-2013 of pharmaceuticals industry, most of the predicted GOV match the real data well. As the information center of Guangdong power-grid corporation is established in 4

Prediction of BCI-I.
The prediction of quarterly BCI-Is of the 11 industries of Guangdong province is shown in Figure 5, in which the solid line represents the historical data and the dotted line represents the predicted BCI-I. As shown in Figure 5, the historical and the prospective business climate of each industry can be directly indicated by BCI-I. For example, all the BCI-I curves in 2013 are above the warning line except that of the auto industry, which indicates that the business climate of most industries in 2013 is fine. Specifically, according to the BCI-I curves of 2013 shown in Figure 5, the growth rates of the electronic information industry, the textile and apparel industry, and the auto industry will speed up significantly, especially the auto industry (BCI-Is of auto are predicted to be 98.80, 100.86, 101.56, and 105.55 in the four quarters of 2013, respectively). BCI-Is of the mining and metallurgy industry, construction material industry, and food and drinks industry remain at a relatively high level above the warning line, which indicates that these industries will achieve a steady growth. Finally, the BCI-I curves of the remaining 5 industries are above the warning line but fluctuate significantly, which indicates a variation in the growth rate.
In the first quarter of 2014, as summarized in Figure 6, all the BCI-Is are above the warning line except that of the others industry, which indicates that all the industries can achieve different degrees of growth in GOV, except for a slight drop in the others industry. Specifically, in the first quarter of 2014, the 11 industries can be classified into 3 groups to be analyzed:
(1) BCI-Is of construction material, auto, and pharmaceuticals industries, which are colored green in Figure 6, are predicted to be the top 3 industries. BCI-I of the construction material industry reaches 101.82 in the first quarter of 2014, which is predicted to be the fastest growing industry; BCI-Is of auto and pharmaceuticals industries reach 101.77 and 101.72, respectively, which are also predicted to achieve fast growth rates
(2) BCI-Is of the mining and metallurgy industry, petrochemical industry, and others industry, which are colored red in Figure 6, are predicted to be the last 3 industries. BCI-Is of the mining and metallurgy industry and petrochemical industry are predicted to be 100.42 and 100.37, respectively, which indicates a relatively low growth rate. BCI-I of the others industry is 99.98, which is a little below the 100.0 warning line, indicating that a slight decline may occur in this industry
(3) BCI-Is of the remaining 5 industries, electronic information, food and drinks, electrical machinery, paper and printing, and textile and apparel, which are colored blue in Figure 6, are predicted to achieve moderate growth rates in the first quarter of 2014
Compared with the real data, in the first quarter of 2014, the GOVs of construction material, auto, and pharmaceuticals industries are 66.27, 96.32, and 18.99 billion RMB, respectively, which achieve the year-on-year growth rates of 17.85%, 16.04%, and 15.76% and are the fastest growing industries. The GOVs of mining and metallurgy, petrochemical, and others industries are 87.06, 216.17, and 333.24 billion RMB, respectively, which achieve the year-on-year […] 10.50%, 6.52%, 10.37%, and 3.21%, respectively, are moderately growing industries. As discussed above, the prediction of BCI-I based on the HSPSO-SVM model matches the real data well and can effectively indicate the business climate in advance.
6.3. Prediction of BCI-S, BCI-T, and BCI-P. According to the real data of 2013, the GOV of the secondary industry reached 10790.76 billion RMB, and the GOV of the tertiary industry reached 2968.90 billion RMB. As discussed in Section 6.1, the weights of the secondary industry and the tertiary industry are set to 78.42% and 21.58%, respectively. Moreover, within the secondary industry, the proportions of the 11 industries are shown in Figure 7, and the weight of each industry is listed in Table 2. According to the predicted GOV based on the HSPSO-SVM model, BCI-S, BCI-T, and BCI-P in the first quarter of 2014 are shown in Figure 8. BCIs of the same period in 2013 are also listed in Figure 8 for comparison. As shown in Figure 8, all the BCIs are above the 100.0 warning line, which indicates that the business climates of the secondary industry, the tertiary industry, and the whole society of Guangdong province in the first quarter of 2014 are fine. BCI-S and BCI-P in the first quarter of 2014 are close to those of 2013, which indicates that the business climates of the secondary industry and the whole society are similar to those in 2013. However, there is a 0.13 drop in BCI-T, which indicates that the growth rate of the tertiary industry in the first quarter of 2014 may be a little lower than that of 2013. According to the real data, the GOV of the secondary industry in the first quarter of 2014 is 1649.15 billion RMB, with a year-on-year growth rate of 9.46%, which is a little higher than the 1506.59 billion RMB and 8.07% in 2013. The GOV of the tertiary industry in the first quarter of 2014 is 673.87 billion RMB, with a year-on-year growth rate of 10.10%, which is lower than the 11.82% in 2013. As analyzed above, it can be concluded that the predictions of BCI-S, BCI-T, and BCI-P based on the HSPSO-SVM model match the real statistical data well. The prediction of BCI can effectively indicate the business climate in advance, which is of great value for economic-decision and policy-making of both enterprises and government.
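The weights and the year-on-year growth rate quoted in this subsection follow from simple arithmetic on the reported GOV figures, as the short check below illustrates. The helper names `shares` and `yoy_growth` are illustrative only.

```python
def shares(gov_secondary, gov_tertiary):
    """Proportions of the secondary and tertiary industries in their combined GOV."""
    total = gov_secondary + gov_tertiary
    return gov_secondary / total, gov_tertiary / total

def yoy_growth(current, previous):
    """Year-on-year growth rate as a fraction."""
    return current / previous - 1.0

# 2013 GOVs in billion RMB, as reported in the text.
rho_s, rho_t = shares(10790.76, 2968.90)
print(f"{rho_s:.2%} {rho_t:.2%}")            # 78.42% 21.58%

# First-quarter GOV of the secondary industry: 2014 versus 2013.
print(f"{yoy_growth(1649.15, 1506.59):.2%}")  # 9.46%
```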

Conclusions
A novel approach to predict the business climate is proposed in this paper, and the BCI system is also established. The output of this study can be summarized as follows.
(1) The GOV of each industry is effectively predicted by the electric power consumptions, the law of GOV time series, the macroeconomic situation, the interactions between different industries, and the climate conditions
(2) The SVM is employed as the GOV prediction model, and the HSPSO algorithm in our previous work is used to optimize the parameters of the proposed model to improve the generalization ability
(3) The BCI system is established to normalize the output of the GOV prediction model. BCI-I, BCI-S, BCI-T, and BCI-P are included in the BCI system and can be used to indicate the business climate of each industry, the secondary industry, the tertiary industry, and the whole society, respectively
The real data of Guangdong province, including the GOV and electric power consumption data of more than 38000 IEADSs, are used to test the performance of the proposed approach. Experimental results show that the predicted GOV of each industry matches the real data well, and the proposed BCI system can directly and effectively indicate the business climate of Guangdong province in advance. In short, this paper provides an approach to predict the business climate by electric power consumption and some other related variables. The proposed approach has been applied in Guangdong power-grid corporation and achieved promising performance.

BCI: Business climate index
BCI-S: BCI of the secondary industry
BCI-P: BCI of the whole province
IEADS: Industrial enterprises above designated size
SVM: Support vector machine
HSPSO-SVM: Support vector machine optimized by HSPSO
ANN: Artificial neural network
GDP: Gross domestic product
PMI: Purchasing managers' index
BCI-I: BCI of each industry
BCI-T: BCI of the tertiary industry
GOV: Gross output value
PSO: Particle swarm optimization
K-CV: K-fold cross validation
HSPSO: Human-simulated PSO
ELM: Extreme learning machine
GNP: Gross national product
CPI: Consumer price index.

Data Availability
The prior studies and data are cited at relevant places within the text as references [12].