The energy consumption forecast is important for decision-making on national economic and energy policies. However, it is a complex, uncertain system problem affected by the external environment and various uncertain factors. Herein, a novel clustering model based on set pair analysis (SPA) was introduced to analyze and predict energy consumption. The annual dynamic relative indicator (DRI) of historical energy consumption was adopted to conduct a cluster analysis with Fisher's optimal partition method. Combined with indicator weights, the group centroids of the DRIs of the influence factors were transformed into aggregated connection numbers in order to interpret uncertainty through identity-discrepancy-contrary (IDC) analysis. Moreover, a forecasting model based on similarity to the group centroids was developed to forecast the energy consumption of a given year from measured values of the influence factors. Finally, a case study predicting China's future energy consumption, together with a comparison against the grey method, was conducted to confirm the reliability and validity of the model. The results indicate that the proposed method is more feasible and easier to use and can interpret, as a whole, the certainty and uncertainty in the development speed of energy consumption and its influence factors.
China is currently in the middle stage of industrialization and urbanization and is the world's second largest energy consumer. Energy is an essential material basis for economic development, and China's energy consumption has skyrocketed along with rapid and steady economic growth, industrialization, and urbanization, resulting in a serious imbalance between energy supply and demand [
Many researchers have studied the relation between energy consumption and economic growth at the national or regional level and have proposed forecast models for countries such as Turkey, India, Iran, the UK, Finland, New Zealand, and China [
This paper introduces a novel clustering model based on SPA for energy consumption prediction that deals with the uncertain relation between energy consumption and its influence factors. The proposed model is then used to forecast China's energy consumption, and its feasibility and effectiveness are discussed.
The set pair analysis theory put forward by Zhao [
In practical applications, SPA theory describes both certainty and uncertainty within one system through connection numbers. Moreover, the IDC analysis of a set pair is a dynamic process in which identity can be transformed into discrepancy or contrary as conditions change. SPA can therefore be used to interpret the uncertainty of energy consumption forecasts clearly.
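For reference, a connection number is commonly written in SPA in the three-term form below, where a, b, and c are the identity, discrepancy, and contrary degrees. This is the usual SPA convention, stated here as an assumption about the notation intended in this paper rather than a reproduction of its own definition.

```latex
\mu = a + b\,i + c\,j, \qquad a + b + c = 1, \quad a, b, c \in [0, 1], \quad i \in [-1, 1], \quad j = -1
```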
Cluster analysis is widely used in pattern recognition, image analysis, information retrieval, and other fields. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals, or particular statistical distributions [
The basic principle of the SPA-based clustering model is as follows. First, based on the ordered DRIs of historical energy consumption, conduct a cluster analysis with Fisher's optimal partition method to obtain categories of development speed. Then analyze the uncertainty of the influence factors through identity-discrepancy-contrary analysis. Finally, forecast the energy consumption of a given year according to the similarities between the measured influence factors and each category. The corresponding flow chart is shown in Figure
Flow chart of the SPA-based cluster forecast model.
Conduct a cluster analysis with Fisher's optimal partition method on the historical energy consumption DRIs sorted in ascending order, and calculate the group centroid values of energy consumption and the influence factors for each cluster.
Construct a criterion for the set pair analysis and use the corresponding formulas to transform the mean DRIs into connection numbers, giving the identity degree, discrepancy degree, and contrary degree between the influence factors and the reference set. Then, combining these with the weights of the influence factors, obtain the integrated connection numbers.
Express the influence factors of the year to be forecast as connection numbers obtained from the IDC analysis, and calculate their similarities to each cluster.
Construct a forecasting model and predict the energy consumption of the specified year.
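The first of these steps, Fisher's optimal partition of the ascending DRIs into contiguous groups, can be sketched as a dynamic program. This is a minimal sketch assuming the common sum-of-squared-deviations segment diameter as the objective; `segment_cost`, `fisher_partition`, and `ENERGY_DRI` are hypothetical names, and the sketch is not claimed to reproduce the paper's exact partition.

```python
from typing import List, Tuple

# Energy-consumption DRIs in ascending order, copied from the ordered DRI table.
ENERGY_DRI = [
    1.0022, 1.0052, 1.0305, 1.0323, 1.0337, 1.0349, 1.0389, 1.0517, 1.0520, 1.0522,
    1.0578, 1.0598, 1.0623, 1.0693, 1.0843, 1.0962, 1.1054, 1.1531, 1.1616,
]


def segment_cost(xs: List[float], start: int, end: int) -> float:
    """Diameter of xs[start:end]: sum of squared deviations from the segment mean."""
    seg = xs[start:end]
    mean = sum(seg) / len(seg)
    return sum((x - mean) ** 2 for x in seg)


def fisher_partition(xs: List[float], k: int) -> Tuple[List[List[float]], float]:
    """Split ordered xs into k contiguous groups minimising the total diameter.

    best[m][e] is the minimum loss of splitting the first e points into m groups;
    cut[m][e] remembers where the last of those m groups starts.
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(xs)")
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for end in range(m, n + 1):
            for start in range(m - 1, end):  # last group is xs[start:end]
                prev = best[m - 1][start]
                if prev == inf:
                    continue
                cand = prev + segment_cost(xs, start, end)
                if cand < best[m][end]:
                    best[m][end] = cand
                    cut[m][end] = start
    # Walk the stored cut points backwards to recover the groups.
    groups, end = [], n
    for m in range(k, 0, -1):
        start = cut[m][end]
        groups.append(xs[start:end])
        end = start
    groups.reverse()
    return groups, best[k][n]


if __name__ == "__main__":
    # Loss versus number of clusters, the quantity behind the cluster-count figure.
    for k in range(1, 7):
        groups, loss = fisher_partition(ENERGY_DRI, k)
        print(k, [len(g) for g in groups], round(loss, 6))
```

Because the optimal loss can only stay the same or fall as k grows, the printed losses are non-increasing; the number of clusters is then chosen where the decrease levels off.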
To combine the concept of IDC with the development speed, suppose that the reference set is
Let
Coefficient value scope of
The intersection point of two linear functions (
Let set
The similarity between set pairs
The model is applied to analyze and predict China's energy consumption, using data from the China Statistical Yearbook to confirm its validity and effectiveness [
Energy consumption and its influence factors in China from 1990 to 2010.
Year  Energy consumption  GDP  Proportion of secondary industry  Urbanization level  Price index
1990  9.87  18.55  41.34  26.41  100 
1991  10.38  20.25  41.79  26.94  112.97 
1992  10.92  23.13  43.44  27.46  131.45 
1993  11.6  26.36  46.57  27.99  179.71 
1994  12.27  29.81  46.57  28.51  212.03 
1995  13.12  33.07  47.18  29.04  230.51 
1996  13.52  36.38  47.54  30.48  253.99 
1997  13.59  39.76  47.54  31.91  277.61 
1998  13.62  42.88  46.21  33.35  275.14 
1999  14.06  46.14  45.76  34.78  277.61 
2000  14.55  50.04  45.92  36.22  320.36 
2001  15.04  54.19  45.05  37.66  321.01 
2002  15.94  59.11  44.79  39.09  321.3 
2003  18.38  65.04  45.97  40.53  345.07 
2004  21.35  71.59  46.23  41.76  378.55 
2005  23.6  79.69  47.37  42.99  435.36 
2006  25.87  89.79  47.95  44.34  487.17 
2007  28.05  102.51  47.34  45.89  508.12 
2008  29.14  112.39  47.45  46.99  612.75 
2009  30.66  122.74  46.3  48.34  546.59 
2010  32.5  135.39  46.75  49.95  635.65 
DRIs of energy consumption in ascending order and the corresponding DRIs of influence factors.
Year  DRI of energy consumption  DRI of GDP  DRI of proportion of secondary industry  DRI of urbanization level  DRI of price index
1998  1.0022  1.0785  0.9720  1.0451  0.9911 
1997  1.0052  1.0929  1.0000  1.0469  1.0930 
1996  1.0305  1.1001  1.0076  1.0496  1.1019 
1999  1.0323  1.0760  0.9903  1.0429  1.0090 
2001  1.0337  1.0829  0.9811  1.0398  1.0020 
2000  1.0349  1.0845  1.0035  1.0414  1.1540 
2008  1.0389  1.0964  1.0023  1.0240  1.2059 
1991  1.0517  1.0916  1.0109  1.0201  1.1297 
1992  1.0520  1.1422  1.0395  1.0193  1.1636 
2009  1.0522  1.0921  0.9758  1.0287  0.8920 
1994  1.0578  1.1309  1.0000  1.0186  1.1798 
2002  1.0598  1.0908  0.9942  1.0380  1.0009 
1993  1.0623  1.1396  1.0721  1.0193  1.3671 
1995  1.0693  1.1094  1.0131  1.0186  1.0872 
2007  1.0843  1.1417  0.9873  1.0350  1.0430 
2006  1.0962  1.1267  1.0122  1.0314  1.1190 
2005  1.1054  1.1131  1.0247  1.0295  1.1501 
2003  1.1531  1.1003  1.0263  1.0368  1.0740 
2004  1.1616  1.1007  1.0057  1.0303  1.0970 
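The DRI of a year is its value divided by the previous year's value; sorting the years by energy DRI gives the ordering in the table above. A minimal sketch, assuming 2010 is held out for testing as in the case study; `dri` and `ascending_years` are hypothetical helper names.

```python
from typing import Dict, Iterable, List

# Annual energy consumption 1990-2010, copied from the first table (same units).
ENERGY = dict(zip(range(1990, 2011), [
    9.87, 10.38, 10.92, 11.6, 12.27, 13.12, 13.52, 13.59, 13.62, 14.06, 14.55,
    15.04, 15.94, 18.38, 21.35, 23.6, 25.87, 28.05, 29.14, 30.66, 32.5,
]))


def dri(series: Dict[int, float]) -> Dict[int, float]:
    """Dynamic relative indicator: value in year t divided by value in year t - 1."""
    years = sorted(series)
    return {y: series[y] / series[y - 1] for y in years[1:]}


def ascending_years(indicators: Dict[int, float], exclude: Iterable[int] = ()) -> List[int]:
    """Years ordered by ascending DRI, leaving out held-out years."""
    skip = set(exclude)
    return sorted((y for y in indicators if y not in skip), key=lambda y: indicators[y])


if __name__ == "__main__":
    energy_dri = dri(ENERGY)
    for year in ascending_years(energy_dri, exclude={2010}):
        print(year, round(energy_dri[year], 4))
```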
According to Table
DRI intervals of energy consumption and mean DRIs of influence factors for each cluster.
Category  DRI interval for energy consumption  Number of samples  DRI of GDP  DRI of proportion of secondary industry  DRI of urbanization level  DRI of price index
1  [1.00, 1.05)  7  1.0873  0.9938  1.0414  1.0796
2  [1.05, 1.08)  7  1.1138  1.0151  1.0232  1.1172
3  [1.08, 1.10)  3  1.1272  1.0081  1.0319  1.1040
4  [1.10, 1.20]  2  1.1005  1.0160  1.0336  1.0855
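Given the sample counts in the table above (7, 7, 3, and 2, taken in ascending order of energy DRI), the group centroids are plain column means over consecutive rows of the ordered DRI table. The sketch below recomputes them; the cluster sizes are read from the table rather than produced by a fresh partition, and `group_centroids` is a hypothetical name.

```python
from typing import List, Sequence

# Ordered DRI table: (year, energy, GDP, secondary-industry share, urbanization, price index).
ROWS = [
    (1998, 1.0022, 1.0785, 0.9720, 1.0451, 0.9911),
    (1997, 1.0052, 1.0929, 1.0000, 1.0469, 1.0930),
    (1996, 1.0305, 1.1001, 1.0076, 1.0496, 1.1019),
    (1999, 1.0323, 1.0760, 0.9903, 1.0429, 1.0090),
    (2001, 1.0337, 1.0829, 0.9811, 1.0398, 1.0020),
    (2000, 1.0349, 1.0845, 1.0035, 1.0414, 1.1540),
    (2008, 1.0389, 1.0964, 1.0023, 1.0240, 1.2059),
    (1991, 1.0517, 1.0916, 1.0109, 1.0201, 1.1297),
    (1992, 1.0520, 1.1422, 1.0395, 1.0193, 1.1636),
    (2009, 1.0522, 1.0921, 0.9758, 1.0287, 0.8920),
    (1994, 1.0578, 1.1309, 1.0000, 1.0186, 1.1798),
    (2002, 1.0598, 1.0908, 0.9942, 1.0380, 1.0009),
    (1993, 1.0623, 1.1396, 1.0721, 1.0193, 1.3671),
    (1995, 1.0693, 1.1094, 1.0131, 1.0186, 1.0872),
    (2007, 1.0843, 1.1417, 0.9873, 1.0350, 1.0430),
    (2006, 1.0962, 1.1267, 1.0122, 1.0314, 1.1190),
    (2005, 1.1054, 1.1131, 1.0247, 1.0295, 1.1501),
    (2003, 1.1531, 1.1003, 1.0263, 1.0368, 1.0740),
    (2004, 1.1616, 1.1007, 1.0057, 1.0303, 1.0970),
]
CLUSTER_SIZES = [7, 7, 3, 2]  # samples per category, read from the cluster table


def group_centroids(rows: Sequence[Sequence[float]], sizes: Sequence[int]) -> List[List[float]]:
    """Column means (excluding the year) of consecutive row blocks of the given sizes."""
    if sum(sizes) != len(rows):
        raise ValueError("cluster sizes must cover every row exactly once")
    centroids, start = [], 0
    for size in sizes:
        block = rows[start:start + size]
        centroids.append([sum(r[c] for r in block) / size for c in range(1, len(block[0]))])
        start += size
    return centroids


if __name__ == "__main__":
    for cat, means in enumerate(group_centroids(ROWS, CLUSTER_SIZES), start=1):
        print(cat, [round(m, 4) for m in means])
```

Each returned row starts with the mean energy DRI of the category, followed by the four factor means shown in the table.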
Connection numbers between influence factors and the reference set.
Influence factor  Category 1  Category 2  Category 3  Category 4
GDP  0.544 + 0.272  0.557 + 0.264  0.564 + 0.259  0.550 + 0.268
Proportion of secondary industry  0.497 + 0.302  0.508 + 0.295  0.504 + 0.298  0.508 + 0.295
Urbanization level  0.521 + 0.287  0.512 + 0.290  0.516 + 0.290  0.517 + 0.290
Price index  0.540 + 0.275  0.559 + 0.262  0.552 + 0.267  0.543 + 0.273
Integrated connection number  0.525 + 0.284  0.534 + 0.279  0.534 + 0.278  0.529 + 0.281
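The discussion of the case study states that the same weight was used for every influence factor. Treating the integrated connection number as a component-wise weighted average of the per-factor connection numbers, with weights of 0.25, gives values within about 0.001 of the "Integrated connection number" row (category 1, for example, gives 0.5255 + 0.284). The component-wise averaging, the restriction to the two printed components, and the names `integrate` and `TABLE` are assumptions made for this sketch.

```python
from typing import Sequence, Tuple

# (a, b): the two components printed in the connection-number table.
ConnectionNumber = Tuple[float, float]


def integrate(numbers: Sequence[ConnectionNumber], weights: Sequence[float]) -> ConnectionNumber:
    """Weighted average of connection numbers, taken component by component."""
    if len(numbers) != len(weights):
        raise ValueError("one weight per influence factor is required")
    total = sum(weights)
    a = sum(w * n[0] for w, n in zip(weights, numbers)) / total
    b = sum(w * n[1] for w, n in zip(weights, numbers)) / total
    return a, b


# Per-factor connection numbers (GDP, secondary-industry share, urbanization, price index).
TABLE = {
    1: [(0.544, 0.272), (0.497, 0.302), (0.521, 0.287), (0.540, 0.275)],
    2: [(0.557, 0.264), (0.508, 0.295), (0.512, 0.290), (0.559, 0.262)],
    3: [(0.564, 0.259), (0.504, 0.298), (0.516, 0.290), (0.552, 0.267)],
    4: [(0.550, 0.268), (0.508, 0.295), (0.517, 0.290), (0.543, 0.273)],
}
EQUAL_WEIGHTS = [0.25, 0.25, 0.25, 0.25]

if __name__ == "__main__":
    for cat, nums in TABLE.items():
        a, b = integrate(nums, EQUAL_WEIGHTS)
        print(cat, round(a, 4), round(b, 4))
```

Replacing `EQUAL_WEIGHTS` with measured indicator weights is the change the discussion suggests for future work.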
Relationship between the number of clusters and the objective function.
China's actual energy consumption data for 2010 were used to test and verify the model (see Table
Forecasted results and comparison with the grey method.
Year  Integrated connection number  Similarity to category 1  Similarity to category 2  Similarity to category 3  Similarity to category 4  Proposed model  GM(1, 1) model
2010  0.539 + 0.275  0.0167  0.0062  0.0059  0.0115  1.085  1.024
2015  0.528 + 0.282  0.0035  0.0070  0.0073  0.0017  1.105  1.051
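The tabulated numbers are consistent with a simple scheme: the similarity values behave like Euclidean distances between the (a, b) pairs of the integrated connection numbers (smaller means closer), and the proposed-model forecast equals the inverse-distance-weighted mean of each category's mean energy DRI. The sketch below implements that reading. It is a reconstruction inferred from the table values, not the paper's stated formula; `distances`, `forecast_dri`, and the constants are hypothetical names, and the category energy DRIs are means computed from the ordered DRI table using the 7, 7, 3, 2 grouping.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Integrated connection numbers (a, b) of the four categories.
CATEGORY_ICN = [(0.525, 0.284), (0.534, 0.279), (0.534, 0.278), (0.529, 0.281)]
# Mean energy DRI per category (computed from the ordered DRI table).
CATEGORY_ENERGY_DRI = [1.0254, 1.0579, 1.0953, 1.1574]


def distances(icn: Point, categories: Sequence[Point]) -> List[float]:
    """Euclidean distance from a year's (a, b) pair to each category's (a, b) pair."""
    return [math.dist(icn, c) for c in categories]


def forecast_dri(dists: Sequence[float], centroids: Sequence[float]) -> float:
    """Inverse-distance-weighted mean of category energy DRIs; closer categories weigh more."""
    for d, c in zip(dists, centroids):
        if d == 0.0:
            return c  # identical to a category: use its centroid directly
    weights = [1.0 / d for d in dists]
    return sum(w * c for w, c in zip(weights, centroids)) / sum(weights)


if __name__ == "__main__":
    print([round(d, 4) for d in distances((0.539, 0.275), CATEGORY_ICN)])
    print(round(forecast_dri([0.0167, 0.0062, 0.0059, 0.0115], CATEGORY_ENERGY_DRI), 3))
```

Fed the tabulated similarity values, this returns about 1.085 for 2010 and about 1.105 for 2015. Recomputing the distances from the three-decimal connection numbers reproduces the tabulated similarities only to within about 0.0005, so small rounding differences propagate into the forecast.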
As noted above, the proposed method overcomes the drawbacks of conventional methods that rely on a single type of information and a static perspective. It can provide a more comprehensive basis for characterizing energy consumption and for formulating appropriate energy policies in a developing country. However, energy consumption forecasting involves factors of incompatibility, complexity and diversity, combination, and dynamic uncertainty, so it is important to clarify how these factors affect the forecast over different time frames. The equal weights assigned to the influence factors when calculating the integrated connection degree in the case study may neglect the differing importance of the indicators and their effect on the prediction. To identify the most sensitive parameters and improve forecast accuracy, considerable further work is required on sensitivity analysis and on comparison with other methods, using actual indicator weights.
Based on the mean values of the influence factors over the 5 years from 2006 to 2010, the energy consumption in 2015 can be predicted with the proposed method, as shown in Table
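The 2015 inputs are described as mean values of the influence factors over 2006-2010. Reading this as the arithmetic mean of each factor's yearly DRI over that window (an assumption; 2005 is included only to supply the 2006 ratio), the calculation from the first table is short. `FACTORS` and `mean_dri` are hypothetical names.

```python
from statistics import mean
from typing import Dict, List

# Factor levels for 2005-2010, copied from the first table.
FACTORS: Dict[str, List[float]] = {
    "GDP": [79.69, 89.79, 102.51, 112.39, 122.74, 135.39],
    "secondary_share": [47.37, 47.95, 47.34, 47.45, 46.3, 46.75],
    "urbanization": [42.99, 44.34, 45.89, 46.99, 48.34, 49.95],
    "price_index": [435.36, 487.17, 508.12, 612.75, 546.59, 635.65],
}


def mean_dri(levels: List[float]) -> float:
    """Arithmetic mean of the year-on-year ratios (DRIs) across the window."""
    return mean([b / a for a, b in zip(levels, levels[1:])])


if __name__ == "__main__":
    for name, levels in FACTORS.items():
        print(name, round(mean_dri(levels), 4))
```

A geometric mean of the ratios would be an equally plausible reading; the two differ slightly for volatile series such as the price index.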
To provide reliable data for macroeconomic policy decision-making, a rational energy consumption forecast model is of great significance, because well-targeted policies and reasonable measures depend on a rational forecast of energy consumption. However, energy consumption forecasting is a complex and uncertain problem owing to interacting factors. In this study, based on historical data on China's energy consumption and its influence factors, a novel clustering forecast model based on SPA was presented to analyze energy consumption. The following conclusions can be drawn.
The results indicate that the novel method is feasible, effective, and convenient for forecasting energy consumption in practice. This cluster forecast model also offers a potential approach to other problems involving uncertainty.
Expressing the group centroids of the influence factors as connection numbers depicts, as a whole, the certainty and uncertainty of their development speed.
Because the model is based on the similarities of the DRIs of influence factors, it takes into account both the interaction among the influence factors and the similarity between historical samples and the prediction object. Although this work provides a useful clustering tool that makes full use of similar information from historical samples for energy consumption forecasting and analyzes the certainty and uncertainty of evaluation indicators from three aspects (identity, discrepancy, and contrary), further investigation with sensitivity analysis is in progress to clarify the effects of the indicators on the prediction over various time frames.
The authors declare that there is no conflict of interests regarding the publication of this paper.
Financial support provided by the National Natural Science Foundation of China (nos. 41172274 and 71273081) is gratefully acknowledged. The authors also express their sincere thanks to the reviewers for their thorough reviews and useful suggestions.