The electricity consumption of metro stations increases sharply as a metro network expands, and this has been a growing cause for concern. Based on relevant historical data from existing metro stations, this paper proposes a support vector regression (SVR) model to estimate the daily electricity consumption of a newly constructed metro station. The model considers the major factors influencing the electricity consumption of a metro station (ECMS) in terms of both the interior design scheme of a station (e.g., layout of the station and allocation of facilities) and external factors (e.g., passenger volume, air temperature and relative humidity). A genetic algorithm with fivefold cross-validation is used to optimize the hyperparameters of the SVR model in order to improve its accuracy in estimating the ECMS. With the optimized hyperparameters, results from case studies on the Beijing Subway showed that the estimation accuracy of the proposed SVR model could reach up to 95% and the correlation coefficient was 0.89. The proposed model was shown to outperform traditional methods that use a backpropagation neural network or multivariate linear regression. The method presented in this paper can be an adequate tool for estimating the ECMS and should further assist in the delivery of new, energy-efficient metro stations.
A metro system plays an important role among urban mass transit systems and has a number of advantages over other public transportation modes in metropolitan areas, such as more reliable services, the ability to transport a much larger volume of passengers, and greater environmental friendliness. In China, metro networks have expanded rapidly in recent decades as the population has urbanized nationwide. For example, the total length of the metro network in Beijing has reached about 591.7 km, connecting a total of 361 stations, as of 2017; by the end of 2020, the network is projected to expand to over 900 km and is expected to accommodate 552 stations in total.
Although the metro is one of the most energy-efficient transportation modes, the electricity consumption of a metro system rises significantly with the continuous increase in its operating mileage. Data from the Beijing Subway show that the total electricity consumption of all lines amounted to 1.71 billion kWh in 2016, nearly three times the amount in 2010. Clearly, there is serious cause for concern that the overall electricity consumption of the metro system will continue to rise as its network keeps expanding. Furthermore, according to statistical data from the Beijing Subway, the electricity consumption of a metro station (ECMS) accounts for approximately half of the total electricity consumption of an entire metro system. Reducing the ECMS is therefore of great significance for cutting the total electricity consumption of a metro system [
The ECMS describes the full amount of electricity consumed within a metro station, involving all of its subsystems such as HVAC (heating, ventilation and air-conditioning), lighting, and other facilities (e.g., platform screen doors and escalators) [
Apart from the above-mentioned facilities and equipment, the impact of the station layout on the ECMS has not been explicitly considered in existing studies. It turns out that the ECMS may differ markedly across stations with similar passenger volumes and the same facilities. In other words, the ECMS depends largely on the design scheme, especially the spatial structure of the station. Therefore, it is important to establish an ECMS estimation model that considers the design of metro stations, as an essential tool for designing energy-saving metro stations.
Energy auditing and the identification of the critical factors influencing the ECMS are the foundation for developing ECMS estimation models. Fu and Deng [
In recent years, a series of linear models have been proposed for estimating energy consumed by stations, assuming a linear relationship between the energy consumption and its influencing factors. A few examples are as follows. Yang et al. [
To improve the prediction accuracy, the nonlinear backpropagation neural network (BPNN) model has been applied to predict the ECMS based on historical data in the metro system of Hong Kong [
Another issue stems from the size of the data available for model training. With respect to the prediction of ECMS, a metro network at the early stages of development may not provide sufficiently large data samples. A fairly small data sample would also render the BPNN model ineffective [
In this study, an SVR-based model is developed to predict the daily ECMS, considering both the internal (e.g., layout of a station) and external factors with respect to operating a station. Given that the hyperparameters of the model may have a significant impact on its prediction accuracy [
The remainder of the paper is organized as follows. Section
In this section, some major factors influencing the ECMS are described in detail. These factors, which serve as input variables in the proposed model, can be generally categorized into two groups: 1) factors relating to the interior design scheme and 2) other external factors to consider.
Previous studies (e.g., [
In metro stations, escalators and elevators are installed to improve the quality or security of the service during operating periods. The electrical power consumed by these vertical transportation (VT) facilities is related to their quantity and height, which are therefore included in the model specification as input variables.
As mentioned above, platform screen doors installed on platforms may also be related to the ECMS, as they can effectively minimize the additional heat load from the underground tunnels when the platform edge is fully enclosed. This factor is not considered in this study since most of the subway lines in Beijing are equipped with screen doors.
Weather is a key factor influencing the cooling load of the centralized air-conditioning and ventilation subsystems of metro stations. Air from outside the stations brings in a certain amount of heat and moisture, which increases the cooling load of the air-conditioning subsystem. In this regard, the relative humidity and temperature of the outdoor air should be taken into account as input variables of the prediction model.
In addition to weather, passenger-flow volume is another major external factor that contributes to the ECMS. The internal heat of a metro station builds up as more passengers enter the station. Therefore, the total number of passengers entering or leaving a metro station should also be taken as an input variable of the model.
As discussed above, the input variables of the ECMS prediction model are listed in Table
Summary of input variables.
Classifications  Influencing factors  Input variables  Unit
Interior design scheme of metro station  Concourse scale  The area of concourse  m^{2}
Interior design scheme of metro station  Platform scale  The area of platform  m^{2}
Interior design scheme of metro station  Plant room scale  The area of plant room  m^{2}
Interior design scheme of metro station  Staff accommodation room scale  The area of staff accommodation room  m^{2}
Interior design scheme of metro station  VT facilities  The quantity of VT facilities  -
Interior design scheme of metro station  VT facilities  The height of VT facilities  m
External factors  Temperature  Average temperature  °C
External factors  Humidity  Relative humidity  %
External factors  Passenger demand  The total number of passengers  Person/day
This section describes the development of an SVR-based model for estimating the ECMS, with its three hyperparameters (denoted by
Let
In Equation (
where
The loss function defines a “tube” (see Figure
Graphical details of
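As a rough illustration of the "tube" idea, the following minimal sketch assumes the standard ε-insensitive loss used in SVR, under which deviations smaller than the tube half-width incur no penalty. The function name and the numeric values are hypothetical and are not taken from the paper.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Standard epsilon-insensitive loss for one sample (assumed form).

    Errors whose magnitude is at most `epsilon` fall inside the tube and
    cost nothing; larger errors are penalized linearly beyond the tube.
    """
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical values, chosen only to show both cases.
inside_tube = epsilon_insensitive_loss(10.0, 10.05, epsilon=0.1)
outside_tube = epsilon_insensitive_loss(10.0, 10.3, epsilon=0.1)
```

Here `inside_tube` is zero because the error is smaller than the tube half-width, while `outside_tube` equals the part of the error that exceeds it.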
By introducing the positive slack variables
The optimization problem formulated in Equation (
where both
Finally, by introducing a kernel function,
Using the kernel function, the problem can be solved in a feature space of any dimension without explicitly calculating the map function
where
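The kernel-based regression function can be sketched as follows. This is only an illustration under stated assumptions: it assumes a radial basis function (RBF) kernel with a width parameter called `gamma` here, and the usual dual form in which the prediction is a kernel-weighted sum over support vectors plus a bias. The support vectors, coefficients and bias below are hypothetical and do not come from the paper.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) (assumed kernel)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + bias.

    Each coef_i stands for the difference of the two Lagrange
    multipliers of support vector x_i in the dual SVR problem.
    """
    weighted = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return weighted + bias


# Hypothetical trained quantities, for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs,
                         bias=0.1, gamma=0.5)
```

Note that the mapping into the feature space never appears in the code: only kernel evaluations between pairs of input vectors are needed.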
With selection of appropriate hyperparameters,
According to [
Despite numerous studies looking into the optimization of the hyperparameters, there is still no established criterion for deciding which set of hyperparameters is most suitable for the SVR model. The value of
Flowchart of hyperparameter optimization algorithm.
GA parameter settings.
Population size  Maximum generation  Crossover probability  Mutation probability 

65  60  0.9  0.2 
An individual of the initial population.
These hyperparameters are commonly tuned by minimizing the validation error [
To implement a
(a) Process of crossover, and (b) the process of mutation.
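The search loop can be sketched roughly as below, using the population size, maximum generation, crossover probability and mutation probability from the GA parameter settings table. Everything else is an assumption made for illustration: the real-valued encoding, tournament selection, elitism, arithmetic crossover and single-gene reset mutation are not confirmed by the paper, and `toy_error` is a hypothetical stand-in for the cross-validation error of an SVR model.

```python
import random


def genetic_search(fitness, bounds, pop_size=65, generations=60,
                   p_crossover=0.9, p_mutation=0.2, seed=0):
    """Minimize `fitness` over real-valued vectors inside `bounds`."""
    rng = random.Random(seed)

    def tournament(scored):
        # Pick two distinct individuals and keep the one with lower error.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best, best_score, history = None, float("inf"), []
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        gen_score, gen_best = min(scored, key=lambda item: item[0])
        if gen_score < best_score:
            best_score, best = gen_score, list(gen_best)
        history.append(best_score)

        next_population = [list(best)]  # elitism: keep the best so far
        while len(next_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = list(p1)
            if rng.random() < p_crossover:
                w = rng.random()  # arithmetic crossover weight
                child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            if rng.random() < p_mutation:
                i = rng.randrange(len(bounds))  # reset one random gene
                child[i] = rng.uniform(*bounds[i])
            # Clamp to the search box to guard against rounding drift.
            child = [min(max(g, lo), hi) for g, (lo, hi) in zip(child, bounds)]
            next_population.append(child)
        population = next_population
    return best, best_score, history


def toy_error(params):
    """Hypothetical stand-in for the validation error of an SVR model."""
    c, epsilon, gamma = params
    return (c - 10.0) ** 2 + (epsilon - 0.1) ** 2 + (gamma - 0.5) ** 2


# Hypothetical search ranges for the three hyperparameters.
bounds = [(0.1, 100.0), (0.01, 1.0), (0.01, 10.0)]
best, best_score, history = genetic_search(toy_error, bounds)
```

In an actual application, `toy_error` would be replaced by a function that trains the SVR model with the candidate hyperparameters and returns its validation error. Because of elitism, the best error recorded in `history` never increases from one generation to the next.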
The final SVR model was obtained by the LibSVM toolbox [
Process of building the SVR model to predict ECMS.
This section presents a case study of estimating the daily ECMS on the Beijing metro using the method proposed above. Section
Considering all the input variables specified above, a historical dataset of 12 metro stations (stations I to XII) on the same Beijing metro line was available. The training dataset is composed of historical data of stations I to XI, covering the period from August 1, 2016 to August 30, 2016. Table
Partial data in training dataset.
Station ID 











I  975  3225  17234  8612  8  7  32  26  37803  9.990 
II  872  4364  3488  1744  12  12  32  26  36510  9.769 
III  656  2695  2879  1313  10  12  32  26  37695  13.149 
IV  896  3498  15743  7570  5  9.2  32  26  35468  11.373 
V  887  5317  3784  1623  12  12.3  32  26  38948  10.162 
VI  635  3769  3562  1974  11  12.4  32  26  39683  10.965 
VII  885  3191  17220  8587  8  7  32  26  36669  11.191 
VIII  834  4336  3467  1682  12  12  32  26  37202  11.457 
IX  602  2635  2867  1301  10  12  32  26  39211  11.353 
X  841  3416  15672  7545  5  9.2  32  26  37713  6.124 
XI  788  5300  3748  1610  12  12.3  32  26  37400  6.176 
The data were preprocessed by min-max normalization as follows:
where
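A minimal sketch of this preprocessing step is given below. It assumes the common form of min-max normalization that rescales each variable to the range [0, 1]; the target range used in the paper is not recoverable from the text. The handling of a constant column is a choice made here for illustration. The example column uses the first data column of the partial training dataset table.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] (assumed target range)."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # A constant column carries no information; map it to zeros
        # to avoid division by zero (illustrative choice).
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]


# First data column of stations I to XI in the partial training table.
first_column = [975, 872, 656, 896, 887, 635, 885, 834, 602, 841, 788]
scaled = min_max_normalize(first_column)
```

After scaling, the smallest value in the column maps to 0 and the largest to 1, so variables with very different magnitudes (areas, counts, passenger numbers) become comparable.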
In this section, numerical cases are implemented to verify the effectiveness of the SVR model. The absolute percentage error (APE), standard deviation (SD), correlation coefficient (CC), and relative root mean square error (RRMSE) were used as evaluation indicators. Table
Evaluation indicators and their calculations.
Indicators  Calculation 

APE 

SD 

CC 

RRMSE 

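Since the formulas did not survive in the table above, the sketch below uses commonly used definitions of APE, RRMSE and CC; these definitions are an assumption. The SD indicator is omitted because it depends on the repeated runs, which are not available here. The actual and predicted values are the ten test samples reported in the prediction accuracy table of this paper.

```python
import math


def absolute_percentage_errors(actual, predicted):
    """APE of each sample, in percent (assumed definition)."""
    return [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


def relative_rmse(actual, predicted):
    """Root mean square of the relative errors, in percent (assumed)."""
    squared = [((a - p) / a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared)) * 100.0


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Actual and predicted ECMS of the ten test samples reported in the paper.
actual = [10.430, 10.043, 10.334, 9.820, 10.635,
          10.661, 10.289, 10.193, 10.199, 10.111]
predicted = [10.243, 10.026, 10.159, 10.032, 10.393,
             10.562, 10.242, 10.073, 10.249, 10.034]

max_ape = max(absolute_percentage_errors(actual, predicted))
rrmse = relative_rmse(actual, predicted)
cc = correlation_coefficient(actual, predicted)
```

With these ten samples, the sketch gives values that round to the maximum APE of 2.28%, the RRMSE of 1.39% and the CC of 0.89 reported for the SVR model, which suggests that the assumed definitions are at least consistent with the reported results.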
As mentioned above, hyperparameters
Figure
The predicted results of the 10th test sample.
Prediction accuracy of the SVR model.
Sample ID  Actual value  Prediction value (×10^{6} kWh)  SD  APE (%)  RRMSE (%) /CC 

1  10.430  10.243  0.201  1.79  1.39/0.89 
2  10.043  10.026  0.112  0.17  
3  10.334  10.159  0.193  1.70  
4  9.820  10.032  0.119  2.16  
5  10.635  10.393  0.140  2.28  
6  10.661  10.562  0.160  0.93  
7  10.289  10.242  0.201  0.46  
8  10.193  10.073  0.142  1.17  
9  10.199  10.249  0.185  0.50  
10  10.111  10.034  0.204  0.77 
The regression curve of predicted values.
From Figure
As shown in Table
Apart from the proposed SVR model above, both the BPNN [
The BPNN model consists of three layers: an input layer, a hidden layer and an output layer. Let
where,
In addition, comparable results for the MLR model were obtained directly by using the IBM SPSS Statistics 20 software. Note that the sample data for training both the BPNN and MLR models were the same as those used for the SVR model. The predicted results by different models are illustrated in Figure
Predicted results by the SVR, BPNN, and MLR model.
Predicted results of SVR model, BP neural model and MLR model.
Model  Prediction value  Prediction result  

CC  RRMSE (%)  Maximum APE (%)  Average SD (×10^{6} kWh)  Maximum SD (×10^{6} kWh)  
SVR  0.89  1.39  2.28  0.166  0.204 
BPNN  0.75  2.86  4.67  0.345  0.465 
MLR  0.61  2.27  5.19  —  — 
In Figure
In this subsection, the holdout validation method is also implemented in the hyperparameter optimization process to verify the superiority of the validation scheme applied in this paper, i.e., fivefold cross-validation. Figure
Comparison of different validation methods.
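The fivefold scheme compared here can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, the folds are contiguous and unshuffled for simplicity, and the sample count of 330 assumes one sample per station per day for 11 stations over 30 days.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validation_error(fit_and_score, n_samples, k=5):
    """Average the validation error returned by `fit_and_score` over k folds.

    `fit_and_score(train, validation)` is a user-supplied callable that
    trains a model on the training indices and returns its error on the
    validation indices.
    """
    scores = [fit_and_score(train, validation)
              for train, validation in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Assumed sample count: 11 stations x 30 days.
folds = list(k_fold_indices(330, k=5))
```

Unlike holdout validation, which scores the hyperparameters on a single fixed split, every sample serves as validation data exactly once, so the averaged error depends less on how one particular split happens to fall.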
Comparison between the fivefold cross-validation method and the holdout validation method.
Method  Prediction value  Prediction result  

CC  RRMSE (%)  Maximum APE (%)  Average SD (×10^{6} kWh)  Maximum SD (×10^{6} kWh)  
Fivefold cross-validation  0.89  1.39  2.28  0.89  0.204  
Holdout validation  Sample set A  0.83  5.76  7.95  0.83  0.538 
Sample set B  0.65  4.01  9.26  0.65  0.613 
Impact of influencing factors on the model prediction results.
Missing variable  Maximum APE  CC 

Average temperature  9.02%  0.37 
Relative humidity  8.52%  0.48 
Area of concourse  6.66%  0.76 
Area of platform  7.60%  0.71 
Area of staff accommodation room  6.58%  0.84 
Area of plant room  5.98%  0.88 
Quantity of escalators or elevators  6.76%  0.81 
Height of escalators or elevators  6.89%  0.81 
Number of passengers  6.00%  0.74 
As illustrated in Figure
Nine influencing factors of the ECMS forecasting model are analysed in this section. First, each input variable is removed in turn. Then the hyperparameter optimization algorithm is run on the remaining eight input variables of the training data, and the optimization process is repeated 20 times. With the 20 sets of hyperparameters, the SVR model is trained on the training data consisting of the eight remaining input variables. Finally, the predicted results are obtained from the input data of the prediction samples. Table
As shown in Table
In this subsection, a new metro station, distinct from the existing stations above, is used to validate the performance of the proposed SVR model in estimating the ECMS during the design stage. The real-world parameters and energy consumption of this station over ten days were collected and are given in Table
Testing dataset.
Sample ID 











1  635  3769  1974  3562  11  12.4  32  26  39683  10.965 
2  635  3769  1974  3562  11  12.4  19  87  38505  9.976 
3  635  3769  1974  3562  11  12.4  30  63  37022  9.742 
4  635  3769  1974  3562  11  12.4  27  85  36761  10.181 
5  635  3769  1974  3562  11  12.4  27  85  36141  10.236 
6  635  3769  1974  3562  11  12.4  25  78  35300  10.380 
7  635  3769  1974  3562  11  12.4  27  85  36236  9.910 
8  635  3769  1974  3562  11  12.4  26  50  34974  9.690 
9  635  3769  1974  3562  11  12.4  24  57  35047  10.289 
10  635  3769  1974  3562  11  12.4  26  32  36126  9.790 
The prediction results by the proposed SVR model are shown in Figure
The prediction results of a real case.
This paper proposes a new approach to estimating the ECMS given data of a small sample size. The major factors influencing the ECMS are discussed, including average air temperature, relative humidity, the areas of key components of a station (i.e., station concourse, platform, staff accommodation room and plant room), the number of passengers, and both the number and height of escalators/elevators. These nine variables are used as the input variables of an SVR model, and the hyperparameters of the SVR model are optimized by a GA. The case studies based on actual data validated the effectiveness of the proposed SVR model and demonstrated that it could achieve higher prediction accuracy than a BPNN model and an MLR model. The proposed SVR model provides a promising alternative approach to predicting the ECMS of new metro stations.
The prediction of traction energy consumption for a new metro line is the next step of our research, as the amount of energy consumed by train traction also accounts for a large proportion of the overall energy consumption in a metro system.
The data used to support the findings of this study are included within the article, through tables. Further details may be available to the reader, from the authors, upon request.
The authors declare that they have no conflicts of interest.
This research is supported by the National Natural Science Foundation of China (71571016 and 71621001).