Probabilistic Forecasting Method of Metro Station Environment Based on Autoregressive LSTM Network

With the increasing number of metro systems, the comfort and safety of crew and passengers in metro stations have received great attention, and environment forecasting has become very important for decision-making. Traditional point prediction methods output exact future values. However, it may be closer to real conditions to give the predicted variables as probability ranges with different confidence levels rather than as exact values. This paper proposes a probabilistic forecasting method for the metro station environment based on an autoregressive Long Short-Term Memory (LSTM) network. The method quantifies the uncertainty of the environment trend in a metro station. Seven-day field tests were carried out to obtain measured data for 7 internal environmental parameters in a metro station and 8 external environmental parameters. To ensure prediction performance, the random forest algorithm is used to select the input variables for the proposed method. The selected input variables and the previously predicted values together form the inputs of the probabilistic forecasting model. The proposed method predicts the probability distribution of the internal environmental parameters in a metro station. This work may help to prevent emergency events and to regulate the environment control system reasonably.


Introduction
The metro is one of the most efficient public transport modes for relieving traffic congestion in urban areas [1]. However, the continuous increase in passengers brings some negative environmental problems [2,3]. Therefore, it is necessary to analyze the environment trend in a subway station and to develop a relatively accurate model to predict the internal environmental parameters of the station [4].
In recent years, data-based empirical modeling has become a widely used alternative to mechanistic modeling, since it requires less specific knowledge of the studied process [5][6][7][8].
In previous studies, the goal of environmental prediction was to obtain exact future values. Xiao-Ping et al. reported research progress in air pollution prediction based on artificial neural networks [9]. Chen and Shao improved the traditional Back Propagation (BP) [10] neural network algorithm by adding a momentum factor and changing the learning rate. The new model was applied to urban air quality prediction [11]. Wang et al. used a genetic algorithm to optimize the initial weights and thresholds of the BP neural network in simulation [12]. Lu and Viljanen developed a network based on a nonlinear autoregressive with external input (NNARX) model and a genetic algorithm, and showed the suitability of neural networks for prediction [13]. However, the actual future results are affected by many uncertain factors, and it is very difficult to give accurate prediction values. Kamal et al. investigated the effectiveness of the Artificial Neural Network (ANN) model for predicting ambient air quality. Their study illustrates that the ANN can simplify and speed up the computation of ambient air quality and provides an interesting alternative to air quality monitoring [14]. Bodri and Čermák developed artificial time-delay feed-forward neural networks to predict Surface Air Temperatures (SAT) from six hours up to one day ahead, and the model provided a good fit with the measured data [15]. Ramedani et al. proposed a new ANN-based methodology for generating daily global solar radiation (GSR) data [16]. Huibing put forward a BP neural network prediction method to address the low accuracy and large time delay of environment temperature measurement. Simulation results show that the accuracy of temperature measurement was significantly improved, especially with respect to measurement delay [17]. Qu et al. developed a modeling method based on a sliding-time-window Random Vector Functional Link Neural Network (RVFLNN) and solved the problem of slow computing speed with big data [18]. Their study improved the prediction speed while maintaining the prediction accuracy. All of these studies show a good ability to model nonlinear and dynamic systems and can achieve accurate prediction.
The goal of these methods is to obtain exact values at every time step. However, real results in practice are affected by many factors, so it may be more reasonable to predict probability distributions with different confidence levels rather than exact values. A good forecast makes predictions for an uncertain future, and its results should be expressed as probability distributions [19][20][21]. Probabilistic forecasts serve to quantify future uncertainty, and they are essential for making optimal decisions [22]. Compared with exact prediction, probabilistic forecasts give more information. They reveal the possible variation range of the predicted parameters and indicate whether the parameters may exceed their maximum allowable values, together with the probability of that occurring. Thus, they can help an environment control system adjust its operation for extreme and rare events [23,24], which helps to prevent some emergency accidents [25]. There are many studies of probabilistic forecasts in other fields. Aznarte investigated the use of quantile regression to predict extreme concentrations of NO2; the approach improved probabilistic forecasting and allowed the full probability distribution to be predicted, which in turn made it possible to build models for the tails of this distribution [26]. Wan et al. proposed a probabilistic forecasting method based on the Extreme Learning Machine (ELM) for wind power generation that uses only the historical wind power time series as inputs [27].

Testing
Instrument. An environmental monitor, named CPR-KA and shown in Figure 1, is used to investigate the environmental conditions. Its pump suction rate is 300 mL/min and its data sampling period is 2 minutes. This equipment uses highly sensitive electrochemical sensors to monitor the concentrations of the environmental pollutants SO2 and NO2, a Photoionization Detector (PID) and infrared sensors to monitor the concentrations of VOC and CO2, respectively, light scattering sensors to monitor the concentration of PM10, and an integrated temperature and humidity sensor to monitor temperature and relative humidity (RH). It can therefore measure a variety of internal environmental parameters and pollutant concentrations. Its measurement range and accuracy are listed in Table 1.

Measured Metro Station.
The measured metro station is a transfer station with full-height platform screen doors. It is an underground station with a separated island platform design. The design parameters of the Heating, Ventilation, and Air Conditioning (HVAC) system are as follows: (1) Rated conditions of the HVAC system: under summer rated conditions, the dry-bulb temperature on the station platform is 28°C and the relative humidity ranges from 40% to 70%. (2) Ventilation rate: the ventilation air volume on the platform is $5.78 \times 10^{4}$ m³/h and the fresh air volume is $1.08 \times 10^{4}$ m³/h. The environmental monitor is located in the middle of the platform, 1.2 m above the platform floor, as shown in Figure 2. The 8 external parameters that may affect the internal environmental parameters in the metro station are collected at the same time. The passenger flow and the arrival frequency of metro vehicles are recorded automatically. Typical external atmospheric parameters, including outdoor temperature and RH, are collected from http://data.cma.cn/. Typical outdoor air quality data, including PM10, CO, NO2, and SO2, are obtained from http://beijingair.sinaapp.com/. During the 7-day investigation, a total of 2800 environmental observations are collected from the metro station. In this paper, we define the following terminology. Passenger flow, arrival frequency of metro vehicles, outdoor temperature, outdoor RH, outdoor PM10, outdoor CO, outdoor NO2, and outdoor SO2 are defined as the external environmental parameters. These eight external environmental parameters are the input variables of the forecasting model. The seven parameters collected in the metro station, namely CO2, VOC, SO2, NO2, PM10, temperature, and RH, are defined as the internal environmental parameters. They are the output variables of the forecasting models.

Influence Analysis for Input Variable Selection.
In order to eliminate the effect of irrelevant variables on model performance, this paper uses the random forest algorithm to quantify the influence of the external environmental variables on the predicted internal variables in training and prediction [28][29][30]. According to the results of this influence analysis, the key variables are selected as the input variables of the network. Figure 3 shows the procedure of the external variables' influence analysis. The random forest algorithm is used to analyze the degree of influence, W, of the external environmental variables, V, on the prediction parameter, Y. A threshold value, g, is set. When $w_i > g$, the corresponding external variable is retained among the input variables, X, of the autoregressive LSTM network. Note that Y and X are time series; therefore, they can also be denoted as $Y_t$ and $X_t$, respectively.
The random forest algorithm is based on the ensemble learning method [31] and uses the decision tree as its basic learner [32]. First, the influence of V on Y is calculated in each decision tree; then, the average over all decision trees is taken as the final influence W. For a specific prediction variable, Y, the process is as follows [33]:
(1) Construct the dataset D by concatenating V and Y.
(2) Extract K bootstrap datasets [34], $\{D_k\}_{k=1}^{K}$, from D; the samples left out of each bootstrap dataset form K Out-Of-Bag (OOB) datasets, $\{\bar{D}_k\}_{k=1}^{K}$.
(3) Train the kth decision tree regression model $C_k$ with dataset $D_k$, and calculate its prediction accuracy, $E_k$, using the corresponding OOB dataset, $\bar{D}_k$.
(4) Add noise to the external parameter $v_i$ in the OOB dataset, and calculate the prediction accuracy of model $C_k$ again; the changed accuracy after adding noise is denoted as $E_k^{i}$.
When all the decision trees have been processed using the above steps, the degree of influence can be calculated with equation (1) for a given external variable, $v_i$:

$$w_i = \frac{1}{K}\sum_{k=1}^{K}\left(E_k - E_k^{i}\right), \tag{1}$$

where $E_k$ is the prediction accuracy of the kth decision tree without any disturbance to the external parameters; $E_k^{i}$ is the prediction accuracy after adding noise to the external variable $v_i$; and K is the number of decision trees. The degree of influence, W, of V on Y is obtained using the above steps. The larger the $w_i$, the greater the variable's contribution to the prediction. Therefore, we extract input variables by setting a threshold value, g. An external variable is retained as an input variable when $w_i > g$. Denote these input variables as $X = \{x_i\}_{i=1}^{Z}$, as shown in Figure 3. Because g is a tunable hyperparameter, we determine the optimal g after comparing different values.
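The influence calculation and threshold selection described above can be illustrated with a minimal sketch. It assumes that the degree of influence is the mean drop in OOB accuracy over the K trees; the function names, variable names, and accuracy values are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
from statistics import mean


def influence_degree(acc_clean, acc_noisy):
    """Mean drop in OOB accuracy over K trees: (1/K) * sum(E_k - E_k^i)."""
    if len(acc_clean) != len(acc_noisy):
        raise ValueError("both lists must hold one accuracy per tree")
    return mean(e - e_i for e, e_i in zip(acc_clean, acc_noisy))


def select_inputs(influences, g):
    """Keep the external variables whose influence degree exceeds g."""
    return [name for name, w in influences.items() if w > g]


# Hypothetical OOB accuracies for K = 4 trees (illustrative numbers only).
acc_clean = [0.90, 0.88, 0.92, 0.90]
acc_noisy = {
    "passenger_flow": [0.60, 0.58, 0.62, 0.60],  # accuracy collapses with noise
    "outdoor_CO": [0.89, 0.87, 0.91, 0.89],      # accuracy barely changes
}

influences = {
    name: influence_degree(acc_clean, noisy) for name, noisy in acc_noisy.items()
}
selected = select_inputs(influences, g=0.1)
print(influences, selected)
```

In this toy case only the variable whose perturbation strongly degrades accuracy survives the threshold, which is the behavior the selection rule $w_i > g$ is meant to produce.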

Procedure of Probabilistic Forecasting Method.
The probabilistic forecasting method proposed in this paper obtains the Gaussian distributions of the predicted environmental parameters at future time points based on past observations. The overall procedure is summarized in Figure 4 and contains the following four steps:
Step 1: environmental data preprocessing. The internal environmental parameters on the station platform were measured every 2 minutes for 7 days starting from 21 October 2019. The Butterworth low-pass filter algorithm is used to process the raw data [35,36]. The useful signal and the noise are separated, and the high-frequency interference signals are filtered out [37]. The transfer function of the Butterworth low-pass filter has the squared magnitude

$$\left|H(jw)\right|^{2} = \frac{1}{1 + \left(w / w_c\right)^{2N}}, \tag{2}$$

where N is the order of the filter; w is the frequency, rad/s; and $w_c$ is the cut-off frequency. The frequency-amplitude curve of a filter includes a passband, a stopband, and a transition band. For the passband and the stopband, the two rules are shown in equations (3) and (4), respectively:

$$\left|\,\left|H(jw)\right| - 1\,\right| \le \delta_p, \quad |w| \le w_p, \tag{3}$$

$$\left|H(jw)\right| \le \delta_s, \quad |w| \ge w_s, \tag{4}$$

where $w_p$ and $w_s$ are the edge frequencies of the passband and the stopband, respectively; $\delta_s$ is the amplitude deviation between the filter and the ideal filter in the stopband; and $\delta_p$ is the amplitude deviation between the filter and the ideal filter in the passband.
Step 2: degree of influence of external variables and input variable selection. Although eight types of external variables are measured, they have different influences on the internal environmental parameters on the metro platform. If all the external variables are taken as input variables of the forecasting model, the prediction results might become worse. The presented probabilistic forecasting method based on the autoregressive LSTM neural network is a machine learning algorithm, and the input variables have a great impact on the performance of such algorithms [38][39][40]. In general, the collected data are not entirely suitable as input variables of a neural network. It is important to reduce the number of input parameters in order to avoid overfitting and to accelerate the training of the model. Input variable selection has been used in several studies with good results, which shows its important role [41,42]. Seven internal environmental variables are collected in the metro station: CO2, VOC, SO2, NO2, PM10, temperature, and RH. In this paper, the probabilistic forecasting method predicts these parameters. The predicted parameters are denoted as Y. Correspondingly, the external environmental variables are denoted as $V = \{v_i\}_{i=1}^{M}$, where $M = 8$. The influence of the external variables V on the prediction parameter Y is analyzed using the random forest algorithm. The degree of influence of the external environmental parameters is denoted by $W = \{w_i\}_{i=1}^{M}$. The greater the $w_i$ of an external environmental parameter, the greater its influence on the internal environmental parameter, and vice versa. We set a threshold value, g, manually to select the external environmental parameters. Variables whose weights are less than g are eliminated in order to exclude their negative influence on model performance.
Figure 3: External variable influence analysis and input variable selection.
[Figure 4 panel labels: Step 1, environmental data preprocessing; Step 2, degree of influence of external variables and key variable selection; Step 3, training autoregressive LSTM network models; Step 4, predicting environmental parameters in a metro station.]
Step 3: training autoregressive LSTM network models. After Step 2, the external environmental parameters with a high degree of influence are selected as the inputs of the autoregressive LSTM network, together with the historical data at the previous time step [43]. Denote the input variables for each prediction parameter by $X = \{x_i\}_{i=1}^{Z}$, where Z is the number of input variables. The prediction parameters are the internal environmental parameters in the metro station. Because different prediction parameters have different input variables, the structure of the LSTM network must be adjusted to better predict the changes of each environmental parameter. The number of input layer nodes in the network structure is Z, equal to the number of input variables. The data are divided into training and test datasets at a ratio of 7 : 3; the first 70 percent of the observations are used to develop the prediction models, and the remaining 30 percent are used as the test dataset.
Step 4: predicting the environmental parameters in a metro station.
In the prediction process, we use the same network structures and parameters as in the training process. However, there is one difference between the two processes. The values of the prediction variables are known in the training process, but they are unknown in the prediction process. In order to continue the prediction, we use rolling window prediction, which feeds the last outputs back as inputs until the end of the prediction range.
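The 7 : 3 division of the time-ordered observations mentioned in Step 3 can be sketched as follows. This is a minimal illustration; the function name `chronological_split` and the stand-in data are assumptions, and integer arithmetic is used so that the cut point is not affected by floating-point rounding.

```python
def chronological_split(series, train_parts=7, test_parts=3):
    """Split a time-ordered sequence into a leading training part and a
    trailing test part without shuffling, using an integer ratio."""
    if train_parts <= 0 or test_parts <= 0:
        raise ValueError("both parts of the ratio must be positive")
    cut = len(series) * train_parts // (train_parts + test_parts)
    return series[:cut], series[cut:]


# Stand-in indices for the 2800 observations (placeholder data only).
observations = list(range(2800))
train, test = chronological_split(observations)
print(len(train), len(test))
```

Keeping the split chronological means that every test observation lies after every training observation, which matches the forecasting setting described above.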

Model of Autoregressive LSTM Network.
For internal environmental parameter prediction, it is important to build a conditional distribution. Thus, the proposed model can be denoted as

$$P_\Phi\left(Y_{t_0+1:t_0+\tau} \mid Y_{1:t_0}, X_{1:t_0+\tau}\right), \tag{5}$$

where $t_0$ is the time point that splits the past and the future; $\tau$ is the length of the prediction range; and $Y_{t_0+1:t_0+\tau}$ and $Y_{1:t_0}$ are the values of the prediction variable in the future and in the past, respectively. The first part, named the condition range, contains the past information, and the remaining part is called the prediction range. The model utilizes the past values of the prediction variable, $Y_{1:t_0}$, and the external variables, $X_{1:t_0+\tau}$, to predict the future values, $Y_{t_0+1:t_0+\tau}$. $Y_{t_0+1:t_0+\tau}$ is assumed to be unknown at prediction time, and $X_{1:t_0+\tau}$ denotes the known external variables.
For each time point, the problem can be parametrized by the output $h_t$ of an autoregressive LSTM network:

$$P_\Phi\left(Y_{t_0+1:t_0+\tau} \mid Y_{1:t_0}, X_{1:t_0+\tau}\right) = \prod_{t=t_0+1}^{t_0+\tau} \ell\left(Y_t \mid \theta(h_t, \Phi)\right), \quad h_t = h\left(h_{t-1}, Y_{t-1}, X_t, \Phi\right), \tag{6}$$

where h is a function implemented by LSTM cells; $Y_t$ is the internal environmental parameter Y at time t; $\ell(\cdot)$ is the likelihood used to fit the distribution of the predicted variables; and $\theta(\cdot)$ is a function that computes the parameters of the likelihood.
The model is autoregressive in the sense that the observation at the last time step, $Y_{t-1}$, and the previous output of the network, $h_{t-1}$, are fed back as inputs for the next time step. The likelihood, $\ell(Y_t \mid \theta(h_t, \Phi))$, is a fixed distribution whose parameters are given by a function $\theta(h_t, \Phi)$ of the network output $h_t$.
It is important to choose a suitable distribution for the proposed model. The environmental parameters are assumed to follow the Gaussian distribution, according to the results of previous research [44,45]. This choice is convenient for constructing the LSTM network because the Gaussian distribution is fully described by its mean and variance. Thus, in this paper, the distribution of Y is taken to be Gaussian, so the likelihood can be denoted by equation (7), and its parameters, the mean μ and standard deviation σ, are given by equations (8) and (9). The mean is given by an affine function of the network output, and the standard deviation is obtained by applying an affine transformation followed by a softplus activation function:

$$\ell\left(Y \mid \mu, \sigma\right) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(Y - \mu)^{2}}{2\sigma^{2}}\right), \tag{7}$$

$$\mu(h_t) = w_\mu^{T} h_t + b_\mu, \tag{8}$$

$$\sigma(h_t) = \log\left(1 + \exp\left(w_\sigma^{T} h_t + b_\sigma\right)\right), \tag{9}$$

where μ and σ are the mean and standard deviation, respectively; $h_t$ is the network output; and w and b are the weights and bias of the transformation, respectively. The training and forecasting processes share the same network structure, but they differ slightly in how Y is obtained, as shown in Figure 5. In the training process, the values of Y are known, but they are unknown in the prediction process. In training, the value of Y at the last time step can be used directly as a model input. In prediction, a sampled value must instead be drawn from the distribution of the last time step in order to continue. The two processes are described in Sections 3.3 and 3.4, respectively.

Training Process.
In the training process, the network output $h_t$ is used to compute the parameters of the likelihood, $\theta_t = \theta(h_t, \Phi)$, using equations (8) and (9). Finally, the model parameters are optimized by maximizing the log-likelihood:

$$\mathcal{L} = \sum_{t} \log \ell\left(Y_t \mid \theta(h_t, \Phi)\right), \tag{10}$$

where $h_t$ is the output of the network, $Y_t$ is the actual value of the prediction variable, and the sum runs over the training time steps. In equation (10), the goal is to maximize the probability of $Y_t$ under the predicted Gaussian distribution. The parameters that determine μ and σ can be optimized directly via stochastic gradient descent by computing gradients.

Prediction Process.
Figure 5(b) illustrates the prediction process. The network structure and the parameters are the same as in the training process. However, the inputs of the prediction network differ from those of the training network, because the actual values of the prediction variable are unknown in the time range $[t_0 + 1, t_0 + \tau]$. Therefore, a sample $\tilde{Y}_{t_0+1:t_0+\tau} \sim P_\Phi\left(Y_{t_0+1:t_0+\tau} \mid Y_{1:t_0}, X_{1:t_0+\tau}\right)$ is drawn from the predicted distribution and used as one of the input values for the next time steps. At each time step $t \in [t_0 + 1, t_0 + \tau]$, the inputs of the prediction network are the sampled value $\tilde{Y}_{t-1}$, the external variables $X_t$, and the previous network output $h_{t-1}$.
With rolling window prediction, the distributions at all prediction time steps can be obtained. The whole prediction proceeds as follows. First, $h_{t_0}$ is obtained from the end of the training process. Then, $h_{t_0+1}$ is calculated with the inputs $X_{t_0+1}$, $Y_{t_0}$, and $h_{t_0}$. After the network output $h_{t_0+1}$ is obtained, the Gaussian likelihood, $\ell(Y_{t_0+1} \mid \theta_{t_0+1})$, can be built. Finally, $\tilde{Y}_{t_0+1} \sim \ell(Y_{t_0+1} \mid \theta_{t_0+1})$ is drawn and fed back for the next point, $t_0 + 2$.
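The Gaussian output layer, the training objective, and the rolling prediction loop can be illustrated with a short sketch. It assumes an affine mean, a softplus standard deviation, and a Gaussian negative log-likelihood as the quantity to minimize (equivalent to maximizing the log-likelihood). The function `toy_step` and all weights are hypothetical stand-ins for a trained LSTM cell, so this is an illustration of the data flow rather than the authors' network.

```python
import math
import random


def softplus(x):
    """log(1 + exp(x)), computed in a numerically stable way."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_params(h, w_mu, b_mu, w_sigma, b_sigma):
    """Map a hidden vector h to (mu, sigma): affine mean, softplus std."""
    mu = sum(w * x for w, x in zip(w_mu, h)) + b_mu
    sigma = softplus(sum(w * x for w, x in zip(w_sigma, h)) + b_sigma)
    return mu, sigma


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


def rolling_forecast(step, h0, y0, x_future, rng):
    """Roll forward: sample y_t from the predicted Gaussian and feed it back."""
    h, y, path = h0, y0, []
    for x_t in x_future:
        h, mu, sigma = step(h, y, x_t)  # uses h_{t-1}, y_{t-1}, x_t
        y = rng.gauss(mu, sigma)        # sampled value replaces the unknown y_t
        path.append((mu, sigma, y))
    return path


def toy_step(h, y_prev, x_t):
    """Hypothetical stand-in for a trained LSTM cell plus Gaussian head."""
    h_new = [math.tanh(0.5 * h[0] + 0.3 * y_prev + 0.2 * x_t)]
    mu, sigma = gaussian_params(h_new, [1.0], 0.0, [0.5], -1.0)
    return h_new, mu, sigma


path = rolling_forecast(toy_step, [0.0], 0.4, [0.1, 0.2, 0.3], random.Random(0))
for mu, sigma, y in path:
    print(f"mu={mu:.3f} sigma={sigma:.3f} sample={y:.3f}")
```

Repeating the rolling loop with different random seeds yields several sample paths, from which empirical quantiles of the forecast could be estimated in addition to the per-step mean and standard deviation.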

Results of Data Preprocessing.
The collected environmental data are processed in the following steps: removal and replacement of outliers, missing data imputation, and noise smoothing. Figure 6 shows part of the data after preprocessing.
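The noise smoothing step relies on the Butterworth low-pass filter introduced earlier. The sketch below evaluates the standard Butterworth magnitude response and checks passband and stopband tolerances at the two edge frequencies. The cut-off frequency, order, edge frequencies, and tolerances are hypothetical values chosen only for illustration; the paper does not state the values it used.

```python
import math


def butterworth_gain(w, w_c, order):
    """Magnitude |H(jw)| of an N-th order Butterworth low-pass filter."""
    return 1.0 / math.sqrt(1.0 + (w / w_c) ** (2 * order))


def meets_spec(w_c, order, w_p, w_s, delta_p, delta_s):
    """Check the passband and stopband tolerances at the edge frequencies.

    The Butterworth gain decreases monotonically with frequency, so
    checking the two edges is sufficient for the whole band.
    """
    pass_ok = 1.0 - butterworth_gain(w_p, w_c, order) <= delta_p
    stop_ok = butterworth_gain(w_s, w_c, order) <= delta_s
    return pass_ok and stop_ok


# Hypothetical design values (rad/s), for illustration only.
print(butterworth_gain(1.0, 1.0, 4))  # gain at the cut-off frequency
print(meets_spec(w_c=1.0, order=4, w_p=0.5, w_s=2.0, delta_p=0.01, delta_s=0.1))
print(meets_spec(w_c=1.0, order=1, w_p=0.5, w_s=2.0, delta_p=0.01, delta_s=0.1))
```

Raising the filter order narrows the transition band, which is why the fourth-order example meets tolerances that the first-order example does not.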

Time Series Processing for LSTM Network.
The probabilistic forecasting method is based on the autoregressive LSTM network, so the input variables need to be processed, as shown in Figure 7.
Figure 8 shows the results of the influence of the external environmental parameters on the internal environmental parameters.
From Figure 8, the following results can be observed: (1) Passenger volume is the main influence factor for the carbon dioxide concentration and the temperature in the metro station. (2) RH, PM10, and NO2 in the outside atmosphere make obvious contributions to RH, PM10, and NO2 in the metro station, respectively. For the proposed method, an external environmental parameter is retained as an input variable when its influence degree, $w_i$, is larger than g. Thus, g is regarded as a hyperparameter that determines the input variables of the network. In general, grid search and manual search are the most widely used strategies for hyperparameter optimization [46][47][48]. Because most of the influence degrees are less than 0.3 according to Figure 8, the candidate values of g are set as 0.1, 0.2, and 0.3. The grid search method is used to select the optimal value of g. Each value of g is applied to the model to calculate its RMSE, and the value of g with the smallest RMSE is selected as the final g. The RMSE is calculated with

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\hat{Y}_t - Y_t\right)^{2}}, \tag{11}$$

where T is the number of predicted values and $\hat{Y}_t$ and $Y_t$ are the predicted and actual data, respectively. Table 2 shows the results for the different values of g. There are 4 kinds of g values. We use each of them to select the input variables and train a separate model. The RMSE values of these models are shown in Table 2. The value of g corresponding to the minimum RMSE is considered optimal and is shown in bold in Table 2.
Finally, every internal environmental parameter has an optimal g value according to Figure 8 and Table 2. For example, the minimum RMSE for CO2 is 0.227, and the corresponding g value is 0.1. The external parameters whose influences exceed 0.1 are PF and NO2. Thus, these two external parameters, PF and NO2, are used to predict the CO2 concentration in the metro station.
From Table 2 and Figure 8, we can draw the following conclusions: (1) Different internal environmental parameters have different optimal g values. (2) The smaller the RMSE value, the better the performance of the model; the models of CO2, VOC, PM10, and RH perform best with $g = 0.1$, so g is set to 0.1 for these parameters. (3) For SO2, NO2, and TEM, g is set to 0.2.
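The grid search over g can be sketched as follows, using the root-mean-square error as the selection criterion. The predictions attached to each candidate g are hypothetical placeholders and are not the values reported in Table 2; the helper names `rmse` and `best_threshold` are likewise assumptions made for this illustration.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def best_threshold(rmse_by_g):
    """Return the candidate g whose model gives the smallest RMSE."""
    return min(rmse_by_g, key=rmse_by_g.get)


actual = [1.0, 2.0, 3.0]
# Hypothetical test predictions of the model trained for each candidate g.
predictions_by_g = {
    0.1: [1.1, 2.0, 2.9],
    0.2: [1.3, 2.2, 2.6],
    0.3: [0.5, 2.5, 3.5],
}

scores = {g: rmse(pred, actual) for g, pred in predictions_by_g.items()}
best_g = best_threshold(scores)
print(scores, best_g)
```

In practice each entry of `predictions_by_g` would come from retraining the network with the input variables selected for that g, so the cost of the search grows with the number of candidate thresholds.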

Results of Environmental Prediction.
The 2800 observations collected from the metro station are used for prediction. We built a separate prediction model for each internal environmental parameter. The proposed model, based on the LSTM network, has an input layer, hidden layers, and an output layer. Because the input variables determine the input layer nodes, different models have different numbers of input layer nodes. The number of input layer nodes is determined by the degree of influence of the external variables. Each model has 2 hidden layers with 64 and 16 nodes, respectively. It has 1 output layer with 2 nodes, which represent the mean and standard deviation of the normal distribution. The training and test datasets are divided at a ratio of 7 : 3. The time step is set to 120 s, and the number of training iterations is set to 1000.

Results Based on the Three-Sigma Rule of Distribution.
The evaluation of the proposed model is based on the three-sigma rule of distribution. The three-sigma rule is an empirical rule stating that, for many reasonably symmetric unimodal distributions, almost all of the population lies within three standard deviations of the mean [49]. Therefore, we define three ranges of the predicted normal distribution: $[\mu - \sigma, \mu + \sigma]$, $[\mu - 2\sigma, \mu + 2\sigma]$, and $[\mu - 3\sigma, \mu + 3\sigma]$. Figure 9 shows the prediction results of the probabilistic forecasting method in the three ranges. Table 3 shows the proportion of the actual values falling in the different ranges. The traditional ANN results are also calculated and shown in Figure 9, in which "Actual," "ANN prediction," and "Probabilistic mean" represent the measured data, the results of ANN prediction, and the results of the proposed method, respectively.
As shown in Table 3, we calculate the proportion of the actual values falling within the three-sigma limits for the seven internal environmental parameters in the metro station. The higher the proportion, the more accurate the probabilistic distribution and the higher the prediction accuracy. The model accuracy can therefore be verified by analyzing and comparing the proportions. In the range $[\mu - \sigma, \mu + \sigma]$, the proportion is 68.3% if the data obey the normal distribution, so our goal is for the proportion of predicted results in this range to exceed 68.3%. Similarly, in the ranges $[\mu - 2\sigma, \mu + 2\sigma]$ and $[\mu - 3\sigma, \mu + 3\sigma]$, our goals are 95.4% and 99.7%, respectively. The mean proportion for each range is calculated and listed in the last row of Table 3. Some conclusions can be obtained from Table 3:
(1) The results show that the mean proportions in the three intervals are 76.75%, 93.00%, and 98.12%, respectively. This means that the normal distribution predicted by our model can effectively cover the variation range of the predicted variables and give different probability intervals according to the three-sigma limits.
(2) In the range $[\mu - \sigma, \mu + \sigma]$, the proportions of CO2, VOC, SO2, PM10, and TEM are greater than 68.3%. On the contrary, the proportions of NO2 and RH are lower than 68.3%. In the range $[\mu - 2\sigma, \mu + 2\sigma]$, the proportions of CO2, VOC, SO2, and PM10 are greater than 95.4%, and they are higher than those of NO2, TEM, and RH. Finally, in the range $[\mu - 3\sigma, \mu + 3\sigma]$, only the proportions of VOC and PM10 are greater than 99.7%, and the proportions of the other variables are lower than 99.7%. In particular, the proportions of NO2 and RH are smaller than those of the other variables in the ranges $[\mu - 2\sigma, \mu + 2\sigma]$ and $[\mu - 3\sigma, \mu + 3\sigma]$. Therefore, we conclude that the prediction accuracy for NO2 and RH is lower than for the other variables, whereas that for VOC and PM10 is higher.
(3) Most of the ANN prediction results fall into these three intervals, as shown in Figure 9. This means that most of the ANN prediction results lie within the range of the normal distribution predicted by the probabilistic forecasting model.
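The proportions reported for the three-sigma limits can be computed as in the sketch below, which counts the actual values that fall inside the interval of width k standard deviations around each predicted mean. The data are hypothetical and chosen only so that the result is easy to check by hand; the function name `coverage` is an assumption.

```python
def coverage(actual, mu, sigma, k):
    """Fraction of actual values inside [mu - k*sigma, mu + k*sigma]."""
    if not (len(actual) == len(mu) == len(sigma)) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for y, m, s in zip(actual, mu, sigma) if m - k * s <= y <= m + k * s
    )
    return hits / len(actual)


# Hypothetical measurements and predicted Gaussian parameters per time step.
actual = [0.0, 0.5, 1.5, 2.5, 3.5]
mu = [0.0] * 5
sigma = [1.0] * 5

rates = {k: coverage(actual, mu, sigma, k) for k in (1, 2, 3)}
print(rates)
```

Comparing these empirical rates with the nominal 68.3%, 95.4%, and 99.7% indicates whether the predicted standard deviations are too narrow (rates below nominal) or too wide (rates above nominal).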

Result Comparison with the ANN Model.
In order to further examine the prediction performance of the presented probabilistic forecasting method, it is compared with the traditional ANN. As shown in Table 4, we calculated the proportions of the ANN results falling within the three-sigma limits for the seven internal environmental parameters in the metro station. Some conclusions can be obtained from Table 4. The outputs of the probabilistic forecasting model are a series of normal distributions, and the outputs of the ANN model are a series of exact values. We use the mean, μ, of the probabilistic forecasting model to calculate the RMSE and compare it with that of the ANN model. In addition, the performance improvement of the probabilistic forecasting method over the ANN method is calculated with equation (12) and listed in Table 5:

$$\mathrm{PI} = \frac{Y_{\mathrm{ANN}} - Y_{\mathrm{prob}}}{Y_{\mathrm{ANN}}}, \tag{12}$$

where PI is the relative performance improvement and $Y_{\mathrm{ANN}}$ and $Y_{\mathrm{prob}}$ are the prediction RMSE values of the traditional ANN and the proposed method, respectively. The results show that the RMSE values of the presented model are lower than those of the ANN. The mean of all PIs is 0.58, which means a 58% improvement over the ANN on average. These data illustrate that, even when only the mean value is used for evaluation, the probabilistic forecasting model is closer to the actual values, and its accuracy is better than that of the ANN.
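As a small worked example of the relative improvement, the sketch below assumes that PI is the reduction in RMSE divided by the ANN's RMSE, so that a value of 0.58 corresponds to 58%. The RMSE pairs are hypothetical and are not taken from Table 5.

```python
def performance_improvement(rmse_ann, rmse_prob):
    """Relative RMSE reduction of the probabilistic model over the ANN."""
    if rmse_ann <= 0.0:
        raise ValueError("the ANN RMSE must be positive")
    return (rmse_ann - rmse_prob) / rmse_ann


# Hypothetical (ANN RMSE, probabilistic-mean RMSE) pairs for two parameters.
pairs = [(0.50, 0.25), (0.40, 0.10)]
pi_values = [performance_improvement(a, p) for a, p in pairs]
mean_pi = sum(pi_values) / len(pi_values)
print(pi_values, mean_pi)
```

A positive PI means the probabilistic model's mean prediction has the lower RMSE, and a negative PI would mean that the ANN performed better for that parameter.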

Conclusion
A probabilistic forecasting method for the internal environment of a metro station is proposed on the basis of the autoregressive LSTM network. This method predicts the probability distribution of the internal environmental parameters in a metro station. The 2800 observations from the measured metro station are used to illustrate the proposed model, and its performance compares favorably with that of the ANN. The following results are obtained from our study: (1) The random forest algorithm is used to analyze the degree of influence of the external environmental parameters on the predicted variables. For CO2 and temperature, the results show that the passenger flow is the most important influence parameter. Other internal environmental parameters, such as RH, PM10, SO2, and NO2, are mainly influenced by their corresponding parameters in the outside atmosphere. (2) The proposed model builds a conditional distribution between past data and future data, and its results are a series of distributions described by their mean and standard deviation. The proportions of the actual values falling within the three-sigma limits are 76.75%, 93.00%, and 98.12%, respectively, which shows the reliability of the proposed model. (3) Compared with the ANN model, the proposed model can better predict the internal environmental parameters on the metro station platform. The results show that, on average, 69.13%, 90.61%, and 96.50% of the ANN prediction results fall within the three-sigma limits, respectively, which is lower than the corresponding proportions for probabilistic forecasting. In addition, the proposed model achieves a 58% improvement over the ANN on average when the mean of its predicted distribution is used for comparison.
The above results show that the probabilistic forecasting model is more suitable than the ANN for predicting internal environmental parameters in a metro station. The most important contribution of the proposed method is that it can provide extreme prediction values at future times, such as the upper and lower boundaries with their corresponding probabilities. Therefore, it can support decision-making and give more information for adjusting the operation of the HVAC system in a metro station.
There are many factors affecting the internal environmental parameters in a metro station. In this paper, only 8 external variables are selected as the input variables of the prediction, which may miss some useful variables, such as outdoor weather. This may reduce the accuracy of the prediction to a certain extent. In future research, more factors may be considered to increase the reliability of the model. Experimental data from multiple metro stations will also be collected in order to improve the performance of the model.

Data Availability
The authors have no right to share the data.

Conflicts of Interest
The authors declare that they have no conflicts of interest.