Accurate prediction of transfer demand at bike stations is key to developing balancing solutions for the overutilization and underutilization problems that often occur in bike sharing systems. Station transfer demand prediction also supports bike station layout and the optimization of the number of public bikes at each station. Traditional traffic demand prediction methods, such as the gravity model, cannot easily be adapted to forecasting bike station transfer demand, because impedance is difficult to define and bike stations have distinct characteristics (Xu et al. 2013). Therefore, this paper proposes a prediction method based on a Markov chain model. The proposed model is evaluated with field data collected from the Zhongshan City bike sharing system, and the daily production and attraction of stations are forecasted. The experimental results show that the proposed model achieves higher forecasting accuracy and better generalization ability.
Bike sharing systems are in place in many cities in the world and are an increasingly important support for multimodal transport systems [
A bike sharing system consists of bikes, roads, and fixed stations, and its demand differs clearly from that of motor vehicles and private bikes. Demand forecasting models for motor vehicles and private bikes can hardly be adapted to bike sharing systems [
A full understanding of demand is a crucial step toward improving prediction accuracy. Although bike sharing systems may differ, the significant influencing factors are the same, such as lanes, population, economic and social conditions, festivals, workdays, weather, and land use [
Regression models, which are easy to apply and effective, are currently the main method for forecasting the usage of bike sharing systems, taking into account important demand influencing factors such as population, weather, workdays, land use, and environment [
The above summarizes system-wide demand forecasting, which provides valuable references for station-level demand prediction. Stations are the basic unit of a bike sharing system, and station-level demand directly affects the system's planning, design, and operation. However, few studies focus on station-level demand forecasting. System-wide demand differs clearly from station-level demand because of the particular characteristics of each station and the constraints among stations. Methods for predicting system-wide demand are therefore not well suited to station-level demand forecasting, and more effort is needed to find reasonable prediction methods for station-level demand.
Traditional station-level demand forecasting methods are still mainly based on regression models, which fully consider influence factors [
Note that the lack of a standardized evaluation procedure (data, duration, error metric, etc.) prevents a fair comparison between these studies. Table
Summary of public bike demand forecasting studies.
Authors | City | Timespan | Object | Methods | Error metric | Publishing year |
---|---|---|---|---|---|---|
Fanaee-T and Gama | Washington DC | 2 years | The system demand | Weka's regressors | Relative Absolute Error (29.98%), Root … | 2014 |
Borgnat et al. | Lyon | 2 years + 8 months | The system demand | Linear regression | Mean Relative Error: 12% | 2011 |
Giot and Cherrier | Washington DC | 2 years | The system demand | Various regression | The best performing one clearly beats the baselines | 2014 |
Xu et al. | Hangzhou | 6 months | The system demand | A hybrid model: clustering, SVM | Error … | 2013 |
Salaken et al. | Washington DC | 2 years | The system demand | Fuzzy Inference Mechanism | Minimum Relative Absolute Error: 75.41% | 2015 |
Cagliero et al. | New York | 13 months | The system demand | Bayesian | Precision (79.1%) | 2016 |
Froehlich et al. | Barcelona | 13 weeks | Station-level demand | Bayesian network | Relative Absolute Error: 0.08 | 2009 |
Rixey | Three operational US systems | 11 months + 6 months + 6 months | Station-level demand | Linear regression | Adjusted … | 2013 |
Singhvi et al. | New York | 1 month | Station-level demand | Regression | Adjusted … | 2015 |
Kaltenbrunner et al. | Barcelona | 7 weeks | Station-level demand | ARIMA | Mean Absolute Error: 1.39 | 2010 |
Yoon et al. | Dublin | 27 days | Station-level demand | Modified ARIMA | Root Mean Square Error 5 min: 0.91, 60 … | 2014 |
Fournier et al. | Ottawa | 1 year | Station-level demand | Regression | Error rate (20.8%) | 2017 |
From the above analysis, we conclude that the regression model is an ideal method for forecasting the system-wide demand of a bike sharing system, but it is not well suited to station-level demand forecasting. In addition, it is uncertain whether other methods, such as ARIMA and Bayesian networks, work well across different bike sharing systems.
The main purpose of this paper is to build a reasonable and efficient station demand forecasting model that considers the most significant influencing factors as well as the constraints among stations. We therefore propose a hybrid model based on a Markov chain and test it.
Markov processes are widely used to model the dynamics and state transitions of complex stochastic systems. A Markov process
The probability of
Multistep transition probability can be calculated according to one-step transition probability and the Markov property as follows:
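The relation between one-step and multistep probabilities can be sketched in Python. The sketch assumes a finite state space and a row-stochastic one-step matrix `P`, where `P[i][j]` is the probability of moving from state i to state j. The helper names `mat_mul` and `n_step_transition` are illustrative and do not come from the paper. By the Chapman–Kolmogorov equation, $P^{(m+n)} = P^{(m)}P^{(n)}$, so the n-step matrix is the n-th power of `P`.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def n_step_transition(p: Matrix, steps: int) -> Matrix:
    """Return the n-step transition matrix P^(n) = P^n by repeated multiplication."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^(0) is the identity
    for _ in range(steps):
        result = mat_mul(result, p)  # P^(k+1) = P^(k) P (Chapman-Kolmogorov)
    return result


# Hypothetical two-state chain, not data from the paper.
P = [[0.9, 0.1],
     [0.5, 0.5]]
P2 = n_step_transition(P, 2)
```

For this chain, the two-step probability of going from state 0 to state 1 is 0.9 × 0.1 + 0.1 × 0.5 = 0.14.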
If the transition matrix
According to Markov chain properties in finite state space, stationary distribution
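A stationary distribution π satisfies π = πP with components summing to 1. Below is a minimal power-iteration sketch of this property. It assumes the chain is irreducible and aperiodic, so that the iteration converges to a unique π. The function name `stationary_distribution` and the two-state matrix are hypothetical.

```python
from typing import List


def stationary_distribution(p: List[List[float]], tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Approximate pi with pi = pi P and sum(pi) = 1 by power iteration.

    Assumes P is row-stochastic, irreducible, and aperiodic, so the
    iteration converges to the unique stationary distribution.
    """
    n = len(p)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical two-state chain, not data from the paper.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
```

For this matrix, the balance condition π0 · 0.1 = π1 · 0.5 gives π = (5/6, 1/6), which is the vector the iteration converges to.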
Markov chain is a special case of Markov process defined as follows.
The proposed Markov chain model for bike demand estimation predicts the probability of rentals and returns at each station. These probabilities are then converted into actual bike transfer numbers based on the total number of rental trips. The detailed algorithm is as follows.
This paper uses a variety of effective models, such as regression models, neural networks, SVMs, and hybrid models, to forecast the system-wide traffic flow. Based on the prediction accuracy of each model, the best one is chosen for the final prediction.
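The selection step can be sketched as follows. Each candidate model's daily forecasts are compared with observed system totals using the average relative error, and the candidate with the lowest value is kept. The function names and all numbers below are hypothetical illustrations, not the paper's data.

```python
from typing import Dict, Sequence


def average_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |predicted - actual| / actual over all days (actual values must be nonzero)."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(actual: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Return the name of the candidate forecast with the lowest average relative error."""
    return min(candidates, key=lambda name: average_relative_error(actual, candidates[name]))


# Hypothetical daily system totals and forecasts, for illustration only.
actual = [11000.0, 12000.0, 10000.0]
candidates = {
    "MLP": [9000.0, 14000.0, 12000.0],
    "SVM": [10000.0, 13000.0, 11000.0],
    "Linear regression": [10500.0, 12500.0, 10500.0],
}
best = select_best_model(actual, candidates)
```

With these illustrative numbers, the average relative errors are roughly 0.183, 0.091, and 0.046, so `best` is `"Linear regression"`.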
An
Assume
Assume
The model uses the stationary distribution property of the Markov chain; namely,
The steady-state probability vector of bike rental and return can be calculated by (
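The following sketch shows one possible pipeline under stated assumptions; it is not the paper's exact formulation, whose equations are not reproduced here. It assumes that an origin-destination (OD) count matrix `od[i][j]` holds training-period trips rented at station i and returned at station j. Each row is normalized into transition probabilities, the steady-state vector is found by power iteration, and each station's share is multiplied by a forecast system-wide daily total. All names and numbers are hypothetical.

```python
from typing import List


def transition_matrix(od_counts: List[List[float]]) -> List[List[float]]:
    """Row-normalize an OD trip count matrix into transition probabilities."""
    p = []
    for row in od_counts:
        total = sum(row)
        if total == 0:
            raise ValueError("every station needs at least one outgoing trip")
        p.append([c / total for c in row])
    return p


def stationary(p: List[List[float]], iters: int = 1000) -> List[float]:
    """Power iteration for pi = pi P, assuming an irreducible, aperiodic chain."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def station_volumes(od_counts: List[List[float]], system_total: float) -> List[float]:
    """Scale the steady-state shares by a forecast system-wide daily trip total."""
    return [share * system_total for share in stationary(transition_matrix(od_counts))]


# Hypothetical 3-station trip counts from a training period (not the Zhongshan data).
od = [[10.0, 30.0, 10.0],
      [20.0, 10.0, 20.0],
      [10.0, 20.0, 30.0]]
volumes = station_volumes(od, system_total=1000.0)
```

Because the steady-state shares sum to 1, the station volumes always add up to the forecast system total. This is why the accuracy of the system-wide forecast carries directly into the station-level results.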
Zhongshan is a medium-sized city in China, with a population of approximately 600,000 and an urban area of about 170 square kilometers. The city's bike sharing system was launched in 2011 and now has about 7000 bikes distributed over 167 stations in the urban area. Renting a bike requires a fare card linked to a user account; however, bike use is free for the first hour of a day. Bikes currently account for 20% of the mode share in Zhongshan, of which shared bikes account for 5%. The utilization ratio of the Zhongshan bike sharing system has reached 97%, and average daily usage is 11,500 uses. Its popularity with users and its stable operation are the main reasons we selected it as the prediction object.
In general, a bike sharing system goes through an unstable period and then a stable period between construction planning and normal operation. The Zhongshan bike sharing system was put into operation in 2011. To ensure that the system is stable within the prediction period, this paper uses the 2013 data as training data and forecasts the demand in 2014. After excluding invalid data and administrator usage records, we obtained 5,645,070 trip records, covering 365 days in 2013 and 224 days in 2014. Each record includes the station number, time, user card number, bike number, fee, and other information on renting and returning shared bikes. Useful information, such as the trip matrix and the spatial and temporal patterns of users, can be obtained by mining these data.
This paper takes the 167 public bike stations in the Zhongshan urban area as the research object. To show the station locations and their traffic volumes, we surveyed the traffic volumes of all 167 stations in 2014, as shown in Figure
The distribution of bike sharing stations’ cumulative usage in 2013 in Zhongshan.
Figure
Predicting system-wide demand is the first step of station demand forecasting, and its precision has a major influence on the final result. Predictions in this paper are made per day, so when choosing the demand influencing factors of the bike sharing system, we focus mainly on short-term factors. The preliminary influencing factors are season, holidays, weekends, temperature, rainfall, weather, wind, special cases, and so on. To avoid the limitations of a single model, this paper uses an MLP (neural network), a support vector machine, and a regression model to predict the daily traffic volume of the Zhongshan bike sharing system separately. The architecture of the MLP is shown below: Input signal
The forecasting results of the three methods are shown in Table
Error rate comparison of these models.
Methods | MLP | SVM | Linear regression |
---|---|---|---|
The average relative error | 0.26 | 0.174 | 0.115 |
Description | Training error: 0.149 | (i) … | (i) R: 0.821 |
In the multilayer perceptron, a three-layer network (one hidden layer) is set up with as few hidden nodes as possible: the input layer has 7 nodes, the hidden layer has 5 nodes, and the output layer has 1 node. The initial weights are generated by a random generator in the range −0.5 to +0.5. The minimum training rate is 0.9 and the dynamic parameter is 0.7. The allowed error is generally 0.001 to 0.00001, the number of iterations is 1000, and the sigmoid parameter is 0.9.
In the SVM model with the RBF kernel, the initial options for the parameters “
Initial parameters usually cannot guarantee an optimal model; therefore, this paper uses the grid optimization method within the set interval (
In the linear regression model, parameter selection aims for the lowest possible residual with as few variables as possible; the measure used is adjusted
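A minimal forward-pass sketch of the 7-5-1 network described above is given below. Weights are drawn uniformly from [−0.5, 0.5], as stated. Three details are assumptions and not taken from the paper: the "sigmoid parameter" of 0.9 is treated as the slope `alpha` of the logistic function, bias terms are included, and training (training rate, dynamic parameter, error tolerance, iteration count) is omitted. The class name `TinyMLP` is hypothetical.

```python
import math
import random
from typing import List


def sigmoid(x: float, alpha: float = 0.9) -> float:
    """Logistic activation; reading the 'sigmoid parameter' as the slope alpha is an assumption."""
    return 1.0 / (1.0 + math.exp(-alpha * x))


class TinyMLP:
    """7-5-1 multilayer perceptron, forward pass only (hypothetical sketch)."""

    def __init__(self, n_in: int = 7, n_hidden: int = 5, n_out: int = 1, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Initial weights (and assumed biases) drawn uniformly from [-0.5, 0.5].
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]

    def forward(self, x: List[float]) -> List[float]:
        """Propagate one input vector through the hidden and output layers."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b) for row, b in zip(self.w2, self.b2)]


# Seven hypothetical normalized daily inputs (e.g. season, holiday, weekend, temperature, ...).
net = TinyMLP()
y = net.forward([0.0, 1.0, 0.0, 0.6, 0.1, 1.0, 0.0])
```

Since the output unit is a sigmoid, its value lies in (0, 1). A daily trip volume would therefore have to be normalized for training and rescaled after prediction.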
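Grid optimization evaluates every combination of candidate parameter values and keeps the combination with the lowest validation error. In the sketch below, `score(c, gamma)` is a hypothetical callback standing in for "train an RBF-kernel SVM with these parameters and return its validation error". The exponential grids are illustrative assumptions, because the paper's interval is not reproduced here.

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Tuple


def grid_search(
    score: Callable[[float, float], float],
    c_values: Iterable[float],
    gamma_values: Iterable[float],
) -> Tuple[float, float, float]:
    """Return (best_c, best_gamma, best_error) over the Cartesian product of the grids."""
    best: Optional[Tuple[float, float, float]] = None
    for c, gamma in itertools.product(c_values, gamma_values):
        err = score(c, gamma)
        if best is None or err < best[2]:
            best = (c, gamma, err)
    if best is None:
        raise ValueError("the parameter grid is empty")
    return best


# Hypothetical exponential grids (not the paper's interval).
c_grid = [2.0 ** k for k in range(-5, 16, 2)]
gamma_grid = [2.0 ** k for k in range(-15, 4, 2)]


def toy_error(c: float, gamma: float) -> float:
    """Stand-in error surface with its minimum at C = 8, gamma = 0.125 (illustration only)."""
    return (math.log2(c) - 3) ** 2 + (math.log2(gamma) + 3) ** 2


best_c, best_gamma, best_err = grid_search(toy_error, c_grid, gamma_grid)
```

Exponentially spaced grids are a common choice for `C` and `gamma`, because the useful values of both span several orders of magnitude.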
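Adjusted R² penalizes additional predictors: $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where n is the number of observations and p is the number of predictors. This fits the stated goal of a low residual with few variables. Below is a small sketch; the function names and sample values are illustrative.

```python
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y: Sequence[float], y_hat: Sequence[float], n_predictors: int) -> float:
    """Adjusted R^2 for n observations and p predictors (requires n - p - 1 > 0)."""
    n = len(y)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - n_predictors - 1)


# Illustrative values: R^2 = 0.99, and with one predictor the adjusted value is about 0.9867.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
adj = adjusted_r_squared(y, y_hat, n_predictors=1)
```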
Table
Results of the regression model.
Variable | Coefficient | t | Sig. |
---|---|---|---|
Constant | - | 16.430 | .000 |
Spring | .217 | 8.472 | .000 |
Autumn | .188 | 7.540 | .000 |
Working day | .309 | 12.875 | .000 |
Temperature | .067 | 2.761 | .006 |
The quality of weather | .212 | 8.229 | .000 |
Special case | −.568 | −22.384 | .000 |
The predicted daily demand of the bike sharing system in 2014 is shown in Figure
Predicted daily demand of the system in 2014.
After obtaining the forecasts of the system's daily trip volume, we forecast the production and attraction of each station. To understand how well the proposed model predicts at different times, dates are divided into two categories: special dates and normal dates. A special date is a date on which a particular event occurs, for instance, the Spring Festival or extreme weather; on such dates, the public bike system operates abnormally. Normal dates are further divided into workdays and nonworkdays to test how this difference affects demand forecasting.
In this paper, we use the real data of 2013 as the original input data of
First, we analyze the overall prediction results. The evaluation index is the average relative error over all stations, as shown in Table
Prediction error for different types of date.
The type of day | Special day | Normal day: Nonworkdays | Normal day: Workday |
---|---|---|---|
The amount of data | 23 days | 62 days | 139 days |
The average relative error of all stations: Production | 53.12% | 29.36% | 23.69% |
The average relative error of all stations: Attraction | 50.81% | 27.56% | 22.15% |
“Nonworkdays” include normal weekends and national statutory holidays; all other dates are workdays.
As a supplementary illustration, we use the 2013 data as the original data to predict the 2014 data over a one-year timespan, and show the errors between the prediction results and the original results for the same period. Figure
Distribution of the average relative error of all stations on all days.
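The aggregation behind the table above can be sketched as follows. For each day, the relative error |predicted − actual| / actual is averaged over all stations, and these daily values are then averaged within each day type. Skipping stations with zero observed volume on a day is an assumption, since the relative error is undefined there. The record layout and the numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DayRecord = Tuple[str, Sequence[float], Sequence[float]]  # (day type, actual, predicted) per station


def station_are(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average relative error over all stations for one day.

    Stations with zero observed volume are skipped (an assumption).
    """
    errors = [abs(p - a) / a for a, p in zip(actual, predicted) if a > 0]
    return sum(errors) / len(errors)


def are_by_day_type(days: List[DayRecord]) -> Dict[str, float]:
    """Average the daily all-station error within each day type."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for day_type, actual, predicted in days:
        grouped[day_type].append(station_are(actual, predicted))
    return {t: sum(v) / len(v) for t, v in grouped.items()}


# Hypothetical two-station production data for three days.
days = [
    ("workday", [100.0, 50.0], [90.0, 60.0]),
    ("workday", [80.0, 40.0], [88.0, 40.0]),
    ("nonworkday", [60.0, 20.0], [45.0, 25.0]),
]
summary = are_by_day_type(days)
```

In this toy data, the workday errors are 0.15 and 0.05, which average to 0.10, while the single nonworkday has an error of 0.25.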
Prediction error of stations’ production
Prediction error of stations’ attraction
The above is the overall analysis of traffic volume prediction. Next, we analyze the predictions for individual stations to give a clearer picture of prediction performance. For each station, we average the prediction error over the 224 days and group the stations by error level, as shown in Table
Statistics of stations at different levels of average relative error.
The average relative error | Production: the amount of station | PP | Production: the average absolute error | Attraction: the amount of station | PA | Attraction: the average absolute error |
---|---|---|---|---|---|---|
<0.2 | 83 | 70.00% | 12.40 | 69 | 60.21% | 13.16 |
0.2–0.3 | 55 | 21.94% | 8.91 | 77 | 33.20% | 8.98 |
>0.3 | 29 | 8.07% | 6.57 | 21 | 6.59% | 6.44 |
Summation | 167 | 100% | - | 167 | 100% | - |
PP: the proportion of these stations’ production in the system; PA: the proportion of these stations’ attraction in the system.
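A sketch of how such a table could be assembled is shown below. Each station's average relative error (averaged over all prediction days) is mapped to one of the three levels. For each level, the sketch counts the stations and sums their share of system volume (PP for production, PA for attraction). Which level the exact boundary values 0.2 and 0.3 fall into is an assumption. The names and sample values are hypothetical.

```python
from typing import Dict, Sequence, Tuple

LEVELS = ("<0.2", "0.2-0.3", ">0.3")


def error_level(are: float) -> str:
    """Map a station's average relative error to a table level (boundary handling assumed)."""
    if are < 0.2:
        return "<0.2"
    if are <= 0.3:
        return "0.2-0.3"
    return ">0.3"


def summarize_levels(
    station_errors: Sequence[float], station_volumes: Sequence[float]
) -> Dict[str, Tuple[int, float]]:
    """Return {level: (number of stations, share of system volume)}."""
    total = sum(station_volumes)
    counts = {lvl: 0 for lvl in LEVELS}
    volumes = {lvl: 0.0 for lvl in LEVELS}
    for err, vol in zip(station_errors, station_volumes):
        lvl = error_level(err)
        counts[lvl] += 1
        volumes[lvl] += vol
    return {lvl: (counts[lvl], volumes[lvl] / total) for lvl in LEVELS}


# Hypothetical stations: average relative error over all days and total production.
errors = [0.10, 0.15, 0.25, 0.40]
production = [400.0, 300.0, 200.0, 100.0]
table = summarize_levels(errors, production)
```

Reporting the volume share next to the station count shows whether the well-predicted stations also carry most of the traffic. The average absolute error column in the table serves a similar purpose.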
Distribution of the prediction errors of stations' production and attraction.
At present, there is no standardized evaluation system for public bike systems, which directly affects how forecasting results are judged, and a reasonable and effective evaluation system requires more research. In this paper, relative error is the main evaluation index and absolute error is added as a supplementary measure, which allows a more objective judgment of the prediction results.
As the analysis of recent public bike demand forecasting research shows, effective demand forecasting is very important for bike sharing system planning and daily operations management. We built a station demand prediction system based on a Markov chain model, which outperforms previously developed solutions. From the model's performance in the study of the Zhongshan bike sharing system, we draw the following conclusions.
First, system-wide demand and station-level demand differ in bike sharing systems. The regression model works well for system-wide demand forecasting but cannot meet the requirements of station-level demand forecasting. To ensure good predictions, station-level demand forecasting must consider not only the most significant influencing factors but also the constraints among stations, which few papers have addressed. This paper does so and achieves good prediction results.
Second, traffic flows among stations are very uneven, so a single evaluation index, whether relative error or absolute error, can hardly reflect the complete forecasting result. Because public bicycle travel behavior differs between cities, it is difficult to form a unified evaluation method and evaluation index. This paper used relative error and absolute error as the main evaluation indices to build a demand forecasting evaluation method for the Zhongshan City public bike system. More data is needed to establish a standardized evaluation process, which will be addressed in future work.
Finally, hybrid models represent an important development direction for demand forecasting methods, as they may eliminate the inherent defects of single models while retaining the advantages of various models. In the future, we will try to use hybrid models to improve the forecasting method of this paper.
We will also work on dynamic demand forecasting for bike sharing systems in future work.
The authors declare that they have no conflicts of interest.
This work was supported by National Natural Science Foundation of China under Grant no. 51178403, the Fundamental Research Funds for the Central Universities (no. SWJTU11CX080 and no. 2682014CX130), Program for New Century Excellent Talents in University (NCET-13-0977), Chengdu Science and Technology Bureau (no. 2014-RK00-00034-ZF), Science & Technology Department of Sichuan Province (no. 2014RZ0037), and the Science and Technology Innovation Practice Program for Graduate Student, Southwest Jiaotong University (no. YC201507105).