Examine the Prediction Error of Ride-Hailing Travel Demands with Various Ignored Sparse Demand Effects

Accurate short-term prediction of ride-hailing travel demand can promote the optimal dispatching of vehicles in space and time, which is crucial to the sustainable development of such dynamic demand-responsive services. Sparse demands are usually ignored in previous models, and the uncertainties in the spatiotemporal distribution of the predictions induced by setting subjective thresholds are rarely explored. This paper attempts to fill this gap and examines the effect of spatiotemporal sparsity on ride-hailing travel demand prediction by using Didi Chuxing order data recorded in Chengdu, China. To capture the spatiotemporal characteristics of the travel demand, three hexagon-based deep learning models (H-CNN-LSTM, H-CNN-GRU, and H-ConvLSTM) are compared under various threshold values. The results show that the H-ConvLSTM model has better prediction performance than the others due to its ability to capture spatial and temporal features simultaneously, especially in areas with a high proportion of sparse demands. We found that increasing the minimum demand threshold to delete more sparse data improves the overall prediction accuracy to a certain extent, but the spatiotemporal coverage of the data is also significantly reduced. The results of this study could guide traffic operations in providing better travel services for different regions.


Introduction
Ride-hailing services achieve shared mobility between travel demand and idle supply via Internet matching. By effectively improving vehicle utilization, this energy-efficient mode of transportation reduces fuel consumption, traffic congestion, and vehicle emissions, which is conducive to urban sustainability [1][2][3]. Benefiting from the rapid development of information and communication technologies, service platforms have collected a large amount of order data, which also provides strong support for the analysis of travel behavior, traffic management, and energy consumption [4,5]. Compared with traditional taxi services, online booking effectively integrates the time and location information of passengers and vehicles, which reduces the waiting time and improves the overall efficiency [6]. To meet the diversified and customized travel demand of residents in different urban areas, a variety of ride-hailing services have been constantly updated and evolved, which effectively alleviates the problems of traditional transport services, for example, the difficulty of hailing a taxi in rush hour and low service levels [7,8]. However, due to the high heterogeneity of individuals' travel requests in different times and spaces [9][10][11], the problem of mismatching between drivers and passengers is still common; for example, cruising drivers spend much time finding passengers, and travel requests cannot be answered in time during peak hours.
Thus, accurate short-term prediction of ride-hailing travel demand is needed and timely to solve the mismatching problems.
A large number of existing studies have explored the temporal and spatial patterns of travel demand in ride-hailing [12][13][14][15]. They usually focus on central urban areas where demand is high and omit or ignore the areas with sparse demands by setting a threshold of the minimum demand. Because the effects of the areas with sparse travel demand are relatively small, the treatment of the small travel requests does not influence the overall prediction performance of the models. Although private car ownership is abundant in cities [16], this omission possibly leads to a social equity issue in practice, as passengers living in rural or less dense areas will have to face a lower level of service. Moreover, from a methodological point of view, the selection of the minimum demand threshold based on experience lacks a unified standard. More importantly, an improper setting of the minimum demand threshold may unavoidably increase the uncertainty of the demand distribution and the spatiotemporal sparsity of travel demand, which misleads demand prediction and vehicle dispatch. A proper method that considers the sparse travel demand is needed. In particular, the effects of incorporating sparse travel requests into the overall demand prediction, which have been overlooked for a long time in the existing literature, need to be further investigated. Therefore, in this paper, we attempt to assess the effects of the sparsity difference of ride-hailing travel requests in different urban areas on travel demand prediction. We comparatively utilize three deep learning methods that have been proposed in the literature to verify the influence of the choice of minimum demand threshold on the accuracy of short-term travel demand prediction.
Square-based spatial partitions give each cell two different kinds of adjacent neighbors, side-connected and corner-connected. To overcome this issue, the hexagon partition, which is considered to have a symmetric and equivalent distribution of neighbors, is adopted. Specifically, three hybrid models based on the hexagon partition are applied: a convolutional neural network combined with long short-term memory (H-CNN-LSTM), a convolutional neural network combined with a gated recurrent unit (H-CNN-GRU), and a convolutional LSTM (H-ConvLSTM). The results of the models are evaluated using the big data from the Didi Chuxing GAIA Initiative, an open data project of the Didi company [17]. The rest of this paper is organized as follows. The "Related Works" section briefly reviews the existing literature on traffic demand prediction with an emphasis on the use of deep learning methods. The method adopted in this paper is presented in the "Methodology" section. The "Experimental Result" section introduces the data and presents the experimental implementation details, followed by the conclusions and directions for further research in the "Conclusions" section.

Related Works
How to effectively mine the potential spatiotemporal characteristics of mobility patterns and accurately predict travel demand using ride-hailing data has attracted increasing attention in the domain of demand-responsive mobility. In this section, we mainly discuss the related work on travel demand prediction and sparse demand processing.
Early research on traffic prediction is mainly based on time series models, such as the autoregressive integrated moving average (ARIMA) model and its integration with other models [18][19][20][21][22]. Pavlyuk [23] compares ARIMA with different vector autoregressive models to discuss the influence of temporal aggregation on the spatiotemporal prediction accuracy of traffic flow. Machine learning models provide researchers with more options. Zhu et al. [24] utilize a linear conditional Gaussian (LCG) Bayesian network (BN) model to consider both continuous and discrete variables for short-term traffic flow prediction. The results indicate that the prediction accuracy increases significantly when both spatial data and speed data are included. Lu and Zhou [25] propose a short-term highway traffic state prediction method based on a Kalman filtering model, which highlights the advantage of combining a polynomial trend model and historical patterns. Li et al. [26] combine the complementary advantages of wavelet analysis and least squares support vector machine (LS-SVM) models to predict short-term travel demand. Compared with the other models, the proposed model not only has better prediction performance but also is capable of capturing the nonstationary characteristics of short-term traffic dynamic systems. Phiboonbanakit and Horanont [27] compare the prediction results of standard random forest (RF), decision tree (DT), and gradient-boosted regression tree (GBDT) models with actual operational data to reveal the mobility patterns of taxi trajectories. Liu et al. [28] propose a combined model of the random forest model (RFM) and ridge regression model (RRM) and take environmental and meteorological factors into account to predict the taxi demand in hotspots. The results indicate that the prediction effect of the combined model is better than those of RFM and RRM. Li et al. [29] develop a Markov-based time series model (MTSM) framework to predict traffic network conditions by integrating archived and real-time data under various external conditions, including weather, work zones, incidents, and special events. Sharma et al. [30] present an artificial neural network (ANN) based short-term traffic volume prediction model for two-lane undivided highways with mixed traffic conditions in India. These models are fit for different scenarios. However, with the explosive growth of the scale of data and the computational complexity between different dimensional features in the big data scenario, these methods will struggle to capture complex temporal and spatial correlations [31].
Deep learning has been widely used in traffic prediction [32,33] in recent years due to its successful performance in big data processing and computer vision. CNN has a strong ability to capture local trend features and scale-invariant features when nearby data points typically have a strong relationship with each other [34]. Zhang et al. [35] partition a city into a grid map based on longitude and latitude and apply CNN to predict the travel flow in real time. Fedorov et al. [36] employ the state-of-the-art Faster R-CNN two-stage detector together with a SORT tracker to address the problem of traffic flow estimation with data from a video surveillance camera. By setting gated units inside the network, LSTM has the ability to capture the temporal characteristics of long-term and short-term states and has been broadly used in time series prediction [37]. Tang et al. [38] propose a Genetic Algorithm with Attention-based LSTM and combine it with spatiotemporal correlation analysis to predict urban road traffic volume. Zhao et al. [39] further improve the computational efficiency of the LSTM model by simplifying the gating structure using GRU. To overcome the shortcomings of a single technology, the fusion of multiple intelligent prediction methods is also a focus of recent research. Deep neural network models combining CNN with LSTM have been widely used in different disciplines, including traffic speed prediction and travel demand prediction [40][41][42][43].
In the process of traffic demand forecasting, the road network is often partitioned using squares. In recent years, hexagon partition has become increasingly popular in different domains. Hexagon partition has many advantages over square and other forms of partitioning. For instance, in the case of image processing, hexagon partition has higher computational efficiency, better robustness, and more accurate image alignment [44][45][46]. Because its shape is a better approximation of a circle, a hexagon has a lower perimeter per unit area than a square. This reduction of the bias produced by edge effects allows hexagon partition to better aggregate travel demands with similar travel characteristics. Ke et al. [14] propose a hexagon-based convolutional neural network to predict the short-term supply-demand gap in ridesharing services. In addition, each square is connected to its adjacent neighbors in two ways, by edges and by corners. In contrast, a hexagon, which is symmetrically equivalent to its six adjacent neighbors [47], can better characterize the connectivity in the hierarchical network topology, and hexagon partition has become a popular grid division method for the optimal scheduling of ride-hailing [48,49].
Furthermore, as a common data processing method, setting the minimum demand threshold has been applied in many studies where the central urban areas with intensive travel demand are the research focus. For example, Yao et al. [15] filter out the samples with a demand value of less than 10. Ke et al. [14] investigate the grids with more than 100 daily requesting orders. Huang et al. [50] empirically limit their research to the areas within the fourth Ring Road of Chengdu. Although the deletion of sparse demand is statistically indisputable, it theoretically destroys the topological reliability of the subsequent spatiotemporal correlation analysis. In addition, the selection of these threshold values brings great spatiotemporal uncertainty to travel demand prediction. These issues, however, have not yet been addressed in the existing literature. Therefore, the current paper contributes to the existing literature by systematically assessing the effects of the sparsity of ride-hailing travel requests in different urban areas on travel demand prediction. Our ultimate goal is to increase the reliability of travel demand prediction by avoiding the excessive omission of sparse data (e.g., in rural areas). Due to the complex nature underlying big data, various deep learning methods are compared based on hexagon partitions. To the best of our knowledge, this is the first effort to quantitatively identify the effects of sparse data in travel demand forecasting by setting various threshold values. The findings are expected to draw the attention of the transportation community to this ignored but important issue. The results of the comparative analysis among different deep learning algorithms will also offer additional insights into the performance of the various cutting-edge methods.

Methodology
To predict travel demand, we first divide a city into uniform hexagon partitions based on latitude and longitude coordinates and divide the whole day into uniform time intervals. Based on this partition in space and time, the ride-hailing orders at hexagon partition $i$ and time interval $t$ are aggregated as the travel demand $y_i^t$. The short-term travel demand prediction problem aims to predict the travel demand $y_i^{t+1}$ of the next time interval for ride-hailing in a specific partition of the city using multiple historical local spatial data $Y_i^{t-h+1}, \ldots, Y_i^t$ collected until time interval $t$, as shown in Figure 1. $Y_i^t$ is represented by the three-layer local adjacent partitions centered at hexagon partition $i$. Since the computational propagation of deep learning is based on matrix transformation, the travel demands $Y_i^t$ of the three-layer local adjacent partitions are transformed into a matrix or tensor through parity coordinate transformation following previous studies [14].
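The construction of inputs and labels described above can be illustrated with a minimal sketch. It assumes a single partition whose history is a plain list with one entry per time interval; scalars stand in for the local spatial matrices $Y_i^t$, and the function name `make_sample_groups` is hypothetical rather than taken from the paper.

```python
def make_sample_groups(series, h):
    """Build (inputs, label) pairs for one partition.

    series: one value per time interval (a scalar here; the paper uses
            the local spatial matrix Y_i^t at each interval instead).
    h:      number of historical intervals fed to the model.
    """
    groups = []
    # t is the index of the latest observed interval in each group.
    for t in range(h - 1, len(series) - 1):
        inputs = series[t - h + 1 : t + 1]  # Y^{t-h+1}, ..., Y^t
        label = series[t + 1]               # y^{t+1}
        groups.append((inputs, label))
    return groups


demand = [3, 0, 1, 4, 2, 5]
groups = make_sample_groups(demand, h=3)
for inputs, label in groups:
    print(inputs, "->", label)
```

Each group pairs the $h$ most recent observations with the demand of the following interval, so a series of length $n$ yields $n - h$ groups.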

H-CNN-LSTM Model.
We define $X_i$ as the transformation matrix, and the position of a hexagon partition in the matrix is indexed by its parity coordinates. The transformation matrix of the three-layer local adjacent partitions is given in equation (1), where $y_{i_0}^t = y_i^t$ denotes the travel demand for the central hexagon partition of $Y_i^t$. Figure 2 shows that the input of the CNN model is the image reflecting the spatial characteristics of travel demand; that is, the input $X_i^t$ for hexagon partition $i$ and time interval $t$ is a 5 × 9 matrix (as shown in equation (1)) transformed from the three-layer local adjacent partitions centered at hexagon partition $i$. The spatial feature of the demand is extracted by convolution layers, and the convolution operation between layers is $X_{i,k}^t = a(W_k^t * X_{i,k-1}^t + b_k^t)$, where $a$ is the ReLU function, which is selected as the activation function in this paper, $W_k^t$ and $b_k^t$ are the parameter collections of the $k$th layer, and $*$ denotes the convolution operation.
The output of the last convolution layer, $X_{i,K}^t$, is transformed into a dense vector that can be written as $F_i^t = \mathrm{flatten}(X_{i,K}^t)$, where flatten denotes the concatenating procedure. Finally, we use a fully connected layer to reduce the dimension of the dense vector $F_i^t$ and learn the essential spatial feature for location $i$ and time interval $t$. The output of this layer is $a(W_{fc}^t F_i^t + b_{fc}^t)$, where $W_{fc}^t$ and $b_{fc}^t$ are trainable parameters and $a$ is the ReLU function.
LSTM is a special RNN model that was proposed to solve the vanishing gradient problem of RNNs. In this model, LSTM is adopted to memorize the characteristics of the temporal dimension of travel demand. The key to LSTM is the cell state, which ensures the memory and circulation of information. Information is removed from or added to the cell state by carefully designed structures called gates, which consist of the forget gate $f_i^t$, the input gate $i_i^t$, and the output gate $o_i^t$. In the functional relationship of LSTM, $\circ$ denotes the Hadamard product, which calculates the elementwise product of two vectors, matrices, or tensors with the same dimensions, and $\sigma$ and $\tanh$ denote the sigmoid and hyperbolic tangent activation functions, respectively. The prediction demand $y_i^{t+1}$ for hexagon partition $i$ and time interval $t + 1$ is finally obtained from a fully connected network, where $W_{fu}$ and $b_{fu}$ are trainable parameters and $\sigma$ is the sigmoid function, which is selected as the activation function of the fully connected layers.
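For reference, the gate structure described above can be written in the standard LSTM form. This is a sketch of the textbook formulation, not necessarily the exact notation of the paper: the input symbol $x_i^t$ (the spatial feature produced by the CNN), the candidate state $\tilde{C}_i^t$, and the weight names $W_f, W_{in}, W_c, W_o$ with biases $b_f, b_{in}, b_c, b_o$ are assumptions introduced here.

```latex
\begin{aligned}
f_i^t &= \sigma\left(W_f\,[h_i^{t-1}, x_i^t] + b_f\right) \\
i_i^t &= \sigma\left(W_{in}\,[h_i^{t-1}, x_i^t] + b_{in}\right) \\
\tilde{C}_i^t &= \tanh\left(W_c\,[h_i^{t-1}, x_i^t] + b_c\right) \\
C_i^t &= f_i^t \circ C_i^{t-1} + i_i^t \circ \tilde{C}_i^t \\
o_i^t &= \sigma\left(W_o\,[h_i^{t-1}, x_i^t] + b_o\right) \\
h_i^t &= o_i^t \circ \tanh\left(C_i^t\right)
\end{aligned}
```

Here $[\cdot,\cdot]$ denotes concatenation. The forget gate scales the previous cell state, the input gate scales the new candidate, and the output gate decides how much of the updated cell state is exposed as the hidden state $h_i^t$.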

H-CNN-GRU Model.
In this section, GRU is applied instead of LSTM to capture the temporal characteristics. As a variant of LSTM, GRU eliminates the cell state and uses the hidden state for information transmission. Therefore, its simpler structure facilitates faster computation, and it can perform better on limited training data. GRU combines the forget gate and input gate into a single update gate $z_i^t$, which controls how much information needs to be forgotten in the hidden layer $h_i^{t-1}$ of the previous moment and how much information needs to be added from the candidate hidden layer $\tilde{h}_i^t$ of the current moment. The candidate hidden layer $\tilde{h}_i^t$ is similar to $\tilde{C}_i^t$ in LSTM and can be regarded as the new information at the current moment. The reset gate $r_i^t$ is used to control how much of the previous memory needs to be retained. In the functional relationship of GRU, $\circ$ denotes the Hadamard product, and $\sigma$ and $\tanh$ denote the sigmoid and hyperbolic tangent activation functions, respectively. $W_z$, $W_r$, $W_h$ and $b_z$, $b_r$, $b_h$ are trainable parameters. As in the previous section, given the inputs $F_i^t$ and $h_i^{t-1}$, the GRU outputs a dense vector $h_i^t$, from which the prediction demand $y_i^{t+1}$ is obtained. Figure 3 is a structural diagram of the H-CNN-GRU model. Figure 4 shows the architecture of the H-ConvLSTM model. As an improved form of the LSTM model, ConvLSTM has convolutional structures in both the input-to-state and state-to-state transitions and performs well in simultaneously capturing temporal and spatial features.
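Returning to the GRU update described in this section, the gates can be sketched in the standard GRU form using the symbols that appear in the text ($z_i^t$, $r_i^t$, $F_i^t$, $h_i^{t-1}$, $W_z$, $W_r$, $W_h$, $b_z$, $b_r$, $b_h$). The concatenation notation $[\cdot,\cdot]$ and the convention that $z_i^t$ weights the candidate rather than the previous state are assumptions, since some references swap the roles of $z_i^t$ and $1 - z_i^t$.

```latex
\begin{aligned}
z_i^t &= \sigma\left(W_z\,[h_i^{t-1}, F_i^t] + b_z\right) \\
r_i^t &= \sigma\left(W_r\,[h_i^{t-1}, F_i^t] + b_r\right) \\
\tilde{h}_i^t &= \tanh\left(W_h\,[r_i^t \circ h_i^{t-1}, F_i^t] + b_h\right) \\
h_i^t &= \left(1 - z_i^t\right) \circ h_i^{t-1} + z_i^t \circ \tilde{h}_i^t
\end{aligned}
```

Compared with LSTM, there is no separate cell state and one fewer gate, which is the source of the shorter training time reported for H-CNN-GRU.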

H-ConvLSTM Model.
Similar to LSTM, ConvLSTM also consists of inputs $X_i^t$, hidden states $H_i^t$, and the other gates. These states are converted to 3D tensors ($X_i^t \to \mathcal{X}_i^t$) suitable for the convolution operation, the last two dimensions of which are the rows and columns of spatial information. The historical local spatial features of a certain partition can therefore be directly taken and transmitted in the model. In the functional relationship of ConvLSTM, $*$ denotes the convolution operator, and the other parameter settings are consistent with those of LSTM. The output state $H_i^{t-1}$ is finally fed into a fully connected network to obtain the prediction demand $y_i^{t+1}$.
Figure 1: Three-layer local adjacent partitions [14].
Journal of Advanced Transportation

Loss Function.
The symmetric mean absolute percentage error (SMAPE) between the estimated and real demand is sensitive to sparse demand, while the root mean square error (RMSE) can better assess the prediction performance of the central area by amplifying the impact of larger outliers [15]. Therefore, a composite loss function is created from the two metrics, in which ε is a very small value that avoids a zero denominator and λ is a hyperparameter that balances the two terms.
Figure 4: The architecture of H-ConvLSTM.
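A composite loss of this kind can be sketched as follows. The exact formula is not reproduced in the text, so both the additive form RMSE + λ · SMAPE and the SMAPE normalization used below (absolute error divided by the sum of true and predicted values plus ε) are assumptions made for illustration; only the roles of ε and λ come from the paper.

```python
import math


def smape(y_true, y_pred, eps=1e-8):
    """Mean of |pred - true| / (|true| + |pred| + eps).

    eps keeps the denominator non-zero when both values are 0.
    The normalization is an assumption, not taken from the paper.
    """
    terms = [abs(p - t) / (abs(t) + abs(p) + eps)
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


def rmse(y_true, y_pred):
    """Root mean square error; large errors dominate the sum."""
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def composite_loss(y_true, y_pred, lam=100.0, eps=1e-8):
    """Assumed additive combination: RMSE + lam * SMAPE."""
    return rmse(y_true, y_pred) + lam * smape(y_true, y_pred, eps)


print(composite_loss([1.0], [3.0]))
```

Under this definition, an absolute error of one order contributes about 1/3 to SMAPE when the true demand is 1 but only about 1/201 when it is 100, which illustrates why SMAPE reacts strongly to sparse demand while RMSE is driven by high-demand partitions.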

Dataset and Preprocessing.
The dataset used in this study is the online ride-hailing order data of Chengdu in November 2016, provided by the Didi Gaia Plan platform. Firstly, the study area is divided by latitude and longitude into 35 × 46 hexagon partitions, and the side length of each partition is 800 m. Secondly, a time interval of 30 min is used to label each order data point based on its starting time. The hexagon partitions and time-labeled order data points are then intersected in Quantum Geographic Information System (QGIS) to obtain the spatial labels. Finally, the ride-hailing travel demands can be easily aggregated in different time intervals and spatial partitions.
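The temporal labeling and aggregation steps can be sketched as below. The sketch assumes that each order already carries a partition identifier from the spatial join (performed in QGIS in the paper); the function names and the sample orders are hypothetical.

```python
from collections import Counter
from datetime import datetime

INTERVAL_MINUTES = 30  # interval length used in the paper


def interval_index(start_time, minutes=INTERVAL_MINUTES):
    """Index of the interval within the day (0..47 for 30 min)."""
    return (start_time.hour * 60 + start_time.minute) // minutes


def aggregate_demand(orders):
    """Count orders per (partition_id, date, interval index).

    orders: iterable of (partition_id, start_time) pairs, where
    partition_id is assumed to come from a prior spatial join.
    """
    demand = Counter()
    for partition_id, start_time in orders:
        key = (partition_id, start_time.date(), interval_index(start_time))
        demand[key] += 1
    return demand


orders = [
    (7, datetime(2016, 11, 1, 8, 5)),
    (7, datetime(2016, 11, 1, 8, 29)),
    (7, datetime(2016, 11, 1, 8, 30)),
    (9, datetime(2016, 11, 1, 23, 59)),
]
demand = aggregate_demand(orders)
```

Because `Counter` returns 0 for missing keys, partitions and intervals without any order are implicitly zero-demand, which is where the sparse samples discussed in this paper originate.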
Here, to evaluate the sparse effects of different models in the prediction of travel demand, two types of minimum demand thresholds are selected to screen samples with different degrees of sparsity. In the training and testing processes of the proposed deep learning models, a travel demand sample $y_i^t$ needs to be expanded into the corresponding sample group $(Y_i^{t-h+1}, \ldots, Y_i^t, y_i^{t+1})$, where $Y_i^{t-h+1}, \ldots, Y_i^t$ represents the input of the model and $y_i^{t+1}$ represents the corresponding label. The first type sets the minimum demand threshold to 1, 2, 4, and 8 for all hexagon partitions. If the demand $y_i^t$ at the center of $Y_i^t$ is less than the threshold, the sample group $(Y_i^{t-h+1}, \ldots, Y_i^t, y_i^{t+1})$ is removed from the corresponding dataset. To ensure that its order coverage is roughly close to that of the first type for further comparative analysis, the second type selects all sample groups of the hexagon partitions $i$ with average daily demands above 25, 50, 100, and 200. For all datasets with different threshold settings, the first three weeks of data are used for training and the rest for testing. The statistical characteristics of the ride-hailing order data $y_i^t$ under the different types of threshold scales are shown in Table 1. Under the condition of similar order volume coverage (the proportion of the order sample to the total demand after removing the sparse demand), the first threshold type covers more partitions in the spatial scale but is discontinuous in the temporal scale, while the second threshold type covers all time intervals in the temporal scale but only limited partitions in the spatial scale.
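The difference between the two threshold types can be made concrete with a minimal sketch. The dictionary layout of a sample group, the field names, and the toy numbers are hypothetical; only the two filtering rules follow the description above.

```python
def filter_type1(groups, threshold):
    """Type 1: drop a group when its current center demand y_i^t
    is less than the threshold (applied to every partition)."""
    return [g for g in groups if g["center_demand"] >= threshold]


def filter_type2(groups, avg_daily_demand, threshold):
    """Type 2: keep every group of a partition whose average daily
    demand is above the threshold, including zero-demand intervals."""
    return [g for g in groups
            if avg_daily_demand[g["partition"]] > threshold]


# Toy data: partition A is central, partition B is suburban.
groups = [
    {"partition": "A", "center_demand": 12, "label": 10},
    {"partition": "A", "center_demand": 0, "label": 3},
    {"partition": "B", "center_demand": 1, "label": 0},
    {"partition": "B", "center_demand": 0, "label": 1},
]
avg_daily_demand = {"A": 310.0, "B": 18.0}

type1 = filter_type1(groups, threshold=1)
type2 = filter_type2(groups, avg_daily_demand, threshold=25)
```

In this toy case, type 1 keeps one interval from each partition (wider spatial coverage, gaps in time), whereas type 2 keeps both intervals of partition A, including the zero-demand one, and drops partition B entirely.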

Model Setup.
The experiment is run on a server with an Intel(R) Xeon(R) Gold-5218 CPU @ 2.30 GHz, 128 GB RAM, and one GPU (NVIDIA Quadro RTX 5000). The proposed models are implemented in Python 3.6.6 with TensorFlow and Keras. For the spatial view (CNN) of H-CNN-LSTM and H-CNN-GRU, there are 4 convolution layers that use 8, 16, 32, and 32 filters of size 3 × 3, respectively. To better capture the spatial characteristics of the input segment, virtual partitions with zero demand values are added as neighbors of the partitions on the border. The output dimension of the CNN, which is also the input size of LSTM and GRU, is set to 64. For the temporal view (LSTM and GRU), the time step is the previous 8 time intervals, and the hidden dimension is 64. In addition, the proposed H-ConvLSTM includes 4 ConvLSTM layers, which have 8, 16, 32, and 32 hidden states. The kernel size of each layer is 3 × 3. Batch normalization and dropout are used for training the model. The number of training epochs is set to 50 with a batch size of 64. Adam is used for optimization. SMAPE and RMSE are used to evaluate the predictive performance. In the loss function, the hyperparameter λ is set to 100 to ensure that the contributions of RMSE and SMAPE to the feedback calculation are of the same magnitude.

Model Comparison.
We compare our proposed models with the following standard models: (1) ARIMA: the autoregressive integrated moving average model, which is a classical algorithm with good predictive performance for time series data. The difference order d is set to 1, with an autoregressive coefficient p and a moving average coefficient q iterating over the previous time intervals between 1 and 8. The prediction results are shown in Tables 2 and 3. Our proposed hybrid deep learning models that combine spatiotemporal prediction with the advantages of the hexagon partition always achieve lower SMAPE and RMSE values, indicating better performance. Due to the simplified structure of the gate unit, H-CNN-GRU trains faster than H-CNN-LSTM. For example, when the minimum demand threshold is set to 1, the training times of H-CNN-LSTM and H-CNN-GRU are 1.32 h and 1.21 h, respectively. Instead of simply splicing the spatial view model (CNN) and the temporal view model (LSTM or GRU), ConvLSTM adopts convolutional structures in both the input-to-state and state-to-state transitions to integrate the advantages of these models, which reduces the loss of the spatial topological relations of the data. This deeper fusion minimizes the prediction error of the H-ConvLSTM model at each threshold setting while increasing the difficulty of training (the training time at threshold 1 is 5.52 h). However, considering that the training times of the three proposed hybrid deep learning models do not differ by an order of magnitude and that all training is conducted offline, the increased training time seems to be a tolerable price for superior demand prediction results. The continuous reduction of the minimum demand threshold gradually enhances the sparsity of the sample data, and the corresponding SMAPE also significantly increases. In the case of similar order coverage, the SMAPE under the type 1 minimum demand threshold is only approximately half of that under type 2.
Even the prediction performance of type 1 with a minimum demand threshold of 1 (16.34) is better than that of any type 2 setting, which indicates that type 2 contains many 0-value demand samples that are difficult to predict accurately. Figure 5 presents the spatial distribution of the SMAPE of the H-ConvLSTM model under different thresholds. It can be seen that, with increasing distance from the central areas, the travel demand sparsity of the suburbs gradually increases, and the SMAPE in type 2 also gradually increases. However, the corresponding SMAPE in type 1 first increases and then decreases. The prediction results of each hexagon partition in type 1 are divided into four interval types according to the size of SMAPE, and the demand composition of each type is statistically analyzed, as shown in Figure 6. It can be seen that the proportion of sparse demand between 2 and 8 has a positive correlation with the SMAPE value. In addition, the 1-value demand prediction results show that when the sample size proportion is high enough, the prediction results of sparse demand can also be improved. Therefore, the circular region with the highest SMAPE value in type 1 may be due to the large fluctuation of demand in these regions and the relatively high proportion of sparse demand, while a large number of 1-value predictions makes the SMAPE value of the outermost region decrease to a certain extent. Figure 7 presents the spatial distribution of the SMAPE difference between H-ConvLSTM and the other two hybrid deep learning models. A negative difference value in the figure indicates a smaller SMAPE value for H-ConvLSTM, that is, better prediction accuracy. In most cases, the prediction accuracy of H-ConvLSTM is better than that of the other two models, especially in areas where the sparse demand distribution is relatively dense.
The continuous decrease in the minimum demand threshold makes the overall sample sparser and makes the overall RMSE smaller. However, since RMSE is greatly affected by the large demand samples, the RMSE value in the central area presents a stable trend, as shown in Figure 8.

Conclusions
To reduce the uncertainty in the mismatch between the supply and demand of online ride-hailing services, in this paper, we propose three hybrid deep learning methods based on hexagon partitions to analyze the effects of sparse data in travel demand prediction. The results are helpful for making policy recommendations to improve the operational efficiency and quality of ride-hailing services. The comparative results of the empirical study highlight that the H-ConvLSTM model has better prediction performance than H-CNN-LSTM and H-CNN-GRU due to its ability to capture temporal and spatial characteristics simultaneously, especially in areas with a high proportion of sparse demand. Because the model adopts convolutional structures in both the input-to-state and state-to-state transitions, it reduces the loss of the spatial topological relations of the data. The setting of different minimum demand thresholds changes the sparsity of the whole sample data, which has a significant impact on the prediction results of the models. Since the denominator of SMAPE for sparse demand is generally small, the proportion of sparse demand between thresholds 2 and 8 has a significant positive correlation with the SMAPE value. In addition, the 1-value demand prediction results show that when the sample size is high enough, the prediction results of sparse demand can also be improved. Therefore, with increasing distance from the central areas, the SMAPE in type 1 first increases and then decreases. The SMAPE under the type 1 threshold is much lower than that under type 2, which indicates that type 2 contains many 0-value demand samples that are difficult to predict accurately. We also found that RMSE is largely affected by large demand values. Although the overall sparsity of the sample changes with the threshold, the RMSE in the central region remains stable. The current study explored the effectiveness of several advanced deep learning methods in addressing the effects of sparse data on travel demand prediction.
However, more in-depth analysis of the spatiotemporal scale of Internet-based ride-hailing demand is needed; for example, more threshold values may be tested. The proposed methods may also be applied and validated in different cities, especially smaller cities and megacities where large differences in travel orders exist between central and rural areas and between different times of day. Furthermore, to reduce the impact of sparsity differences on travel demand prediction, multiscale spatiotemporal partitioning may be considered in the future because the resolution of partitioning may determine the resolution of sparse data. We leave these considerations as future work.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.