Inferring Travel Modes from Cellular Signaling Data Based on the Gated Recurrent Unit Neural Network



Introduction
Understanding travel behaviors is important for urban transportation planning and management. The traditional data collection method is the residents' travel survey, conducted through face-to-face household interviews or questionnaires completed by telephone, e-mail, or the web. These approaches have well-known drawbacks, typically including high implementation costs, uneven sampling rates, relatively low response rates, and poor data quality [1,2]. Since the early 2000s, scholars have developed several methods to collect residents' travel information based on GPS data. Compared to traditional manual surveys, GPS-based methods can reduce the response burden and investigation cost. GPS-based travel information also tends to be more accurate and more detailed [3]. However, this method requires participants to carry a specific GPS recording device or install GPS recording software on their mobile phones, which raises issues such as data privacy, increased implementation cost, and higher mobile Internet communication costs. These shortcomings limit the implementation scale of GPS-based methods.
During the past decades, with the growth and spread of wireless communication services, the estimation of residents' mobility and urban travel characteristics from cellular signaling data has attracted wide attention. Cellular signaling data are a kind of passive tracking data. They are generated when a cellular phone connects to a communication base station for various communication services, such as turning on/off, making calls, sending text messages, or connecting to the mobile Internet. Thus, the data can register the phone user's trajectory in the base station network. Given the potential mapping relationship between real trip trajectories and the connection sequences of base stations, inferring a user's trip details from cellular signaling data can be expected [4,5]. Compared with travel survey and GPS data, cellular signaling data have the advantages of 24-hour uninterrupted collection, wide spatial coverage, and a high sampling rate. Moreover, the operators need not install additional acquisition equipment, so the data collection cost is relatively low. Based on these technical advantages, scholars have conducted extensive research on travel information identification methods based on cellular signaling data (refer to the Literature Review section).
This study proposes a travel mode identification method based on the gated recurrent unit (GRU) neural network. With 24 features as input, the method can identify four common and typical travel modes: walking, cycling, car, and bus. Furthermore, we design and conduct field data collection experiments to obtain labeled ground-truth datasets, with the cellular signaling data provided by the local communication operator. In the experiments, participants' GPS data and trip diaries are collected simultaneously to check and label the cellular signaling data. Finally, we use the collected ground-truth datasets to verify and compare the classification ability of the suggested method and other machine learning or deep learning methods in travel mode recognition, including random forest, support vector machine (SVM), BP neural network, recurrent neural network (RNN), long short-term memory network (LSTM), and bidirectional long short-term memory network (Bi-LSTM).
The paper is arranged as follows: after introducing the identification procedure and the GRU neural network model in Sections 2 and 3, this paper describes the data collection experiment and analyzes the temporal-spatial characteristics of the data. Section 4 describes the model parameters and the verification results of the identification methods, while Section 5 presents the discussion and conclusion. The data are used with the permission of the volunteers, and no data privacy issues occur.

Literature Review
In the early 2G or 3G environment, only a few studies explored travel information extraction methods. Intel employed a method to estimate the moving speed of a cellular phone by observing the change of GSM signal intensity or the frequency of cell area transitions. Based on the assumption that the speed falls within a certain range for a specific transportation mode, they could infer the phone user's travel mode. However, their method was limited to classifying travel states that are easy to detect, namely stationary, walking, and driving [6]. Wang et al. utilized a k-means clustering method to separate the travel times of all trips for a given OD into several groups based on anonymous cellular signaling data, in order to estimate the percentage of travelers using different travel modes [7]. Overall, in the early 2G or 3G wireless communication environment, the location frequency of cellular signaling data was fairly low, so it was difficult to recognize individual traffic modes.
Since 2015, scholars have realized that cellular signaling data have good potential to distinguish between road trips and rail trips. Wireless communication base stations are generally arranged along both sides of roads or tracks, so there is a mapping relationship between phone users' base station connection sequences and traffic facility networks. Larijani et al. proposed a rail trip identification method based on rule-based heuristics (RBH), which can identify the inbound and outbound stations and travel paths. They also developed an app to help passengers plan travel routes [8]. Thomas Holleczek et al. suggested a similar procedure and organized a manual survey at Orchard Station in Singapore to validate the accuracy. The outcomes indicated that the approach's identification errors for the number of people entering and leaving the station per hour are both approximately 9.5% [9]. Horn et al. proposed a method to extract the railway travel mode and departure time from cellular signaling data. Compared with the train operating data of the railway department, the identification error of the train departure time using this method is less than 5 minutes [10]. Hasan Poonawala et al. presented a model to identify road trips or rail trips, combining the hidden Markov model with the topological properties of different traffic networks [11]. Yamada et al. of Osaka University also focused on using the speed characteristics of cellular signaling data and traffic facility network data to distinguish road trips from rail trips. They further proposed a simulation model to verify the identification accuracy [12]. These methods rely on speed as the main indicator, which makes it difficult to differentiate between transportation modes that have similar speed profiles, such as buses and cars. Moreover, the frequency and accuracy of the cellular signaling data used were insufficient to capture the subtle variations in speed that could help distinguish between these modes.
In recent years, with improvements in location frequency and accuracy, some researchers have begun to extract residents' multiclass travel modes from mobile phone data. Combining cellular signaling data with traffic facility network data, Qu et al. put forward a mode split model applying RBH with a logit model to identify walking, cars, and buses [13]. However, the study only compared the percentages of different travel modes with the real mode share data obtained from the US census, which cannot fully explain how accurately the model identifies individual travel modes. Danafar et al. used a Bayesian probability method to identify walking, cycling, cars, and public transportation (bus, subway, and tram) [14]. However, this study did not verify the accuracy of the method for individual travel mode recognition. On the basis of 4G cellular signaling data, Kimberley et al. proposed two supervised methods, RBH with random forest (RF) and RBH with a fuzzy logic model, and an unsupervised method combining RBH with k-medoids clustering, to identify multiclass transport modes, including walking, cycling, car, metro, train, and tramcar. To verify the accuracy of the algorithms, two simultaneous collection experiments of cellular signaling data and GPS data were conducted in Switzerland. The evaluation results indicate that the model combining RBH and RF outperforms the other methods, achieving a classification accuracy of 73% [15]. This research is a rare empirical study on the identification accuracy of fine-grained travel modes.
Despite the evidence from previous studies on the feasibility of extracting fine-grained travel modes from cellular signaling data, some challenges remain unresolved. First, the evolution of mobile communication technology has significantly improved the positioning quality of cellular signaling data. As described in subsequent sections, the average interval between consecutive location records can now be less than 60 seconds, a much higher location frequency than that of cellular signaling data in the early 2G or 3G era. Thus, more studies are required to ascertain how far the accuracy of detecting fine-grained travel modes can be enhanced by such high-frequency cellular signaling data. Second, limited by privacy policy, it is difficult to obtain personal cellular signaling data, which poses great difficulties for technical verification. Therefore, the identification accuracy of fine-grained travel modes using different types of methods in a real 4G-LTE or 5G wireless communication environment remains to be fully verified.
Furthermore, deep learning methods have already been broadly and successfully employed in the field of travel information extraction and prediction. Petersen et al. merged a convolutional layer and a long short-term memory layer into a new deep neural network to predict bus travel time. The model outperformed the other methods the authors compared it with, including a historical average model, pure LSTM, and Google Traffic, and could find complicated patterns not discovered by the compared models [16]. Kim et al. proposed a long-term recurrent convolutional network to extract transportation modes from GPS data. The modes are divided into walk, bike, driving, train, bus, and electric mobility scooter. The validation results showed that the proposed method performs better than other methods from existing studies [17]. Wang et al. presented a transportation mode recognition model based on residual and LSTM recurrent networks, which utilized several kinds of lightweight sensors built into smartphones. The model introduced residual units to improve its learning efficiency and enhance the detection performance for different transportation modes. The recognition model was extensively validated and found to achieve the highest recognition accuracy for eight transportation modes [18]. In summary, existing research has demonstrated that deep learning algorithms can achieve high accuracy and robustness in travel information recognition and prediction based on GPS data or other data with high spatial-temporal granularity. However, for trip information recognition based on cellular signaling data, whose location quality is relatively irregular, whether deep learning algorithms can maintain this advantage remains to be proved.

3.1. Overview. When phone users travel in the city and stay connected to communication base stations, their trajectories are recorded by the wireless communication network through the cellular signaling data. Figure 1 illustrates the main steps of travel mode identification and the key objectives of this paper. First, a preprocessing procedure is applied to the raw data to reduce the impact of noise data. Second, through a trip end identification method, the data of each user are segmented into several single trips, each of which represents a moving trip between an OD pair and contains a single travel mode. That means we identify the main travel mode of each trip. For example, if a trip is completed as "walking-bus-walking," we consider this trip's mode to be the bus. The trip end identification method, designed with the same datasets used in this paper, was described in the literature [19]. Thus, this paper focuses on the following steps, which identify the travel mode of each single trip segment. Third, for each moving trip, a variety of temporal and spatial features are extracted from the trajectory and trip characteristics and used as input for travel mode identification. Finally, taking advantage of the GRU neural network in processing time series data of indefinite length, a deep learning-based model is established to identify the travel mode of each trip, namely walking, bicycle, car, or bus. Furthermore, a ground-truth dataset is used to validate the classification ability of the proposed method and to compare the differences in accuracy and efficiency between the proposed model and other machine learning-based or deep learning-based models.

Preprocessing: Data Cleaning.
Raw cellular signaling data usually contain noise data due to wireless communication disturbances or data transmission errors. Drifting data and oscillation data are the most common types of noise data.
(1) Drifting data. Drift is a phenomenon in which a mobile phone abruptly switches to a faraway base station during its continuous connection to the wireless communication network. For drifting data, we used a speed-based method to eliminate noise data. Initially, the shifting speed between two successive data $l_i$ and $l_{i+1}$ is computed. When the speed exceeds a threshold $V_d$, data $l_{i+1}$ is marked as a possible outlier. Then, we compare the Euclidean distance from $l_i$ to $l_{i+1}$, $d_{i,i+1}$, with the distance from $l_i$ to $l_{i+2}$, $d_{i,i+2}$. If $d_{i,i+1}$ is greater than $d_{i,i+2}$, the data $l_{i+1}$ is removed as drifting data.
(2) Oscillation data. Oscillation, or the ping-pong effect, refers to the phenomenon of a mobile phone signal switching frequently among several base stations, leading to adjacent data exhibiting a handoff pattern such as "1-2-1" or "1-2-3-1." For oscillation data, a pattern-based method is introduced to remove noise data. When the adjacent data show the oscillation pattern "1-2-1" or "1-2-3-1" and the time interval between the first and the last data is shorter than the threshold $T_o$, only the first and the last data are kept and the rest are deleted. On the basis of repeated tests, $V_d$ and $T_o$ are set to 200 km/h and 150 s, respectively.
After removing the drifting and oscillation data, we further processed the duplicated data. When the phone user generates intensive communication behavior, several cellular signaling data may be generated consecutively on the same base station. These data have the same coordinates and a handover speed of 0 km/h, which may interfere with the model's ability to distinguish between travel modes. Thus, for duplicated data, we retained the first and the last data and removed the others.
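The three cleaning rules above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the record layout (dictionaries with a timestamp `t` in seconds, a base station identifier `cell`, and planar coordinates `x`, `y` in metres) and all function names are assumptions made for the example, while the thresholds are the values of $V_d$ and $T_o$ reported in the text.

```python
import math

V_D_KMH = 200.0  # drift speed threshold V_d from the text
T_O_S = 150.0    # oscillation time threshold T_o from the text


def dist_m(a, b):
    """Euclidean distance in metres (assumes planar x/y coordinates)."""
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])


def remove_drift(recs):
    """Drop l_{i+1} when speed(l_i, l_{i+1}) > V_d and d(i,i+1) > d(i,i+2)."""
    out = list(recs)
    i = 0
    while i + 2 < len(out):  # the rule needs l_{i+2}, so the tail is left as is
        a, b, c = out[i], out[i + 1], out[i + 2]
        dt_h = (b["t"] - a["t"]) / 3600.0
        speed = dist_m(a, b) / 1000.0 / dt_h if dt_h > 0 else float("inf")
        if speed > V_D_KMH and dist_m(a, b) > dist_m(a, c):
            del out[i + 1]  # re-check l_i against its new successor
        else:
            i += 1
    return out


def remove_oscillation(recs):
    """Collapse "1-2-1" and "1-2-3-1" patterns shorter than T_o to first and last."""
    out = list(recs)
    i = 0
    while i < len(out):
        collapsed = False
        for span in (2, 3):  # 1-2-1 first, then 1-2-3-1
            j = i + span
            if j >= len(out):
                continue
            cells = [r["cell"] for r in out[i:j + 1]]
            middle = cells[1:-1]
            if (cells[0] == cells[-1] and cells[0] not in middle
                    and len(set(middle)) == len(middle)
                    and out[j]["t"] - out[i]["t"] < T_O_S):
                del out[i + 1:j]  # keep only the first and the last record
                collapsed = True
                break
        if not collapsed:
            i += 1
    return out


def remove_duplicates(recs):
    """For each run on the same base station, keep the first and last record."""
    out = []
    i = 0
    while i < len(recs):
        j = i
        while j + 1 < len(recs) and recs[j + 1]["cell"] == recs[i]["cell"]:
            j += 1
        out.append(recs[i])
        if j > i:
            out.append(recs[j])
        i = j + 1
    return out
```

Applying the functions in the order drift, oscillation, duplicates mirrors the order in which the text introduces them; each function returns a new list and leaves its input unchanged.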

Trip Segment.
After preprocessing the individual cellular signaling data, the next step is to identify the trip ends. For this purpose, scholars have proposed a variety of identification methods, which fall into two main categories:
(1) Rule-Based Methods. These methods usually detect trip endpoints by comparing the spatial-temporal features of the cellular signaling trajectories, which differ between the staying and moving states. Because cellular signaling datasets are large, the most direct and efficient way to process them is to set simple filtering rules, such as a distance threshold or a time threshold. Calabrese et al. suggested that when the coverage radius of a set of consecutive points is less than 1 km, the virtual central location formed by these points is a trip end of the phone user [20]. Wang et al. considered a phone user to be in a stay state if the user stays in a certain area for more than 15 minutes [21]. Schlaich et al. set a similar time threshold of 60 minutes [22]. Ni et al. regarded a group of continuous trajectory points with a spatial distance of less than 200 m and a duration longer than 30 minutes as stay points [23]. The threshold values should reflect the communication network characteristics of the research area as much as possible, and their rationality relies heavily on the subjective experience of the researchers. Therefore, the parameters proposed in one study may not be easily applicable to another city, which hinders widespread application.
(2) Clustering-Based Methods. These methods mainly use the differences in the shape, volume, or density of the cellular trajectory clusters in the moving or staying state to identify trip ends. Clustering algorithms are usually unsupervised, which can reduce the influence of researchers' subjective experience to a certain extent. Chen et al. employed a clustering method based on a statistical model for extracting clusters, which does not require a prespecified number of clusters. Subsequently, to distinguish between true activity locations and stay points during movement (traffic jams and waiting for buses), they used a logistic regression model with two explanatory variables (a shape variable and a volume variable) to extract the true activity locations [24]. Jiang et al. proposed an improved DBSCAN method to identify trip ends. First, they used a genetic algorithm to optimize the clustering radius under different base station densities and obtained a series of optimal parameters related to the base station densities. Second, when using DBSCAN to group cellular phone points into clusters, the clustering radius is selected according to the base station density around each point, thereby reducing the identification error that fixed parameters may cause [25]. However, clustering-based methods suffer from low efficiency due to the large number of distance calculations between trajectory points during execution. Moreover, these methods are difficult to deploy on distributed computing servers, which limits their applicability to large-scale (such as city-wide) datasets.
To further enhance the accuracy and robustness of trip end identification, we developed a model based on the random forest algorithm, which leverages the strong performance of machine learning algorithms in pattern recognition. The model details and procedures were reported by Yang et al. [19]. First, we enriched each cellular signaling data record with four types of feature attributes and incorporated external data (POI) to increase the distinction of feature attributes between the "moving" and "staying" states. Then, we built a random forest model and optimized the model parameters using methods such as cross-validation. Finally, we validated the precision and recall of our proposed model utilizing the same ground-truth data used in this research. The results showed that our model outperformed rule-based methods, clustering-based methods, and three other machine learning algorithms in terms of overall identification performance. Moreover, the proposed method could continuously adapt to the identification objects and improve the identification accuracy as more input data became available. Furthermore, our method could be implemented in a distributed computing environment, which made it suitable for analyzing travel information and urban travel characteristics from large-scale datasets.
After the identification of the trip ends, each user's cellular trajectory can be segmented into several single trips, each of which represents a moving trip between an OD pair and contains a single travel mode. The following step, which is the focus of this paper, is to identify the travel mode of each single trip.

Feature Selection.
Features describe the differences between the cellular signaling trajectories of different travel modes and are usually calculated from the physical characteristics of the trajectories. The choice of feature parameters has a significant influence on the model's identification performance. Based on how they are generated, we select two types of features, motion features from the cellular records and features from OD trips, which together comprise 24 specific feature parameters.

Motion Features of Cellular Records.
First, 21 motion features are calculated directly from users' adjacent cellular signaling data records or from records in specific time windows, such as average distance, speed, or time. These features reflect the motion and trajectory differences in the wireless communication network within the same time range that are caused by the different moving speeds of the travel modes. The location coordinates of cellular signaling data are approximated by the coordinates of the communication base station, which means that they cannot directly reflect the user's activity trajectory. However, the differences in switching rate and frequency between communication base stations are strongly related to the actual differences in moving speed when phone users adopt different travel modes. Table 1 shows nine types of features extracted from the cellular signaling data records, comprising 21 specific features. Figure 2 visually shows the difference between the calculation of the linear distance $ZD_T$ and the cumulative distance $LD_T$.
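As an illustration of the adjacent-record features and the two windowed distances, the following minimal sketch computes $D_b$, $T_b$, $V_b$, $LD_T$, and $ZD_T$ for one record. It assumes each record is a tuple `(t_seconds, x_m, y_m)` in a planar coordinate system and that the window for record $l_i$ is $[t_i - 0.5T, t_i + 0.5T]$; the function names and record layout are hypothetical and not taken from the paper.

```python
import math


def dist(a, b):
    """Euclidean distance between two (t, x, y) records, in metres."""
    return math.hypot(a[1] - b[1], a[2] - b[2])


def adjacent_features(recs, i):
    """Return (D_b, T_b, V_b) between record i-1 and record i, for i >= 1."""
    d_b = dist(recs[i - 1], recs[i])
    t_b = recs[i][0] - recs[i - 1][0]
    v_b = d_b / t_b if t_b > 0 else 0.0  # metres per second
    return d_b, t_b, v_b


def window_features(recs, i, T_min):
    """Return (LD_T, ZD_T) for record i in the window [t_i - 0.5T, t_i + 0.5T]."""
    half = 0.5 * T_min * 60.0
    t_i = recs[i][0]
    win = [r for r in recs if t_i - half <= r[0] <= t_i + half]
    ld = sum(dist(a, b) for a, b in zip(win, win[1:]))  # cumulative path length
    zd = dist(win[0], win[-1])                          # straight line, first to last
    return ld, zd
```

Calling `window_features` with `T_min` set to 5, 7, 9, and 11 yields the four window sizes mentioned in the feature table. A winding trajectory gives a cumulative distance much larger than its linear distance, which is the contrast Figure 2 is described as showing.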

Features from OD Trip.
In residents' daily travel, the distance and time from the origin to the destination have a major influence on the choice of travel mode. When the trip distance is long, residents are usually more likely to choose cars or buses. When the travel distance is short, residents tend to prefer convenient transportation modes such as walking or cycling. Even if some transportation modes have very similar speed characteristics in congested sections, there are still significant differences in total travel distance, travel time, or travel speed over the whole trip. Therefore, when identifying the transportation mode of each trip, adding travel information between the origin and the destination is expected to increase the accuracy of travel mode identification. Accordingly, this paper selects three characteristics between ODs for each trip: the Euclidean distance $D_{OD}$, the travel time, and the average speed $V_{OD}$.
In summary, this paper selects 24 features as input parameters of the GRU neural network model. These features include physical quantities such as distance, speed, and time with different dimensions or units. To prevent different dimensions or orders of magnitude from affecting the accuracy of model training, this paper employs the Z-score standardization method to normalize all features. The Z-score standardization method uses the mean and standard deviation of the original data, so that each processed feature has zero mean and unit standard deviation. The Z-score standardization method is shown in (1), where $X^{*}$ denotes the normalized characteristic value, $X$ represents the original characteristic value, and $\mu$ and $\sigma$ represent the mean and standard deviation of all samples, respectively.
$$X^{*} = \frac{X - \mu}{\sigma} \quad (1)$$
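A minimal sketch of column-wise Z-score standardization is given below. It assumes the feature matrix is a list of rows with one column per feature and uses the population standard deviation; the paper does not state which estimator was used, so that choice, like the function name, is an assumption.

```python
import statistics


def z_score_columns(rows):
    """Standardize each column to zero mean and unit (population) std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        # A constant column has std 0; map it to 0.0 instead of dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

In a training pipeline the mean and standard deviation would normally be estimated on the training samples only and then reused for validation and test samples, so that no information leaks from the held-out data.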

Gated Recurrent Unit Networks.
From the perspective of machine learning, the detection of travel modes from cellular signaling data has the following characteristics: (1) Basically, it is a typical "many-to-one" classification problem in the domain of pattern recognition, that is, judging which travel mode a single trip (comprising several data records) belongs to. (2) The cellular signaling data sequence corresponding to a trip clearly has time series features, and the length of the sequence is not fixed. (3) Trajectories generated by different travel modes differ significantly in speed, distance, base station connection frequency, and other features. These characteristics resemble those of pattern recognition problems such as speech recognition and text classification. Among deep learning algorithms, the GRU neural network is a typical structure that can process time series or other sequential data. It can also process sequences of indefinite length and has been applied successfully in complex pattern recognition fields such as computer vision and natural language processing [26,27]. Therefore, drawing on the successful experience of GRU in these fields, we introduce it to solve the problem of traffic mode identification from cellular signaling data. In 2014, Kyunghyun Cho and colleagues introduced a neural network model called the GRU (gated recurrent unit). The GRU neural network can be regarded as a simplified version of LSTM, which preserves the ability of LSTM to integrate long-term and short-term memory but reduces the complexity of the cell structure, the number of parameters, and the training time. The main simplification of GRU is to merge the forget gate and the input gate of LSTM into a new update gate. Figure 3 illustrates the structure of the GRU neural network [28].
From an external structure perspective, the input and output structures of the GRU neural network are similar to those of the ordinary RNN model. Each unit takes two inputs and produces two outputs. In Figure 3, $x_t$ denotes the input at the current time, $C_t$ is the hidden layer state, $y_t$ represents the output at the current time, and C denotes the GRU structure. The hidden state $C_t$ at time $t$ relies not only on its corresponding input data $x_t$ but also on the hidden state $C_{t-1}$ at the prior time, as shown in (2). $U$ and $W$ are the weight coefficients between different network components.
$$C_t = f(U x_t + W C_{t-1}) \quad (2)$$
The internal structure of the GRU neural network is displayed in Figure 4 [29]. The GRU model simplifies the internal neurons into two gate structures: the update gate and the reset gate. In this figure, $x_t$ denotes the input of the neuron, $y_t$ is the output of the neuron, $Z_t$ denotes the output of the update gate, $r_t$ is the output of the reset gate, and $\tilde{h}_t$ represents the candidate hidden state at the current time. $\sigma$ represents the sigmoid activation function.

Motion features | Description
$D_b$ | Euclidean distance between the coordinates of two adjacent data
$T_b$ | The difference between two adjacent data timestamps
$V_b$ | Euclidean distance of two adjacent data divided by the time difference
$LD_T$ | For data $l_i$ generated at $t_i$, the distance of adjacent data is calculated and summed as the cumulative distance $LD_T$ in the time window $[t_i - 0.5T, t_i + 0.5T]$. $T$ is set to 5 min, 7 min, 9 min, and 11 min, respectively, so we obtain 4 features
$ZD_T$ | For data $l_i$ generated at $t_i$, the distance between the first and last data is calculated as the linear distance $ZD_T$

As shown in equations (3)-(6), the update gate $Z_t$ is formed by multiplying a weight matrix with the concatenation of the prior hidden state $h_{t-1}$ and the input $x_t$. The sigmoid activation function then maps each element of the result into the range [0, 1], and the resulting vector serves as the gate control state of the update gate. The reset gate $r_t$ is computed in the same way but uses its own weight matrix $W_r$. To form the candidate hidden state, the reset gate is applied element-wise to $h_{t-1}$, the result is concatenated with $x_t$ and multiplied by a weight matrix, and the tanh function converts the product into a vector of real numbers between −1 and 1. When outputting information, the GRU weights $h_{t-1}$ and the candidate hidden state using the update gate and sums the two terms. The result is then used as the output for the current state. This analysis shows that each neuron in the GRU participates in the decision-making process for each output, creating dependencies among the neurons. In general, reset gates are more active for short-term dependencies, while update gates are more active for long-term dependencies [25].
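The gate computations described above can be sketched in plain Python as follows. Since equations (3)-(6) are not reproduced in this text, the sketch assumes a common GRU formulation without bias terms, in which the new hidden state is $(1 - Z_t) \odot h_{t-1} + Z_t \odot \tilde{h}_t$; some references swap the roles of $Z_t$ and $1 - Z_t$. The weight names `W_z` and `W_h` and all function names are assumptions for the example.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step; each weight matrix has shape hidden x (hidden + input)."""
    concat = h_prev + x_t                              # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(W_z, concat)]      # update gate Z_t
    r = [sigmoid(a) for a in matvec(W_r, concat)]      # reset gate r_t
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]   # r_t applied to h_{t-1}
    h_cand = [math.tanh(a) for a in matvec(W_h, reset_h + x_t)]
    # Blend the previous state and the candidate state with the update gate.
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def gru_encode(seq, hidden_size, W_z, W_r, W_h):
    """Run a variable-length sequence through the cell; return the last state."""
    h = [0.0] * hidden_size
    for x_t in seq:
        h = gru_step(x_t, h, W_z, W_r, W_h)
    return h
```

Because `gru_encode` returns only the final hidden state, sequences of any length map to a vector of the same size, which is what makes the structure suitable for the many-to-one classification setting described earlier.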
3.6. Model Construction. Figure 5 illustrates the travel mode identification model based on GRU, which consists of an input layer, a GRU layer, a fully connected layer, and an output layer. First, the features of each trip segment are computed from the cellular trajectory points and fed into the deep learning model as inputs. For a trip composed of n cellular signaling data points, 24 features are computed for each point, so the sequence of n points becomes an n × 24 matrix. The n × 24 feature matrix is used as the input of the neural network and processed by the GRU layer. The GRU layer can not only classify based on the input attribute values at a single time point but also effectively capture the correlation between feature values over longer sequences prior to the current time point, which makes it better suited to data with temporal dependencies. The output of the GRU layer serves as the input for two fully connected layers, and the output of the fully connected layers is then taken as the input for the output layer. The output layer converts all output values from the fully connected layer into probability values between 0 and 1 through a sigmoid function and outputs the model results at the last node. The output layer selects the travel mode with the longest cumulative time as the final mode for the trip. The GRU model is trained by minimizing the loss function on the training set. Travel mode identification is a typical multiclass classification problem, so a multiclass cross-entropy loss function is appropriate, as shown in equation (7). L represents the loss function, whose minimization is the model training objective. Specifically, the loss value represents the error between the probability distribution output by the neural network and the real probability distribution of the label.
X represents the input sample, and Y represents the result output by the neural network. $P_{i,m}$ represents the probability that the i-th input sample is predicted as the m-th category. In this paper, m indexes the four travel modes. $Y_{i,m}$ indicates whether the m-th category is the true category of the input sample $x_i$: it is 1 if so and 0 otherwise.
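For illustration, a multiclass cross-entropy over one-hot labels can be computed as in the sketch below. Equation (7) is not reproduced in this text, so the sketch assumes the usual form $L = -\frac{1}{N}\sum_{i}\sum_{m} Y_{i,m}\log P_{i,m}$; averaging over the $N$ samples and clipping probabilities at a small `eps` are assumptions of the example, not details confirmed by the paper.

```python
import math


def cross_entropy(P, Y, eps=1e-12):
    """Mean multiclass cross-entropy.

    P[i][m]: predicted probability of class m for sample i.
    Y[i][m]: 1 if class m is the true class of sample i, else 0.
    """
    total = 0.0
    for p_i, y_i in zip(P, Y):
        # Only the true class contributes, because Y is one-hot.
        total -= sum(y * math.log(max(p, eps)) for p, y in zip(p_i, y_i))
    return total / len(P)
```

With four travel modes, a model that always outputs 0.25 for every class has a loss of ln 4, about 1.386, which gives a simple reference point for judging whether training is making progress.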

Experiments and Data
y t-1 y t y t+1 x t-1 x t x t+1 Figure 3: Te structure of the GRU neural network. Journal of Advanced Transportation type, as well as travel mode. Travel purposes include going to work, going to school, seeking medical treatment, dining, entertainment, shopping, leisure, and returning home. Te stay areas cover diferent areas with diferent base station densities, such as urban areas and suburban areas. Travel modes include walking, nonmotorized vehicles, cars, and buses, which are commonly used in this city. All volunteers received formal training and participate in a re-experiment after the training to ensure that they can profciently complete the data collection tasks according to the plan in the formal experiment. In the formal experiment, each volunteer carried a mobile phone with a SIM card from the local operator, and the phone had a GPS data collection application installed. During the experiment, the GPS data collection application remained on, and the volunteers recorded detailed travel logs, including activity locations, arrival/departure times, and travel modes. In future research, GPS data and travel logs can be used to determine the real travel status corresponding to each mobile phone signal data. Te GPS data recording APP and samples of travel logs are shown in Figure 6.
With the consent of volunteers who signed confdentiality and authorization agreements, the operators provided cellular signaling data for all volunteers during the experiment, which provided a rare opportunity for this study to obtain users' cellular signaling data and corresponding real travel information. During the experiment, a total of 179377 pieces of cellular signaling data were generated by all volunteers who collected more than 200 pieces of real travel chain information. Table 2 displays the typical felds of cellular signaling data. Global Identifer and User ID can both act as the unique identity code for each phone user. By combining LAC (location area code) and CI (Cell ID), the identity code of each base station can be determined. Based on the start time and end time, the duration of the communication service can be calculated easily. Te longitude and latitude in the table represent the location of the base station that was connected when the communication service was generated.
Based on cellular signaling data, this paper focuses on identifying four kinds of travel modes: walking, cycling, car, and bus. After further extraction and screening of the dataset for these four modes, 620 travel segments were collected in this experiment, comprising 69,059 cellular signaling records. After preprocessing, the total sample size was 62,020. Considering that deep learning algorithms require a larger number of data samples, a sliding time window-based sample construction method is used to process trips with more than 60 cellular location records. The sliding window length is set to a minimum of 50 cellular location records with an increment of 10, while the moving step size is set to 10 location records. The final data sample set was obtained by randomly oversampling the minority class samples to improve the balance of the data samples. Table 3 displays the sample size corresponding to each travel mode used for model training. We added a one-hot encoded travel mode label to the cellular signaling data corresponding to each trip.
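The sample construction steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the mode ordering, and the reading of the window rule (window lengths of 50, 60, 70, ... records, each shifted in steps of 10 records) are assumptions made for the example.

```python
import random

# Assumed ordering of the four travel modes for the one-hot label.
MODES = ["walking", "cycling", "car", "bus"]


def sliding_window_samples(trip, min_len=50, len_step=10, stride=10, min_trip_len=60):
    """Cut one trip (a list of location records) into overlapping sub-sequences.

    Trips with at most `min_trip_len` records are returned unchanged as one sample.
    """
    n = len(trip)
    if n <= min_trip_len:
        return [trip]
    samples = []
    # Window lengths grow from min_len in increments of len_step.
    for win_len in range(min_len, n + 1, len_step):
        # Each window is moved along the trip in steps of `stride` records.
        for start in range(0, n - win_len + 1, stride):
            samples.append(trip[start:start + win_len])
    return samples


def one_hot(mode):
    """Encode a travel mode name as a one-hot vector over MODES."""
    vec = [0] * len(MODES)
    vec[MODES.index(mode)] = 1
    return vec


def random_oversample(samples_by_mode, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes match the largest."""
    rng = random.Random(seed)
    target = max(len(samples) for samples in samples_by_mode.values())
    balanced = {}
    for mode, samples in samples_by_mode.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        balanced[mode] = list(samples) + extra
    return balanced
```

Under these assumptions, a trip with 75 records yields three windows of length 50, two of length 60, and one of length 70.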

Data Characteristics.
The location quality of cellular signaling data is influenced by the location frequency, which is determined by the frequency of communication services generated by users. Using the cellular signaling dataset collected in the synchronized data collection experiment, we conducted a statistical analysis of location frequency. Overall, each volunteer generated an average of 1425 cellular signaling records per day. As shown in Figure 7, the probability that the time interval between adjacent records is less than 30 seconds exceeds 70%, with an average time interval of 48 seconds and a median of 20 seconds. Compared with early signaling data [30], the location frequency of cellular signaling data in the 4G environment has increased significantly, making it possible to infer multicategory and fine-grained travel modes.
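The location-frequency statistics reported above (share of intervals below 30 seconds, mean, and median) could be computed along the following lines. This is a hedged sketch: the function name and the assumption that timestamps are given in seconds for a single user are illustrative only.

```python
from statistics import mean, median


def interval_stats(timestamps, threshold=30):
    """Summarize the gaps (in seconds) between adjacent records of one user."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        raise ValueError("at least two timestamps are required")
    return {
        # Fraction of adjacent-record gaps shorter than the threshold.
        "share_below": sum(g < threshold for g in gaps) / len(gaps),
        "mean": mean(gaps),
        "median": median(gaps),
    }
```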

Accuracy is another factor that affects the location quality of cellular signaling data. It can be described by the distance error between the user's real coordinates and the coordinates of the base station. In the data collection experiment, cellular signaling data and GPS data were collected simultaneously. Matching on the data generation time, a total of 97,267 cellular signaling records were successfully paired with GPS locations recorded at the same time. GPS data represent the user's real location coordinates, while cellular signaling data reflect the location coordinates of the corresponding communication base station. A statistical analysis of the distance error between these two types of positioning data was conducted. As shown in Figure 8, more than 53% of the location errors are within 300 m and more than 73% of the location errors are within 500 m. The average location error is 357 m and the median is 278 m.
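The distance-error analysis above can be illustrated with a short sketch. The paper does not state which distance formula was used; the haversine great-circle distance, the Earth radius value, and the function names below are assumptions chosen for the example.

```python
import math
from statistics import mean, median

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an assumed constant


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def error_summary(pairs, thresholds=(300.0, 500.0)):
    """Summarize location errors for time-matched (gps, base_station) coordinate pairs.

    Each pair is ((gps_lat, gps_lon), (cell_lat, cell_lon)).
    """
    errors = [haversine_m(g[0], g[1], c[0], c[1]) for g, c in pairs]
    shares = {t: sum(e <= t for e in errors) / len(errors) for t in thresholds}
    return {"mean": mean(errors), "median": median(errors), "share_within": shares}
```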

Model Specification.
The GRU neural network model was trained with the TensorFlow 2.2.0 deep learning framework running on Python 3.9.7. The training environment used an Intel® Core i5-7200U processor @2.5 GHz with 4 GB of memory, running Windows 10. An NVIDIA GeForce 940MX graphics card with 2 GB of video memory was used for training.
As a complex deep learning algorithm, the GRU neural network contains a large number of hyperparameters that affect the classification accuracy of deep learning-based models. To enhance the recognition and generalization ability of the model, we introduce multiple parameter optimization strategies during model training and testing. First, we divide the dataset into two parts: a training set and a test set. 70% of the data samples were randomly selected as the training set, and the remaining 30% were used as the test set. The training set is applied to train the deep learning model and adjust its parameters, while the test set is only employed to test the generalization ability of the final model. Second, during the model training process, we introduce a five-fold cross-validation strategy. The dataset used for training the model is randomly divided into five parts, with four parts serving as a training set and one part serving as a validation set. This results in five trained models. After all models' loss functions converge, we select the model with the lowest loss value as the best model. Moreover, to prevent overfitting during the training process, we add a dropout strategy to the fully connected layer of the model. During training, the model randomly ignores some neuron information so that it does not rely too much on local features, thereby strengthening the model's generalization ability. The dropout rate is set to 0.5. The final parameter settings of the model are displayed in Table 4. The training set was applied to construct the GRU neural network model with these parameters, and the model loss curve during training is shown in Figure 9.
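The data partitioning described above (a random 70/30 split followed by five-fold cross-validation on the training portion, keeping the fold with the lowest loss) can be sketched in plain Python. The function names, the fixed random seeds, and the round-robin assignment of shuffled samples to folds are assumptions for illustration; model training itself is omitted.

```python
import random


def train_test_split(samples, test_ratio=0.3, seed=0):
    """Randomly split samples into a training list and a test list."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = round(len(samples) * test_ratio)
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test


def k_fold(samples, k=5, seed=0):
    """Yield (train, validation) pairs; each fold serves as validation exactly once."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        val = [samples[j] for j in folds[i]]
        train = [samples[j] for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def select_best(fold_losses):
    """Return the index of the fold whose model reached the lowest loss."""
    return min(range(len(fold_losses)), key=lambda i: fold_losses[i])
```

With 100 samples, this gives 70 training and 30 test samples, and each of the five folds holds out 14 of the 70 training samples for validation.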
The model loss decreases rapidly as the number of training rounds increases and reaches a minimum value of 0.59 after 60 training rounds. The model shows some degree of overfitting at this point. Then, as the model continues to train, the model loss increases slightly and stabilizes around 0.9, which indicates the best overall accuracy and generalization ability of the model. The model training took 18 minutes and 1 second in total.

The Performance of Identifying Travel Modes.
Travel mode identification is a complex multiclassification problem. To evaluate the model's classification ability, the identification results were categorized into three groups: True Positive (TP), False Negative (FN), and False Positive (FP). TP represents the correctly identified part of all identified travel modes, while FN denotes the real travel modes that were not detected, which can be viewed as the missed part. Similarly, FP refers to the travel modes that were identified but did not match the real samples, which can be viewed as the incorrect part. To compare the overall performance of different recognition methods, three indicators, precision, recall, and F score, were introduced as model assessment indicators. As shown in equations (8)-(10), precision is the ratio of the correctly recognized samples of a certain travel mode to the total number of samples recognized as that mode. Recall is the ratio of the correctly recognized samples of a certain travel mode to the number of actual samples of that mode. F score is a weighted harmonic mean of precision and recall and reflects the model's classification ability more comprehensively. Table 5 shows the travel mode identification results on the test set. The test set contains 770 trips, corresponding to four travel modes. Walking has the highest precision and recall, which are 97.9% and 95.9%, respectively. This is mainly because walking has the lowest moving speed and achievable travel distance, so most of its features differ clearly from those of the other three travel modes, making it easier to recognize. Similarly, with a higher average moving speed and longer travel distance, car has the second highest precision and recall, which are 94.4% and 93.4%, respectively. In contrast, the recognition performance for bicycles and buses is relatively poor, and these two modes are the most likely to be misidentified.
The main reason is that most roads in the city have dedicated lanes for nonmotor vehicles, and the average travel speed of bicycles with exclusive right of way is close to that of buses. At the same time, both nonmotor vehicles and buses can cover short and medium distance trips in the city. These similarities lead to overlapping intervals in the calculated features, such as distance and speed, for cellular signaling data generated by nonmotor vehicles and buses, which in turn causes the model to easily confuse these two modes. In addition, among the motorized travel modes, buses and cars are sometimes misidentified as each other. One possible reason is that the roads are more congested during the morning and evening peak hours, so the speed and distance differences between the two travel modes are not obvious, which leads to errors in the recognition results. Because buses are frequently confused with both nonmotor vehicles and cars, the precision and recall of the bus mode are both 83.5%, the lowest among the four modes of transportation. Overall, the recognition model constructed in this paper performs well for the four modes of transportation, with precision, recall, and F score all reaching 90.5%.
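The three indicators can be computed per travel mode from a confusion matrix, as in the sketch below. It assumes the F score gives equal weight to precision and recall (the F1 form); the function name and the toy matrix in the usage note are illustrative and are not the values of Table 5.

```python
def per_class_metrics(confusion, labels):
    """Compute (precision, recall, F1) for each class.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp   # predicted k, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = (precision, recall, f1)
    return metrics
```

For a toy two-class matrix `[[8, 2], [1, 9]]` with labels `["bus", "car"]`, the bus class has TP = 8, FN = 2, and FP = 1, giving a precision of 8/9, a recall of 0.8, and an F1 of 16/19.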

Comparison of Different Algorithms.
We first compare the travel mode identification performance of the model based on the GRU neural network with models based on other classical machine learning algorithms, namely random forest, support vector machine (SVM), and BP neural network. Figure 10 displays the comparison result. It indicates that the recognition performance of the three machine learning algorithms is relatively close, with their F scores ranging from 83.1% to 85.2%. In comparison, the method based on the GRU neural network has a better recognition performance, and its F score is about 6% to 7% higher than that of the machine learning methods. As a deep learning model, the GRU neural network has advantages such as more neurons, more complex hidden layers, and the ability to use long-term or short-term features. Therefore, it shows stronger fine-grained travel mode recognition capabilities.
We further compare the accuracy and efficiency of various deep learning-based identification models in the fine-grained travel mode recognition task, including the recurrent neural network (RNN), long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and the proposed GRU neural network. The four models used the same training and test sets. During the model training process, similar parameter optimization strategies were adopted for all four models to ensure that each model's parameters were fully optimized. The results are provided in Figure 11. They show that the recognition ability of the four methods based on deep learning algorithms is not significantly different. Among them, the F score of the model based on GRU is the highest, reaching 90%, and the F score of the model based on RNN is the lowest, at 86.9%. The difference between the two models is only 3.1%. However, the difference in model training time between the four methods is more obvious. The training time of the model based on RNN is 922 seconds, which is the shortest. The training time of the model based on Bi-LSTM is 2309 seconds, which is the longest and 2.5 times that of the model based on RNN. The training time of the model based on GRU is 1081 seconds, which is the second shortest among the four models and about 17% more than the shortest training time. Therefore, considering both recognition accuracy and efficiency, the recognition model based on the GRU neural network has the best overall performance among the four deep learning models.

Conclusions
Using large-scale cellular signaling data to extract residents' travel information offers a potential opportunity for comprehensive, real-time, wide-area analysis and monitoring of urban travel activities. Building an efficient, accurate, and robust method for travel mode identification is one of the key steps in this process. The existing travel mode identification methods have some limitations, such as unsatisfactory performance for fine-grained travel mode identification and a lack of sufficient empirical evidence for the existing identification technology. Deep learning algorithms have demonstrated their powerful ability to solve complex classification problems across domains such as natural language processing and text sentiment analysis. This paper makes two contributions. First, it proposes a travel mode identification method utilizing a GRU neural network model. Using 24 features as model input, this method can identify fine-grained travel modes, including walking, cycling, car, and bus. Second, with the support of mobile communication operators, this paper designs and conducts synchronized data collection experiments, obtains detailed individual cellular signaling data, and empirically assesses the identification performance of the proposed method and other existing models.
The empirical results indicate that the proposed identification model performs well for the four modes of transportation, with a precision, recall, and F score of 90.5%. This performance is better than that of other identification models based on machine learning, including random forest, support vector machine, and BP neural network. Moreover, considering both the model recognition accuracy and the model training efficiency, the model based on a GRU neural network also outperforms the other three recognition models based on deep learning algorithms, including the recurrent neural network (RNN), long short-term memory network (LSTM), and bidirectional long short-term memory network (Bi-LSTM).
The method presented in this paper still has some aspects that require optimization and validation. First, due to factors such as experimental cost, the cellular signaling data used for training and validating the travel mode identification model consist of about 62,000 records. To increase the size of the dataset for model training, we use the sliding time window method to generate more samples. This is reasonable in the theoretical research stage of the model. However, before applying this method to a big-data platform, more real data, instead of synthetic data, are required to conduct adequate model performance validation. Second, as shown in Figures 10 and 11, the models based on deep learning algorithms have higher recognition accuracy. However, deep learning models need more training data and use more computing resources because of their larger number of parameters, which implies that the implementation cost of deep learning-based models is higher. Therefore, when implementing a big-data platform, the choice of travel information identification model ultimately depends on a comprehensive evaluation of two major factors: recognition accuracy and computing efficiency.
The current positioning accuracy of cellular signaling data is not sufficient to identify travel mode chains. For example, for the travel mode combination walking-bus-walking, it is difficult to identify the walking trips after departure or before arrival based on cellular signaling data. In the future, high-precision positioning technology in the 5G environment, combined with additional data sources such as vibration, temperature, sound, and other built-in sensor information of mobile phones, is expected to make it possible to explore and realize identification methods for such travel mode chains.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that there are no conflicts of interest with respect to the publication of this paper.