Individual Travel Knowledge Graph-Based Public Transport Commuter Identification: A Mixed Data Learning Approach

Commuters are the stable travel group of the public transportation (PT) service system. Accurately identifying PT commuters helps improve PT service quality and promotes the development of sustainable urban transportation. This paper extracts individual PT travel chain information and constructs individual travel knowledge graphs of PT passengers based on the association matching algorithm and the theory of multilayer planning. A mixed dataset is formed by associating individual travel chains with travel survey data. Seven travel characteristic indicators regarding travel performance and spatiotemporal travel characteristics are extracted. The identification model of PT commuters is developed based on a three-layer backpropagation neural network (BPNN). The optimal model structure, in terms of neuron node number, transfer function, and learning rate, is determined quantitatively by minimizing the model errors. The overall accuracy and kappa coefficient of the constructed model are 94.5% and 87.9%, respectively. The results indicate that the model identification accuracy is acceptable and that the proposed characteristic indicators and systematic modelling procedure are effective. The model performance is then compared with that of five other machine learning models. The results confirm that the proposed model has better identification accuracy and viability, and that the model performance will improve as the sample size increases.


Introduction
With the growing adoption of sustainable transport and green travel, and especially since the Chinese government put forward the goals of "carbon peak" and "carbon neutral" in 2021, public transportation (PT) has become an increasingly important travel option for residents. According to official statistics, the total number of PT trips in downtown Beijing was 12.6 million in 2019, accounting for 55.9% of all motorized trips, compared with 48.1% in 2015 [1]. PT has occupied the largest share of the urban transportation market in the Chinese context. With a better understanding of the travel patterns of transit riders, transit authorities will be able to evaluate their current services and determine how best to adjust their marketing strategies to attract higher PT usage [2].
Nevertheless, prominent differences have been reported between the travel characteristics of PT commuting passengers and those of other passengers [3]. Therefore, it is of great significance to grasp the travel demands and mobility characteristics of PT commuters effectively, which is conducive to improving sustainable urban transport services. For this purpose, accurate identification of the PT passenger category is the premise for revealing the travel demands and characteristics of heterogeneous passengers.
Most previous studies have applied smart card transaction data and travel survey data to analyze the mobility characteristics of PT commuters and to detect their behavioural differences. Some studies identified PT commuters by analyzing commuting travel characteristics including travel mode, spatiotemporal travel regularity, and the diversity of travel-route selection [4,5]. Jun and Deng took threshold values of travel frequency and departure time standard deviation as the classification standard; PT passengers from IC card data were then divided into three categories: commuting, ordinary, and random [6]. Ma et al. proposed that commuters' travel regularity and spatiotemporal repeatability can be measured from the aspects of residence, workplace, and departure time; PT commuters were then identified by leveraging spatial clustering and multicriteria decision analysis approaches [3]. Zou et al. proposed a rule-based recognition method to identify commuters from the perspective of spatiotemporal features, personal property, and travel behaviour [7]. However, although the aforementioned studies can effectively classify passengers with significant commuting characteristics, they cannot accurately identify commuters whose commuting behaviour is less apparent.
Thus, the selection of multidimensional and spatiotemporal travel behaviour indicators is the crucial link in identifying these atypical commuters. Besides, some relevant studies collected travel data including travel purposes, travel mode, and the origins and destinations of trips through resident travel surveys [8,9], but the behavioural classification of passengers in the whole sample could not be realized because of the high cost and limited samples of travel surveys. Moreover, many previous studies used only a single data source, such as a travel survey or smart card transactions, and lacked the integrated use of multisource travel data to extract more multidimensional travel behaviour characteristics. The intelligent PT system has been effectively improved by emerging technologies such as the Internet of Things, big data, and cloud computing. In addition, the rapid evolution of artificial intelligence and machine learning technology provides methodological support for data-driven PT passenger classification. Some previous studies identified PT commuters with intelligent algorithms including association rules [10], convolutional neural networks (CNNs) [8], the naive Bayes probabilistic model [11,12], support vector machines and decision trees [12], and statistical analysis models [13]. Zhang et al. identified the commuters among numerous bus passengers by applying cluster analysis to IC card data [14]. Allahviranloo and Recker used Markov chain models to study the sequential choice of activities; sequential multinomial logit (MNL) models and multiclass support vector machines (K-SVMs) were then adopted to identify the activity patterns of in-home, work, maintenance, personal, pick up/drop off, and stop [15]. Rafiq and McNally analyzed transit-based activity-travel patterns by classifying users via latent class analysis, using data collected from a household travel survey [16].
Manley et al. also used the density-based spatial clustering of applications with noise (DBSCAN) algorithm to identify the spatiotemporal travel regularity of individuals, and the spatial and temporal regularity differences of each cluster were derived over a continuous long-term observation period [17]. The DBSCAN model was later improved and applied to classify passengers with much lower computational complexity [18,19]. Sun and Yang established Bayesian probabilistic relations from travel survey data; a naive Bayesian method was then constructed to identify PT commuters [11]. Moreover, they proposed a naive Bayesian classifier model to identify PT commuters. The results showed that the model can identify the targets using smart card data without requiring assumptions about the travel regularity of PT commuters [20]. Bösehans and Walker utilized the centroid clustering algorithm and the k-means procedure to cluster staff and students; the main travel modes of staff commuters and student commuters were then identified and analyzed [21]. Weng and Lv selected the average number of smart card transactions, the gap time between transactions, and the weekday departure time stability as characteristic indexes from IC card transaction data; a commuter identification model was then constructed using the gradient boosting decision tree (GBDT) algorithm [22]. The above studies showed that intelligent models can identify variations in traveler identity attributes and detect the categories of PT passengers. However, most previous studies on passengers' commuting characteristics oversimplified the definition of PT commuters [6], and the characteristic variables of the identification models were incomplete.
Many studies characterized PT commuters by considering only partial travel pattern characteristics, such as the simple frequency count [4], spatial travel patterns [6], and travel time characteristics [8]; more comprehensive indicators including spatiotemporal travel modes and travel choice characteristics should be adopted. Additionally, the structure design and parameter adjustment were not discussed quantitatively in the modelling process. Therefore, the applicability and extensibility of these methods need to be improved further. Figure 1 visualizes the keyword structure relationships of the aforementioned related literature. The prominent keywords of PT passenger identification are travel pattern, behaviour, information, neural network, and prediction model. It can be seen that the structural relationship in the literature among neural networks, knowledge graphs, and travel patterns is relatively weak. Therefore, exploring the types of residents' travel patterns by combining the neural network and the knowledge graph helps enrich the research achievements. The artificial neural network (ANN) algorithm has the advantages of self-learning, self-organization, favourable fault tolerance, and the ability to perform highly nonlinear mapping from the input to the output, which allows it to solve classification problems with better performance. What is more, more than 80% of ANNs employ the error backpropagation (BP) algorithm or one of its improved variants to construct their structures [23]. Therefore, we adopt the BP neural network (BPNN) framework to develop a mixed data learning model that identifies PT commuters accurately. This paper aims to propose a systematic procedure for identifying PT commuters based on PT travel chain data and the multimode travel graph.
The proposed method, based on a three-layer BPNN model, helps illustrate the relationships between the estimated passenger categories and multidimensional travel behaviour characteristics. In it, a quantitative analysis method is used to determine the model parameters and structure, which makes the construction of the proposed PT commuter identification model more scientific and systematic. The result is expected to lay a solid foundation for the multidimensional analysis of passengers' travel demands and to enhance the understanding of the composition of urban travel groups and their behaviour when monitoring smart card data. Besides, the identified behaviour characteristics of commuter groups help traffic managers improve PT services and their sharing rate. This paper is structured as follows. The data foundation is introduced first, followed by the extraction of individual passenger travel chains. The identification method of PT passengers is explained in the following order: (1) construction of the travel knowledge graph of passengers, (2) extraction of the travel characteristic indicators of PT passengers, and (3) structural design and parameter adjustment of the BPNN model, after which the constructed model is applied to detect the categories of PT passengers and the model results are verified. The paper concludes by summarizing the research findings and suggesting directions for future research.

Extraction Methodology of Travel Chain Data
This section proposes a method for extracting individual travel chains that reflect the whole travel process of passengers through the collection, correlation, and matching of multisource PT data, and it lays a foundation for the construction of the PT commuter identification model.

Multisource PT Data Acquisition and Processing.
The multisource PT trip data used in this paper, including smart card (automated fare collection card, AFC card; integrated circuit card, IC card) transaction data, PT network data, and global positioning system (GPS) data of buses, were collected at the scale of the entire city from the Beijing Transportation Operation Coordination Center (TOCC) and the Transit Metropolis Platform. To improve the quality and availability of the raw bus data, the GPS data and PT network data were used to calibrate the boarding and alighting stations and times; the missing station data were also restored by adopting handling methods similar to those in the literature [24,25]. Since the detailed data handling process is not the focus of this section, the related contents can be found in the aforementioned literature. Moreover, the location of the AFC system in metro stations is fixed, so it is not necessary to check the smart card information of the metro system against data obtained from the automatic vehicle location (AVL) system.
To extract and analyze the travel information of PT passengers effectively, valuable fields related to passenger mobility were selected from the raw smart card transaction data, PT network data, and bus GPS data. Table 1 shows the selected valid fields of these data.

Extraction of Individual PT Travel Chains.
To understand individual PT travel behaviour clearly and mine more useful information from smart card transaction data, an extraction process for individual PT travel chains is implemented in this section. Each smart card transaction record is defined as a travel stage that reflects information about one segment of a passenger's journey. A travel chain is a continuous journey of a passenger over time. Hence, a travel chain can contain multiple travel stages. Figure 2 shows the two-dimensional structure of an individual PT travel chain that includes two transfers and three travel stages in the spatiotemporal dimensions. The horizontal axis represents the travel time and the duration of the trip, and the vertical axis indicates the spatial mobility from the origin (O) to the destination (D); the slopes of the slanted lines intuitively reflect the speed of mobility for each travel mode. Additionally, the parameters in Figure 2 are defined as follows: $T_{i\_on}$ is the boarding time at the beginning of the $i$th travel stage, $T_{i\_off}$ is the alighting time at the end of the $i$th travel stage, $D_{si}$ is the duration of the $i$th travel stage, and $T_{ti}$ is the transferring time between the $i$th travel stage and the $(i+1)$th travel stage. We note that $D_{si}$ and $T_{ti}$ are available when the card code of a cardholder is provided, since these variables are defined from two consecutive transaction records. In addition, "Mode $i$" indicates the different modes of PT, "Transfer $i$" is the process of converting the travel mode from travel stage $i$ to travel stage $(i+1)$, and "Travelled distance" is the distance between O and D. The method of extracting individual travel chains based on multisource PT data includes two steps: multisource PT data integration, and the association and matching of passengers' travel information [27].
Figure 3 describes the whole extraction process of the individual travel chains for PT passengers. The first step focuses on integrating the spatiotemporal mobility data and presenting the travel stages of PT passengers. Besides selecting the corresponding attributes per algorithm, some general preprocessing operations were applied to the data. The smart card transaction data were merged into one dataset, grouped by card code, and sorted by timestamp. Thus, the individual smart card transaction data with key fields can be organized preliminarily. The next step consists of four processing substeps that are executed to extract the individual travel chains: the judgment of the transferring time threshold, travel chain structure acquisition, O/D inference of the travel stage, and travel feature information matching. We note that three transferring time thresholds need to be discussed, corresponding to three kinds of mode transferring relations: bus to bus, bus to metro, and metro to bus. In addition, passengers can transfer to another metro line within the station, and there are no transaction records for tracking the transferring time of metro trips. Therefore, the transferring time threshold of metro to metro is not included in this study. The smart card transaction records of the bus provide only the alighting time, whereas those of the metro contain both the boarding and alighting times, so the transferring time gaps of bus to bus and metro to bus contain the riding time on PT. Thus, the three kinds of transferring time thresholds differ greatly. Note that the card codes of the smart card data do not always correspond to individuals. For example, a smart card can be shared among family members and friends, or a traveler can hold several cards. However, such usage is unlikely to be the majority, especially when registered monthly passes are tied to the smart cards [9].
With the rapid development of mobile payment, the PT systems in Beijing accept quick response (QR) code payment in addition to traditional smart card payment. However, the coding rules of QR codes are not consistent with those of the smart cards, and the service operators do not provide the number of QR codes in the transaction application software. Therefore, it is infeasible to associate the individual travel data with the QR code transaction data, which account for about 20% of all transaction data in Beijing. Thus, this study focuses on the smart card transaction data in order to introduce and match the corresponding individual travel survey data effectively. What is more, smart card data depend on ticketing, and they typically underestimate the travel demand owing to possible fare evaders in many transit systems worldwide [26]. However, no ticketing system can avoid fare evasion, and the percentage of possible fare evaders is relatively low, so this limitation is ignored in the study. The PT operating companies allowed the use of the smart card data only for research purposes; the individual information had been anonymized prior to the analysis to protect the privacy of cardholders throughout this study. We use the probability distribution statistical method to take the values at 95% of cumulative frequency as the transferring time thresholds, which are 112 min, 20 min, and 104 min for bus to bus, bus to metro, and metro to bus transfers, respectively.
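The threshold judgment and chain splitting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Stage` record layout, the function names `threshold_at` and `build_chains`, the use of minutes since midnight, and the choice to start a new chain on consecutive metro records are all hypothetical. Only the three threshold values come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Transfer-time thresholds in minutes, as reported in the text.
# Metro-to-metro is deliberately absent: no threshold is defined for it.
THRESHOLDS = {("bus", "bus"): 112, ("bus", "metro"): 20, ("metro", "bus"): 104}


@dataclass
class Stage:
    """One smart card transaction (hypothetical record layout)."""
    mode: str                      # "bus" or "metro"
    t_off: float                   # alighting time, minutes since midnight
    t_on: Optional[float] = None   # boarding time; None for bus records


def threshold_at(gaps: List[float], percent: int = 95) -> float:
    """Smallest observed gap whose cumulative frequency reaches `percent`.

    Assumes `gaps` is non-empty.
    """
    ordered = sorted(gaps)
    rank = (percent * len(ordered) + 99) // 100  # integer ceiling
    return ordered[max(rank, 1) - 1]


def build_chains(stages: List[Stage]) -> List[List[Stage]]:
    """Split one cardholder's stages into travel chains."""
    chains: List[List[Stage]] = []
    for stage in sorted(stages, key=lambda s: s.t_off):
        if chains:
            prev = chains[-1][-1]
            limit = THRESHOLDS.get((prev.mode, stage.mode))
            # Bus records carry no boarding time, so the gap then runs to
            # the next alighting time and includes the riding time.
            reference = stage.t_on if stage.t_on is not None else stage.t_off
            if limit is not None and 0 <= reference - prev.t_off <= limit:
                chains[-1].append(stage)
                continue
        chains.append([stage])
    return chains
```

For example, a bus alighting at 08:00 followed by a metro boarding at 08:10 falls within the 20 min bus to metro threshold and joins the same chain, while a bus alighting at 15:00 after a metro alighting at 08:40 starts a new chain.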
The PT travel chain dataset contains more travel information than what appears in the smart card transaction data. Therefore, more mixed travel information can be obtained effectively from the travel chain data, such as OD points, travel distance, transferring time, the number of transfers, and travel mode. Table 2 shows some important travel information of PT passengers obtained from the individual travel chain dataset.
In particular, the card type field in the IC card transaction data provides elementary category information about PT passengers. The identity of PT passengers, including students, adults, and seniors, can be recognized directly from the card type field. Thus, the numbers of PT travel chains of the different passenger categories can be obtained separately, and the day-to-day changes in the numbers of different passengers can be observed. Figure 4 shows the changes and statistical results of PT travel chains of different passengers from 1 to 7 June 2019 in Beijing. The four consecutive days from 3 to 6 June were workdays, and the number of travel chains was about 8 million every day. 1 and 2 June were weekend days, and 7 June was the Dragon Boat Festival, a traditional Chinese festival, so the number of travel chains on each of these days was lower than that of a workday, at about 6 million. In relative terms, the travel chains of student passengers account for 4.2% to 5% of the total number of daily travel chains, which makes students the smallest group. Senior passengers, who travel for leisure and recreation by PT, account for nearly a third of all trips on weekends or festivals. Additionally, it is not surprising that adult passengers make up the largest proportion of trips, reaching 62% to 65% of the total PT passenger flow.
However, the above analysis is just a coarse-grained category identification of PT passengers, and it cannot infer the main daily travel purposes of adult passengers. Namely, the passengers' behavioural categories, including commuting and noncommuting activities, cannot be identified merely from the card types. Therefore, the following part focuses on the model construction and category analysis of the adult passengers selected from the whole sample.

Construction of Individual Travel Behaviour Graph.
To observe and extract the individual travel characteristic variables for identifying PT commuters more intuitively and effectively, we introduced the knowledge graph system to establish the individual travel behaviour graph in this study. A knowledge graph, as a visual way of expressing characteristic information, has the advantage of describing concepts and the mutual relationships among objects in symbolic form; connecting these relations forms a network structure that expresses characteristic indicators intuitively [28]. Therefore, individual travel behaviour can be expressed from fragmented and incomplete individual data on the basis of knowledge graph theory. The significant effect of the individual travel behaviour graph is to transform low-dimensional numerical data into a high-dimensional visual structure.
Based on the individual PT travel chain dataset, the spatiotemporal physical relation network of PT passengers' travel behaviour information, including spatial positions, time distributions, and trip routes, can be constructed. The construction steps of the individual travel behaviour graph are as follows [29]:
(1) The first step is to cluster the individual travel spatial locations. The hierarchical system cluster model is applied to cluster the longitude and latitude data of passengers' travel OD points from the travel chain dataset. Thus, these OD points are divided into different groups in the spatial dimension.
(2) Then, the individual travel time of PT passengers is further classified based on the results of the travel space position clustering. First, the travel time range of 05:00 to 23:00, which is the PT operation time in Beijing, is split into 2-hour intervals; thus, the travel time is divided into 9 intervals.
The individual travel behaviour graphs are constructed to better understand PT passengers' travel characteristics, including the spatiotemporal characteristics of trips and travel stability, which helps extract the input indicators of the PT commuter identification model accurately and hierarchically. Figure 5 depicts the individual travel knowledge graph of a PT passenger selected randomly from the travel chain dataset of May 2017 in Beijing. The spatial and temporal characteristics of individual travel behaviour over several consecutive weekdays are expressed intuitively.
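The two construction steps, spatial grouping of OD points and assignment of trips to 2-hour intervals, can be illustrated as follows. This is a rough sketch under explicit assumptions rather than the procedure of [29]: single-linkage grouping with a fixed distance cutoff stands in for the hierarchical system cluster model, the 500 m cutoff is an arbitrary illustrative value, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance between two points in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))


def cluster_points(points: List[Point], cutoff_m: float = 500.0) -> List[int]:
    """Single-linkage grouping: points closer than the cutoff share a label.

    The cutoff is an assumed parameter, not a value from the paper.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_m(points[i], points[j]) <= cutoff_m:
                parent[find(j)] = find(i)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


def time_interval(hour: int, minute: int = 0) -> Optional[int]:
    """Index (0-8) of the 2-hour interval within 05:00-23:00, else None."""
    t = hour + minute / 60
    if not 5 <= t < 23:
        return None
    return int((t - 5) // 2)
```

With this binning, a departure at 07:30 falls into interval 1 (07:00 to 09:00), and the last interval, index 8, covers 21:00 to 23:00.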

PT Commuter Identification Modelling
Neural network algorithms are among the most widely applied supervised learning methods in the field of machine learning and artificial intelligence. The method is well known in computer science, and it has been applied successfully to traffic flow prediction [30], traffic model selection [31], and traffic congestion detection [32]. The BPNN method, as a multilayer feedforward network, is among the most widely applied supervised classification methods, and there are many successful applications of it in the field of transportation [33-35]. However, the application of this methodology to travel behaviour and passenger categories has been extremely limited, especially when it comes to identifying PT passenger categories in the Chinese context. What is more, Guo et al. showed that, according to the universal approximation theory, a three-layer BPNN is sufficient for most problems [36].
Therefore, a BPNN model with a three-layer structure was constructed as the identification model for the categories of PT passengers. The overall architecture of the BPNN is composed of the input layer, hidden layer, and output layer. The calculation flow of the BPNN model is as follows:

(1) Input layer to hidden layer:

$$\alpha_h = \sum_{i=1}^{d} v_{ih} x_i, \tag{1}$$

where $\alpha_h$ represents the input value of the $h$th neuron, $d$ is the number of input variables, $x_i$ indicates the input variables of the model, and $v_{ih}$ is the weight connecting $x_i$ in the input layer to the neuron $\alpha_h$ in the hidden layer.

(2) Activation function processing in the hidden layer:

$$b_h = f(\alpha_h - \gamma_h), \tag{2}$$

where $b_h$ is the output value of the $h$th neuron in the hidden layer, $f(x)$ is the activation function, and $\gamma_h$ is the threshold of the $h$th neuron.

(3) Hidden layer to output layer:

$$y_k = \sum_{h=1}^{q} w_{hk} b_h, \tag{3}$$

where $y_k$ is the model output value, $w_{hk}$ is the weight connecting the neuron $\alpha_h$ to the output variables in the output layer, $k$ indicates the number of output indicators of the PT commuter prediction model, and $q$ is the number of neurons in the hidden layer.
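The three-step calculation flow can be traced with a small forward pass. The sketch below is illustrative only and assumes a linear output layer without a threshold, following the description of step (3); the function name `forward`, the nested-list weight layout, and the default choice of `math.tanh` as the activation function are assumptions.

```python
import math
from typing import Callable, List, Tuple


def forward(
    x: List[float],
    v: List[List[float]],       # v[i][h]: weight from input i to hidden neuron h
    gamma: List[float],         # gamma[h]: threshold of hidden neuron h
    w: List[List[float]],       # w[h][k]: weight from hidden neuron h to output k
    f: Callable[[float], float] = math.tanh,
) -> Tuple[List[float], List[float], List[float]]:
    """Return (alpha, b, y) for one input vector of a three-layer network."""
    d, q = len(x), len(gamma)
    # Step 1: weighted sum arriving at each hidden neuron.
    alpha = [sum(v[i][h] * x[i] for i in range(d)) for h in range(q)]
    # Step 2: subtract the threshold and apply the activation function.
    b = [f(alpha[h] - gamma[h]) for h in range(q)]
    # Step 3: weighted sum of hidden outputs for each output variable.
    y = [sum(w[h][k] * b[h] for h in range(q)) for k in range(len(w[0]))]
    return alpha, b, y
```

In the model discussed here, `x` would hold the seven characteristic indicators and `y` a single value for the passenger category.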
In addition, the errors between the model results and the expected results are used to improve the model parameters. The model error $E$ is calculated using the least square method as follows:

$$E = \frac{1}{2} \sum_{k} \left( y'_k - y_k \right)^2, \tag{4}$$

where $y'_k$ indicates the prediction results and $y_k$ represents the training results. The error is taken as the control target to capture the best functions and parameters of the BPNN model. The following part describes the construction of a three-layer BPNN model from two aspects: structure design (feature variables of the input layer and passenger category of the output layer) and parameter adjustment (neuron node number of the hidden layer, transfer function, and learning rate). The details of the model are discussed quantitatively.

Input Layer Design.
Table 3 shows the seven selected characteristic indicators and their profiles. By observing the structure and features of individual travel knowledge graphs from multiple perspectives, four travel behaviour characteristic indexes were extracted from the spatial dimension: average travel days (ATDs), the average number of trips (ANTs), OD cluster number (ODCN), and PT roundtrip coefficient (RC). From the temporal dimension, the indicator of departure time concentricity (DTC) was developed from the second layer of the travel knowledge graph. Likewise, we selected the indicator of travel path fixity (TPF) from the third layer of the individual travel behaviour graph. Besides, one more comprehensive indicator, travel space equilibrium (TSE), was proposed through travel behaviour analysis based on individual travel knowledge graphs to measure the frequency with which passengers travel to different activity OD points. These seven indicators can be used to define and [...] Table 3 can be directly acquired based on individual travel knowledge graphs. Meanwhile, TSE is denoted by (5) and (6), which introduce the concept of information entropy.
Therefore, we defined the TSE in combination with the paradigm of the information entropy function as follows:

$$\alpha_i = \begin{cases} 0, & \text{the passenger did not go to activity point } i \text{ on the } N\text{th day}, \\ 1, & \text{the passenger went to activity point } i \text{ on the } N\text{th day}, \end{cases}$$

where $m$ denotes the total number of different activity OD points, $i = 1, 2, 3, \ldots, m$, $N$ indicates the total number of travel days in a month, $N = 31$, and $\alpha_i$ is the decision variable.
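To make the entropy idea concrete, the sketch below computes a Shannon entropy over the shares of visits to each activity point, using the 0/1 decision variable defined above. It is an assumption-laden illustration and not the paper's exact TSE formula: the normalization of visit counts into probabilities, the natural logarithm, and the function name `travel_space_equilibrium` are all choices made here for the example.

```python
import math
from typing import List


def travel_space_equilibrium(visits: List[List[int]]) -> float:
    """Shannon entropy of visit shares across activity points.

    visits[day][i] is the 0/1 decision variable for activity point i on
    that day. The normalisation used here is an assumption, not the
    paper's definition.
    """
    m = len(visits[0])
    counts = [sum(day[i] for day in visits) for i in range(m)]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)
```

Under this assumed form, a passenger who visits only one activity point scores 0, and the value grows as visits are spread more evenly over more points, which matches the stated purpose of measuring how often passengers travel to different activity OD points.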

Output Layer Design.
The output layer of this model is designed to predict the PT passenger categories, namely commuter and noncommuter. To obtain an accurate ground truth of the passenger category for training the proposed model, a revealed preference (RP) survey was designed and conducted from 10 to 27 May 2017 in Beijing to collect the individual attributes and travel behaviour information of PT passengers. From the temporal perspective, the survey period covered the morning peak hours (7:00-9:00), evening peak hours (17:00-19:00), and off-peak hours. From the spatial perspective, the survey involved five subway stations and three bus stations in the downtown area of Beijing, and the land-use attributes cover residential, commercial, and leisure areas. In total, 453 valid questionnaires were collected. The survey comprises two parts. The first part is the travel record of activities, which asks for detailed information about respondents' mobility in the past week, including the travel days, travel purpose, departure and arrival times, travel mode, and the number of trips. The second part covers the sociodemographic characteristics, including the main travel purpose (commuting or noncommuting), age, gender, occupation, monthly income, vehicle ownership, and educational status. The key information of travel purpose, which is used to label the passenger categories, is significant to the results and the accuracy of the proposed model. Therefore, during the field survey, we emphasized the importance of this question to the interviewees and asked them to answer it according to their actual situation. In general, commuters are the population whose daily travel purpose is commuting; they may work full time or several days per week. Besides, the card codes of respondents' smart cards were also collected anonymously through this survey.
Thus, the corresponding travel chain dataset can be matched, and continuous one-month travel chain data of the respondents were extracted from the whole travel chain dataset. Thereafter, the survey data were correlated and matched with the travel chain database through the card code field, and the respondents whose card type was the adult card were further selected. Thus, multidimensional survey data containing both individual travel chain data and travel survey data of 147 commuters and 42 noncommuters were obtained. Then, Cronbach's alpha test and the Kaiser-Meyer-Olkin (KMO) test were used to measure the reliability and validity of the collected survey data. The results show that the Cronbach's alpha coefficients and KMO coefficients of the collected survey data are all above 0.836 and 0.851, respectively, which implies that the survey data are effective and representative. Table 4 presents the basic information statistics of the respondents.
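For readers unfamiliar with the reliability check, Cronbach's alpha can be computed from item scores with the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j s_j^2}{s_T^2}\right)$, where $k$ is the number of items, $s_j^2$ the variance of item $j$, and $s_T^2$ the variance of the respondents' total scores. The sketch below is a generic illustration; the function name and data layout are assumptions, and the scores shown in any usage are made up rather than taken from the survey.

```python
from statistics import variance
from typing import List


def cronbach_alpha(items: List[List[float]]) -> float:
    """Cronbach's alpha; items[j] holds all respondents' scores on item j.

    Requires at least two items and at least two respondents.
    """
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Values closer to 1 indicate higher internal consistency among the questionnaire items.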
From the overall perspective, the largest proportion of PT passengers is between 26 and 35 years old (about 39%), followed by the group aged 21-25, which accounts for almost a quarter of the whole sample. Surprisingly, the respondents have a relatively high education level, and about 80% of them hold a bachelor's degree or above. As expected, the overall income level of the PT travel group is slightly lower, because the monthly salary of three-quarters of the respondents is lower than the average monthly salary of residents in Beijing in 2017 (8,476 RMB). Besides, nearly half of the PT passengers own only one car, and around 40% of respondents do not have private cars. The low private car ownership largely results from the strict policy restrictions on license plates, which were introduced in Beijing in 2011.
From the relative perspective, it is interesting to note that the ratio of women is slightly higher in the commuter group and lower in the noncommuter group. Besides, as expected, the proportion of passengers over 50 years old is higher in the noncommuter group than in the commuter group, owing to the retirees in that group. Additionally, the proportion of households with more than two cars is slightly higher in the noncommuting group, which may be because the elderly are more likely to own cars and a majority of the elderly tend to live with their children, who belong to the major car ownership groups in China.

Parameter Adjustment.
In this section, the neuron node number, the transfer functions between adjacent layers, and the learning rate are discussed, all of which help to improve the efficiency and accuracy of the identification model proposed in this paper.

Selection of Neuron Node Number in the Hidden Layer.
The number of neuron nodes in the hidden layer of the BPNN follows the functional relationship (7) with the number of input variables and output variables [23]:

$$n = \sqrt{n_{in} + n_{out}} + \alpha, \tag{7}$$

where $n$ is the number of neuron nodes in the hidden layer, $n_{in}$ indicates the number of input variables, $n_{out}$ means the number of output variables, and $\alpha$ is a constant between 0 and 10.
Since the model has seven input variables and one output variable, the number of neuron nodes in the hidden layer satisfies $n \in [3, 13]$ according to (7). Considering that the prediction results differ as the model structure changes, the BPNN model is run 10 times for each candidate number of neuron nodes ($n$) in the hidden layer. Thus, a classification accuracy is obtained after each model run, and the average classification accuracy of the proposed model can be computed for each neuron node number. Figure 6 gives some insight into the relationship between the average classification accuracy and the number of neuron nodes in the hidden layer. It can be seen that the average classification accuracy of the BPNN model is highest with four neuron nodes when the other model parameters remain unchanged. Therefore, four neuron nodes were used in the hidden layer as the optimal parameter selection for the proposed model.
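The node-number search described above can be sketched as follows. This is a minimal illustration, assuming that equation (7) takes the common empirical form n = sqrt(n_in + n_out) + α with integer α from 0 to 10, rounded to the nearest integer. The function names and the accuracy dictionary are hypothetical and are not taken from the paper.

```python
import math


def hidden_node_candidates(n_in: int, n_out: int,
                           alpha_min: int = 0, alpha_max: int = 10) -> list[int]:
    """Candidate hidden-layer sizes from n = sqrt(n_in + n_out) + alpha.

    Assumption: alpha takes integer values and the result is rounded.
    """
    base = math.sqrt(n_in + n_out)
    return sorted({round(base + alpha) for alpha in range(alpha_min, alpha_max + 1)})


def mean_accuracy(run_accuracies: list[float]) -> float:
    """Average classification accuracy over repeated runs."""
    return sum(run_accuracies) / len(run_accuracies)


def best_node_count(accuracy_by_nodes: dict[int, list[float]]) -> int:
    """Node count whose repeated runs give the highest average accuracy."""
    return max(accuracy_by_nodes, key=lambda n: mean_accuracy(accuracy_by_nodes[n]))


# Seven input indicators and one output variable, as in the text.
candidates = hidden_node_candidates(n_in=7, n_out=1)
```

In practice, `accuracy_by_nodes` would hold the accuracies of the repeated training runs for each candidate size, and `best_node_count` would return the size to keep.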

Transfer Function Selection.
Transfer functions, as local computing functions, map the output of neurons to the input of neurons in the adjacent network layers. The transfer functions determine the weights and thresholds of the whole neural network and have an important influence on the prediction results. Some transfer functions, such as the hyperbolic tangent function Tansig and the linear function Purelin, which are denoted by (8) and (9), have been used in three-layer neural networks in the fields of image processing [37], water quality treatment [38], and environmental engineering [39], and the corresponding models have achieved prominent prediction effects. Therefore, all the training functions were used to train the BPNN 10 times each; then, one of them was selected as the optimal model function based on the prediction accuracy and training time.

$$\mathrm{tansig}(N) = \frac{2}{1 + e^{-2N}} - 1, \tag{8}$$

$$\mathrm{purelin}(N) = N, \tag{9}$$

where $N$ is the input vector of the characteristic indicators of PT commuter travel behaviour. The prediction accuracy and convergence rate of the model with different training functions were selected to evaluate model efficiency. Analogously, the runtime was adopted to investigate the model performance with respect to different thresholds [40]. Figure 7 shows the prediction accuracy and training time for the different training functions. From the relative perspective, the figure shows that the training function trainrp, which is based on the resilient (elastic) backpropagation method, enables the BPNN model to achieve the best prediction accuracy. Besides, the training time of trainrp is 18.6% lower than that of the function traingd, and trainrp is 41.7% faster than the function traincgf. Moreover, the advantage of the training function will be further highlighted if a larger scale of data is processed. Therefore, we adopted trainrp as the model training function considering the comprehensive performance of the training functions.
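As a rough illustration of the two transfer functions named above, the sketch below uses their standard definitions, tansig(x) = 2 / (1 + exp(-2x)) - 1 and purelin(x) = x. The `forward` helper, the assignment of Tansig to the hidden layer and Purelin to the output layer, and all weights are assumptions for illustration only, since the text does not state which function is used in which layer.

```python
import math


def tansig(x: float) -> float:
    """Hyperbolic tangent sigmoid; mathematically equal to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def purelin(x: float) -> float:
    """Linear transfer function: returns its input unchanged."""
    return x


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a three-layer network (hypothetical layout).

    Assumption: tansig on the hidden layer, purelin on the single output node.
    """
    hidden = [
        tansig(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return purelin(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
```

Tansig squashes each hidden activation into the interval (-1, 1), while Purelin leaves the output unbounded, so a threshold would still be needed to turn the output into a passenger category.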

Learning Rate Selection.
Another key hyperparameter of the BPNN model is the learning rate for gradient descent.
This parameter scales the magnitude of the weight updates made to minimize the network's loss function, affects the stability and training time of the model, and determines the weight change in each training cycle. If the learning rate is too small, model training progresses very slowly because the updates to the weights in the network are tiny. Conversely, if the learning rate is too large, the loss function may diverge and the neural network may become unstable. Several studies report that a learning rate of 0.01 allowed the neural network model to achieve good prediction performance [40][41][42]. Therefore, we adopted the value of 0.01 as the learning rate of the BPNN model in view of these previous results.
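The effect of the learning rate can be illustrated with a toy example. The sketch below is not the BPNN training procedure itself: it runs plain gradient descent on the hypothetical one-dimensional loss w², chosen only because its behaviour is easy to follow. The function name, starting point, and step count are assumptions.

```python
def gradient_descent(learning_rate: float, start: float = 1.0, steps: int = 100) -> float:
    """Minimize loss(w) = w**2 by gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w             # derivative of w**2
        w -= learning_rate * gradient  # update scaled by the learning rate
    return w


tiny = gradient_descent(1e-5)       # barely moves away from the start
moderate = gradient_descent(0.01)   # steadily approaches the minimum at 0
too_large = gradient_descent(1.1)   # overshoots each step and diverges
```

With the tiny rate the weight is still close to 1 after 100 steps, with 0.01 it has shrunk to roughly 0.13, and with 1.1 each update overshoots the minimum so the magnitude grows without bound.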
Through the aforementioned parameter adjustment and optimization, the stable PT commuter identification model proposed in this paper is finally constructed. Figure 8 shows the conceptual structure of the PT commuter identification model based on the three-layer neural network model.

Empirical Results
In this section, we describe the empirical analysis using the method proposed in Section 2; the identification model was built and tested using the travel chain data of respondents collected in Beijing. To train and validate the proposed BPNN model, we randomly selected the datasets of 145 respondents as the training dataset and those of the remaining 44 respondents as the verification dataset. This yields a 145 × 7 matrix for the training dataset and a 44 × 7 matrix for the verification dataset. Figure 9 illustrates the feature dataset processing flowchart for PT commuter identification by the BPNN model. Firstly, the training data, containing the characteristic indicator data derived from the individual travel chain datasets and the category attribute information of passengers extracted from the survey data, are input into the BPNN model. Then, the identification model is trained through the iterative adjustment of weights and parameters based on the error backpropagation and self-learning mechanism.
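The random division into 145 training and 44 verification respondents can be sketched as below. This is a minimal illustration with hypothetical respondent identifiers and an arbitrary seed; the paper does not describe how its random selection was implemented.

```python
import random


def split_respondents(respondent_ids, n_train: int, seed: int = 0):
    """Shuffle respondent ids and split them into training and verification sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 189 respondents in total (147 commuters and 42 noncommuters), ids are placeholders.
train_ids, verify_ids = split_respondents(range(189), n_train=145)
```

Each respondent then contributes one row of seven indicators, which gives the 145 × 7 and 44 × 7 matrices mentioned above.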
Thereafter, the verification dataset is fed into the trained model developed in the previous step. Thus, the input data are computed and transmitted at each layer of the BPNN model, and the passenger categories can be predicted by the model. The model performance depends on whether the heterogeneity in the attributes accurately indicates the difference in the passenger categories. Therefore, to evaluate the classification accuracy and validity of the PT commuter identification model and the data fusion approach proposed in this paper, we adopted the evaluation indicators of overall accuracy (OA) and kappa coefficient (Kappa), which have been successfully applied in previous studies [43][44][45]. Although this model validation method is relatively simple, it is effective, clear, and easy to compare with other model results. The OA and Kappa are estimated by equations (10) and (11):

$$\mathrm{OA} = \frac{1}{N}\sum_{i} a_{ii}, \tag{10}$$

$$\mathrm{Kappa} = \frac{N\sum_{i} a_{ii} - \sum_{i} T_{i*}T_{*i}}{N^{2} - \sum_{i} T_{i*}T_{*i}}, \tag{11}$$

where OA indicates the ratio of the number of correctly classified passengers to the total number of passengers, Kappa represents the reduced error percentage of the predicted classification results compared with random classification, $a_{ii}$ are the diagonal elements of the confusion matrix, $N$ is the overall sample size, $T_{*j}$ is the sum of the $j$th column of the confusion matrix, and $T_{i*}$ is the sum of the $i$th row of the confusion matrix.

We note that the two evaluation indicators are calculated from the confusion matrix, which is commonly used to compare the errors between ground truth and predicted values in the field of artificial intelligence, especially in supervised learning. Therefore, the confusion matrix regarding the passenger category was constructed. Then, the seven characteristic indicators in Table 3 derived from the verification dataset were input into the identification model to estimate the categories of the samples.
Table 5 shows the estimation results of the PT passenger category in the confusion matrix.
Thus, the evaluation indicators of OA and Kappa can be calculated based on the above confusion matrix and equations (10) and (11). The calculated values are 95.4% and 87.9%, respectively. The classification result of the model can be considered in almost perfect agreement with the ground truth when the value of Kappa is between 0.81 and 1.00 [46]. The good model accuracy also suggests that the model-overfitting problem is not prominent. The majority of commuters have high values in the indicators of ATD, ANT, and RC, and they travel by PT at least 3 days per week. However, the commuters who are not correctly identified travel by PT only once or twice a week and have few PT trips, because they shift to PT for commuting only when their trips by car are limited by the motor vehicle restriction policy or when adverse weather or major events occur. These PT passengers have a commuting purpose, but they do not show the spatiotemporal characteristics of typical commuting travel. Fortunately, these passengers are not the focus of traffic regulators and policymakers, because their trips do not have much impact on PT network planning and traffic demand forecasting. Regarding the group of noncommuters, it is also worth noting that, though all noncommuting passengers are identified correctly in this experiment, noncommuters with travel characteristics similar to commuting are likely to be identified as commuters. In particular, the aforementioned two groups of passengers were not expected to be identified and make up a small proportion of passengers; thus, such identification errors can be ignored.
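The computation of OA and Kappa from a confusion matrix can be sketched as follows. The matrix used here is not Table 5: it is a hypothetical 2 × 2 layout (rows are true categories, columns are predicted categories) chosen only to be consistent with the 44 verification samples, with two commuters misidentified and no noncommuter misidentified. The function names are likewise illustrative.

```python
def overall_accuracy(matrix) -> float:
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix) -> float:
    """Kappa from the diagonal, row totals, and column totals of the matrix."""
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    return (n * diagonal - chance) / (n * n - chance)


# Hypothetical matrix: [commuter, noncommuter] as true rows and predicted columns.
example = [[32, 2],
           [0, 10]]
oa = overall_accuracy(example)
kappa = kappa_coefficient(example)
```

For this illustrative matrix the two functions give values that round to the 95.4% and 87.9% reported in the text, which makes it a convenient check of the formulas rather than a reproduction of the study's data.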
In addition, the accuracy and performance of the trained BPNN model in this paper are further compared with those of models used in previous studies [9,22]. Comparative analysis is a common technique to highlight the advantages of a model, though different methods have their own characteristics under certain conditions. For example, the existing literature verified the better performance of a proposed model in automatically identifying a pilot's brain workload compared with its seven peers, including the Gaussian mixture model, the infinite student's t-mixture model, and the DBSCAN model [47]. The models involved here are gradient boosting decision tree (GBDT), Bayes, decision tree (DT), random forest (RF), and Naïve Bayes probabilistic model (NBPM). Figure 10 presents the comparison results of these models. The results show that the BPNN model constructed in this paper has high prediction accuracy and is appropriate for identifying the categories of PT passengers, and that the indicators described in Section 3.1.1 are effective in distinguishing commuters from noncommuters. In addition, the differences between the results of the alternative methods in Figure 10 are not so prominent, which is caused partially by the limited data. With an increase in sample scale, the accuracy and superiority of the proposed model would be expected to improve further. Furthermore, the mixed data learning approach based on the BPNN model achieves the transit market segmentation of PT passengers, including the commuter, noncommuter, student, senior, and staff groups, who contribute to the changes in transit demand. In addition, the results suggest that the proposed PT passenger category identification method can help transport operators to analyze travel behaviour differences during the monitoring of travelers. That enables them to analyze the relationship between the originally observed attributes of the integrated trip data and the estimated attributes that are originally unobserved [9].

Conclusions and Discussion
In this paper, the individual travel chains reflecting the whole travel process of PT passengers were extracted through the correlation and matching method based on the multisource PT data collected in Beijing, China. Then, the card type field in the smart card transaction data was applied to analyze the day-to-day changes in the passenger flow of PT passengers.
Thereafter, the multimode travel knowledge graphs were constructed for extracting the characteristic indicators, including ATD, ANT, ODCN, RC, DTC, TPF, and TSE, hierarchically from multiple perspectives. Besides, the multimode travel knowledge graphs also contribute to understanding the travel habits and travel features of individuals. The control variable method and the comparative analysis method were applied to fit the optimal parameters of the BPNN model to identify the PT commuters more accurately. The evaluation indicators of OA and Kappa were adopted to verify the model identification accuracy, as shown in equations (10) and (11), demonstrating that the proposed method correctly estimated 95.4% of the passenger categories, while the incorrect estimations were caused by residents' noncommuting trips with characteristics similar to commuting travel and residents' commuting trips with characteristics similar to noncommuting travel. For example, some noncommuters regularly take PT to go shopping or exercise in parks every day, while some commuters use PT only under special conditions. The model results indicate that the BPNN model proposed in this paper can effectively realize the identification of PT commuters. Also, the relatively good model predictive power shows that the parameter selection process is effective and scientific. Because the BPNN is a data-driven learning algorithm, the accuracy of the model is expected to increase with the number of training samples. In addition to the parameter selection and calculation principle [48], the recognition accuracy of the estimation model is also related to the selected characteristic indicators and to the data features and size of the selected sample, owing to the random selection effect of samples. The results indicate the different features in each travel purpose [9].
This reflects the necessity of individual travel chains and multimode travel knowledge graphs, which help to capture and extract the travel characteristics. Moreover, as with other inference models, researchers should consider the overfitting normally caused by using a large set of features, and cross validation is necessary to prevent overfitting [15]. The empirical data mining analysis in Section 3 showed that the proposed method is capable of helping us to find the behavioural features and to illustrate the share of travel purposes and the relationship between the travel characteristics and the passenger categories observed in the smart card data. In addition, the multilayer BP neural network has good adjustability and adaptability for different types of data, which means that the proposed methodology can also be used to explore the travel demands of passengers using shared travel, taxi, and intercity transportation.
This study aims to propose a systematic modelling procedure to set up the BPNN model with optimal parameters to identify the PT commuters based on mixed data, which contributes to refining passenger travel demands and helping transport operators to grasp and capture the behavioural features of different travel groups observed in the smart card data. Some studies, such as the study by Guo et al. [49], have also adopted similar ideas. Besides, comprehensive indicators of the spatiotemporal travel modes and travel choice characteristics were extracted to depict PT-commuter travel behavior. The relationships between the travel characteristic indicators and the passenger categories were captured through the proposed method at the individual level. Additionally, exploring the categories of PT passengers helps the traffic management department to carry out targeted research on PT services and to increase the attraction of the PT system. The findings of this paper are also useful to augment the passenger characterization and to better cater to individual transit passengers.
There are also some limitations in this paper. The heterogeneity and categories of PT passengers were identified in this paper, but the causes and mechanisms of passengers' behavioural differences have not been revealed. In addition, owing to the low matching rate of smart card and survey data, the sample size is not large enough to fully exploit the self-learning ability of the BPNN model, which limits the further optimization of the accuracy of the model. Thus, the travel behavior analysis of PT passengers based on traffic big data is the next work. Though the results have some limitations due to the data sample size, this research provides a feasible method and process for identifying the PT commuters. In addition, the computational complexity associated with the model structure and parameters is an issue worth discussing, especially in a big data environment. The more complex the model structure and parameters are, the higher the computational complexity is, which is reflected in a longer calculation time. However, because the sample size is limited by the low matching rate of smart card and survey data, the computational complexity is not a prominent problem in this manuscript; this issue will be addressed further when the travel demands of PT passengers are identified based on large-scale data in the next work.
In the future, the travelers' dependence on PT will be studied based on the research basis achieved in this paper. Then, various travel service modes for different types of passengers will be developed, such as bus rapid transit (BRT), customized bus, demand response bus, and minibus, to provide support for the refined traffic demand scheduling of operating management departments. Besides, the travel choice behaviour and the behavioural influence mechanism of PT commuters with different travel dependence on PT could be studied further.

Data Availability
The data generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.