Wi-CL: Low-Cost WiFi-Based Detection System for Nonmotorized Traffic Travel Mode Classification



Introduction
Intelligent transportation systems (ITSs) and city surveillance systems are important applications of the Internet of Things (IoT) [1,2] that apply sensing, control, and communication technologies to ground transportation systems through IoT devices to improve the safety and smoothness of urban road networks [3]. Traffic mode detection plays an important role in helping urban planners and transportation agencies determine the occupancy of road resources by different traffic modes at various times [4]. This information can be used to plan, design, and operate the multimodal infrastructures required by transportation network users [5]. Additionally, extracting traffic mode information over short periods can help city planners monitor anomalies in traffic networks. Mode-based information has also been utilized in other fields, e.g., private route recommendation [6], daily commuting surveys [7], and congestion prediction [8]. Although most existing studies investigated motorized transportation, several scholars have realized the importance of nonmotorized transportation modes in urban transportation systems and have attempted to optimize these modes [9]. Movement data obtained from pedestrians and cyclists are critical in modeling travel behavior and habits, especially in urban surveillance systems [10]. However, collecting travel data from pedestrians and cyclists on streets, sidewalks, and public trails is a considerable challenge [11]. In addition, widely used vehicle sensors (such as induction loops, cameras, ultrasonic sensors, and radars) suffer from several problems, including high installation and maintenance costs and inefficiencies in pedestrian and cyclist detection and tracking, because pedestrians and cyclists usually have weak behavioral norms, in contrast to motor vehicles, which are restricted by lane rules [12]. Thus, automatic nonmotorized traffic data collection and mode detection techniques should be developed and applied in practical scenarios.
To address these problems, several studies have proposed special sensors for pedestrian counting, such as infrared sensors, ultrasonic sensors, and pressure footpads [13]. However, these sensors provide data for only specific points in a network. More importantly, point-based counting techniques cannot detect the same person at different points in the network, which is needed to determine travel routes, destinations, and travel times [14]. Currently, as a low-cost alternative [15], location-aware technology has attracted considerable interest for tracking pedestrian and nonmotorized movements due to the popularity and development of smart mobile devices such as cell phones and tablets. Most early studies on location-aware technologies used the prevalent and well-established global positioning system (GPS) [16], global system for mobile communications (GSM) [17], and accelerometers as data sources [18]. Among them, GPS-based approaches require users to install and run a mobile application to actively transmit GPS records to a central server, which is not convenient in the real world [19]. In some studies, GSM data have been suggested as an effective way to track cell phones based on their cellular signal strength. However, the location estimation is very coarse and only appropriate for origin-destination (O-D) surveying [17]. Thus, our study focuses on the received signal strength indicator (RSSI) values of wireless communication signals (such as Bluetooth and Wi-Fi). Specifically, in wireless channels, RSSI values have a mathematical relationship with the 2D distance [20], i.e., the plane distance between the smart mobile device and the detector. It has also been reported that the detection rate of Bluetooth devices is usually between 5% and 12% since most applications involving Bluetooth technology are carried out in vehicle networks [21]. In contrast, Wi-Fi-enabled smart devices periodically attempt to connect to a wireless LAN (WLAN) by sending detection (probe) requests. Thus, a simple low-cost monitoring unit that is independent of user participation is sufficient for passively collecting these data, and this device does not involve any hardware or software modifications [22].
Given the advantages of Wi-Fi detector data for movement mode detection and the fact that previous studies involving Wi-Fi detector data have focused on limited indoor movements, we explored the feasibility of using Wi-Fi detector data to identify nonmotorized traffic modes in urban road environments. In the literature [5,10,22], several traffic mode classification systems based on Wi-Fi detector data have been proposed. The general steps for determining traffic modes using Wi-Fi detection data can be summarized as follows: first, offline Wi-Fi data are collected, and the data are cleaned to remove errors and redundant information; then, relevant features are extracted from the cleaned data and delivered to a classifier for training; and finally, the traffic modes are predicted based on the trained classifier. However, these systems face several problems: (1) The most difficult challenge is the lack of accuracy in distinguishing modes with similar speeds, accelerations, or routes [23]. When different traffic modes have similar speeds, accelerations, or movement routes, methods for separating these modes are not sufficiently accurate. In particular, existing methods cannot be used to directly distinguish walking from cycling, even if the approaches are valid for vehicle classification. (2) Most previous works did not consider noise in RSSI signals. In addition, the ground data used as the main classification feature were not validated against the expected movement speed. (3) Traditional machine learning classification models, such as logistic regression (LR), support vector machines (SVM), and multilayer perceptrons (MLP), cannot adapt to complex urban traffic environments, resulting in reduced classification accuracy.
To address the above issues, we propose a real-time traffic monitoring system that automatically and accurately identifies walking and cycling travel modes in mixed traffic networks using commercial Wi-Fi detectors. Moreover, we tested the proposed system in a realistic traffic environment on the South China University of Technology (SCUT) campus. The main contributions of this paper can be summarized as follows:
(1) We construct a novel, low-cost, portable Wi-Fi classification (Wi-CL) system that identifies fine-grained nonmotorized traffic modes by using only Wi-Fi detection data as the information source. Notably, the proposed system is nonintrusive and can be retrofitted to existing road infrastructure, such as posts, walls, or barriers.
(2) We propose and validate an RSSI filtering algorithm for mixed traffic networks that suppresses the ambient noise caused by surrounding obstacles. The experimental results show that the average error in the walking traffic mode is 5.74 m, which is 27.78% less than that of the conventional constant velocity filtering (CVF) algorithm and 7.57% less than that of the Kalman filtering (KF) algorithm. Similarly, the average error in the cycling traffic mode is 5.53 m, which is 19.74% and 18.68% less than the errors of the CVF and KF algorithms, respectively.
(3) We design a recurrent neural network (RNN) model based on a long short-term memory (LSTM) network to identify and classify different nonmotorized traffic modes. We extract features from the raw data instead of using the raw RSSI data itself. The evaluation results indicate that the recognition accuracy of the LSTM model is 25%, 18.78%, and 8.34% better than that of the conventional LR, SVM, and MLP algorithms, respectively.
The remainder of this paper is organized as follows. Related work is reviewed in Section 2. Section 3 presents the design idea and framework of the Wi-CL system and the inspiration for identifying nonmotorized traffic modes. Section 4 describes the methodology in detail. The experimental results and performance evaluation are presented in Section 5. Section 6 presents the conclusions and outlook of this study.

Related Works
The development and application of nonmotorized traffic data collection and monitoring systems have been investigated in the field of public transportation. The Federal Highway Administration (FHWA) has released the most recent version of its Traffic Monitoring Guide, which includes a new section on monitoring and identifying nonmotorized traffic [11]. Currently, there are two main types of travel data collection methods based on location-aware technologies: user-centric methods and network-centric methods [5].

User-Centric Methods.
Active user participation in the data collection process is required in user-centric approaches. Commonly used data sources include GPS data, inertial measurement unit (IMU) data, or a combination of the two. Zheng et al. [16] proposed a supervised learning method that can infer the travel modes of traffic participants (e.g., walking, cycling, driving, and taking the subway) from GPS data alone. However, in that study, the collected GPS data were not sufficient for classification, and the accuracy rate was only 72.8%. Dabiri and Heaslip [24] used convolutional neural networks (CNNs) to predict travel modes from raw GPS trajectories. An ensemble of the best CNN configuration resulted in a maximum accuracy of 84.8%. Although neural networks can achieve better classification performance than traditional machine learning methods, using only GPS data reduces the accuracy of the system. Stenneth et al. [25] proposed a method for inferring user travel modes based on geographic information system (GIS) data and knowledge of the underlying traffic network. The results showed that the detection accuracy of the method was 17% better than that of the GPS-only method. Reddy et al. [18] developed a traffic mode classification system using a GPS receiver and an IMU built into a cell phone. The system used a two-stage approach combining a decision tree and a hidden Markov model (HMM) and achieved an accuracy of more than 90%. However, these studies usually involve high operational costs, as they require additional mobile applications. In addition, the excessive time and device energy costs for traffic participants increase the difficulty of implementing these methods at large scales to address real-world traffic problems [26].

Network-Centric Methods.
Network-centric approaches attempt to collect data passively without network user intervention. The primary data sources for network-centric approaches include Wi-Fi, Bluetooth, and GSM data. Sohn et al. [17] used GSM signals collected by cell phones to determine whether a person was standing, walking, or driving. A two-stage logistic regression analysis resulted in an average accuracy of 85% for walking and driving detection. GSM signals are suitable for O-D measurements but insufficient for detecting traffic modes [5]. Yang and Wu [27] used Bluetooth data to classify three travel modes: walking, cycling, and driving. However, in this study, 6.12% of driving modes were incorrectly identified as bicycling, and 10.53% of driving modes were identified as driving.
Due to the popularity of Wi-Fi facilities and the prevalence of IoT devices, mobile crowd sensing based on Wi-Fi detection data, such as activity recognition [28], crowd counting [29], and location estimation [30], has become increasingly popular. Abedi et al. [10] compared the efficiency of Wi-Fi and Bluetooth devices for human mobility data collection. Their study showed that Wi-Fi is a more efficient source of media access control (MAC) address data than Bluetooth for tracking the spatio-temporal movements of pedestrians and cyclists. Lesani and Miranda-Moreno [22] developed a Wi-Fi/Bluetooth-based sensor for identifying modes in mixed traffic networks. Kalatian and Farooq [5] used Wi-Fi data collected from smartphones to identify and predict people's travel modes. The results showed that the MLP model had the best prediction accuracy of 86.52%. Unfortunately, most previous studies on Wi-Fi data-based traffic mode detection did not consider RSSI noise. In terms of classification models, Vu et al. [31] proposed a new RNN-based method for identifying traffic modes. The results showed that deep learning methods have faster speeds and higher accuracies than traditional machine learning algorithms with the same learning parameters. However, instead of extracting features for the LSTM, they fed the raw data directly into the LSTM. Since LSTM models have been shown to have high accuracy in traffic mode detection studies with a large number of classes [32], this study uses a gated LSTM model to handle long sequences. Moreover, we extract a new set of features from the original data instead of using the original data directly.

Inspiration for the Proposed System.
The design of the proposed system was inspired by the increasing use of smart electronic devices such as cell phones, laptops, and tablets. Every smart electronic device has a unique MAC address, which is usually expressed as 12 hexadecimal digits [22]. According to the IEEE 802.11 white paper [33], Wi-Fi-enabled smart electronic devices attempt to connect to a nearby WLAN by periodically broadcasting probe requests, which are special frames that provide information to particular access points or all nearby access points, including the MAC address of the sender and the recognized service set. Wi-Fi-enabled devices broadcast probe signals even when the device is not in use. In addition, each detection request frame from a Wi-Fi-enabled smart electronic device can be captured and stored by Wi-Fi detectors [20]. The fluctuation of the signals may be caused by the travel speed, travel time, and different traffic trajectories; thus, traffic participants with different travel modes generate distinct RSSI signals, and Wi-Fi detectors can capture the dynamic characteristics of these signals as pedestrians and cyclists move, provided that the signals are sufficiently sensitive, as discussed in the literature [10,22]. Figure 1 compares the RSSI signals acquired in the time domain for different traffic modes and shows that different traffic modes exhibit distinct time-domain characteristics. Specifically, the RSSI signal generated by the cycling mode has a larger first-order derivative than that generated by the walking mode because speeds change more frequently during cycling than during walking. In contrast, the RSSI signals generated by the walking mode have more connections because users need more time to walk through the coverage area, so signals are sent to and received by the same detector repeatedly. This figure demonstrates the feasibility of using RSSI signals captured by Wi-Fi detectors to classify different traffic modes. In brief, feature extraction and traffic mode classification techniques can divide traffic modes in mixed traffic networks into two categories, walking and cycling, according to the RSSI signals generated by the traffic participants.

System Architecture.
The purpose of this study is to design an enhanced travel mode identification system by exploring the sequence information acquired by Wi-Fi detectors. Figure 2 illustrates the architecture of the Wi-CL system, which consists of four main modules: a data acquisition module, a data processing module, a feature extraction module, and a mode classification module. The data acquisition module captures the detection requests broadcast by Wi-Fi-enabled devices in the coverage area, recording information such as MAC addresses, RSSI sequences, and timestamps, and collecting the information into packets [34]. The packets are stored in internal databases or transferred to central databases via WLAN. To improve the results, multiple sensors can be placed at a site to increase the probability of capturing packets when scanning the channel. In this study, the pedestrian and cyclist tracking data are anonymous to prevent the potential leakage of personal information. Thus, each fixed MAC address is not associated with any personal information, such as names or phone numbers [35]. The data processing module has three key functions: removing redundant data and erroneous data generated by motor vehicles; recovering data missing due to packet loss; and reducing signal noise caused by the environment. The feature extraction module extracts the parameters and relevant features of the model, such as the travel speed, number of connections, and first-order derivatives of the RSSI signals. The mode classification module has two key components: LSTM training and prediction. In the first part, the module trains the LSTM model based on the relevant features; in the second part, the signal features corresponding to the MAC addresses are classified into two different nonmotorized traffic modes, namely, walking and cycling, based on the trained LSTM model.
The system operates in two phases: an offline phase and an online phase. In the offline phase, the system uses Wi-Fi detectors to collect a large amount of smart electronic device data in the coverage area, processes the raw data into a new set of features, and trains the LSTM model. In the online phase, the system converts the detected real-time data packets into feature vectors through the data processing and feature extraction modules and feeds the feature vectors into the previously trained LSTM model to determine the travel mode with the highest probability.

Methodology
This section details the operation of the four modules in the proposed system. The main symbols and their meanings are listed in Table 1.

Data Collection and MAC Address Grouping.
In the coverage area, the Wi-Fi detector passively collects data from all surrounding Wi-Fi-enabled smart electronic devices. The system encrypts the user's personal and private data and packages the MAC address, RSSI signal, and other relevant data [10]. Because each detector captures a large number of MAC addresses during each scan, the raw packets must be grouped according to MAC addresses [16]; the basic idea of this process is shown in Figure 3.
Assume that there are m detectors deployed along the nonmotorized road lanes. For the kth target, the time interval between the first and last packets detected by sensor m can be expressed as equation (1); if the target is detected only once, this interval is zero. The matrix $G_k^m$ represents the RSSI dataset collected by sensor m for moving target k and is sorted in ascending time order. The data samples include the MAC address, timestamp, and RSSI value of target k, and each row represents a data sample collected by sensor m at a certain time.
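The grouping step can be illustrated with a minimal Python sketch. The packet layout `(mac, timestamp, rssi)`, the function names, and the sample values are assumptions made for illustration; they are not taken from the Wi-CL implementation.

```python
from collections import defaultdict

def group_by_mac(packets):
    """Group (mac, timestamp, rssi) packets by MAC address.

    Each group is sorted in ascending time order, mirroring the
    per-target dataset described in the text.
    """
    groups = defaultdict(list)
    for mac, timestamp, rssi in packets:
        groups[mac].append((timestamp, rssi))
    return {mac: sorted(rows) for mac, rows in groups.items()}

def detection_interval(rows):
    """Time between the first and last packet of one device.

    Returns 0 when the device was detected only once.
    """
    return rows[-1][0] - rows[0][0]

# Hypothetical packets captured by one detector (values are made up).
packets = [
    ("aa:bb:cc:00:00:01", 3.0, -60),
    ("aa:bb:cc:00:00:02", 1.0, -72),
    ("aa:bb:cc:00:00:01", 1.0, -70),
    ("aa:bb:cc:00:00:01", 2.0, -64),
]
groups = group_by_mac(packets)
```

Here `groups["aa:bb:cc:00:00:01"]` holds three time-ordered samples spanning 2.0 s, while the second device is seen once and has a zero interval.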

Data Processing.
Since Wi-Fi detectors were not originally designed for traffic sensing, the RSSI values of user devices usually contain more noise in outdoor environments; thus, this noise must be eliminated before the traffic modes can be identified [36]. The processing method has three key steps: filtering anomalous and redundant data, recovering missing data, and eliminating data noise.

Filtering Abnormal and Redundant Data.
The data collected by Wi-Fi detectors inevitably include some data from motorized vehicles, even in environments where walking and bicycling are the primary modes of transportation, such as campuses and residential areas. Therefore, the first data processing step for this system is removing potentially inaccurate data generated by motorized vehicles, which can lead to significant errors in the identification of nonmotorized traffic modes. Similar to the accelerometer-based vehicle detection algorithm [37], RSSI signal-based vehicle detection is based on the fact that motor vehicles travel faster than nonmotorized traffic. Thus, we apply an average speed threshold algorithm to identify RSSI signals generated by motor vehicles [38]. The average operating speed of device k in the monitoring area of detector m can be calculated using equation (3), where $\Delta l_k^m$ denotes the Euclidean distance between target k and Wi-Fi detector m.
Wi-Fi detectors cannot estimate target locations based on single RSSI values collected by one detector. In other words, sensors at different locations may obtain the same RSSI value due to noise interference. To reduce interference caused by inaccurate data, this study assumes that the target is on different sides of the detector when the number of connections is greater than a given threshold [39]. In particular, when target k moves from the first detected position to the last detected position, the moving distance can be calculated with equation (4), where $\varepsilon_m$ is the threshold value for the number of connections, which is related to the antenna characteristics of the detector, antenna type, and detection radius. When $n_k^m \geq \varepsilon_m$, the first and last packets are located on different sides of detector m; conversely, when $n_k^m < \varepsilon_m$, the first and last packets are located on the same side of the detector. If we set $\Delta l_k^m = 0$, the mode classification fails because the number of connections is too small. Thus, in this study, the value of $\varepsilon_m$ was set to 5.
In 2010, Parkin and Rotheram [40] experimentally measured and analyzed the cycling speeds of volunteers of different ages, genders, bicycle types, and cycling experience in Leeds, UK, in different dimensions and for different purposes. The experimental results showed that over the gradient range −3% to +3%, the eighty-fifth percentile speed varies from 18 kph to 25 kph, which suggests that 25 kph is a reasonable design speed to adopt for cycle traffic. Since the actual road gradient in this experiment is within the conditions of the above experiment and the road alignment does not change sharply, the experimental result of 25 kph, i.e., approximately 7 meters per second, provided in the literature [40] is used as the speed threshold for bicycle traffic in this experiment.
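The average speed threshold check can be sketched as follows. The piecewise distance rule (sum of the first and last detector distances when the endpoints are on opposite sides, absolute difference when they are on the same side) is an assumption chosen to match the verbal description of equation (4), and the function names are hypothetical. Only the thresholds of 5 connections and 7 m/s come from the text.

```python
def moving_distance(d_first, d_last, n_connections, eps=5):
    """Approximate path length inside one detector's coverage area.

    Assumed rule: with at least `eps` connections the first and last
    packets are taken to lie on opposite sides of the detector, so the
    two detector distances add; otherwise they lie on the same side and
    the distances subtract.
    """
    if n_connections >= eps:
        return d_first + d_last
    return abs(d_first - d_last)

def is_motor_vehicle(d_first, d_last, n_connections, interval_s,
                     speed_limit=7.0, eps=5):
    """Flag a device whose average speed exceeds the cycling threshold."""
    if interval_s <= 0:
        # A single detection gives no usable speed estimate.
        return False
    distance = moving_distance(d_first, d_last, n_connections, eps)
    return distance / interval_s > speed_limit
```

For example, a device first seen at 28 m and last seen at 27 m with six connections over 5 s averages 11 m/s and would be discarded, whereas the same geometry over 30 s would be kept.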

Recovering Missing Data.
Within the communication range of a Wi-Fi detector, the detector may fail to receive detection requests broadcast by Wi-Fi-enabled smart devices due to shielding and environmental factors. If a packet is lost, the RSSI value captured by the detector is displayed as NULL. In the literature [41], Dong and Dargie demonstrated that the moving average (MA) method is an applicable filtering method for signal fluctuations. The MA approach uses a set of existing serial data to predict the next phase or phases of data. However, the original MA algorithm changes the entire RSSI signal sequence. Therefore, in this paper, the MA algorithm is improved by interpolating only the missing data [42]. For a certain MAC address k, the modified MA algorithm can be expressed as equation (5), where $S_{k,i}^{ff}$ is the ith value of the RSSI signal sequence after excluding inaccurate data and w is the given window size, which has a considerable impact on algorithm performance. The value of w was set to 4 in this study.
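The idea of interpolating only the missing values can be sketched in Python. This is an illustrative variant, not the paper's equation (5): it assumes that a missing sample is replaced by the mean of up to w preceding values, including values filled earlier, and the function name is hypothetical.

```python
def fill_missing_rssi(seq, w=4):
    """Fill None entries with a trailing moving average.

    Observed values are left untouched; only missing samples are
    replaced, using the mean of up to `w` preceding values (earlier
    filled values included). A missing sample with no usable history
    stays None.
    """
    filled = []
    for value in seq:
        if value is None:
            window = [v for v in filled[-w:] if v is not None]
            if window:
                value = sum(window) / len(window)
        filled.append(value)
    return filled

recovered = fill_missing_rssi([-60, -62, None, -66, -64, None])
```

In this example the first gap is filled with -61.0 (the mean of -60 and -62) and the second with -63.25 (the mean of the four preceding values), while every observed value is preserved.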

Eliminate Data Noise.
In this paper, we show that the distance-based method does not require extensive pretraining of parameters. RSSI ranging has the advantages of low cost and low system requirements and is independent of transmission delay, antenna delay, and other factors. The strength of the wireless signal can be used to determine the distance between the transmitting node and the receiving node without requiring additional hardware. Therefore, an a priori database is not required for support. However, the Wi-Fi detector is sensitive to the surrounding environment, and the real-time data collected contain considerable noise, which needs to be removed before the data are used for traffic mode recognition. To overcome the problems of RSSI signal instability and inaccurate range estimation in ranging methods, scholars have proposed noise reduction preprocessing using the Kalman filter [43], Bayesian filter [44], and particle filter [45] according to the fluctuation characteristics of the real-time RSSI signal. The main idea of these filters is iteration. Therefore, if the initially detected RSSI sequence contains large errors, the accuracy of the algorithm is greatly affected. In our previous work [20], we proposed a constant velocity Kalman filter (CVKF) algorithm for noise reduction. The CVKF algorithm effectively solves the problem of large errors in the early observations of the RSSI sequence by embedding a constant speed filter.

Feature Extraction.
After the above data processing steps, the filtered RSSI signals are fed into a classifier to distinguish different travel modes. Regardless of the classification technique, dichotomy-free classification is possible only if the signals of different traffic modes do not substantially overlap in feature space [46]. Speed-related variables are the main classification features in the relevant literature [22]. However, the use of speed variables alone does not guarantee satisfactory results. For example, in congested areas, different traffic modes move at similar speeds [5]. Therefore, feature selection is crucial in classification systems.
Although feature selection is usually not necessary in deep neural networks (DNNs), we find that using raw or filtered RSSI signals as input does not provide high prediction accuracy. Various methods have been applied to select the most relevant features for improving classification performance, such as analysis of variance (ANOVA) tests [47] and relative mutual information (RMI) [37]. However, the use of statistical tests or mutual information to select the top-ranked features is insufficient because the filtered RSSI signals may still be noisy. In the literature [5], Kalatian and Farooq used the ReliefF algorithm for feature selection and assigned different weights according to the importance of the variables, where the basic idea is to estimate the quality of the variables based on how well their weights distinguish highly similar detection results.
However, the algorithm takes more time to train and analyze the results since it uses 15 features as input to the classifier. In this study, we identify and select key variables in each category based on the literature results [5]. The variables in this paper include the movement velocity $v_k^{eff}$, number of connections $n_k^{nff}$, and first-order derivatives of the RSSI time series $S_k^{sff} = \{S_{k,1}^{sff}, S_{k,2}^{sff}, \ldots, S_{k,t}^{sff}\}$. The number of connections and first-order derivatives of the RSSI signals can be easily calculated from the processed data; however, the operating speed of device k is difficult to calculate. In the literature [22], Lesani and Miranda-Moreno used the average travel speed (ATS) [48] as the operating speed variable. However, we found that the speed values estimated by this method are inaccurate because the detection targets often encounter unexpected events, such as extreme weather and congestion, in real traffic environments. Another movement speed estimation method converts the filtered real-time RSSI data into the real-time physical distance and calculates the ratio of this distance to time, which is known as the real-time travel speed (RTS) [49]. For MAC address k, the real-time travel speed estimation method can be expressed as equation (6), where $v_{k,t}^{rff}$ denotes the real-time movement speed of device k at moment t in the coverage area, $d_{k,t}^{eff}$ is the physical distance between the target and detector at moment t for device k, and $\tau$ represents the time interval between two consecutive detections. The physical distance is based on the filtered RSSI data. Unfortunately, the filtered RSSI signal may still be noisy, so the accuracy of the travel speed calculated by this method needs to be improved. To address this problem, in this paper, we embed the moving average filter into the real-time travel speed and propose a new travel speed estimation method called the real-time filtered travel speed (RFTS).
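The relationship between the RTS and its filtered variant can be sketched as follows. Both the finite-difference form of the speed and the placement of the moving average (applied here to the speed series, with a hypothetical window of 3) are assumptions for illustration; the paper's equation (6) and its RFTS definition may differ in detail.

```python
def real_time_speeds(distances, tau):
    """RTS sketch: change in target-detector distance per detection interval.

    `distances` are consecutive distance estimates in meters and `tau`
    is the time between two consecutive detections in seconds.
    """
    return [abs(b - a) / tau for a, b in zip(distances, distances[1:])]

def moving_average(values, w):
    """Trailing moving average over up to `w` samples."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - w + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def filtered_speeds(distances, tau, w=3):
    """RFTS-style sketch: smooth the real-time speeds with a moving average."""
    return moving_average(real_time_speeds(distances, tau), w)

raw = real_time_speeds([10.0, 8.0, 5.0, 5.0], tau=1.0)
smooth = filtered_speeds([10.0, 8.0, 5.0, 5.0], tau=1.0, w=2)
```

With these made-up distances, the raw speeds are 2.0, 3.0, and 0.0 m/s, and the smoothed series is 2.0, 2.5, and 1.5 m/s, which shows how the filter damps the abrupt drop to zero.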
The key idea of the RFTS algorithm is to convert the RSSI signal into a physical distance between the moving target and the detector. The most commonly used propagation model for describing the relationship between the RSSI and physical distance is the logarithmic distance path loss model, as shown in equation (7), where $\lambda$ represents the environment-specific loss parameter and B denotes the calibrated RSSI value when the distance between the detection target and detector is set to 1 m. For a given $S_{k,j}^{eff}$, the distance estimator can be converted to equation (8), where the values of B and $\lambda$ are determined through extensive experiments [20].
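A minimal Python sketch of the logarithmic distance path loss conversion is given below. The sign convention is an assumption: B is treated as the magnitude of the RSSI at 1 m, so that RSSI(d) = -(B + 10 λ log10 d). The default values B = 49.51 and λ = 1.2 are the calibrated values reported for the DS-007 detector in this paper; the function names are hypothetical.

```python
import math

def rssi_at(distance_m, B=49.51, lam=1.2):
    """Predicted RSSI (dBm) at a given distance under the assumed model.

    Assumed form: RSSI(d) = -(B + 10 * lam * log10(d)), where B is the
    RSSI magnitude at 1 m and lam is the path loss parameter.
    """
    return -(B + 10.0 * lam * math.log10(distance_m))

def distance_from_rssi(rssi_dbm, B=49.51, lam=1.2):
    """Invert the assumed model to estimate distance (m) from RSSI (dBm)."""
    return 10.0 ** ((-rssi_dbm - B) / (10.0 * lam))
```

Under this convention, an RSSI of -49.51 dBm maps to 1 m and -61.51 dBm maps to 10 m, since each tenfold increase in distance lowers the RSSI by 10λ = 12 dB.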

Mode Classification.
Traditional machine learning methods rely heavily on manually extracted features, resulting in issues with feature extraction in machine learning-based image recognition, speech recognition, and natural language processing approaches [50]. Fully connected neural network-based methods also encounter various problems, such as too many parameters and an inability to utilize time-series information in the data [51]. As more effective RNN structures have been proposed, the ability of RNNs to mine time-series information and semantic information has been fully utilized, and breakthroughs have been achieved in speech recognition, language modeling, machine translation, and time-series analysis [52]. An RNN is a typical DNN, and the most substantial difference between RNNs and traditional neural networks is that each previous output is sent to the next hidden layer during training in an RNN. Recurrent neural networks portray the relationship between the current output of a sequence and the previous information. Structurally, an RNN remembers the previous information and uses it to influence the output of the following nodes. Thus, the output depends on the current input information and memory units [53].
An RNN carries an additional quantity, namely, the hidden state of the hidden layer unit, and can process variable-length sequences with a recursive hidden state whose activation depends on the previous state. Therefore, RNNs are suitable for the mutual interpretation of repetitive sequence data.
We assume that the input sequence of the classification model is $x = (x_1, x_2, \ldots, x_t)$. At moment t, the RNN updates its hidden state $h_t$ according to equation (9), where $\phi$ is an activation function, such as a logistic sigmoid composed with an affine transformation. Traditionally, the recurrent hidden state is updated by $h_t = g(W x_t + U h_{t-1})$, where g is a smooth bounded function and W and U are weights.
However, RNNs are trained with gradient-based optimization algorithms, which makes long sequences difficult to train [54]. In other words, the rate of change of the weights decreases sharply over time, which tends to result in undertraining on long sequences [55]. In contrast, LSTM models have memory that can be read, written, and deleted, and these functions allow LSTM models to select the data that should be remembered [15]. In this study, the proposed RNN model includes an LSTM module and an output layer for classification. The structure of the LSTM module is shown in Figure 4.
Here, $c_t$ is the cell state, i.e., the signal that runs along the upper line in Figure 4, $x_t$ is the input vector, and $h_t$ is the hidden state (value of the recurrent weight). The input and previous hidden state enter the forget gate first. The output of the forget gate $f_t$ can be calculated with equation (10), where $W_{uf}$ is the weight between the cell state and forget gate and $b_{uf}$ is the additive bias of the forget gate. The second step determines which input to choose. This step has two substeps, as shown in equations (11) and (12).
Here, $W_{ui}$ and $W_{ug}$ are the weights between the cell state and the input and external output gates, respectively. Moreover, $b_{ui}$ and $b_{ug}$ are the additive biases of the input gate and external output gate, respectively. Next, the LSTM model updates $c_t$ according to the outputs of these two gates with equation (13).
These changes are applied to $h_t$, and the hidden state is updated as shown in equations (14) and (15).
Here, $W_{uo}$ is the weight between the cell state and the output gate and $b_{uo}$ is the additive bias of the output gate. In addition, the sigmoid function $\delta(\cdot)$ and hyperbolic tangent function $\tanh(\cdot)$ are used as activation functions. Finally, to identify the walking or biking mode, we input the feature $h_t$, which is extracted in the last LSTM cell, into a single perceptron layer. The output $h_\theta$ of the model is calculated with equation (16), where $W_\theta$ is a weight matrix that transfers the values in the fully connected (FC) layer to the output layer and $b_\theta$ is a bias factor. In equation (16), the sigmoid function $\delta(\cdot)$ is used to transform the logit of a single neuron in the final stage to calculate the probability of classifying walking or biking. The packets captured by each detector are split into windows after processing. If there are p frames in a window, the inputs are passed through the LSTM p times. As previously explained, in this study, each frame in the window has three features. Following feature extraction, the feature values are normalized to the range (0, 1). Finally, all features within a window are input into the LSTM model.
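The gate computations described above can be illustrated with a small pure-Python sketch of the standard LSTM formulation followed by a sigmoid output neuron. It is assumed here that each gate acts on the concatenation of the previous hidden state and the current input; the parameter names mirror the symbols in the text, while `zero_params`, `classify_window`, the hidden size of 2, and the sample window are hypothetical. The sketch does not reproduce the trained Wi-CL model, and which class the output probability refers to is left open.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One standard LSTM step on the concatenated vector [h_prev, x_t]."""
    u = h_prev + x_t  # list concatenation
    f = [sigmoid(z) for z in affine(p["W_uf"], u, p["b_uf"])]    # forget gate
    i = [sigmoid(z) for z in affine(p["W_ui"], u, p["b_ui"])]    # input gate
    g = [math.tanh(z) for z in affine(p["W_ug"], u, p["b_ug"])]  # candidate values
    o = [sigmoid(z) for z in affine(p["W_uo"], u, p["b_uo"])]    # output gate
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def classify_window(frames, p, W_theta, b_theta):
    """Run the LSTM over all frames, then apply a single sigmoid neuron."""
    hidden = len(p["b_uf"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in frames:
        h, c = lstm_step(x_t, h, c, p)
    logit = sum(w * hv for w, hv in zip(W_theta, h)) + b_theta
    return sigmoid(logit)

def zero_params(hidden, n_features):
    """All-zero parameters, used only to make the example deterministic."""
    width = hidden + n_features
    names = ("W_uf", "b_uf", "W_ui", "b_ui", "W_ug", "b_ug", "W_uo", "b_uo")
    return {n: ([[0.0] * width for _ in range(hidden)] if n.startswith("W")
                else [0.0] * hidden) for n in names}

# A window of p = 2 frames, each with three features normalized to (0, 1).
window = [[0.2, 0.5, 0.1], [0.3, 0.4, 0.2]]
params = zero_params(hidden=2, n_features=3)
prob = classify_window(window, params, W_theta=[0.0, 0.0], b_theta=0.0)
```

With all-zero parameters the hidden state stays at zero and the output is exactly 0.5, which is a convenient sanity check before loading trained weights.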

Hardware Platform.
The hardware used for data acquisition in this experiment is shown in Figure 5. In this figure, 1 is the Wi-Fi detector charger; 2 is the DS-007 Wi-Fi detector, which is produced by Chengdu DataSky Company, China, and has been proven to be suitable for use in outdoor environments; 3 is the NewsMY-W12 battery source; 4 is the cable connecting the battery source and the Wi-Fi detector; 5 is a tape measure used to measure the straight-line distance between the target and the Wi-Fi detector; 6 is a LAN cable that allows the collected data to be transmitted to a personal computer in real time; and 7 is a laptop.
For short-term field experiments, the Wi-Fi detector can be powered by a mobile power supply, while for long-term field experiments, an external AC power cable should be connected to the detector. Before the formal experiment, the DS-007 detector was pretested to evaluate its performance in terms of scan interval, signal fading rate, directional inhomogeneity, detection rate, and packet loss rate.
During the pre-experiment, we tested small samples of travelers of different sizes and types and calculated the values of B and λ from the RSSI values collected between the target and the wireless detector at different distances. We moved the smart device from 1 m to 15 m in steps of 1 m and acquired 25 RSSI values at each fixed point. Then, the outliers at each position were discarded by computing the mean and variance of the measurements; specifically, we removed data samples that were more than one standard deviation from the mean. Subsequently, we performed a logarithmic interpolation of the RSSI data according to equation (8). The resulting fitting curve is shown in Figure 6. The calibrated values of B and λ were set to B = 49.51 and λ = 1.2. For the DS-007 detector, a calibrated PM was used in this study. Finally, the RSSI value was converted to a distance with equation (17).
In addition, according to the pre-experiment test results, the effective detection area of the DS-007 Wi-Fi detector can be approximated as a sphere with a radius of 30 m, which is smaller than the 50 m radius of the Wi-Fi detector used by Lesani and Miranda-Moreno [22]. The effective detection range of the detector depends on the chosen antenna. Antennas with low gains have lower detection rates but yield more accurate velocity estimations. Therefore, in this experiment, when the linear distance between the intelligent terminal device and the DS-007 Wi-Fi detector was less than 30 m, all connection details were collected by the detector.
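The calibration steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the conversion assumes the common log-distance form RSSI = -(B + 10 λ log10(d)), with B read as the RSSI magnitude at 1 m. The exact forms of equations (8) and (17) may differ.

```python
import statistics

# Calibrated values reported in the text for the DS-007 detector.
B = 49.51      # intercept (assumed: RSSI magnitude at 1 m)
LAMBDA = 1.2   # path-loss exponent


def drop_outliers(samples):
    """Keep only samples within one standard deviation of the mean."""
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)
    return [s for s in samples if abs(s - mean) <= std]


def rssi_to_distance(rssi_dbm, b=B, lam=LAMBDA):
    """Invert the ASSUMED model RSSI = -(b + 10 * lam * log10(d)); returns metres."""
    return 10 ** ((-rssi_dbm - b) / (10 * lam))


# Hypothetical readings at one fixed point; the -80 dBm spike is rejected.
readings = [-50, -51, -49, -50, -80]
clean = drop_outliers(readings)
distance_m = rssi_to_distance(statistics.mean(clean))
```

Under this assumed model, a reading of -49.51 dBm maps to 1 m and every additional 12 dB of attenuation multiplies the estimated distance by ten, which shows why small RSSI fluctuations translate into large distance errors when λ is small.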

Data Acquisition.
To validate the proposed Wi-CL system, four DS-007 Wi-Fi detectors were deployed in a specific area (namely, a circuit consisting of four streets) at the SCUT, Wushan Campus, to collect Wi-Fi trajectories as participants walked or cycled. It is important to note that, unlike traditional intrusive traffic sensors such as toroidal induction coils, piezoelectric sensors, and magneto-resistive sensors, the Wi-Fi detectors proposed in this paper are mounted on existing traffic support facilities, such as traffic signal frames or intersection light poles, and do not require additional road structures. Therefore, the deployment of the experimental detectors does not affect the road traffic environment or normal traffic operations. It should be pointed out that the proposed equipment is easily affected by the multipath effect when data are acquired outdoors. The multipath effect means that the electromagnetic wave propagates along different paths and the component fields reach the receiver at different times and with different phases, causing interference with, and distortion or error in, the original signal. The multipath effect leads to signal fading and phase shift. Therefore, before the collected data are sent to the pattern classification system, the data processing and filtering module must filter the RSSI signal.
This campus has a large-scale pedestrian and cyclist network with reduced vehicular traffic and is therefore a suitable place to test the proposed system. In this study, we recruited four volunteers from the Intelligent Transportation Laboratory of the SCUT for data collection. To keep the experimental conditions consistent, the participants were all students, aged about 20-28 years, with a male-to-female ratio of about 2 : 1, and they moved at a normal human pace. We conducted 20 replicated experiments at each testing location to reduce random errors. The participants were asked to carry a Wi-Fi-enabled smartphone and travel along the road by walking and cycling. A total of 160 trips, including 80 walking and 80 biking trips, were collected; 70% of the trips were included in the training set to calibrate the developed model, and the remaining 30% were used to validate the performance of the classifier. Furthermore, in the Wi-Fi-based approach, our pre-experiments demonstrated that […] The trace data in this experiment are labeled data; for example, each MAC address is unique, so there is no confusion during detection. The filtering module, the LSTM module, and the other modules also assist identification, and they all provide good signal noise reduction and feature extraction performance.
The experimental measurement data used in the analysis were collected during two separate periods: (1) 10:00 to 11:00 on July 10, 2019, namely, the flat peak, and (2) 12:00 to 13:00 on July 10, 2019, namely, the noon peak. Two data collection periods were included to compare the impact of crowding on the classification performance of the system. The locations of the DS-007 Wi-Fi detectors and the participants' trajectories are shown in Figure 7. Mutual influence between detectors can be ignored only when the distance between two detectors is considerably greater than the coverage range of the detectors. On-site, the locations of the detectors were carefully chosen to ensure that the sensing ranges of different detectors did not overlap. The shortest distance between the detectors was approximately 200 m, which is much greater than the detection radius of 30 m.

Performance Analysis of the Proposed System.
In this subsection, we evaluate the performance of the proposed system based on the collected dataset. The evaluation has three key objectives: (1) to assess the noise reduction performance of the system; (2) to evaluate the speed estimation performance of the system; and (3) to compare the proposed classification framework with traditional classification algorithms.

Noise Reduction Performance Analysis.
The high volatility of RSSI signals causes system errors; thus, the tolerance to RSSI signal fluctuations is an important performance metric. In our previous work [20], we demonstrated that the CVKF filter can effectively suppress the localization divergence caused by RSSI signal fluctuations, regardless of whether pedestrians are stationary or moving at low speeds. However, it is not known whether the CVKF filter is equally applicable to bicycles, which travel faster than walking pedestrians. Therefore, in this study, we first analyze the noise reduction performance of the system. In the experiment, the RSSI signal of the smart terminal device was collected in three scenarios: (1) the tester remained stationary for 80 seconds at a distance of 10 m from the detector; (2) the tester started at a distance of 30 m from the detector, approached the detector, and walked 30 m away from the detector; and (3) the tester started at the detector, moved 30 m away from the detector, rode a bicycle to the detector, and rode the bicycle 30 m away from the detector. While walking or riding, the tester maintained a constant speed as far as possible. The ground truth was generated at several reference points with measured positions. A stopwatch was used to record the times at which the tester passed these reference points, and these times were interpolated to obtain the ground-truth position between the reference points [56]. In addition, we assume that the tester moves at a constant speed between two adjacent reference points.
Because the detection cycle of a single Wi-Fi detector is short, it can be assumed that the activity during the detection cycle is a single activity, i.e., the user is either riding a bike or walking. For corner cases, such as someone in the middle of a bike ride, outliers can be handled by the filtering module. Moreover, distinguishing between stationary and active states is relatively simple with the RSSI-based approach, as shown in Figure 8. Figure 8 compares the raw RSSI data and the filtered RSSI data collected in the three scenarios. The RSSI filtering algorithms considered are the CVF, KF, and CVKF filters. As shown in Figure 8, in scenarios (1) to (3), the raw RSSI data collected by the detector are relatively noisy. Thus, if these data are used directly for mode classification, large errors may occur. For example, as shown in Figure 8(a), the raw RSSI data fluctuate quickly between 0 dBm and 10 dBm even when the data are collected at the same fixed location. The raw RSSI data are processed using a filtering algorithm to obtain smoothed data. In scenarios (1) to (3), the CVF, KF, and CVKF algorithms all yield data with smaller errors than the unfiltered data. However, significant error peaks in the first few RSSI values have a considerable impact on the results of the KF algorithm. In Figure 8(a), the first few raw RSSI values have large errors at 0-10 s, leading to large errors in the KF algorithm (the filtering performance improves only after approximately 10 s). Unfortunately, in a real traffic environment, the possibility of peak errors in the first few values cannot be eliminated. Moreover, the CVF algorithm may overfit data with peaks, as shown in Figure 8(b) between 50 and 60 s and in Figure 8(c) between 12 and 17 s. This result likely occurs because the prediction principle of the CVF algorithm is based on a fixed speed, which is not sensitive enough to the actual RSSI peaks.
It is worth noting that the CVKF algorithm proposed in this paper addresses the above two problems.
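To make the filtering idea concrete, the following is a minimal sketch of a generic constant-velocity Kalman filter applied to an RSSI sequence. It is not the CVKF algorithm of this paper, whose details are defined elsewhere; the function name, the sampling interval `dt`, the noise settings `q` and `r`, and the initial covariance are all assumptions chosen for illustration.

```python
def cv_kalman_filter(rssi, dt=1.0, q=0.01, r=9.0):
    """Smooth an RSSI sequence with a constant-velocity Kalman filter.

    State is [level, rate]; q is the process-noise variance and r the
    measurement-noise variance (both ASSUMED, not calibrated values).
    """
    level, rate = float(rssi[0]), 0.0
    p00, p01, p10, p11 = 10.0, 0.0, 0.0, 10.0  # initial covariance (assumed)
    smoothed = []
    for z in rssi:
        # Predict: level advances by rate * dt, rate stays constant.
        level += dt * rate
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        # Update: only the level is observed (H = [1, 0]).
        s = n00 + r
        k0, k1 = n00 / s, n10 / s
        resid = z - level
        level += k0 * resid
        rate += k1 * resid
        p00 = (1 - k0) * n00
        p01 = (1 - k0) * n01
        p10 = n10 - k1 * n00
        p11 = n11 - k1 * n01
        smoothed.append(level)
    return smoothed


# Hypothetical stationary scenario: true level -60 dBm with +/-3 dB jitter.
raw = [-60 + (3 if i % 2 else -3) for i in range(60)]
filtered = cv_kalman_filter(raw)
```

Because the gain is largest in the first few steps, early outliers pull the estimate strongly, which mirrors the sensitivity of the KF algorithm to initial error peaks described above.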
In addition, this study evaluates the effectiveness of the proposed filter by converting the RSSI into a distance value. Two evaluation metrics are considered: the mean error and the root mean square error (RMSE). The estimated distance errors of the CVF, KF, and CVKF algorithms for the three cases are shown in Table 2.
The distance estimates obtained from the original unfiltered RSSI data, including stationary, walking, and biking data, are subject to large errors. Regardless of whether the CVF, KF, or CVKF algorithm is used, the mean error and RMSE are larger in the moving scenarios than in the stationary scenario. This may be because people move faster in the walking and cycling scenarios, resulting in larger signal fluctuations. For the walking scenario, the average error of the CVKF algorithm is 5.74 m, which is 27.78% and 7.57% less than the average errors of the CVF (7.94 m) and KF (6.21 m) algorithms, respectively. For the cycling scenario, the average error of the CVKF algorithm is 5.53 m, which is approximately 19.74% and 18.68% less than the average errors of the CVF (6.89 m) and KF (6.80 m) algorithms, respectively. Therefore, the filtering performance of the CVKF algorithm is better than that of the CVF and KF algorithms in these three cases. Moreover, the average error of the KF algorithm in the cycling scenario (6.80 m) is larger than that in the walking scenario (6.21 m), indicating that the KF algorithm does not adapt well to the faster cycling scenario. In contrast, the CVKF algorithm maintains better filtering performance, even in the faster cycling scenario.

Speed Estimation Performance Analysis.
Travel speed is the main feature of existing traffic travel mode classification models based on data collected by smart electronic devices. However, most studies do not validate the estimated travel speeds. In addition to analyzing the noise reduction performance of the system, in this study, we verify the accuracy of the estimated travel speed extracted from the collected MAC addresses and RSSI signals. To this end, the locations of the testers during movement were recorded using the open-source software Sensorlog, a mobile data collection and annotation application that is effective for collecting mobile location data, as explained in [57]. This application can be downloaded from the Google Play Marketplace. In this study, we used the velocity data of the testers collected by this application as ground-truth data. In this experiment, we compared the ATS, RTS, and RFTS data collected in four scenarios, defined as follows: (a) walking data collected during the flat peak; (b) walking data collected during the noon peak; (c) cycling data collected during the flat peak; and (d) cycling data collected during the noon peak. Figure 9 shows the travel speeds estimated by the three methods in the four scenarios, and Figure 10 shows the cumulative distribution functions (CDFs) of the speed estimation errors for the ATS, RTS, and RFTS algorithms.
As shown in Figure 9, the ground truth speeds for walking and bicycling are relatively stable during the flat peaks (scenarios (a) and (c)). In contrast, during the noon peak, the ground truth speeds for walking and cycling decrease sharply. For example, the cycling speed decreases substantially from 1.5 to 2 2 m/s to 0.3∼0.8 m/s within 20-30 s in scenario (d). This decrease may be due to the increase in pedestrian traffic on Damien Hill Road during the noon peak period as students leave school. As a result, bicyclists had to reduce their speed when they reached that road section. The ATS algorithm uses the average moving speed of the moving target in the coverage area, which is influenced considerably by the first and last RSSI signal values. If these values contain large errors, the accuracy of the ATS algorithm decreases rapidly. In addition, this algorithm does not accurately reflect changes in the velocity of the moving target during the monitoring period. For example, in scenarios (b) and (d), the ATS algorithm maintains the original value even after the speed of the moving target changes. Furthermore, when the first few values in the RSSI sequence have large errors, the speed estimate of the RFTS algorithm is closer to the ground truth speed than that of the RTS algorithm. For example, in scenario (d), the RSSI values are close to one another between 0 and 5 s, so the RTS algorithm estimates a velocity of 0 m/s. However, the RFTS algorithm calculates the average of the estimated velocities within a window, and the final smoothed input is 0.5 m/s. Table 3 shows the speed estimation errors for the three methods in the four scenarios. According to Figure 10 and Table 3, the estimated speed errors obtained by the three algorithms are inevitably larger in the cycling environment than in the walking environment. However, it is worth noting that the proposed RFTS algorithm has the smallest error among the three algorithms.
In other words, compared with the other algorithms discussed in this paper, the RFTS algorithm is more stable in the four cases, and the estimated travel speed is closer to the ground truth speed.
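The qualitative difference between the three estimators can be illustrated with a simplified sketch. The functions below are hypothetical stand-ins, not the ATS, RTS, and RFTS algorithms as defined in this paper: they assume that the filtered RSSI has already been converted to positions along a straight path, sampled every `dt` seconds, and that the windowed estimate is a plain trailing mean.

```python
def average_speed(positions, dt):
    """ATS-like estimate: one value from the first and last samples only."""
    return abs(positions[-1] - positions[0]) / (dt * (len(positions) - 1))


def instantaneous_speeds(positions, dt):
    """RTS-like estimate: speed between each pair of consecutive samples."""
    return [abs(b - a) / dt for a, b in zip(positions, positions[1:])]


def windowed_speeds(positions, dt, window=3):
    """RFTS-like estimate: trailing mean of the instantaneous speeds."""
    inst = instantaneous_speeds(positions, dt)
    result = []
    for i in range(len(inst)):
        chunk = inst[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


# Hypothetical track in metres: the target stalls between samples 2 and 4.
track = [0, 1, 2, 2, 2, 4]
ats = average_speed(track, dt=1.0)
rts = instantaneous_speeds(track, dt=1.0)
rfts = windowed_speeds(track, dt=1.0, window=2)
```

In this toy track, the single average hides the stall entirely, the instantaneous estimate drops to zero as soon as two consecutive positions coincide, and the windowed estimate falls more gradually, which is the smoothing behaviour attributed to the RFTS algorithm above.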

Classification Accuracy Performance Analysis.
We compare the proposed classification framework with LR [58], SVM [59], and MLP [5], three machine learning algorithms that are widely used in classification models. The parameters of the LR, SVM, and MLP models are tuned to achieve good accuracy. The SVM classifier uses a linear kernel function with a soft-margin constant of 1. The MLP classifier uses the following parameters: number of epochs: 200; optimization method: Adam; number of hidden layers: 2; input and hidden layer activation function: ReLU; all hidden layer activation: 4; output layer activation function: sigmoid; and batch size: 20. The classification framework designed in this paper consists of an LSTM layer and a fully connected layer. The sigmoid activation function was used in the output layer, and the cross-entropy loss function and Adam optimizer were applied. The ReLU activation function was used between the output of the LSTM layer and the fully connected layer. The output size of the LSTM layer was 128, and the size of the fully connected layer was 1. When the sequences in a batch had different lengths, we padded the shorter sequences with zeros at the front. For a fair comparison, we apply the same data processing and feature selection methods to the RSSI signals for all classifiers. For each classifier, we perform 10 cross-validations on the collected dataset [46]. The detailed results of each algorithm are shown in Table 4. The experiments were conducted on a Linux system on a Lenovo G40 computer with an Intel(R) Core(TM) i5-4258U CPU @ 2.40 GHz, Python version 2.7.15, and TensorFlow version 1.12.0 (CPU build). The classification metrics considered in our analysis are accuracy, precision, and recall. We determined the values of these metrics for each travel mode class and reported the average over the classes for each classifier.
The accuracy is defined as the number of blocks that were correctly classified as belonging (true positives) or not belonging (true negatives) to a class divided by the total number of inferences. The precision is obtained by dividing the number of correctly classified blocks by the total number of inferences made for that class (true positives + false positives). The recall is calculated by dividing the number of correctly classified blocks by the total number of blocks that belong to that class (true positives + false negatives) [7].
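The front zero-padding mentioned above can be sketched as follows. This is a plain-Python illustration with a hypothetical function name; it assumes each window is a list of frames and each frame holds the three normalized features described earlier.

```python
def pad_front(windows, n_features=3, pad_value=0.0):
    """Left-pad each window with all-zero frames up to the longest window."""
    target = max(len(w) for w in windows)
    padded = []
    for w in windows:
        filler = [[pad_value] * n_features for _ in range(target - len(w))]
        padded.append(filler + [list(frame) for frame in w])
    return padded


# Hypothetical batch: one window with a single frame, one with two frames.
batch = [
    [[0.1, 0.2, 0.3]],
    [[0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
]
padded_batch = pad_front(batch)
```

Padding at the front rather than the back keeps the real frames adjacent to the final LSTM step, so the last hidden state passed to the fully connected layer is computed from observed data rather than from filler.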
In Table 4, which summarizes the prediction results of the algorithms, the header columns are the actual labels and the header rows are the predicted labels. These data show the error ratios for the different error types. To further analyze the LSTM model designed in this paper, the accuracy and training loss are shown in Figure 11. The main findings can be summarized as follows:
(i) The classification process starts with the training of the LR model. Although the calibration process is simple, the 72.92% accuracy of the LR model is not satisfactory and is the lowest among the four models. Compared with the LR model, the accuracy of the SVM model is improved, reaching 79.14%, which is still below 80%. The MLP model achieved better recall and accuracy scores than the first two models; however, the results were still unsatisfactory. To reduce the error of the MLP model and improve the classification accuracy, the LSTM algorithm was implemented. The overall prediction accuracy of the LSTM model was 97.92%, and the recall and accuracy of the labeled observations were greater than 90%, which were higher than the respective values of the first three models. The results show that for the classification of walking traffic modes, the LSTM model exhibits the best classification performance among the four models.
(ii) Cycling is the transportation mode with the lowest recall rate. The recall rates of the LR, SVM, MLP, and LSTM models were 59.33%, 58.33%, 79.17%, and 95.83%, respectively. A large number of observed cycling trips were classified as walking trips. This error may reflect the fact that in crowded spaces, cycling and walking share many speed-related features, which increases the difficulty of distinguishing the two modes. In this case, the LSTM model exhibits the most stable classification performance among the four models. Out of 24 bicycle observations, 23 were correctly predicted, and only one was incorrectly predicted as walking, yielding a recall rate of 95.83%, which is higher than the recall rates of the LR, SVM, and MLP models.
(iii) Figure 11 shows that the accuracy of the LSTM model reaches 95% within the first 50 epochs of training, indicating that the LSTM model can effectively classify nonmotorized traffic modes. In addition, the loss of the LSTM model decreases significantly in the first 100 epochs. Between the 150th epoch and the 200th epoch, the loss does not change substantially, as shown in Figure 11(b). The results show that the model has converged by the 200th training epoch.
(iv) As shown in Figure 12, the accuracy increases in the order LR, SVM, MLP, and LSTM in every time window. In general, the accuracy of the four models decreases slightly as the time window becomes larger, but the LSTM model maintains high accuracy: its accuracy stays above 95% in all five time windows tested, which is the best among the four models compared.
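The metric definitions above can be checked against the LSTM figures reported in the text with a short calculation. The helper below is a hypothetical sketch that treats cycling as the positive class. The 23-of-24 cycling count comes from the text, while the assumption that all 24 walking observations in the validation set were classified correctly is inferred from the reported 97.92% overall accuracy (47 of 48) and is not stated explicitly.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall for the positive class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Cycling as positive class: 23 hits, 1 miss (from the text);
# 24 walking trips all correct (ASSUMED from the overall accuracy).
lstm = binary_metrics(tp=23, fp=0, fn=1, tn=24)
accuracy_pct = round(lstm["accuracy"] * 100, 2)
recall_pct = round(lstm["recall"] * 100, 2)
```

Under these counts, the computed accuracy and cycling recall round to 97.92% and 95.83%, matching the values reported for the LSTM model.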

Conclusion
This study considers nonmotorized travel mode classification and, building on existing research, proposes a nonmotorized travel mode classification system that uses only a single Wi-Fi detector as a data source. The proposed system achieves fine-grained identification of different traffic travel modes with a low deployment cost, good real-time performance, and satisfactory recognition accuracy. In contrast to other related studies, this study does not combine Wi-Fi detection data with other data sources to explore the travel patterns of traffic participants; thus, our approach is more cost-effective and easier to implement in practice. Good results were achieved in processing data anomalies and effectively reducing signal noise. The proposed RFTS algorithm has the smallest speed estimation error among the compared algorithms, and its results are closest to the real movement speeds of traffic participants. Moreover, the proposed classification framework achieves good travel mode classification accuracy, which is our greatest concern, and has the best results among the four algorithms in terms of both classification accuracy and recall. Following this research, more valuable trajectory-type data can be collected for data mining tasks such as origin-destination backpropagation, urban traffic state estimation, traffic trip characterization, and traffic safety assessment. Such data can also improve our understanding of the basic relationship among traffic flow velocity, traffic flow density, and traffic volume. The above research results can be used as an alternative model in future traffic information monitoring systems in smart cities.
In future work, this study can be improved in several ways, and future research will address the following three issues: (1) validating the effectiveness and reliability of the Wi-CL system for various road geometries and traffic demand modes and conducting more extensive field experiments with the system; (2) improving the filtering algorithms and classification methods; and (3) designing a broader range of urban road network applications that cover a wide range of traffic modes, such as small cars, pedestrians, bicycles, subways, surface buses, and light rails.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.