Identifying IoT Devices Based on Spatial and Temporal Features from Network Traffic

With the rapid growth of Internet of Things (IoT) devices, security risks have also arisen. The preidentification of IoT devices connected to the network can help administrators set corresponding security policies according to the functionality and heterogeneity of the devices. However, existing methods rely on manually extracted features and prior knowledge to identify IoT devices, which increases the difficulty of the device identification task and reduces its timeliness. In this paper, we present CBBI, a novel IoT device identification approach. On the one hand, CBBI uses a hybrid neural network model, Conv-BiLSTM, to automatically learn representative spatial and temporal features from the network traffic, such as the position relationship of the internal organization structure in network communication traffic, the time sequence of the data packets, and the duration of the network flow. On the other hand, CBBI contains the data augmentation module FGAN, which addresses the problem of data imbalance in deep learning and improves the accuracy of the model. Finally, we used a public dataset and a laboratory dataset to evaluate CBBI from multiple dimensions. The evaluation results on the different datasets show that our approach achieves accurate identification of IoT devices.


Introduction
With the rapid development of IoT technology, the types and numbers of IoT devices have been growing quickly. The powerful connectivity and convenience of IoT devices make their applications increasingly widespread, and they penetrate almost every corner of life, including smart wear, smart homes, smart entertainment, and smart travel. According to [1], approximately 31 billion IoT devices were used globally by the end of 2020, and approximately 75 billion IoT devices will be used by 2025, of which smart homes [2] will account for 41%, reaching 12.86 billion.
However, many vulnerabilities exist in current IoT devices and execution environments [3][4][5]. Attackers are increasingly concentrating on these vulnerable IoT devices, using device vulnerabilities to launch attacks [6][7][8][9][10][11]. For example, malicious attackers used a cloud of vulnerable IoT devices to build a sizable and highly destructive botnet to launch large-scale DDoS attacks. In the first 1 TB DDoS attack conducted on the Krebs Security website, more than 400,000 IoT devices were utilized [3]. Additionally, attackers can use these vulnerable IoT devices as proxies for malicious activities, further deteriorating the network environment. With the proliferation of IoT devices with security flaws, IoT device-centric attacks could further increase.
From the defensive aspect, network administrators must implement network access control on all connected devices. Whenever a new device is connected to the network, and if the device can be identified, then the network administrator can take appropriate security precautions. For example, the administrator configures the corresponding firewall rules according to the security requirements of the device, verifies whether the device has known vulnerabilities, or notifies the intrusion detection system to isolate vulnerable devices.
Many of the current studies on IoT device identification concentrate on features based on statistics and manual extraction and then combine fingerprint matching, machine learning, or deep learning to recognize IoT devices. These traditional IoT device identification methods face the following problems: (1) Extracting features manually is a tedious and time-consuming process, and the low efficiency of feature extraction affects the real-time performance of the classification model. (2) Feature extraction requires professional domain knowledge and even professional feature engineering. Feature engineering involves feature extraction, feature construction, and feature selection, which further increases the difficulty of feature extraction. (3) The generalization ability of the model is also a concern. Before the training process in the traditional method, some seemingly trivial features might be discarded. These features could help the model improve its generalization ability, which would enable the model to be extended to more IoT devices. The feature space generated by traditional feature engineering is relatively small, and it is difficult to extract these subtle features. With the increase in the number and diversity of devices in practical applications, the recognition accuracy of the model could drop sharply. (4) A surging number of IoT devices use encryption protocols, increasing the difficulty of feature extraction and device identification. (5) With the increasing number of IoT devices, the amount of data generated by them is also increasing, and hence, feature extraction requires more time and resources. Therefore, it is difficult to meet current needs using traditional features based on manual extraction.
In this paper, we present CBBI, a novel IoT device identification approach based on Conv-BiLSTM to learn the spatial and temporal features of the network traffic. CBBI contains three modules. The first module is the data preprocessing module, whose main task is to quickly process the raw network traffic generated by the IoT device and convert it into an input that can be used in the deep learning model. The second module is the data augmentation module FGAN, which addresses the problem of data imbalance in deep learning methods [12,13]. The third module is the deep learning model itself, for which we designed a hybrid model, Conv-BiLSTM. Convolutional neural networks (CNNs) can learn the spatial characteristics of network communication traffic, such as the positional relationship of internal organizational structures in the network communication traffic. A bidirectional long short-term memory network (BiLSTM) can extract the time-domain characteristics of the network communication traffic, especially the timing relationship and flow duration of the data packets. The accuracy and generalization ability of the model are further improved by learning the spatial and temporal features simultaneously. Even when confronted with IoT devices that have similar functions and are produced by the same manufacturer, CBBI can use the powerful feature learning capabilities of deep learning to extract the representative features and some potential subtle features from the original traffic, and it can then accurately identify the IoT devices based on these learned features.
This paper is an extended version of [4]. Firstly, compared with [4], we added a data augmentation module, FGAN, to address the data imbalance, together with the corresponding comparative test. Secondly, we added our own dataset to the experiment and made some experimental evaluations based on this dataset. Finally, we added a visualization part to show that CBBI can learn the representative spatial and temporal characteristics from the device communication traffic.
We summarize the main contributions as follows:
(1) We propose a novel IoT device identification method, CBBI. This method does not require any prior knowledge about feature engineering. It avoids the overhead of manual feature extraction and decreases the complexity of IoT device identification tasks.
(2) CBBI extracts the spatial and temporal features from the original traffic generated by the device, including some potential subtle features to identify the IoT devices, increasing the generalization ability of the model.
(3) CBBI contains the data augmentation module FGAN, which addresses the problem of data imbalance in deep learning and effectively improves the accuracy of the model.
(4) We conduct extensive experiments on the public dataset and laboratory dataset to evaluate the performance of CBBI. The results show the superiority of the proposed model.
The remainder of this paper is organized as follows. Section 2 summarizes the related work on IoT device classification. Section 3 describes our proposed IoT device identification method, which includes data preprocessing, data augmentation, and Conv-BiLSTM. Section 4 describes the experiment setup. Section 5 presents the evaluation results and analyses. Finally, we conclude this work in Section 6.

Related Work
For the identification of IoT devices, researchers have proposed many solutions. In this paper, the existing research studies are summarized and discussed from two aspects: device identification technology based on a classification model and device identification technology based on active detection.

Device Identification Technology Based on a Classification Model.
Because of the differences in the software and hardware used in IoT devices, there are subtle differences even among different devices produced by the same manufacturer. Researchers use the subtle differences in the hardware of the device, such as the clock offset [14][15][16][17], as the fingerprint of the device. Then, they construct a classification model to realize the accurate identification of the target device. In the traditional method, wireless devices can be identified by unique radio frequency (RF) fingerprints caused by radio circuits [18,19]. Yuan et al. [20] fingerprinted wireless devices by extracting the features caused by the hardware defects in the analog circuits. An important advantage of using these physical defects as device signatures for device identification is that it is difficult to use other wireless devices to spoof the signature. Brik et al. [21] designed and implemented a technology that uses passive radio frequency analysis to identify the source network interface card (NIC) of IEEE 802.11 frames.
Radhakrishnan et al. [22] used the interarrival times of packets in specific traffic types generated by the devices as feature vectors. They used these feature vectors to train an artificial neural network (ANN). Miettinen et al. [23] proposed IoT Sentinel, a system for the automatic recognition of IoT devices. The system extracts 23 features from the data packets as device fingerprints and identifies the devices using a two-step classification method.
Guo and Heidemann [24] proposed a method that analyzes DNS traffic to detect IoT devices and identify their type. Marchal et al. [25] automatically identified the type of IoT devices in the local network based on the periodic background network traffic of the IoT devices. The method needed 30 minutes to identify the type of a device, and the accuracy reached 98.2%.
Thangavelu et al. [26] used controllers to control the gateways based on a software-defined network (SDN). The controller implements the training and updating of the model and sends the newly trained model to each gateway. Sivanathan et al. [27] used statistical attributes such as the device traffic activity cycle, port number, signaling mode, and cipher suite as fingerprint features of the device. Then, they used a multistage machine learning classification algorithm to identify the device.
WDMTI [28] uses 18 features extracted from DHCP messages to establish a hierarchical Dirichlet process (HDP) model to identify wireless devices. This method relies on the bursts of traffic when the device is connected to the network. OWL [29] analyzed the broadcast and multicast packets in wireless local area networks (WLANs), built a multiview wide and deep learning (MvWDL) model based on the features extracted from each protocol message, and classified the IoT devices.
According to the unique network traffic pattern of IoT devices, Deng et al. [30] firstly extracted all available features from each TCP flow header. Secondly, they used the principal component analysis (PCA) algorithm to select the main features that affect device recognition. Finally, they learned the device-specific network traffic signature based on a random forest classifier to achieve device identification. Yin et al. [4] proposed an end-to-end IoT device identification method that directly uses the original communication traffic generated by the device. This method fails to fully consider the problem of data imbalance. In the face of extremely unbalanced datasets, the performance of the model may be greatly compromised.

Device Identification Technology Based on Active Detection.
Active detection refers to actively sending detection packets to the devices in the network, obtaining response packets, and extracting device information by analyzing the information in the response packets. Attackers usually obtain information about vulnerable devices in the network through active detection before launching an attack to improve the accuracy of the attack. Researchers also use the active detection method to determine the state of the devices in the network, so they can take further security measures to ensure the safety of the devices in the network.
In practice, because of the large number of IoT devices and the lack of training data, researchers use banner information instead of device fingerprints to identify IoT devices. Antonakakis et al. [31] applied banner rules to analyze online devices from Censys [32] and honeypots. Shodan [33] and Censys [32] are two popular search engines that are mainly used to discover online devices. Both search engines use different protocols (such as HTTP, SSH, FTP, and TELNET) to perform Internet-wide scans.
Many researchers [34,35] use banner grabbing to actively scan the devices in the IP space. They collect and check text features from the response, such as hard-coded keywords, and match them with known fingerprints for device identification.
Li et al. [36] established a framework for searching devices on the Internet using network measurement and banner grabbing to obtain services running on the network hosts and to match the response header fields with prestored keywords to retrieve device information.
Feng et al. [37] proposed an acquisition rule-based engine (ARE) that can automatically generate rules for discovering and annotating IoT devices without any training data. ARE uses the application layer response data from the IoT devices and product descriptions on related websites to obtain device annotations, thereby constructing device rules. It addresses the cumbersome and incomplete nature of traditional methods based on manually writing banner information capture rules.
Table 1 summarizes the main references aforementioned and shows the features and methods used in the relevant references. The aforementioned research works made important contributions to the identification of IoT devices and promoted the development of network security. The device fingerprint identification method based on the classification model either uses the physical differences of the device as its fingerprint or manually extracts some field values and related statistical characteristics from the device communication traffic. Then, these features are combined with machine learning or deep learning methods to construct a classification model. The device identification method based on active detection must actively send a multitude of detection packets to the target devices in the network, which is susceptible to packet loss and network delay. In addition, the frequent transmission of probe packets increases the load on the network and aggravates the deterioration of the network environment. More importantly, if the device does not generate a response or if there is no valid information in the response packets, the device cannot be further identified.

Proposed Framework
The overall structure of the CBBI framework is shown in Figure 1. CBBI is composed of three modules: data preprocessing, data augmentation, and Conv-BiLSTM. Initially, the data preprocessing module converts the raw network traffic generated by the IoT device into an input that can be used in the deep learning model. Furthermore, the data augmentation module FGAN addresses the problem of data imbalance in deep learning. Finally, the Conv-BiLSTM module simultaneously learns the spatial and temporal characteristics of the original traffic of the device, which improves the accuracy and generalization ability of the model.

Data Preprocessing.
In general, deep learning models cannot directly use the raw pcap data. These original pcap files need to be processed into a format suitable for model input. The entire data preprocessing process includes three parts: flow generation, irrelevant field removal, and traffic vectorization.

Flow Generation.
The original communication traffic generated by the IoT devices contains different numbers of data packets, and the length of each data packet is also inconsistent. In other words, the original communication traffic generated by the devices can be defined as $P = \{p_1, \ldots, p_n\}$, and each data packet can be defined as $p_i = (x_i, s_i, t_i)$. The value of $i$ is $i = 1, 2, \ldots, n$, where $x_i$ represents the 5-tuple information (source IP address, source port number, destination IP address, destination port number, and transport layer protocol type) of the packet, $s_i$ represents the size of packet $p_i$, and $t_i$ represents the starting time of packet $p_i$.
In this paper, the existing SplitCap tool [38] is utilized to process the original network traffic into network flows whose packets share the same 5-tuple information, where a network flow can be defined as $F = \{p_1, \ldots, p_m\}$. Here, $m$ represents the number of data packets in the network flow. As the data packets in the network flow have the same 5-tuple information, $x_1 = x_2 = \cdots = x_m$. The network flow has a certain time order, and thus, the data packets in the network flow have a sequence, represented by $t_1 < t_2 < \cdots < t_m$.
Each network flow is composed of several packets. The network flow contains substantial behavior characteristics of IoT device communication traffic, including the closeness of the relationship among the bytes in the data packet, the duration of each network flow, the number and size of the data packets that constitute the network flow, and the timing relationships among the data packets. These traffic behavior characteristics can help the deep learning model to better recognize the device and improve the accuracy of the model.
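The flow definition above can be illustrated with a minimal sketch. It assumes packets have already been parsed into records; the `Packet` type and the `group_into_flows` helper are hypothetical names introduced here for illustration and are not part of SplitCap or of the authors' code.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed packet p_i = (x_i, s_i, t_i)."""
    five_tuple: tuple  # x_i: (src_ip, src_port, dst_ip, dst_port, protocol)
    size: int          # s_i: packet size in bytes
    time: float        # t_i: starting time of the packet


def group_into_flows(packets):
    """Group packets that share a 5-tuple and order each flow by time."""
    flows = defaultdict(list)
    for packet in packets:
        flows[packet.five_tuple].append(packet)
    for flow in flows.values():
        # Enforce t_1 < t_2 < ... < t_m inside every flow.
        flow.sort(key=lambda p: p.time)
    return dict(flows)
```

Each value of the returned dictionary plays the role of one flow $F$, and its length is the packet count $m$ used in the later vectorization step.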

Irrelevant Fields Removal.
CBBI makes use of the traffic behavior characteristics of IoT devices. Here, we need to eliminate some interference data, such as MAC addresses and IP addresses, to prevent these data from affecting the experimental results. In a small LAN, the number of devices is limited, and the MAC addresses of the devices can uniquely identify the devices. These field values can occupy a relatively large weight in the process of the feature extraction of the deep learning model, which could affect the real recognition and classification ability of the model. It can even lead to the overfitting of the model. The IP address of the device has the same interference effect as the MAC address. In this paper, these interference fields are eliminated in the data processing module to prevent them from affecting the process of model feature learning.
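One way to zero the interference fields is sketched below. The sketch assumes untagged Ethernet II frames carrying IPv4 (no VLAN tag, no IPv6), which the paper does not state; the function name `mask_addresses` is hypothetical.

```python
ETH_HEADER_LEN = 14        # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IPV4_ADDR_OFFSET = 12      # source address starts 12 bytes into the IPv4 header
IPV4_ADDR_BYTES = 8        # source (4) + destination (4)


def mask_addresses(frame: bytes) -> bytes:
    """Return a copy of the frame with MAC and IPv4 address bytes set to 0."""
    buf = bytearray(frame)

    # Zero both MAC addresses (first 12 bytes of the Ethernet header).
    mac_len = min(12, len(buf))
    buf[0:mac_len] = bytes(mac_len)

    # Zero the IPv4 source and destination addresses when the frame is IPv4.
    if len(buf) >= ETH_HEADER_LEN and int.from_bytes(buf[12:14], "big") == ETHERTYPE_IPV4:
        start = ETH_HEADER_LEN + IPV4_ADDR_OFFSET
        end = min(start + IPV4_ADDR_BYTES, len(buf))
        if end > start:
            buf[start:end] = bytes(end - start)
    return bytes(buf)
```

The frame length is preserved, so the byte positions of all other fields stay aligned for the spatial feature learning described later.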

Traffic Vectorization.
The neural network requires the input data to have a standardized format, and we must convert the processed data aforementioned into a suitable input format. The number of data packets in each network flow and the size of each data packet are different, and thus, a unified standard must be determined to vectorize the features in the network flow. We performed a statistical analysis based on the public dataset and laboratory dataset. As shown in Figure 2, we found that the number of data packets in the network flows in the two datasets is mostly within 10, and most of the data packets are within 250 bytes in size. According to the statistical information, each network flow intercepts 2500 bytes of data samples. In other words, each network flow selects the first 10 packets (n = 10), and each packet intercepts the first 250 bytes (L = 250 bytes). If the number of data packets in the network flow is less than 10, or the length of a data packet is less than 250 bytes, then the missing part is directly filled with 0. The representation of network flow characteristics is shown in Figure 3. The complete data preprocessing algorithm is shown in Algorithm 1.

Table 1: Summary of related works.

| References | Features | Method |
|---|---|---|
| [14][15][16][17] | Clock skew | - |
| [18][19][20][21] | Radio frequency fingerprint | - |
| [22] | Clock skew | ANNs |
| [23] | Features from the packet header | Twofold identification technique (Random Forest + Edit Distance) |
| [24] | Flow-level network traffic and knowledge of servers run by the manufacturers | - |
| [25] | Periodic communication traffic features | KNN |
| [26] | Features from DNS queries and HTTP URIs | Improved k-means algorithm, Random Forest, SDN |
| [27] | Statistical attributes such as activity cycles, port numbers, signaling patterns, and cipher suites | Multistage machine learning (Naive Bayes + Random Forest) |
| [28] | 18 features of DHCP | Dirichlet process |
| [29] | Features from passively received broadcast and multicast packets | Multiview wide and deep learning framework |
| [30] | Features in TCP header per TCP flow | PCA, Random Forest |
| [4] | Raw network traffic from devices | CNN, BiLSTM |
| [31] | Banners, honeypots | Active scanning |
| [34][35][36] | Banners | Active scanning, match |
| [37] | Banners | Active scanning, search and match |
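The truncation and zero-padding rule used for traffic vectorization (first 10 packets, first 250 bytes of each, zero fill) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation; `vectorize_flow` is a hypothetical name, and packets are assumed to be given as raw `bytes` objects in time order.

```python
MAX_PACKETS = 10   # n: packets kept per flow
MAX_BYTES = 250    # L: bytes kept per packet


def vectorize_flow(packets):
    """Turn a time-ordered list of packet byte strings into 2500 byte values."""
    feature = bytearray()
    for packet in packets[:MAX_PACKETS]:
        chunk = packet[:MAX_BYTES]
        # Pad short packets with zeros up to MAX_BYTES.
        feature += chunk + bytes(MAX_BYTES - len(chunk))
    # Pad flows with fewer than MAX_PACKETS packets with all-zero packets.
    feature += bytes(MAX_PACKETS * MAX_BYTES - len(feature))
    return list(feature)
```

Every flow therefore maps to a vector of exactly 10 × 250 = 2500 values, regardless of how many packets it contains or how long they are.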

Data Augmentation.
The network traffic generated by the IoT devices can be transformed into different numbers of data samples after the preprocessing stage. Because the devices differ in function, software, and hardware, the traffic patterns generated by each device are very different. For example, the network traffic generated by video monitoring devices is very large, whereas the network traffic generated by some sensors is relatively limited. Therefore, there is a large difference in the number of samples that correspond to each device, which leads to a serious data imbalance problem. As far as the learning model is concerned, the training sample is usually considered to be an unbiased sample of the true distribution. When the training set is heavily skewed, it usually does not reflect the true distribution. The imbalance of the sample distribution causes the model prediction result to be biased; in other words, the classification result is biased toward the categories with more samples, and the result is misleading. Therefore, we must adjust the generated sample data to alleviate the imbalance and further improve the performance of the model.
This paper uses the GAN-based data augmentation module FGAN, as shown in Figure 4. The generative adversarial network (GAN) is an adversarial network proposed by Goodfellow [39] in 2014. The network framework consists of two parts, a generator and a discriminator. The generator tries to deceive the discriminator by constructing fake data. It accepts arbitrary noise $z$ drawn from $p_z(z)$ and generates fake data from the noise, denoted $G(z)$. The discriminator tries to distinguish whether the data came from a real sample or is fake data constructed by the generator. The input of the discriminator is $x$, which comes from $p_{data}(x)$. The output $D(x)$ of the discriminator represents the probability that $x$ is real data. Both models improve their abilities through continuous learning. In other words, the generator hopes to generate more realistic fake data to deceive the discriminator, and the discriminator hopes to learn how to more accurately identify the fake data of the generator. The objective function $V$ of FGAN is as follows:
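Written with the symbols defined above, the minimax value function of the original GAN [39] is given below. It is assumed here that FGAN uses this objective unchanged; the surrounding text does not describe any modification.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The first term rewards the discriminator for assigning high probability to real samples, and the second term rewards it for assigning low probability to generated samples $G(z)$; the generator is trained to make the second term small.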

Security and Communication Networks
(ii) Firstly, $n$ samples $x_1, x_2, \ldots, x_n$ are obtained from the real data. Then, $n$ noise samples $z_1, z_2, \ldots, z_n$ are sampled from the prior noise distribution. Secondly, $n$ generated samples $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ are produced using the generator. Finally, the generator $G$ is fixed, and the discriminator $D$ is trained to distinguish the real data from the generated data as accurately as possible.
(iii) After updating the discriminator for $k$ epochs, the parameters of the generator are updated once with a small learning rate, and the generator is trained to reduce the gap between the generated data and the real data as much as possible.
(iv) After many iterations of updates, the ideal final state is that the discriminator cannot tell whether a sample comes from the output of the generator or from the real data.
This paper designs the generator and discriminator in FGAN based on fully connected networks. Detailed information is given in Section 4.3.
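The alternating update scheme and the quantity being optimized can be made concrete with a small sketch. It assumes the standard GAN value function of [39] and takes discriminator outputs as plain probabilities; `gan_value` and `training_schedule` are hypothetical helper names, and no actual network training is performed.

```python
import math


def gan_value(d_real, d_fake):
    """Empirical V(D, G): mean log D(x) plus mean log(1 - D(G(z)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def training_schedule(iterations, k):
    """Yield the network updated at each step: k discriminator steps, then one generator step."""
    for _ in range(iterations):
        for _ in range(k):
            yield "D"
        yield "G"
```

When the discriminator outputs 0.5 for every sample, real or generated, `gan_value` evaluates to $-\log 4$, which corresponds to the ideal end state in which the discriminator can no longer separate the two sources.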

Training Conv-BiLSTM for IoT Device Classification.
In this section, we build a deep learning model, Conv-BiLSTM, for identifying IoT devices. This model is different from the traditional classification methods based on manually extracted features and statistical features. Firstly, the model can simultaneously learn the spatial and temporal features of the device traffic, which improves the accuracy of device identification. In addition, the traditional device identification method based on manual design and statistical features has some limitations. When these features are designed and selected manually, the inherent features in the original communication flow of the device are changed, and some potential features are ignored. These potential features can help to improve the recognition accuracy and generalization ability of the model. In addition, manually designed features might not fully represent the high-level semantics of the network traffic, and models trained on these features cannot learn these high-level semantics. The Conv-BiLSTM network model can learn highly semantic features from the original communication traffic generated by the device.
The CNN [40] is widely used in the field of image classification because of its powerful spatial feature learning ability. The CNN has a convolutional layer, pooling layer, and fully connected layer. The main function of the convolutional layer is to extract features. The pooling layer subsamples the data without harming the classification results, which reduces the dimensionality of the features, compresses the data, and helps avoid overfitting. The convolutional layer and the pooling layer map the original data to the hidden layer feature space. The fully connected layer is a fully connected neural network. The weight parameters are adjusted by weighing the proportion of each neuron's feedback. The model also uses dropout to avoid overfitting.
LSTM is a special recurrent neural network (RNN) [41]. The difference between LSTM and the standard recurrent neural network is that the LSTM overcomes the problems of gradient explosion and gradient disappearance by introducing memory units and gate mechanisms, and it performs well in extracting long-term dependences in sequence data. The LSTM architecture is composed of an input gate, forget gate, output gate, storage unit, hidden state, and so on. The specific calculation process of the input gate, output gate, and forget gate is as follows.

3.3.1. Forget Gate. $f_t$ is called the forget gate, which indicates that some features of $c_{t-1}$ are used to calculate $c_t$. $f_t$ is obtained by applying a logistic function to the input $x_t$ and the last hidden layer value $h_{t-1}$. The value of the forget parameter is between 0 and 1, which controls how much information is retained from $c_{t-1}$ to $c_t$. Here, 1 means to retain the information completely, whereas 0 means to discard the information entirely.

ALGORITHM 1: The algorithm for data preprocessing.
    for each network flow Fi do
        Packets Number ⟵ Get the number of packets in Fi
        if Packets Number ≥ 10 then
            Packets Sequence ⟵ Get the top 10 packets in Fi
            for each P in Packets Sequence do
                if length(P) ≥ 250 bytes then
                    Packet Feature ⟵ Get the first 250 bytes in P
                else
                    Packet Feature ⟵ Get all bytes in P + "0" * (250 − length(P))
                Set the MAC address field and IP address field in Packet Feature to 0
                Flow Feature ⟵ Flow Feature ∪ Packet Feature
            end for
        else
            Packets Sequence ⟵ Get all packets in Fi
            for each P in Packets Sequence do
                if length(P) ≥ 250 bytes then
                    Packet Feature ⟵ Get the first 250 bytes in P
                else
                    Packet Feature ⟵ Get all bytes in P + "0" * (250 − length(P))
                Set the MAC address field and IP address field in Packet Feature to 0
                Flow Feature ⟵ Flow Feature ∪ Packet Feature
            end for
            for j = 1; j ≤ 10 − Packets Number; j++ do
                Flow Feature ⟵ Flow Feature ∪ ("0" * 250)
            end for
        Samples Data.append(Flow Feature)
    end for
    Return Samples Data

Input Gate. $\tilde{c}_t$ is the candidate value vector, and $i_t$ determines the part of the information to be updated. When updating $c_{t-1}$ to $c_t$, we multiply the old state by $f_t$, which discards the information that needs to be discarded, and add $i_t * \tilde{c}_t$.

Output Gate. $h_t$ can be considered the final output at the current moment, and $h_{t-1}$ is the output at $t - 1$. $o_t$ is a probability vector that is used to determine which part of the cell state is output. Firstly, we run a sigmoid layer to determine which part of the cell state is to be the output. Then, tanh is used to process the cell state (obtaining a value between −1 and 1). Finally, this value is multiplied by the output of the sigmoid gate to obtain the output.
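The three gates described above correspond to the standard LSTM update equations, reproduced here for reference. The weight matrices $W_f, W_i, W_c, W_o$ and biases $b_f, b_i, b_c, b_o$ are notation assumed for this sketch, $\tilde{c}_t$ denotes the candidate value vector, $[h_{t-1}, x_t]$ denotes concatenation, and $*$ is element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t * c_{t-1} + i_t * \tilde{c}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t * \tanh(c_t)
\end{aligned}
```

Because $\sigma$ maps to $(0, 1)$, each gate acts as a soft switch: $f_t$ scales the old cell state, $i_t$ scales the new candidate, and $o_t$ scales the exposed output.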
Unlike other types of deep neural networks, LSTM shares weights across all time steps, which reduces the number of parameters that the network must learn. The BiLSTM [42] is composed of two LSTMs: one LSTM processes the input sequence in the forward direction, whereas the other processes it in the backward direction. BiLSTM effectively increases the amount of information available to the network and improves the context available for the algorithm. BiLSTM can not only address gradient disappearance and gradient explosion, as in the LSTM, but also learn more context information from the sequence.
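A scalar toy version makes the gate arithmetic and the bidirectional pass easy to trace by hand. This sketch assumes the standard LSTM formulation with one-dimensional inputs and states; the parameter dictionary layout and all function names are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step; w maps a gate name to (w_h, w_x, bias)."""
    def pre(gate):
        w_h, w_x, b = w[gate]
        return w_h * h_prev + w_x * x + b

    f = sigmoid(pre("f"))              # forget gate
    i = sigmoid(pre("i"))              # input gate
    c_tilde = math.tanh(pre("c"))      # candidate value
    o = sigmoid(pre("o"))              # output gate
    c = f * c_prev + i * c_tilde       # new cell state
    h = o * math.tanh(c)               # new hidden state
    return h, c


def run_lstm(xs, w):
    """Run the cell over a sequence, sharing the same weights at every step."""
    h, c, hidden_states = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        hidden_states.append(h)
    return hidden_states


def run_bilstm(xs, w_fwd, w_bwd):
    """Pair each forward state with the backward state of the same time step."""
    forward = run_lstm(xs, w_fwd)
    backward = run_lstm(xs[::-1], w_bwd)[::-1]
    return list(zip(forward, backward))
```

At every time step the output pairs a state that has seen only earlier inputs with a state that has seen only later inputs, which is the extra context that the bidirectional structure provides.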
The Conv-BiLSTM network model structure is shown in Figure 5.
The convolutional neural network model used in this paper is an improvement on the classical LeNet-5 [43]. The convolutional neural network constructed in this paper has seven layers. More detailed network structure information is provided in Section 4.3. The training process of the Conv-BiLSTM model is shown in Algorithm 2. The feature dimension of each sample after the CNN is 1600. We reshape the 1600-dimensional data into a 10 × 160 format and input it into the BiLSTM, where 10 represents the number of time steps and 160 is the vector dimension at each time step. The BiLSTM consists of two layers, each with 512 hidden cells, and each layer uses the sigmoid function for nonlinear operations. The BiLSTM network ends with a fully connected layer, in which the number of neurons is equal to the number of IoT devices. Softmax is used as the activation function, which maps the outputs of the neurons to (0, 1) such that the outputs sum to 1. The device type with the largest probability value is selected as the classification result.
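The hand-off from the CNN to the BiLSTM and the final softmax decision can be sketched in plain Python. The sketch assumes a row-major reshape (the first 160 values form time step 1, and so on), which the text does not specify; all function names are hypothetical.

```python
import math


def reshape_for_bilstm(features, time_steps=10, step_dim=160):
    """Split a flat CNN feature vector into time_steps rows of step_dim values."""
    if len(features) != time_steps * step_dim:
        raise ValueError("expected %d features" % (time_steps * step_dim))
    return [features[t * step_dim:(t + 1) * step_dim] for t in range(time_steps)]


def softmax(logits):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_device(logits):
    """Return the index of the device class with the largest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

With one output neuron per device, `predict_device` returns the index of the device whose softmax probability is highest.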

Computing Platform Configurations.
We use Keras [44] as the neural network framework to construct the Conv-BiLSTM model. The detailed configuration information is shown in Table 2.

Dataset Description.
The UNSW dataset is the traffic data generated by IoT devices over two weeks. The dataset contains a total of 22 IoT devices. Some of these devices generate very little communication traffic. For example, Withings_Smart_Scale and Blipcare_Blood_Pressure_meter generated 8 and 13 samples, respectively, after data preprocessing. In the experimental part, we selected 18 IoT devices with relatively large sample sizes. We built an IoT device traffic collection platform in the laboratory environment. We collected two weeks of communication traffic from 23 IoT devices, covering a variety of device types and device brands, including 360, Amazon, Hikvision, Huawei, TP-Link, Xiaomi, and other common IoT device manufacturers. The device types include smart cameras, smart speakers, smart gateways, smart doorbells, and so on. There are also IoT devices of the same brand and type, but of different models, such as the two cameras Hikvision_DS-IPC-E22H-IW and Hikvision_DS-IPC-S12P-IWT from Hikvision and the three smart cameras TP_Link_Camera_IPC42A-4, TP_Link_Camera_IPC43AN-4, and TP_Link_Camera_IPC64C-4 from TP-Link. The data traffic generated by the 23 IoT devices was processed to generate a total of 636,789 samples. Detailed information on the UNSW dataset and the laboratory dataset is shown in Table 3. As shown in Table 3, the number of camera samples in the two datasets is relatively large. The traffic data generated by some cameras in the laboratory dataset is not very large, such as D-Link-DSH-C310, Hikvision_DS-IPC-E22H-IW, and Hikvision_DS-IPC-S12P-IWT. We checked the settings of these devices and found that they adopted the "Standard Definition" video recording method rather than the "High Definition" or "Super Definition" as the other cameras did. Some cameras also

Input:
Sample data composed of network flows; the dimension of each network flow is 2500. {Epoch, Batchsize, dropout, Loss function} are some of the parameters used during model training.

Parameter Settings.
This section provides detailed information about the FGAN and Conv-BiLSTM network structures used in the experiment. Both the generator and the discriminator in FGAN are implemented as multilayer perceptrons (MLPs). The specific information is shown in Tables 4 and 5. The input of the generator is a 100-dimensional Gaussian noise vector, and its hidden layers contain 256, 512, 1024, and 2500 neurons. The input of the discriminator contains both real data and generated data, and its dimension is 2500.
The LeakyReLU activation function, dropout, and BatchNormalization are used in FGAN to optimize the model.
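To make the generator dimensions concrete, the following sketch counts the trainable weights of a plain fully connected stack with the layer widths stated for the generator. It assumes ordinary dense layers with bias terms and ignores any BatchNormalization parameters; Tables 4 and 5 remain the authoritative description, and `dense_param_count` is a hypothetical helper introduced only for illustration.

```python
def dense_param_count(widths):
    """Count weights and biases of a fully connected stack.

    widths[0] is the input size; each later entry is one layer's neuron count.
    """
    total = 0
    for fan_in, fan_out in zip(widths, widths[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


# 100-dimensional noise input followed by the layer widths given in the text.
# Assumption: plain dense layers, no BatchNormalization parameters counted.
GENERATOR_WIDTHS = [100, 256, 512, 1024, 2500]

if __name__ == "__main__":
    print(dense_param_count(GENERATOR_WIDTHS))
```

Under these assumptions, the last layer (1024 to 2500) dominates the count, which follows from the 2500-dimensional flow representation that the generator has to produce.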
Detailed information on Conv-BiLSTM is shown in Table 6, including the structural parameters of each layer of the network, the optimizer, loss function, and other hyperparameters.

Evaluation Metrics.
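To evaluate the performance of the neural network model, this paper selects four performance metrics: recall, precision, accuracy, and F1-score:

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$

where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives, respectively.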
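As a small illustration, the four metrics can be computed from the TP, TN, FP, and FN counts of one class with a few lines of Python. The function name `classification_metrics` and the convention of returning 0 when a denominator is zero are assumptions made for this sketch, not details from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return recall, precision, accuracy, and F1-score from the four counts."""

    def ratio(numerator, denominator):
        # Assumed convention: report 0.0 when the metric is undefined.
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }
```

For a multiclass problem such as device identification, one common choice is to compute these values per device in a one-vs-rest fashion and then average them; the averaging scheme would be a further assumption.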

Ablation Study.
To verify the effectiveness and rationality of the data augmentation module FGAN in CBBI, we performed corresponding experiments on the UNSW dataset and the laboratory dataset, using precision, recall, and F1-score to evaluate the results. Tables 7 and 8 show the experimental results of CBBI, including FGAN, on the UNSW dataset and the laboratory dataset, respectively. The two tables show that FGAN substantially alleviates the problem that, because of data imbalance, the classification results for small-sample classes are biased toward large-sample classes. The small-sample classes iHome, Nest_Dropcam, NEST_Procet_Smoke_Alarm, and Triby_Speaker in the UNSW dataset and D-Link-DSH-C310, Huawei_Smart_Scale, Xiaomi_Air_Purifier, and Xiaomi_Hub in the laboratory dataset show significantly improved performance after using FGAN. The performance of the other categories also improves to varying degrees as the samples become more balanced. The FGAN data augmentation module thus brings the sample distribution into relative balance and further improves the classification accuracy of the model.

Misclassification Analysis.
To analyze the misclassifications of the CBBI model on the two datasets, we give the confusion matrices of the experimental results, as shown in Figure 6. The classification accuracy of most devices in the UNSW dataset is close to 100%. The accuracy of Nest_Dropcam is 96%, and 4% of its samples are identified as Netatmo_Welcome. These two devices come from different manufacturers, but both are smart cameras, and their traffic patterns share certain similarities. The accuracy of CBBI on the laboratory dataset reached 97.26%, which is lower than that on the public UNSW dataset. Analyzing the experimental data and results suggests the following reasons: (1) the laboratory dataset contains more IoT devices than the UNSW dataset, which makes the multiclass task harder for the model; (2) the sample counts of the devices in the laboratory dataset are more unbalanced, which affects how well the model fits; (3) the laboratory dataset contains more devices of the same manufacturer and type, so the devices are more similar to one another. The confusion matrix shows that IoT devices of the same manufacturer and type are prone to being misclassified as one another. Hikvision_DS-IPC-E22H-IW and Hikvision_DS-IPC-S12P-IWT from Hikvision, Ezviz_Camera_CS-C6CN and Ezviz_Door_CS-DB2C from Ezviz, as well as four devices from TP-Link, have all been misclassified to varying degrees. The worst classification results in the laboratory dataset are those of TP_Link_WDA6332RE and Xiaomi_Air_Purifier. The sample size of these two devices is extremely small, and FGAN has improved their classification accuracy only to a certain extent.
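The per-device rates read off a confusion matrix can be reproduced with a short sketch. The code below builds a matrix of counts from true and predicted labels and normalizes each row, so that entry (t, p) is the fraction of samples of true device t that were predicted as device p. The function names and the small synthetic label lists in the usage example are illustrative assumptions, not the paper's data or code.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Return counts[t][p]: samples of true class t predicted as class p."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return [[pairs[(t, p)] for p in classes] for t in classes]


def row_normalize(matrix):
    """Convert each row of counts into fractions of that true class."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


if __name__ == "__main__":
    # Synthetic labels for illustration only (not taken from either dataset).
    classes = ["Nest_Dropcam", "Netatmo_Welcome"]
    y_true = ["Nest_Dropcam"] * 25 + ["Netatmo_Welcome"] * 10
    y_pred = ["Nest_Dropcam"] * 24 + ["Netatmo_Welcome"] * 11
    for name, row in zip(classes, row_normalize(confusion_matrix(y_true, y_pred, classes))):
        print(name, row)
```

With row normalization, the diagonal entry of each row is the recall of that device, and off-diagonal entries show which other devices absorb its misclassified samples.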

Visualization of Spatial and Temporal Features.
In this section, we take the spatial and temporal feature vectors that CBBI learns from the UNSW dataset and the laboratory dataset, before softmax classification is applied, and input them into the t-SNE algorithm for dimension-reduction visualization. The dimensions of each sample input to t-SNE are 18 for the UNSW dataset and 23 for the laboratory dataset, consistent with the number of IoT devices in each dataset. The result of the dimensionality reduction is shown in Figure 7. Overall, the clusters in both datasets are well formed, and the separation between different categories is relatively clear. In the laboratory dataset, however, some devices are not well separated, especially devices of the same manufacturer and type. These visualization results are consistent with the aforementioned experimental results. In general, CBBI can learn representative spatial and temporal characteristics from device communication traffic, which can be used as the basis for device identification.

Comparison Results.
The classification accuracy of CBBI on the UNSW dataset is 99.83%, which is comparable to that of UNSW [27]. The detailed experimental results are shown in Figure 8. As far as we know, UNSW [27] currently reports the highest accuracy among IoT device identification studies, reaching 99.88%. The study in [27] used 6 months of IoT device communication traffic data. In addition, its authors achieve accurate identification of IoT devices based on manually extracted features combined with a multistage device identification framework. The UNSW dataset that we used contains two weeks of traffic data, and thus, the communication traffic generated was relatively small, especially for several devices such as Nest_Dropcam, NEST_Procet_Smoke_Alarm, and Triby_Speaker. Even so, our method achieves an accuracy similar to that of UNSW [27]. Additionally, CBBI does not need manually extracted features, which improves the timeliness of device recognition. We also conducted further comparative experiments covering CNN, FGAN + CNN, BiLSTM, FGAN + BiLSTM, CNN + BiLSTM, and CBBI. The detailed experimental results are shown in Table 9, which gives the accuracy, precision, recall, and F1-score of each method. We can conclude that both FGAN and the simultaneous learning of temporal and spatial features effectively improve the identification accuracy.

Table 3 (partial): Number of samples per device.

UNSW dataset
Label  Device                       Sample number
0      Amazon_Echo                  73780
1      Belkin_Wemo_Switch           17148
2      HP_Printer                   2794
3      Insteon_Camera               216088
4      Light_Bulbs_LiFX_Smart_Bulb  7226
5      Netatmo_Weather_Station      4703

Laboratory dataset
Label  Device                       Sample number
0      360_Camera                   5950
1      Amazon_Echo                  9584
2      D-Link-DSH-C310              836
3      Hikvision_DS-IPC-E22H-IW     2082
4      Hikvision_DS-IPC-S12P-IWT    2149

In summary, the experimental results show that our method can effectively and accurately identify IoT devices. Compared with traditional manual feature extraction methods, it not only learns the representative features of devices automatically but also has good classification capabilities. The experimental results on the two datasets also show the effectiveness and flexibility of CBBI, which can address the complex and changeable IoT device environment.

Conclusions and Future Work
In this paper, we propose an IoT identification method called CBBI. This method uses the spatial and temporal features of the original network traffic generated by the IoT devices, which avoids the overhead and cumbersomeness of feature extraction in the traditional methods and reduces the complexity of the IoT device identification task. CBBI has three modules: data preprocessing, data augmentation FGAN, and Conv-BiLSTM. The main task of the data preprocessing module is to quickly process the raw network traffic generated by the IoT device and convert it into input that can be used in a deep learning model. The data augmentation module FGAN addresses the problem of class imbalance in deep learning and further improves the accuracy and generalization ability of the model. The hybrid deep learning model Conv-BiLSTM can learn the spatial and temporal characteristics of the device communication traffic. In this paper, we use a public dataset and a laboratory dataset to verify the effectiveness of CBBI. The experimental results show that CBBI has good classification performance, even for some IoT devices from the same equipment manufacturers. In our future work, we will consider a combination of active and passive IoT device identification schemes to realize the identification of unknown IoT devices [45].

Data Availability
We used the UNSW dataset, which is publicly accessible (https://iotanalytics.unsw.edu.au/iottraces). The laboratory dataset used to support the findings of this study is available from the corresponding author upon request.

Disclosure
This paper is an extended version of a conference paper; the conference is DSC 2021, the IEEE Conference on Dependable and Secure Computing.

Conflicts of Interest
The authors declare that they have no conflicts of interest.