Semisupervised Graph Neural Networks for Traffic Classification in Edge Networks



Introduction
In the edge network environment, tens of millions of edge nodes are linked together through countless network nodes for data interaction and analysis. At the same time, more and more edge devices are joining the Internet of Things. Each network application has its own traffic behavior characteristics. With the continuous emergence of new network applications and application-layer protocols, network traffic is becoming more complex, changeable, dynamic, and heterogeneous. To meet the network specification requirements for a given type of service, traffic data need to be classified with a high degree of accuracy so that QoS requirements are satisfied. Nowadays, mainstream traffic classification methods deploy their models on a centralized cloud server [1], and the edge terminals are only responsible for sending the collected traffic to the cloud server to train the traffic classifier. This results in subpar real-time performance and raises the bandwidth overhead of edge nodes.
The centralized processing mode based on cloud computing has successfully aggregated computing power and storage capacity and enabled unified network management. Due to the limited hardware resources of edge nodes, it is often necessary to serve users through remote cloud computing resources, so the cloud server still bears a huge computing load. Therefore, deploying neural networks on edge nodes is a new trend. With more and more edge terminals joining the network, centralized cloud computing services have delegated computing resources to the edge, allowing more data processing tasks to be completed nearby. The current trend in technology development is to perform tasks on flexible but resource-constrained terminal devices. However, most intelligent algorithms are computationally intensive and require strong computing power. Given the limited hardware performance of edge devices and the constraints of the network communication environment, it is difficult to satisfy all of these demands at the same time.
In this paper, we focus on how to perform traffic classification tasks in edge networks. Existing traffic analysis and identification frameworks rely on unified analysis [2][3][4], which requires huge bandwidth resources to transfer all captured packet files for training a global classifier. Edge networks are highly dynamic and large in scale, with low bandwidth resources and weak links, so deep models with a large number of parameters are difficult to deploy on edge nodes. To maintain the efficiency of edge-side traffic identification and traceability, it is necessary to design lightweight traffic analysis and traceability models that suit cloud-edge collaboration scenarios and can match and analyze service QoS.
Network traffic classification maps the traffic flowing through the network to its type, giving managers an overview of network conditions; it is also regarded as a prerequisite for subsequent management decisions. Three types of classifiers are usually applied in the traffic classification field [5]: port-based methods, which group traffic according to port information; machine learning-based methods, which classify traffic according to statistical features (e.g., conventional machine learning approaches map the statistical signatures of traffic samples to each application type) and rely on expert labor to select the features that fit the models; and deep learning-based methods. The features usually used by machine learning-based methods are packet length, packet direction, packet arrival time, and so on. In recent years, however, researchers have focused on deep neural network models that learn features automatically for well-known application types. Deep learning-based methods make up for the limitations of classic machine learning methods because they do not require human labor. Many deep learning models have been applied in the traffic classification field [6][7][8]. Recently, Pang et al. [6] proposed a method that applies graph neural networks to traffic classification. To generate graphs, the packets of a session are extracted as nodes, and edges record the order information of the traffic session; however, this chained graph model only considers the order of the packets and does not explore the interrelationships between the packets in a session.
In this paper, we propose a novel semi-supervised traffic classification method based on graph convolutional neural networks (GCNNs). We process the uploaded traffic packets and transform them into graphs to convey their structural information. Then we use graph neural networks to further extract the features of the traffic data. Finally, we use multiple GCNNs to expand the training set for the cloud server. We verify the efficacy of our model on the publicly available network traffic dataset "ISCX VPN and Non-VPN dataset." The experimental findings show that it achieves outstanding classification performance.
This paper addresses the existing issues of traffic classification in edge networks. Existing methods such as deep packet inspection (DPI) [9] require unified traffic identification analysis after port mirroring, which occupies huge bandwidth resources while messages are copied from one or more ports (the source ports) of a device to a monitoring port (the destination port) of the device. Traditional methods frequently upload the complete traffic data from edge nodes to a cloud server, which brings many problems: as traffic grows explosively in volume and complexity, the upload consumes a large amount of bandwidth and degrades the responsiveness of the edge system, so real-time performance cannot be guaranteed and other normal services are impeded. Semi-supervised methods are a good choice when data are easy to collect but cumbersome to label, especially given the massive amount of traffic forwarded in and out of edge gateways every second. In addition, to relieve the bandwidth pressure of transferring traffic data from the edge nodes to the cloud server, the edge nodes select only some samples to upload for training on the cloud server.
Besides, we aim to extract features more effectively when transforming the raw captured traffic sessions into graphs. If we consider the packets in a session as nodes, each node has its own feature information (e.g., sequential features of raw packet streams and statistical features derived from packet bytes) and structural information (e.g., its relative position in the session and the structure of its byte-length sequence). We can therefore abstract a session as a graph that covers the inter- and intrarelationships it conveys.
The primary contributions of this paper are as follows:
(1) The designed model targets the cloud-edge architecture: the edge nodes extract features from the raw capture files that come in from the terminal side and select samples for the training set, while the server side performs semisupervised learning that aims to complete graph classification with only a few labeled graph samples. In our framework, edge nodes use several GCNNs to choose highly representative graph instances from the newly collected data and, after pseudo-labeling, add them to the training set.
(2) Rather than only regarding the packets in a session as the nodes of the graph, we borrow the concept of "flow granules" to cover the internal information between data packets. Individual packets in a session are extracted and packed into several granules, which are incorporated into the graph as nodes. The relative positions and structural information of the granules are transformed into edges. The resulting graph represents a session, is labeled with the session's traffic type, and is later used in classification.
(3) Our model uses graph convolutional neural networks to capture the structural information of the traffic data. Our solution surpasses various state-of-the-art approaches and produces strong results on publicly available labeled network traffic datasets.
The remaining sections of this paper are arranged as follows. We first introduce the related work and describe the preliminaries, and then discuss the proposed semisupervised traffic classification algorithms. The following section introduces the datasets used and assesses the performance of the proposed model in comparison with previous network traffic classification methods. Finally, the paper comes to a conclusion.

Traffic Classification.
In light of the swift advancement of network technologies and the explosive growth of network traffic, traffic of diverse types requires different underlying network resources. Therefore, to achieve efficient network management and improve the quality of network service, it is necessary to effectively monitor and classify network traffic.
In recent years, with the rapid development of deep learning in artificial intelligence and other domains, many researchers have started to use deep learning to solve the problem of network traffic classification, with the aim of online intelligent identification of network traffic. Because they extract features without the help of experts, deep learning-based approaches differ from conventional machine learning-based or packet inspection-based methods. Additionally, deep learning-based methods have a greater learning capacity than conventional machine learning methods, which allows them to perform better overall [4].
Wang et al. [10] proposed a 1D-CNN-based encrypted traffic classifier that extracts features directly from the bytes of raw traffic. Lotfollahi et al. [11] proposed a method called Deep Packet, which uses the first 1480 bytes of each IP packet as model input to perform packet-level traffic classification and achieves excellent performance. Lopez-Martin et al. [12] combined recurrent neural networks (RNNs) and CNNs to categorize traffic using six statistical features extracted for every packet in the session. An RNN-based technique for traffic classification termed BSNN was proposed by Li et al. [13]. Long short-term memory (LSTM) or gated recurrent units (GRUs) serve as the foundation of the RNN component of BSNN, which takes network datagrams as input and outputs the classification results directly. Liu et al. [14] later introduced FS-Net, an end-to-end traffic classification model in which a multilayer encoder-decoder structure, fed with the sequential features of flows in the form of packet length sequences, further enhances the RNN-based encrypted traffic classifier. In [15], multimodal multitask deep learning approaches are applied in a systematic framework to create a viable mobile traffic classifier, which jointly learns a shared representation of the sequential features (payload bytes) and statistical features (informative protocol header fields) of sessions. That work was further improved in [16], and explainable artificial intelligence (XAI) was employed to interpret the categorization process of an improved version of the state-of-the-art multimodal traffic classifier [17]. In [18], hybrid neural networks are used to analyze the dual-mode features extracted from the raw traffic data.
Tables 1 and 2 complement the related work by categorizing the reviewed works along with their primary distinguishing features, so as to position the present contribution effectively. Table 1 presents the single-modal traffic classifiers; within each category, the works are listed in order of publication. The following definitions are provided for acronyms and columns. Column "Research" lists the works under comparison. Column "Input Data" gives the input data of the deep learning models; in the entries, LX means the Xth layer of the ISO/OSI model. Column "Traffic object" (TO) gives the traffic classification granularity adopted: "B" means biflow/session, "F" means flow, "P" means packet, and "D" means IP datagram. Column "DL Classifier" gives the deep learning models adopted for the traffic classifier: BiGRU means bidirectional gated recurrent unit, CNN means convolutional neural network, LSTM means long short-term memory, MLP means multilayer perceptron, and SAE means stacked autoencoder. Column "Open" (O) indicates whether a publicly available dataset has been adopted: "Y" means yes, "N" means no, and "P" means partial. Table 2 lists multimodal traffic classification architectures. The following definitions are provided for the columns and acronyms in Table 2 that are absent from Table 1. Column "Multimodal" (MM) indicates whether multimodal deep learning techniques are employed. Column "Multitask" (MT) indicates whether multitask deep learning techniques are employed. Columns "Supervised Shared Representation" (SSR) and "Training-Phase Specification" (TPS) indicate whether the traffic classifier uses those techniques. For all of the above columns, "Y" means yes, "N" means no, "P" means partial, and "-" means not applicable.
Discrete Dynamics in Nature and Society

We find that most classifiers handle only a single type of feature, and few deal with multimodal inputs (MM column), in which specific subsets of the heterogeneous inputs are trained on the lowest layers. We therefore summarize the single-modal and multimodal architectures separately in Tables 1 and 2. A biflow is the most frequently used traffic object (TO column), both for extracting input data and for assigning classification labels. Deep learning models also tend to extract features from the raw input in an end-to-end way. Because deep learning methods learn automatically and do not involve expert labor, they have become more popular than machine learning methods in the traffic classification field. Numerous alternative models have been developed for the classification task, including deep neural networks (DNNs), various autoencoders (AEs), one- and two-dimensional convolutional neural networks (1D- and 2D-CNNs), and various recurrent neural networks (RNNs) (DL Classifier column). Besides, a proportion of these publications validate and assess the performance of their classifiers using publicly accessible datasets (Open column).
In this paper, to learn different views of the classification object, we combine sequential features and statistical features when extracting features from the raw PCAP files; that is, we use the packet length sequence to segment the packet-byte sequence into granules, which are then transformed into the nodes of the graphs. Graph neural networks are then employed for traffic classification.

Semisupervised Traffic Classification.
In practice, we cannot label all samples, as this would require a lot of labor. To solve the problem of insufficient network traffic labels, methods based on semisupervised deep learning have gained popularity. Semisupervised learning (SSL) is a key problem in the fields of machine learning and pattern recognition. It is a learning technique that combines supervised and unsupervised learning: pattern recognition is performed on the basis of both labeled data and large quantities of unlabeled data. Semisupervised learning can achieve relatively good accuracy while requiring as little labeling labor as possible. Consequently, it is receiving increased attention.
In recent years, numerous deep learning-based semisupervised methods have proven their efficiency and effectiveness in the traffic classification field. Deep convolutional generative adversarial networks (DCGANs) were employed in [19]; the accuracy of that method is almost the same as that of the supervised method on large labeled datasets. The authors of [20,21] adopted autoencoders, a common technology in semisupervised learning. In [22], the author used stacked sparse autoencoders (SSAEs), and the results demonstrate better performance than the traditional model. In [23], the author proposed a variational autoencoder (VAE)-based model for anomaly detection. The model is superior to other semisupervised learning models, and the evaluation index increases by 5-10%.
Wang et al. [24] proposed ByteSGAN, a semisupervised traffic classifier based on generative adversarial networks (GANs) that is embedded in the SDN edge gateway. Training and testing on the public dataset "ISCX2012 VPN-nonVPN" show that ByteSGAN can outperform supervised learning-based methods such as CNN. In this paper, graph convolutional neural networks (GCNNs) are further utilized in the traffic classification model. A GCNN is a kind of convolutional neural network that can act directly on graphs and utilize their structural information.

Traffic Classification in Edge Networks.
In the edge network environment, where network traffic of various heterogeneous types grows exponentially, how to effectively perform traffic classification in edge scenarios remains an open problem for researchers. As mentioned, Wang et al. [24] proposed a semisupervised traffic classifier using GANs for SDN edge gateways. In SDN edge gateways, various intelligent devices are connected to the edge gateway through wireless access technologies, and all data packets from these smart devices are queued on the WAN interface, waiting for the edge gateway to forward them out of order. The traffic classification process is mostly concentrated on the SDN controller, so SDN controllers suffer from huge flow processing pressure. Although [24] considers only traffic classifiers applied in SDN edge networks, it still offers some inspiration for traffic classifiers applied in edge networks in general.
In an edge environment, there is often just a virtualized resource pool made up of many servers. However, when a number of terminal devices are linked to the edge platform via the edge side, there is frequently significant resource demand on the edge side. Numerous terminals and sensors are networked to the edge platform in many contexts, including medical, industrial, and Internet of Vehicles scenarios. This places higher requirements on edge clouds. As shown in Figure 1, in edge-side network traffic classification and recognition, the DL models are trained on labeled network traffic data. In practice, however, the classifier often receives network traffic of unlabeled categories, resulting in misclassification and other problems. In addition, because the labeled sample size is too small, a model trained on it easily overfits the small sample and underfits the target task.
With the increasing complexity of network topology and the explosive growth of network applications, network traffic on the edge side exhibits features such as nonlinearity, high complexity, and autocorrelation. At the same time, the network traffic of different applications varies greatly, which makes accurate labeling of network traffic difficult. The traditional traffic analysis and identification framework relies on the cloud server for unified analysis, which requires huge bandwidth resources to transmit all captured packet files to the cloud server. Therefore, models are increasingly deployed at the edge rather than in the cloud to reduce bandwidth consumption.
In the cloud-edge integration system, the edge gateway can realize local linkage between the device and data processing and analysis without networking [25]. However, deploying traffic analysis models on edge nodes still has problems. In edge networks with low bandwidth resources and weak links, deep learning models with a huge number of parameters cannot be deployed effectively on the edge nodes. To perform accurate traffic classification on the edge-side network, cloud-edge collaboration can be used to balance the processing pressure of edge nodes and to combine the advantages of cloud computing and edge computing, namely high processing efficiency and low latency.
The edge gateways and the cloud server can cooperate to share the pressure of the central cloud node. Part of the data computation and storage work is carried out by the edge computing nodes, which reduces the processing pressure on the cloud server that aggregates the traffic data of each node for unified traffic analysis. Based on this, this paper proposes a semisupervised network traffic classification and identification method for cloud-edge collaboration scenarios. This method distributes the traffic analysis tasks between the central cloud server and the edge gateways, which jointly complete the edge-side traffic identification task and make efficient use of computing resources.

Graph-Based Traffic Classification in Edge Computing Networks
To deal with the bandwidth shortage caused by traditional methods, we consider a cloud-edge integrated collaborative system with several edge gateways and cloud servers. Traditional traffic classifiers tend to gather all captured raw traffic files in one place, and transmitting these raw files may exhaust the network bandwidth. In a cloud-edge integrated system, we split the traffic classification task into several phases and place them on the edge nodes and a cloud server, so as to fully utilize all computing resources and avoid the latency and bandwidth shortage that arise when all models and raw captured traffic files are placed on one side. The details are depicted in Figure 2. At the edge gateway layer, there are two main stages: feature extraction and graph generation. The specific process is as follows:
(1) The edge-side gateway captures the traffic packets uploaded by the terminal nodes and extracts their features, which avoids uploading a large number of complete packets to the cloud center, reduces the data processing delay, and relieves the resource pressure on the cloud server.
(2) At the same time, the edge gateway transforms the traffic data into graphs, using "granules" to further extract the interrelationships between individual packets within a session. Multiple GCNNs are also used to select the samples with high confidence and transmit them to the cloud server.

Graph Generation Based on Granules in Edge Nodes.
The edge nodes (here we mean edge computing gateways) collect all raw packet files (PCAPs) captured by edge devices that are abstracted as edge nodes in the network. The edge nodes then process the PCAP files and feed them into the graph neural networks to classify the traffic sessions. We first define a flow. Flow separation is the initial step after receiving the raw traffic files (PCAPs). Traffic is made up of flows, and a flow is a collection of packets that can be uniquely identified by the traditional five-tuple (source IP, destination IP, source port, destination port, and protocol). A session is made up of packets whose network source and destination are interchangeable, so it includes all packets sent in both directions between two hosts during the course of the session. This paper uses a session as the traffic classification granularity.
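As a minimal sketch of the flow separation step, the snippet below groups already-parsed packet records into sessions by a direction-independent five-tuple key. The record layout (dictionary keys such as `src_ip`) and the function names are assumptions made for illustration; parsing the PCAP files themselves is outside the scope of this sketch.

```python
from collections import defaultdict

def session_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent key: both directions of a biflow map to one session."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)

def split_sessions(packets):
    """Group packet records into sessions, preserving capture order inside each session."""
    sessions = defaultdict(list)
    for pkt in packets:
        key = session_key(pkt["src_ip"], pkt["dst_ip"],
                          pkt["src_port"], pkt["dst_port"], pkt["proto"])
        sessions[key].append(pkt)
    return dict(sessions)

# Hypothetical packet records (not taken from any dataset).
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 5000, "dst_port": 443, "proto": "TCP"},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_port": 443, "dst_port": 5000, "proto": "TCP"},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "src_port": 6000, "dst_port": 53, "proto": "UDP"},
]
sessions = split_sessions(packets)
```

The first two records travel in opposite directions between the same endpoints, so they fall into one session, while the third record forms a session of its own.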
We cover two types of features: session-level features and the sequential features of the packets. The packet length sequence (i.e., the number of bytes per packet) is used as the session-level feature. Besides, we select the first M bytes of every packet to compose the byte sequence of the session (packets shorter than M bytes are padded with zeros to reach an M-byte sequence, while bytes beyond the first M are truncated). As depicted in Figure 3, a biflow classification object can thus be transformed into a matrix $X \in \mathbb{R}^{n \times M}$, where $n$ is the number of packets in the session. At the packet level, the packets can be vertically segmented into several granules, as shown in Figure 3(a), which are then transformed into the vertices of the graphs. When segmenting neighboring packets into granules in the next part, we use the session's statistical features (its packet length sequence) to perform the segmentation.
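The padding and truncation rule can be sketched as follows, assuming each packet is available as a Python `bytes` object. The function name `session_to_matrix` and the toy payloads are hypothetical and serve only to illustrate the construction of the $n \times M$ matrix and the packet length sequence.

```python
import numpy as np

def session_to_matrix(packets, M):
    """Return the n x M byte matrix X and the packet length sequence of a session.

    Packets shorter than M bytes are zero-padded; longer packets are truncated.
    """
    X = np.zeros((len(packets), M), dtype=np.uint8)
    for i, raw in enumerate(packets):
        chunk = raw[:M]                      # truncate to the first M bytes
        if chunk:
            X[i, :len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
    lengths = np.array([len(raw) for raw in packets])
    return X, lengths

# Toy session with two packets and M = 4.
X, lengths = session_to_matrix([b"\x01\x02\x03", b"\x0a\x0b\x0c\x0d\x0e\x0f"], M=4)
```

Here the first packet is padded with one zero byte and the second is cut after its fourth byte, while `lengths` keeps the original sizes for the later segmentation step.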
Secondly, the edge gateway further processes the traffic data uploaded by the edge devices. As depicted in Figures 3(a) and 3(b), each packet-byte sequence is vertically segmented according to the session-level information.
To explain how the session's packet-byte sequence is transformed according to the packet length sequence, we first introduce the concept of "flow granule." The concept used in this paper is inspired by and derived from [26], where the term was initially used to describe how neighboring data packets with the same packet length can be aggregated to create granules. As a result, the information to be processed is represented by an aggregated packet sequence rather than a single packet. Here, we segment the packet sequence to extract the internal relationships between the packets themselves. If consecutively arriving data packets show similar lengths or are very likely to appear in the same neighborhood, they are combined into a "granule." The session is thus composed of a sequence of granules, and the order is preserved both within each granule and within the session. Now suppose we segment the session sequence into several subsequences to find the granule segments. Our goal is to segment the session's packet sequence $X = x_{1:T} = (x_1, \ldots, x_T)$, where $T$ is the number of packets in that session. Neighboring packets with similar sizes are combined into a granule that can later be transformed into a node of the graph. After vertical segmentation, we get the granules $G(y)$, $y = 1, 2, \ldots, Y$, per session as formula (1), where $Y$ is the number of granules per session. After vertically segmenting the packets into flow granules, we transform the granules into graph nodes by combining the packets within each granule, as depicted in Figure 3(c), through averaging the packet bytes.
Here $m = 1, 2, \ldots, M$, where $M$ is the length of each packet's byte sequence, and $|U_j|$ is the number of packets in the granule. When every session's packet-byte sequence is transformed into a graph, nodes correspond to granules, while edges represent the adjacency between granules. In contrast, in the method proposed in [6], the features of every packet in the session are extracted to represent the graph's nodes, and a chained graph is then created according to the session's packet order. In our step, we not only cover the packet-byte sequence itself but also explore the interrelationships between consecutive data packets, as each granule is represented as a node of the graph.
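The granule construction can be illustrated with a short sketch. It assumes a simple grouping rule that is only one possible reading of the text: consecutive packets join the current granule when their length differs from the granule's first packet by at most a tolerance `tol` (with `tol = 0` meaning identical lengths). The function names, the tolerance parameter, and the sample lengths are hypothetical.

```python
import numpy as np

def segment_granules(lengths, tol=0):
    """Split packet indices into granules of consecutive packets with similar length."""
    if len(lengths) == 0:
        return []
    granules, current = [], [0]
    for i in range(1, len(lengths)):
        if abs(int(lengths[i]) - int(lengths[current[0]])) <= tol:
            current.append(i)            # same neighborhood: extend the granule
        else:
            granules.append(current)     # close the granule and start a new one
            current = [i]
    granules.append(current)
    return granules

def granule_features(X, granules):
    """Average the byte vectors of the packets in each granule (one row per node)."""
    return np.stack([X[idx].mean(axis=0) for idx in granules])

lengths = [50, 50, 373, 52, 241, 181, 70, 70]          # illustrative packet lengths
X = np.arange(32, dtype=float).reshape(8, 4)            # stand-in for the n x M byte matrix
granules = segment_granules(lengths)
F = granule_features(X, granules)
```

With these lengths the eight packets collapse into six granules, so the node feature matrix `F` has six rows, and the first row is the average of the first two packet vectors.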
Each node is associated with the feature vector of its granule. Then the edges are generated. By default, each edge is undirected. After obtaining the nodes, we extract the set of edges between nodes according to the adjacency between packets. We use undirected edges because they capture the relative sequence relationship between granules better than directed edges. Suppose the node feature matrix at this point is $F \in \mathbb{R}^{y \times M}$. Accordingly, we construct node correlation functions, which take a node feature matrix as input and produce the corresponding adjacency matrix $A_T \in \mathbb{R}^{y \times y}$: if two original messages are adjacent in the raw sequence, an undirected edge is established between the corresponding two nodes.
Here Corr(·) denotes the node correlation function, which computes the correlations or dependencies between channels (nodes) on the basis of $X_T$. Node correlation functions come in a variety of options; here, an edge is established between two nodes whose granules are adjacent in the raw sequence. In this stage, in addition to using the isolated packet information, we extract the relationships between consecutive data packets and transform a session into a graph. Later, we can use graph neural networks to model the traffic data and further mine the relationship between the traffic data and its type.

[Figure: Vertically segment the session's packet-byte sequence into granules according to the packet length sequence.]
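A minimal sketch of the edge construction is given below, assuming the simplest rule stated in the text: granules that are adjacent in the raw sequence are linked by an undirected edge, which yields a symmetric chain adjacency matrix. The function name `chain_adjacency` is hypothetical, and other correlation functions could be substituted.

```python
import numpy as np

def chain_adjacency(num_nodes):
    """Symmetric adjacency matrix linking each granule to its sequence neighbors."""
    A = np.zeros((num_nodes, num_nodes))
    idx = np.arange(num_nodes - 1)
    A[idx, idx + 1] = 1.0     # edge from granule i to granule i + 1
    A[idx + 1, idx] = 1.0     # mirrored entry, since edges are undirected
    return A

A = chain_adjacency(4)
```

For four granules this produces three undirected edges, that is, six nonzero entries in `A`.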

Semisupervised Traffic Classification in Edge Server and Centralized Server.
In the last subsection, sessions were transformed into graphs that convey the structural information between the packets within a session. In this stage, our goal is to predict the class labels of graphs, and thereby the label of the traffic session that each graph was built from. A node in a graph often symbolizes an object in the real world; in this paper, it represents a granule. Moreover, sessions can also be interconnected and abstracted as nodes in a global graph.
Here we explore a more difficult but practically valuable scenario in which a node is itself a graph instance. The raw byte sequences of the traffic sessions have been transformed into graphs that model the interdependency between the packets of each session. We find that session-level graphs are similar when they belong to the same application type, which means that a set of graph instances can be modeled as a hierarchical graph that connects individual graphs with edges. For graph classification tasks, typical graph neural network algorithms often need a large number of labeled graph samples. However, since large-scale labeled graph datasets come at a significant cost in time and effort, graph classification frequently suffers from a lack of labeled graph samples. Besides, in edge networks, we cannot deploy all the neural networks on the edge nodes, since only limited resources are available there. If all the work were carried out on the central servers, it would lead to high latency and wasted computing resources. In this paper, active learning techniques are employed to enhance the efficiency of semisupervised learning, which unifies the edge nodes and the central server.
In this part, we first introduce a self-attentive graph embedding technique that embeds graphs of any size into fixed-length vectors, which are used as the input of semisupervised classification. In addition to greatly simplifying the representation of a hierarchical graph, the embedding approach offers meaningful interpretations of an individual graph instance through a self-attentive mechanism that distinguishes the roles of its nodes in categorizing the graph instance. This phase, which connects and unifies the edge nodes and the cloud server, can be separated into two parts, graph embedding and graph-based classification, as depicted in Figure 4. The former transforms a graph with a variable number of nodes into a fixed-length vector, which serves as the classification input of the latter.
As follows, we give the descriptions for the graph embedding part that takes the processed samples from the previous stage as input. As depicted in Figure 5, the purpose of this part is to convert the graph with diferent number of nodes into a vector e n with a unifed dimension and then use it as the input for graph classifcation. Firstly, two layers of GCNs are applied, with the adjacency matrix A ∈ R y×y and attribution matrix F ∈ R y×M as input. Ten, we get H � AReLU AFW 0 W 1 .
In the former formula, $\hat{A} = D^{-1/2}(A + I_n)D^{-1/2}$ is the normalized form of the adjacency matrix $A$, where $I_n$ is the identity matrix and $D$ is the diagonal degree matrix with $D_{ii} = \sum_{m}(A + I_n)_{im}$. Here, $W^{0} \in \mathbb{R}^{M \times h}$ and $W^{1} \in \mathbb{R}^{h \times v}$ are two weight matrices. Next, we use the self-attentive mechanism to assign different weights to the nodes within a graph and thereby differentiate them. A fully connected layer followed by softmax also yields the predicted class probabilities $\psi$. The attention weights are computed as $S = \mathrm{softmax}(W_{s2}\tanh(W_{s1}H^{T}))$,
where $W_{s1} \in \mathbb{R}^{d \times v}$ and $W_{s2} \in \mathbb{R}^{r \times d}$. Multiplying by $W_{s1}$ linearly maps the node representations from a $v$-dimensional to a $d$-dimensional space. Nonlinearity is then added by applying the tanh function. $W_{s2}$ is used to infer the importance assigned to each node within the graph. Lastly, we obtain the embedding $e \in \mathbb{R}^{r \times v}$ by multiplying $S$ and $H$. The predicted class probabilities $\psi$ are used to pick out the samples sent to the cloud server. To enhance graph classification performance, we must choose which samples are filtered out and passed to the graph neural network-based classifier on the cloud server. The framework employs a number of supervised classifiers to determine which graph examples are significant for improving the graph classification model. The training set is then updated with these examples.
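The embedding step can be illustrated with a minimal NumPy sketch. It assumes the attention matrix has the form S = softmax(W_s2 tanh(W_s1 H^T)), which matches the dimensions given in the text, and it uses random weights and random graphs purely as placeholders; the helper names and the sizes chosen for M, h, v, d, and r are hypothetical and not taken from the paper.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # self-loops keep degrees >= 1
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x, axis):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def embed_graph(A, F, W0, W1, Ws1, Ws2):
    """Map a graph with y nodes to a fixed-size (r, v) embedding."""
    A_hat = normalize_adjacency(A)
    H = A_hat @ np.maximum(A_hat @ F @ W0, 0.0) @ W1    # two GCN layers, shape (y, v)
    S = softmax(Ws2 @ np.tanh(Ws1 @ H.T), axis=1)       # attention over nodes, shape (r, y)
    return S @ H                                        # shape (r, v), independent of y

# Hypothetical sizes and random placeholder weights (not trained values).
rng = np.random.default_rng(0)
M, h, v, d, r = 8, 16, 12, 10, 4
params = (rng.normal(size=(M, h)), rng.normal(size=(h, v)),
          rng.normal(size=(d, v)), rng.normal(size=(r, d)))

def random_graph(y):
    """Random undirected graph with y nodes and M-dimensional node attributes."""
    upper = np.triu(rng.integers(0, 2, size=(y, y)), k=1)
    return (upper + upper.T).astype(float), rng.normal(size=(y, M))

e_small = embed_graph(*random_graph(5), *params)
e_large = embed_graph(*random_graph(9), *params)
print(e_small.shape, e_large.shape)
```

Both graphs, with 5 and 9 nodes, produce embeddings of the same shape, which is what lets the downstream classifier accept graphs of any size.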
We introduce a unified classifier system that uses weighted majority voting to combine the decisions of $P$ isolated GCN-based classifiers into the final label of a graph sample, with the weights obtained by maximizing the performance of the whole expert set. To be more specific, each classifier has the same setup as in Figure 4 but a different kernel size. Once each individual classifier has produced its predicted class probabilities $\psi$, the final decision can be calculated. If a classifier's result matches the final voting result, the weight of that classifier is increased. The weighted voting method assigns a weight to each classifier member, obtained by measuring that member's classification accuracy on the training set. The weight is proportional to the accuracy; that is, a base classifier with good classification ability is given a larger weight coefficient, a base classifier with relatively poor classification ability is given a smaller one, and the integration result depends on the weighted sum. Next, we need to select the samples with high performance gain, which is measured by the weighted mean of the class output probabilities calculated in the last phase. To be more specific, the graph samples picked out as training samples are those whose weighted mean classification probability over the $P$ classifiers is higher than a threshold. The selected samples are annotated and added to the labeled training set to improve the effectiveness of the GCN-based classifier on the cloud server and further improve the accuracy of the graph classification tasks. The graph embedding phase thus enlarges the labeled training set and produces the fixed-length embedding $e \in \mathbb{R}^{r \times v}$ that serves as the model input on the cloud server. The semi-supervised classification phase is set up on the cloud servers.
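The voting and selection rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the weights are taken to be training accuracies normalized to sum to one, the threshold value 0.8 is hypothetical, and the probability arrays are made-up numbers for three classifiers and two classes.

```python
import numpy as np

def accuracy_weights(train_probs, train_labels):
    """train_probs has shape (P, N, C); weights are proportional to training accuracy."""
    acc = (train_probs.argmax(axis=2) == train_labels).mean(axis=1)
    return acc / acc.sum()

def weighted_vote(probs, weights):
    """Weighted majority vote over the hard decisions of the P classifiers."""
    P, N, C = probs.shape
    votes = np.zeros((N, C))
    hard = probs.argmax(axis=2)                      # (P, N) predicted labels
    for p in range(P):
        votes[np.arange(N), hard[p]] += weights[p]
    return votes.argmax(axis=1)

def select_samples(probs, weights, threshold):
    """Keep samples whose weighted mean probability for the voted label exceeds threshold."""
    labels = weighted_vote(probs, weights)
    mean_probs = np.tensordot(weights, probs, axes=1)   # (N, C) weighted mean
    confidence = mean_probs[np.arange(len(labels)), labels]
    keep = np.flatnonzero(confidence > threshold)
    return keep, labels[keep]

# Made-up outputs of P = 3 classifiers on 4 labeled training graphs.
train_labels = np.array([0, 1, 1, 0])
train_probs = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]],   # 4 of 4 correct
    [[0.8, 0.2], [0.4, 0.6], [0.7, 0.3], [0.9, 0.1]],   # 3 of 4 correct
    [[0.4, 0.6], [0.1, 0.9], [0.2, 0.8], [0.7, 0.3]],   # 3 of 4 correct
])
weights = accuracy_weights(train_probs, train_labels)

# Made-up outputs of the same classifiers on 3 unlabeled graphs.
unlabeled_probs = np.array([
    [[0.95, 0.05], [0.55, 0.45], [0.1, 0.9]],
    [[0.90, 0.10], [0.40, 0.60], [0.2, 0.8]],
    [[0.85, 0.15], [0.45, 0.55], [0.3, 0.7]],
])
keep, pseudo_labels = select_samples(unlabeled_probs, weights, threshold=0.8)
print(weights, keep, pseudo_labels)
```

In this toy run the second unlabeled graph is voted into class 1 but is not selected, because the classifiers disagree and its weighted mean probability stays below the threshold.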
The problem definitions are given first. A graph is represented as $G_m = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. Given the set of graphs $\{G_m\}_{m=1}^{M}$, the goal is to learn a function $f: \{G_m\}_{m=1}^{M} \to Y$ that maps each graph to its class label. We incorporate the active graph classification phase to select a set of graphs $G_{select} = \{G_{l+1}, \ldots, G_{l+k}\}$ from the unlabeled samples $G_U$ and add them to the labeled training set $G_L$ after annotation, so that the new training set on the central server gives a better ability to predict the labels of the unlabeled graphs. To enhance graph classification outcomes for semi-supervised learning, our method chooses unlabeled graph samples that the multiple GCNs cluster with high confidence and adds them to the training set after pseudo labeling.
GCN-based models are employed for semisupervised training on the cloud server. The graph embeddings $E = \{e_i\}_{i=1}^{L+U}$ and the adjacency matrix $\Theta \in \mathbb{R}^{(L+U) \times (L+U)}$ are given, calculated according to formula (4). Two GCN layers are then applied, and the classification probability of each graph example is given by the softmax layer as $\mathrm{softmax}(\hat{\Theta}\,\mathrm{ReLU}(\hat{\Theta} E W^{0}_{\Theta})\,W^{1}_{\Theta})$. Here, $\hat{\Theta} = D^{-1/2}(\Theta + I_n)D^{-1/2}$ is the normalized form of the adjacency matrix $\Theta$, $I_n$ is the identity matrix, and $D_{ii} = \sum_{m}(\Theta + I_n)_{im}$. $W^{0}_{\Theta}$ and $W^{1}_{\Theta}$ are weight matrices. To further increase efficiency, the parameters in the edge nodes are not retrained from scratch but fine-tuned from the parameters obtained in the previous iteration.
On the cloud server, the graph neural network-based semi-supervised traffic classifier uses the pseudo-labeled samples to further train itself and a softmax layer to produce its outputs.
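A minimal sketch of the cloud-side classifier is given below. It treats each graph embedding as one row of a matrix E (flattening each embedding into a row is an assumption), applies two GCN layers over the graph-of-graphs adjacency matrix, and computes a cross-entropy loss only on the labeled and pseudo-labeled graphs. The masked loss is the usual choice for semi-supervised GCNs and is assumed here, since the text does not state the loss; all sizes, weights, and labels are random placeholders.

```python
import numpy as np

def normalize_adjacency(theta):
    """Return D^{-1/2} (Theta + I) D^{-1/2}."""
    theta_tilde = theta + np.eye(theta.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(theta_tilde.sum(axis=1))
    return theta_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax_rows(x):
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classify_graphs(theta, E, W0, W1):
    """Two GCN layers followed by a row-wise softmax; one row per graph."""
    theta_hat = normalize_adjacency(theta)
    hidden = np.maximum(theta_hat @ E @ W0, 0.0)
    return softmax_rows(theta_hat @ hidden @ W1)

def masked_cross_entropy(probs, labels, labeled_idx):
    """Cross-entropy over labeled (or pseudo-labeled) graphs only (assumed loss)."""
    picked = probs[labeled_idx, labels[labeled_idx]]
    return float(-np.log(picked + 1e-12).mean())

# Hypothetical sizes: 6 graphs, flattened embeddings of length 8, 3 classes.
rng = np.random.default_rng(0)
num_graphs, emb_dim, hidden_dim, num_classes = 6, 8, 5, 3
E = rng.normal(size=(num_graphs, emb_dim))
upper = np.triu(rng.integers(0, 2, size=(num_graphs, num_graphs)), k=1)
theta = (upper + upper.T).astype(float)
W0 = rng.normal(size=(emb_dim, hidden_dim))
W1 = rng.normal(size=(hidden_dim, num_classes))

labels = np.array([0, 1, 2, 0, 0, 0])   # entries for unlabeled graphs are ignored
labeled_idx = np.array([0, 1, 2])       # graphs that carry a label or pseudo label
probs = classify_graphs(theta, E, W0, W1)
loss = masked_cross_entropy(probs, labels, labeled_idx)
print(probs.shape, loss)
```

Adding a pseudo-labeled graph to training then amounts to appending its index to `labeled_idx` and writing its pseudo label into `labels`.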

Evaluation Metrics.
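All approaches are evaluated on their accuracy (A.), recall (R.), and F1-score (F1). With TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives, the definitions are $A = (TP + TN)/(TP + TN + FP + FN)$, $R = TP/(TP + FN)$, and $F1 = 2 \cdot \mathrm{Prec} \cdot R/(\mathrm{Prec} + R)$, where $\mathrm{Prec} = TP/(TP + FP)$ is the precision.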
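For a multi-class traffic classifier, these metrics can be computed from a confusion matrix. The sketch below assumes macro averaging of recall and F1 over the classes, which the text does not specify, and uses a small made-up label vector for illustration.

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Return accuracy, macro-averaged recall, and macro-averaged F1-score."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)          # rows: true class, columns: predicted class

    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp                    # true class missed by the classifier
    fp = cm.sum(axis=0) - tp                    # other classes predicted as this class

    recall = tp / np.maximum(tp + fn, 1)        # guard against classes with no samples
    precision = tp / np.maximum(tp + fp, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    return accuracy, recall.mean(), f1.mean()

# Made-up example with three traffic classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, rec, f1 = classification_metrics(y_true, y_pred, num_classes=3)
print(acc, rec, f1)
```

Macro averaging gives every traffic class the same weight regardless of how many samples it has; a sample-weighted average would be the alternative if the classes are strongly imbalanced.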
Deep-Packet: This method combines feature extraction and classification into a single system and uses CNNs to process the byte sequence of a packet. It performs well on the UNB ISCX VPN-nonVPN dataset.
GCNN: This method builds a chained graph model on the traffic packet data and performs supervised traffic classification with graph neural networks on features extracted automatically from the chained graphs.
DISTILLER: This method combines multitask and multimodal deep learning techniques. It handles three tasks at once: encapsulation, traffic type, and traffic application classification. It also views the traffic object from both the packet-level and flow-level perspectives. In this study, however, we focus only on traffic type classification.
ByteSGAN: This method employs semisupervised learning based on generative adversarial networks (GANs) to classify encrypted data. It is intended to be embedded in SDN edge gateways and uses packet-byte data as the model input.

Effectiveness Analysis.
The enhanced algorithm performs well on all four datasets. The findings are displayed in Figures 6, 7, 8, and 9 to illustrate more clearly the improvement this method achieves.

Experimental Setup.
We set $P = 3$ graph convolutional neural networks on the edge nodes to vote for the final label and limit the input sequence length to 1500 bytes; byte sequences shorter than 1500 bytes are padded with zeros, and longer ones are truncated. Each classifier has 2 graph convolutional layers, whose input and output dimensions are n × 64, n × 128, and n × 256, where n represents the number of nodes. To perform semisupervised traffic classification learning, we randomly choose 30% of the labeled datasets to train ByteSGAN and SSGCN (our semi-supervised classifier).
Figure 6 shows that our method performs excellently on the VPN dataset. Compared to the DISTILLER method, the accuracy, recall, and F1-score rise by 9.42%, 9.67%, and 8.91%, respectively. Compared to the GCNN algorithm, our method improves the accuracy and F1-score by 0.46% and 3.40%, respectively. Compared to the ByteSGAN semi-supervised algorithm, our method raises accuracy, recall, and F1-score by 0.92%, 1.57%, and 3.03%, respectively.
Figure 7 shows that our method performs best on the non-VPN dataset. Compared to the DISTILLER algorithm, the accuracy improves by 18.67%, the recall by 25.38%, and the F1-score by 17.83%. Compared to the Deep-Packet algorithm, our method improves accuracy, recall, and F1-score by 0.87%, 4.55%, and 1.71%, respectively. Compared to the GCNN algorithm, our method increases accuracy, recall, and F1-score by 2.77%, 5.10%, and 3.61%, respectively. Compared to the ByteSGAN semi-supervised algorithm, our method increases accuracy, recall, and F1-score by 1.79%, 4.01%, and 2.69%, respectively.
Figure 8 shows that our method performs excellently on the TOR dataset. Compared to the DISTILLER method, the accuracy rises by 2.74%, the recall by 12.41%, and the F1-score by 5.75%. Compared to the Deep-Packet algorithm, our method increases accuracy, recall, and F1-score by 0.33%, 2.14%, and 1.83%, respectively. Compared to the GCNN algorithm, our method increases accuracy, recall, and F1-score by 0.95%, 2.62%, and 3.57%, respectively. Compared to the ByteSGAN semisupervised algorithm, our method increases accuracy, recall, and F1-score by 0.92%, 3.22%, and 3.01%, respectively.
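The fixed 1500-byte input length described in the experimental setup can be sketched as a small preprocessing helper. The function name `fix_length` and the example payloads are hypothetical; only the zero-padding and truncation rule comes from the text.

```python
import numpy as np

MAX_LEN = 1500  # input sequence length limit from the experimental setup

def fix_length(payload: bytes, max_len: int = MAX_LEN) -> np.ndarray:
    """Truncate a byte sequence to max_len bytes, or zero-pad it up to max_len."""
    data = np.frombuffer(payload[:max_len], dtype=np.uint8)
    out = np.zeros(max_len, dtype=np.uint8)
    out[:data.size] = data
    return out

short_seq = fix_length(b"\x16\x03\x01")           # 3 bytes, padded with zeros
long_seq = fix_length(bytes(range(256)) * 8)      # 2048 bytes, truncated to 1500
print(short_seq.shape, long_seq.shape)
```

Every session then yields an input of the same length, regardless of how many raw bytes were captured.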

Experimental Results of Non-TOR Dataset.
Figure 9 shows that our method performs best on the non-TOR dataset. Compared to the DISTILLER algorithm, the accuracy improves by 5.01%, the recall by 7.74%, and the F1-score by 4.21%. Compared to the Deep-Packet algorithm, our method increases accuracy, recall, and F1-score by 12.63%, 13.6%, and 12.39%, respectively. Compared to the GCNN algorithm, our method increases accuracy, recall, and F1-score by 7.72%, 12.15%, and 10.16%, respectively. Compared to the ByteSGAN semisupervised algorithm, our method improves accuracy, recall, and F1-score by 3.0%, 5.38%, and 5.78%, respectively.
Figure 10 shows the confusion matrix for the VPN and non-VPN dataset, and almost all the traffic types are classified well. In the graph generation part, because we extract traffic features by grouping packets into granules according to their sequential features, the resulting graphs give better traffic classification performance than chained graphs generated by simply turning individual packets into nodes. We also compare our method with fully supervised methods to show that SSGCN can match the efficacy of supervised methods such as Deep-Packet while using only 30% of the labeled dataset. In addition, SSGCN is effective compared to semisupervised methods such as ByteSGAN, because we employ graphs to extract the structural information of the traffic data.

Conclusions
In this paper, we have presented a novel semi-supervised traffic classification approach based on improved graph convolutional neural networks. In the integrated edge-server system, the uploaded traffic packets are processed and transformed into graphs. We have used multiple GCN-based classifiers to enlarge the training set for the cloud server. The cloud server performs the semisupervised traffic classification tasks based on graph convolutional networks. We verify the efficacy of our model on publicly available network traffic datasets. The experimental findings show that the model achieves outstanding classification performance.
In further study, we will investigate the following aspects of the proposed methodology: (1) The majority of current traffic classifiers operate within predefined traffic categories. These techniques cannot handle unknown traffic from unseen classes. Zero-day applications are traffic classes for which the classifier has not been trained. Only a small number of recent studies, many of which rely on locating unlabeled clusters and later classifying them, have offered solutions for zero-day applications. (2) The procedure for deploying the network will be extended to a real environment. More metrics will be introduced to measure the traffic classifier's performance.

Conflicts of Interest
The authors declare that they have no conflicts of interest.