HYBRID-CNN: An Efficient Scheme for Abnormal Flow Detection in the SDN-Based Smart Grid



Introduction
The Smart Grid is a grid system with automatic control and self-protection adjustment capabilities [1]. It relies on information and communication technology to meet reliability, security, and real-time requirements [2,3]. The emerging Software-Defined Network (SDN) architecture breaks with the tightly coupled hardware structure of traditional networks by separating the control plane from the data plane, and it directly implements virtualized configuration of the switches. It is especially suitable for the mobile communication networks, wired interconnection networks, and sensor networks in the Smart Grid [4]. SDN improves the data transmission capability and network compatibility of the Smart Grid, but it also brings new security issues: its highly centralized network control significantly increases the damage that abnormal flow intrusions can cause [5]. As the control center of the whole network, the SDN controller may itself be the target of various attacks, such as DDoS, fake flows, switch compromise, and attacks on the control layer. Compromising the SDN controller can paralyze all switches under its control, and such disorders can have devastating effects on the entire network [6]. In SDN, collaborative abnormal flow detection across multiple domains requires detailed flow data from each relevant domain, such as the contents of a flow table over the last few seconds. Because abnormal network flows carry latent and unforeseen attacks, abnormal flow detection techniques are challenged by the demand to process larger-scale, higher-dimensional flow data [7].
Most recent studies on this problem are based on state transition [8] or artificial intelligence methods [9]. Methods based on state transition require manual calculation and have low recognition accuracy, whereas methods based on artificial intelligence have the advantage here because of the availability of network big data. However, most of this research has not performed in-depth feature learning on network flows. For large-scale network abnormal flow detection, there are two main types of methods. The first type relies on sampled data: it uses network flow data to build a library of attack intrusion behavior patterns and then matches the collected data, including host system logs or data gathered from network nodes, against this pattern library. If a match is found, the behavior is classified as an intrusion; otherwise, it is considered normal [10]. This method can effectively identify known attacks and thereby improve network security at the time. However, with the development of computers and the Internet, more and more new types of attacks have emerged, and the detection accuracy of such expert systems has fallen sharply, so that they can no longer meet requirements. Moreover, the sampled data itself is imprecise, which may cause useful information to be lost.
The other type of method uses machine learning to perform feature extraction and detection classification after constructing features. The massive amount of network data makes machine learning methods more effective than judgment methods based on expert systems [11]. However, traditional machine learning methods are only shallow feature-learning classifiers and have limitations when processing complex data. The feature processing they require is time-consuming and demands specialized knowledge, and the performance of most machine learning algorithms depends on the accuracy of the extracted features. Deep learning reduces the manual effort of designing a feature extractor for each problem by automatically learning high-level features directly from raw data [12]. Previous studies have used deep learning to classify mobile encrypted traffic with excellent results [13,14]. In [15], the authors investigated several deep learning architectures, including 1D CNN, 2D CNN, LSTM, Stacked Autoencoder (SAE), and Multilayer Perceptron (MLP), for mobile encrypted traffic classification.
To address the above problems and challenges, we apply the strong feature learning capabilities of deep learning to the SDN-based Smart Grid to achieve highly accurate network abnormal flow detection. The main contributions of this article are summarized as follows: (i) First, we design a framework that improves the security of the Smart Grid by applying an abnormal flow detection algorithm in the SDN-based Smart Grid communication network; it can identify abnormal flow and detect the type of attack. (ii) Second, we propose a deep learning algorithm, Hybrid Convolutional Neural Networks (HYBRID-CNN), to detect abnormal flow in the SDN-based Smart Grid communication network. HYBRID-CNN adopts a dual-channel data input that extracts effective features from 1D and 2D flow data, uses a self-attention mechanism to fuse key features, and finally uses a fully connected neural network for detection. (iii) Third, we compare the proposed method with single models and verify the performance improvement of the hybrid model. In addition, we conduct a parameter study to optimize the HYBRID-CNN model. (iv) Fourth, we perform extensive experimental comparisons on the UNSW_NB15 and KDDCup 99 benchmark datasets. Experimental results show that HYBRID-CNN significantly outperforms existing approaches in terms of accuracy and False Positive Rate (FPR).
The rest of this article is organized as follows: we discuss related work in Section 2 and introduce the system model and security requirements in Section 3. Section 4 introduces preliminary knowledge. Section 5 presents our proposed algorithm, and Section 6 presents the experimental comparative analysis. Finally, we discuss and conclude in Sections 7 and 8.

Related Work
This section discusses two related types of work: traditional machine learning and deep learning. Using either to develop flexible and efficient abnormal flow detection schemes in SDN-based network controllers presents some challenges. One main challenge is choosing an appropriate feature selection method; another is accurately capturing both the correlation between the selected features and the abnormal flow detection task and the redundancy among these features [16].

Traditional Machine Learning.
Most previous studies were based on traditional machine learning methods, such as Support Vector Machine (SVM), Decision Tree, and Naive Bayes. The Naive Bayes algorithm is an important algorithm in machine learning and data mining and is widely used for classification tasks such as text classification and medical diagnosis. Ashraf et al. [17] applied Naive Bayes to network intrusion detection; their basic idea is to select the most likely category using Bayes' theorem under the assumption that features are independent. However, this method performs only simple shallow feature learning and performs poorly on large-scale network flow data. Rai et al. [18] used the C4.5 decision tree to perform intrusion detection experiments on the NSL-KDD dataset, selecting 16 attributes of the dataset as detection features. The proposed algorithm can be used for feature-based intrusion detection, but its accuracy is low, only 79.52%. Reddy et al. [19] proposed a filtering algorithm based on the SVM classifier to perform classification on the KDDCup 99 dataset. This method performed well on the training data but poorly on the test dataset and could not effectively detect unknown abnormal network flows.

Deep Learning.
In recent years, deep learning, a branch of machine learning, has become increasingly popular. It has been applied to intrusion detection, and research shows that it has surpassed traditional methods in performance [20]. Kwon et al. [15] used Deep Neural Network-based deep learning methods for flow-based anomaly detection, and their experimental results show that deep learning can be applied to abnormal flow detection in SDN. Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) that can retain inputs and predicted outputs over arbitrary time spans, alleviating the vanishing and exploding gradient problems of the RNN. LSTM is widely used in Natural Language Processing [21]. Existing research on LSTM-based abnormal flow detection [22] found that these algorithms significantly improve sequence learning compared with traditional machine learning methods, but there is still room for improvement in detection rate and accuracy. CNN is a multilayer network learning algorithm that can learn hierarchical features from large amounts of data and has broad application prospects in abnormal flow detection. Wang et al. [23] proposed an end-to-end classification method based on a one-dimensional Convolutional Neural Network.
This method integrates feature extraction, feature selection, and classification into a unified end-to-end framework and automatically learns the nonlinear relationship between the raw inputs and the expected outputs, obtaining good experimental results. However, the one-dimensional data used in this method is not suitable for local feature extraction, resulting in a less-than-ideal detection rate. In [24], the authors present a new network traffic classification technique based on a combination of RNN and CNN models for Internet of Things (IoT) traffic, which provides the best detection results in their evaluation. Wang et al. [25] proposed combining CNN with LSTM to analyze and detect network flows: a CNN first learns low-level spatial features of the network flow, and an LSTM then learns high-level temporal features. The deep neural network performs this process automatically, and the method achieves good accuracy and detection rate.
Based on the above works, traditional machine learning methods typically used in abnormal flow detection often fail to detect many known and new security threats, largely because those approaches pay less attention to accurate feature selection and classification, and they are often inefficient for large-scale network flows. Current deep learning methods such as LSTM and CNN tend to focus on improving the model and ignore the structural features of the original flow. To address these problems, we propose HYBRID-CNN, a deep learning method for more accurate feature learning. It uses a two-channel input structure of 1D and 2D data, with a CNN extracting local features and a DNN extracting global features. In addition, a self-attention mechanism is added to select the most important features.

System Model View
In this section, we formalize the system model and system security requirements.

System Model.
The Smart Grid uses two-way communication technology to connect many power components and ensure communication among them. Implementing SDN in the Smart Grid separates network control from the data-forwarding equipment that makes up the network infrastructure, thereby enabling logically centralized control and allowing the network to be programmed by a central software unit. The control layer, as the brain of the network, runs the controller software, whose software-defined routing rules determine where to route flows. Programmable network devices in the data plane route flows according to the rules defined by the controller, and the abnormal flow detection function is implemented at the top of this architecture. As shown in Figure 1, the SDN-based Smart Grid mainly includes the following parts [26].

Physical Plane.
This layer is responsible for packet switching and routing. It includes the basic components of network communication in the Smart Grid, such as smart meters, Phasor Measurement Units (PMUs), various sensors, and various communication equipment. Unlike in traditional networks, these components have no control unit and cannot make decisions independently. They are only responsible for collecting key data and forwarding it to the control layer through the programmable SDN switch infrastructure while complying with the rules defined by the controller.

Southbound Interface.
The southbound interface defines the communication protocol between the physical layer and the control layer. The OpenFlow protocol, developed at Stanford, is currently the most common standard southbound protocol [27]. It enables secure communication in the SDN by defining the format of messages exchanged between a programmable switch and the controller.

Control Plane.
As the central brain, the control layer hosts one or more SDN controllers, whose task is to manage the forwarding behavior of data flows by determining forwarding rules; these rules are written into the flow tables of the programmable switches in the physical layer through the southbound interface.

Northbound Interface.
The northbound interface provides communication between the control layer and the application layer and enables application programs to program the network. It abstracts the details of data in the physical layer and allows network administrators, service providers, and researchers to customize the control rules and behaviors of their networks.
Security and Communication Networks

Application Plane.
The application layer comprises many Smart Grid applications, including network security programs such as the abnormal flow detection module and the flow data filtering module. All these application-defined policies are passed through the northbound interface to the control layer, where they are translated into OpenFlow rules and transferred to the programmable switches in the physical layer.

System Security Requirements
The Rigidity and Centralization of the Network Architecture.
The functions of the Smart Grid communication network are fixed at the design phase, and it is almost impossible to reconfigure the network according to its real-time needs. In terms of performance and resilience, this nondynamic structure of today's Smart Grid creates bottlenecks and leaves the network vulnerable to multiple types of attacks. On the other hand, highly centralized network control considerably increases the damage caused by abnormal flow intrusions [28]. The SDN controller is the control center of the entire network and may itself be the target of various attacks; such attacks can paralyze everything under its control, and the misbehavior of a single switch can have a devastating effect on the entire network. Therefore, it is necessary to design an effective abnormal flow detection algorithm in the SDN controller.

The Hierarchy of Network Flow.
Network flow has a distinct hierarchy, as shown in Figure 2, where the bottom row shows a sequence of flow bytes. According to a specific network protocol format, multiple flow bytes are combined into a network packet, and multiple network packets are combined into a network flow. Classifying a network flow as normal or malicious has been addressed by using deep learning algorithms to learn these hierarchical features, with good results.
These studies motivate us to use deep learning to learn the hierarchical features of network flow for intrusion anomaly detection.

Working Methodology.
Devices in the physical layer initiate access requests through the Internet, and the flow collection module of the SDN controller captures the flow statistics table information of all requests to extract flow features. The abnormal flow detection module includes three stages: data preprocessing, model training, and model validation, as shown in Figure 3. First, the collected flow meter data are preprocessed, including data encoding, data normalization, data reshaping, and data splitting. After preprocessing, the HYBRID-CNN algorithm performs feature extraction, feature fusion, and anomaly detection classification on the flow data vectors.
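The encoding step is named but not specified. Because the DNN input described later has shape (42, 1), one value per feature, a plausible reading (an assumption, not confirmed by the text) is that categorical UNSW_NB15 columns such as `proto`, `service`, and `state` are mapped to integer codes. A minimal sketch of such label encoding, with the hypothetical helper `label_encode`:

```python
from typing import Dict, List, Sequence, Tuple


def label_encode(column: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct category to an integer, in order of first appearance.

    Hypothetical helper: the paper does not state which encoding it uses.
    """
    mapping: Dict[str, int] = {}
    encoded: List[int] = []
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused integer code
        encoded.append(mapping[value])
    return encoded, mapping


# Hypothetical excerpt of the categorical "proto" column.
proto = ["tcp", "udp", "tcp", "arp", "udp"]
codes, vocab = label_encode(proto)
print(codes)  # [0, 1, 0, 2, 1]
print(vocab)  # {'tcp': 0, 'udp': 1, 'arp': 2}
```

In practice the mapping would be built on the training split and reused for validation data, with some policy for categories never seen during training.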
Beyond the anomaly flow detection described above, the proposed solution also delivers detection reports end to end through the SDN, as shown in Figure 1.
This is achieved by incorporating the anomaly flow detection model into the core of the SDN control plane. Execution proceeds in the following order: (i) detection stage, (ii) reporting stage, and (iii) update stage. In the first stage, the control plane, which embeds the anomaly flow detection model, classifies each incoming flow as abnormal or normal. Then, in the second stage, the report is communicated to the control plane. If the incoming flow is abnormal, the control plane discards the packet and immediately stops communicating with the requesting host. This protects the underlying network from malicious content and prevents it from spreading further. In the update stage, the control plane updates the flow table entries of the forwarding devices.
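The three-stage loop above can be sketched in a few lines. This is a minimal, hypothetical model: `ControlPlane`, `handle_flow`, the dictionary standing in for a flow table, and the threshold classifier are all illustrative assumptions, not an SDN controller API or the HYBRID-CNN model itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ControlPlane:
    """Hypothetical controller state; not the API of any real SDN controller."""
    classify: Callable[[List[float]], bool]  # True if the flow is abnormal
    flow_table: Dict[str, str] = field(default_factory=dict)  # host -> action
    blocked_hosts: Set[str] = field(default_factory=set)
    reports: List[str] = field(default_factory=list)

    def handle_flow(self, host: str, features: List[float]) -> str:
        # (i) Detection stage: the embedded model labels the incoming flow.
        abnormal = self.classify(features)
        # (ii) Reporting stage: the verdict is recorded by the control plane.
        self.reports.append(f"{host}: {'abnormal' if abnormal else 'normal'}")
        # (iii) Update stage: abnormal hosts get a drop rule; hosts already
        # blocked stay blocked, and unseen normal hosts get a forward rule.
        if abnormal:
            self.blocked_hosts.add(host)
            self.flow_table[host] = "drop"
        else:
            self.flow_table.setdefault(host, "forward")
        return self.flow_table[host]


# Stand-in threshold rule in place of a trained detector.
cp = ControlPlane(classify=lambda f: f[0] > 0.5)
print(cp.handle_flow("10.0.0.2", [0.1]))  # forward
print(cp.handle_flow("10.0.0.9", [0.9]))  # drop
```

Keeping the classifier as an injected callable separates the detection model from the rule-update logic, so a trained network could replace the threshold rule without changing the control loop.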

Preliminaries
In this section, we briefly describe the basic notions used in our proposed algorithm.

Activation Function.
The activation function gives the network its nonlinear modeling capability. The Rectified Linear Unit (ReLU) is the most widely used activation function [29]. Because its gradient does not attenuate for positive inputs, it effectively alleviates the vanishing-gradient problem. ReLU outputs 0 when x < 0 and is linear with slope 1 when x > 0:
$$f(x) = \max(0, x) \tag{1}$$
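A minimal sketch of Eq. (1) and the derivative used during backpropagation. The choice of 0 as the derivative at exactly x = 0 is a common convention, assumed here.

```python
def relu(x: float) -> float:
    """ReLU, Eq. (1): 0 for x < 0, identity for x >= 0."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative: 1 for x > 0, 0 otherwise (0 chosen at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


print([relu(v) for v in (-2.0, 0.0, 3.5)])       # [0.0, 0.0, 3.5]
print([relu_grad(v) for v in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 1.0]
```

The derivative is exactly 1 for every positive input, so gradients passing through active units are not scaled down layer after layer, which is the property the text credits with alleviating vanishing gradients.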

Cross-Entropy Loss.
Cross-entropy loss measures the performance of a classification model whose output is a probability between 0 and 1. It increases as the predicted probability diverges from the actual label. In binary classification, where the number of classes M equals 2, the cross-entropy loss is
$$L = -\big(y \log p + (1 - y)\log(1 - p)\big).$$
If M > 2 (multiclass classification), we calculate a separate loss for each class label per observation and sum the results:
$$L = -\sum_{c=1}^{M} y_{o,c} \log p_{o,c},$$
where y_{o,c} is a binary indicator (0 or 1) of whether class label c is the correct classification for observation o, and p_{o,c} is the predicted probability that observation o is of class c.
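Both loss formulas can be checked numerically with the minimal sketch below. Clipping probabilities away from 0 and 1 (the `eps` argument) is an implementation assumption added to avoid `log(0)`; it is not part of the formulas.

```python
import math
from typing import Sequence


def binary_cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """M = 2: -(y log p + (1 - y) log(1 - p)); p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def categorical_cross_entropy(y_onehot: Sequence[int], probs: Sequence[float],
                              eps: float = 1e-12) -> float:
    """M > 2: -sum_c y_{o,c} log p_{o,c} for a single observation o."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, probs))


print(round(binary_cross_entropy(1, 0.9), 4))  # 0.1054  (confident, correct)
print(round(binary_cross_entropy(1, 0.1), 4))  # 2.3026  (confident, wrong)
print(round(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]), 4))  # 0.3567
```

The loss for a confident wrong prediction is more than twenty times that of a confident correct one, which is what "increases as the predicted probability diverges from the actual label" means in practice.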

Optimizer.
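We use the Adam optimizer to learn the network weight parameters. Adam assigns an independent adaptive learning rate to each parameter by computing estimates of the first-order and second-order moments of the gradient. Empirical results show that Adam has advantages over other optimizers in practice [30]. Let g_t be the gradient at step t. The moving averages of the first moments m_t and second moments v_t, their bias-corrected estimates, and the Adam update rule are
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2,$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t},$$
$$\omega_t = \omega_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$
where ω denotes the model weights, η is the step size, and β_1, β_2, and ε are hyperparameters.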
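The update rules can be sketched as a single step function. The defaults η = 0.001, β₁ = 0.9, β₂ = 0.999, and ε = 1e-8 are the values commonly used with Adam and are assumed here; the text does not state which values the authors used. The usage minimizes the toy objective f(ω) = ω², whose gradient is 2ω.

```python
import math
from typing import List, Tuple


def adam_step(w: List[float], g: List[float], m: List[float], v: List[float], t: int,
              eta: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update at step t (t starts at 1); returns new weights and moments."""
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, g, m, v):
        mi = beta1 * mi + (1 - beta1) * gi        # first-moment moving average
        vi = beta2 * vi + (1 - beta2) * gi * gi   # second-moment moving average
        m_hat = mi / (1 - beta1 ** t)             # bias-corrected estimates
        v_hat = vi / (1 - beta2 ** t)
        new_w.append(wi - eta * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


# Toy problem: minimize f(w) = w^2 starting from w = 1.
w, m, v = [1.0], [0.0], [0.0]
for t in range(1, 501):
    g = [2.0 * w[0]]
    w, m, v = adam_step(w, g, m, v, t, eta=0.01)
print(abs(w[0]) < 0.5)  # True
```

On the first step the bias corrections make m̂/√v̂ equal to the sign of the gradient, so the initial move has size η regardless of the gradient's scale; this per-parameter normalization is the "independent adaptive learning rate" described above.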

Proposed HYBRID-CNN Algorithm
In this section, we first introduce the data preprocessing operations. Then, we describe the structure of the HYBRID-CNN algorithm and how it detects abnormal flow.

Data Normalization.
Data normalization can speed up convergence, improve model accuracy, and prevent features with particularly large value ranges from dominating distance calculations. For features whose minimum and maximum values differ by a very large amount, such as "dur," "sbytes," and "dbytes," we first apply logarithmic scaling to map them into a smaller range. We then apply MIN-MAX scaling [31] and normalize the data according to
$$X_i' = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}},$$
where X_i denotes each data point, X_min denotes the minimum value over all data points, and X_max denotes the maximum value over all data points of each feature.
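The two scaling steps can be sketched as follows. Using log(1 + x) rather than log(x) is an assumption (it keeps zero-valued byte counts finite), as is mapping a constant column to zeros; the sample values are hypothetical.

```python
import math
from typing import List, Sequence


def log_scale(values: Sequence[float]) -> List[float]:
    """log(1 + x): compresses heavy-tailed, non-negative features such as byte counts."""
    return [math.log1p(x) for x in values]


def min_max(values: Sequence[float]) -> List[float]:
    """X' = (X - X_min) / (X_max - X_min); a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in values]


sbytes = [0.0, 9.0, 99.0, 999.0]  # hypothetical raw values
print(min_max(sbytes))  # linear scaling squeezes the small values toward 0
print([round(v, 3) for v in min_max(log_scale(sbytes))])  # [0.0, 0.333, 0.667, 1.0]
```

Without the log step, three of the four values land below 0.1 after min-max scaling; with it, values spanning three orders of magnitude are spread evenly over [0, 1]. In a real pipeline, X_min and X_max would be computed on the training split only.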

Data Reshaping.
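A CNN expects three-dimensional input (height, width, channel); since each sample has a single channel, the channel dimension is 1. We therefore reshape a single flow sample of length s = h × w + 1 (its h × w features plus one label column) into an image-like data structure, namely the h × w matrix
$$M = \begin{bmatrix} x_1 & x_2 & \cdots & x_w \\ x_{w+1} & x_{w+2} & \cdots & x_{2w} \\ \vdots & \vdots & \ddots & \vdots \\ x_{(h-1)w+1} & x_{(h-1)w+2} & \cdots & x_{hw} \end{bmatrix}.$$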
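A minimal NumPy sketch of the reshaping step. It assumes, as a hypothetical layout, that the label occupies the last position of each record; with the 42 UNSW_NB15 features used later, h = 6 and w = 7 give s = 43.

```python
from typing import Tuple

import numpy as np


def to_image(sample: np.ndarray, h: int, w: int) -> Tuple[np.ndarray, float]:
    """Split a record of length h*w + 1 into an (h, w, 1) feature 'image' and its label.

    Assumes the label is stored in the last position of the record.
    """
    if sample.shape != (h * w + 1,):
        raise ValueError(f"expected length {h * w + 1}, got {sample.shape}")
    features, label = sample[:-1], float(sample[-1])
    # Row-major fill: row r holds features r*w .. r*w + w - 1, as in matrix M.
    return features.reshape(h, w, 1), label


record = np.arange(43, dtype=float)  # stand-in record: 42 features + label
image, label = to_image(record, h=6, w=7)
print(image.shape, label)  # (6, 7, 1) 42.0
```

NumPy's default row-major (C) order matches the layout of M, so element x_{w+1} lands at row 1, column 0.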

Data Split.
For every model we want to train, we use two datasets: a training dataset and a validation dataset. As shown in Figure 4, to separate them, we first shuffle the dataset randomly and then slice it into a training dataset and a validation dataset.
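The shuffle-then-slice procedure can be sketched as below. The fixed seed is an assumption added for reproducibility; the training ratios 60%, 70%, and 80% used in the experiments would be passed as `train_ratio`.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(data: Sequence[T], train_ratio: float,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data, then slice it into training and validation parts."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be in (0, 1)")
    items = list(data)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


train, val = shuffle_split(range(10), train_ratio=0.8)
print(len(train), len(val))  # 8 2
```

Shuffling before slicing matters because raw capture files are often ordered by time or attack campaign; slicing them directly would put whole attack types only in the validation part.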

HYBRID-CNN.
The structure of a CNN is shown in Figure 5. It is an end-to-end deep learning model with powerful feature learning and classification capabilities and is widely used in image classification, speech recognition, computer vision, and other fields [32]. Network flow contains both abnormal and normal flows, and HYBRID-CNN is trained at this stage to detect misuse attacks, further categorizing the malicious data into the corresponding attack classes, e.g., Scan, R2L, DoS, and Probe. The structure of our proposed HYBRID-CNN algorithm is shown in Figure 6. We divide it into three parts: feature extraction, feature fusion, and detection classification.

Feature Extraction.
In the feature extraction phase, we feed the flow data through two inputs in order to extract flow features more comprehensively. The input layer receives the input data, and its size matches the size of the input data. For the first input (the upper part of the blue box), each user's access flow is essentially 1D data, and we use two DNN layers to extract the global features of the flow. Our motivation is to learn frequent co-occurrences of features by memorizing the one-dimensional data. Each neuron in a fully connected layer computes
$$y = f\Big(\sum_{i=1}^{n} w_i x_i + \theta\Big),$$
where x_i are its inputs, w_i the corresponding weights, θ the bias, and f the activation function. After data preprocessing, the input shape is (h × w, 1). Fully connected layer 1 has a neurons, giving an output of shape (h × w, a); fully connected layer 2 has b neurons, giving an output of shape (h × w, b). This two-dimensional output is flattened into a one-dimensional feature vector of shape (h × w × b, 1). The activation function used in this process is ReLU, and the resulting output feature is O_wide. For the second input (the lower part of the blue box), we reshape the one-dimensional data of the first input into a two-dimensional matrix, because we believe deeper features can be better learned from a two-dimensional input. The CNN uses sliding convolution kernels to extract local features of the flow data; this part of the network includes a convolution layer, a pooling layer, and a flatten layer.
One limitation of conventional fully connected neural networks is poor scalability, because every neuron is connected to all neurons of the previous layer; CNN overcomes this shortcoming by connecting each neuron only to its neighbors [33]. Let x^{l-1} denote the input of the l-th layer, x^l its output, and k^l its convolution kernel. The convolution operation is
$$x^{l} = f\big(x^{l-1} \otimes k^{l} + b^{l}\big),$$
where f(·) is a nonlinear activation function, ⊗ denotes convolution, and b^l is a bias term. The pooling layer is usually placed after the convolutional layer. By merging values over local regions of the feature map, it gives the features a degree of spatial invariance; this merge operation also reduces feature size and helps prevent overfitting. The pooled output x^{l+1} is obtained as
$$x^{l+1} = f\big(\beta \cdot \mathrm{down}(x^{l}) + b\big),$$
where down(·) is the pooling function, β is a multiplicative bias, and b is an additive bias. The reshaped input has shape (h, w). We use k convolution kernels of the same size c to extract convolution features: after convolution the data shape is (h − c + 1, k); after pooling it is ((h − c + 1)/2, k); and after the flatten layer it is ((h − c + 1)/2 · k, 1), giving the output feature O_CNN. The two extracted features are then fused by concatenation to obtain the feature O_i(k):
$$O_i(k) = \big[\,O_{wide}\,;\,O_{CNN}\,\big].$$
Figure 6: The structure of the proposed HYBRID-CNN algorithm. It includes feature extraction, feature merge, and classification: feature extraction aims to extract flow features more comprehensively, the self-attention mechanism aims to fuse key features, and the classification stage aims to classify accurately.
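The shape bookkeeping of the two branches can be checked with an untrained NumPy forward pass. This is a shape sanity check with random weights, not the authors' implementation. The sizes h = 6, w = 7, a = 128, b = 64 and 32 kernels come from the configuration reported for Figure 7. The kernel size 3 and pool size 2 are inferred from the reported transitions (6, 7) → (4, 32) → (2, 32). The convolution is assumed to slide along the rows with each kernel spanning the full width, as a 1D convolution over a (6, 7) input would.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, a, b = 6, 7, 128, 64  # sizes from the Figure 7 configuration
n_kernels, c = 32, 3        # inferred: 32 kernels of size 3, pool size 2


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def dnn_branch(x: np.ndarray) -> np.ndarray:
    """(h*w, 1) -> Dense(a) -> Dense(b) -> flatten to h*w*b values (O_wide)."""
    W1, b1 = rng.normal(size=(1, a)), np.zeros(a)
    W2, b2 = rng.normal(size=(a, b)), np.zeros(b)
    hidden = relu(x @ W1 + b1)     # (h*w, a): same dense map applied per position
    out = relu(hidden @ W2 + b2)   # (h*w, b)
    return out.reshape(-1)


def cnn_branch(m: np.ndarray) -> np.ndarray:
    """(h, w) -> 1D convolution along rows -> max-pool(2) -> flatten (O_CNN)."""
    kernels = rng.normal(size=(c, m.shape[1], n_kernels))  # (c, w, k)
    conv = np.stack([np.einsum("cw,cwk->k", m[i:i + c], kernels)
                     for i in range(m.shape[0] - c + 1)])  # (h-c+1, k)
    conv = relu(conv)
    usable = conv.shape[0] // 2 * 2                        # drop an odd last row
    pooled = conv[:usable].reshape(-1, 2, n_kernels).max(axis=1)  # ((h-c+1)/2, k)
    return pooled.reshape(-1)


x = rng.random(h * w)
o_wide = dnn_branch(x.reshape(h * w, 1))
o_cnn = cnn_branch(x.reshape(h, w))
o_merged = np.concatenate([o_wide, o_cnn])  # O_i(k): concatenation of both branches
print(o_wide.shape, o_cnn.shape, o_merged.shape)  # (2688,) (64,) (2752,)
```

The printed sizes reproduce the reported Flatten_1 size (2688) and Merge size (2752), which is consistent with the fusion step being a plain concatenation of the 2688 DNN features and the 2 × 32 = 64 CNN features.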

Feature Merge.
In the feature fusion part, we use a self-attention mechanism to fuse key features. The essence of the self-attention mechanism is to focus on specific parts of the input as needed [34]. For self-attention, we obtain three matrices, Q (Query), K (Key), and V (Value), from the input O_i(k).
The self-attention mechanism obtains different representations, computes the scaled dot-product attention of each representation, and finally concatenates the results. Specifically, the current representation is fed into the self-attention layer and a new representation is computed. We first compute the dot product between Q and K; to prevent the result from becoming too large, we divide it by the scale √d_k, where d_k is the dimension of the query and key vectors. The result is then normalized into a probability distribution with a SoftMax operation and multiplied by the matrix V to obtain a weighted-sum representation. This operation can be expressed as
$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V.$$
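A minimal single-head sketch of the attention formula. How O_i(k) is arranged into a sequence of n vectors, and the learned projections W_Q, W_K, W_V that produce Q, K, and V, are assumptions following the standard Transformer formulation; the sizes below are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x: np.ndarray, Wq: np.ndarray, Wk: np.ndarray,
                   Wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over the rows of x (n, d_model)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (n, n); each row sums to 1
    return weights @ V


rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8  # hypothetical sizes
x = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output row is a convex combination of the rows of V; dividing by √d_k keeps the dot products from growing with the vector dimension, which would otherwise push SoftMax toward a one-hot distribution.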

Classification.
After feature fusion, we use a fully connected layer for detection and classification, in which every neuron of the previous layer is connected to each neuron of the current layer. The fully connected layer is located before the output layer. After the extracted features are converted into a one-dimensional feature vector x, they are connected to each neuron of the current layer to map the high-level features in a targeted manner:
$$y = f(Wx + \theta),$$
where W is the weight matrix, θ the bias vector, and f the activation function. The fully connected layer maps the high-level features according to the specific task of the output layer, and a SoftMax (or Sigmoid) activation is applied after the mapping to obtain the final detection result (normal, abnormal, or attack type). The output layer is a SoftMax function [35]: it normalizes K real numbers into a probability distribution over K outcomes. After applying SoftMax, each component lies in the interval (0, 1) and the components sum to 1, so the unnormalized output of the network can be interpreted as a probability distribution over the predicted classes. Given z = (z_1, …, z_K) ∈ ℝ^K, the standard SoftMax function σ: ℝ^K → ℝ^K is defined by
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \ldots, K.$$
Hence, the predicted class is
$$\hat{y} = \arg\max_{i \in \{1, \ldots, K\}} \sigma(z)_i.$$
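The SoftMax and argmax steps can be sketched in plain Python. The logits and the class order are hypothetical; shifting by max(z) before exponentiating is a standard stability trick that leaves σ(z) unchanged.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """sigma(z)_i = exp(z_i) / sum_j exp(z_j), computed after shifting by max(z)."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(z: Sequence[float], classes: Sequence[str]) -> str:
    """Predicted class = argmax_i sigma(z)_i."""
    probs = softmax(z)
    return classes[max(range(len(probs)), key=probs.__getitem__)]


logits = [2.0, 0.5, -1.0]             # hypothetical network outputs
classes = ["Normal", "DoS", "Probe"]  # hypothetical label order
print([round(p, 3) for p in softmax(logits)])  # [0.786, 0.175, 0.039]
print(predict(logits, classes))                # Normal
```

Because SoftMax is monotonic, the argmax of σ(z) equals the argmax of z itself; the probabilities matter for the loss and for threshold-based curves such as ROC, not for the hard decision.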

Experimental Evaluation
To evaluate the proposed abnormal flow detection scheme, we conduct the simulation on a 64-bit computer with an Intel processor. The dataset used is UNSW_NB15 [36], a mixture of real normal activity flows and attack flows created by the Australian Centre for Cyber Security (ACCS) in its network laboratory using the IXIA PerfectStorm tool. Table 1 lists the features and categories. These features are categorized into five groups: (i) basic features, which represent the attributes of protocol connections; (ii) flow features, which include the identifier attributes between hosts (e.g., server-to-client or client-to-server); (iii) content features, which encapsulate TCP/IP attributes as well as some attributes of HTTP services; (iv) time features, which contain time attributes such as the arrival time between packets, the start/end packet time, and the round-trip time of the TCP protocol; and (v) additional generated features, which can be further divided into two groups: general-purpose features, each of which serves its own purpose in protecting protocol services, and connection features, which are built from the flow of 100 record connections based on the sequential order of the last time feature. To label this dataset, two attributes are provided: attack_cat, which gives the nine attack categories and the normal class, and label, which is 0 for normal records and 1 otherwise.

Performance Metrics.
The performance metrics for abnormal flow detection are derived from the confusion matrix of the classification problem [37]. Its size depends on the number of classes in the dataset, and its main purpose is to compare the actual labels with the predicted labels. The intrusion detection problem can be described by a 2 × 2 confusion matrix containing the normal and attack categories. A detailed description of the confusion matrix is given in Table 2.
TP and TN denote correct classifications, while FP and FN denote misclassifications: TP and TN refer to correctly classified attack and normal flows, respectively, while FP and FN refer to misclassified normal and attack records, respectively. These four quantities are used to compute the following performance metrics. Accuracy (Acc) measures the overall success rate of the model in classifying normal and abnormal flows:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}.$$
The Detection Rate (DR), also known as the True Positive Rate (TPR), is the ratio of correctly classified malicious flow instances to the total number of malicious flow instances:
$$\mathrm{DR} = \frac{TP}{TP + FN}.$$
The False Positive Rate (FPR) is the proportion of normal instances misclassified as attack flow among all normal instances:
$$\mathrm{FPR} = \frac{FP}{FP + TN}.$$
Precision (Pre) is the proportion of records classified as attacks that are actually attacks:
$$\mathrm{Pre} = \frac{TP}{TP + FP}.$$
The F1 score combines precision and recall (DR) into a single evaluation index:
$$F1 = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{DR}}{\mathrm{Pre} + \mathrm{DR}}.$$
The configuration of the model structure parameters in this paper is shown in Figure 7, where each column is a model. In the DNN part of our proposed hybrid CNN model, the input data shape is (42, 1); it becomes (42, 128) after Dense_1, (42, 64) after Dense_2, and (2688) after Flatten_1. The input data of the CNN part has shape (6, 7); it becomes (4, 32) after the Conv1D_1 layer and (2, 32) after Pooling_1 (64 values once flattened). In the Merge layer, the two channels are merged into one, giving shape (2752), which then passes through the Dense_3 layer. In this way, each model forms the same shapes through these layers in turn.
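The five metrics defined above follow directly from the four confusion-matrix counts, with attacks as the positive class. A minimal sketch; the counts are hypothetical, not results from this paper, and the function assumes every denominator is nonzero.

```python
from typing import Dict


def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Acc, DR (TPR), FPR, Precision and F1 with attacks as the positive class.

    Assumes tp + fn, fp + tn and tp + fp are all nonzero.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)
    dr = tp / (tp + fn)   # detection rate, i.e. recall
    fpr = fp / (fp + tn)
    pre = tp / (tp + fp)
    f1 = 2 * pre * dr / (pre + dr)
    return {"Acc": acc, "DR": dr, "FPR": fpr, "Pre": pre, "F1": f1}


# Hypothetical counts for illustration only.
m = detection_metrics(tp=90, tn=80, fp=20, fn=10)
print({k: round(v, 4) for k, v in m.items()})
# {'Acc': 0.85, 'DR': 0.9, 'FPR': 0.2, 'Pre': 0.8182, 'F1': 0.8571}
```

DR and FPR use different denominators (all attacks versus all normal records), which is why a model can report a high DR and a non-trivial FPR at the same time.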
As shown in Table 3, we initialize the weight parameters randomly, set the batch size to 512, and compile the model with the Adam optimizer and the binary cross-entropy loss function. To evaluate the performance of the model, we use accuracy as the metric during training and validation.

Method Comparison.
To evaluate the performance of our proposed HYBRID-CNN model, we performed experiments on the UNSW_NB15 dataset. The comparison methods are as follows: (i) Naive Bayes [17]: a supervised learning classifier based on Bayes' theorem. It classifies a sample by combining previously computed likelihoods and prior probabilities through Bayes' rule to obtain the posterior probability of each class. (ii) SVM [19]: a discriminative classifier formally defined by a separating hyperplane. Kernel-based SVMs classify data effectively on most datasets; the discriminant function used here is a linear SVM. (iii) LSTM [22]
Table 4 compares the performance of our proposed HYBRID-CNN with that of other existing methods. We select subsets for the experiments according to the training dataset ratio, defined as the proportion of samples used for training. We use ratios of 60%, 70%, and 80%. For each ratio, we evaluate five methods, including our proposed method, on three performance metrics: accuracy (Acc), detection rate (DR), and false positive rate (FPR). The results in Table 4 compare HYBRID-CNN with both traditional machine learning methods and deep learning methods. HYBRID-CNN reaches an Acc of 0.9564, a DR of 0.9856, and an FPR of 0.0442, so it detects abnormal flow more accurately than the other methods. We attribute this to the combined DNN and CNN input channels, which give the model better feature learning capabilities. Figure 9 compares the training and validation accuracy and loss of HYBRID-CNN with those of the other two methods. All models were trained for 100 epochs, and the performance indicators were evaluated after each epoch. The comparison shows that HYBRID-CNN has a much faster loss convergence than the other methods during both training and validation. Its accuracy also reaches its best value sooner.
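The three metrics in Table 4 can be made concrete with a small sketch. It assumes the usual confusion-matrix definitions, with abnormal flow as the positive class: Acc = (TP + TN)/(TP + TN + FP + FN), DR = TP/(TP + FN), and FPR = FP/(FP + TN). The function name is hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (Acc, DR, FPR) for binary labels where 1 = abnormal flow, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # abnormal flagged as abnormal
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed as normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    acc = (tp + tn) / len(pairs)
    dr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return acc, dr, fpr

print(detection_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 1, 0, 0, 1]))
# (0.75, 0.75, 0.25)
```

DR is the same quantity as recall (the true positive rate). DR and FPR are normalized by different class totals, so a high DR can coexist with a nonzero FPR.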

ROC Curves Comparison.
We further plot the Receiver Operating Characteristic (ROC) curves of our proposed HYBRID-CNN and the state-of-the-art methods on UNSW_NB15, as shown in Figure 10. The ROC curve of HYBRID-CNN is the closest to the upper left corner, which indicates better generalization ability than the other methods. All the results reported above demonstrate that HYBRID-CNN outperforms its competitors. We conclude that HYBRID-CNN handles the abnormal flow detection problem effectively because it can compress the original data into more discriminative abstract features, making it capable of efficient abnormal flow detection.
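An ROC curve is obtained by sweeping the decision threshold over the classifier's scores and recording (FPR, TPR) at each step. A curve that hugs the upper left corner has a large area under it (AUC). The minimal sketch below assumes both classes are present and uses the trapezoidal rule for the area. It is illustrative only; the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points from sweeping the threshold over every distinct score.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(pts))  # 0.75
```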

Computation Comparison.
To deepen this investigation, Table 5 reports the number of trainable parameters (in millions) and the running time of the proposed HYBRID-CNN and the state-of-the-art methods. We use a GPU to accelerate the training of all models. When trained on the UNSW_NB15 dataset, the proposed HYBRID-CNN has fewer trainable parameters and lower training and testing times than the other methods. This has two causes. First, the proposed method uses a CNN, which enables efficient parallel computation. Second, we kept the number of parameters in the structure as small as possible.
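Parameter counts like those in Table 5 follow from simple per-layer formulas. A convolution layer has one kh × kw × in_channels kernel plus a bias per filter. A fully connected layer has (inputs + 1) × outputs weights. The sketch below assumes standard layers with biases. The layer sizes in the usage lines are hypothetical and do not describe the architecture behind Table 5.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Trainable weights of a 2-D convolution layer: one kh x kw x in_channels kernel (+ bias) per filter."""
    return (kernel_h * kernel_w * in_channels + int(bias)) * filters

def dense_params(in_units, out_units, bias=True):
    """Trainable weights of a fully connected layer."""
    return (in_units + int(bias)) * out_units

# Hypothetical layers, for illustration only (not the paper's architecture).
first_conv = conv_params(1, 3, in_channels=1, filters=4)    # c = 1 x 3, beta = 4 -> 16
second_conv = conv_params(1, 3, in_channels=4, filters=4)   # -> 52
hidden = dense_params(128, 64)                              # -> 8256
print(first_conv, second_conv, hidden)                      # 16 52 8256
```

In this toy example the convolution layers are tiny compared with the dense layer. Small values of β and c keep the convolutional part of a model lightweight.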

Parameter Study.
The model has various configurable hyperparameters, such as the batch size α, the number of convolution kernels β, the convolution kernel size c, and the optimizer ε.
These hyperparameters must be configured manually rather than optimized automatically during training, and they strongly affect the performance of the model. The batch size α is the number of training samples processed in one forward-propagation and back-propagation pass, that is, how many samples are used to evaluate the loss in each optimization step. β is the number of distinct convolution kernels used in a convolution operation. Each kernel produces one feature map, so β kernels generate β feature maps. c is the size of the convolution kernels. Each convolution kernel has three dimensions: length, width, and depth. In a convolution layer of a CNN, the length and width must be configured manually. The optimizer ε is the algorithm used to minimize the loss and update the weight parameters. We therefore analyzed in depth the influence of these hyperparameters on the performance of our proposed hybrid CNN model. In Figure 7, the parameters of our proposed hybrid CNN model are α = 512, β = 4, c = 1 × 3, and ε = Adam. The training results for these parameters are as follows. In Figure 11, we set α to 128, 256, and 512. With α = 128, the training and validation losses converge faster within the same number of epochs until the set number of iterations is reached, and the best accuracy is 0.9477. This indicates that a smaller batch size speeds up optimization per epoch, but it requires more computation time. Increasing the batch size appropriately can improve the running speed and stabilize the gradient descent direction. As accuracy increases, the amplitude of the training oscillation decreases. In Figure 12, we set the number of convolution kernels β to 1, 2, and 4. With one convolution kernel, the model reaches an accuracy of 0.9403.
With two kernels, the loss converges faster. With four kernels, the loss convergence is significantly accelerated. In general, deeper networks require more convolution kernels to fully extract the key features. In Figure 13, we set the convolution kernel size c to 1 × 2, 1 × 3, and 1 × 4. With a 1 × 2 kernel, the training loss and accuracy jitter sharply, which hinders convergence. As the kernel size increases, the loss converges slightly faster and the fluctuation range narrows. A 1 × 3 or 1 × 4 kernel is therefore preferable. In Figure 14, we compare the commonly used optimizers SGD, RMSprop, Adam, and Adagrad. SGD gives unsatisfactory results: the model reaches an accuracy of only 0.9259, with a large oscillation at around epoch 40. With Adam, the initial loss convergence is similar to that of the other optimizers. In the middle stage, however, the Adam loss converges significantly faster, and Adam finally achieves the best accuracy of 0.9483.
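The roles of β and c can be made concrete with a 1-D "valid" convolution. In the sketch below, β kernels of width c produce β feature maps, each shorter than the input by c - 1 positions. It is a simplified illustration that assumes stride 1, no padding, a single input channel, and no activation. It corresponds to a 1 × c kernel sliding along one row, not to the exact HYBRID-CNN layer, and the function name is hypothetical.

```python
def conv1d_valid(signal, kernels, biases=None):
    """Slide each kernel over a 1-D input (stride 1, no padding).

    As in CNN layers, this is cross-correlation (the kernel is not flipped).
    Returns one feature map per kernel, each of length len(signal) - c + 1.
    """
    c = len(kernels[0])
    out_len = len(signal) - c + 1
    if biases is None:
        biases = [0.0] * len(kernels)
    maps = []
    for kernel, b in zip(kernels, biases):
        maps.append([
            sum(kernel[j] * signal[i + j] for j in range(c)) + b
            for i in range(out_len)
        ])
    return maps

# beta = 2 kernels of size c = 3 applied to a 5-value feature row.
maps = conv1d_valid([1, 2, 3, 4, 5], [[1, 0, -1], [1, 1, 1]])
print(maps)  # [[-2.0, -2.0, -2.0], [6.0, 9.0, 12.0]]
```

Doubling β doubles the number of output maps (and the kernel weights). Increasing c widens each kernel's receptive field but shortens each map.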

Ablation Study.
For a thorough analysis, we conduct an ablation study on HYBRID-CNN to assess the effectiveness of each module. The ablation variants evaluated on UNSW_NB15 are as follows: (1) w/o attention: we remove the self-attention module from HYBRID-CNN but keep the DNN module and the CNN module
Table 6 reports the results of the ablation study. Comparing HYBRID-CNN with model (1) shows that the self-attention module helps detect abnormal flow, because attention captures key features more comprehensively. Comparing HYBRID-CNN with model (2) demonstrates the effectiveness of the DNN. Removing the DNN module lowers accuracy because the model can no longer extract high-dimensional global features. Removing the CNN module reduces accuracy sharply because the model can no longer extract the local features of the flow. The CNN therefore has a large impact on the results.
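The exact form of the self-attention module is not given in this section. The sketch below therefore shows generic scaled dot-product self-attention, which is an assumption. Each output vector is a softmax-weighted mix of all input vectors, so strongly related features reinforce each other. For brevity, learned query/key/value projections are omitted (queries = keys = values = input), and the function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with queries = keys = values = x.

    x is a list of n feature vectors of dimension d. Learned Q/K/V
    projections are omitted (an assumption of this sketch).
    Returns the attended vectors and the n x n attention weights.
    """
    d = len(x[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x] for q in x]
    weights = [softmax(row) for row in scores]
    out = [[sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)] for w_row in weights]
    return out, weights

out, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([round(sum(row), 6) for row in weights])  # [1.0, 1.0, 1.0]
```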

Attack Detection.
To detect the attack type of abnormal flow, we evaluated the model on the KDDCup 99 dataset [38]. The dataset contains approximately 5 million flow records, each with 41 features. Features 1-9 are basic attributes of the packet, features 10-22 describe the packet content, features 23-31 are flow features, and features 32-41 are host-based features. As shown in Table 7, the attack flow instances can be further divided into four categories: DoS, U2R, R2L, and Probe. Each KDDCup 99 flow sample has 41 features and a label. Because 41 is prime, a one-dimensional vector of 41 features cannot be reshaped directly into a two-dimensional matrix. We therefore append one dummy feature with value zero, giving 42 features. This dummy feature does not affect the result and serves only for data reshaping.
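The padding step can be sketched as follows. The target shape of 6 × 7 is an assumption for illustration, since the matrix size is not stated here; 42 also factors as 7 × 6, 3 × 14, or 2 × 21. The helper name is hypothetical.

```python
def pad_and_reshape(features, rows, cols, pad_value=0.0):
    """Append dummy features (zeros by default) up to rows*cols values, then reshape row-major."""
    needed = rows * cols
    if len(features) > needed:
        raise ValueError(f"{len(features)} features do not fit in a {rows}x{cols} matrix")
    padded = list(features) + [pad_value] * (needed - len(features))
    return [padded[r * cols:(r + 1) * cols] for r in range(rows)]

# A KDDCup 99 record has 41 features; one dummy zero gives 42 = 6 x 7 (assumed shape).
record = [float(i) for i in range(1, 42)]
matrix = pad_and_reshape(record, 6, 7)
print(matrix[-1])  # [36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 0.0]
```

The dummy value is the same constant in every sample. It therefore carries no information that could separate classes.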
We compared our approach with current state-of-the-art models. Figure 15 illustrates the relative performance of our proposed abnormal flow detection algorithm and these models. The results show that the proposed model outperforms the existing schemes on the KDDCup 99 dataset in terms of accuracy, detection rate, and F1 score.

Discussion
Evaluation on the UNSW_NB15 dataset shows that our model achieves 95.64% accuracy, which is higher than that of the other deep learning methods compared. However, the results for the "R2L" and "U2R" attack classes are lower than those for the other classes, because the model needs more data to learn them. Unfortunately, the training data for these attacks is severely imbalanced, so the results obtained are not stable. Hybrid detection methods, which are mainly combined with deep learning models, can usually achieve higher detection accuracy. Despite the complexity of deep learning algorithms, our algorithm keeps the running time low. Admittedly, our proposed model spends more time on training, but GPU acceleration can reduce the training time.

Conclusion
In this paper, we consider the problem of abnormal network flow detection in a Smart Grid integrated with SDN. To achieve accurate detection while guaranteeing network performance, we formulate a deep learning detection algorithm based on HYBRID-CNN. In particular, our HYBRID-CNN model consists of dual-channel feature extraction, key feature fusion, and classification. It gains the benefits of global memorization from the DNN and of local generalization from the CNN. In addition, to measure the performance of the proposed algorithm, we analyze the hyperparameters of HYBRID-CNN. The experimental results show that, compared with other existing detection algorithms, HYBRID-CNN achieves higher detection accuracy and a lower false alarm rate. In future work, one problem to be solved is improving the performance of the model through network structure optimization and automatic hyperparameter tuning. Swarm intelligence optimization algorithms, such as the Particle Swarm Optimization (PSO) and Artificial Bee Colony (ABC) algorithms, can tune hyperparameters automatically and are an efficient way to improve detection accuracy. Another problem to be solved is the unbalanced dataset, because the detection accuracy for some minority attack types needs to be improved. We hope to use data augmentation in future work to reduce the impact of this imbalance.
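PSO, one of the tuning methods mentioned above, can be sketched briefly. Below is a minimal global-best PSO over one continuous variable. The objective is a hypothetical stand-in for validation loss. In practice each evaluation would train and validate the model, and discrete hyperparameters such as the batch size would need rounding. The inertia and acceleration coefficients are common textbook defaults, not values from this paper, and the function name is hypothetical.

```python
import random

def pso_minimize(objective, lower, upper, n_particles=20, iters=100,
                 w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimize a 1-D objective on [lower, upper] with basic global-best PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]       # swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia + pull toward personal best + pull toward global best.
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            pos[i] = min(max(pos[i] + vel[i], lower), upper)  # stay inside the search range
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val

# Hypothetical surrogate: pretend validation loss is minimized at x = 3.
best_x, best_val = pso_minimize(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
print(round(best_x, 2))
```

Each particle needs one full training run per objective evaluation. The swarm size and iteration count therefore set the tuning budget directly.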

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this article.