A Mobile Malware Detection Method Based on Malicious Subgraphs Mining



Introduction
Mobile devices have become essential social network communication tools that store huge amounts of private user data. Therefore, they attract persistent malicious attacks. Owing to its open source policy, Android has become the most popular operating system for mobile devices and has the largest market share. With the widespread use of Android applications, Google is committed to maintaining the safety of its official application market, Google Play Store [1]. To address security problems, Google has used various strategies against malicious attacks, such as regularly scanning billions of installed mobile applications, providing remote security services for mobile devices, and isolating malicious websites to protect users. However, Google's large investment blocks only some of the malware that threatens Google Play Store, and many third-party application markets still face an increasing number of malware samples. A report [2] showed that 97% of all mobile malware was related to the Android platform in 2013, up from 79% in the previous year and 66% in 2011. In 2015, the total number of Android malware samples rose to 884,774 [3]. In 2019, Android malware variants grew 31% in a year, and the total number approached 20 million [4].
A commonly used detection strategy adopted by commercial antivirus tools (such as Norton and Lookout) is to collect as many malware samples as possible and extract signature code as features [5,6]. Then, these features are matched against the signature code extracted from target applications to identify malware. Although this strategy can achieve high detection accuracy and a low false positive rate (FPR), it still faces two challenges: (1) it lags behind malicious attacks and cannot detect unknown malware, and (2) minor changes to applications may cause detection to fail.
To solve these problems, researchers began to use more effective expert features to recognize malicious code, such as permissions, component information, and APIs [7][8][9]. Many of them also apply machine learning to further improve detection performance. Although these approaches have proved effective, detection methods based on such syntactic features are easily affected by code obfuscation and injection technologies.
Fortunately, many research works [10][11][12] have shown that high-level properties of code, especially structural features, can improve resilience to repackaging and code obfuscation technologies. The most commonly used method is to compare the application's structural features with existing malicious ones.
Thus, a well-known strategy is to transform the graph matching problem into a graph isomorphism problem. However, isomorphism-based matching is computationally hard (subgraph isomorphism is NP-complete) and may be very inefficient when the graphs are large. How to find malicious subgraphs efficiently is still an open problem.
In this paper, we propose a new malware detection method based on malicious subgraph searching. The subgraphs are generated from the function call graphs (FCGs) of Android applications. During the training phase, FCGs are generated from the malware of each malware family to obtain malicious structural features, which can improve detection efficiency and help to analyze the homology and evolution of variant viruses. In the test phase, the structural features are used to automatically detect malware and classify it into families. In the evaluation phase, several experiments are designed to evaluate the detection performance and execution efficiency of our method.
According to the above descriptions, the main contributions of our method can be summarized as follows:
(1) A new efficient Android malware detection method is proposed. In this method, the malicious features extracted from applications are structural, which improves resilience to code obfuscation and repackaging technologies in static analysis. At the same time, the method can find groups of similar variant samples, which is useful for malware detection and new variant analysis.
(2) A fast common subgraph searching and matching algorithm based on node similarity calculation is designed. Several well-known algorithms, such as VF2 [13] and graph edit distance algorithms [14], match subgraphs by graph isomorphism calculation. However, it is a big challenge for them to match a large number of complex, large-scale subgraphs. In this situation, our method performs much better than VF2 and similar algorithms.
(3) Our method is evaluated on several datasets comprising 7520 malware and 4000 benign Android applications. Evaluation results indicate that our method has better detection ability than three previous works and Androguard.
The rest of this paper is organized as follows. Section 2 reviews related research. Section 3 introduces the overall architecture of our method. Section 4 describes its execution process in detail. Section 5 presents experiments that test the detection efficiency of our method. Finally, Section 6 discusses the conclusions and future work.

Related Works
Structural analysis has been widely used in Android malware detection methods. It can be divided into the following two main categories.

Static Structural Feature-Based Detection Methods.
In the analysis of structural features, many research studies have addressed the computational cost of isomorphism-based matching. At the same time, new methods were designed to find malicious structural features effectively.
Crussell et al. [15] proposed a method called "DNADroid" to detect cloned mobile applications. This method first constructed function call graphs of applications. Then, the similarity values of the function call graphs were calculated based on the VF2 algorithm. Finally, these similarity values were used to evaluate the similarity of applications. In their experiments, DNADroid found that at least 114 applications in their dataset had been cloned.
Xu et al. [16] proposed a malicious code detection method based on function call graphs. This method first extracted function call graphs from mobile applications. Second, graph edit distances were calculated based on the applications' methods and the methods' opcodes. Finally, the similarity of applications was measured by the graph edit distance score. Experimental results showed that this method can identify malware variants.
Zhang et al. [11] proposed a semantics-based malicious code detection method called "DroidSIFT." The method extracts function dependency graphs from mobile applications with a graph generation tool implemented on top of Soot [17]. Then, feature vector spaces were constructed according to the similarity of the function dependency graphs. Finally, these feature vectors were used to build two different classifiers to identify malicious code. In their experiment, DroidSIFT was tested on 13500 benign samples and 2200 malware samples and achieved 93% detection accuracy with a 5.15% false positive rate (FPR).
Suarez-Tangil et al. [18] proposed a malware detection method, Dendroid, which automatically extracts code chunks (CCs) from the control flow graphs of Android applications. The CCs are then analyzed with a text mining model to classify malware into families.
Hu et al. [19] proposed a static malware detection method called "MIGDroid." The method extracted invocation graphs from Android applications and then divided them into subgraphs. By calculating the threat scores of the subgraphs, malicious code in applications can be found.
Niu et al. [20] built opcode-level FCGs of Android applications and used a long short-term memory model to analyze malicious behaviors; the detection accuracy of the method was 97% on their dataset. Gao et al. [21] used the invocator-invocatee relationships in an Android application's FCG to generate topological signatures. A malware detection method based on these topological signatures was designed and evaluated on 1249 malware and 49000 benign applications.
Sun et al. [22] proposed a graph-based Android malware detection method called "DroidSim." DroidSim constructs component-based control flow graphs (CB-CFGs), with APIs as nodes and the control flow precedence order of Android components as edges. The similarity scores of CB-CFGs are then used for malware detection. In their experiment, DroidSim achieved 96.6% detection accuracy on 121 benign Android applications and 706 malware samples.
Atici et al. [23] extracted static features from control flow graphs and used machine learning classification to identify malware.

Dynamic Structural Feature-Based Detection Methods.
John et al. [24] extracted the system call graphs of Android applications and used graph convolutional networks to detect malware. The detection accuracy of their method was 92.3% on an experimental dataset of 2130 samples.
Zhang et al. [25] extracted object reference graph information from the execution of Android applications to build an object reference information model. Then, a two-step malware detection method was designed based on an improved graph isomorphism algorithm.
Abdurrahman and Acarman [26] constructed API call graphs of Android applications and then transformed them into low-dimensional feature vectors. Finally, a deep neural network-based method was designed to detect malware.
Yerima et al. [27] researched the generation process of stateful events and designed a new method to improve the code coverage in dynamic analysis for malware detection.
Lin et al. [28] extracted features from behavior dependency graphs for machine learning classification. Then, a prototype system was implemented to identify malware.
Xu et al. [29] proposed a new efficient representation of system call graphs. Then, the feature vector labels of these representations were used and optimized to improve the classification ability of the SVM algorithm.
Hou et al. [30] extracted dynamic behavior features from weighted directed graphs. Then, a deep learning model was applied to identify malware based on these features.
Although both static and dynamic methods can analyze structural features, our method is based on static detection for two reasons: (1) static detection can obtain a complete function call graph without missing malicious subgraphs, and (2) static detection does not need a virtual execution environment, which means that it can detect a large number of viruses faster than dynamic detection. Unlike the above research studies, our method does not analyze and match large-scale graph structures directly. Instead, it iteratively constructs malicious structural features starting from a single node, which allows it to quickly locate malicious code inserted by code injection technology and to identify variant viruses effectively.

Architecture Overview
The architecture of our malware detection method is depicted in Figure 1. The main steps are as follows:
(1) Decompile Process. Our method uses the "AndroidManifest.xml" and ".dex" files of Android applications. Thus, the Android application package (APK) files are decompiled by Androguard [31] to obtain these files.
(2) Static Information Collection. Several Python scripts are implemented on top of the Androguard tool. They collect static information from the disassembled code files of step (1), including the applications' methods, method call sequences, permissions, and APIs. Then, the FCG is constructed from the method information. All of this static information is written to text files in a uniform format.
(3) Structural Features Generation. By analyzing the permissions of malware and benign applications, we choose the permissions that are used more frequently in malware as sensitive permissions. Then, two kinds of malicious structural features are constructed. The first is the sensitive permission-based subgraph, which takes the APIs related to sensitive permissions as initial nodes; the method call sequences that contain these initial nodes are used to construct structural features. The second is the common subgraph of each malware family, which is found by a node similarity-based subgraph searching method. This method first searches for the most similar node of the graphs and then searches for its similar adjacent nodes based on the improved Kuhn-Munkres (KM) algorithm [32]. This process is executed iteratively until the termination condition (introduced in Subsection 4.3) is met. To improve execution efficiency, the maximum length of a path containing the initial node is limited to 3. Once all structural features are generated, they are combined into a malicious feature set.
(4) Malware Detection. The application under test is decompiled and its static information is generated as in steps (1) and (2). The node similarity-based matching process is then applied between the application's FCG and the feature set to identify malware.

Malicious Subgraphs Generation
This section introduces the details of the FCG and malicious subgraph feature generation processes, as well as the node similarity-based matching method and the improved KM algorithm.

Security and Communication Networks 3
FCG Generation.
An Android application implements its operations through methods and method call sequences. Thus, the function call graph contains rich information about an application's behavior. As mentioned in the previous section, our method constructs the function call graph with a Python script implemented on top of Androguard. The script must exhaustively search all methods, including those that may be missed because they are only reached through indirect calls. The first step is to obtain the application's package, class, and method information from the objects of the decompiled files. In this process, Androguard automatically assigns a numeric label to each method. These numeric labels are fixed, which means they do not change across repeated decompilations. The second step is to search all methods and store them as graph nodes in a node set. The third step is to apply a depth-first search to find all related nodes and call relations of each node. To do this, two more node sets are established to store a node's parent nodes and child nodes; the method then continuously searches the parent nodes of every node in the parent node set, as well as the child nodes of every node in the child node set. When all parent and child nodes have been searched, all call sequences of the node are collected. Finally, all call sequences are joined on the numeric labels of their nodes to construct the complete function call graph.
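A minimal Python sketch of the depth-first collection of parent and child nodes described above follows. It assumes the decompiled call relations have already been exported as `(caller, callee)` pairs of numeric method labels; the input format and the function names `build_fcg` and `reachable` are hypothetical and come neither from the paper's scripts nor from the Androguard API.

```python
from collections import defaultdict


def build_fcg(methods, calls):
    """Build an FCG from numeric method labels.

    methods: iterable of method labels (hypothetical input format).
    calls:   iterable of (caller, callee) label pairs.
    Returns (nodes, children, parents); children/parents map a label to the
    set of labels it calls or is called by.
    """
    nodes = set(methods)
    children = defaultdict(set)
    parents = defaultdict(set)
    for caller, callee in calls:
        nodes.update((caller, callee))  # keep methods that only appear in calls
        children[caller].add(callee)
        parents[callee].add(caller)
    return nodes, children, parents


def reachable(start, adjacency):
    """Iterative depth-first search over `adjacency` (children or parents).

    Returns every node transitively reachable from `start`, excluding `start`.
    """
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for nxt in adjacency.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(start)  # a cycle can lead back to the start node
    return seen


nodes, children, parents = build_fcg([0, 1, 2, 3], [(0, 1), (1, 2), (0, 3)])
print(sorted(reachable(0, children)))  # [1, 2, 3]  (all transitive callees)
print(sorted(reachable(2, parents)))   # [0, 1]     (all transitive callers)
```

Using an explicit stack instead of recursion keeps the search safe on FCGs with very long call chains.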

Sensitive Permission-Based Subgraph Extraction.
Android has set up a permission mechanism to control the access behaviors of applications. It can limit the abuse of user privacy information and system resources by application developers. For example, if a program wants to obtain the Wireless Fidelity (WiFi) network status, it must declare the request in the "AndroidManifest.xml" file as follows:
<uses-permission android:name="android.permission.ACCESS_WIFI_STATE"></uses-permission>
Permissions can be used by both malware and benign applications. Therefore, a malicious score is assigned to each permission by comparing how frequently it occurs in malware and in benign applications. The 20 permissions with the highest malicious scores are treated as sensitive permissions in our method. At the same time, the APIs related to the sensitive permissions can be found, as shown in Figures 2 and 3.
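The paper does not give the exact malicious score formula. The sketch below assumes a simple one, the fraction of malware requesting a permission minus the fraction of benign apps requesting it, purely for illustration, and then keeps the top k permissions (k = 20 in the paper) as sensitive permissions. Permission names are shortened and the sample apps are made up.

```python
from collections import Counter


def permission_scores(malware_perms, benign_perms):
    """Malicious score per permission (hypothetical scoring rule).

    malware_perms / benign_perms: one set of requested permissions per app.
    Assumed score: share of malware requesting p minus share of benign apps requesting p.
    """
    mal = Counter(p for perms in malware_perms for p in set(perms))
    ben = Counter(p for perms in benign_perms for p in set(perms))
    n_mal, n_ben = len(malware_perms), len(benign_perms)
    return {p: mal[p] / n_mal - ben[p] / n_ben for p in mal.keys() | ben.keys()}


def sensitive_permissions(scores, k=20):
    """The k permissions with the highest score (ties broken by name)."""
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


malware = [{"SEND_SMS", "INTERNET"}, {"SEND_SMS", "READ_PHONE_STATE"}]
benign = [{"INTERNET"}, {"INTERNET", "CAMERA"}]
scores = permission_scores(malware, benign)
print(sensitive_permissions(scores, k=2))  # ['SEND_SMS', 'READ_PHONE_STATE']
```

A difference of frequencies rewards permissions that are common in malware and rare in benign apps; a ratio or an information-gain score would be equally plausible alternatives.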
In the framework provided by Android, many drivers and functions are encapsulated in the bottom layer. Developers invoke these drivers by calling APIs in their user-defined methods. Thus, the method call sequences that contain these user-defined methods can be found and used to construct sensitive permission-based subgraphs.
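Sensitive permission-based subgraph extraction can be sketched as a backward breadth-first search from each sensitive API along call edges. The sketch assumes that the path-length limit of 3 from the architecture overview counts call edges; the function name and the method names in the example are hypothetical.

```python
from collections import defaultdict, deque


def sensitive_subgraph(calls, sensitive_apis, max_len=3):
    """Call edges on caller paths of at most `max_len` edges that end at a sensitive API.

    calls: iterable of (caller, callee) pairs from the FCG.
    sensitive_apis: API nodes linked to sensitive permissions (the initial nodes).
    """
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    edges = set()
    for api in sensitive_apis:
        depth = {api: 0}            # distance (in edges) from the sensitive API
        queue = deque([api])
        while queue:                # breadth-first search backwards along calls
            node = queue.popleft()
            if depth[node] == max_len:
                continue            # do not extend beyond the length limit
            for caller in callers[node]:
                edges.add((caller, node))
                if caller not in depth:
                    depth[caller] = depth[node] + 1
                    queue.append(caller)
    return edges


calls = [("start", "onCreate"), ("onCreate", "main"), ("main", "helper"),
         ("helper", "sendTextMessage"), ("ui", "draw")]
print(sorted(sensitive_subgraph(calls, {"sendTextMessage"})))
# [('helper', 'sendTextMessage'), ('main', 'helper'), ('onCreate', 'main')]
```

The edge ("start", "onCreate") is left out because it lies four calls away from the sensitive API, and ("ui", "draw") is unrelated to it.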

Generation of Common Subgraph of Malware Families.
In our method, the common subgraph is generated by a node similarity calculation process. Since similarity can be defined in different ways, it can be computed in different ways, for example, by using graph isomorphism algorithms to evaluate similarity based on a graph's structural features, or by calculating distances between string features extracted from structural information.
In Android malware detection, a large number of malicious variants are generated by repackaging technologies. Although many of them have also been processed by obfuscation technologies, they still share partial similarities in their internal structures. In addition, code injection technologies can inject malicious code into various kinds of benign applications, which makes completely unrelated applications carry out similar attacks. This situation is particularly evident among variants of the same malware family.
To locate the malicious code of applications, we propose a node similarity calculation process.
Suppose that a graph is defined as $G = (V, E)$, where $V$ represents the node set and $E$ represents the edge set. Given two graphs $G_A$ and $G_B$ with nodes $v_a \in V_A$ and $v_b \in V_B$, let $v_{ai}$ be the $i$th adjacency node of $v_a$ and $v_{bi}$ the $i$th adjacency node of $v_b$. The similarity $Sim_{ab}$ of the two nodes is defined by equation (1), where $D^{in}_a$ and $D^{in}_b$ are the in-degrees of $v_a$ and $v_b$, $D^{out}_a$ and $D^{out}_b$ are their out-degrees, $\Delta w(v_i)$ is the node similarity value between $v_a$ and $v_b$, $S(V_A, V_B)$ is the similarity value between the adjacency nodes of $V_A$ and $V_B$, and $E_{ab}$ is the optimal value of $S(V_A, V_B)$. The calculation of $E_{ab}$ is described in detail in Subsection 4.4. The constant coefficients $\alpha, \beta \in (0, 1)$ tune the matching efficiency and are determined in the training phase.
According to the above definitions, our common subgraph searching method is designed as follows:
(1) Search the function call graphs of different malware samples in the same family, and take the most similar nodes as the initial nodes of the structural malicious feature.
(2) Continue to search for the most similar adjacency nodes of the initial nodes, and add the newly found nodes and edges to the existing subgraph. This process is executed iteratively as long as the values of $1/\Delta w(v_i)$ or $E_{ab}$ are less than the thresholds, whose range is (0.9, 1). When the iterative process stops, the structural feature of the malware family has been found.
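The two-step search above can be sketched as a seed-and-extend procedure over two call graphs. This is an illustration under stated assumptions, not the paper's implementation: `node_sim` is a hypothetical stand-in for the similarity $Sim_{ab}$, a pair keeps growing while its similarity is at least the threshold, and adjacent nodes are paired greedily by descending similarity instead of with the improved KM matching described in the next subsection.

```python
def grow_common_subgraph(succ_a, succ_b, node_sim, threshold=0.9):
    """Seed-and-extend search for a common subgraph of two call graphs.

    succ_a / succ_b: dict node -> list of called nodes (leaves map to []).
    node_sim(a, b): hypothetical node similarity in [0, 1].
    Neighbours are paired greedily here; the paper pairs them with an improved KM matching.
    Returns a dict mapping matched nodes of graph A to nodes of graph B.
    """
    # Step (1): the most similar node pair across the two graphs is the seed.
    seed = max(((a, b) for a in succ_a for b in succ_b), key=lambda p: node_sim(*p))
    if node_sim(*seed) < threshold:
        return {}
    mapping = {seed[0]: seed[1]}
    frontier = [seed]
    # Step (2): repeatedly add the most similar unmatched neighbours.
    while frontier:
        a, b = frontier.pop()
        pairs = sorted(
            ((node_sim(x, y), x, y) for x in succ_a[a] for y in succ_b[b]),
            reverse=True,
        )
        used = set(mapping.values())
        for score, x, y in pairs:
            if score < threshold:
                break  # remaining pairs are no more similar than this one
            if x not in mapping and y not in used:
                mapping[x] = y
                used.add(y)
                frontier.append((x, y))
    return mapping


graph_a = {"a0": ["a1", "a2"], "a1": ["a3"], "a2": [], "a3": []}
graph_b = {"b0": ["b1", "b2"], "b1": ["b3"], "b2": [], "b3": []}
api = {"a0": "onReceive", "a1": "sendSms", "a2": "log", "a3": "getDeviceId",
       "b0": "onReceive", "b1": "sendSms", "b2": "draw", "b3": "getDeviceId"}


def same_api(x, y):
    """Toy similarity: 1.0 when two nodes touch the same API, else 0.0."""
    return 1.0 if api[x] == api[y] else 0.0


print(grow_common_subgraph(graph_a, graph_b, same_api))
# {'a0': 'b0', 'a1': 'b1', 'a3': 'b3'}
```

The unmatched nodes ("a2" calling `log`, "b2" calling `draw`) are excluded, so the result is the shared onReceive, sendSms, getDeviceId chain.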

Optimal Matching Strategy of Nodes.
As mentioned in the previous subsection, a node may have many adjacency nodes to be matched, which makes the calculation of $E_{ab}$ a many-to-many matching problem. To find the optimal matching result, a new matching method is designed. The first step is preprocessing. During the subgraph searching process, once $1/\Delta w(v_i)$ is less than the threshold, our method continues to calculate the value of $E_{ab}$. Note that many adjacency nodes are leaf nodes. To reduce computational complexity, the similarity value $\Delta L$ of these leaf nodes is calculated by equation (2), where $x$ is the number of leaf nodes of node $v_a$ and $y$ is the number of leaf nodes of node $v_b$. If $\Delta L$ is less than the threshold, whose range is (0.9, 1), the leaf adjacency nodes are deleted and the method proceeds to the next step. Otherwise, $V_A$ and $V_B$ cannot be matched, and the subgraph searching process is terminated. The second step is to construct a bipartite graph $G' = (V_A', V_B', E')$, where $V_A'$ and $V_B'$ are the remaining adjacency nodes of $v_a$ and $v_b$, and $E'$ is the edge set between $V_A'$ and $V_B'$. As shown in Figure 4, $v_{ai}$ is the $i$th adjacency node of $v_a$, $v_{bj}$ is the $j$th adjacency node of $v_b$, and an edge $e_{ij} \in E'$ represents a possible correspondence between $v_{ai}$ and $v_{bj}$. The weight of $e_{ij}$ is $w(e_{ij})$. The total weight $W_T$ of a matching $M$ is given by equation (3):
$$W_T = \sum_{e_{ij} \in M} w(e_{ij}).$$
Thus, if there is more than one matching, the one with the maximum total weight $W_{max}$ is treated as the optimal matching:
$$W_{max} = \max_{M} W_T.$$
To obtain $W_{max}$, the third step is to assign a weight $w(e_{ij})$ to each edge of $E'$. Table 1 shows the resulting edge-weighted matrix of $E'$, where $m$ is the number of rows and $n$ is the number of columns. If $m \neq n$, our method always arranges the matrix so that $m < n$ and adds $n - m$ virtual nodes whose edge weights $w(e_{ij})$ are 0. The edge-weighted matrix generation process is described in Algorithm 1.
Based on the edge-weighted matrix, the maximum weight $W_{max}$ of $G'$ is calculated with the improved KM algorithm. The KM algorithm assigns each node a label value and transforms the search for the maximum weight into a search for a perfect matching. With label values $l(v_{ai})$ for $v_{ai}$ and $l(v_{bj})$ for $v_{bj}$, the condition $l(v_{ai}) + l(v_{bj}) \ge w(e_{ij})$ must hold throughout the execution of the KM algorithm.
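A possible form of the edge-weighted matrix generation is sketched below; it is a hedged illustration, not a reproduction of Algorithm 1. The shorter adjacency list becomes the rows (the lists are swapped when needed so that m ≤ n), and n − m zero-weight virtual rows make the matrix square for the KM step. `node_sim` is again a hypothetical similarity function.

```python
def edge_weight_matrix(adj_a, adj_b, node_sim):
    """Square edge-weighted matrix between two adjacency node lists.

    Rows come from the shorter list (the lists are swapped when needed so that
    m <= n), and n - m virtual rows of weight 0 are appended.
    Returns (matrix, swapped); swapped=True means rows are nodes of adj_b.
    """
    swapped = len(adj_a) > len(adj_b)
    rows, cols = (adj_b, adj_a) if swapped else (adj_a, adj_b)

    def weight(r, c):
        # node_sim always takes (node from adj_a, node from adj_b)
        return node_sim(c, r) if swapped else node_sim(r, c)

    matrix = [[weight(r, c) for c in cols] for r in rows]
    matrix += [[0.0] * len(cols) for _ in range(len(cols) - len(rows))]  # virtual nodes
    return matrix, swapped


sim = {("x", "p"): 0.9, ("y", "q"): 0.8}
matrix, swapped = edge_weight_matrix(["x", "y", "z"], ["p", "q"],
                                     lambda a, b: sim.get((a, b), 0.1))
print(swapped)  # True
for row in matrix:
    print(row)
# [0.9, 0.1, 0.1]
# [0.1, 0.8, 0.1]
# [0.0, 0.0, 0.0]
```

Zero-weight virtual rows never change the total weight of a matching, so the maximum over the padded matrix equals the maximum over the original pairs.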
Although the KM algorithm can find $W_{max}$ successfully, its time complexity is $O(n^4)$, and, as shown in Figure 5, its efficiency drops markedly when the number of nodes exceeds 1000. To solve this problem, the KM algorithm is improved as follows:
(1) Calculate the initial label values of the nodes from the edge weights. An edge $e_{ij}$ belongs to the equality subgraph when $w(e_{ij}) = l(v_{ai}) + l(v_{bj})$. In this step, let $l(v_{ai}) = \max_j w(e_{ij})$ and $l(v_{bj}) = 0$.
(2) Prepare for the augmenting path search in the equality subgraph. Let $M$ represent a perfect matching of the bipartite graph; augmenting paths are searched to find the optimal matching, which maximizes the sum of the edge weights in $M$. However, the lack of required edges usually makes the augmenting path search inefficient. To solve this problem, two improvements are made. Firstly, the label values of the nodes are adjusted by $d = \min \{ l(v_{ai}) + l(v_{bj}) - w(e_{ij}) \}$, taken over nodes $v_{ai}$ on the searched augmenting path and nodes $v_{bj}$ that have not yet been added to it: $d$ is subtracted from $l(v_{ai})$ for every $v_{ai}$ on the path and added to $l(v_{bj})$ for every $v_{bj}$ on the path. This change lets more nodes and edges satisfy $l(v_{ai}) + l(v_{bj}) = w(e_{ij})$. Secondly, an array named "slack" is defined, and $slack[j] = \infty$ is set before each augmenting path search. When edge $e_{ij}$ is examined, let $slack[j] = \min(slack[j], l(v_{ai}) + l(v_{bj}) - w(e_{ij}))$, where $v_{ai}$ belongs to the searched augmenting path and $v_{bj}$ has not yet been added to it. This change ensures that no edge is searched repeatedly before an augmenting path is found.
(3) Search for an augmenting path with the Hungarian algorithm to find $M$.
(4) If $M$ is not found, let $d = \min_j slack[j]$ over the nodes $v_{bj} \in V_B'$ that have not yet been added to the searched augmenting path, and change the label values of the accessed nodes: $l(v_{ai}) \leftarrow l(v_{ai}) - d$ for each accessed $v_{ai}$, $l(v_{bj}) \leftarrow l(v_{bj}) + d$ for each accessed $v_{bj}$, and $slack[j] \leftarrow slack[j] - d$ for each $v_{bj}$ not yet on the path.
(5) Repeat steps (3) and (4) until a perfect matching of the equality subgraph is found.
With these steps, each new search builds on the paths already explored and only examines newly added edges, so each edge is searched once.
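The steps above follow the classical Kuhn-Munkres algorithm with a slack array. The sketch below is a textbook version of that technique written for this explanation, not the authors' improved code: labels start at the row maxima and zero, a depth-first Hungarian search looks for augmenting paths in the equality subgraph, and the minimum slack is used to relabel when the search fails. It assumes a square weight matrix, such as the zero-padded edge-weighted matrix.

```python
import math


def km_max_weight(w, eps=1e-9):
    """Kuhn-Munkres maximum-weight perfect matching on a square weight matrix.

    Returns (total_weight, match) where match[j] is the row assigned to column j.
    """
    n = len(w)
    lx = [max(row) for row in w]   # step (1): row labels start at the row maximum
    ly = [0.0] * n                 # column labels start at 0
    match = [-1] * n
    vis_x, vis_y, slack = [], [], []

    def augment(x):
        """Depth-first Hungarian search for an augmenting path from row x."""
        vis_x[x] = True
        for y in range(n):
            if vis_y[y]:
                continue
            gap = lx[x] + ly[y] - w[x][y]
            if gap < eps:  # tight edge: l(x) + l(y) == w(x, y)
                vis_y[y] = True
                if match[y] == -1 or augment(match[y]):
                    match[y] = x
                    return True
            else:
                slack[y] = min(slack[y], gap)  # step (2): remember the smallest gap
        return False

    for x in range(n):
        slack = [math.inf] * n  # reset once per row, then updated incrementally
        while True:             # step (5): repeat until row x is matched
            vis_x = [False] * n
            vis_y = [False] * n
            if augment(x):      # step (3)
                break
            # Step (4): relabel with the minimum slack of unvisited columns.
            delta = min(slack[y] for y in range(n) if not vis_y[y])
            for k in range(n):
                if vis_x[k]:
                    lx[k] -= delta
                if vis_y[k]:
                    ly[k] += delta
                else:
                    slack[k] -= delta
    total = sum(w[match[y]][y] for y in range(n))
    return total, match


print(km_max_weight([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # (11, [0, 2, 1])
```

Because `augment` is recursive, matrices with around 1000 rows may need a higher Python recursion limit or an iterative rewrite. A maintained alternative is `scipy.optimize.linear_sum_assignment(w, maximize=True)`.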

Malware Detection.
When both the sensitive permission-based subgraphs and the common subgraph of each family have been collected, our malicious feature set can be generated. The detection result for a test application is obtained by matching its FCG against the feature set. The matching steps are as follows. First, find the nodes in the test application's FCG that are similar to the initial nodes of the structural features. Second, continue to match the adjacent nodes of these initial nodes according to the node similarity calculation process introduced in Subsections 4.3 and 4.4. The FCG is judged to contain a malicious structure if all nodes of a malicious structural feature are successfully matched; in that case, the malware family to which the application belongs is also detected. Otherwise, the application is judged to be safe.
To improve execution speed, the training of family features is performed offline.
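The two matching steps can be sketched as a small backtracking matcher. The assumptions here are ours: a feature is stored as a call graph rooted at its initial node, `node_sim` is a hypothetical similarity function with a fixed threshold, and a feature counts as present only if all of its nodes map to distinct FCG nodes and all of its call edges map to FCG call edges.

```python
def match_feature(feature, root, fcg, node_sim, threshold=0.9):
    """Map every node of a malicious feature onto the test FCG, or return None.

    feature / fcg: dict node -> list of called nodes (every node is a key).
    root: the feature's initial node; all feature nodes must be reachable from it.
    node_sim(f, t): hypothetical similarity between a feature node and a test node.
    """
    # Order feature nodes depth-first from the root, remembering each node's parent.
    order, seen, stack = [], {root}, [(root, None)]
    while stack:
        node, parent = stack.pop()
        order.append((node, parent))
        for child in feature[node]:
            if child not in seen:
                seen.add(child)
                stack.append((child, node))

    def extend(i, mapping):
        if i == len(order):
            # Accept only if every feature edge maps onto a call edge of the FCG.
            ok = all(mapping[v] in fcg[mapping[u]] for u in feature for v in feature[u])
            return dict(mapping) if ok else None
        node, parent = order[i]
        # The root may match any FCG node; other nodes must be callees of their parent's match.
        candidates = fcg if parent is None else fcg[mapping[parent]]
        used = set(mapping.values())
        for cand in candidates:
            if cand in used or node_sim(node, cand) < threshold:
                continue
            mapping[node] = cand
            found = extend(i + 1, mapping)
            if found is not None:
                return found
            del mapping[node]  # backtrack and try the next candidate
        return None

    return extend(0, {})


feature = {"f0": ["f1"], "f1": ["f2"], "f2": []}
fcg = {"m0": ["m1", "m3"], "m1": ["m2"], "m2": [], "m3": []}
api = {"f0": "onReceive", "f1": "sendSms", "f2": "getDeviceId",
       "m0": "onReceive", "m1": "sendSms", "m2": "getDeviceId", "m3": "log"}


def same_api(f, t):
    """Toy similarity: 1.0 when both nodes touch the same API, else 0.0."""
    return 1.0 if api[f] == api[t] else 0.0


print(match_feature(feature, "f0", fcg, same_api))
# {'f0': 'm0', 'f1': 'm1', 'f2': 'm2'}
```

A non-`None` result corresponds to "malicious structure found"; running the matcher once per feature in the feature set and reporting the family of the first hit mirrors the detection rule in the text.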

Dataset and Evaluation Metrics.
To test the effectiveness of our method more comprehensively, three malware datasets and one benign dataset are collected. The first dataset (dataset 1) is the Genome Project, collected by Jiang and Zhou [33] in 2012. It contains 1247 malware samples from 49 malware families and involves many kinds of attack techniques, such as repackaging, remote control, personal information stealing, and update attacks. The authors' experimental results showed that four well-known commercial malware detection tools achieved only unsatisfactory detection rates (20-79%) on this dataset. The second dataset (dataset 2) is Drebin, which contains 5560 malware samples from 179 families collected from mobile application markets in Russia, China, and elsewhere [34]. These two datasets are chosen because they are frequently used in previous works [11, 21, 24, 35-37].
Malware dataset 3 contains malicious apps collected from VirusShare [29] and the Android malware dataset (AndMal) [38], which contains 713 malware samples from 42 families. The benign apps are collected from Google Play Store, the 360 application market [1, 33, 38], and other sources. All of them are checked with the frequently used antivirus software McAfee and Kaspersky to ensure their safety, as shown in Table 2.
Based on these datasets, the performance of the detection methods is evaluated by true positives (TP), true negatives (TN), false positives (FP), false negatives (FN), accuracy, FPR, TPR, recall, and precision. They are defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{FPR} = \frac{FP}{FP + TN},$$
$$\text{TPR} = \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}.$$
The following experiments are organized into four parts: firstly, the malware family identification ability of our method is compared with several former works in Subsection 5.2; secondly, the FPR of our method in malware family detection is evaluated in Subsection 5.3; thirdly, a detection process repeated on three different random splits is used to evaluate the ability of our method to detect unknown malware in Subsection 5.4; and finally, the runtime performance of our method is discussed in Subsection 5.5.

Comparison with Graph-Based Detection Methods.
As mentioned in Section 2, many research studies have run their malware detection experiments on the Genome dataset. Thus, in this subsection, the detection ability of our method is compared with three efficient graph-based detection models from former works and with the Androguard tool. The first method is Dendroid [18]. Dendroid performed a detailed analysis of the CFGs of Android applications and designed a string feature extraction strategy, successfully transforming subgraph comparison into text mining. By comparing the similarity of string features, Dendroid can classify similar applications into the same malware family. The second is MIGDroid [19], which is also a subgraph analysis-based detection method. Unlike Dendroid, MIGDroid calculated subgraphs' threat scores according to the sensitive features they contain, and by evaluating these threat scores it can identify the malware of each family. The third is the method in [23] (named "CFG-based method" in this experiment), which improved the text mining model of Dendroid by adding machine learning classification. Table 4 shows the detection rates of these former works and of ours.
The experimental results indicate that Androguard obtains the worst detection rate because it relies too heavily on signatures: without the relevant signatures, it cannot identify malware even if the variants are very similar to existing malware. MIGDroid is more easily influenced by APIs. For example, variants of the zHash family contain a large amount of junk code that is closely connected to the remaining normal code, so it is difficult for MIGDroid to determine which parts of the code are benign or malicious. Dendroid classifies malware into families based on CCs. However, the distribution of CCs is irregular: some families may have a large number of CCs, while others have few. In addition, when families are similar to each other, such as DroidKungFuX, BaseBridge, and AnserverBot, the classification performance of Dendroid may decline. The CFG-based method performs well on the DroidKungFu family, but its average detection rate is lower than that of Dendroid. Our method has the highest average detection rate, which means it can locate malicious structural features efficiently. However, its accuracy still decreases when detecting highly similar families, such as the DroidKungFu families.

5.3. TPR and FPR Evaluation of Malware Family Classification.
In addition to the detection rate, TPR and FPR are also important metrics. The TPR and FPR of our method among malware families are evaluated in two parts. Firstly, the identification results are evaluated on the most closely related malware families: BaseBridge (BB), AnserverBot (AB), DroidKungFu (DK), GoldDream (GD), DroidDreamLight (DDL), Pjapps (PJ), DroidDream (DD), Plankton (PK), and Zsone (ZS). These families are chosen because they may cause a higher FPR than unrelated families [25]. Table 5 shows the confusion matrix of the family classification, in which the columns represent actual families and the rows represent the identification results. The highest FPR is 3.2%, which appears only between BaseBridge and AnserverBot, since both contain variants descended from the same ancestor.
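Per-family TPR and FPR can be read off a confusion matrix laid out like Table 5 (rows are identification results, columns are actual families) with the usual one-vs-rest counts. The sketch below assumes that definition; the two-family counts in the example are invented for illustration and are not the paper's results.

```python
def per_family_rates(matrix, families):
    """Per-family TPR and FPR from a multi-class confusion matrix.

    matrix[r][c]: number of samples of actual family c identified as family r
    (rows = identification results, columns = actual families).
    """
    total = sum(map(sum, matrix))
    rates = {}
    for k, name in enumerate(families):
        tp = matrix[k][k]
        fn = sum(row[k] for row in matrix) - tp   # family k identified as another family
        fp = sum(matrix[k]) - tp                  # other families identified as k
        tn = total - tp - fn - fp
        rates[name] = {"TPR": tp / (tp + fn), "FPR": fp / (fp + tn)}
    return rates


counts = [[45, 3],   # identified as BB
          [5, 47]]   # identified as AB
print(per_family_rates(counts, ["BB", "AB"]))
# {'BB': {'TPR': 0.9, 'FPR': 0.06}, 'AB': {'TPR': 0.94, 'FPR': 0.1}}
```

Note that the orientation matters: with rows as predictions, false negatives are read down a column and false positives along a row.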
Secondly, 30 representative families from dataset 1 and dataset 3 are selected to evaluate the FPR and TPR of malware detection. Table 6 shows the FPR of our method on each family. The highest FPR is 3.8%, and the average FPR over these malware families is 1.1%.

Evaluation with Unknown Applications.
In this subsection, the ability of our method to detect unknown malware is evaluated. To this end, the experimental dataset is divided into two parts: for each family, 80 percent of the malware samples are randomly chosen as the training set (6016 samples from 221 families in datasets 1-3), and the remaining applications (1504 malware and 4000 benign samples) form the unknown test set. To evaluate the stability of our detection method, this division is repeated three times to obtain three different training and test sets. The detection results of the three runs (named result 1, result 2, and result 3) are shown in Tables 7-9, which are the confusion matrices of the three runs; the rows give the actual type of the applications, and the columns give the type predicted by the detection method. Table 10 compares the metrics of results 1 to 3. The maximum differences in accuracy, recall, and precision are 1%, 0.1%, and 2.1%, respectively. This result indicates that the detection rate is stable, although it is still affected by the choice of training samples.
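The repeated per-family division can be sketched as below, assuming malware samples are given as `(sample_id, family)` pairs and that 80% of each family is taken by rounding down. Each run uses a different random seed; the helper name and the toy families are hypothetical.

```python
import random
from collections import defaultdict


def family_split(samples, train_ratio=0.8, seed=0):
    """Per-family random split: train_ratio of each family for training, the rest for testing.

    samples: list of (sample_id, family) pairs for malware only; benign apps
    would be appended to the test set separately.
    """
    rng = random.Random(seed)              # independent, reproducible generator
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)
    train, test = [], []
    for family in sorted(by_family):       # fixed order keeps runs reproducible
        ids = list(by_family[family])
        rng.shuffle(ids)
        cut = int(len(ids) * train_ratio)  # floor of 80% of the family
        train += [(i, family) for i in ids[:cut]]
        test += [(i, family) for i in ids[cut:]]
    return train, test


samples = [(f"a{i}", "FamilyA") for i in range(10)] + [(f"b{i}", "FamilyB") for i in range(5)]
for run in range(3):  # three different divisions, as in results 1-3
    train, test = family_split(samples, seed=run)
    print(run, len(train), len(test))  # 12 training and 3 test samples each run
```

Splitting inside each family, rather than over the whole pool, guarantees that every family is represented in both the training and the test set.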

Runtime Performance.
In this subsection, the runtime performance of our method is evaluated. The execution efficiency of our method is mainly affected by the size of the FCGs. Thus, the test dataset of this experiment is built from applications with FCGs of different sizes. The size of each application's FCG is represented by its numbers of nodes and edges, which are divided into 10 levels, as shown in Table 11. Figure 5 shows the execution time of our method on these applications. The execution time of our method rises from 0.003 seconds to 0.524 seconds, whereas that of the KM algorithm rises from 0.006 seconds to 44.5 seconds. This indicates that our method clearly improves efficiency over the KM algorithm.

Conclusion
In this paper, a new structural feature-based Android malware detection method is introduced. The method automatically extracts static features from applications and generates FCGs and sensitive permissions. Then, sensitive permission-based subgraphs and the common subgraph of each malware family are constructed as malicious features. Finally, the safety of unknown applications is determined using these malicious features. The method is evaluated from three aspects. First, its family detection rate is compared with three former works and Androguard; the experimental results indicate that our method achieves higher detection accuracy across many representative malware families in our dataset. Second, its ability to detect unknown malware is evaluated. Third, its runtime performance is evaluated on applications of different sizes. All evaluation results indicate that our method achieves good performance on different kinds of malware with various attack technologies.
Although our method is efficient, the experimental results also indicate that it can be improved in several ways. Specifically, future work can proceed in two directions: (1) researching robust defenses against malicious obfuscation technologies that can modify the code structures of malicious subgraphs, and (2) designing more efficient models to store the growing set of structural features.
Thus, more efficient graph analysis models or algorithms should be designed in future work. Moreover, more heuristic features are needed to cope with the rapid improvement of malicious attack technologies.

Data Availability
The data used to support this study are from previously reported studies [33], [34], and [38].

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.