A Novel Malware Detection and Family Classification Scheme for IoT Based on DEAM and DenseNet



Introduction
Malware is software designed to access a computer system and perform unwanted or harmful operations. It includes viruses, worms, Trojan horses, adware, spyware, ransomware, and other types. These kinds of software obtain confidential data, steal identities, hijack traffic and operating systems, encrypt digital assets, and monitor users, all of which threaten users and operating systems. Malware continually challenges network security with its increasing growth rate and its ever-growing number of families. According to the "malware threat situation report 2020" [1] released by Malwarebytes Labs, the detection of Windows malware on business endpoints increased by 13% in 2019. Malware detection and family classification therefore remain a research direction that cannot be ignored. Similarly, Internet of Things (IoT) devices built on different processor architectures have increasingly become targets of adversarial attacks. Although there are many ways to detect malware on the Internet of Things [2,3], further efforts in this field are still needed.
Traditional malware detection and family classification use two kinds of malware analysis techniques: static analysis and dynamic analysis. Static analysis disassembles executable programs and extracts characteristic information from the code without executing the malware. In [4], sequential pattern mining is used to detect the maximum frequent pattern (MFP) of the opcode sequence for malware detection in the Internet of Things. In [5], behavior sequence chains of some malware families are generated, and the similarity between a behavior sequence chain and the sequence of the target process is calculated to detect and classify malware. In [6], malware is identified by combining normalized compression distance (NCD) with the compressibility rates of executables using decision forests. However, static analysis may spend a lot of time on useless code because the code analyzed is not necessarily the code that is finally executed. At the same time, because static analysis relies on disassembly, malware can use various obfuscation techniques to hinder it. Some malware makes reverse engineering more complex through encryption, packing, and so on, which increases the difficulty of static analysis. Dynamic analysis extracts feature information while the code is executing, so the analyzed code is the code actually executed. In [7], malicious artifacts are extracted from memory through memory forensics, and malware detection is performed by combining these artifacts with features extracted while executing malware files under dynamic analysis. In [8], obfuscated malware is detected by proper hook installation and accurate calculation of malware activity time in user and kernel mode. In [9], a graph repartitioning algorithm that uses the N-order subgraph (NSG) to convert API call graphs into fragment behaviors is proposed for malware detection and family classification.
Besides, the "term frequency-inverse document frequency" (TF-IDF) and information gain (IG) were improved and used to extract the crucial N-order subgraph (CNSG). However, a single run under dynamic analysis can only observe one execution path, whereas some malware has multiple execution paths. Dynamic analysis also carries certain risks because the program is actually executed. With the development of neural networks in recent years, static and dynamic analysis are often combined with neural networks for malware detection and family classification. In [10], a bigram model is used to represent opcodes, a frequency vector is used to represent API calls, and a convolutional neural network and a backpropagation neural network are then used to embed the opcode-based and API-based features for malware detection and family classification. In [11], a classification method based on malware type was proposed from the behavior of malware, and LSTM was applied to a new dataset of API calls developed for the Windows operating system. The main purpose of this paper is to combine an attention module with a convolutional neural network to better perform malware detection and family classification. In our framework, we convert the malware samples into gray-scale images and then apply DenseNet with the Depthwise Efficient Attention Module (DEAM) to the images. In this process, DEAM generates feature attention maps that strengthen the attention paid to malware features, improving the effectiveness of detection and family classification. The main contributions of our work are as follows.
This paper proposes a new general lightweight attention module, DEAM, which can be widely used to improve the performance of CNNs without increasing the amount of computation. It consists of the Improved Efficient Channel Attention (IECA) and a new spatial attention mechanism, Depthwise Spatial Attention (DSA). We replace the SENet in the original channel attention structure of CBAM with ECA-Net to obtain the IECA of the proposed model. The DSA is constructed using Depthwise Convolution. We combine the DEAM and the DenseNet for malware detection and family classification. The proposed model performs well on the MalImg dataset, the BIG 2015 dataset, and the dataset built by us, and can effectively perform malware detection and family classification.
The rest of the paper is organized as follows: Section 2 discusses related work on malware visualization, the structure of CNNs, and attention mechanisms in malware detection and classification. Section 3 presents our proposed model in detail. The performance of our algorithm is evaluated in Section 4. Section 5 summarizes our work and puts forward some suggestions for future work.

Malware Visualization.
In this paper, we use the method proposed in [12] to convert malware into gray-scale images. We convert each byte (8 binary digits or 2 hexadecimal digits) in the PE file into a pixel, and the value of each pixel is in the range [0, 255] (0: black, 255: white). The height of the image is determined by the size of the PE file, as shown in Table 1. Through the malware images, we can see that images of malware from the same family are visually similar, while there are large visual differences between different malware families. Besides, differences also exist between benign software and malware, as shown in Figure 1. Converting malware into images can help us perform malware analysis: after conversion, different parts of the file can be easily distinguished so that we can find the functional parts of the malware.
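The byte-to-pixel mapping described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the exact procedure of [12]: the function name `bytes_to_grayscale`, the caller-supplied `width`, and the zero-padding of the last row are assumptions made here, since Table 1 (which relates file size to image dimensions) is not reproduced in this text.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int) -> List[List[int]]:
    """Map each byte to one pixel (0 = black, 255 = white), row by row."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = list(data)  # iterating over bytes yields ints in [0, 255]
    # Assumption: zero-pad the last row so every row has exactly `width` pixels.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))
    # The image height follows from the file size: ceil(len(data) / width).
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


image = bytes_to_grayscale(bytes([0, 255, 16, 32, 64]), width=2)
print(image)  # [[0, 255], [16, 32], [64, 0]]
```

For a real PE file, `data` would be the raw file contents read in binary mode; the resulting rows can then be handed to any image library.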
Converting malware into images for detection and family classification has become a common practice in recent years. In [13], the memory data dump file was converted into a gray-scale image, and histogram of gradient (HOG) feature extraction was used to classify the malware effectively. In [14], a new hybrid model based on image analysis was proposed, which uses similarity mining and a deep learning architecture to accurately identify and classify obfuscated malware. Inspired by the visual similarity between malware samples of the same family, a file-agnostic deep learning method was proposed for malware classification [15]. Through a set of discriminative patterns extracted from the visualized image of the malware, the malware is effectively divided into multiple families. In [16], based on the visual similarity between malware in the same family, it was proposed to perform binary texture analysis directly on gray-scale images of malware executable files. This technique derives a new combination of second-order statistical texture features based on first-order statistics and the gray-level co-occurrence matrix (GLCM) of the visualized malware to classify obfuscated and imbalanced malware.

Structure of the CNNs.
Convolutional Neural Networks (CNNs) have greatly promoted the development of image classification with their excellent performance. Recently, in order to improve the performance of CNNs, researchers have made many changes in three aspects: depth, width, and cardinality. Starting from LeNet [17], the pioneering work on CNNs, and continuing with the surge of interest after AlexNet [18], the structure of CNNs has become deeper and deeper to achieve richer representations. VGGNet [19] showed that increasing the depth of the network can affect its final performance to a certain extent. ResNet [20] introduced a shortcut that gives the network a certain identity mapping ability and strengthens the correlation of gradients between the layers of the network. GoogLeNet [21] showed that width is another important factor in improving model performance. DenseNet [22] took the idea of ResNet further, applied shortcuts to the entire network, realized dense connections, and strengthened the connection between the features of each layer. Image classification based on DenseNet has recently been applied to various fields [23][24][25]. However, as models continue to expand in depth, width, and cardinality, their computational cost also increases. To achieve a better balance between the performance and cost of the model, it is more promising to build a general, biologically inspired mechanism into the deep learning model than to stack more nonlinear layers.

Attention Mechanism.
The attention mechanism is a deep learning technique that originated from the study of human vision and has been widely used in natural language processing [26,27], recommendation systems, and image classification [28,29]. It mimics the way the human visual system selectively focuses on salient parts, and it improves the efficiency of the model by dynamically selecting important features. Developments in recent years show that the attention mechanism has become a common method to enhance the effect of CNNs. The attention map that the attention mechanism obtains from a CNN highlights specific areas, which represent the features being focused on.
SENet [30] first proposed an effective channel attention learning block and achieved good performance, showing that attention can improve the expressiveness of the network by enhancing important features and suppressing unnecessary ones. In [31], the malware is converted into a gray-scale image and then input into a model combining SENet [30] and a CNN for malware analysis and family classification. After that, attention modules developed in two directions: enhancing feature aggregation, and combining channel and spatial attention. GSoP [32] introduced second-order pooling to achieve more effective feature aggregation. CBAM [33] proposed a general attention module for CNNs, which uses max pooling and average pooling to aggregate features and applies the aggregated features to a channel attention mechanism (using SENet [30]) followed by a spatial attention mechanism. ECA-Net [34] improved SENet [30] by avoiding dimensionality reduction and keeping the module lightweight, improving the effect while reducing the parameters. ADCM [35] integrates dropout into the attention mechanism, also following the lightweight idea, and improves CBAM [33]. In addition, many works use improved attention mechanisms to improve the effect of CNNs [36,37].
Based on the CBAM [33] framework, this paper improves the channel attention mechanism inside and creates a new spatial attention mechanism. A new general lightweight attention module called Depthwise Efficient Attention Module is proposed.

Proposed Model
To better perform malware detection and family classification, we propose a new method based on DenseNet and the attention mechanism. In this section, we introduce the proposed model in detail. The proposed model is composed of DenseNet and DEAM. Based on DenseNet-121, we construct a DenseNet suitable for the proposed model. DEAM is composed of Improved Efficient Channel Attention (IECA) and Depthwise Spatial Attention (DSA). First, we introduce the architecture of DenseNet. Then the proposed IECA and DSA are described, respectively. Finally, the entire flowchart of our model for malware detection and family classification is presented.

Structure of the DenseNet.
The DenseNet model is a deep learning model developed from ResNet. In recent years, DenseNet has achieved better results in the field of image classification. The basic idea of ResNet and DenseNet is the same; however, DenseNet establishes dense connections between each layer and all of its preceding layers, and it realizes feature reuse by concatenating features along the channel dimension. These properties allow DenseNet to achieve better performance than ResNet with fewer parameters and lower computational cost, and to alleviate the vanishing gradient problem.
The DenseNet is mainly composed of DenseBlock and Transition layer. DenseBlock adopts a radical dense connection mechanism; that is, all layers are connected to each other. Specifically, each layer accepts the output from all the previous layers as its additional input, as shown in Figure 2.
In DenseBlock, the feature maps of every layer have the same size, and each layer is concatenated with all previous layers in the channel dimension. For a network with $L$ layers, DenseBlock contains a total of $L(L+1)/2$ connections. Layer $L$ receives the feature maps of all preceding layers as its input:

$$x_L = H_L([x_0, x_1, \ldots, x_{L-1}]),$$

where $L$ indexes the layer, $[x_0, x_1, \ldots, x_{L-1}]$ denotes the channel-wise concatenation of the feature maps produced by the preceding layers, and $H_L(\cdot)$ represents a nonlinear transformation, which is a combination of Batch Normalization (BN), ReLU, Pooling, and Conv operations. In this paper, the common DenseNet-B structure is utilized, and the bottleneck layer is used to reduce the amount of computation; that is, the structure BN + ReLU + 1 × 1 Conv + BN + ReLU + 3 × 3 Conv is adopted. Each layer in DenseBlock outputs $k$ feature maps after convolution, where $k$ is the number of convolution kernels. If the number of channels input to the DenseBlock is $k_0$, then the number of input channels of layer $L$ is $k_0 + k(L-1)$.
Here, the final convolution of each layer outputs $k$ channels, and $k$ is called the growth rate. In DenseBlock, the number of input channels grows as the number of layers increases.
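The channel bookkeeping above can be checked with a small sketch. The helper names are hypothetical, and the example values (64 input channels, growth rate 32, 6 layers) are illustrative assumptions rather than the configuration listed in Table 2.

```python
from typing import List


def dense_block_input_channels(k0: int, k: int, num_layers: int) -> List[int]:
    """Input channels seen by layers 1..num_layers: k0 + k * (layer - 1)."""
    return [k0 + k * (layer - 1) for layer in range(1, num_layers + 1)]


def dense_block_output_channels(k0: int, k: int, num_layers: int) -> int:
    """Channels leaving the block: the input plus k new maps per layer."""
    return k0 + k * num_layers


def dense_block_connections(num_layers: int) -> int:
    """Total number of connections in a block with num_layers layers: L(L + 1) / 2."""
    return num_layers * (num_layers + 1) // 2


channels = dense_block_input_channels(k0=64, k=32, num_layers=6)
print(channels)                                # [64, 96, 128, 160, 192, 224]
print(dense_block_output_channels(64, 32, 6))  # 256
print(dense_block_connections(6))              # 21
```

The steadily growing output width is what motivates a dimension-reducing layer between consecutive blocks.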
Since the spatial size of the feature maps remains unchanged after passing through a DenseBlock while the channel dimension keeps increasing, dimension reduction is necessary to reduce computational complexity. The Transition layer is mainly composed of a 1 × 1 convolution and 2 × 2 AvgPooling or MaxPooling, and its structure is BN + ReLU + 1 × 1 Conv + 2 × 2 AvgPooling. It connects two adjacent DenseBlocks and reduces the dimensionality of the output of the DenseBlock. The commonly used DenseNet frameworks are DenseNet-121, DenseNet-169, DenseNet-201, and DenseNet-264. The DenseNet in our proposed model is based on DenseNet-121. Table 2 shows a comparison between DenseNet-121 and the DenseNet in our proposed model.

Depthwise Efficient Attention Module.
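The DEAM we propose follows the framework of CBAM [33] and consists of two parts, IECA and DSA. For an input feature map $M \in \mathbb{R}^{C \times H \times W}$ (where $C$ denotes the number of channels, $H$ the height, and $W$ the width), DEAM calculates the relationship between the channels of the feature map through IECA to obtain a 1-dimensional channel attention map $M_C \in \mathbb{R}^{C \times 1 \times 1}$ that focuses on the important features of the image. Then, DEAM calculates the 3-dimensional spatial attention map $M_S \in \mathbb{R}^{C \times H \times W}$ of the feature map through DSA, which attends to the position of the features on the image. The calculation process in the DEAM is as follows:

$$M' = M_C(M) \otimes M,$$
$$M'' = M_S(M') \otimes M',$$

where $\otimes$ denotes elementwise multiplication (with the channel attention values broadcast along the spatial dimensions), $M'$ is the channel-refined feature map, and $M''$ is the final refined output.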
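As a minimal sketch of how the two attention maps are applied in sequence by elementwise multiplication, the following plain-Python example takes the attention maps as given inputs. The function names and the nested-list layout `[channel][row][column]` are assumptions for illustration; the attention values here are made up rather than computed by IECA or DSA.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def apply_channel_attention(feature: FeatureMap, channel_att: List[float]) -> FeatureMap:
    """Scale every pixel of channel c by channel_att[c] (broadcast over H and W)."""
    return [[[channel_att[c] * v for v in row] for row in plane]
            for c, plane in enumerate(feature)]


def apply_spatial_attention(feature: FeatureMap, spatial_att: FeatureMap) -> FeatureMap:
    """Scale every pixel by the attention value at the same (c, row, column)."""
    return [[[a * v for a, v in zip(att_row, row)]
             for att_row, row in zip(att_plane, plane)]
            for att_plane, plane in zip(spatial_att, feature)]


feature = [[[1.0, 2.0]], [[3.0, 4.0]]]          # C = 2, H = 1, W = 2
channel_refined = apply_channel_attention(feature, [0.5, 1.0])
refined = apply_spatial_attention(channel_refined, [[[1.0, 0.0]], [[0.5, 0.5]]])
print(channel_refined)  # [[[0.5, 1.0]], [[3.0, 4.0]]]
print(refined)          # [[[0.5, 0.0]], [[1.5, 2.0]]]
```

Applying the channel step first and feeding its result to the spatial step mirrors the serial arrangement that the paper adopts from CBAM [33].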
According to [33], in the DEAM, we connect the two attention mechanisms serially and put IECA in front to get the best effect. Experimental comparisons show that the system achieves the best effect when DEAM is placed after the last DenseBlock of DenseNet. Since DenseNet connects all layers, the input of each layer is the concatenation of the outputs of all the previous layers. Adding a DEAM after each DenseBlock would recompute values that have already been computed and create a lot of unnecessary overhead. In addition, each DEAM focuses on different features. If a DEAM is placed in front of or behind each DenseBlock, they may interfere with each other and reduce the effect of the model. Figure 3 describes the process of each attention map, and the detailed information of each attention mechanism is described below.

Improved Efficient Channel Attention.
In the channel attention mechanism, we consider which features we should pay attention to. Each channel of the feature map is regarded as a feature detector [38]; however, not every channel is equally useful for image recognition. By calculating a weight for each channel, the channel attention focuses on the main features of the image. Therefore, through the channel attention mechanism, we can better extract the representative features of malware images and improve the efficiency of malware detection and family classification. The attention mechanism has been widely used to improve the performance of CNNs, among which the more representative examples are SENet [30] and CBAM [33]. However, most attention mechanisms become more complex in order to achieve better performance. ECA-Net [34] improved the SENet model by making it lightweight and removing dimensionality reduction, and it demonstrated the importance of avoiding dimensionality reduction in the attention mechanism. ECA-Net proposes a 1-dimensional convolution to achieve a local cross-channel interaction strategy, which reduces model complexity while improving performance. The formula is as follows:

$$\omega = \sigma(\mathrm{C1D}_k(y)),$$

where $\mathrm{C1D}$ denotes 1-dimensional convolution, $k$ denotes the kernel size of the 1-dimensional convolution, $y \in \mathbb{R}^{C \times 1 \times 1}$ denotes the pooled channel descriptor, $\omega$ denotes the channel attention map, and $\sigma$ denotes the Sigmoid function. Meanwhile, a method of adaptively selecting the size of the 1-dimensional convolution kernel is proposed to determine the coverage of local cross-channel interaction. The formula is as follows:

$$k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}},$$

where $|t|_{\mathrm{odd}}$ denotes the odd number closest to $t$, $\gamma$ and $b$ are set to 2 and 1, respectively, and $C$ denotes the channel number of the input feature map. This improvement significantly reduces the parameters of the channel attention mechanism and enhances the computational efficiency of the model.
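The adaptive kernel-size rule can be illustrated with a short sketch, assuming the ECA-Net form k = |log2(C)/γ + b/γ|_odd with γ = 2 and b = 1. The function name is hypothetical, and the rounding convention used here (truncate to an integer, then move even values up by one) is one common reading of "the odd number closest to t", stated as an assumption.

```python
import math


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1-D kernel size for a feature map with `channels` channels."""
    t = int(abs((math.log2(channels) + b) / gamma))
    # Assumed convention: keep odd values, bump even values to the next odd number.
    return t if t % 2 else t + 1


for c in (64, 256, 512, 1024):
    print(c, eca_kernel_size(c))
# 64 3
# 256 5
# 512 5
# 1024 5
```

Because the kernel size grows only logarithmically with the channel count, the 1-dimensional convolution stays very small even for wide feature maps, which is the source of the parameter savings discussed above.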
Security and Communication Networks

In this paper, we use ECA-Net to replace the SENet in the channel attention structure of CBAM in order to avoid local dimensionality reduction, and thus obtain the IECA. To calculate the channel attention effectively, we use the max pool and the average pool to compress each channel of the input feature map over its spatial dimensions. The average pool gathers spatial information, and the max pool gathers distinctive object features. Two spatial context descriptors ($M^C_{avg}$ and $M^C_{max}$), which represent the average-pooled feature and the max-pooled feature, are output from the average pool and the max pool, respectively. The two spatial context descriptors are combined into a feature vector $F_0 \in \mathbb{R}^{C \times 1 \times 1}$ by element summation, and the combined feature vector is passed to a 1-dimensional convolution. The size $k$ of the 1-dimensional convolution kernel is obtained by the adaptive selection, and the Sigmoid function is applied to the output $F_1 \in \mathbb{R}^{C \times 1 \times 1}$ of the 1-dimensional convolution to obtain a 1-dimensional channel attention map $M_C \in \mathbb{R}^{C \times 1 \times 1}$. The formula is as follows:

$$M_C(M) = \sigma(\mathrm{C1D}_k(\mathrm{AvgPool}(M) + \mathrm{MaxPool}(M))) = \sigma(\mathrm{C1D}_k(M^C_{avg} + M^C_{max})),$$

where $+$ denotes element summation.
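The IECA steps (average pool and max pool per channel, element summation, 1-dimensional convolution across channels, Sigmoid) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function name, the nested-list layout, the zero padding at the channel borders, and the hand-picked kernel weights (which would be learned in a real model) are all assumptions.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def ieca_attention(feature: FeatureMap, kernel: List[float]) -> List[float]:
    """Return one attention weight in (0, 1) per channel; `kernel` has odd length."""
    # Global average pool and global max pool over each channel's H x W plane.
    avg = [sum(map(sum, plane)) / (len(plane) * len(plane[0])) for plane in feature]
    mx = [max(map(max, plane)) for plane in feature]
    # F0: element summation of the two pooled descriptors.
    f0 = [a + m for a, m in zip(avg, mx)]
    # F1: 1-D convolution across neighbouring channels, zero-padded to keep length C.
    pad = len(kernel) // 2
    padded = [0.0] * pad + f0 + [0.0] * pad
    f1 = [sum(w * padded[c + j] for j, w in enumerate(kernel)) for c in range(len(f0))]
    # Sigmoid maps each value into (0, 1).
    return [1.0 / (1.0 + math.exp(-v)) for v in f1]


feature = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[-2.0, -2.0], [-2.0, -2.0]],
]
weights = ieca_attention(feature, kernel=[0.0, 1.0, 0.0])  # identity kernel, for illustration
print(weights)
```

With the identity kernel, each weight depends only on its own channel, so the all-zero channel receives exactly 0.5; a non-trivial kernel mixes each channel with its neighbours, which is the local cross-channel interaction that ECA-Net introduces.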

Depthwise Spatial Attention.
Different from the channel attention mechanism, the spatial attention mechanism focuses on which positions of the feature map are meaningful, which complements the channel attention mechanism. The spatial attention mechanism calculates a weight for each position on the feature map and focuses on the meaningful positions. Therefore, through the spatial attention mechanism, we can better extract the representative features of malware images and improve the efficiency of malware detection and family classification. The spatial attention mechanism of CBAM [33] first compresses the feature map with the max pool and the average pool along the channel axis [39] and then concatenates their outputs to generate an effective feature descriptor. A convolution is applied to the feature descriptor to generate a 2-dimensional spatial attention map. DSA in this paper is a new spatial attention mechanism constructed on the idea of avoiding dimensionality reduction from ECA-Net [34], which showed that avoiding dimensionality reduction is very important for the attention mechanism. DSA uses Depthwise Convolution to calculate the 3-dimensional spatial attention map of the feature map without dimensionality reduction. Depthwise Convolution is a special form of Group Convolution in which the number of groups is equal to the number of channels. Depthwise Convolution divides the input features into groups according to the number of channels and convolves each group separately. In IECA, we only replace the SENet [30] part to avoid local dimensionality reduction in the channel attention mechanism. In DSA, we construct a new spatial attention mechanism through Depthwise Convolution, abandoning the dimensionality reduction of the max pool and average pool along the channel axis in CBAM, and thereby avoid dimensionality reduction altogether.
Depthwise Convolution can obtain a prominent information area from each channel, which is more comprehensive than applying pooling operations along the channel axis [39]. We describe the detailed operation below.
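We apply Depthwise Convolution to the input feature map and use the Sigmoid function on the output feature descriptor $F_2 \in \mathbb{R}^{C \times H \times W}$ to get a 3-dimensional spatial attention map $M_S \in \mathbb{R}^{C \times H \times W}$. The formula is as follows:

$$M_S(M) = \sigma(\mathrm{DepthwiseConv2D}(M)) = \sigma(F_2),$$

where $\mathrm{DepthwiseConv2D}$ denotes Depthwise Convolution.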
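A plain-Python sketch of the DSA computation follows: each channel is convolved with its own kernel (so channels are never mixed or pooled), and the Sigmoid is applied to the result. The function name, the square odd-sized kernels, the zero padding that keeps the H × W size, and the example weights are assumptions for illustration; the paper does not state the kernel size in this section.

```python
import math
from typing import List

Plane = List[List[float]]  # one channel, indexed as [row][column]


def depthwise_spatial_attention(feature: List[Plane], kernels: List[Plane]) -> List[Plane]:
    """Convolve channel c with kernels[c] only, then apply the sigmoid elementwise."""
    attention = []
    for plane, kernel in zip(feature, kernels):
        h, w, k = len(plane), len(plane[0]), len(kernel)
        pad = k // 2
        att_plane = []
        for i in range(h):
            row = []
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding outside the plane
                            acc += kernel[di][dj] * plane[y][x]
                row.append(1.0 / (1.0 + math.exp(-acc)))
            att_plane.append(row)
        attention.append(att_plane)
    return attention


identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zeros = [[0.0] * 3 for _ in range(3)]
feature = [[[0.0, 1.0], [2.0, 3.0]], [[5.0, 6.0], [7.0, 8.0]]]
attention = depthwise_spatial_attention(feature, [identity, zeros])
print(attention[1])  # [[0.5, 0.5], [0.5, 0.5]]
```

The output has the same C × H × W shape as the input, in contrast to a pooled 2-dimensional spatial map, so every channel keeps its own position-wise weights.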

The Process of Detecting and Classifying Malware.
The whole process of detecting and classifying malware is described as follows. First, the PE file is converted into a gray-scale image using the method in [12]. Second, the converted gray-scale image is fed to a malware detection model, which consists of DEAM and DenseNet. The model is trained using the gray-scale images of known benign software samples and malware samples. The detected malware is then passed to a family classification model to distinguish different malware families. The trained family classification model can effectively identify malware families. The whole process is shown in Figure 4.

Table 4 (fragment): Obfuscator.ACY 1228; 9 Gatak 1013; total 10868.

Datasets and Evaluation Criterion.
The datasets used for the evaluation of the classification results of malware families in this article are the MalImg dataset from [12] and the BIG 2015 dataset provided by Microsoft for the Big Data Innovators Gathering Anti-Malware Prediction Challenge.
The MalImg dataset is a large-scale unbalanced Windows malware gray-scale image dataset which contains 25 malware families and a total of 9339 malware gray-scale image samples, as shown in Table 3. The malware families in the MalImg dataset include worms, Trojan horses, backdoors, and rogue software.
We only use the labeled training set of the BIG 2015 dataset, which contains 9 malware families and a total of 10868 malware samples, as shown in Table 4. Each sample of the dataset has a hexadecimal representation of its binary content and its corresponding assembly file. Both the MalImg dataset and the BIG 2015 dataset are benchmark datasets used in many recent works.
Due to the lack of public datasets for detection, we use our own constructed dataset for indirect comparison with the work of others. We merged the MalImg dataset and the BIG 2015 dataset and randomly selected the same number of malware samples from each of the 34 malware families in the merged set. Then, we constructed a 1 : 1 detection dataset with the extracted 1087 malware samples and 1087 collected benign software samples. The diversity of malware families in the dataset supports the generalization ability of the detection model and avoids the loss of generalization performance caused by overfitting when a model is trained on unbalanced data.
In order to test the generalization performance of our model, we divided the dataset into a training set, a validation set, and a test set at a ratio of 6 : 2 : 2 and repeated each experiment 5 times to reduce experimental error. The training set is used to train the model, the validation set is used to tune the model, and the test set is fed to the trained model to test its performance. Our experiments are based on the TensorFlow 2.0 framework, with the Adam optimizer and the categorical_crossentropy loss. Besides accuracy, precision, recall, and F1 score are also used as performance indicators to select the best model in detection and family classification. This is because, when the classes are imbalanced, accuracy only reflects the overall prediction level and ignores the prediction ability on minority classes; it can remain high even when minority or key classes are misclassified. Precision is defined relative to the predictions: it is the fraction of samples predicted as positive that are actually positive. Recall is defined relative to the samples: it is the fraction of positive samples that are correctly predicted. The F1 score combines the precision and recall results. The precision is calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP},$$

where $TP$ is the number of true positives and $FP$ is the number of false positives. The recall is obtained as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN},$$

where $FN$ is the number of false negatives. The F1 score is the harmonic mean of precision and recall, as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

Figure 5: Accuracy comparison of our model with recent works in detection.

We derive the values of the binary classification task (detection) in the usual way. For multiclass tasks (family classification), we obtain the precision, recall, and F1 score of each family separately. After that, the macro-precision, macro-recall, and macro-F1 are calculated by averaging the corresponding per-family values (the macro average gives each family the same weight).
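The per-family and macro-averaged indicators can be computed from a confusion matrix as in the following sketch. The function names, the row-is-true-class convention, the treatment of empty denominators as 0.0, and the 2 × 2 example matrix are assumptions for illustration; the numbers are not taken from the paper's tables.

```python
from typing import List, Tuple


def per_class_metrics(confusion: List[List[int]]) -> List[Tuple[float, float, float]]:
    """confusion[i][j] counts samples of true class i predicted as class j.

    Returns one (precision, recall, f1) tuple per class.
    """
    n = len(confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(confusion[c]) - tp                       # truly c, predicted another class
        # Assumed convention: a metric with an empty denominator is reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(metrics: List[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Unweighted mean of each indicator, so every class counts equally."""
    return tuple(sum(m[i] for m in metrics) / len(metrics) for i in range(3))


metrics = per_class_metrics([[8, 2], [1, 9]])
macro_precision, macro_recall, macro_f1 = macro_average(metrics)
print(metrics)
print(macro_precision, macro_recall, macro_f1)
```

Because the macro average weights every family equally, a poorly classified small family lowers the macro scores far more than it lowers accuracy, which is why these indicators are informative on unbalanced datasets.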

Malware Detection.
We conduct malware detection experiments on the constructed dataset. Tables 5 and 6 show the obtained detection results in the form of a 2×2 confusion matrix, as well as the precision, recall, and F1 score values of each class. For the proposed model, the accuracy, precision, recall, and F1 score are all 99.3%. The DenseNet without DEAM has an accuracy of 99.0%, a precision of 99.1%, a recall of 99.0%, and an F1 score of 99.0% on our constructed dataset. DEAM therefore brings almost no improvement in detection; the numbers of wrongly predicted samples in the two experiments are only 3 and 4, respectively. We believe that DenseNet itself already has high detection capability, which leaves little room for DEAM to improve the performance further. Figure 5 gives an indirect comparison between our model and recent works [7,10,40,41], which suggests that the proposed model is superior in detection to existing methods.
On the MalImg dataset, the accuracy of our model is 98.5%, the precision is 96.9%, the recall is 96.6%, and the F1 score is 96.7%. These performance indicators show that the method achieves better classification and lower misclassification. On the MalImg dataset, DenseNet without DEAM has an accuracy of 97.9%, a precision of 95.5%, a recall of 94.7%, and an F1 score of 94.6%. Figure 6 shows the comparison between our model and recent work on the MalImg dataset. Experimental results show that our model has the same accuracy as [16], but its other performance indicators are slightly lower than those in [16]. Compared with other recent works [14,15,42], our model has improved performance in malware family classification and is more robust to class imbalance.
The MalImg dataset contains many samples processed with obfuscation techniques, such as packing and encryption. Among them, the Yuner.A, VB.AT, Malex.gen!J, Autorun.K, and Rbot!gen families use the same packer, UPX, which gives them similar structures and makes them difficult to distinguish. However, our model classifies Yuner.A with 100% accuracy; the F1 scores of Malex.gen!J and Rbot!gen are 97.4% and 99.9%, and the F1 scores of VB.AT and Autorun.K are 93.6% and 95.7%. Allaple encrypts the code section in several layers using a random key. Our model classifies Allaple.A and Allaple.L at a rate of 100%. This indicates that our model is robust to both packing and encryption. Meanwhile, Swizzor.gen!E and Swizzor.gen!I, which are variants of the same family, are also classified with an accuracy of 100%.
It can be seen from the comparison of Tables 7 and 8 ... On the MalImg dataset, the model using CBAM misclassified Swizzor.gen!I as Obfuscator.AD at a rate of 100% in 5 experiments, causing a significant drop in classification performance and reducing the performance of DenseNet.

BIG 2015 Dataset.
The processing method of the BIG 2015 dataset is similar to that of the MalImg dataset. We downsample the gray-scale images. Tables 9 and 10 give the obtained classification results in the form of a 9×9 confusion matrix, as well as the precision, recall, and F1 values of each family. The accuracy of our model on the BIG 2015 dataset is 97.3%, the precision is 95.3%, the recall is 95.4%, and the F1 score is 95.4%. DenseNet without DEAM has an accuracy of 96.3%, a precision of 94.2%, a recall of 91.4%, and an F1 score of 92.6% on the BIG 2015 dataset. Figure 7 shows a comparison of our model with recent works [15,43] on the BIG 2015 dataset. It can be seen from Figure 7 that our model has improved precision, recall, and F1 score compared with [15], which indicates that our model can better classify malware families. Our DEAM and CBAM have basically the same classification effect on the BIG 2015 dataset. However, DEAM uses far fewer parameters than CBAM, which effectively improves computational efficiency. Based on the experimental results on the MalImg dataset and the BIG 2015 dataset, we conclude that the proposed DEAM has a better effect than CBAM.
Through the comparison between Tables 9 and 10, we can see that the addition of DEAM improves the classification effect of the model on multiple classes; in particular, the F1 score on Simda increases from 71.4% to 87.5%. This further supports the claim that DEAM improves the performance of CNNs. There is a certain gap between the effect on the BIG 2015 dataset and that on the MalImg dataset. We think that this is due to the larger texture differences between samples of the same family in BIG 2015. In this paper, we only use the global image and do not further process the gray-scale image of malware.

Conclusion
This paper proposes a new lightweight and effective attention module for convolutional neural networks, DEAM, and combines it with DenseNet for malware detection and family classification. The proposed method converts executable files into gray-scale images, feeds them first to the detection model, and then passes the detected malware to the family classification model to distinguish different malware families. Experimental results show that the number of DEAM parameters is only one-third of the number of CBAM parameters, so DEAM can reduce the attention module parameters and improve the computational efficiency of the model. Besides, it performs better than CBAM, which helps to improve the performance of CNNs. The presented model performs well in malware detection and family classification, and it also shows robustness to code obfuscation and class imbalance problems.
Although the proposed method has good performance in malware detection and family classification, it still needs improvement. For example, our method directly uses the original gray-scale image of the malware in the model and does not further process it. In the future, we will explore these issues to further improve performance.

The BIG 2015 dataset can be obtained from https://www.kaggle.com/c/malware-classification/data.

Conflicts of Interest
The authors declare that they have no conflicts of interest.