In recent years, the number of malware on the Android platform has been increasing, and with the widespread use of code obfuscation technology, the accuracy of antivirus software and traditional detection algorithms is low. Current state-of-the-art research shows that researchers started applying deep learning methods for malware detection. We proposed an Android malware detection algorithm based on a hybrid deep learning model which combines deep belief network (DBN) and gate recurrent unit (GRU). First of all, analyze the Android malware; in addition to extracting static features, dynamic behavioral features with strong antiobfuscation ability are also extracted. Then, build a hybrid deep learning model for Android malware detection. Because the static features are relatively independent, the DBN is used to process the static features. Because the dynamic features have temporal correlation, the GRU is used to process the dynamic feature sequence. Finally, the training results of DBN and GRU are input into the BP neural network, and the final classification results are output. Experimental results show that, compared with the traditional machine learning algorithms, the Android malware detection model based on hybrid deep learning algorithms has a higher detection accuracy, and it also has a better detection effect on obfuscated malware.
Mobile phones have become increasingly important tools in people’s daily life, such as mobile payment, instant messaging, online shopping, etc., but the security problem of mobile phones is becoming more and more serious. Due to the open source nature of the Android platform, it is very easy and profitable to write malware using the vulnerabilities and security defects of the Android system. This is the main reason for the rapid increase in the number of malware on the Android system. The malicious behaviors of Android malware generally include sending deduction SMS, consuming traffic, stealing user’s private information, downloading a large number of malicious applications, remote control, etc., threatening the privacy and property security of mobile phones users.
The number of Android malware is growing rapidly; particularly, more and more malicious software use obfuscation technology. Traditional detection methods of manual analysis and signature matching have exposed some problems, such as slow detection speed and low accuracy. In recent years, many researchers have solved the problems of Android malware detection using machine learning algorithms and had a lot of research results. With the rise of deep learning and the improvement of computer computing power, more and more researchers began to use deep learning models to detect Android malware. This paper proposes an Android malware detection model based on a hybrid deep learning model with deep belief network (DBN) and gate recurrent unit (GRU). The main contributions are as follows: In order to resist Android malware obfuscation technology, in addition to extracting static features, we also extracted the dynamic features of malware at runtime and constructed a comprehensive feature set to enhance the detection capability of malware. A hybrid deep learning model was proposed. According to the characteristics of static features and dynamic features, two different deep learning algorithms of DBN and GRU are used. The detection model was verified, and the detection result is better than traditional machine learning algorithms; it also can effectively detect malware samples using obfuscation technology.
The rest of the paper is organized as follows. Section
Researchers are constantly improving and innovating the detection method of Android malware. Malware analysis technology is mainly divided into static analysis and dynamic analysis, and the detection method evolves from traditional machine learning to deep learning algorithms.
The static analysis method refers to extracting malicious features through semantic analysis, permission analysis, etc., after decompiling the APK file. The major advantages of static detection are high efficiency and speed, but it is difficult to identify polymorphic deformation technology and code obfuscation [
The required permissions of the APK can help gain awareness about the risks. Talha et al. [
APIs are also often used as key features in detecting Android malware [
As more and more Android malware avoid static detection through techniques such as repackaging and code obfuscation, dynamic analysis methods based on behavioral characteristics can solve this problem well [
Enck et al. [
Fu et al. [
Ali-Gombe et al. [
Malware attempts to evade detection by mimicking security-sensitive behaviors of benign apps and suppressing their payload to reduce the chance of being observed. Based on the contexts that trigger security-sensitive behaviors, Yang et al. [
Antivirus software can effectively detect Android malware, but it needs to manually extract the signature code and update it on the client side after obtaining the malware sample. This method has a high detection accuracy for known malware, but it also has certain limitations. For example, unknown malware that has not been seen before and malware processed by obfuscation techniques cannot be effectively detected. In order to improve the detection accuracy, in recent years, researchers have used machine learning algorithms to detect malware [
Kuo et al. [
To enhance security of machine learning-based Android malware detection, Chen et al. [
Combining of supervised learning (KNN) and unsupervised learning (K-Medoids), Arora et al. [
In traditional machine learning algorithms, the SVM algorithm is often used for Android malware detection, and it has a good classification effect in many cases. Li et al. [
Traditional machine learning algorithms are usually shallow structures, so they cannot effectively characterize Android malware through correlation features. Therefore, researchers tried to distinguish Android malware using deep learning models. The deep learning model has a wide range of applications in image recognition, speech recognition, and natural language processing, and its strong fitting ability for nonlinear relationship makes it have a good application prospect in malware detection. Commonly used deep learning networks include stacked autoencoder [
Deep learning demonstrated excellent performance in image recognition, so malware can be converted into images, and then deep learning algorithms are used for training and detection [
Pektaş et al. [
In order to improve the detection accuracy and take advantage of various deep learning algorithms, some researchers have proposed malware detection models with a combination of multiple deep learning algorithms. Luo et al. [
The features extraction method combining dynamic analysis and static analysis is adopted, as shown in Figure
Android malware features extraction.
Static features extraction is high-speed and consumes less system resources, which is suitable for large-scale features extraction, it cannot effectively detect obfuscated Android malware. Therefore, this paper uses a hybrid detection method of static analysis and dynamic analysis; a total of 351 features were extracted, including 303 static features and 48 dynamic features.
The extracted static features include resource features and semantic features.
Types and quantity of resource features.
Types | Specific meaning | Quantity |
---|---|---|
Certificate | Timestamp | 2 |
Detail | 8 | |
Name | Package name length, entropy | 14 |
Package name similarity | 7 | |
Stealth | Embedded API | 15 |
Embedded API package name | 4 | |
Embedded permission | 26 | |
Embedded intention | 13 | |
Inconsistent | Inconsistent file suffix | 4 |
Inconsistent resource | 5 | |
Native code | System call | 19 |
Shared library | 7 |
Types and quantity of semantic features.
Type | Specific meaning | Quantity |
---|---|---|
DEX | API call | 32 |
API package | 8 | |
String | 13 | |
Intent and permission | Intension | 16 |
Permission | 67 | |
Meta information | Services | 6 |
Content providers | 3 | |
Broadcast receivers | 2 | |
Activities | 25 | |
Escape | Encryption | 2 |
Reflection | 3 | |
Dynamic code loading | 2 |
Dynamic features are the behavioral characteristics of the Android application when it runs, such as data encryption and decryption, file reading and writing, network data transmission, call, SMS, geographic location, and access to sensitive information. These behaviors can represent the application’s functions and intentions. A total of 48 dynamic features are extracted. The extraction of these dynamic features is mainly based on monitoring related API function calls. Each dynamic feature corresponds to several API functions, and the total number of API functions is 141. Some of the dynamic features and corresponding API examples are listed in Table
Types and quantity of dynamic features.
Type | API examples | Quantity |
---|---|---|
Data encryption and decryption | doFinal | 35 |
File reading and writing | getString | 38 |
Network data transmission | Init | 28 |
Call | Listen | 13 |
SMS | createAppSpecificSmsToken | 15 |
Geographic location | addNmeaListener | 12 |
We select the automatic test tool MonkeyRunner and the dynamic analysis tool Inspeckage to extract dynamic features.
MonkeyRunner is a test tool provided by the Android SDK. It supports writing test scripts to customize data and events and can simultaneously connect to multiple real terminals or emulators to trigger operations of the application software. It can better perform the functions of the Android applications.
The dynamic analysis tool Inspeckage is a simple application software that integrates the commonly used dynamic analysis functions, and a built-in web server can provide a friendly interactive interface for users. Inspeckage can not only obtain basic information such as permissions, components, shared libraries, UID, etc., but also view the behavior of the application in real time. It can customize the hooked API; that is, it can customize the dynamic behavior required for monitoring, and this is also the biggest advantage of the tool.
For static features, most of them are binary features, and only a small number of features are discrete features, and there is no relationship between features. Therefore, the deep learning algorithm deep belief network is suitable for static features. Since the input of the deep belief network is a binary vector, one-hot encoding is used to encode the static features into binary vectors.
The process of one-hot encoding is to convert discrete features into corresponding binary sequences. For example, the discrete values of 0, 1, 2, 3 are encoded to binary sequences of 0001, 0010, 0100, 1000. For discrete features with large value range, the feature vector will be sparse if one-hot encoding is used. For this case, the value range can be finely classified to reduce the feature dimension after one-hot encoding. After one-hot encoding, all static features are concatenated into a binary vector, which is the input of the DBN.
After acquiring the dynamic features of the Android application, the dynamic features are formed into an operation sequence in chronological order. Because the dynamic features are correlated in time, the dynamic behavior of the Android application software can be better fitted through the recurrent neural network. The dynamic feature vector is the input of the GRU network.
Entity embedding is a method of data representation. It encodes structured discrete variables and tries to make the data representation retain the continuous relationship between data. In the implementation, the Keras library in Tensorflow is used to implement entity embedding. There are 48 discrete values (
According to the different characteristics of the static and dynamic features of the Android applications, a deep learning model based on a combination of deep belief network and gate recurrent unit is proposed. The advantage of using the DBN is that the learning speed of static features of Android applications is faster and the performance is better. Compared with the traditional RNN model, GRU can perform better in dealing with longer time operation sequences, with fewer parameters, faster training speed, and less data required to achieve good generalization effect. Therefore, the GRU neural network is more suitable for processing the dynamic features of Android applications.
The DBN-GRU hybrid model for Android malware detection is shown in Figure
DBN-GRU hybrid model for Android malware detection.
DBN is a widely used deep learning framework [
In Figure
Structure of DBN.
DBN uses CD algorithm to optimize the parameters of each RBM during pretraining; Figure
DBN pretraining process.
CD algorithm is used to train the weight matrix
During the pretraining process, the effect of the pretraining is greatly related to the relevant parameters. There are mainly three parameters for pretraining: the learning rate, batch_size, and epochs. Unlike the learning rate in the subsequent fine-tuning stage of the back propagation neural network, in the pretraining of the DBN, the learning rate usually does not need to be changed, but an appropriate value needs to be set. If the learning rate is too large, the model may fail to converge or even diverge, and if it is too small, the gradient descent can be slow.
Using the method proposed by Smith [
A large batch_size can reduce training time and improve stability, but as batch_size increases, the performance of the model will decrease. Therefore, an appropriate batch_size needs to be chosen. Considering the size of the dataset used in this paper, the batch_size is set to 256.
Finally, the epochs of pretraining: according to the value of the learning rate and the value of batch_size, the epochs are set to 10.
Recurrent neural network (RNN) can better deal with the time series data, so it is suitable for training the dynamic behavioral characteristics of Android applications. The traditional RNN has the problem of disappearing gradients, which is especially serious when the time series is long. To solve this problem, researchers made improvements to the RNN and got a variant, that is, GRU [
The GRU neural network is shown in Figure
GRU neural network.
The internal structure of the GRU is shown in Figure
Internal structure of GRU.
The internal calculation process of GRU is as follows: Activation function is tanh:
Calculate the states of gate
Reset using reset gate
Update the memory. The closer the gate signal is to 1, the more information it memorizes, and the closer it is to 0, the more information it forgets:
As mentioned above, combining
Our model uses a single-layer GRU. If a second layer is added, it can capture higher-level correlation of dynamic behaviors theoretically, but this is based on the large number of Android applications and high-dimensional input vectors. For the input 24-dimensional dynamic feature vectors and the training samples that have just passed 10,000, the multilayer GRU neural network will not produce good results and may even cause overfitting.
Parameter initialization refers to the process of initializing the weights and biases before the network training. The initialization of parameters is related to whether the network training can yield good results and converge at a faster speed. If the parameter is too small, the input signal of neuron will be too small, and the signal will slowly disappear after multiple layers. If the parameter is too large, the input signal will be too large, and the activation value is saturated causing the gradient to be close to zero.
The Gaussian distribution is used to initialize the weight matrix parameters. The parameters follow a Gaussian distribution with a fixed mean and a fixed variance. Assuming the number of neurons of a certain layer is
Back propagation neural network uses supervised learning methods to compare the classification results with the labels of Android applications to fine-tune the hybrid deep learning model. This paper uses the BP neural network combined with the softmax classifier, and the gradient descent algorithm is applied to fine-tune both DBN and GRU networks.
Softmax is used in the multiclassification process to map the output of multiple neurons to the (0, 1) interval for classification. Softmax is calculated as
The main function of the BP neural network is to fine-tune the parameters of the deep learning model, and the main basis in the fine-tuning process is to find the global minimum value of the loss function of the model. At this time, the parameters of the corresponding weight matrix are the global optimal. For the softmax classifier, the cross-entropy loss function is selected, as shown in
The cross-entropy loss function not only can effectively measure the similarity between the calculated value and the actual value, but also has a simple form and is easy to calculate and partial derivative, which is very convenient in the gradient calculation.
In the process of back propagation, the most commonly used optimization algorithm is to fine-tune the parameters by gradient descent. In order to solve the problems in the gradient descent method, an improved method mini-batch gradient descent is proposed, which can reduce the fluctuation of parameter update and finally get better results and more stable convergence. However, there are still some problems. For example, it is difficult to choose a suitable learning rate and it is easy to fall into the local optimum. Therefore, some algorithms for further optimization of gradient descent are proposed, such as Momentum, Adagrad, and RMSprop.
In this paper, the Adam (Adaptive Moment Estimation) algorithm is selected for fine-tuning. It is a combination of RMSProp algorithm and Momentum algorithm. The main feature of Adam is the adjustment strategy of the learning rate. The first moment estimation (the mean) and second moment estimation (the uncentered variance) of the gradients are used to adjust the learning rate of each parameter dynamically. The main advantage of Adam is that, after the bias correction, the learning rate of each iteration has a certain range, which makes the parameters relatively stable and has low memory requirements. And in actual operation, the Adam algorithm is simple to use and does not require manual parameter adjustment.
The dataset is divided into benign samples dataset and malware samples dataset. The type, source, and quantity of samples in the dataset are shown in Table
Android malware and benign samples.
Type | Source | Number | Total |
---|---|---|---|
Benign | Google Play | 5000 | 7000 |
APKpure | 2000 | ||
Malware | VirusShare (nonobfuscated) | 4038 | 6298 |
PRAGuard (obfuscated) | 2260 |
The frequently used features between obfuscated malware samples and nonobfuscated malware samples are analyzed. Figures
Top 10 features of nonobfuscated malware.
Top 10 features of obfuscated malware.
Permission-related features (such as Read_SMS, Write_SMS, etc.) of both sample types are used frequently, because permission features are difficult to be obfuscated, and obfuscating the permission features will destroy the inherent structure of APK. However, some sensitive API features (such as Telephonymanager_Getdeviceid, etc.) are used frequently in nonobfuscated malware samples but are used very rarely in obfuscated malware samples, which shows that the malware samples after obfuscation can avoid related detections when calling sensitive APIs.
It is worth noting that Stat_Cert_Diff, the top-ranked feature in the obfuscated malware samples, is a resource feature related to the certificate. It detects whether the time when the certificate is generated and the certificate is used to sign the APK is the same time. The frequency of this feature is high, which indicates that most of the obfuscated malware samples are generated through automatic repackaging. It also shows that the features extracted in this paper (such as Stat_Cert_Diff, Stat_Reflection) are very effective in detecting obfuscated malware samples.
Evaluate the detection effect of the hybrid deep learning model (DBN-GRU) on Android malware through the indicators of precision, recall, and accuracy; the results are shown in Table
Detection effects of different algorithms.
Benign samples | Malware samples | Accuracy | |||
---|---|---|---|---|---|
Precision | Recall | Precision | Recall | ||
SVM | 92.74 | 88.91 | 88.38 | 92.38 | 90.57 |
KNN | 83.83 | 85.65 | 83.90 | 81.90 | 83.86 |
Naïve Bayes | 86.54 | 78.26 | 78.45 | 86.67 | 82.27 |
DBN | 95.98 | 93.49 | 93.06 | 95.71 | 94.55 |
GRU | 93.78 | 91.74 | 91.16 | 93.33 | 92.50 |
DBN-GRU |
The DBN model uses static features. Although the antiobfuscation capability of the static features proposed in this paper has been greatly improved, it is still insufficient in capturing the dynamic behaviors of malware. The GRU model uses dynamic features, which has advantages in dynamic behavior analysis but is insufficient in the types of features. According to the experimental results, based on the combination of dynamic features and static features, the hybrid model of DBN and GRU can improve the detection ability and achieve a better detection effect.
Based on the collected samples, three sample datasets are constructed to evaluate the detection effect of the Android malware detection model on obfuscated samples and nonobfuscated samples. The composition of these three datasets is as follows: Nonobfuscated dataset: benign samples + VirusShare Obfuscated dataset: benign samples + PRAGuard Mixed dataset: benign sample + VirusShare + PRAGuard Each dataset is divided into a training dataset and a testing dataset, with 2/3 of the samples as the training dataset and 1/3 of the samples as the testing dataset.
Table
Accuracy of different training datasets.
Training dataset | Testing dataset | |
---|---|---|
Nonobfuscated dataset | Obfuscated dataset | |
Nonobfuscated dataset | 96.89 | 89.51 |
Obfuscated dataset | 92.32 | 96.58 |
Mixed dataset | 96.78 | 96.24 |
Then, analyze the situation when the training dataset and the testing dataset are different. Using the nonobfuscated dataset for training and the obfuscated dataset for testing, the accuracy rate drops significantly, being 89.51%. Using the obfuscated dataset for training and the nonobfuscated dataset for testing, the accuracy rate also drops to 92.32%. Using the mixed dataset for training, no matter whether the obfuscated dataset or the nonobfuscated dataset is used for testing, the accuracy rates are higher, 96.78% and 96.24%, respectively.
The experimental results show that the richer the sample types of the training dataset, the higher the detection accuracy. The mixed training dataset contains nonobfuscated malware and obfuscated malware. The detection accuracy of all types of testing datasets is high, which can meet the needs of malware detection in actual network.
200 samples chosen from the malware sample dataset are repackaged. The process of repackaging is very simple; just uncompress and reassemble the application software without changing any functions. In that case, the MD5 or SHA hash value of the repackaged application software will be different from the original value. Then, using mainstream antivirus software and the DBN-GRU hybrid model proposed in this article for detection, the results are shown in Figure
Detection effect of repackaged Android malware.
Due to the widespread use of obfuscation techniques in malware, the effect of traditional detection methods is greatly affected. This paper combines dynamic analysis technology and static analysis technology for Android malware detection and builds a hybrid deep learning model based on DBN and GRU. In order to deal with the obfuscation technology, new static features with strong antiobfuscation capabilities are added, and the dynamic features of the application software at runtime are extracted to enrich the Android malware feature set. According to the different characteristics of static features and dynamic features, a hybrid deep learning model with DBN and GRU is used for learning, and the detection effect of this model is verified through comparative experiments.
Due to the limited computing resources of mobile devices, and the fact that deep learning is a compute-intensive task, the Android malware detection model proposed in this paper is suitable for running on high-performance computers. In order to solve this problem, the cloud antivirus technology is recommended, the mobile phone client is responsible for uploading suspicious files, and the cloud-based server is responsible for sample analysis and detection.
The research has the following deficiencies and needs to be improved in future research work. First, the number of samples, especially malware samples, is not enough, and the representativeness of the obtained malware features is still not strong. It is necessary to constantly enrich the types and number of samples in the malware sample dataset; second, the calculating consumption of the hybrid model is larger than that of the separate model and the traditional machine learning algorithm, so further improvement and optimization is needed to reduce the time cost.
The Android malware samples used to support the findings of this study can be downloaded from VirusShare (available at
The authors declare that they have no conflicts of interest.
This work was supported by the National Cryptography Development Fund of China (grant no. MMJJ20180108) and the Fundamental Research Funds for the Central Universities of PPSUC (grant no. 2020JKF101).