A Low Computational Cost Method for Mobile Malware Detection Using Transfer Learning and Familial Classification Using Topic Modelling

With the extensive use of Android applications, malware growth has been increasing drastically. The high popularity of Android devices has motivated malware developers to attack these devices. In recent times, most researchers and scholars have used deep learning approaches to detect Android malware. Although deep learning techniques provide good accuracy and efficiency, they require high computational cost to train huge and complex data sets. Hence, there is a need for an approach that can efficiently detect novel malware variants with a minimum computational cost. This paper proposes a novel framework for detecting and clustering Android malware using the transfer learning and the topic modelling approach. The transfer learning approach minimizes new training data by transferring well-known features from a qualified source model to a destination model, and hence, a high amount of computational power is not required. In addition, the proposed framework clusters the detected malware variants into their corresponding families with the help of Latent Dirichlet Allocation and hierarchical clustering techniques. For performance assessment, we performed several experiments with more than 50K Android application samples. In addition, we compared the performance of our framework with that of similar existing traditional machine learning and deep learning models. The proposed framework provides better accuracy of 98.3% during the classification stage by using the transfer learning approach as compared to other state-of-the-art Android malware detection techniques. The high precision value of 98.7% is obtained during the clustering stage while grouping the obtained malicious applications into their corresponding malware families.


Introduction
Among several mobile phone devices, the Android operating system has shown an exponential growth in the last few years due to its numerous potential benefits such as open-source nature, user convenience, and extensibility. Android's openness allows users to download and install billions of free applications. Although Android applications are scanned frequently with Google Play Protect (https://developers.google.com/android/play-protect), the users can still disable the scanning mechanism and download applications from third-party stores. This motivates malware developers to publish their repackaged malicious Android applications in suspicious third-party stores or websites. When a user downloads and installs repackaged malicious apps, the device security is compromised. According to Kaspersky's security malware threat report (https://securelist.com/it-threat-evolution-q1-2021-mobile-statistics/102547/), in the third quarter of 2020, approximately 350,000 novel malware applications were detected, of which most of the malicious applications were COVID-19-themed applications. Figure 1 shows the popular Android malware detected in the last quarter of 2020. In the first quarter of 2021, an increase of more than 10,000 malicious samples was seen as compared to the previous quarter.
Conventional malware detection methods have several limitations. With the extensive utilization of machine learning and deep learning techniques in recent years, Android malware detection using these techniques has appeared, and the accuracy of malware detection mechanisms has been significantly improved.
In response to the intense growth of Android malware, extensive research has been conducted on techniques for detecting Android malware using deep learning. Researchers have proposed several techniques using various deep learning models and have obtained many research outcomes. However, malware detection techniques based on deep learning require a large amount of labeled data points to identify malicious threats with optimum accuracy. In most cases, the data set available for detecting new malware threats is not large, and collecting a new data set increases the search time. Moreover, to identify a new malware threat, the deep learning models need to be trained again on a new data set from scratch, which is both resource-consuming and time-consuming. One efficient solution to overcome the issue of high computational complexity and model retraining is to use a deep transfer learning technique. The major aim of utilizing the transfer learning approach in our study is to reduce the computational complexity by transferring well-known feature sets from a trained base model to a destination model with very little training data. The detailed description of the transfer learning approach is given in Section 3.

Our Contribution.
This paper proposes a two-stage framework to detect Android malware and group them into their respective families. In the first stage, we performed visualization-based detection using a traditional CNN (Convolutional Neural Network) model and a transfer learning model. Visualization helps to depict malicious behavior because the graphical features of an obfuscated malware do not change during the detection process [1]. CNNs are used to extract visual features because CNNs can efficiently extract deep features from images and videos. Further, we transfer the CNN methodology to the classification phase using transfer learning to speed up the convergence.
This provides an advantage of training large networks on both large and small novel data sets with less computational cost. The key goals and advantages of using transfer learning in the initial stage are as follows:
(i) Knowledge utilization: the source model's knowledge can be used to train a new target task. Hence, the new target model does not require training from scratch.
(ii) Model development time: the overall time required to develop and train a new model reduces drastically because only the last few data set-specific layers need to be trained.
(iii) Overfitting issues: the problem of overfitting occurs when traditional machine learning and deep learning models are trained on a small data set. Transfer learning overcomes this issue by fine-tuning the model layers.
(iv) Computational cost: deep learning models need a high computational power to train complex and hybrid data sets. This expensive computational cost can be reduced by applying the transfer learning approach.
In the second stage, we utilized two well-known topic modelling approaches, LDA (Latent Dirichlet Allocation) and hierarchical clustering, to categorize the families of the malicious applications obtained from the classification phase. To cluster the applications, the opcode vector sequences of malicious applications are extracted and the applications with similar functionality are grouped into a common malware family. In our analysis, we considered more than 50K APKs (Android Application Packages). The samples were collected from the official play store (https://play.google.com/store), Drebin data set [2], Malgenome data set [3], and AAGM (Android Adware and General Malware) (https://www.unb.ca/cic/datasets/android-adware.html) data sets that are publicly available for research. These data sets are described in detail in Section 3.
Our experimental results show that the APK parameters learned from the traditional model can be transferred to a target model to classify real-life data sets. By using the conventional CNN technique, we achieved an accuracy of 95.4%, whereas the transfer learning technique improves the accuracy to 98.3%. For familial classification, we utilized the LDA clustering approach. To reduce the irrelevant topics, we used a hierarchical clustering technique which provides the highest precision of 98.7%. The proposed framework is more accurate and less time-consuming and does not require a huge amount of computational power and resources. Moreover, the familial classification provides a deep understanding of Android malware. Combining transfer learning with topic modelling and hierarchical clustering provides benefits such as reduced computational power, fast and accurate familial classification, removal of overfitting issues for smaller data sets, and flexibility to add, remove, and transfer features to train new models. The major contributions of our work are summarized below:
(i) The CNN model is used to identify and locate the visual prototype of Android malware. Root-level features are extracted from malicious images.
(ii) Transfer learning is adopted to accelerate the training phase of the CNN model. Using transfer learning, the model was transferred to the classification phase. The experimental outcomes provided higher accuracy (98.7%) and fewer false positives.
(iii) Topic modelling approaches are used to group malicious apps into their respective families. This grouping provides in-depth knowledge of malware behaviors.
Applied Computational Intelligence and Soft Computing
The rest of the article is arranged as follows: Section 2 presents the related work. Section 3 demonstrates the proposed framework methodology, workflow, detection strategy, experimental setup, evaluation outcomes, and comparative analysis. Section 4 describes familial classification, assessment methods, and evaluation results. Finally, Section 5 concludes the study and discusses further studies.

Android Malware Detection Based on Static and Dynamic Approaches.
The scholarly world has proposed several techniques and solutions for detecting Android malware. The static analysis approach, which analyzes static source code features, is one of the most widely used approaches for detecting Android malware. Detection methods based on the static analysis approach include [4,5]. Although this approach is fast, it fails to identify obfuscated malware [6,7]. To overcome this issue, various solutions have been proposed based on the dynamic analysis approach. Rather than checking the source code, the dynamic analysis approach actually executes applications in virtual emulators or real devices to monitor real-time suspicious activities. This requires a large amount of resources and computational hardware.
Wong and Lie proposed IntelliDroid [8], an input generator to analyze Android applications. The proposed generator provides input relative to a dynamic analysis tool and can be paired with dynamic analysis tools. The authors experimented on various inputs and successfully identified malicious behavior and extracted the malicious paths.
Bhatia and Kaushal [9] presented an approach to detect Android malware by performing runtime behavior analysis of the applications. The authors extracted and collected system call traces and identified the frequency of feature sets. They utilized the J48 Decision tree and Random forest algorithms to calculate the accuracy. The results were good; however, their model was trained on a small-sized data set of 50 applications.
Lorenzo et al. proposed the VizMal tool [10] to analyze the runtime traces of Android applications. The authors traced the execution path of the applications and identified vulnerable checkpoints. Though the proposed tool works well for identifying malicious traces, the implementation cost and complexity are high.

Android Malware Detection Based on Learning Algorithms.
The difficulty of manually identifying malicious applications has motivated researchers to utilize machine learning techniques to accelerate and automate the detection mechanism. Popular research studies utilize machine learning algorithms to detect and classify zero-day Android threats.
Unlike machine learning, deep learning techniques automatically choose relevant features to train the detection model efficiently. Therefore, high-level domain information and manual feature selection are not required. This has motivated the researchers to propose various deep learning-based Android malware detection tools and techniques.
Karbab et al. [11] proposed the MalDozer framework which can automatically detect Android malware and provides familial classification. The authors utilized deep learning techniques to find malicious applications. They extracted several features from API code sequences of the data set. The framework was implemented on a data set containing more than 30k malicious samples out of 70k total samples. The authors achieved a low false-positive rate, but the framework requires high-end computations to achieve higher efficiency. Yuan et al. [12] implemented DroidDetector, a malicious APK detection engine to detect Android malware. The authors tested thousands of applications and performed a deep investigation using deep learning to detect malware. The proposed engine provided 96.7% accuracy. However, a greater number of critical features can be included in the training stage to improve the accuracy further.
Wang et al. [13] provided a comprehensive survey of deep learning techniques to detect Android malware. The authors investigated several studies based on deep learning architectures and provided a comparative summary. Mu et al. [14] utilized a text classification technique to detect Android malware. The authors extracted API instructions based on the Cuckoo sandbox and applied text processing to detect Android malware. Furthermore, the authors compared the accuracy of their technique with BGRU and LSTM techniques [15]. Their results showed better accuracy; however, the authors did not cluster the detected malware into their families.
Kim et al. [16] proposed a multilayered deep learning framework for Android malware detection. The authors extracted several useful features and refined them using the feature set vector generation technique. Moreover, the authors compared the performance of the proposed framework with other similar deep learning techniques. The proposed framework provided an overall better efficiency; however, it requires high computational power to execute multiple layers. Masum and Shahriar [17] proposed a framework named Droid-NNet to classify Android malware using deep learning. The authors utilized the publicly available data sets Malgenome-215 [3] and Drebin-215 [2] to compare the performance of their framework with machine learning-based Android malware detection techniques. Their proposed framework provides a good F-beta score and low false positives. However, the evaluation was restricted to two public data sets.
Azmoodeh et al. [18] presented a deep learning framework to identify Internet of Battlefield Things (IoBT) malware using Android opcode sequences. The authors converted opcodes into vector space and applied deep Eigenspace learning. Finally, the authors showed the robustness of the proposed technique against code injection attacks. Feng et al. [19] presented a two-layer approach to detect malicious Android applications. The authors utilized the first layer to extract static features such as intents, components, and permissions. The outcomes of the first layer are fed into the second layer, in which CNN and AutoEncoder are used for malware detection. The experimental outcomes provided a good detection rate, but the proposed framework requires high computational speed for training complex data sets.
Alotaibi [20] presented a new framework called Mal-ResLSTM to detect and classify Android malware. The proposed framework is based on deep residual LSTM. The author captured static features from applications and converted them into vector space. The obtained vector space is then fed into a deep LSTM network to classify malicious and normal applications. The author utilized the Drebin data set for performance evaluation. Booz et al. [21] utilized deep neural networks to classify Android malware by analyzing critical system permissions and third-party permissions. The authors applied the grid search technique to identify several combinations of hybrid attributes for deep learning methods. The obtained results had good accuracy, but the searching process required a high number of processing cores to handle complex data sets.
Singh et al. [22] proposed a machine learning-based framework to classify Android applications. The authors extracted gray-scale images from the Drebin data set file and obtained manual features by exploiting the APK files. To extract image files, algorithms based on image processing are utilized. The authors obtained an accuracy of 93%. The framework is capable of classifying Android applications, but overfitting problems can occur if the model needs to be trained on a large data set.
Angelo et al. [23] proposed an approach based on the exploitation of API transitions in the call sequence. The extraction of a subsequence of API calls resulted in a malware classification resistant to evasion techniques. The authors compared the detectors using Markov chain and call sequence algorithms. The study outcomes outperformed various malware detection techniques.
Ficco [24] proposed an approach based on ensemble detection. The author utilized a blend of generic and specialized detectors during the analysis process to enhance the detection randomness and to improve the overall detection rate. Moreover, an alpha-count mechanism is also presented to differentiate the speed of various detectors. This mechanism provides the observation time window length which can affect the detection accuracy.

Android Malware Detection Based on Robustness.
Cai et al. [25] presented a novel dynamic and robust Android app classification technique called DroidCat based on dynamic method calls and ICC intents. The proposed technique efficiently handles reflection without depending on system calls and permissions. Moreover, the technique provides better robustness when compared to other similar state-of-the-art static and dynamic malware detection techniques. The authors extracted features from a behavioral categorization study. In addition, 34,343 apps collected from distinct sources over nine years were used in the evaluation, and the performance measure was calculated. They achieved a better F1 score of 97% when compared with two state-of-the-art techniques.
Suarez-Tangil et al. [26] presented the DroidSieve approach based on static features classification. The proposed method exploits obfuscating feature sets and processes the static features in parallel. The authors achieved a detection accuracy of 99.82% with no false positives and familial classification accuracy of 99.26%.
Suarez-Tangil and Stringhini [27] conducted a deep analysis of Android malicious app behavior covering more than 1.2 million malware samples from 1.28K families during a period from 2010 to 2017. The authors aimed to understand the evolution of repackaged malware, separated the components that were unrelated to malware, and analyzed the behavior of malicious riders using differential analysis. The samples were collected from distinct antivirus vendors.
Cai [28] conducted a study to build a systematic environment that continuously mines the mobile software ecosystem. The author focused on the behavioral evolution and performed a large-scale characterization study of the ecosystem. Further, an ecological interaction among three attributes, namely, user apps platform, mobile platform, and app users, was considered. The results support sustainable app development and security.
Cai et al. [29] presented a study on the execution strategy of Android applications. The authors analyzed the behavior of Android applications in relation to malware, concerning execution paths, structures, methodology scopes, and callbacks. They observed the app execution structure concerning the security platform. Further, the authors traced ICCs and methods of more than 30,000 applications including 15,451 benign apps and 15,183 malicious apps created from 2010 to 2017. Among these apps, the behavioral structure and similarities were identified to trace the difference in the apps.
Cai and Ryder [30] presented a study on Android apps based on longitudinal characterization to identify and observe the build and execution nature of the applications. The authors utilized a lightweight static approach to analyze the execution code of 17,664 applications developed over eight years. Further, they found that applications' functionalities strongly depend on the Android system architecture, and this dependency tends to increase with time. Also, the Activity components invoke life-cycle callbacks, and the event callbacks rely on the user interface rather than the system interface. In addition, the ICCs do not contain data payloads.

Android Malware Detection Based on Sustainability.
Fu and Cai [31] investigated the deterioration issues in Android malware detectors. The authors analyzed four state-of-the-art detectors and concluded that the performance of the existing solutions degrades with time. Moreover, the authors proposed a novel approach based on a longitudinal characterization study of applications focusing on runtime behaviors. A comparison was also performed between the proposed approach and the four state-of-the-art techniques to analyze the deterioration problem.
Xu et al. [32] proposed DroidEvolver to detect Android malware which can update automatically without human intervention. The model does not require retraining; it is updated through online learning techniques, and hence, a high computational cost is not needed. The authors experimented on 33,294 benign apps and 34,722 malware apps that were created over six years. Further, the authors compared the model with the state-of-the-art MamaDroid model [33], and DroidEvolver outperformed it in terms of average F1 measure.
Cai [34] proposed a practical malware detector that can sustain over time without the need for retraining. The author deeply analyzed sustainability issues related to learning-based classifiers. Initially, the author defined sustainability metrics and created the DroidSpan classification system which focuses on sensitive access distribution capturing. Moreover, the author evaluated and compared the sustainability of DroidSpan with five state-of-the-art detectors using 13,627 benign apps and 12,755 malware apps. The proposed system outperformed the baseline detectors in sustainability.
Cai and Jenkins [35] proposed a sustainable Android malware detector that can detect novel malware without retraining. The authors investigated the runtime behaviors of applications and discriminated benign apps from malware by analyzing the behavior. The proposed model can sustain over five years without the requirement of frequent retraining.

Proposed Framework
In this article, a two-stage framework is proposed to classify and cluster Android applications. During the first phase, the Android applications were collected from various sources such as Google Play Store (containing benign APKs only) and the Malgenome, Drebin, and AAGM data sets. The feature sets are extracted from the collected APK files, and these feature sets are preprocessed and converted into binary vectors. To categorize the applications into malicious and benign apps, static and dynamic parameters are extracted from a set of Android applications.
These applications are obtained from the official play store, third-party app stores, and publicly available data sets. Static parameters [36] include system services, intent, version, manifest permissions, strings, and components. These parameters provide metadata information stored in the application. Unlike static parameters, the dynamic parameters [37] such as system calls, files, logs, and network activities provide information about the behavior and control flow of an application. Further, the binary vectors are converted into gray-scale images, and these images are used as inputs to the source model for training and classification purposes. The generation process of images from Android APK files is demonstrated in Figure 2, and Figure 3 shows the proposed framework. The static and dynamic feature sets are discussed below:
(i) Manifest.Permissions: every Android application contains the Manifest file, which is used to provide information about packages, strings, resources, services, etc. One of the package files inside the manifest file is the manifest.permissions file which is used to provide permission to applications. This file is used to validate permissions during the installation process of the application. Critical permissions include access to SMS, location, and storage.
(ii) API calls: Application Programming Interface (API) calls are runtime function calls. They are initiated when an application wants to interact with system resources.
(iii) String values: these values are used to provide text information of resources. They are normally stored in the Strings.xml file. Information such as app history, version, list of permissions, and app size is contained in this file.
(iv) Intent: intents are used to trigger activities in an application. When an application wants to perform a task, intents are triggered to provide runtime APK binding.
(v) Dalvik code: these are executable codes that are obtained from Java bytecodes. In recent Android versions, the Dalvik virtual machine is no longer used; it has been replaced by the ART (Android Runtime) library. The ART library is used to provide debugging options to identify bugs in the application.
(vi) Services: Android services are the components responsible for background activities while an application is running. Services do not require user intervention.
(vii) Version: this specifies the information about the current version of an APK file. Versions usually change when an application is updated.
(viii) Component: Android APKs are categorized into components for better storage. The components store activity, intents, permissions, and resource files.
(ix) System calls: the system calls are used by the applications to access Android operating system resources. They work as system-level APIs to interact with system files.
(x) Runtime libraries: Android runtime (ART) is used to provide diagnostic and debugging options.
The feature sets are converted into feature vectors and ranked according to their importance. For example, the importance of parameter "version" is less than the importance of parameter "Permissions." Similarly, all the feature parameters are ranked according to their priority. By ranking the parameters, the insignificant features can be removed and filtering can be done. After filtering the feature sets, the significant feature sets are converted into binary gray-scale images. This conversion is described in the next section.
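The ranking, filtering, and binary encoding described above can be illustrated with a minimal sketch. The feature names, importance scores, and threshold below are hypothetical placeholders chosen for illustration; they are not values reported in this study.

```python
# Illustrative sketch only: feature names, scores, and threshold are assumptions.
FEATURE_IMPORTANCE = {
    "permission.SEND_SMS": 0.9,
    "permission.ACCESS_FINE_LOCATION": 0.8,
    "api.getDeviceId": 0.7,
    "intent.BOOT_COMPLETED": 0.6,
    "version": 0.1,
}


def select_features(importance, threshold):
    """Rank features by importance (descending) and drop those below threshold."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [name for name in ranked if importance[name] >= threshold]


def to_binary_vector(app_features, vocabulary):
    """Encode an app as 1/0 flags, one flag per retained feature."""
    present = set(app_features)
    return [1 if name in present else 0 for name in vocabulary]


vocabulary = select_features(FEATURE_IMPORTANCE, threshold=0.5)
app = ["permission.SEND_SMS", "intent.BOOT_COMPLETED", "version"]
vector = to_binary_vector(app, vocabulary)
```

In this toy run, the low-ranked "version" feature is filtered out before encoding, which mirrors the example given in the text where "version" is less important than "Permissions."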
3.1. Workflow.
In our article, we collected testing samples from Android's official play store using a crawler tool. This tool crawls over the entire application database and extracts APK files from it. The applications stored in the play store database are usually benign because Google uses the Google Play Protect tool to periodically scan applications in real time. If an application is removed from the play store, it indicates that the application is either vulnerable or suspicious. After obtaining the benign APKs using a crawler tool, the benign app data set and hybrid data set (containing benign and malicious applications) are filtered, and irrelevant features are removed as shown in Figure 4. The obtained feature set is converted to gray-scale images by selecting the binary values from APK files. Furthermore, the gray-scale images are used as an input to the CNN model for training purposes. To reduce the complexity, the initial layers of the CNN model are transferred to a novel model using transfer learning. The final few layers were fine-tuned and updated. Finally, the malicious applications obtained after the training and classification phase were used for the familial classification process. In the overall workflow, the filtering and scanning processes are performed rigorously to obtain real-time updates.

Generating Images from Android Applications.
According to the research criterion, APK visualization can be performed efficiently by using static features of the APK file such as AndroidManifest.xml, Dalvik files, string xml files, and resource files [38]. In this article, the malicious images were extracted using these files from malicious APKs. Gray-scale images are obtained by converting the files into binary vector pixels. The data set APK archives are extracted to generate the components required for creating image data sets. The APK data have been interpreted as a binary stream and kept in a binary array vector matrix. The APK files are disassembled to generate the 8-bit binary files, and then, they are mapped to the gray-scale range of an image, which is generally 0-255. The binary streams are transformed into a vector array matrix to construct a gray-scale image as shown in Figure 2. The major drawback with the AndroidManifest.xml and resources.arsc files is that these files generated images of size 64 pixels, which cannot be expanded to 256 pixels because data loss occurs. As every byte in the binary vector matrix can have a value in the range 0-255, every byte has been transformed into a pixel. The image generation steps are given as follows:
(i) Step-1: the data sets having APK archives are extracted to obtain the files, namely, AndroidManifest.XML, Resources.arsc, Classes.dex, and jar files.
(ii) Step-2: the generated files are disassembled to produce 8-bit binary files. The data in the files are interpreted as binary data, and the binary vector streams are generated.
(iii) Step-3: the binary vector streams are transformed into an 8-bit array vector matrix.
(iv) Step-4: the gray-scale images are constructed using the array vector matrix and are stored in an image data set.
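Steps 2 to 4 above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bytes_to_grayscale`, the fixed row width of 64 pixels, and the zero-padding of the last row are all choices made here for illustration, and the sample bytes merely stand in for the content of an extracted file such as classes.dex.

```python
import math


def bytes_to_grayscale(data, width=64):
    """Map each byte (0-255) to one pixel and arrange the pixels row by row.

    The last row is zero-padded so that every row has `width` pixels
    (the padding strategy is an assumption of this sketch).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = bytes(data) + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Stand-in for the bytes of an extracted file; not real APK content.
sample = bytes(range(256)) * 2 + b"\x01\x02\x03"
image = bytes_to_grayscale(sample, width=64)
```

The result is a list of rows of integer pixel intensities. Writing it to an image file would require an imaging library, which is left out of this sketch.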
The generated image is used as an input for the traditional CNN model and transfer learning model. In the transfer learning model, the last few layers are updated and initial layers are frozen as these layers specify generalized feature sets. The reason for choosing the CNN model is that it is capable of visualizing the overall geographical areas of an image. Transferring the learned attributes of initial layers helps the novel model to train similar data sets more efficiently with less time.

Convolutional Neural Networks.
A CNN is a type of neural network that is feedforward in nature. CNNs are best suited for classifying images and videos as they efficiently extract geographical parameters and provide good accuracy with very few false positives [39,40]. CNNs can extract data set parameters in narrow image regions as well as full image frames. In the initial stage, CNNs contain a convolutional layer and a dense layer that are interconnected with each other. The important parameters are separated from the input data set in the training phase with the help of the convolutional layer. In this way, the input data set is reduced, and the training process is accelerated. To further reduce the feature-set size, CNNs contain a max pooling layer which is responsible for merging various neurons to a maximum value. The above-mentioned CNN layers and a completely connected dense layer form the CNN architecture as shown in Figure 5.
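To make the roles of the convolutional and max pooling layers concrete, the following sketch applies one convolution and one pooling step to a tiny image in plain Python. The 5 × 5 input and the 2 × 2 edge kernel are made-up values for illustration; in a real CNN, the kernels are learned during training and a deep learning library would perform these operations.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical gray-scale patch with a bright vertical stripe.
image = [[0, 0, 255, 255, 0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to left-to-right intensity increases

features = conv2d_valid(image, kernel)  # 4 x 4 feature map
pooled = max_pool(features, size=2)     # 2 x 2 after pooling
```

The convolution highlights where the stripe begins, and pooling halves each dimension while keeping the strongest response, which is how the feature-set size is reduced.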
A completely connected dense layer takes an input feature set obtained from previous layers and generates output. The most useful advantage of using CNNs is that they can be fine-tuned to extract essential features from malicious data set images without performing feature engineering. However, CNNs are vulnerable to the overfitting problem for smaller data sets. To overcome the problem of overfitting, the training data can be normalized. A CNN model is able to accept images of any size. However, to overcome any data loss, we normalized the images to a scale, such as A × B. To input an image of size C × D, there are two possibilities as shown in the following equation:
Malicious APK image data set is generated and malicious features are extracted to classify malware graphically. The objective function of the generated image in the CNN network is defined as
where $p_i$ is the target probability of the anchor.
The cross-entropy is defined as
[7,41]. For example, a model trained on a large image data set can be utilized to train the target model on a smaller data set efficiently. This process is illustrated in Figure 6. Transfer learning reduces the computational cost and time complexity as training the model from scratch is not required. With the increase in network complexity, the time complexity, computational cost, and overfitting problems increase drastically. To resolve these issues, the features are extracted by applying the fine-tuning and freezing processes [42]. Using these two processes, the network features of the learning target can be modified. For example, freezing the initial few layers of the source model stops the generalized network features from being trained, while fine-tuning updates the higher-level model features. After fine-tuning the layers, the novel updated layers are attached to the source model. Figure 7 shows the basic architecture of the transfer learning approach.
In Figure 7, the starting layers of the model are frozen and the final layers are fine-tuned. This is achieved by changing the features at the last layers without modifying the initial layers. The reason for modifying the last layers is that these layers utilize most of the data set, whereas the initial layers deal with generic features such as the AndroidManifest file, version, metadata, strings, and other static features. By freezing the initial layers, the computational power requirement is reduced drastically, as only the last layers need to be trained. The necessary feature sets of the AndroidManifest.xml, classes.dex, and resources.arsc files present inside the image data set are stored in the final few layers of the model, and they are transferred to the target model to train hybrid and smaller image data sets. Hybrid data sets are prepared by using a blend of publicly available data sets. The extracted APK files are filtered to separate necessary files such as classes.dex, manifest.xml, and resources.arsc. This reduces detection time and computational power requirements. Because only the last few layers need to be changed, the learning rate is usually smaller than that of the traditional models.
The problem of overfitting is reduced by using transfer learning, as only essential features are transferred to the novel model and generalized features are frozen.

Detection Mechanism.
CNN models are typically trained on a large image data set. However, the available data set samples are limited, and training CNNs on smaller data sets creates the problem of overfitting. Transfer learning overcomes this issue by training the target model on selected layers only. In this article, we utilized transfer learning of a CNN network that is trained on a hybrid data set containing benign and malware applications, and we then transferred the learning to train on our Android APK gray-scale image data set. This process is demonstrated by the flowchart shown in Figure 8.
The pretrained CNN network model is transferred to the Android APK visual malicious classification model. This transfer learning reduces the training time complexity of the target classification model. The deep features of the model are able to capture the spatial features of images. The target model can be trained on a small data set with good accuracy and fewer false positives. To reduce errors in the image detection process, we define a hybrid loss function to fine-tune the network features. The schematic figure is depicted in Figure 9.

where $p_i$ is the target probability of the anchor, $v_i^*$ is the vector representing the coordinates of the bounding rectangle with respect to the positive flag, $v_i = (v_x, v_y, v_w, v_h)$ is the parameter coordinate vector of the predicted rectangle, and $S$ is the smooth function for the softmax layer. The main implementation consists of the following steps:
(i) Data set description: we utilized three distinct Android malware data sets: the Drebin data set containing 5560 APKs, the Malgenome data set containing 1200 APKs, and the AAGM data set containing 1900 APKs. These data sets are publicly available for research purposes. The Malgenome and AAGM data sets were selected to check the accuracy of the transfer model when it is trained on smaller data sets. Moreover, the overfitting problem also needs to be analyzed. To generate a purely benign data set, we utilized a crawler tool to extract benign APKs from the Google Play Store. Table 1 shows the various data sets used in our experimental study.
(ii) Pretraining the model: to pretrain the model, we utilized a combination of all the data sets, including benign and malicious APKs. Most of the general features are common to all the images.
(iii) Fine-tuning the layers: we applied forward and backward propagation methods to balance the weights of the pretrained model. This is required to freeze the general feature sets of the initial model layers. After freezing the initial parameters, we transferred the feature set to a novel model. We chose a learning rate of 0.001 to modify the convolutional layer. The deep feature sets obtained by the initial few layers are general, so we kept these layers as they are while transferring. The middle and last few layers affect the target output of the model, so we fine-tuned these layers and updated the learning rate to 0.0001. Finally, we set the objective function to train the novel model.
(iv) Model training: after obtaining the new configuration file, we trained the novel model by maintaining the learning rate of 0.0001. The process flowchart is depicted in Figure 10.
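The freezing and fine-tuning steps listed above can be sketched as a per-layer configuration. This is a minimal, framework-independent illustration: the layer names, the number of frozen layers, and the `LayerConfig` and `build_transfer_config` names are hypothetical, and only the fine-tuning learning rate of 0.0001 is taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerConfig:
    name: str
    trainable: bool
    learning_rate: float  # 0.0 for frozen layers


def build_transfer_config(
    layer_names: List[str], n_frozen: int, fine_tune_lr: float = 0.0001
) -> List[LayerConfig]:
    """Freeze the first n_frozen layers and mark the rest for fine-tuning.

    n_frozen is an illustrative assumption; the text only states that the
    initial layers are kept and the middle and last layers are fine-tuned.
    """
    if not 0 <= n_frozen <= len(layer_names):
        raise ValueError("n_frozen must be between 0 and the number of layers")
    config = []
    for index, name in enumerate(layer_names):
        frozen = index < n_frozen
        config.append(LayerConfig(name, not frozen, 0.0 if frozen else fine_tune_lr))
    return config


# Hypothetical layer names for a small CNN.
layers = ["conv1", "conv2", "conv3", "pool", "dense"]
config = build_transfer_config(layers, n_frozen=2)
trainable = [c.name for c in config if c.trainable]
```

In a deep learning framework, the same idea is usually expressed by disabling gradient updates for the frozen layers and passing only the remaining parameters to the optimizer.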
In Figure 10, the feature sets of malware APK files are stored in "dex" files. These "dex" files are disassembled and converted into binary files for binary classification. The pixels are obtained during the conversion process, and each binary file corresponds to a gray-scale image. The malicious images are stored in the "resource" folder inside the main file. The CNN model is trained on the hybrid data set containing benign and malware files. The config file that was used to train the CNN model is modified in such a way that the initial layers are kept as they are. The middle and final layers are modified, and this configuration is stored in the config file. Finally, the novel transfer model is trained using this modified config file.
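The conversion of a binary file into a gray-scale image can be sketched as follows. This is a minimal illustration under stated assumptions: the fixed row width, the zero padding of the last row, and the function name `bytes_to_grayscale` are hypothetical and are not specified in this work.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one gray-scale pixel, filling rows left to right.

    The row width and the zero padding of the final row are assumptions
    made for illustration.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the final row with black pixels
        rows.append(row)
    return rows


# Stand-in for the raw contents of a classes.dex file.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
```

Each byte value is used directly as a pixel intensity, so no information from the binary file is lost before the image is resized for the network.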

Training Methodology.
To distribute the data set equally, we utilized 7-fold cross-validation. The gray-scale images are fed into the CNN network in a randomized manner; for example, genuine and malicious APK images can enter the network in a single shot. This reduces the time required to serialize the images. We utilized the Tensorflow [43] and Torch [44] libraries in our experimental evaluation. To identify the hyperparameters, we utilized performance metrics such as "precision," "recall," "efficiency," and "F1-score." Total epochs of [15, 25, 35, 55, 75, 90, 100] are executed in our evaluation, considering the hardware and software computational power. To reduce the false positives, we utilized the "recall" score. We obtained the optimal blend of hyperparameters using 120 epochs for the Drebin data set and 15 epochs for the hybrid data set, respectively. The classification statistics for the hybrid data set are presented in Table 2. The outcomes are generated by training the CNN model on a larger hybrid data set. The classification statistics for the Malgenome and AAGM data sets are shown in Tables 3 and 4, respectively. The precision values are better than the precision values of CNN classification. This indicates that there are few false positives and that the training predictions are almost accurate. In addition, the binary classification for benign and malicious outcomes is obtained by setting the learning threshold to a very low value.
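The 7-fold split can be sketched with the standard library alone. The shuffling seed, the sample count, and the function name `k_fold_indices` are assumptions for illustration; the sketch only shows how each sample is held out exactly once.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 7, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # mix benign and malicious samples
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for held_out in range(k):
        validation = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, validation


# 23 is an arbitrary toy sample count.
splits = list(k_fold_indices(n_samples=23, k=7))
```

Every fold serves as the validation set once, and the per-fold scores can then be averaged to obtain the cross-validated score.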

Detection Strategy.
To detect the malicious samples, we executed the classification test on the hybrid data set comprising Play Store benign APKs, Drebin APKs, Malgenome APKs, and AAGM APKs, containing 5000, 5560, 1200, and 1900 samples, respectively. The evaluation of this data set produces a score of 95.2% with a false positive rate of 5.4%. This cross-validated score is computed per fold and is distributed evenly across the data set. Furthermore, we applied the transfer learning approach and achieved a cross-validated score of 98.3%. While applying transfer learning, we fine-tuned the hyperparameters of the CNN layer and dense layer and updated the configuration file. For this purpose, we utilized the objective hybrid loss function defined in Equation 2.
The transfer learning approach achieves better performance and fewer false positives than the traditional CNN model. The performance outcomes are presented in Table 5. According to Table 5, the transfer learning approach performs better in terms of accuracy, FPR, and computational cost, and it shows no overfitting issues. The transfer model also converges faster, as the complete model does not need to be trained from scratch.
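For reference, the detection metrics used in this comparison can be computed from confusion-matrix counts as in the following sketch. The counts in the example are placeholder values chosen for illustration and are not results from our experiments; the function name `classification_metrics` is likewise hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics, with malware as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0  # benign apps flagged as malware
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "fpr": fpr,
    }


# Placeholder counts, not experimental data.
metrics = classification_metrics(tp=90, fp=10, tn=90, fn=10)
```

A lower FPR means that fewer benign applications are wrongly reported as malicious, which is the property compared in Table 5.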
We also compared these results with those of similar studies. To compare the models, we chose the accuracy rate or efficiency rate, the number of false positives, and the percentage of malware detected. A comparative analysis of distinct techniques and models is shown in Table 6. Table 7 shows the comparison between the training time and computational power overhead of the CNN model and the transfer learning model for different data sets.

The study in [47] compared the training time and computational power overhead for the ELM (Extreme Learning Machine) classifier. The authors showed that the computational resource requirement increases with the data set size and complexity. In this work, we utilized the metric mean time to detect (MTTD) to evaluate the effectiveness of malware detectors. MTTD is a key metric widely used in cybersecurity. It provides the average amount of time taken to detect threats. From Table 7, it can be observed that the transfer learning model outperforms the traditional CNN model: the MTTD value for the CNN model is approximately 90,000 seconds, which is higher than that of the transfer learning model. The main reason is that, unlike the CNN model, the transfer learning model need not be trained from scratch. Moreover, the computational power overhead is higher for the CNN model than for the transfer learning model because every layer of the CNN model must be trained on the complex data set. Figure 11 shows the comparison of the average time taken to detect malware using the CNN model and the transfer learning model.
Due to the faster convergence of the transfer learning approach, the hyperparameters are utilized more effectively than in other approaches. The performance plots of the detection techniques are shown in Figures 12 and 13, respectively.
As depicted in Figure 12, the F-scores of the traditional models are lower than that of the transfer model. The main reason for this is that traditional models require significant computational power during the training phase. With the transfer model, the overall detection rate improved significantly with negligible false positives.
In Table 6, the accuracy of the proposed technique is compared with similar techniques or models. The proposed method has achieved a good accuracy.
The comparative results are shown in Figure 13. Moreover, the familial classification process also achieved a good accuracy, which is described in the next section. Table 8 shows the comparative analysis of various state-of-the-art Android malware detectors based on objective (i.e., detection, categorization, or both), familial clustering, robustness, and computational cost. The various detectors perform well in some respects; however, most of them lack robustness in detecting zero-day and obfuscated malware. Our proposed method detects and categorizes malware with a lower computational cost requirement. However, the robustness of the method needs to be enhanced with respect to obfuscation, especially resource obfuscation.

Limitations of the Proposed Method.
The proposed approach aims to detect and categorize Android malware with lower computational power requirements. However, several other important factors such as sustainability and performance deterioration need to be considered. The basic goal of our approach is to complement other detection approaches that require a lot of computational resources for training and testing the models. In our study, we focused on reducing the need for high-end GPUs and RAM and on overcoming the problem of overfitting in the case of smaller data sets. The robustness of the proposed method is not as good as that of some previous studies such as DroidCat [25], DroidSieve [26], and DroidSpan [51]. The main reason for this is that we have considered static feature sets along with dynamic feature sets in our work. The static features lack runtime behavior attributes, and new malware variants dynamically change their behavior and form to evade detection mechanisms. Our proposed approach works well for detecting existing malware; however, the sustainability of the detection approach requires the reselection of various feature sets and layers to be transferred to the target model for training purposes. Though the lower computational cost will decrease the training time, the target transfer learning model needs to be retrained for new malware samples, in contrast to some of the prior detectors mentioned above. Updating the model depends on several factors such as novel malware behavior, dynamic permissions, resource obfuscation, and system call obfuscation. The problem of retraining the target model can be resolved by considering the overall behavioral features rather than static features. Moreover, the deterioration problem of our proposed learning model can be reduced by considering the evolutionary categorization of Android applications. Fu and Cai [31] focused on the deterioration problem in their study and were able to achieve the highest average F1 score over seven years.
The authors have also compared their work with state-of-the-art baselines and outperformed them. Our proposed approach provides good detection accuracy; however, we will try to overcome the above-mentioned limitations in our next study to increase the efficiency of our detector. Our next step will focus on sustainability and performance deterioration issues concerning the transfer learning approach.

Familial Clustering Using Topic Modelling
In this section, the malicious APKs that are detected during the classification stage are grouped together to form families. To achieve this, we utilized the LDA (Latent Dirichlet Allocation) [52] and hierarchical clustering [53] techniques, which are based on topic modelling [54]. In topic modelling, the apps are grouped into their respective families on the basis of the functionality of the malicious applications; for example, applications that show malicious ads are grouped into the Android.adware family. Similarly, the applications that require access to system calls and critical permissions are combined into the virus family, and so on. To implement this, we created a database of detected APKs and converted their features into topic vectors. Vectors with common attributes are then combined to form a cluster. Each cluster represents an Android malware family.

Preprocessing APK Database.
To process the database, we applied a filtering technique to eliminate stop words. To achieve this, we utilized the TensorFlow Python library. After the data is filtered, we executed the stemming process on the topic vectors. Stemming is used to find the core word from a set of words. For example, the malicious applications "com.android.spy.fake," "com.android.spy.install," and "com.android.spy.worm" can be stemmed into spyware. Similarly, other malicious applications can also be stemmed into their respective root topics. This helps to minimize the processing overhead for a large set of applications. The stemming process for malicious applications is shown in Figure 14. After the stemming process, we created an array of topic vectors based on similar features. For example, the malicious APKs having the same functionality and feature set are grouped into a common cluster. Furthermore, we applied the LDA algorithm and the hierarchical clustering algorithm to create clusters.
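The filtering and stemming of package names described above can be sketched as follows. This is a simplified illustration: the stop-word list, the root-topic mapping, and the function names are assumptions introduced here, and the sketch uses a dictionary lookup in place of a full stemming algorithm.

```python
from typing import Dict, List, Optional

# Assumed stop words: package-name tokens that carry no topic information.
STOP_WORDS = {"com", "org", "android"}

# Assumed mapping from a core token to its root topic.
ROOT_TOPICS: Dict[str, str] = {"spy": "spyware", "ads": "adware"}


def tokens(package: str) -> List[str]:
    """Split a package name on dots and drop stop words."""
    return [t for t in package.lower().split(".") if t and t not in STOP_WORDS]


def stem_to_topic(package: str) -> Optional[str]:
    """Return the root topic of the first known core token, if any."""
    for token in tokens(package):
        if token in ROOT_TOPICS:
            return ROOT_TOPICS[token]
    return None


packages = [
    "com.android.spy.fake",
    "com.android.spy.install",
    "com.android.spy.worm",
]
topics = [stem_to_topic(p) for p in packages]
```

All three example packages reduce to the same root topic, so only one topic entry needs to be processed in the later clustering stage.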

Latent Dirichlet Allocation.
In our experiment, we utilized the LDA algorithm to generate the probability of topic vectors. The likelihood of each malicious APK can be identified by analyzing the feature set vector. If the features of two or more applications are similar in terms of behavior and control flow, the probability that these applications belong to the same malware family will be higher. The APKs having the same probability assignment are combined into a common cluster with the help of the SVM (Support Vector Machine) [55][56][57] clustering technique. In place of SVM, KNN (K-Nearest Neighbour) [58] can also be used, but SVM gives good results as it handles outliers effectively. We labeled the malicious APKs that have similar probability assignments and generated clusters as depicted in Figure 15. The topic-wise likelihood distribution of the data set APKs is listed in Table 9. In Table 9, the applications "spydetector," "trojanfinder," "adsdetector," and so on have the highest probability of being malicious, as these applications need access to critical permissions and generate malicious ad links. Similarly, the other applications are distributed topically on the basis of static feature sets.
After distributing the applications topic-wise, the clusters of topics are formed. The cluster with the highest probability value includes the maximum number of topics. The sample cluster-forming process is presented in Table 10.
Cluster formation for distinct topics is demonstrated in Table 10. In this table, a large number of applications lie in the "news" category, i.e., cluster 8, and very few applications belong to the "medical" category, i.e., cluster 6. One of the drawbacks of classifying the apps category-wise is that most of the apps do not provide relevant information and fall into unknown categories. This problem can be solved by using the hierarchical clustering technique discussed in the next section.

Hierarchical Clustering Technique.
LDA clustering provides a static approach for clustering the malicious applications. To improve the clustering mechanism and achieve dynamic clustering, we executed the hierarchical clustering technique. In this technique, prior topic information is not necessary. Therefore, clustering can be performed by obtaining the hierarchy of similar feature vectors. The hierarchical clustering technique is shown in Figure 17.
In Figure 17, a similarity or relationship matrix is created from the APK feature set. The steps involved in generating the clusters are as follows:
(i) Step 1: each topic is used to form a topic cluster.
(ii) Step 2: the relationship matrix among topics is generated, and the topics that are similar are grouped together into a novel topic T.
(iii) Step 3: the relationship between the novel topic T and the remaining topics is computed, and this process is repeated until a threshold $T_h$ is reached, at which point the remaining topics are dissimilar and the final topics are obtained.
The size of the similarity matrix decreases with every iteration as the clustering process continues. In our experiment, the threshold value $T_h$ is set to 0.3, and it is computed on the basis of the maximum similarity percentage among topics. This threshold value provides faster convergence and optimal clustering. The relationship matrix is computed using topic vectorization. The computational formula is as follows:

$$\mathrm{Sim}(T_X, T_Y) = \cos(\beta) = \frac{X' \cdot Y'}{\lVert X' \rVert \, \lVert Y' \rVert},$$

where $T_X$ is topic X, $T_Y$ is topic Y, $X'$ is the topic vector of X, $Y'$ is the topic vector of Y, and $\beta$ (Beta) is the angle between $X'$ and $Y'$. Identifying the similarity between two topic vectors using the above formula provides a fast merging rate, and irrelevant topics are avoided. This provides lower complexity and high robustness in the cluster-forming process.
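The three steps above can be sketched in plain Python using the cosine of the angle between topic vectors as the similarity measure. This is a minimal sketch under stated assumptions: the similarity between two merged clusters is taken as the mean pairwise similarity of their members (an average-linkage choice made here for illustration), and the toy vectors and function names are hypothetical. Only the threshold of 0.3 is taken from the text.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(x: Vector, y: Vector) -> float:
    """cos(Beta) between two topic vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def average_similarity(c1: List[int], c2: List[int], vectors: Sequence[Vector]) -> float:
    """Mean pairwise similarity between two clusters (assumed average linkage)."""
    total = sum(cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_topics(vectors: Sequence[Vector], threshold: float = 0.3) -> List[List[int]]:
    """Merge the most similar clusters until no pair reaches the threshold."""
    clusters = [[i] for i in range(len(vectors))]  # Step 1: one cluster per topic
    while len(clusters) > 1:
        # Step 2: score every pair of current clusters and pick the most similar.
        scored = [
            (average_similarity(clusters[a], clusters[b], vectors), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        ]
        best, a, b = max(scored)
        if best < threshold:  # Step 3: remaining topics are dissimilar, so stop
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters


# Toy topic vectors (illustrative values only).
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.2],
    [0.0, 0.0, 1.0],
]
clusters = sorted(sorted(c) for c in cluster_topics(vectors))
```

Because each merge removes one cluster, the number of pairs to score shrinks at every iteration, which mirrors the shrinking similarity matrix described above.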

Assessment Methods.
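To evaluate the process, we utilized three well-known parameters: recall rate, F-score, and precision. These parameters are defined in the clustering methodology with the help of the following formulas:

$$\mathrm{Precision}(x, i) = \frac{N_{xi}}{N_i}, \qquad \mathrm{Recall}(x, i) = \frac{N_{xi}}{N_x},$$

where $N_x$ and $N_i$ are the numbers of topics in class $x$ and cluster $i$, respectively, and $N_{xi}$ is the number of topics of class $x$ in cluster $i$. To determine the F-score, the following formula is used:

$$F_{Score}(x, i) = \frac{2 \cdot \mathrm{Precision}(x, i) \cdot \mathrm{Recall}(x, i)}{\mathrm{Precision}(x, i) + \mathrm{Recall}(x, i)}.$$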
The value of $F_{Score}(x)$ of class $x$ is the maximum value among the $F_{Score}(x, i)$ values of class $x$ over the groupings $i$:

$$F_{Score}(x) = \max_i F_{Score}(x, i).$$

If $R_x$ denotes the recall rate and $P_x$ denotes the precision of class $x$, then the overall precision, recall rate, and F-score values are the averages of the precision, recall, and F-score for every cluster, respectively.
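The clustering metrics can be sketched as follows. The labels in the example are toy values and not experimental data, the function name `clustering_f_scores` is hypothetical, and the overall F-score is computed here as an unweighted mean over classes, which is an assumption about the averaging described above.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def clustering_f_scores(
    classes: Sequence[Hashable], clusters: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Best F-score of each class x over all clusters i."""
    if len(classes) != len(clusters):
        raise ValueError("classes and clusters must have the same length")
    n_x = Counter(classes)                  # topics per class
    n_i = Counter(clusters)                 # topics per cluster
    n_xi = Counter(zip(classes, clusters))  # topics of class x in cluster i
    best = {x: 0.0 for x in n_x}
    for (x, i), count in n_xi.items():
        precision = count / n_i[i]
        recall = count / n_x[x]
        f_score = 2 * precision * recall / (precision + recall)
        best[x] = max(best[x], f_score)
    return best


# Toy ground-truth families and cluster assignments (illustrative only).
classes = ["adware", "adware", "adware", "spyware", "spyware"]
clusters = [0, 0, 1, 1, 1]
scores = clustering_f_scores(classes, clusters)
overall_f = sum(scores.values()) / len(scores)  # assumed unweighted mean
```

In this toy example, one adware sample falls into the spyware cluster, which lowers the precision of that cluster and the recall of the adware class.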

Evaluation Results.
The clustering experiment is performed on the topic database of malicious APKs obtained by the traditional CNN and transfer learning classification techniques. The performance metric parameters are evaluated for 10, 15, 25, 30, and 45 core topic word features. For "T" topic features with T = 10, 25, and 45, the performance achieved is better for the Drebin, Malgenome, and AAGM data sets, respectively, as compared to other values of T. We chose maximum values of T as 10, 25, and 45 for these data sets because we achieved good results at these thresholds. Increasing this value further increases the complexity of the clustering process and also increases the FPR (False Positive Rate) value. For T values less than 10, the accuracy tends to decrease. Table 11 lists the outcome values for various parameters.
In Table 11, the precision achieved is the highest for T = 10 for the Drebin data set. Similarly, the precision for the Malgenome and AAGM data sets is high for T values of 25 and 45, respectively. Sample topic words extracted by LDA and hierarchical clustering are presented in Table 12. The P-R curves according to the quantity of core topics are shown in Figure 18. The words obtained by LDA clustering contain less information than the words obtained by hierarchical clustering. Moreover, some words do not convey the gist of the topic. For example, the words "news," "music," "charge," and "specific" have no relevance to the topics. On the other hand, the words obtained from the hierarchical clustering technique are more relevant to the topics. The topic words for the three data sets with performance metrics are listed in Table 13. The P-R curves according to topic word quantity are shown in Figure 19.
As shown in Table 13, for T values of 10, 5, and 15, the results are good for the Drebin, Malgenome, and AAGM data sets, respectively. For T values greater than 15, highly irrelevant topic words were extracted, which produces errors. Hence, the optimal topic threshold for topic quantity is restricted to 10. In our experiments, we used the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) method [59] of hierarchical clustering to evaluate the performance because it provides a high similarity index for related terms.

Conclusion and Future Work
Mobile malware has been present since the launch of smartphones. With the increase in Android popularity, malware applications continue to succeed in escaping security models. In this article, we discussed traditional CNN and transfer learning techniques to detect and classify Android malware. Due to the extensive use of CNNs in image processing, the application of CNNs to malware images becomes important. We proposed a two-stage framework that converts Android APKs into binary gray-scale images. These images are used as input to the traditional CNN model. To overcome the problems of overfitting, complexity, and computational cost, we applied the transfer learning approach to the trained model by freezing the initial layers of the pretrained model. The evaluation outcomes show that the transfer learning approach provides a better accuracy of 98.3% with very few false positives as compared to the traditional CNN model. We also compared the evaluation results with some similar techniques.
The results show that transfer learning outperforms the traditional techniques and also reduces the computational cost. Further, we combined the detected malicious APKs into their respective malware families using LDA and hierarchical clustering techniques based on topic modelling. We compared the clustering results of the LDA technique with those of the hierarchical clustering technique and concluded that hierarchical clustering provides better familial classification. In an upcoming study, we would like to extend the utilization of hybrid classification techniques such as LSTM combined with control-flow graphs. We expect that our future study will provide in-depth, fine-grained feature sets for better results. Moreover, we will try to overcome the limitations of the proposed model mentioned in the limitations segment under Section 3 of this paper. We will also test our proposed model for sustainability and performance deterioration issues.

Applied Computational Intelligence and Soft Computing