Spam Detection Approach for Secure Mobile Message Communication Using Machine Learning Algorithms

Spam detection is a major issue in mobile message communication because spam makes such communication insecure. To tackle this problem, an accurate and precise method is needed to detect spam in mobile messages. We propose a machine learning-based spam detection method for accurate detection. In this technique, machine learning classifiers, namely logistic regression (LR), K-nearest neighbor (K-NN), and decision tree (DT), are used to classify ham and spam messages in mobile device communication. The SMS spam collection dataset is used to test the method, and it is split into two subsets, one for training and one for testing. The experimental results demonstrate that the classification performance of LR is higher than that of K-NN and DT, with LR achieving an accuracy of 99%. Additionally, the proposed method performs well compared with existing state-of-the-art methods.


Introduction
Mobile messaging is a common way for people to communicate, and billions of mobile device users exchange numerous messages. However, this type of communication is insecure due to the lack of proper message filtering mechanisms. One cause of such insecurity is spam. Spam is considered one of the serious problems in e-mail and instant message services. Spam is junk mail or messages: unwanted e-mails and messages sent to users without their prior permission. It takes different forms, such as adult content, offers to sell items or services, and so on [1]. The volume of spam has increased in recent years because more mobile devices are used for e-mail and message communication. Currently, 85% of mails and messages received by mobile users are spam [2]. Sending mails and messages costs very little for senders but a lot for their recipients. The cost is sometimes paid by service providers, and the cost of spam can also be measured in lost human time and lost important messages or mails [3]. Spam also affects valuable e-mails and messages, because each user has limited Internet service, time, and memory [4].
To handle the problems caused by spam, researchers have proposed different techniques to detect spam e-mails and messages and secure communication. Some of these techniques are described here. Sharaff [5] proposed a method based on machine learning classifiers to classify ham and spam, using four classifiers: iterative dichotomiser, decision tree, simple CART, and active directory tree. The Weka tool was used for the experimental simulations, and the method achieved high accuracy. In [6], an e-mail classification method was proposed for spam detection. The system used four predictive machine learning classifiers with various data partitions for training and testing, along with different hyperparameter values, and obtained good results. Bhat [7] designed ensemble methods based on bagging, boosting, and stacking for the classification of spam and ham, using a dataset collected from Facebook. The experimental results demonstrated that bagging with a J48 (decision tree) base classifier performs better than the individual model and achieves high detection accuracy. In [8], a ham and spam detection method was designed using principal component analysis and a support vector machine, together with performance evaluation and cross-validation; the technique achieved high performance and effectively detected spam. Kumar [9] used various classifiers for ham and spam detection, together with different feature selection algorithms to select suitable features.
The experimental results show that the random tree classifier with the Fisher algorithm achieved high performance, with 99% accuracy. In [10], a spam detection method was proposed using machine learning classifiers, and 92% accuracy was achieved. Yang et al. [11] proposed a spam detection approach based on multimodal fusion (SDAMF) that uses a deep neural network model and achieved 98.48% accuracy. In [12], a spam detection method was proposed based on an artificial immune system (ISAIS), and 98.05% accuracy was achieved. In [13], a phishing e-mail detection framework was proposed based on supervised and unsupervised methods. Ruano-Ordás et al. [14] proposed a spam detection method that uses evolutionary computation to discover spam patterns from e-mail samples.
In this research study, we propose a spam detection method using machine learning algorithms, namely LR, K-nearest neighbor, and decision tree, to classify ham and spam messages. The SMS spam collection dataset was used to test the method. The dataset was divided into two parts: 70% for training and 30% for testing the predictive models. Performance metrics such as specificity, accuracy, and sensitivity were used to evaluate the proposed method. The experimental results confirm that the proposed method achieves high accuracy.
The remainder of the paper is organized as follows: Section 2 describes the methods and materials. Section 3 presents and analyzes the experimental work in detail. Section 4 concludes the paper.

Methods and Materials
This section presents the research methods and materials of the paper.

Dataset.
The dataset used in the current research is available on Kaggle, a machine learning repository [15]. The "SMS spam collection dataset" contains 5572 instances and two attributes, v1 and v2. The attribute v2 holds the input messages, each of which is either spam or nonspam. The target label v1 has two classes: 0 = nonspam and 1 = spam. The data contain 4900 nonspam samples and 672 spam samples. The dataset is given in Table 1.
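A minimal loading sketch, assuming the Kaggle release is a Latin-1 encoded CSV file (commonly named `spam.csv`) whose `v1` column holds the strings `ham`/`spam` and whose `v2` column holds the message text. The helper names `parse_rows` and `load_sms_dataset` are hypothetical; the sketch only shows how the labels map to the 0/1 encoding described above.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Label encoding used in the text: 0 = nonspam (ham), 1 = spam.
LABELS = {"ham": 0, "spam": 1}


def parse_rows(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Parse CSV lines whose header contains v1 (label) and v2 (message)."""
    messages: List[str] = []
    labels: List[int] = []
    for row in csv.DictReader(lines):
        label = row["v1"].strip().lower()
        if label not in LABELS:
            continue  # skip malformed rows rather than guessing a label
        messages.append(row["v2"])
        labels.append(LABELS[label])
    return messages, labels


def load_sms_dataset(path: str) -> Tuple[List[str], List[int]]:
    # Assumption: the downloaded file is Latin-1 encoded.
    with open(path, newline="", encoding="latin-1") as f:
        return parse_rows(f)


if __name__ == "__main__":
    sample = io.StringIO("v1,v2\nham,See you at lunch\nspam,WINNER! Claim your prize now\n")
    msgs, ys = parse_rows(sample)
    print(ys)  # prints [0, 1]
```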

Classification Algorithms.
The following machine learning algorithms were used to classify ham and spam.

Logistic Regression.
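LR is a classifier [16,17]. In binary classification, the problem is to predict the value of y, where y ∈ {0, 1}; 0 denotes the negative class and 1 the positive class. LR estimates the probability that y = 1 by applying the logistic (sigmoid) function to a weighted sum of the input features and typically assigns the positive class when this probability is at least 0.5.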
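To make the probability model concrete, here is a from-scratch sketch of binary LR trained by batch gradient descent on the average log-loss. It is not the authors' implementation: the learning rate, the epoch count, and the treatment of `C` as an inverse L2-regularization strength (the scikit-learn convention, assumed here to be what "C = 1" in the results refers to) are illustrative choices.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function sigma(z) = 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LogisticRegression:
    """Binary LR fitted by batch gradient descent with an L2 penalty.

    Assumption: C is the inverse regularization strength, as in scikit-learn.
    """

    def __init__(self, C: float = 1.0, lr: float = 0.1, epochs: int = 500):
        self.C, self.lr, self.epochs = C, lr, epochs
        self.w: List[float] = []
        self.b = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        # P(y = 1 | x) = sigma(w . x + b)
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        # Positive class (1) when the estimated probability is at least 0.5.
        return [1 if self.predict_proba(x) >= 0.5 else 0 for x in X]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # d(log-loss)/d(logit)
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            # Mean log-loss gradient plus the L2 term w / (C * n); bias is not penalized.
            for j in range(d):
                self.w[j] -= self.lr * (grad_w[j] / n + self.w[j] / (self.C * n))
            self.b -= self.lr * grad_b / n
        return self
```

Larger `C` weakens the penalty and lets the weights grow; smaller `C` shrinks them toward zero.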

Decision Tree.
A DT is a supervised machine learning algorithm [18,19]. It is shaped like a tree in which each node is either a decision node or a leaf. The DT technique is simple and easy to understand for making decisions. A DT contains internal and external (leaf) nodes linked to each other. Decisions are made at the internal nodes, each of which tests an input feature and passes the sample to one of its child nodes. A leaf node has no children and is associated with a class label.
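The node structure described above can be sketched as a small CART-style tree that chooses, at each internal node, the feature and threshold minimizing weighted Gini impurity. The Gini criterion, the `max_depth` and `min_samples` limits, and all names are assumptions for illustration; the text does not state which splitting criterion or depth was used.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    label: Optional[int] = None    # set only on leaf nodes
    feature: int = -1              # internal node: index of the tested feature
    threshold: float = 0.0         # internal node: go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(y: Sequence[int]) -> float:
    n = len(y)
    return 1.0 - sum((c / n) ** 2 for c in Counter(y).values())


def build_tree(X: List[List[float]], y: List[int], depth: int = 0,
               max_depth: int = 5, min_samples: int = 2) -> Node:
    majority = Counter(y).most_common(1)[0][0]
    # Stop and emit a leaf when the node is pure, too small, or too deep.
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return Node(label=majority)
    best = None  # (weighted impurity, feature, threshold)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return Node(label=majority)
    _, j, t = best
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    return Node(
        feature=j, threshold=t,
        left=build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_samples),
        right=build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_samples),
    )


def predict_one(node: Node, x: Sequence[float]) -> int:
    # Follow internal-node tests until a leaf (which carries a label) is reached.
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label
```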

K-Nearest Neighbor.
K-NN is a supervised classification algorithm [18]. It predicts the class label of a new input from the labels of the most similar inputs in the training set. The performance of K-NN is not always good enough. Let (x, y) be a training observation and h: X ⟶ Y the learning function, so that for an observation x, h(x) determines the value of y.
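A minimal sketch of the learning function h(x) for K-NN, assuming Euclidean distance and a majority vote over the k nearest training observations. The distance metric, the default `k = 3`, and the function names are illustrative assumptions, not settings reported in the text.

```python
import math
from collections import Counter
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[int],
                x: Sequence[float], k: int = 3) -> int:
    # Rank training observations by distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], x))[:k]
    # h(x): majority vote over the labels of the k nearest neighbours
    # (ties go to the label seen first, i.e. the closer neighbour).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because every prediction scans the whole training set, the cost grows with the number of training messages, and the result depends strongly on how the messages are represented as vectors.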

Division of Dataset.
The dataset was split into 30% for validation and 70% for training of the predictive models.
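The 70/30 split can be sketched as a seeded shuffle of indices followed by a 30% hold-out. The function name, the seed, and the use of a plain (non-stratified) shuffle are assumptions, since the text does not say how the split was drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(X: Sequence[T], y: Sequence[int], test_size: float = 0.3,
                     seed: int = 42) -> Tuple[List[T], List[T], List[int], List[int]]:
    """Shuffle indices once, then hold out `test_size` of them for validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_test = round(len(X) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])
```

With 5572 instances, a 30% hold-out gives 1672 validation and 3900 training samples. Because only about 12% of the messages are spam, a stratified split (sampling each class separately) would keep the class ratio the same in both parts.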

Measure for Evaluation of Performance.
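To validate classifier performance, we used specificity, accuracy, sensitivity, and execution time. Specificity, accuracy, and sensitivity are expressed in equations (1), (2), and (3), respectively, and are computed from the confusion matrix given in Table 2, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives.
The formulation of the measures is as follows:

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{1}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$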
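These measures can be checked with a short sketch that counts TP, TN, FP, and FN, treating spam (label 1) as the positive class. The results section also reports the Matthews correlation coefficient (MCC), which the text does not define; the standard formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), is included here as an assumption about what was computed. Function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    # Positive class = 1 (spam), negative class = 0 (ham).
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "specificity": tn / (tn + fp) if tn + fp else 0.0,   # equation (1)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),          # equation (2)
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,    # equation (3)
        # MCC is set to 0 when any marginal total is zero (a common convention).
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```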

Experiments and Result Analysis
Diverse approaches are used for spam detection. Abayomi-Alli et al. [22] presented a comprehensive review of soft techniques in spam classification and assessed user acceptability of SMS spam applications on the Android app store. Roy et al. [23] proposed a technique to identify short-text spam messages; the proposed model can support different business strategies. Kaur et al. [24] presented a detailed report on techniques for the detection and analysis of compromised accounts and on spam detection. Jeong et al. [25] presented a spam detection approach. Cheah et al. [26] proposed an approach for security testing of automotive Bluetooth interfaces. Halabi and Bellaiche [27] presented an approach to quantify the performance and service evaluation of cloud security. Tsui et al. [28] used diverse composition to examine the consequences of component development on security properties. Zhang et al. [29] presented a novel method for evaluating the crowd security of OSN trustworthiness. Mao et al. [30] built a security dependency network from access behavior to measure the significance of object security from a broad perspective.
We performed experiments to classify ham and spam using the SMS spam collection dataset. The LR, decision tree, and k-nearest neighbor classifiers were used for classification in this study. The dataset was divided into 30% for validation and 70% for training. The experimental results are reported in tables and presented graphically in figures. The experiments were implemented in Python and run on an Intel(R) Core(TM) i5-2400 CPU under Windows 10.
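The text does not state how messages were turned into numeric features before classification. One common option, shown here purely as a hypothetical sketch, is a bag-of-words representation: lowercase the message, split it into alphanumeric tokens, and count how often each vocabulary word occurs. The regular expression, `min_count`, and the helper names are assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence

# Assumed tokenization: runs of lowercase letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(messages: Sequence[str], min_count: int = 1) -> Dict[str, int]:
    # Map each sufficiently frequent token to a fixed column index (sorted for determinism).
    counts = Counter(tok for msg in messages for tok in tokenize(msg))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def vectorize(message: str, vocab: Dict[str, int]) -> List[float]:
    vec = [0.0] * len(vocab)
    for tok in tokenize(message):
        if tok in vocab:  # out-of-vocabulary tokens are ignored
            vec[vocab[tok]] += 1.0
    return vec
```

The vocabulary should be built from the training split only, so that the validation messages do not leak information into the features.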

Visualization of SMS Spam Collection Dataset.
The data contain 4900 ham samples and 672 spam samples, as shown in Figure 1. Figure 2 shows the ratio of spam to ham messages.

Table 2: Confusion matrix [20,21].
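These counts make the class imbalance explicit. A trivial classifier that labels every message as ham would already be right about 88% of the time, which is why specificity, sensitivity, and MCC are reported alongside accuracy. The helper name below is hypothetical; the counts are the ones stated in the text.

```python
from typing import Tuple


def class_balance(ham: int, spam: int) -> Tuple[float, float]:
    """Return the share of ham and spam messages in the dataset."""
    total = ham + spam
    return ham / total, spam / total


ham_share, spam_share = class_balance(4900, 672)  # counts reported in the text
# Always predicting "ham" is correct for every ham message, so that
# majority-class baseline has an accuracy equal to ham_share.
print(f"ham: {ham_share:.1%}, spam: {spam_share:.1%}")  # prints "ham: 87.9%, spam: 12.1%"
```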

Classification Results of Classifiers.
To classify the ham and spam messages, we used the LR, decision tree, and K-nearest neighbor algorithms with essential basic hyperparameters. The dataset was divided into two parts: the classifiers were trained on 70% of the samples and validated on the remaining 30%. All experimental results are reported in Table 3.

According to Table 3, LR with hyperparameter C = 1 achieved 99% accuracy, 100% specificity, 86% sensitivity, and 93% MCC, with a processing time of 0.494 seconds. The decision tree classifier obtained 98% accuracy, 94% specificity, 86% sensitivity, and 95% MCC, with a processing time of 46.032 seconds. The k-nearest neighbor classifier achieved 95% accuracy, 100% specificity, 60% sensitivity, and 80% MCC, with a processing time of 0.630 seconds. Thus, in terms of accuracy, LR outperforms the decision tree and k-nearest neighbor classifiers; Figure 3 shows the classification performance of LR, K-NN, and DT. The computation time of LR is also lower than that of K-NN and DT, and Figure 4 shows the processing time of each classifier graphically.

From this analysis, we conclude that LR effectively classifies ham and spam, since it achieves the highest accuracy. Its 100% specificity means that it correctly identified the ham messages, and its 86% sensitivity shows that its ability to detect spam messages is good. Thus, the experimental results suggest that LR is the best of the three classifiers for classifying ham and spam.

Comparison of Performance with Existing Methods.
The classification performance of the current approach is compared with existing approaches in terms of accuracy. The current approach achieved an accuracy of 99%, which is higher than that of the available approaches. Table 4 shows the accuracy of the current approach alongside other available approaches, and Figure 5 shows the comparison graphically.

Conclusion
Detection of spam is important for securing message and e-mail communication. Accurate detection of spam is a big issue, and many detection methods have been proposed by various researchers. However, these methods lack the capability to detect spam accurately and efficiently. To solve this issue, we have proposed a spam detection method using machine learning predictive models. The experimental results show that the proposed method has a high capability to detect spam, achieving 99% accuracy, which is higher than that of other existing methods.
Thus, the results suggest that the proposed method is more reliable for accurate and timely detection of spam and can help secure message and e-mail communication systems.

Data Availability
No data were used to support the study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Security and Communication Networks