Detecting Cyber-Attacks on Wireless Mobile Networks Using Multicriterion Fuzzy Classifier with Genetic Attribute Selection

With the proliferation of wireless and mobile network infrastructures and capabilities, a wide range of exploitable vulnerabilities emerges due to the use of multivendor and multidomain cross-network services for signaling and transport of Internet- and wireless-based data. Consequently, the rates and types of cyber-attacks have grown considerably, and current security countermeasures for protecting information and communication may no longer be sufficient. In this paper, we investigate a novel methodology based on multicriterion decision making and fuzzy classification that can provide a viable second line of defense for mitigating cyber-attacks. The proposed approach has the advantage of dealing with various types and sizes of attributes related to network traffic such as basic packet headers, content, and time. To increase the effectiveness and construct optimal models, we augmented the proposed approach with a genetic attribute selection strategy. This yields efficient and simpler models that can be replicated at various network components to cooperatively detect and report malicious behaviors. Using three datasets covering a variety of network attacks, the performance enhancements due to the proposed approach are demonstrated in terms of detection errors and model construction times.


Introduction
The number of wireless and mobile network subscribers is growing rapidly due to the flexibility of network access anywhere and anytime and the wide range of evolving capabilities that make our lives easier. However, with these benefits a plethora of security threats also evolve as a result of the increased number of potentially exploitable vulnerabilities. The growth rate of malicious activities and botnets is rising to alarming levels according to recent security reports [1][2][3]. It is getting even worse for cross-network services with the emerging 4G/5G network technologies. The new era of information systems combines different environments including wireless ad hoc networks, cloud computing, mobile applications, social networks, sensor networks, and smart grids [4].
To mitigate the anticipated risks resulting from various cyber-attacks on critical infrastructures and services, a number of algorithms and technologies have been proposed including encryption standards, digital signatures, antimalware packages, firewalls, and intrusion detection and prevention systems. These methods have been proven to be effective in securing privacy and integrity, controlling access to authorized users, and detecting malicious behaviors of known signatures. However, they largely fail to handle sophisticated attacks, zero-day attacks, or attacks with varying signatures. A more flexible and adaptive set of approaches based on machine learning and data mining has been proposed to detect stochastic deviations from normal behavior patterns. This category of methods is known as anomaly-based intrusion or outlier detection, which provides a higher degree of automation and reduces the workload on security experts. Despite the variety of methods that have been proposed in the literature, the research on anomaly detection is still evolving to cope with uncertainties, improve the security, reduce false positive rate, and reduce computational costs [10,11]. Additionally, since the performance in detecting intrusive events is greatly influenced by the type and number of attributes utilized [12], it is desirable to analyze and identify the most relevant and influential attributes from the large amount of available data.
Multicriterion decision making techniques were originally devised in the operations research field and have attracted the attention of several researchers in various domains such as social psychology, business management, and health care [13,14]. However, little work has been done in the area of network security. In this paper, we investigate a new methodology for detecting cyber-attacks in wireless mobile networks based on multicriterion decision making fuzzy classification [15,16]. The proposed approach is combined with an attribute selection strategy based on genetic algorithms [17]. With the minimum generalization error and the resulting simplicity and reduced computational complexity of the model, the proposed approach is practically feasible to deploy in different network systems.
The rest of this paper is organized as follows. Section 2 gives a brief background on security in wireless and mobile information systems and Section 3 reviews related work. In Section 4, the proposed methodology is presented. Section 5 describes the adopted datasets and discusses the experimental evaluation and comparison of the proposed approach. Finally, Section 6 concludes the paper.

Background and Motives
In heterogeneous wireless mobile environments, there is no well-defined network perimeter; hence, the security administrator cannot enforce security policies even with the existence of firewalls and encryption. This can be attributed to the inherent nature of such environments, resulting from device mobility, broadcast channels, pervasive use of multivendor multidomain applications, and limited resources in wireless end-systems to implement sophisticated security countermeasures. Figure 1 illustrates a typical example of a network topology where some machines are infected with malware and others are passively or actively hacking. Attackers only need to discover and exploit a single vulnerability to attack the entire system. Hence, the security of the system is only as good as that of its least secure point.
Wireless devices (such as smart phones, tablets, laptops, or sensors) can be communicating in an isolated environment or connected through a larger distribution network (such as a local area network, a wide area network, or the Internet) using access points. The former is called an ad hoc network, whereas the latter, which is more common, is known as an infrastructure wireless network. Thus, cyber-attacks can target any of the software or hardware components in this environment including wireless end systems, wireless channels, access points, or the wired distribution network. It is highly important to detect and respond to these attacks to protect the entire system.

Related Work
Security of mobile information systems has been a core area in research and development. La Polla et al. in [18] surveyed the state of the art of high level attacks and vulnerabilities targeting mobile devices over the period from 2004 to 2011. They concisely reviewed and categorized known mobile malware including viruses, worms, rootkits, and botnets. They also discussed the proposed security solutions with focus on intrusion detection and trusted platforms. In [9], the authors reviewed the threats, vulnerabilities, and commonly available countermeasures for different components of a wireless network including clients, access points, and transmission medium.
Computational intelligence techniques have many characteristics, such as adaptation and fault tolerance, that make them attractive for research on malware and intrusion detection. In [10], a review of 55 related studies between 2000 and 2007 is presented with focus on single, hybrid, and ensemble classifiers. Another extensive review is presented in [19]. Examples of these techniques include neural networks, fuzzy inference systems, evolutionary algorithms, artificial immune systems, and swarm intelligence. In [20], a naive Bayesian classifier is applied to identify potential intrusions. Trained on a small subset of the KDD'99 dataset and tested on a larger subset, this approach showed a superior identification rate. In [21], an evaluation of a number of existing machine learning classifiers is presented for dynamic Android malware detection. In [22], another approach for anomaly detection based on multicriterion fuzzy classification with greedy attribute selection is proposed and evaluated on KDD'99.
Combining security technologies can provide more solid multifaceted solutions against intrusion attempts [23]. A number of hybrid machine learning approaches have been proposed as well. For instance, in [24] a machine learning approach is introduced for classifying network activities as normal or abnormal. This approach combines support vector machines with clustering based on a self-organized ant colony network. The authors demonstrated that this combination resulted in better classification rate and run time. Anomaly-based intrusion detection has attracted the interest of several researchers [10]. However, these methods can suffer from an increased false positive rate. To gain the advantages of both misuse detection and anomaly detection, Depren et al. proposed a rule-based decision support system to combine the outcomes of a decision tree for misuse detection and a self-organizing map for modeling normal behavior [25].
Another important stage that can have significant impact on the accuracy and capability of intrusion detection systems is data preprocessing. A review of data preprocessing techniques for anomaly-based network intrusion detection is presented in [12]. During the preprocessing phase, various approaches can be applied such as discretization, normalization, and filtering of the most relevant attributes. In [26], the impact of normalization techniques on the performance of support vector machines for intrusion detection is investigated. It has been found that min-max normalization leads to better results in terms of speed and accuracy than other normalization techniques. Another important related issue is attribute selection to reduce the high dimensionality and complexity [27].
Most of the work published in the literature is evaluated using the standard KDD Cup 99 dataset [20,24,26,27]. Despite the fact that this dataset has some drawbacks, it is one of the largest datasets, covers a large number of attacks, and remains the dominant benchmark for new techniques. Two more recent datasets have been collected and disclosed for the assessment of some attacks on IEEE 802.11 wireless channels [28].

Methodology
The overall block diagram for the cyber-attack detection system is shown in Figure 2. It starts with the database of captured traffic. After preprocessing and analyzing traffic records and log files, it performs feature extraction to represent each instance with a vector of relevant attributes. The dataset is then partitioned into train, validation, and test datasets. The train dataset is used to construct the detection model, whereas the validation dataset is used during training to evaluate the model to avoid overfitting. The test dataset is used after training is over to evaluate the constructed model performance. The process of partitioning, training, and testing can be repeated if cross validation is required.

Algorithm 1: Composing PROAFTN's prototypes (classification model).
    $i$: prototype's index
    $h$: class index
    $j$: attribute's index
    Select threshold $\beta$ for interval selection
    Generate intervals using a discretization technique
    Apply greedy hill climbing approach to select most relevant subsets
    for each class $C^h$ do
        for each attribute $g_j$ do
            for every value in attribute $g_j$ do
                Recursively check all values in the next attribute $g_{j+1}$
                if Frequency of values ⩾ $\beta$ then
                    Choose intervals for prototype $b_i^h$
                else
                    Discard interval and go next (i.e., [...])
                end if
            end for
        end for
    end for
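The partitioning step of the block diagram can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `stratified_split`, the 60/20/20 fractions, and the toy labels are all assumptions, and stratification by class is one reasonable choice for skewed traffic data.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validation, test) index lists, stratified by class label.

    The 60/20/20 fractions are an assumption made for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(round(fractions[0] * len(indices)))
        n_val = int(round(fractions[1] * len(indices)))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]   # whatever remains of this class
    return train, val, test


# Toy labels (hypothetical): 60 normal, 30 DoS, and 10 probe records.
labels = ["normal"] * 60 + ["dos"] * 30 + ["probe"] * 10
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Repeating the split with different seeds, or rotating which part is held out, gives the cross-validation variant mentioned above.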
When a dataset includes irrelevant or redundant attributes, building the classification model takes longer and the classification accuracy degrades. Hence, it is preferable to begin with selecting the most relevant attributes. In our case, we used a genetic algorithm attribute selection strategy. The target here is to reduce the hypothesis search space and improve the performance in terms of accuracy, scalability, and efficiency. The idea of genetic algorithms is to start with a random population of candidate solutions and then evolve the population by applying genetic operations, evaluation, and selection [17]. For attribute selection, each chromosome in the population is a binary string with length equal to the total number of attributes, where an attribute is selected if its corresponding bit is 1; otherwise, it is dropped. The fitness function depends on being "highly correlated with the class while having low intercorrelation" [29]. The evaluation function for a particular subset of attributes is defined mathematically as follows:
$$f(S) = \frac{k\,\bar{r}_{ca}}{\sqrt{k + k(k-1)\,\bar{r}_{aa}}},$$
where $k$ is the size of the subset $S$, $\bar{r}_{ca}$ is the mean of the attribute-class correlations, and $\bar{r}_{aa}$ is the mean of the attribute-attribute correlations. This function will have lower values for attributes that are irrelevant (small value for the numerator) and/or redundant (large value for the denominator).
Once the most relevant attributes are identified, a multicriterion fuzzy classification approach is applied to construct a decision model that can assign unknown behavioral patterns to predefined classes. This type of decision problem requires a comparison between alternatives or patterns based on the scores of attributes using absolute evaluations [30]. In this case, the evaluation is performed by comparing the alternatives to different prototypes of classes, where the category or class is assigned to patterns based on the highest score value. Each prototype is described by a set of attributes and is considered to be a good representative of its class [31]. The complexity of this approach is a function of the number of attributes. Thus, utilizing the smallest subset of relevant attributes greatly improves the time complexity and accuracy of classification. A graphical illustration of the methodology is shown in Figure 3.
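To make the bit-string encoding and the correlation-based fitness concrete, here is a small sketch. It is an illustration under stated assumptions rather than the authors' implementation: absolute Pearson correlation stands in for the correlation measure, and the genetic operators (truncation selection, one-point crossover, bit-flip mutation), the parameter values, the function names, and the toy data are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def merit(mask, columns, target):
    """Correlation-based merit of the attribute subset encoded by a 0/1 mask."""
    chosen = [c for c, bit in zip(columns, mask) if bit]
    k = len(chosen)
    if k == 0:
        return 0.0
    r_ca = sum(abs(pearson(c, target)) for c in chosen) / k
    pairs = [(a, b) for i, a in enumerate(chosen) for b in chosen[i + 1:]]
    r_aa = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_ca / math.sqrt(k + k * (k - 1) * r_aa)


def genetic_select(columns, target, pop_size=20, generations=30, p_mut=0.1, seed=1):
    """Evolve binary chromosomes (one bit per attribute) toward a high-merit subset."""
    rng = random.Random(seed)
    n = len(columns)  # assumes at least two attributes

    def fitness(mask):
        return merit(mask, columns, target)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy data (hypothetical): one relevant attribute, a redundant copy, and a noisy one.
target = [0, 0, 0, 0, 1, 1, 1, 1]
a1 = [0.1, 0.2, 0.1, 0.3, 0.9, 0.8, 1.0, 0.7]
a2 = [2 * v for v in a1]
a3 = [0.5, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6]
columns = [a1, a2, a3]
best = genetic_select(columns, target)
print(best, round(merit(best, columns, target), 3))
```

On this toy data the noisy third attribute lowers the merit of any subset containing it, so the search should settle on a mask that drops it.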
To explain how it works, assume the network behavioral pattern is described by a set of $m$ attributes $\{g_1, g_2, \ldots, g_m\}$ and a label identifying its category, which belongs to the $k$ classes $\Omega = \{C^1, C^2, \ldots, C^k\}$. Given a set of historical patterns, it is required to construct a classification model that maps each pattern to a class in $\Omega$ and accurately predicts its target class. Once the model is built, it can be used to assign the most relevant class to new unseen behavioral patterns. The model parameters are automatically determined from the training data examples. Then, the constructed model is used for assigning a category to the unseen cases (testing data). This automatic data-driven approach is common to the learning procedures in other machine learning classifiers [32,33].
Algorithm 1 explains the proposed induction approach through a recursive process to generate the classification model. The tree is constructed in a top-down recursive divide-and-conquer manner, where each branch represents the generated intervals for each attribute. The branches are selected recursively to compose the prototypes based on the proposed threshold. Using the generated tree from this algorithm, we can extract the prototypes and then the decision rules, respectively, to be used for classification. Figure 4 illustrates the prototypes' composition process.

Figure 3: Graphical illustration of the multicriterion fuzzy classification procedure.

The learning strategy is based on utilizing the training set to compose a set of prototypes for each class. For class $C^h$, these prototypes are denoted by $B^h = \{b_1^h, b_2^h, \ldots, b_{L_h}^h\}$, where $L_h$ is the number of prototypes for this class. For each prototype $b_i^h$ and each attribute $g_j$, a fuzzy partial indifference relation $C_j(a, b_i^h)$ is defined to measure the degree of resemblance of pattern $a$ to $b_i^h$ according to $g_j$. This fuzzy relation is characterized by four parameters: the interval $[S_j^1(b_i^h), S_j^2(b_i^h)]$, within which the pattern is considered fully indifferent to the prototype, and the thresholds $d_j^1(b_i^h)$ and $d_j^2(b_i^h)$, which bound the zones of weak indifference on either side of the interval (see Figure 5).
The prototypes in this study are constructed based on the frequency of combined values from all attributes in the dataset. After applying the supervised discretization technique, each attribute will have a set of intervals and nominal values. The learning strategy starts from the first attribute in the list and selects the first interval or nominal value from the list of values that belong to the attribute. Then, it proceeds to the next attribute, selects the first interval or nominal value, and counts the frequency of occurrence of these combined values in each class. If the frequency exceeds the preselected threshold (e.g., more than 15%), then these values are added to the first prototype. The learning continues until all intervals and nominal values are examined by the strategy discussed above. The target is to cover all attribute values, from the first attribute to the last one.
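The frequency-based composition of prototypes can be illustrated with a simplified sketch. It is not the recursive tree construction of Algorithm 1: it counts every combination of intervals in one flat pass, assumes the cut points were already produced by some discretization step, and uses hypothetical names (`compose_prototypes`, `interval_bounds`) and toy data.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def interval_bounds(cuts, idx):
    """Bounds of the idx-th interval induced by the sorted cut points of one attribute."""
    lo = cuts[idx - 1] if idx > 0 else float("-inf")
    hi = cuts[idx] if idx < len(cuts) else float("inf")
    return (lo, hi)


def compose_prototypes(samples, labels, cut_points, beta=0.15):
    """Map each class to the interval combinations covering at least `beta` of its samples."""
    combos = defaultdict(Counter)
    for row, label in zip(samples, labels):
        # Replace each attribute value by the index of the interval it falls into.
        key = tuple(bisect_right(cuts, v) for v, cuts in zip(row, cut_points))
        combos[label][key] += 1

    prototypes = {}
    for label, counts in combos.items():
        total = sum(counts.values())
        prototypes[label] = [
            [interval_bounds(cuts, idx) for idx, cuts in zip(key, cut_points)]
            for key, n in counts.items()
            if n / total >= beta          # keep only sufficiently frequent combinations
        ]
    return prototypes


# Toy data (hypothetical): two attributes with one cut point each.
samples = [(0.1, 3), (0.2, 4), (0.3, 5), (0.9, 20), (0.8, 30), (0.7, 2)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
cut_points = [[0.5], [10.0]]
protos = compose_prototypes(samples, labels, cut_points, beta=0.5)
print(protos)
```

With a threshold of 0.5, the single attack record that falls in the low interval of the second attribute is too rare to form its own prototype, which mirrors the role of the frequency threshold described above.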
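To classify a pattern $a$ to the class $C^h$, PROAFTN calculates the membership degree $\delta(a, C^h)$ as follows:
$$\delta(a, C^h) = \max\{I(a, b_1^h), I(a, b_2^h), \ldots, I(a, b_{L_h}^h)\},$$
where $I(a, b_i^h)$ is the fuzzy indifference relation between $a$ and the $i$th of the $L_h$ prototypes of class $C^h$, which is computed as a weighted sum of the partial indifference relations $C_j(a, b_i^h)$ over the $m$ attributes as given by
$$I(a, b_i^h) = \sum_{j=1}^{m} w_j^h \, C_j(a, b_i^h),$$
where $w_j^h$ is the weight that measures the importance of a relevant attribute $g_j$ of a specific class $C^h$:
$$w_j^h \in [0, 1], \qquad \sum_{j=1}^{m} w_j^h = 1.$$
The last step is to assign the pattern $a$ to the class $C^h$ that has the maximum resemblance according to the following decision rule:
$$a \in C^h \iff \delta(a, C^h) = \max\{\delta(a, C^i) \mid i \in \{1, \ldots, k\}\}. \quad (6)$$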
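The classification step can be sketched as below. This is an illustrative reading, not the authors' Java implementation: the partial indifference relation is assumed to be trapezoidal (equal to 1 inside the prototype interval and fading linearly to 0 across each threshold), and the function names, the model layout, and the toy prototypes and weights are hypothetical.

```python
def partial_indifference(value, s1, s2, d1, d2):
    """Trapezoidal resemblance: 1 inside [s1, s2], fading linearly to 0 at s1 - d1 and s2 + d2."""
    if s1 <= value <= s2:
        return 1.0
    if value < s1:
        return max(0.0, 1.0 - (s1 - value) / d1) if d1 > 0 else 0.0
    return max(0.0, 1.0 - (value - s2) / d2) if d2 > 0 else 0.0


def indifference(pattern, prototype, weights):
    """Weighted sum of the partial indifference relations over all attributes."""
    return sum(
        w * partial_indifference(v, *params)
        for v, params, w in zip(pattern, prototype, weights)
    )


def membership(pattern, prototypes, weights):
    """Membership degree of a pattern in a class: best resemblance to any of its prototypes."""
    return max(indifference(pattern, p, weights) for p in prototypes)


def classify(pattern, model):
    """Assign the class with maximum membership; model maps class -> (prototypes, weights)."""
    scores = {c: membership(pattern, protos, w) for c, (protos, w) in model.items()}
    return max(scores, key=scores.get), scores


# Toy model (hypothetical): two attributes, one prototype per class,
# each attribute described by (s1, s2, d1, d2); weights sum to 1 per class.
model = {
    "normal": ([[(0, 10, 2, 2), (0.0, 0.2, 0.1, 0.1)]], (0.5, 0.5)),
    "attack": ([[(50, 100, 10, 10), (0.8, 1.0, 0.2, 0.0)]], (0.5, 0.5)),
}
label, scores = classify((11, 0.1), model)
print(label, scores)
```

In the toy example the first attribute lies just outside the normal prototype's interval and contributes a partial score of 0.5, while the second lies inside it, so the pattern resembles the normal class with degree 0.75 and the attack class with degree 0.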

Experimental Work
To evaluate the proposed methodology, we adopted three datasets in our experimental work. Table 1 shows some of the characteristics of these datasets, and a more detailed description is provided in the following subsection. Then, we describe the conducted experiments and discuss the results.

KDD Cup 99 (KDD'99) Dataset.
This dataset consists of processed dump traffic portions of normal and attack connections to a local area network simulating a military network environment [35]. It was prepared from the raw dataset collected and managed by MIT Lincoln Labs as part of the 1998 DARPA Intrusion Detection Evaluation Program. Its first use was in the third International Knowledge Discovery and Data Mining Tools Competition in 1999. Since then, it has become very popular and widely used by researchers to evaluate and benchmark their work [20,24,26,27]. The dataset has 494021 traffic samples belonging to 22 different attack types in addition to the normal traffic. These attacks fall into the following four categories: Denial of Service (DoS) such as SYN floods, unauthorized access from a remote machine (R2L) such as password guesses, unauthorized access to local root privileges (U2R) such as rootkits, and probing such as port scanning and nmap. Each connection is described with 41 attributes, as described in Table 2, and has a label identifying the traffic type as normal or one of the attack types. Three attributes are symbolic and five attributes are binary, whereas the remaining 33 attributes are numeric. As shown in the table, these attributes are divided into four groups: basic attributes of individual connections (9 attributes), content attributes within a connection suggested by domain knowledge (13 attributes), time-based traffic attributes computed using a two-second time window (9 attributes), and host-based traffic attributes computed using a window of 100 connections to the same host (10 attributes).

WEP/WPA Dataset.
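The traffic samples in this dataset have been recently collected from a controlled wireless home network with enabled WEP/WPA [28]. The network topology is a single basic service set (BSS) consisting of one access point (AP) connected to the Internet and three stations: one generating real HTTP and FTP traffic (STA1), one running Wireshark to monitor the network and capture traffic (STA2), and one for generating attacks (STA3). In addition to normal traffic, four types of attacks are reported: ChopChop, deauthentication, duration, and fragmentation. There are a total of 24200 traffic samples; 15000 of them belong to normal traffic whereas the rest are divided equally for each attack type. The captured traffic from normal and attack processes is preprocessed using Tshark to extract 15 attributes from the MAC headers.

WPA2 Dataset.
The third dataset has been collected from a corporate network with enabled WPA2 encryption [28]. In this network, there are two access points connected to a local area network switch, which is connected to an authentication server (AS) and the Internet. In this scenario, there are five stations: three generating traffic, one monitoring the network, and one hacking. Here, there are four attack types: deauthentication, fake authentication, fake AP, and SYN flooding. The total number of traffic samples is 10000, where 6000 of them belong to normal traffic and the rest are distributed equally for each attack type. Each sample is processed as in the second dataset with Tshark and described with 16 attributes.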

Performance Measures.
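We used 10-fold cross validation to evaluate and compare the performance of the proposed methodology. The performance is reported in terms of accuracy (Acc), recall (true positive rate), precision, and $F_1$ measure. These measures are computed as follows:
$$\mathrm{Acc} = \frac{tp + tn}{tp + tn + fp + fn}, \qquad \mathrm{Recall} = \frac{tp}{tp + fn}, \qquad \mathrm{Precision} = \frac{tp}{tp + fp}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where tp refers to true positive, tn refers to true negative, fp refers to false positive, and fn refers to false negative. We also compared the area under the receiver operating characteristic (ROC) curve (AUC) and the time to construct the attack detection model.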
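As a quick illustration, the four count-based measures can be computed from a confusion matrix as in the following sketch. The function name `detection_metrics` and the example counts are hypothetical, and returning 0 for undefined ratios is a convention chosen here for convenience.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (a convention assumed here).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Hypothetical counts for one attack class out of 1000 test records.
m = detection_metrics(tp=90, tn=880, fp=20, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```

For multiclass traffic data, the same function can be applied per class by treating that class as positive and all others as negative.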

Experiments and Results
The proposed methodology was implemented in Java and run on a Linux machine. We applied it to the datasets described above with and without attribute selection. For the first dataset, KDD'99, the attribute selection strategy retained only 17 of the 41 attributes as relevant. Referring to [...]
We conducted a comparative study with three popular machine learning algorithms implemented in [36] with default settings using stratified 10-fold cross validation. Table 3 summarizes the performance of the proposed method with and without attribute selection and compares it with the other classifiers: naive Bayes (NB), support vector machine (SVM), and multilayer perceptron (MLP). The reported time is the model construction time (in other words, it does not include the time for attribute selection). This table shows consistent results for the three considered datasets. All model constructions have taken reasonable times except for SVM and MLP. Although NB can take slightly less time than the proposed method, its accuracy is much lower. This demonstrates that the proposed methodology can outperform other techniques with improved accuracy and simpler models even with few selected attributes. In general, we observed that the performance for the KDD'99 dataset is much better than for the other datasets. This can be due to the size and nature of the dataset, since KDD'99 has more samples and attributes covering larger parts of the search space.
For the proposed methodology, we also reported the performance for each class in the three datasets in terms of precision, recall, $F_1$ measure, and AUC. These results are shown in Tables 4, 5, and 6. For the first dataset, KDD'99, the distribution of traffic samples is skewed, and some attacks are very rare. We can notice that the proposed methodology is very accurate when enough samples exist. For the other two datasets, the performance is very high except for two attack types. This can be attributed to an attribute set that is insufficient to distinguish between all traffic types. The comparisons of the per-class performance with other methods are shown in Figures 6, 7, and 8. In these figures, it is desirable to cover a larger area of the shape in each direction (class type). A similar conclusion can be drawn as above: the proposed methodology is promising and can be effective for cyber-attack detection.

Conclusion
This paper presents a novel security mechanism for cyber-attack detection in wireless mobile networks. It uses historical data to build detection models with the most influential attributes. The proposed hybrid methodology is based on multicriterion fuzzy classification augmented with a metaheuristic attribute selection strategy using a genetic algorithm. The constructed predictive model is then deployed to classify unknown incoming traffic. After capturing, preprocessing, and analyzing traffic, the relevant attributes are extracted and passed to the model to decide whether the activity is normal or malicious. Three datasets of different natures and with different cyber-attacks are utilized to evaluate and compare the effectiveness of the proposed methodology in detecting cyber-attacks on different components of a mobile wireless network. Results showed that the proposed methodology behaved consistently for all datasets with promising detection accuracies and model construction times. For some attacks, the performance was relatively low. However, this can be due to the insufficient number of captured samples, the imbalanced distribution of the dataset, or insufficient attributes extracted from the raw traffic. As future work, we intend to explore more attacks and other datasets and subsequently improve our methodology further.

Figure 1: Illustration of a network topology with wireless and mobile devices where some devices are infected with malware or hacking.

Figure 2: Block diagram for training and deploying the cyber-attack detection model.

Figure 5: A typical example of the partial indifference fuzzy relation between the object $a$ and the prototype $b_i^h$ according to attribute $g_j$.

Figure 6: Comparing the per-class results for KDD'99 dataset using the reduced attribute vector (due to attribute selection) with various methods in terms of precision, recall, $F_1$ measure, and AUC.

Figure 7: Comparing the per-class results for WEP/WPA dataset using the reduced attribute vector (due to attribute selection) with various methods in terms of precision, recall, $F_1$ measure, and AUC.

Figure 8: Comparing the per-class results for WPA2 dataset using the reduced attribute vector (due to attribute selection) with various methods in terms of precision, recall, $F_1$ measure, and AUC.

Table 1: Some characteristics of the adopted datasets for evaluation.

Figure 5 shows a typical example of a fuzzy relation.

Table 3: Comparisons of accuracy for different approaches using 10-fold cross validation (results are approximated to two decimal digits). All model constructions have taken reasonable time except SVM and MLP.

Table 4: The KDD'99 per-class performance of the proposed method with and without attribute selection (approximated to three decimal digits).

Table 5: The WEP/WPA per-class performance of the proposed method with and without attribute selection (approximated to three decimal digits).

Table 6: The WPA2 per-class performance of the proposed method with and without attribute selection (approximated to three decimal digits).