Network Anomaly Detection System with Optimized DS Evidence Theory

Network anomaly detection has attracted growing attention with the rapid development of computer networks. Some researchers have applied fusion methods and DS evidence theory to network anomaly detection, but with low performance, and they did not consider that networks are complicated and varied. To achieve a high detection rate, we present a novel network anomaly detection system with optimized Dempster-Shafer evidence theory (ODS) and a regression basic probability assignment (RBPA) function. In this model, we optimize DS evidence theory by adding a weight for each sensor according to its previous prediction accuracy, and RBPA employs the sensors' regression ability to address complex networks. Through four kinds of experiments, we find that our novel network anomaly detection model has a better detection rate and that both the RBPA and ODS optimization methods improve system performance significantly.


Introduction
With the development of computer network technology and the growth of network scale, computer networks face attacks from hackers and other threats, so network security is attracting increasing attention. Intrusion detection technology, which protects network security behind the firewall, has become a research focus in the network security field. Network anomaly detection is both the emphasis and the main difficulty of network intrusion detection technology [1], and it currently suffers from a low detection rate, a high false positive rate, and a high false negative rate. Many researchers have proposed useful algorithms in this domain [2][3][4][5][6][7][8], but each of these methods relies on a single, simple technique and cannot fully adapt to complicated and changeable networks. Thus, a novel network anomaly detection mechanism is required to solve these problems.
Recently, some studies have applied Dempster-Shafer (DS) evidence theory [9,10] to network anomaly detection. DS evidence theory was proposed by Dempster in 1967 and later developed by Shafer in 1976, and it has been widely used in many fields of data fusion, such as expert advisory systems, forecasting, image processing, artificial intelligence, and classification. Intrusion detection is essentially a multiclass classification problem, which divides network data into normal data and various types of attack data. Since simple detection algorithms suffer from limitations such as a low detection rate and a high false alarm rate, many researchers apply DS evidence theory in intrusion detection systems. For example, some researchers divide the characteristics of network data into a basic feature set, a content feature set, and a traffic feature set. Then they apply a detection algorithm to each of these three feature sets and fuse the outputs through DS evidence theory to get the final result. Although IDS based on DS evidence theory achieves a good detection rate, most of these studies rely on classic DS evidence theory and must assume that the intercepted data are independent of each other and free of conflict. However, conflicts between network data are inevitable, so these approaches lead to unreasonable fusion results, a high false alarm rate, and a high missed alarm rate.
To better solve this serious issue, we present a novel network anomaly detection mechanism based on optimized DS evidence theory (ODS) which also can achieve better

Classic DS Evidence Theory.
DS evidence theory [9,10] is considered a general extension of classical probabilistic inference theory in the finite field. Unlike the conventional Bayes inference method, DS evidence theory can deal with uncertain and imprecise information without a priori probabilities. Thus DS evidence theory has greater flexibility.
DS evidence theory is built on a nonempty finite set $\Theta$, called the recognition framework, which contains a limited number of independent system states $\{\theta_1, \theta_2, \ldots, \theta_n\}$. An element $A$ of the power set $2^{\Theta}$ of the system states is called a system state hypothesis. Given the observation results $E_1, E_2, \ldots, E_m$ of the system state from each sensor, DS evidence theory can merge these results and infer the state of the system. It mainly involves the following concepts.

Definition 1. The basic probability assignment (BPA) function $m: 2^{\Theta} \to [0, 1]$ satisfies
$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1.$$

Definition 2. The belief function is defined as
$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B).$$
This function represents the degree of confidence in hypothesis $A$. Its value is the sum of the basic confidence values of the observation results that support hypothesis $A$.

Definition 3. The plausibility function is defined as
$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B).$$
This function represents the degree of plausibility of hypothesis $A$. Its value is the sum of the basic confidence values of the observation results that do not contradict hypothesis $A$.

Definition 4. DS fusion rules: for any hypothesis $A$, let $m_1$ and $m_2$ be the basic probability assignment (BPA) functions of two pieces of evidence. The BPA function of the combined evidence is
$$m_{12}(A) = \frac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)}, \qquad A \neq \emptyset.$$
Likewise, the general DS synthesis rule for the combination of $n$ pieces of evidence is
$$m_{1 \ldots n}(A) = \frac{\sum_{A_1 \cap \cdots \cap A_n = A} \prod_{i=1}^{n} m_i(A_i)}{1 - \sum_{A_1 \cap \cdots \cap A_n = \emptyset} \prod_{i=1}^{n} m_i(A_i)}, \qquad A \neq \emptyset.$$

Drawback of Classic DS Evidence Theory.
The advantages of DS evidence theory are as follows: it satisfies an axiom system weaker than that of probability theory, it distinguishes the unknown from the uncertain, and it continuously narrows the hypothesis set as evidence accumulates. The disadvantage of DS evidence theory is that, when the confidence degree tends to 0, the computed result conflicts with the expected result. That is to say, when the confidence degree is too small or 0, the results obtained differ greatly from expectation. In addition, the BPA must assign values to so many hypotheses that the calculation becomes complicated: if the hypothesis set is too large, the computational complexity of evidence theory increases exponentially.

Optimized DS Evidence Theory.
From formula (4), we can see that every sensor is given the same confidence degree; that is, every sensor is assumed to have the same accuracy. Obviously, this does not fit the facts. Consider, for example, two doctors diagnosing the same patient.
The Scientific World Journal 3
Doctor A considers that the patient suffers from disease X with probability 99% or from disease Y with probability 1%. Doctor B, however, considers that the patient suffers from disease Y with probability 1% or from disease Z with probability 99%. Merging these two pieces of evidence according to formula (4), we conclude that the patient suffers from disease Y. But this result is plainly wrong and does not match reality. Therefore, the synthesis rule of formula (4) applies only when all sensors have the same precision.
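The doctor example can be checked numerically. The following is a minimal sketch of the classical Dempster combination rule for two pieces of evidence, assuming a BPA is stored as a dictionary from hypotheses (frozensets of states) to masses; the function and variable names are illustrative and do not come from the paper.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Fuse two BPAs with the classical Dempster rule.

    A BPA is a dict mapping a frozenset of states (a hypothesis) to its mass.
    """
    fused = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            fused[common] = fused.get(common, 0.0) + mass_b * mass_c
        else:
            # Mass that falls on the empty set measures the conflict.
            conflict += mass_b * mass_c
    if 1.0 - conflict < 1e-12:
        raise ValueError("evidence is totally conflicting")
    # Normalize by the non-conflicting mass.
    return {h: mass / (1.0 - conflict) for h, mass in fused.items()}


# The two doctors from the example (hypothetical variable names).
doctor_a = {frozenset({"X"}): 0.99, frozenset({"Y"}): 0.01}
doctor_b = {frozenset({"Y"}): 0.01, frozenset({"Z"}): 0.99}
fused = dempster_combine(doctor_a, doctor_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 6))
```

Only disease Y receives nonzero mass from both doctors, so after normalization essentially all of the fused mass lands on Y, even though each doctor gave it only 1%.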
To solve this issue, we present a novel modification of conventional DS evidence theory: we combine weights with DS evidence theory. Specifically, we add a weight to each sensor according to its previous prediction accuracy. We define $m_i$ as the basic confidence values obtained by sensor $i$ and $w_i$ as the previous prediction accuracy of sensor $i$; similarly, $m_j$ denotes the basic confidence values obtained by sensor $j$ and $w_j$ the previous prediction accuracy of sensor $j$. The DS combination rule is then weighted accordingly.
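To illustrate how per-sensor weights can change a fused result, the sketch below applies classical Shafer discounting before Dempster's rule. This is a well-known stand-in technique chosen for illustration; it is not the weighted rule of this paper, and the weights 0.9 and 0.5 are hypothetical accuracies.

```python
from itertools import product


def discount(bpa, weight, frame):
    """Shafer discounting: keep `weight` of each mass, move the rest to the frame."""
    out = {h: weight * mass for h, mass in bpa.items() if h != frame}
    out[frame] = 1.0 - weight + weight * bpa.get(frame, 0.0)
    return out


def dempster_combine(m1, m2):
    """Classical Dempster rule for two BPAs (dict: frozenset -> mass)."""
    fused, conflict = {}, 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            fused[common] = fused.get(common, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if 1.0 - conflict < 1e-12:
        raise ValueError("evidence is totally conflicting")
    return {h: mass / (1.0 - conflict) for h, mass in fused.items()}


frame = frozenset({"X", "Y", "Z"})
doctor_a = {frozenset({"X"}): 0.99, frozenset({"Y"}): 0.01}
doctor_b = {frozenset({"Y"}): 0.01, frozenset({"Z"}): 0.99}

# Hypothetical previous accuracies: doctor A is trusted more than doctor B.
weighted = dempster_combine(
    discount(doctor_a, 0.9, frame),
    discount(doctor_b, 0.5, frame),
)
best = max(weighted, key=weighted.get)
print(sorted(best))
```

With these assumed weights, the more accurate doctor dominates and disease X receives the largest fused mass, instead of the counterintuitive answer Y produced by the unweighted rule.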

ODS Network Anomaly Detection Model Design
In this paper, we present a novel network anomaly detection model based on optimized DS evidence theory that merges several kinds of classifiers. In this model, we use BMPM, SVM, and BP network as classifiers. Unlike the original fusion rule, which uses the classification output of those classifiers, the new one uses their regression output, because the regression output better reflects the real-time network environment. We then use the merged result as one parameter for constructing the BPA of DS evidence theory. The model is depicted in Figure 1 and introduced in detail below. As shown in Figure 1, it consists of five modules: the network connection record module, feature extraction module, data preprocessing module, early detection module, and ODS fusion module.
The network connection record module uses network sniffer tools, for example, Sniffer, to collect network packets in the network where the anomaly detection host resides and then stores them. That is, this module is used to collect network data.
The feature extraction module extracts the features that affect network anomaly detection from the network packets stored by the network connection record module. These features are recorded into a feature vector for use by the preprocessing module. This module also discards features that are irrelevant to network anomaly detection. Essentially, this module performs feature reduction.
The data preprocessing module processes the feature vector produced by feature extraction. Some features in a feature vector are discrete, such as protocol type, service type, and flag, while others are continuous, such as connection time, the length of data sent, and the length of data received. Since the early detection module takes discrete data as input, continuous features need to be discretized before the following work. At the same time, the feature vectors also need to be standardized and normalized so that they can be processed properly by the BP network. In essence, this module performs data normalization, discretization, and standardization.
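The preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions: symbolic values are mapped to integer codes, continuous values are discretized with equal-width bins, and columns are scaled with min-max normalization. The paper does not specify these particular choices at this point, and all function names are hypothetical.

```python
def encode_symbolic(values):
    """Map each distinct symbolic value to an integer code (first-seen order)."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, codes


def equal_width_bins(column, n_bins):
    """Discretize a continuous column into n_bins equal-width intervals."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in column]


def min_max_normalize(column):
    """Scale a numeric column linearly into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


protocols = ["tcp", "udp", "tcp", "icmp"]   # discrete feature
durations = [0, 5, 10]                      # continuous feature

protocol_codes, mapping = encode_symbolic(protocols)
duration_bins = equal_width_bins(durations, 2)
duration_scaled = min_max_normalize(durations)
print(protocol_codes, duration_bins, duration_scaled)
```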
The early detection module detects the feature vectors processed by the data preprocessing module and passes the corresponding detection results to the ODS fusion module. It is composed of 3 sensors: SVM, BMPM, and BP. To fit the complicated network environment, we optimize the sensors, that is, we add weights to each sensor and construct 6 classifiers: BMPM-N, BMPM-A, SVM-N, SVM-A, BP-N, and BP-A (Section 4.3). Then we train these classifiers according to distance theory [16] (Section 4.4). Finally, we obtain several results for each incoming network record.
The ODS fusion module uses ODS evidence theory to merge and analyze the detection results from the early detection module. That is, according to the regression ability of the sensors, we fuse these results by formula (15) and give the decision result, that is, whether the connection is an attack or not.

ODS Network Anomaly Detection Model Implementation
In the ODS network anomaly detection model, we must solve several key issues: how to combine ODS evidence theory with network anomaly detection, how to construct and determine the BPA values in ODS evidence theory, how to determine the weights in the fusion rules of ODS evidence theory, and how to train the 6 classifiers. In this section, we introduce the solutions to these issues in detail.

Combining ODS Evidence Theory with Network Anomaly Detection.
In the ODS network anomaly detection model, the system judges whether the current connection is unusual only according to the observed network features, so we define the ODS evidence theory identification framework with only two elements: normal status and abnormal status.
Therefore, according to DS evidence theory, we define the ODS evidence theory identification framework as $\{N, A\}$, where $N$ represents normal status and $A$ represents abnormal status. The states $N$ and $A$ are mutually exclusive, that is, $N \cap A = \emptyset$. Similarly, we redefine the BPA function as $m: 2^{\{N, A\}} \to [0, 1]$, with $m(\emptyset) = 0$ and $m(\{N, A\}) + m(N) + m(A) = 1$. In this formula, $m(A)$ represents the reliability, according to the observation of the current features by the current sensor, that the current status is abnormal, and $m(N)$ is the corresponding reliability for normal status. On the other hand, $m(\{N, A\})$ represents the degree to which the current sensor, observing the current features, cannot decide whether the current status is normal or abnormal. The detailed BPA function is introduced in the next subsection.
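On this two-element framework, the belief and plausibility of each state reduce to simple sums. The following minimal sketch checks the BPA constraint and computes both quantities from the standard definitions; the function names and the sample masses are illustrative assumptions.

```python
def check_bpa(m_n, m_a, m_unknown, tol=1e-9):
    """Return True if the three masses form a valid BPA on the frame {N, A}."""
    masses = (m_n, m_a, m_unknown)
    in_range = all(0.0 <= m <= 1.0 for m in masses)
    return in_range and abs(sum(masses) - 1.0) <= tol


def belief_plausibility(m_n, m_a, m_unknown):
    """Return (Bel, Pl) for N and A.

    Bel(N) = m(N) and Pl(N) = m(N) + m({N, A}), and likewise for A,
    because {N, A} is the only other hypothesis that intersects each state.
    """
    return {
        "N": (m_n, m_n + m_unknown),
        "A": (m_a, m_a + m_unknown),
    }


# Hypothetical sensor output: 0.6 normal, 0.3 abnormal, 0.1 undecided.
valid = check_bpa(0.6, 0.3, 0.1)
intervals = belief_plausibility(0.6, 0.3, 0.1)
print(valid, intervals)
```

The interval between belief and plausibility has width $m(\{N, A\})$ for both states, so a sensor that is more often undecided yields wider intervals.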

Regression BPA.
In this subsection, we first give a hypothesis about network connection status for BPA value and then depict sensors' regression ability and how to compute RBPA value in detail.

Hypothesis and Discussion for Network Connection Status.
Reference [16] states that the distance between an abnormal network connection and a normal network connection is larger than that between two normal network connections. That is, for classifiers, the distance between data of different classes is larger than that between data of the same class. According to this rule, we make the following hypothesis: for an unseen (unknown) network connection, the prediction result of the N-classifier is $m(N)$. That is to say, the N-classifier always treats every network connection as normal but assigns different support degrees depending on the real connection type: a high support degree for a truly normal connection and a low support degree for a truly abnormal one. Under this hypothesis, for a truly normal network connection, the prediction result $m(N)$ computed by the N-classifier is larger than $m(A)$ computed by the A-classifier, and vice versa.
As shown in Figure 1, the three kinds of classifiers in the early detection module (SVM, BMPM, and BP) are also regarded as three sensors. SVM-N and SVM-A represent the support degrees of the SVM sensor for normal and abnormal network connections, respectively. So when SVM-N and SVM-A are assigned the same parameters, they can be considered as one whole sensor that gives different support degrees to normal and abnormal network connections. The same holds for the BMPM and BP sensors. Therefore, in this model, the fusion part is ODS evidence theory combined with three kinds of sensors: SVM, BMPM, and BP. If a normal network connection needs processing, then whichever sensor is chosen (SVM, BMPM, or BP), it yields the support degrees $m(N)$ and $m(A)$ for this connection, and these results agree with the objective fact of this network connection. That is, in ODS evidence theory, if $m(N)$ is larger than $m(A)$, the network connection is considered a normal one.
Based on the hypothesis above, if the model is to have each sensor automatically give the support degrees of normal and abnormal status for the current network connection, and if these results are to agree with the objective fact of the real connection, we must exploit the abilities of the sensors, such as learning ability, regression ability, associative memory ability, and generalization ability. This is because sensors (e.g., BMPM, SVM, and BP), after training, learning, and regression operations, can produce results that are almost equal to the actual results. Sensors with these abilities can therefore reflect the real network environment.

The Features of Sensors.
In the above subsection, we said that some abilities of the sensors are used to help construct the BPA function; here we use the regression ability and supervised learning ability of the SVM, BMPM, and BP sensors. This can be explained in detail as follows. We define one class of data as normal network connection data with its corresponding training target values, and another class as abnormal network connection data with its corresponding training target values. These two kinds of data and their training target values are used to train the classifiers depicted in Figure 1. As long as the two data distributions and their training target values differ clearly, when a data record follows either distribution, we can estimate its value (the target value of the corresponding class in the training data) using the regression ability of these classifiers. In addition, the estimated values differ clearly because the records follow different data distributions.

BPA Based on Regression Ability.
Since different sensors can represent the network connection status as normal or abnormal and give different support degrees for each, we use this rule to construct the BPA function in the ODS network anomaly detection model. When ODS evidence theory is combined with network anomaly detection, and the current network connection is assumed to be a normal one, different sensors (the various classifiers in the fusion model) can produce different BPA values. These BPA values correspond to the hypotheses $N$, $A$, or $\{N, A\}$. We expect the BPA value that a normal network connection assigns to hypothesis $N$ to be larger and, conversely, the BPA values assigned to hypothesis $A$ (abnormal status) or hypothesis $\{N, A\}$ (unknown status) to be smaller.
After training the N-classifier and the A-classifier (training is introduced in Section 4.4), we can compute the BPA values in ODS evidence theory. SVM-N and SVM-A can be considered as one whole, the SVM sensor. For a network connection record, the regression estimate $m(N)$ is computed by SVM-N and $m(A)$ by SVM-A. Owing to the associative ability of the SVM classifier, if the record is a normal network connection record, $m(N)$ will be larger than $m(A)$, and vice versa. This rule also holds for BMPM and BP. Therefore, the three sensors in the system, SVM, BMPM, and BP, can assess the current status of an incoming network connection. That is, we obtain the support degrees $m(N)$ and $m(A)$ for normal status and abnormal status, respectively.
Notably, in this paper we present a novel method to deal with unknown network connection status, given by formula (6).

Weights for Each Sensor.
In a traditional network anomaly detection system, performance is assessed by the evaluation parameter F-Score, which reflects whether the performance of an intrusion detection system is good or bad: the greater the F-Score, the better the performance of the system. In this paper, we extend this parameter and propose two new parameters, F-Score-N and F-Score-A. Since in ODS evidence theory the weight of a sensor depends on its previous prediction accuracy, we use these new parameters as the weights of each sensor. These new parameters are introduced in detail below.
Here, we define several parameters as follows:
(i) TP: the number of abnormal connections that the anomaly detection system detects as abnormal;
(ii) FN: the number of abnormal connections that the anomaly detection system detects as normal;
(iii) FP: the number of normal connections that the anomaly detection system detects as abnormal;
(iv) TN: the number of normal connections that the anomaly detection system detects as normal;
(v) Precision: the proportion of truly abnormal connections among the connections detected as abnormal by the anomaly detection system;
(vi) Recall: the proportion of truly abnormal connections that are detected as abnormal by the anomaly detection system;
(vii) F-Score: a balanced average of Precision and Recall used to evaluate a network anomaly detection system.
With the parameters above, Precision, Recall, and F-Score are computed by the conventional formulas. However, these conventional formulas do not suit the novel network anomaly detection system presented in this paper, so we propose the new parameters F-Score-N and F-Score-A.
Here we simulate network attacks against the system and run the network connection record module at the same time, and the resulting records are stored. The detailed process is as follows.
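The conventional metrics defined above can be sketched in a few lines. This sketch assumes the usual balanced F-Score (the harmonic mean of Precision and Recall); the counts in the example are hypothetical, and the new parameters F-Score-N and F-Score-A are not covered here.

```python
def precision_recall_fscore(tp, fp, fn):
    """Conventional Precision, Recall, and balanced F-Score from raw counts.

    Abnormal connections are the positive class, as in the definitions above.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Assumed form: harmonic mean of Precision and Recall.
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts: 90 attacks caught, 10 false alarms, 30 attacks missed.
precision, recall, f_score = precision_recall_fscore(tp=90, fp=10, fn=30)
print(round(precision, 3), round(recall, 3), round(f_score, 3))
```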
(1) We use the corresponding attack software to simulate all kinds of attacks; for example, a DoS attack can be simulated by combining a DoS attack simulator with the ping command, and other attacks can be produced in the same way.
(2) Before these abnormal connections attack the system, we record the IP addresses and attack types of the attack hosts and the IP addresses of the destination hosts, and we simultaneously run the network connection record module of the network anomaly detection system to record network packets. Here the time window for the attack is 1 hour, and the network connection record module records all the network packets in this period.
(3) We filter the recorded network packets according to the effective network attack packet standard, using the IP addresses and attack types of the attack hosts recorded before the attack. The filtered network packets are then discretized, standardized, and normalized by the feature extraction module and the data preprocessing module in Figure 1. Finally, we obtain the different kinds of network connection feature vectors, such as Normal, DoS, Probe, R2L, and U2R.

Preprocessing Training Data Set.
According to the rule mentioned in [16], we use the difference in distance between connections of the same kind and of different kinds to define the BPA value in ODS evidence theory. Before training the 6 classifiers, we must preprocess the training data set. As shown in Figure 1, the six basic classifiers in the early detection module fall into two categories, namely, N-classifiers and A-classifiers. We first describe the preprocessing for training the N-classifier.
(1) When a training data set includes normal connections, N-Train, and abnormal connections, A-Train, it must be processed before training the N-classifier. First, we compute the clustering center, N-CORE, of N-Train in the training data set.
(2) Then we compute the distance between each normal connection and N-CORE and define this distance value as a positive value, which is associated with that normal connection.
(3) Then we compute the distance between each abnormal connection and N-CORE and define this distance value as a negative value, which is associated with that abnormal connection.
(4) Finally, the results from (2) and (3) are stored in the list N-Dist, whose entries correspond to the connection records. N-Dist is normalized to the range from 0 to 1 and is used as the training label for the N-classifier.
After this processing, the distance value associated with a normal connection is larger than that associated with an abnormal connection in the training data set. If these distance values are used as training labels for supervised learning, the classifiers learn this pattern through their associative ability. Consequently, when a normal network connection and an abnormal network connection are processed, the regression value for the normal connection is larger than that for the abnormal connection. The same rule applies to training the A-classifier.
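The four preprocessing steps for the N-classifier can be sketched as follows. The sketch assumes that the clustering center is the component-wise mean of N-Train and that the distance is Euclidean; the paper does not fix either choice here, and the function name and toy vectors are hypothetical.

```python
import math


def n_dist_labels(n_train, a_train):
    """Build normalized N-Dist labels for normal and abnormal feature vectors.

    Returns (core, labels), with labels ordered as n_train followed by a_train.
    """
    dim = len(n_train[0])
    # Step (1): clustering center of the normal connections (assumed: mean).
    core = [sum(v[k] for v in n_train) / len(n_train) for k in range(dim)]

    def distance(v):
        # Assumed metric: Euclidean distance to N-CORE.
        return math.sqrt(sum((x - c) ** 2 for x, c in zip(v, core)))

    # Steps (2) and (3): positive distances for normal, negative for abnormal.
    signed = [distance(v) for v in n_train] + [-distance(v) for v in a_train]

    # Step (4): normalize the list to the range [0, 1].
    lo, hi = min(signed), max(signed)
    if hi == lo:
        return core, [0.0] * len(signed)
    return core, [(d - lo) / (hi - lo) for d in signed]


# Toy feature vectors (hypothetical values).
n_train = [[0.0, 0.0], [2.0, 0.0]]
a_train = [[4.0, 0.0]]
core, labels = n_dist_labels(n_train, a_train)
print(core, labels)
```

Because abnormal connections receive negative signed distances, every normal record ends up with a larger label than every abnormal record after normalization, which is the ordering the regression is meant to learn.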
We then train the 6 classifiers: the N-classifiers with the training data set and the corresponding N-Dist, and the A-classifiers with the training data set and the corresponding A-Dist.

Training Classifiers.
In this phase, we mainly train the 6 classifiers in the early detection module and compute some parameters of the ODS fusion module. Here we divide the network connection feature vectors (Normal, DoS, Probe, R2L, and U2R) into two parts according to attack type. Each part includes the processed training data set and the corresponding lists (N-Dist, A-Dist) that store the distance values.
(1) One part is used to train the 6 classifiers, SVM-N, SVM-A, BMPM-N, BMPM-A, BP-N, and BP-A, and the trained classifiers are stored for future prediction.
(2) The other part is used to test the trained classifiers, and the results for all kinds of attacks and normal connections are recorded, that is, the values of TP, TN, FP, and FN described in Section 4.3.
(3) Then we obtain the weights F-Score-N and F-Score-A of all sensors (SVM, BMPM, and BP) by formulas (12) and (13). Finally, these values are stored in the arrays F-Score-N and F-Score-A, respectively.
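Step (2) amounts to tallying a confusion matrix on the held-out part. The following minimal sketch counts TP, FN, FP, and TN with abnormal connections as the positive class and derives DR and FR under their usual definitions; the label encoding ("A" and "N") and the sample data are assumptions for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN with abnormal ("A") as the positive class."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for truth, guess in zip(actual, predicted):
        if truth == "A":
            counts["TP" if guess == "A" else "FN"] += 1
        else:
            counts["FP" if guess == "A" else "TN"] += 1
    return counts


def detection_and_false_positive_rate(counts):
    """DR = TP / (TP + FN) and FR = FP / (FP + TN), assumed standard forms."""
    attacks = counts["TP"] + counts["FN"]
    normals = counts["FP"] + counts["TN"]
    dr = counts["TP"] / attacks if attacks else 0.0
    fr = counts["FP"] / normals if normals else 0.0
    return dr, fr


# Hypothetical held-out labels and sensor predictions.
actual = ["A", "A", "N", "N", "N"]
predicted = ["A", "N", "A", "N", "N"]
counts = confusion_counts(actual, predicted)
dr, fr = detection_and_false_positive_rate(counts)
print(counts, dr, fr)
```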

Execution Flow of ODS Network Anomaly Detection System.
After training the 6 classifiers as introduced in Section 4.4, we can easily obtain the weight of each sensor (Section 4.3). Likewise, the BPA values of each sensor, that is, the support degrees $m(N)$, $m(A)$, and $m(\{N, A\})$, can be obtained as introduced in Section 4.2. In this section, the execution flow of the ODS network anomaly detection system is introduced in detail.
(1) First, the network connection record module captures a network connection packet, and the feature extraction module and the data preprocessing module then process this packet in turn to obtain a network connection feature vector.
(2) Then this feature vector is passed to the 6 classifiers (3 sensors) in the early detection module for regression estimation. After the SVM sensor processes the connection, we obtain the support degrees $m_1(N)$ and $m_1(A)$ for normal and abnormal network connection status. Next, the support degree $m_1(\{N, A\})$ for unknown status is computed by formula (6).
(3) In the same way, we obtain $m_2(N)$, $m_2(A)$, and $m_2(\{N, A\})$ for BMPM, and $m_3(N)$, $m_3(A)$, and $m_3(\{N, A\})$ for BP.
(4) From formulas (4) and (5), we derive the ODS combination rule with weights for $n$ sensors, formula (14). In this model, we choose SVM, BMPM, and BP as sensors, so the sensor index runs from 1 to 3. By formula (14), we obtain the fused support degrees of this network connection, $m_{123}(N)$, $m_{123}(A)$, and $m_{123}(\{N, A\})$, by fusing the 3 sensors. To do so, we substitute into formula (14) the support degrees for $N$, $A$, and $\{N, A\}$ computed by the SVM, BMPM, and BP sensors, together with the weight vectors F-Score-N and F-Score-A.

(5) The final decision result of the system is given by formula (15).
The final decision can be explained in detail as follows: if $m_{123}(N)$ is larger than $m_{123}(A)$ and $m_{123}(\{N, A\})$, the system considers the current network connection a normal one; likewise, if $m_{123}(A)$ is larger than $m_{123}(N)$ and $m_{123}(\{N, A\})$, the system considers the current network connection an abnormal one.
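The two decision cases described above can be sketched as a small function over the fused support degrees. The text does not state what happens when neither case applies, so the third outcome, labeled "undecided" here, is an assumption of this sketch, as are the function name and sample values.

```python
def decide(m_normal, m_abnormal, m_unknown):
    """Classify a connection from the fused support degrees of the 3 sensors."""
    if m_normal > m_abnormal and m_normal > m_unknown:
        return "normal"
    if m_abnormal > m_normal and m_abnormal > m_unknown:
        return "abnormal"
    # Assumption: any remaining case is reported as undecided.
    return "undecided"


# Hypothetical fused support degrees for three connections.
results = [
    decide(0.70, 0.20, 0.10),
    decide(0.15, 0.75, 0.10),
    decide(0.30, 0.30, 0.40),
]
print(results)
```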

Experiments and Analysis
In this section, we verify the effectiveness of combining ODS evidence theory with SVM, BMPM, and BP sensors and show that the novel ODS network anomaly detection model achieves a higher detection rate (DR) and a lower false positive rate (FR) for not only traditional attacks but also new attacks.

Data
(i) Denial-of-service (DoS): denial of a service that is accessed by legitimate users, for example, SYN flooding.
(ii) Remote-to-local (R2L): unauthorized access from a remote machine, for example, password guessing.
(iii) User-to-root (U2R): unauthorized access to local superuser privileges, for example, buffer overflow attacks.
(iv) Probing (Probe): surveillance and probing for information gathering, for example, port scanning.
The test data set does not have the same probability distribution as the training data set. There are 4 new U2R attack types in the test data set that are not present in the training data set. These new attacks correspond to 92.90% (189/228) of the U2R class in the test data set. On the other hand, there are 7 new R2L attack types corresponding to 63% (10196/16189) of the R2L class in the data set. In addition, only 104 (out of 1126) connection records in the training data set correspond to the known R2L attacks that are present in both data sets. There are also 4 new DoS attack types in the test data set, corresponding to 2.85% (6555/229853) of the DoS class in the test data set, and 2 new Probing attacks, corresponding to 42.94% (1789/4166) of the Probing class in the test data set.

Data Set Preprocessing.
Since a connection record in KDD 99 includes not only symbolic features but also continuous and discrete features, we must process these features before running the experiments. Here the Naïve algorithm in the Rosetta software [17] is used to discretize continuous features, and symbolic features are discretized directly by a general mapping method. Then, in order to remove the differences in scale between features and to give the discretized data a common form and equal weights, these data are standardized.

Experimental Design.
In order to prove that network anomaly detection system with ODS and RBPA has better performance, we design 4 kinds of experiments.
In the first experiment, we choose 3 single methods (SVM, BMPM, and BP) and 4 fusion methods (DS with SBPA, DS with RBPA, ODS with SBPA, and ODS with RBPA) and run detection on the same data set. For this data set, 4000 network connections of each of the connection types Normal, DoS, Probe, and R2L are selected, and 249 network connections are chosen from the U2R type. The chosen data form a data set that is divided into 2 parts: a training data set and a test data set. This experiment is used to show that the method we present can detect various attacks and has a higher DR and a lower FR.
In the second experiment, we use the same 7 network anomaly detection methods on an R2L data set that has 4000 network connections. The first 2000 network connections are normal connections and the last 2000 are abnormal connections. This experiment is used to show that the method with RBPA outperforms the method with ODS and that the two optimization methods we present can be used together in network anomaly detection with better performance.
In the third experiment, we again use the 7 network anomaly detection methods on the same data set as in the first experiment, but we compare several parameters mentioned in Section 4.3, such as Precision, Recall, and F-Score. In addition, we use the ROC curve, which shows the DR and FR of the corresponding method, and the AUC, which is the area under the corresponding ROC curve, to evaluate the performance of the network anomaly detection system.
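The ROC curve and AUC used in the third experiment can be computed from per-connection scores by sweeping a decision threshold. The sketch below is a minimal, dependency-free illustration; it assumes that a higher score means "more abnormal" and that labels are 1 for abnormal and 0 for normal. The scores and names are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (FR, DR) obtained by lowering the threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both abnormal and normal samples")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process tied scores together so that ties yield a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Hypothetical abnormality scores and true labels (1 = abnormal).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
print(curve[-1], round(area, 4))
```

An AUC close to 1 means that abnormal connections are almost always scored above normal ones, while 0.5 corresponds to random ranking.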
In the fourth experiment, we use 2 network anomaly detection methods (ODS with RBPA and DS with SBPA). Here we choose the 10% KDD99 data as the training data set and the test data set mentioned in Section 5.1.1 as the test data set. This experiment is used to estimate the new model's detection ability for new attack types.

Experiments with 4 Attack Types.
From Tables 2, 3, 4, 5, 6, and 7, we can see that the false positive rate (FR) of the single detection models, such as SVM, BMPM, and BP, is higher than that of the fusion detection models. This shows that the fusion detection method can effectively reduce the FR of the anomaly detection system. In terms of the detection rate (DR) and the number of attacks detected, the fusion detection methods outperform the single methods, and the fusion methods also yield a lower FR. In addition, compared with the fusion detection models, the variance of almost every single detection model is larger, meaning that the fusion detection models fluctuate less, that is, they are relatively stable. Although the DR of some models for various attacks is high, their FR is still high, for example, the BMPM model. Therefore, the novel model with ODS and regression BPA outperforms the others, with a lower FR and a better DR.
Here, we not only analyze the overall performance of the novel model but also discuss the performance of ODS with weights and of regression BPA. According to whether the BPA and DS evidence theory are redesigned, we obtain 4 results for the different combinations, shown in Tables 5, 6, 7, and 8.
According to whether DS evidence theory is redesigned (whether weights are added to DS), we can divide these models into two groups without considering the BPA design: one is Tables 5 and 6, and the other is Tables 7 and 8. In this way, we can compare the performance of ODS, that is, DS with weights (F-Score values as weights), with that of DS. From these two groups, we can see that the FR of ODS is lower than that of DS, and the total DR of ODS is also lower than that of DS. Clearly, for most attack types the DR with ODS exceeds that with DS. Thus, ODS with weights is effective compared with DS. Similarly, according to whether the BPA is redesigned (whether it uses the sensors' regression ability), we can divide these models into two groups without considering the DS design: one is Tables 5 and 7, and the other is Tables 6 and 8. In this way, we can compare the performance of RBPA with that of SBPA. For FR and total DR, RBPA is significantly better than SBPA. So RBPA with the sensors' regression ability is effective compared with SBPA.

Experiments with R2L Attack.
In this subsection, we mainly focus on the novel model for a single attack type according to formula (15). Figures 2, 3, 4, and 5 are obtained from the different combinations of redesigned or conventional BPA and DS. In these figures, the parameter MNP corresponds to the fused term $m_{123}(\cdot)$ for the normal class in formula (15); it represents the degree of support for the current network connection being normal after it is detected by the SVM, BMPM, and BP sensors and merged by ODS. Conversely, MAP corresponds to the fused term $m_{123}(\cdot)$ for the abnormal class. By formula (15), if MNP is larger than MAP, the current network connection is considered normal, and vice versa. In this experiment, there are 4000 network connections in each figure; the first 2000 are normal connections and the others are abnormal connections.
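The decision rule above can be illustrated with a minimal sketch of weighted evidence fusion over the frame {normal, abnormal}. Formula (15) and the exact ODS weighting are not reproduced in this section, so the sketch makes two assumptions: each sensor's mass is weighted by classical Shafer discounting (the weight could be, for instance, the sensor's F-Score), and the discounted masses are merged with Dempster's rule of combination. All function names and numeric values are hypothetical.

```python
from functools import reduce

def discount(mass, weight):
    """Shafer discounting (assumed stand-in for the ODS weights).

    mass is (m_normal, m_abnormal, m_unknown); the committed mass is scaled
    by the sensor weight and the remainder is moved to "unknown".
    """
    n, a, _ = mass
    return (weight * n, weight * a, 1.0 - weight * (n + a))

def combine(m1, m2):
    """Dempster's rule for the focal elements {normal}, {abnormal}, {both}."""
    n1, a1, t1 = m1
    n2, a2, t2 = m2
    conflict = n1 * a2 + a1 * n2  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sensors are in total conflict")
    norm = 1.0 - conflict
    return ((n1 * n2 + n1 * t2 + t1 * n2) / norm,
            (a1 * a2 + a1 * t2 + t1 * a2) / norm,
            (t1 * t2) / norm)

def classify(masses, weights):
    """Fuse all sensors, then label the connection by comparing MNP and MAP."""
    fused = reduce(combine, (discount(m, w) for m, w in zip(masses, weights)))
    mnp, map_, _ = fused
    # Ties are treated as abnormal here; this tie-break is an assumption.
    return ("normal" if mnp > map_ else "abnormal"), fused

# Hypothetical outputs of three sensors (for example SVM, BMPM, BP).
sensor_masses = [(0.7, 0.2, 0.1), (0.6, 0.3, 0.1), (0.2, 0.7, 0.1)]
sensor_weights = [0.9, 0.8, 0.5]
label, fused = classify(sensor_masses, sensor_weights)
print(label, fused)
```

In this toy case the third sensor disagrees with the other two, but its low weight moves most of its mass to "unknown", so the fused MNP remains above MAP.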
First, we analyze and compare Figures 4 and 5 as one group. In Figure 4, the MNP and MAP values of some of the first 2000 (normal) connections overlap. Notably, in some places MAP lies above MNP; that is, these normal connections are wrongly considered abnormal, leading to a higher FR. By contrast, the overlap in Figure 5 is smaller than that in Figure 4. In the same way, Figure 3 outperforms Figure 2 in terms of MNP and MAP. Thus, regardless of the BPA design, ODS with weights is effective.
Next, we analyze and compare Figures 2 and 4 as one group. In Figure 2, the MNP and MAP values of almost all the normal connections (the first 2000) overlap. However, this overlap is much smaller in Figure 4. The same holds for the last 2000 connections, the abnormal ones. In the same way, Figure 5 is better than Figure 3. In essence, this shows that the RBPA method outperforms the SBPA method, with lower FR and higher DR. This conclusion is consistent with the results from Tables 5 and 7 or Tables 6 and 8. Thus, regardless of the DS design, RBPA with regression is effective compared with SBPA. Moreover, taking Figure 2 as the baseline, we compare Figure 3 with Figure 4. In Figure 4, MNP and MAP are easily distinguished, which suits real networks better. In Figure 3 the opposite holds: the results are fuzzy and poorly separated, which leads to a higher FR. Changing only one condition, the DS design or the BPA design, yields Figure 3 (ODS optimization) and Figure 4 (RBPA optimization), respectively. Compared with Figures 3 and 4, Figure 5 is improved enormously. In short, whichever optimization the system chooses, the performance of the network anomaly detection system improves clearly. In particular, the two optimization methods can be applied simultaneously, giving a better result than either optimization method alone.

Experiments Based on ROC and AUC.
Table 9 shows the results for all normal and abnormal network connections under the various anomaly detection methods, and the ROC curve of each method is depicted in Figure 6. In these two experiments, we employ the ROC curve, which shows the relationship between FR and DR, and the AUC, which is the area under the ROC curve. Several further measures are used to evaluate the network anomaly detection system, namely Precision, Recall, and F-Score, which are introduced in Section 4.3. In particular, the larger the values of F-Score and AUC are, the better the performance of the corresponding system is.
First, comparing the single detection methods (SVM, BMPM, and BP) with the fusion detection methods, we can see from Table 9 that the single methods have smaller F-Score and AUC values; that is, the performance of the single detection methods is lower than that of the fusion methods. Holding one condition fixed across the 4 fusion methods, we can analyze the effectiveness of RBPA and ODS separately. Comparing DS with SBPA against DS with RBPA, and ODS with SBPA against ODS with RBPA, we can see that the methods with RBPA have higher F-Score and AUC values. Likewise, the methods with ODS have higher F-Score and AUC values.
Figure 6 shows the ROC curves of the 7 network anomaly detection methods. The method that combines ODS and RBPA has the largest area under its ROC curve (the largest AUC value in Table 9). At the same DR, its FR is the smallest of all the methods; similarly, at the same FR, its DR is the highest. So this fusion method is the best of these 7 methods.
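The measures used in this comparison can be sketched as follows. The definitions of Section 4.3 are not repeated here, so the sketch assumes the usual balanced F-Score (the harmonic mean of Precision and Recall) and computes the AUC with the trapezoidal rule over ROC points, with FR on the horizontal axis and DR on the vertical axis. The function names, scores, and counts are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, Recall, and balanced F-Score from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

def roc_points(scores, labels):
    """ROC points (FR, DR) obtained by sweeping a threshold over the scores.

    labels: 1 for an abnormal connection, 0 for a normal one.
    scores: higher means the detector considers the connection more abnormal.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one normal and one abnormal sample")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Process tied scores together so each threshold adds one point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example (hypothetical scores for 3 abnormal and 3 normal connections).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
print(area, p, r, f)
```

A detector whose ROC curve lies above another's at every FR has the larger AUC, which is the sense in which the largest AUC identifies the best method here.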

Experiments with New Attacks.
In this experiment, we utilize 3 network anomaly detection systems (BP, DS with SBPA, and ODS with RBPA) to detect new attacks. Unlike the experiments above, the training data set used in this experiment is the 10% KDD99 data, and the test data set is the one with 17 new attack types described in Section 5.1.1.
From Table 10, we can see that the performance of the single method is lower than that of the fusion methods. For most of the new attack types, the DR of the fusion methods is higher than that of BP, but some anomalous DR values remain, "sqlattack" for example. With the ODS and RBPA optimizations, the novel method we present makes up for this defect and detects new attacks better than the others.

Related Work
The use of data fusion in the field of network anomaly detection was presented by Siaterlis and Maglaris [18]. The Dempster-Shafer theory of evidence is used as the mathematical foundation for the development of a novel anomaly detection engine, which is evaluated using real network traffic. The superiority of data fusion technology applied to intrusion detection systems was shown in the work of Wang et al. [19]. Their method used information collected from network and host agents and applied the Dempster-Shafer theory of evidence. Another work incorporating the Dempster-Shafer theory of evidence is by Hu et al. [20]. Wu et al. [21] proposed a client-server framework in which a mobile agent continuously extracts various features and sends them to the server, where anomaly detectors detect anomalies. They used multiple distributed servers with different machine learning methods as detectors for analyzing the feature vector, fused the detectors' results with D-S evidence theory, and also proposed a cycle-based statistical approach to find anomalous activity. Zhouzhou et al. [22] presented a new algorithm based on D-S evidence theory to reduce energy consumption in wireless sensor networks; it modifies D-S evidence theory, applies it in the cluster-head selection phase, and adjusts the operation period. In data fusion, the Dempster-Shafer theory of evidence is thus seen to address the problem of how to analyze uncertainty in a quantitative way.
Reference [11] presented a novel intrusion detection approach combining SVM and KPCA to enhance the detection precision for low-frequency attacks and the detection stability. To shorten the training time and improve the performance of the SVM classification model, an improved radial basis kernel function (N-RBF) based on the Gaussian kernel function was developed, and GA was used to optimize the parameters of the SVM. Reference [14] proposed a flow-based anomaly detection system trained with a flow-based data set. This system uses a multilayer Perceptron neural network with one hidden layer, whose interconnection weights are determined by a Gravitational Search Algorithm. Giacinto et al. [23] utilized general classifiers on various feature subspaces of the same data set and then combined voting, the mean algorithm, Bayes, and a decision module. However, there is little analysis of the detection algorithm and the fusion method, and another drawback of this model is its higher false alarm rate. Didaci et al. [24] formulated the intrusion detection problem as a pattern recognition task using a data fusion approach based on multiple classifiers. Their work confirms that the combination reduces the overall error rate but may also reduce the generalization capabilities. Ambareen Siraj et al. [25] brought the fuzzy cognitive map into fusion-based network anomaly detection and presented an intelligent network anomaly detection model. Thomas and Balakrishnan [26][27][28] selected an artificial neural network as the fusion algorithm and constructed a fusion network anomaly detection model based on SNORT, PHAD, and ALAD, which are open source detection systems. Although the system proved effective, its detection rate for some attacks was lower. In [29], the performance of the fusion model is determined by the diversity of the classifiers.
Reference [30] presented a novel network anomaly detection system with DS evidence theory and a regression neural network, but its detection rate is lower.