An Anomaly Detection Algorithm of Cloud Platform Based on Self-Organizing Maps

Virtual machines (VM) on a Cloud platform can be influenced by a variety of factors which can lead to decreased performance and downtime, affecting the reliability of the Cloud platform. Traditional anomaly detection algorithms and strategies for Cloud platforms have some flaws in their accuracy of detection, detection speed, and adaptability. In this paper, a dynamic and adaptive anomaly detection algorithm based on Self-Organizing Maps (SOM) for virtual machines is proposed. A unified modeling method based on SOM to detect the machine performance within the detection region is presented, which avoids the cost of modeling a single virtual machine and enhances the detection speed and reliability of large-scale virtual machines on a Cloud platform. The important parameters that affect the modeling speed are optimized in the SOM process to significantly improve the accuracy of the SOM modeling and therefore the anomaly detection accuracy of the virtual machine.


Introduction
As Cloud computing applications become increasingly mature, more and more industries and enterprises are deploying increasing numbers of applications within Cloud platforms, in order to improve efficiencies and on-demand services where resources are limited. Virtual machines for computing and resource storage are core to a Cloud platform and are essential to ensure normal operation of various businesses [1,2]. However, as the number of applications increases, the scale of the Cloud platform is expanding. Resource competition, resource sharing, and load balancing within the Cloud platform reduce the stability of virtual machines, which leads directly to a decrease in the reliability of the entire Cloud platform [3-7]. Therefore, anomaly detection of virtual machines is an important method for durable and reliable operation on a Cloud platform.
At present, the main methods of virtual machine anomaly detection on Cloud platforms are to collect system operation logs and various performance metrics of the virtual machine status and then determine the anomaly using anomaly detection methods such as statistics, clustering, classification, and nearest neighbor.
The statistical anomaly detection method is based on a probabilistic model and makes certain assumptions about the conditions [8]. However, in real Cloud platforms, the distribution of data is usually unpredictable, which means that the statistics-based method has low detection rates and thus may be unsuitable. Clustering-based methods group similar virtual machine states together and consider any states which are distant from the cluster center to be abnormal [9,10]. Since this method does not need a priori knowledge of the data distribution, its accuracy is better than that of the statistics-based method. However, it is difficult to choose a reasonable clustering algorithm in clustering-based methods. Self-Organizing Maps (SOM) [11,12], k-means [13,14], and expectation maximization [15] are three commonly used clustering algorithms in anomaly detection. The classification-based algorithm mainly includes neural networks [16,17], Bayesian networks [18,19], and support vector machines [20-22]. The main drawback of these algorithms is the high training cost and the complexity of the implementation. The neighbor-based algorithm detects anomalies based on clustering or the similarity of the data. However, the main disadvantage of this algorithm is that the recognition rate decreases when the normal dataset that is being detected does not have enough neighbors.
Mathematical Problems in Engineering
In a Cloud environment, the performance and running status of a virtual machine are represented mainly by its performance metrics. The performance metrics include five primary metrics: CPU, memory, disk, network, and process [23]. These metrics can determine whether a virtual machine is abnormal. Reference [23] has a more detailed explanation of the performance metrics of virtual machines.
This paper proposes an SOM-based anomaly detection algorithm which is based on determining the various performance metrics of each virtual machine. This algorithm differs from traditional strategies in that virtual machines with similar running environments are divided into detection domains and each domain is trained iteratively in the SOM network. This enables reasonable adaptation of the training to the large-scale virtual machines in the Cloud platform and overcomes the shortcomings of traditional methods where each virtual machine is treated as a training sample. In addition, two important parameters of the SOM network training are optimized, which greatly reduces the time needed to train the SOM network on the performance metrics of the virtual machines and enhances the efficiency and the accuracy of anomaly detection of the virtual machines in the Cloud platform.
Various experiments were conducted in order to verify the efficiency and accuracy of the SOM-based anomaly detection algorithm. The results show that the sample training speed and detection accuracy are significantly improved by the proposed algorithm.
The rest of this paper is organized as follows. Section 2 describes existing anomaly detection methods. Section 3 describes the SOM-based virtual machine anomaly detection algorithm. Section 4 shows the performance evaluation. Finally, Section 5 presents the conclusions derived from the experiments.

Related Work
Current anomaly detection methods are mainly based on classification, clustering, statistics, and nearest neighbor methods [24]. These methods will now be introduced.
The classification-based method obtains a classifier model from a set of selected data and then uses the model to classify new data [25,26]. Shin and Kim proposed a hybrid classification method that combines the One-Class SVM [27,28] with the nearest mean classifier (NMC) [29]. In this method, the nonlinear kernel function makes it easy to classify data that follow a highly flexible nonlinear correlation model [30-32]. This method introduces a feature subset selection algorithm, which not only reduces the number of classification dimensions, but also improves the performance of the classifier. However, the main disadvantages of this method are slow training and the potential for misclassification.
The clustering-based method is an unsupervised learning method [26,33]. SNN [34], ROCK [35], and DBSCAN [36] are three typical clustering-based anomaly detection methods. All three methods assume that normal samples are within a single cluster within the dataset, and abnormal samples are outside any cluster. However, if a cluster is formed by the anomaly data after a period of clustering, then the anomalies cannot be recognized properly. Additionally, it is important for the clustering that the width of the cluster is accurately selected. The advantages of the clustering-based approach are that a priori knowledge of the data distribution is not required and it can be used for incremental modeling. For example, for anomaly detection of virtual machines, a newly collected virtual machine sample can be analyzed by a model already known for anomaly detection.
A typical nearest neighbor based approach is proposed by Breunig et al. [37], using a local outlier factor for data abnormality detection. Any data point that requires analysis is associated with a local outlier factor, which is the ratio of the average local density of its k nearest neighbors to the local density of the data point itself. The local density is k divided by the volume of the smallest sphere, centered on the data point, that contains its k nearest neighbors. If a data point is abnormal, then its local density should be significantly different from the local density of its nearest neighbors.
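The density-ratio idea can be sketched in a few lines of Python. This is a simplified illustration of the description above, not the full local outlier factor of [37], which additionally uses reachability distances. The function names and the toy data are assumptions made for this example, and the sketch assumes no duplicate points (a zero radius would divide by zero).

```python
import math


def k_nearest(points, idx, k):
    """Indices of the k points closest to points[idx], excluding itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def local_density(points, idx, k):
    """k divided by the (unnormalized) volume of the smallest sphere
    around points[idx] that contains its k nearest neighbors."""
    radius = math.dist(points[idx], points[k_nearest(points, idx, k)[-1]])
    # The unit-ball constant is omitted because it cancels in the ratio below.
    return k / radius ** len(points[idx])


def outlier_factor(points, idx, k):
    """Average density of the k nearest neighbors over the point's own density."""
    neighbors = k_nearest(points, idx, k)
    mean_neighbor_density = sum(local_density(points, j, k) for j in neighbors) / k
    return mean_neighbor_density / local_density(points, idx, k)


# Hypothetical two-metric samples: four similar states and one distant state.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
for i in range(len(points)):
    print(i, round(outlier_factor(points, i, k=2), 2))
```

A factor close to 1 means the point is about as densely surrounded as its neighbors; a factor much larger than 1 marks a point in a sparse region next to a dense one.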
The statistics-based approach is an earlier anomaly detection method, which is usually based on an assumption that an anomaly is an observation point that is not generated by an assumed model and is partly or completely irrelevant [37]. Ye and Chen [24,38] used $\chi^2$ statistics to detect anomalies in the operating system. Assuming that normal data under training is subject to a multivariate normal distribution, then the $\chi^2$ statistic is
$$\chi^2 = \sum_{i=1}^{n} \frac{(X_i - E_i)^2}{E_i}, \tag{1}$$
where $X_i$ is the observed value of the $i$th variable, $E_i$ is the expected value of the $i$th variable (obtained from the training data), and $n$ is the number of variables. A large value of $\chi^2$ represents an anomaly in the observed samples.
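As a small worked illustration of this statistic, the following Python sketch sums the squared deviation of each observed variable from its expected value, divided by the expected value. The metric values are hypothetical and chosen only to show how a faulty sample produces a much larger score than a normal one.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all variables of (X_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


# Hypothetical expected values learned from training data
# (for example CPU %, memory %, network MB/s).
expected = [40.0, 50.0, 10.0]
normal_sample = [42.0, 49.0, 11.0]
faulty_sample = [95.0, 52.0, 10.0]

print(chi_square_statistic(normal_sample, expected))
print(chi_square_statistic(faulty_sample, expected))
```

In practice a threshold on the statistic would be chosen from the training data; the choice of threshold is outside the scope of this sketch.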

SOM-Based Virtual Machine Anomaly Detection Algorithm
In a Cloud platform, virtual machines with a similar running environment have similar system performance. The SOM-based virtual machine anomaly detection algorithm is aimed at Cloud platforms that have a large number of virtual machines. In this paper, we partition virtual machines with similar running environments; that is, we assign a set of virtual machines with similar properties to the same detection domain. This avoids the requirement for SOM network modeling for every single virtual machine, significantly reducing the modeling time and the training cost. For instance, when the proposed method is not used, 100 SOM network models need to be built for 100 virtual machines; however, with the proposed method, 100 virtual machines with similar running environments only need one SOM network model to be built. In addition, a SOM network can be trained more accurately by collecting 100 samples than by training using one sample only.
After partition of the virtual machines, SOM network training is used in every domain. In this paper, the two most important parameters of the SOM network training, the width of the training neighborhood and the learning-rate factor, are optimized to reduce the training time.

SOM-Based State Modeling of the Virtual Machine.
Because there is no prior knowledge of which virtual machines have similar performance, the k-medoids method is used in this paper for initial classification; that is, the VMs on the Cloud platform are divided into multiple detection domains. The k-medoids method is chosen because, compared with the k-means algorithm, it is less susceptible to noise.
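The initial partition into detection domains can be sketched as follows. This is a minimal alternating k-medoids written for illustration; the paper does not state which k-medoids variant or distance it uses, so the Euclidean distance, the function name `k_medoids`, and the metric values are assumptions.

```python
import math
import random


def k_medoids(points, k, max_iter=100, seed=0):
    """Return (medoid_indices, labels) for a simple alternating k-medoids."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)

    def assign(current):
        # Each point joins the group of its nearest medoid.
        return [min(range(k), key=lambda c: math.dist(p, points[current[c]]))
                for p in points]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a group is empty
                updated.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # distance to all members of its group.
            updated.append(min(members, key=lambda i: sum(
                math.dist(points[i], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids)


# Hypothetical (CPU %, memory %, network MB/s) averages for six VMs.
vm_metrics = [(10, 20, 5), (12, 22, 6), (11, 21, 5),
              (80, 70, 40), (82, 72, 41), (81, 69, 39)]
medoids, domains = k_medoids(vm_metrics, k=2)
print(medoids, domains)
```

Because a medoid is always one of the observed VMs rather than an average, a single extreme measurement shifts it less than it would shift a k-means centroid, which is the robustness argument made above.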
The SOM network is generated in each detection domain using the SOM algorithm. The network is constructed as a two-dimensional ($N \times N$) neuron array. Each neuron can be represented as $n_{ij}$, $i, j = 1, 2, 3, \ldots, N$, and each neuron is related to a weight vector, which is defined as $W_{ij} = (w_1, w_2, w_3, \ldots, w_m)$, where $i$ is the row subscript and $j$ is the column subscript. The dimension $m$ of a weight vector $W_{ij}$ is the same as the dimension of the samples in the training set used to train its SOM network. The training set used in this paper includes performance metrics that reflect the running state of the virtual machine, such as its CPU utilization, its memory utilization, and its network throughput. These performance metrics are described by a vector defined as $S = (s_1, s_2, s_3, \ldots, s_m)$.
The modeling of a specific virtual machine-based detection domain in SOM requires periodic measurements and adequate collection of the training data (performance vectors $S$). The collected performance vector $S \in \mathbb{R}^m$ can be considered to be a random variable within the performance sample space. The VM performance samples collected within a certain time series can be expressed as $S_t$ (where $t = 1, 2, 3, \ldots, T$). The iterative training of the samples collected within this time series is the modeling process of the SOM for the virtual machines. Therefore, the detection domain modeling algorithm can be summarized as follows.
Step 1 (initialization of the SOM network). SOM neurons are represented by a weight vector ($W_{ij}(0)$, $i, j = 1, 2, 3, \ldots, N$), where $i$ and $j$ indicate the location of the neurons in the SOM network. In this paper, the weight vector $W_{ij}(0)$ is initialized randomly in the SOM network.
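Step 2 (defining the training space of the SOM network for training sample $S_t$). When a training sample $S_t$ at time $t$ is added to the SOM network, the most suitable neuron needs to be found to serve as the training center of the neighborhood. For $S_t$ at time $t$, the most suitable neuron $C$ can be found using (2), and $C$ will be the training center in the SOM network after $S_t$ is added:
$$C = \arg\min_{(i,j)} \lVert S_t - W_{ij}(t-1) \rVert, \quad i, j = 1, 2, 3, \ldots, N. \tag{2}$$
After the training center $C$ is defined using (2), we need to set the training neighborhood. According to the definition of SOM, to ensure convergence of the training process of the SOM network, the training neighborhood can be defined by a neighborhood function
$$h_C^{(i,j)} = h\left(\lVert l_C - l(i,j) \rVert,\, t\right), \tag{3}$$
where $l_C$ and $l(i,j)$ are the coordinate vectors of the training center $C$ and of neuron $n_{ij}$ in the SOM network, and $h_C^{(i,j)}$ decreases monotonically with both $\lVert l_C - l(i,j) \rVert$ and $t$. A Gaussian function is used in this paper:
$$h_C^{(i,j)} = \alpha(t) \exp\left(-\frac{\lVert l_C - l(i,j) \rVert^2}{2\sigma^2(t)}\right), \tag{4}$$
where $\alpha(t)$ is the learning-rate factor and $\sigma(t)$ is the width of the training neighborhood.
Step 3 (SOM network training based on training sample $S_t$). The neurons within the training neighborhood are trained on $S_t$ according to (5). The fitting equation is defined as follows:
$$W_{ij}(t) = W_{ij}(t-1) + h_C^{(i,j)} \left[S_t - W_{ij}(t-1)\right]. \tag{5}$$
After the training process is completed using (5), the convergence of the training process needs to be verified. The process is convergent if every neuron associated with its weight vector in a SOM network is stabilized. The method is described in detail below.
Assume that there is a neuron $n_{ij}$ in the SOM network and the time index of its latest training sample is $t^{(i,j)}$. Meanwhile, assume that there is a sufficiently small real number $\varepsilon$ and that convergence of the training process of the SOM network can be checked using the following:
$$d(t) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left\lVert W_{ij}\left(t^{(i,j)}\right) - W_{ij}\left(t^{(i,j)} - 1\right) \right\rVert < \varepsilon. \tag{6}$$
In (6), $d(t)$ represents the average deviation between the latest fitting state and the previous value for every neuron with a weight vector in the SOM network after $t$ training samples are used in a training process. When $d(t) < \varepsilon$, the neurons $n_{ij}$ with a weight vector are stabilized, indicating that the iterative training process can be stopped. When $d(t) \geq \varepsilon$, further collection of the training samples is required, and Steps 2 and 3 need to be repeated.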
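The three steps and the convergence check can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: it assumes a Gaussian neighborhood, updates every neuron on every sample, and uses hypothetical decreasing schedules for the learning-rate factor and the neighborhood width. All function names, the stopping value `epsilon`, and the sample data are assumptions made for the example.

```python
import math
import random


def init_som(n, dim, seed=0):
    """Step 1: an n x n grid of randomly initialized weight vectors."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(n)] for _ in range(n)]


def training_center(weights, sample):
    """Step 2: grid position (i, j) of the neuron closest to the sample."""
    n = len(weights)
    positions = ((i, j) for i in range(n) for j in range(n))
    return min(positions, key=lambda p: math.dist(weights[p[0]][p[1]], sample))


def neighborhood(center, node, alpha, sigma):
    """Gaussian neighborhood value of a node relative to the training center."""
    grid_dist_sq = (center[0] - node[0]) ** 2 + (center[1] - node[1]) ** 2
    return alpha * math.exp(-grid_dist_sq / (2.0 * sigma ** 2))


def train_step(weights, sample, alpha, sigma):
    """Step 3: move every neuron toward the sample; return the mean weight change."""
    n = len(weights)
    center = training_center(weights, sample)
    total_change = 0.0
    for i in range(n):
        for j in range(n):
            h = neighborhood(center, (i, j), alpha, sigma)
            old = weights[i][j]
            new = [w + h * (s - w) for w, s in zip(old, sample)]
            total_change += math.dist(old, new)
            weights[i][j] = new
    return total_change / (n * n)


def train(weights, samples, epsilon=1e-3):
    """Feed samples until the mean weight change drops below epsilon."""
    n = len(weights)
    for t, sample in enumerate(samples, start=1):
        # Hypothetical decreasing schedules, chosen only for this sketch.
        alpha = 1.0 / (1.0 + 0.01 * t)
        sigma = max((n / 2.0) / (1.0 + 0.01 * t), 1.0)
        if train_step(weights, sample, alpha, sigma) < epsilon:
            return t
    return len(samples)


# Hypothetical normalized samples: (CPU, memory, network) utilization.
rng = random.Random(1)
samples = [[0.3 + 0.02 * rng.random(), 0.5 + 0.02 * rng.random(),
            0.1 + 0.02 * rng.random()] for _ in range(500)]
som = init_som(n=5, dim=3)
steps_used = train(som, samples)
i, j = training_center(som, [0.31, 0.51, 0.11])
print(steps_used, [round(w, 2) for w in som[i][j]])
```

Because every neuron is updated at every step in this sketch, the mean weight change returned by `train_step` plays the role of the average deviation used in the convergence check.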

Parameter Setting in the SOM-Based Modeling Process.
The SOM network modeling process is an iterative fitting process that mainly consists of two stages: the initial ordered stage and the convergence stage. There are two important parameters in the training neighborhood function $h_C^{(i,j)}$: the width of the training neighborhood $\sigma(t)$ and the learning-rate factor $\alpha(t)$. Correct setting of these two parameters plays an important role in preventing the SOM network training from getting trapped in a metastable state. The processes for setting these two parameters are as follows.
(1) Setting the Width of the Training Neighborhood $\sigma(t)$. Based on the principle of SOM, $\sigma(t)$ is a monotonically decreasing function of $t$. At the beginning of the training process, the value of $\sigma(t)$ should be set properly so that the radius of the neighborhood defined by $h_C^{(i,j)}$ can reach at least half the diameter of the SOM network [39]. In this paper, the value is set to $N/2$.
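Since $h_C^{(i,j)}$ is a monotonically decreasing function of $\lVert l_C - l(i,j) \rVert$, it can be seen from (4) that a training sample has less influence on neurons that are farther from the training center. Assume that $\theta$ is a sufficiently small threshold of $h_C^{(i,j)}$: when $h_C^{(i,j)} > \theta$, the current iteration step will influence $n_{ij}$; otherwise $n_{ij}$ is regarded as unaffected. Therefore, when $t = 1$ at the beginning of the SOM network training process, the lower bound of $\sigma(t)$ can be determined based on the threshold $\theta$ and (4). The detailed derivation process is shown as follows.
When $t = 1$, assume that $\lVert l_C - l(i,j) \rVert = N/2$, and the lower bound of $\sigma(1)$ is then determined by the following inequality derivation process (valid for $\alpha(1) > \theta$):
$$\alpha(1) \exp\left(-\frac{(N/2)^2}{2\sigma^2(1)}\right) > \theta \;\Longrightarrow\; \frac{N^2}{8\sigma^2(1)} < \ln\frac{\alpha(1)}{\theta} \;\Longrightarrow\; \sigma(1) > \frac{N}{2\sqrt{2\ln\left(\alpha(1)/\theta\right)}}. \tag{7}$$
Based on this derivation, the lower bound of $\sigma(1)$ can be determined by (7), where the threshold $\theta = 0.05$ in this paper. The following discussion will describe the value of $\sigma(1)$ used for setting $\sigma(t)$. According to (7), $\sigma(t)$ in $h_C^{(i,j)}$ of the initial ordered stage is defined by (8), with its initial value $\sigma(1)$ chosen to satisfy (7). When the iteration of the SOM network training is gradually converging, the size of the training neighborhood defined by $h_C^{(i,j)}$ should be constant and can cover the nearest neighborhood of the training center $C$ in the SOM network. In this paper, the nearest neighborhood, that is, the nearest four neurons around neuron $C$ in all four directions (up, down, left, and right) in the SOM network, is shown in Figure 2.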
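The lower bound on the initial neighborhood width can be checked numerically. The sketch below assumes the standard Gaussian neighborhood form, $h = \alpha \exp(-d^2 / (2\sigma^2))$ with grid distance $d$, and requires $h$ to exceed the threshold at $d = N/2$, which gives $\sigma(1) > N / \left(2\sqrt{2\ln(\alpha(1)/\theta)}\right)$. The function name and the default initial learning rate of 1 are assumptions made for the example.

```python
import math


def sigma_lower_bound(n, theta=0.05, alpha1=1.0):
    """Smallest initial width for which a neuron at grid distance n/2
    from the training center still receives a neighborhood value above theta."""
    if not 0.0 < theta < alpha1:
        raise ValueError("theta must satisfy 0 < theta < alpha1")
    return n / (2.0 * math.sqrt(2.0 * math.log(alpha1 / theta)))


bound = sigma_lower_bound(13)
print(round(bound, 3))

# Sanity check: at exactly this width, the Gaussian neighborhood value
# at distance n/2 equals the threshold.
h_at_half_diameter = 1.0 * math.exp(-(13 / 2.0) ** 2 / (2.0 * bound ** 2))
print(round(h_at_half_diameter, 6))
```

For a 13 × 13 network with a threshold of 0.05, the bound is roughly a fifth of the network side length, so any initial width above that value lets the first training samples reach neurons half a diameter away.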
(2) Setting the Learning-Rate Factor $\alpha(t)$. Since $\alpha(t)$ is a monotonically decreasing function of $t$, the range of $\alpha(t)$ is $0.2 < \alpha(t) < 1$ in the initial ordered stage of the SOM training process, and $0 < \alpha(t) < 0.2$ in the convergence stage of the SOM training process. Then $\alpha(t)$ can be set by the piecewise definition (9), whose convergence-stage branch applies for $t > 1000$.

VM Anomaly Recognition Based on SOM Model.
The modeling method of VM status based on SOM is described in the previous section. In this section, we will describe the recognition of an anomaly using the trained SOM network. After several rounds of fitting iterations, the SOM network can be used to effectively discover the normal state of virtual machines. The normal state is represented by neurons with weight vectors in the SOM network. In other words, a neuron associated with weight vectors in the SOM network can be used to describe whether a class of similar virtual machines is normal.
In order to check whether the current state of a VM on a Cloud platform is an anomaly, we can compare the current running performance of the virtual machine with the weight vectors of the neurons in the SOM network. In this paper, Euclidean distance is used to determine similarity. If the current state is similar to one of the neurons with weight vectors (that is, the distance to that neuron is less than a given threshold $\delta$), the virtual machine will be identified as normal; otherwise it will be considered to be abnormal.
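Let VM$_x$ represent a virtual machine on a Cloud platform. The corresponding SOM network of VM$_x$ is defined as SOM(VM$_x$). The weight vector of each neuron can be represented as $W_{ij}$ after the training iterations have finished. The currently measured performance value of VM$_x$ is $S$, and the abnormal state of VM$_x$ is VmStatus(VM$_x$). Then the anomaly recognition rule can be written as
$$\mathrm{VmStatus}(\mathrm{VM}_x) = \begin{cases} \text{normal}, & \min_{i,j} \lVert S - W_{ij} \rVert < \delta, \\ \text{abnormal}, & \text{otherwise}, \end{cases} \tag{10}$$
in which $\delta$ is a sufficiently small constant.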
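The recognition rule described in this section can be sketched as a nearest-neuron test. The sketch assumes that "similar" means a Euclidean distance below a threshold; the function name `vm_status`, the threshold value, and the weight vectors are hypothetical.

```python
import math


def vm_status(weights, sample, delta):
    """Classify a sample by its distance to the nearest neuron of a trained SOM."""
    nearest = min(math.dist(w, sample) for row in weights for w in row)
    return "normal" if nearest < delta else "abnormal"


# Hypothetical 2 x 2 trained network over (CPU, memory, network) utilization.
trained = [[[0.30, 0.50, 0.10], [0.35, 0.55, 0.12]],
           [[0.40, 0.50, 0.15], [0.45, 0.60, 0.20]]]

print(vm_status(trained, [0.32, 0.51, 0.11], delta=0.1))
print(vm_status(trained, [0.98, 0.52, 0.10], delta=0.1))
```

Detection cost is one distance computation per neuron, so it depends on the network size and not on the number of training samples.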

Experimental Results and Analysis
Experimental Environment and Setup.
In this paper, the experimental Cloud platform is built on the open source Cloud platform OpenStack [40,41]. The operating system CentOS 6.3 is installed on the physical servers for running virtual machines, on which the hypervisor Xen-3.2 [42,43] is installed. The operating system CentOS 6.3 is also installed on the physical servers for running the Cloud management program, on which the Cloud management components of OpenStack are installed. 100 virtual machines were deployed on this Cloud platform. The performance metrics of the virtual machines in this experiment were collected by tools such as libxenstat and libvirt [44,45]. For the fault injection method, we used tools to simulate system failures: memory leak, CPU Hog, and network Hog [46-48].

First Set of Experiments.
The impact of the SOM network size, training neighborhood width, and learning-rate factor values on the performance of the anomaly detection mechanism of the SOM-based dynamic adaptive virtual machine was evaluated.
Training Stage. Firstly, several virtual machines were selected from the 100 virtual machines. One fault was then randomly selected (a memory leak, CPU Hog, or network Hog) and injected. 1000 virtual machine system performance measurements were collected as training samples for the model training during 10 rounds (one second per round), on the 100 virtual machines.
Anomaly Detection Stage. In order to simulate an anomaly in the objects under detection, one of the three faults was randomly injected in the 100 virtual machines. The anomalies in each of the 100 virtual machines were then detected based on the trained model. The detection results were recorded. Several sets of experimental results with different parameter values were obtained. It should be noted that the same fault was injected in each experiment to exclude unnecessary variables.
The experimental results are shown in Tables 1, 2, and 3.
As can be seen from Table 1, there is no obvious change in accuracy using the proposed detection method for different SOM network sizes, which means that the proposed anomaly detection method is not affected by the size of the SOM network. As can be seen from Table 2, the size of the initial training neighborhood has a significant impact on the detection accuracy. The main reason is that if the initial training neighborhood is too small, it may cause a metastable state in the training process, and further training iterations are required to achieve a real steady state.
As can be seen from Table 3, as the initial value of the learning-rate factor decreases, the accuracy of the abnormality detection significantly decreases. The reason is that if the initial value of the learning-rate factor is too small, the contribution of each training sample in the SOM network training is small too. Thus the fitting ability of the SOM network to detect an object is not sufficient, which leads to poor quality of model training, hence decreasing the accuracy of the SOM network-based anomaly detection.
Analysis of the first set of experiments shows that better anomaly detection results can be obtained in DA SOM when the parameters are set as follows: SOM network size = 13 × 13, initial size of training neighborhood = 0.5 dsn, and initial value of learning-rate factor = 1.
The above experiments have been carried out on the training data set.To further demonstrate the effectiveness of the proposed algorithm, the algorithm is tested on the untrained anomaly set (disk Hog).
The experimental results for disk Hog are shown in Tables 4, 5, and 6. As can be seen from Tables 4, 5, and 6, the proposed algorithm still achieves good accuracy on the untrained data set. The impact of the three parameters (SOM network size, training neighborhood width, and learning-rate factor) on the accuracy is similar to that in the previous experiments.
Several experiments for different techniques and different parameters with the same aforementioned configuration and experimental procedure are applied to obtain the corresponding results. It should be noted that, since the training process is not required for the k-NN technique, it started directly in the abnormality detection stage. In addition, to ensure comparability, the training process of the clustering-based method is the same as that of the proposed method, where an anomaly detection model is built for 100 virtual machines and the training data set is the same as that used for training the SOM. Experimental results are shown in Figure 3.
Figure 3 shows that, compared to the other two injected failures, the sensitivities of the three techniques to memory leak failure are relatively low. The main reason is that an anomaly does not immediately appear on the failed object when a fault is introduced by a memory leak. It takes some time for this fault to accumulate to eventually cause an obvious abnormality. The consequence of this is that detection systems tend to mistake these objects with an anomaly as normal. In contrast, faults caused by CPU Hog and network Hog events will immediately lead to an abnormal state within the fault object, thus minimizing misjudgments, which enhances the sensitivity of all three anomaly detection techniques, as shown in Figures 3(b) and 3(c).
Meanwhile, as shown in each subgraph of Figure 3, compared with the other two anomaly detection techniques, DA SOM maintains a better balance between sensitivity and false alarm rate.In other words, with the same false alarm rate, the sensitivity of DA SOM is better than that of the other two approaches, showing a strong performance in improving warning effect and reducing the false alarm rate.
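The two quantities compared in Figure 3 can be computed from detection results as sketched below. The paper does not spell out its formulas, so this sketch assumes the usual definitions: sensitivity is the fraction of truly abnormal VMs that are flagged, and the false alarm rate is the fraction of truly normal VMs that are flagged. The function name and the labels are hypothetical.

```python
def sensitivity_and_false_alarm_rate(actual, predicted):
    """Both arguments are sequences of booleans, where True means "abnormal"."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # abnormal, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # abnormal, missed
    fp = sum(1 for a, p in pairs if not a and p)      # normal, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # normal, not flagged
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return sensitivity, false_alarm_rate


# Hypothetical ground truth and detector output for eight VMs.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
print(sensitivity_and_false_alarm_rate(actual, predicted))
```

Sweeping the detection threshold and plotting one quantity against the other yields a curve of the kind used to compare detectors at equal false alarm rates.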
Moreover, the computational complexity of DA SOM is much lower than that of k-NN in the anomaly detection stage, while the computational complexity of DA SOM is equivalent to that of the k-M technique. Their complexity is constant with the detected object size and with the parameter k in the k-M technique. Meanwhile, during the model training stage, the training cost of k-M is higher than that of DA SOM, for the same size of training data. The main reason is that iteration is required in k-M on the entire training data set (i.e., the cluster centers need to be updated, and the training data set needs to be reclassified according to the updated center point), while there is only one classification operation for each training sample in DA SOM.

Conclusion
An anomaly detection algorithm based on SOM for the Cloud platform with large-scale virtual machines is proposed. The virtual machines are partitioned initially according to their similarity, and then, based on the results of the initial partition, the SOM is modeled. The proposed method has a high training speed, which is not possible in traditional methods when there are a large number of virtual machines. We also optimized the two main parameters in the SOM network modeling process, which substantially improved this process. The proposed method is verified on an incremental SOM anomaly detection model. The results showed strong improvements in detection accuracy and speed using the proposed anomaly detection method.

Figure 3: Comparison of the three anomaly detection algorithms: DA SOM, k-NN, and k-M.

Second Set of Experiments.
The objective of this set of experiments was to evaluate the effect of the VM anomaly detection mechanism based on SOM (represented by DA SOM in the following sections). In order to compare this with other approaches, we use two typical unsupervised anomaly detection techniques in the experiments: (1) a nearest neighbor based anomaly detection technique (called k-NN), where prior training of the anomaly identification model is not required; (2) a cluster-based anomaly detection technique (called k-M), where training of the anomaly identification model is required in advance.
$\alpha(t)$ represents the learning-rate factor, which determines the fitting ability of the SOM network for the training sample $S_t$ in the training process. $\sigma(t)$ represents the width of the neighborhood, which determines the range of influence of a single training sample $S_t$ on the SOM network. According to SOM-related theory, to ensure convergence of the training process, $\alpha(t)$ and $\sigma(t)$ should both be monotonically decreasing functions of the number of training iterations $t$.
Step 3 (SOM network training based on training sample $S_t$). The training neighborhood has been defined in Step 2. The neurons which are within the training domain of the SOM network are trained based on the training sample $S_t$ according to (5). Here $h_C^{(i,j)}$ is the training neighborhood function, which is a monotonically decreasing function of $\lVert l_C - l(i,j) \rVert$ and of the training iterations $t$; $l_C$ is the coordinate vector of the training center $C$ in the SOM network; and $l(i,j)$ is the coordinate vector of neuron node $n_{ij}$ in the SOM network. Due to its effective smoothing, a Gaussian function is used as the neighborhood function.
A neuron for which $h_C^{(i,j)}$ falls below a sufficiently small threshold $\theta$ is unaffected. Therefore, although there is no clear boundary for the training neighborhood defined by the Gaussian function in this paper, the influential range of a single training sample $S_t$ on the training of the SOM network can still be limited.
After the training iterations have finished, the currently measured performance value of VM$_x$ is $S$, and the abnormal state of VM$_x$ is VmStatus(VM$_x$).

Table 1: The impact of SOM net size on the detection accuracy.

Table 2: The impact of the initial training neighborhood size on the accuracy of SOM. Dsn indicates the diameter of the SOM network.

Table 3: The impact of the initial value of the learning-rate factor on the accuracy of SOM.

Table 4: The impact of SOM net size on the detection accuracy.

Table 5: The impact of the initial training neighborhood size on the accuracy of SOM.

Table 6: The impact of the initial value of the learning-rate factor on the accuracy of SOM.