One-Class Classification by Ensembles of Random Planes (OCCERPs)

One-class classification (OCC) deals with the classification problem in which the training data have data points belonging only to the target class. In this paper, we present a one-class classification algorithm, One-Class Classification by Ensembles of Random Planes (OCCERP), that uses random planes to address OCC problems. OCCERP creates many random planes, each of which contains a pivot point. A data point is projected onto a random plane, and the distance of its projection from the pivot point is used to compute the outlier score of the data point. The outlier scores of a point computed using many random planes are combined to get the final outlier score of the point. An extensive comparison of the OCCERP algorithm with state-of-the-art OCC algorithms on several datasets was conducted to show the effectiveness of the proposed approach. The effect of the ensemble size on the performance of the OCCERP algorithm is also studied.


Introduction
The one-class classification (OCC) problem is a special class of classification problems in which only the data points of one class (the target set) are available [1]. The task in one-class classification is to build a model of a target set of data points and to predict whether a testing data point is similar to the target set. A point that is not similar to the target set is called an outlier. OCC algorithms have applications in various domains, including anomaly detection, fraud detection, machine fault detection, and spam detection [2]. The OCC problem is generally considered more difficult than the two-class classification problem because the training data contain only points belonging to one class [1-3], whereas traditional classifiers need training data from more than one class to learn decision boundaries. Therefore, standard classifiers cannot be applied directly to OCC problems. Various algorithms have been proposed to address OCC problems [1-3].
There are two main approaches to handling OCC problems [2,3]. In the first approach, artificial data points for the nontarget class (outliers) are generated and combined with the target data points, and then a binary classifier is trained on this new data. In the second approach, only the target data points are used to create the OCC models [4].
Ensembles of accurate and diverse models generally perform better than individual members of ensembles [9]. Ensembles of classification models have been developed to improve the performance of one-class classification models [5, 10-12].
In this paper, we propose an ensemble method, OCCERP, for OCC problems. In this method, we project data points onto a random plane. The distance of a projected data point from a pivotal point in this random plane is used as an outlier score. By selecting different random planes, we can generate diverse models, which can be used to create ensembles. Experiments are carried out to show the effectiveness of the proposed approach. The paper is organised as follows: Section 2 discusses related work. The OCCERP algorithm is presented in Section 3. Section 4 presents experiments and discussion. Section 5 presents the conclusion and suggests future developments.

Literature Survey
As previously discussed, there are two types of OCC algorithms. OCCERP belongs to the second type, which is discussed in this section.
Generative methods are useful for OCC as the target class may directly be modelled from the available training target data points. Density-based methods, such as Gaussian models, kernel density estimators, Parzen windows, and mixture models, are widely used for OCC problems [3,13]. Density-based methods estimate the probability density function of the underlying distribution of the training target data points. Then, these methods determine whether a new data point comes from the same distribution. The selection of appropriate models and the need for large-scale training data are the problems of this approach.
Nearest neighbour-based (NN-based) approaches are other widely used methods to address OCC problems [1,3,5]. This approach assumes that an outlier point will be farther from its neighbouring target points than a target point is from its own neighbouring target points [1,5]. The local outlier factor (LOF) method is a density-based scheme for OCC [14], in which a LOF is computed for each data point by taking the ratios of the local density of the point and the local densities of its neighbours. An outlier point has a large LOF score.
Tax and Duin [15] propose the support vector domain description method for OCC. The method finds a hypersphere with a minimum volume around the target class data such that it encloses almost all the points in the target class dataset. Schölkopf et al. [7] propose the use of support vector machines for one-class classification. A hyperplane is constructed such that it separates all the data points from the origin and the hyperplane's distance from the origin is maximised.
In reconstruction-based methods [1-3, 16, 17], a model such as an autoencoder is trained on the given target class data. The reconstruction error, which depends on a testing data point and the system output, is used to define the outlier score. An outlier point is likely to have a larger reconstruction error.
Clustering-based approaches use a clustering method, such as k-means clustering, to create clusters [1]. The distance between a data point and its nearest cluster centre is used as the outlier score. Choosing the number of clusters and the cluster initialization are the problems of k-means type clustering algorithms.
Rahimzadeh Arashloo and Kittler [18] present a nonlinear one-class classifier, formulated as a Rayleigh quotient criterion optimisation, that projects the target class points to a single point in a new feature space. The distance between the projection of a testing point and that point is the outlier score of the testing point. Leng et al. [19] use a similar approach but use extreme learning machines for the projection.
Ensembles have also been developed for OCC problems. There are two approaches for creating ensembles. In the first approach, one OCC algorithm is employed and a randomisation process is used to create diverse OCC models. Lazarevic and Kumar [20] propose the creation of multiple datasets by using feature bagging. The LOF algorithm is then applied to these multiple datasets; hence, multiple OCC models are created. The outputs of these models are combined to get the final output. Khan and Ahmad [5] use random projection to create multiple datasets. An NN-based OCC algorithm is applied to these multiple datasets. Arthur et al. [21] introduce noise to the dataset to create multiple datasets. Experiments with different OCC algorithms show the effectiveness of the proposed approach. Chen et al. [11] use randomly connected autoencoders to create ensembles of autoencoders. These ensembles outperformed other state-of-the-art OCC methods. Khan and Taati [22] train different autoencoders using different features to create ensembles of autoencoders. They show that ensembles perform better than single autoencoders. Isolation forests consist of many decision trees [10]. These trees are created by using random partitioning. The authors argue that anomalies are susceptible to isolation and therefore have short path lengths. The method has produced excellent results on various datasets. Kanag [23] uses a clustering technique to create many clusters. These clusters are used with the one-against-rest method to create many binary classifiers. These classifiers are used as an ensemble to handle OCC problems. Mohammeda and Kora [24] propose ensembles of deep learning models for text classification problems.

One-Class Classification by Ensembles of Random Planes (OCCERPs)
For OCC problems, the training data have points from one class. In this section, we will call this class the negative class. A class consisting of outlier points will be called the positive class. The motivation of the proposed approach is that if data points are projected onto a plane, the distance from a properly selected pivot point to the projection of a given point can be used as an outlier score. The projections of negative class points are expected to be nearer to this pivotal point than the projections of positive class points. Many random planes can be generated. Each plane will generate one outlier score for a given point, and all the scores will be combined to get the final outlier score for the point.
Creating appropriate random planes and selecting appropriate pivotal points on these random planes are very important steps of the proposed approach. We use the random linear oracle approach [25] to create random planes and pivotal points. Kuncheva and Rodriguez [25] propose a random linear oracle approach for classifier ensembles. In this approach, they divide the training data points into two groups using a random linear oracle (RLO). This RLO is a random hyperplane which is created by using two randomly selected points from the training data. We use the same approach to generate random planes. To create a random plane, two points are randomly selected from the given negative class. The random plane will pass through the midpoint of the two selected data points and will have its normal going through these two data points (Figure 1). As these two data points are part of the negative class, the midpoint is expected to be within the boundary of the negative class. This point will act as a pivotal point to compute the outlier score of a given data point. The RLO approach makes sure that there are points on both sides of the hyperplane.
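The equation of a plane in n dimensions is

$$A_1 x_1 + A_2 x_2 + A_3 x_3 + \cdots + A_n x_n + B = 0,$$

where $A_1, A_2, A_3, \ldots, A_n$ are the direction coefficients of the normal and $B$ is a constant. For a plane whose normal goes through two points $R(r_1, r_2, \ldots, r_n)$ and $S(s_1, s_2, \ldots, s_n)$ and which contains a point $Z(z_1, z_2, z_3, \ldots, z_n)$, the values of $A_i$ and $B$ are

$$A_i = s_i - r_i, \qquad B = -\sum_{i=1}^{n} A_i z_i.$$

In the random linear oracle, the plane goes through the midpoint of R and S; therefore, Z is defined as follows:

$$z_i = \frac{r_i + s_i}{2}, \qquad i = 1, \ldots, n.$$

The perpendicular distance $D_1$ from a point $P(p_1, p_2, \ldots, p_n)$ to the plane is

$$D_1 = \frac{\left| \sum_{i=1}^{n} A_i p_i + B \right|}{\sqrt{\sum_{i=1}^{n} A_i^2}}.$$

The distance $D_2$ between the point P and Z is

$$D_2 = \sqrt{\sum_{i=1}^{n} (p_i - z_i)^2}.$$

The distance $D_3$ between the projection of P on the plane and the pivotal point Z,

$$D_3 = \sqrt{D_2^2 - D_1^2},$$

will be used as an outlier score. Figure 2 shows that $D_3$ will be small for negative class points, whereas this value will be large for positive class points.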
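To make the scoring step concrete, the following is a minimal, dependency-free sketch of one random plane and its outlier score. It is not the author's implementation. The function names `make_random_plane` and `plane_score`, the sign convention for the normal (S minus R), and the toy points are assumptions introduced for illustration. The sketch also assumes that the two sampled target points differ, since otherwise the normal has zero length.

```python
import math
import random


def make_random_plane(targets, rng):
    """Build one random plane from two randomly chosen target points R and S.

    Returns (a, b, z): normal coefficients a, constant b, and pivot z.
    Assumes R and S are not identical (otherwise the normal is zero).
    """
    r, s = rng.sample(targets, 2)
    a = [si - ri for ri, si in zip(r, s)]          # normal along the line R -> S
    z = [(ri + si) / 2.0 for ri, si in zip(r, s)]  # midpoint of R and S (pivot)
    b = -sum(ai * zi for ai, zi in zip(a, z))      # forces the plane through z
    return a, b, z


def plane_score(p, plane):
    """Distance between the projection of p on the plane and the pivot z."""
    a, b, z = plane
    norm = math.sqrt(sum(ai * ai for ai in a))
    # Perpendicular distance from p to the plane.
    d1 = abs(sum(ai * pi for ai, pi in zip(a, p)) + b) / norm
    # Euclidean distance from p to the pivot.
    d2 = math.sqrt(sum((pi - zi) ** 2 for pi, zi in zip(p, z)))
    # Pythagoras: in-plane distance; max() guards against rounding below zero.
    return math.sqrt(max(d2 * d2 - d1 * d1, 0.0))


# Hypothetical toy data: four target (negative class) points in 2D.
rng = random.Random(0)
targets = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, -1.0]]
plane = make_random_plane(targets, rng)
print(plane_score([1.0, 0.5], plane), plane_score([6.0, 6.0], plane))
```

Repeating `make_random_plane` many times yields one score per plane for each test point. These per-plane scores are the inputs to the combination step.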

Combination of Results.
Researchers use different approaches, such as mean, median, maximum, and minimum, to combine the results of different outlier models [26,27]. There is no proper justification in the literature for selecting one over the other. We did a small experiment with five-fold cross-validation on three datasets to understand their performances. We found that no approach performed consistently best across all the runs. However, we found that the minimum approach has an advantage over the other approaches. Therefore, we selected the minimum approach to combine the results. To avoid the effect of extreme values, instead of the minimum value, we took the mean of the five minimum values. All the experiments were done using this combination approach. We did not experiment with numbers other than five. It is noted that the combination of different outlier models in an ensemble is an important research problem. We do not claim that the minimum approach is best. This research problem requires more experimental and theoretical analysis, which is beyond the scope of this paper.
Figure 1: A plane H created using two data points R and S. The random plane passes through the midpoint Z of R and S and has its normal going through these two data points.
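The combination rule in this section (the mean of the five smallest per-plane scores) can be sketched as follows. The function name `combine_scores`, the parameter `k`, and the fallback for fewer than `k` scores are assumptions made for illustration. The source only states that five values were used.

```python
def combine_scores(scores, k=5):
    """Mean of the k smallest per-plane outlier scores of one data point."""
    if not scores:
        raise ValueError("need at least one per-plane score")
    # Assumption: if fewer than k scores exist, average all of them.
    smallest = sorted(scores)[:k]
    return sum(smallest) / len(smallest)


# Hypothetical per-plane scores for one point; the extreme value 100 is ignored.
print(combine_scores([9.0, 1.0, 2.0, 3.0, 4.0, 5.0, 100.0]))
```

Averaging the few smallest scores keeps the behaviour of the minimum rule while making the result less sensitive to a single unusually small score.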

Experiments
We conducted experiments using the scikit-learn Python package (https://scikit-learn.org/stable/) and PyOD (a Python toolbox for scalable outlier detection) [30]. Standard OCC algorithms, namely Isolation Forests (IFs), one-class SVM (OCSVM), LOF, and autoencoders, were used for the comparative study. PyOD was used for these methods, with the default parameters given in PyOD. For the OCCERP algorithm, the same ensemble size and the same fixed combination approach were used throughout. 5 × 2-fold cross-validation was used for the experiments. Stratified k-fold, implemented using scikit-learn, ensured that the folds preserved the percentage of samples of each class. Only the negative class points in the training data were used to train the OCC algorithms. z-normalisation was used to normalise the data. As classification accuracy is not an appropriate performance metric for highly imbalanced testing data, we used the average area under the receiver operating characteristic (ROC) curve (AUC), which is generally used to measure the performance of OCC algorithms [10,11]. We carried out experiments with the OCCERP algorithm with 500 random planes (OCCERP (500)). We applied a statistical test, the sign test [31], to compare the performance of OCCERP (500) against the other one-class classifiers. The test is based on counts of wins, losses, and ties. If the number of wins is at least $N/2 + 1.96\sqrt{N}/2$, where $N$ is the number of datasets, the classifier is significantly better with p < 0.05. In our experiments, the total number of datasets is 26; therefore, if the number of wins is at least 18, the classifier is statistically better than the other classifier.
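As a check on the sign-test threshold quoted above, the short sketch below evaluates N/2 + 1.96 * sqrt(N)/2 and rounds it up to the next whole number of wins. The helper name `sign_test_min_wins` is hypothetical, and the sketch does not model how ties are treated.

```python
import math


def sign_test_min_wins(n_datasets, z=1.96):
    """Smallest integer number of wins w with w >= N/2 + z * sqrt(N) / 2."""
    threshold = n_datasets / 2.0 + z * math.sqrt(n_datasets) / 2.0
    return math.ceil(threshold)


# For the 26 datasets used here the threshold is about 17.997, so 18 wins.
print(sign_test_min_wins(26))
```

For N = 26 the expression evaluates to roughly 13 + 4.997, which agrees with the requirement of 18 wins stated in the text.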

Standard Datasets and Domain Datasets.
Various kinds of datasets were used in the experiments [28, 29, 32-35]. Some datasets were created as imbalanced datasets [28,29]. Information on these datasets is presented in Table 1. The domain datasets [32-35] belong to two different domains: normal activity-fall activity datasets and software engineering-related datasets. The domain datasets [32-35] are naturally imbalanced datasets. The MobiFall data [32] were collected using a Samsung Galaxy S3 mobile phone employing the integrated 3D accelerometer and gyroscope. The data have two classes: normal activity and fall activity. We used the data collected from 11 subjects who performed various normal and fall activities. We grouped the German Aerospace Centre (DLR) data [33] into normal activity and fall activity and only used the data from the accelerometer and gyroscope. Only data from those subjects who performed both activities were used. The Coventry dataset (COV) [34] also has two classes, normal activity and fall activity. Complete information on these domain datasets is presented in detail in [5]. Information on these domain datasets is presented in Table 2.
Software engineering-related datasets were taken from NASA's metrics data program data repository. This repository has defect data of various software projects written using different programming languages. cm1 and pc1 are written in C. kc1 and kc2 are implemented using C++. Datatrieve is composed of C functions and BLISS subroutines. class-level-kc1-defect-or-not and class-level-kc1-defect-count-ranking use only larger modules of kc1 data. The class-level-kc1-defect-count-ranking data have two classes based on whether the defects are in the top 5% in the defect ranking or not. The software projects are described using different features such as McCabe measures [36] and Halstead measures [37]. Information on these software engineering-related datasets is presented in Table 2.
Table 1: Information on the datasets that were taken from [28,29]. The datasets presented before the separating line in the table are taken from [28], whereas the datasets presented after the separating line are taken from [29].

Results.
The results (average AUCROC) for the datasets presented in Table 1 are provided in Table 3. They suggest that, out of 16 datasets, OCCERP (500) performed best for eleven datasets. The LOF method performed best for eight datasets. Both achieved the joint best results for three datasets.
The results (average AUC) for the domain datasets (presented in Table 2) are provided in Table 4. OCCERP (500) performed best for seven out of ten datasets, whereas other OCC algorithms were best for three datasets. IF performed best for two datasets, whereas LOF performed best for one dataset. The results suggest the superior performance of OCCERP (500) over other standard OCC algorithms.
Wins, losses, and ties for OCCERP (500) against other OCC algorithms for all 26 datasets are presented in Table 5. As discussed earlier, if the number of wins is 18 or more, OCCERP (500) is statistically better than that algorithm. The number of wins is at least 18 for OCCERP (500) against all other OCC algorithms. This shows that OCCERP (500) is statistically better than the other OCC algorithms.

Conclusion
OCC is a challenging task due to the absence of outlier class data points in the training dataset. In this paper, we presented OCCERP to address OCC problems. OCCERP creates many OCC models. In each model, a random plane and a pivot point are used to compute an outlier score for a given data point. Outlier scores for the data point are combined using a novel minimum approach. Experiments suggested that OCCERP performed better than or similar to other OCC methods. This shows the effectiveness of the OCCERP method.
In this paper, the RLO approach is used to create random planes and pivot points. In the future, we will develop other approaches to generate random planes and pivot points. The combination of OCCERP with other ensemble approaches, such as bagging [38] (to create different training datasets), is another future research direction. We will also study the performance of OCCERP in the feature space created by random projections and principal component analysis.

Disclosure
Some results were taken from our preprint. The preprint is also listed in the references as [17].

Conflicts of Interest
The author declares that there are no conflicts of interest.