An Improved Semisupervised Outlier Detection Algorithm Based on Adaptive Feature Weighted Clustering

Various approaches to outlier detection already exist, among which semisupervised methods perform particularly well because they exploit prior knowledge. In this paper, a semisupervised outlier detection strategy based on adaptive feature weighted clustering is proposed. The method maximizes the membership degree of a labeled normal object to the cluster it belongs to and minimizes the membership degrees of a labeled outlier to all clusters. Because the features of a dataset differ in how much they contribute to deciding whether an object is an inlier or an outlier, each feature is adaptively assigned a weight according to the degree of deviation between that feature of all objects and that of a given cluster prototype. A series of experiments on a synthetic dataset and several real-world datasets is conducted to verify the effectiveness and efficiency of the proposal.


Introduction
Outlier detection is an important topic in the data mining community. In contrast to other data mining techniques, it aims at finding patterns that occur infrequently [1]. An outlier is an observation that deviates significantly from, or is inconsistent with, the main body of a dataset, as if it were generated by a different mechanism [2]. Outlier detection matters because outliers can provide raw patterns and valuable knowledge about a dataset. Current application areas include crime detection, credit card fraud detection, network intrusion detection, medical diagnosis, fault detection in safety-critical systems, and detection of abnormal regions in image processing [3][4][5][6][7][8][9].
Research on outlier detection has recently been very active, and many approaches have been proposed. In general, existing work can be broadly classified into three modes, depending on whether label information is available or can be used to build outlier detection models: unsupervised, supervised, and semisupervised methods.
Supervised outlier detection concerns the situation where the training dataset contains prior information about the class of each instance, that is, whether it is normal or abnormal. One-class support vector machine (OCSVM) [10] and support vector data description (SVDD) [11,12], which consider the case where all training data are normal instances, construct a hypersphere around the normal data and use it to classify an unknown sample as an inlier or an outlier. Supervised outlier detection is difficult in many real-world applications, since acquiring label information for the whole training dataset is often expensive, time consuming, and subjective.
Unsupervised outlier detection, which uses no prior information about the class distribution, is generally classified into distribution-based [3], distance-based [13,14], density-based [15,16], and clustering-based [17][18][19][20] approaches. The distribution-based approach assumes that all data points are generated by a certain statistical model, while outliers do not obey the model. However, an underlying distribution of the data points cannot always be assumed in real-life applications. The distance-based approach was first investigated by Knorr and Ng [14]. An object $o$ in a dataset $D$ is an outlier if at least $p\%$ of the objects in $D$ lie farther than the distance $d$ from $o$. The global parameters $p$ and $d$ are not suitable when the local information of the dataset varies greatly. Representatives of this type of approach include the $k$-nearest neighbor ($k$NN) algorithm [13] and its variants [21,22]. The density-based approach was originally proposed by Breunig et al. [15]. A local outlier factor (LOF) is assigned to each data point based on its local neighborhood density, and a data point with a high LOF value is determined to be an outlier. However, this method is very sensitive to the choice of the neighborhood parameter.
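The distance-based definition above can be made concrete with a minimal sketch. The function name `is_db_outlier`, the use of Euclidean distance, and the choice to exclude the object itself from the percentage are assumptions made for illustration; they are not taken from [14].

```python
import math

def is_db_outlier(data, index, p, d):
    """Distance-based test: True if at least p percent of the other objects
    lie farther than distance d from data[index].

    Assumptions (not from the paper): Euclidean distance is used and the
    object itself is excluded when computing the percentage.
    """
    target = data[index]
    others = [x for i, x in enumerate(data) if i != index]
    far = sum(1 for x in others if math.dist(target, x) > d)
    return 100.0 * far / len(others) >= p

# Four points form a tight group; the fifth lies far away from all of them.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
flags = [is_db_outlier(points, i, p=90, d=3) for i in range(len(points))]
```

Because `p` and `d` are global, a single setting cannot suit regions of different density, which is the limitation noted in the text.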
Clustering-based approaches [17][18][19][20] partition the dataset into several clusters according to the similarity of objects and detect outliers by examining the relationship between objects and clusters. In general, clusters that contain significantly fewer data points than other clusters, or that are remote from other clusters, are considered outliers. The cluster structure of the data can facilitate outlier detection, and a small body of related work has been published. A classical clustering method is used to find anomalies in the intrusion detection domain [18]. In [19], clustering techniques iteratively detect outliers for multidimensional data analysis in subspaces. Zhao et al. [20] propose an adaptive fuzzy c-means (AFCM) algorithm by introducing sample weight coefficients into the objective function and apply it to anomaly detection in the energy system of the steel industry. Since clustering-based approaches are unsupervised and use no labeled training data, their performance in outlier detection is limited. In addition, most existing clustering-based methods only pursue optimal clustering and do not incorporate optimal outlier detection into the clustering process.
In many real-world applications, one may encounter cases where a small set of objects is labeled as outliers or as belonging to a certain class, while most of the data are unlabeled. Studies indicate that introducing a small amount of prior knowledge can significantly improve the effectiveness of outlier detection [23][24][25]. Therefore, semisupervised approaches to outlier detection have been developed to tackle such scenarios and have recently become a popular research direction. To take advantage of the label information of a target dataset, entropy-based outlier detection based on semisupervised learning from few positive examples (EODSP) is proposed in [23]. That method extracts reliable normal instances from the unlabeled objects and regards them as labeled normal samples. An entropy-based outlier detection method is then used to detect the top-ranked outliers. However, when a dataset initially provides both labeled normal and labeled abnormal samples, the algorithm in [23] cannot make full use of the given label information. The work in [24] develops a semisupervised outlier detection method based on the assessment of deviation from known labeled objects, punishing poor clustering results and restricting the number of outliers. Xue et al. [25] present a semisupervised outlier detection proposal based on fuzzy rough c-means clustering, which detects outliers by minimizing the sum of squared errors of the clustering results, the deviation from known labeled examples, and the number of outliers. Unfortunately, some labeled normal objects are ultimately misidentified as outliers in [24,25] because of improper parameter selection.
Most previous research treats all features of an object equally during outlier detection, which does not conform to the intrinsic characteristics of a dataset. It is more reasonable to let different features have different importance in each cluster, especially for high-dimensional sparse datasets, where the structure of each cluster is often confined to a subset of features rather than the entire feature set. Several works have studied feature weighted clustering. Huang et al. [26] propose a W-c-means type clustering algorithm that automatically calculates feature weights. W-c-means adds a new step to the basic c-means algorithm that updates the variable weights based on the current partition of the data. The work in [27] develops an approach called simultaneous clustering and attribute discrimination (SCAD). SCAD learns the feature relevance representation of each cluster independently in an unsupervised manner. Zhou et al. [28] propose a maximum-entropy-regularized weighted fuzzy c-means (EWFCM) clustering algorithm for "nonspherical" shaped data. The EWFCM algorithm develops a new objective function that achieves the optimal clustering result by simultaneously minimizing the dispersion within clusters and maximizing the entropy of the attribute weights. These methods motivate the study of outlier detection based on feature weighted clustering.
To make full use of prior knowledge in clustering-based outlier detection, we develop in this paper a semisupervised outlier detection algorithm based on adaptive feature weighted clustering (SSOD-AFW), in which the feature weights are obtained iteratively. The proposed algorithm emphasizes the diversity of the features in each cluster and assigns lower weights to irrelevant features to reduce their negative influence on the outlier decision. Furthermore, based on the convention that outliers usually have a low membership to every cluster, we relax the constraint of fuzzy c-means (FCM) clustering that the membership degrees of a sample to all clusters must sum to one, and we propose an adaptive feature weighted semisupervised possibilistic clustering-based outlier detection algorithm. The proposed method addresses the interaction between optimal clustering and outlier detection. The label information is introduced into the possibilistic clustering method according to the following principles: (1) maximizing the membership degree of a labeled normal object to the cluster it belongs to; (2) minimizing the membership degrees of a labeled normal object to the clusters it does not belong to; and (3) minimizing the membership degrees of a labeled outlier to all clusters. In addition to these principles, the new clustering objective function simultaneously minimizes the dispersion within clusters to achieve a proper cluster structure. Finally, the resulting optimal membership degrees are used to indicate the outlying degree of each sample in the dataset. Compared with typical outlier detection methods, the proposed algorithm shows promising improvements in accuracy, running time, and other evaluation metrics.
The remainder of this paper is organized as follows. Section 2 gives a short review of possibilistic clustering algorithms. Section 3 presents a detailed description of the feature weighted semisupervised clustering-based outlier detection algorithm. In Section 4, the experimental results of the proposed method against typical outlier detection algorithms are discussed on synthetic and real-world datasets. Finally, Section 5 presents our conclusions.
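FCM is a well-known clustering algorithm [29], whose objective function is

$$J_{\mathrm{FCM}}(U, O) = \sum_{i=1}^{n}\sum_{k=1}^{c} u_{ik}^{m} \lVert x_i - o_k \rVert^{2} \tag{1}$$

subject to

$$\sum_{k=1}^{c} u_{ik} = 1, \quad 0 \le u_{ik} \le 1, \tag{2}$$

where $u_{ik}$ is the membership degree of the $i$th ($1 \le i \le n$) object to the $k$th ($1 \le k \le c$) cluster, $o_k$ is the $k$th cluster prototype, $\lVert\cdot\rVert$ represents the $\ell_2$-norm of a vector, and $m > 0$ is the fuzzification coefficient. Note that the constraint in (2) requires the memberships of each object to all clusters to sum to one. As a result, FCM is sensitive to outliers: outliers and noise commonly lie far from all cluster prototypes, yet the constraint still forces their memberships to sum to one. For this reason, Krishnapuram and Keller [30] proposed the possibilistic c-means (PCM) clustering algorithm, which relaxes the constraint on the sum of memberships and minimizes the following objective function:

$$J_{\mathrm{PCM}}(U, O) = \sum_{i=1}^{n}\sum_{k=1}^{c} u_{ik}^{m} \lVert x_i - o_k \rVert^{2} + \sum_{k=1}^{c} \eta_k \sum_{i=1}^{n} (1 - u_{ik})^{m} \tag{3}$$

subject to

$$0 \le u_{ik} \le 1, \tag{4}$$

where $\eta_k$ is a suitable positive number. In PCM, the constraint (4) allows an outlier to hold a low membership to all clusters, so an outlier has a low impact on the objective function (3). The membership information of each sample can then be used naturally to interpret its outlying character: if a sample has a low membership to all clusters, it is likely to be an outlier. Later, another unsupervised possibilistic clustering algorithm (PCA) was proposed by Yang and Wu [31], whose objective function is

$$J_{\mathrm{PCA}}(U, O) = \sum_{i=1}^{n}\sum_{k=1}^{c} u_{ik}^{m} \lVert x_i - o_k \rVert^{2} + \frac{\beta}{m^{2}\sqrt{c}} \sum_{i=1}^{n}\sum_{k=1}^{c} \left(u_{ik}^{m}\log u_{ik}^{m} - u_{ik}^{m}\right), \tag{5}$$

where the parameter $\beta$ can be calculated by the sample covariance:

$$\beta = \frac{1}{n}\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^{2}, \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. \tag{6}$$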
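As a rough illustration of the sample-covariance parameter and of how a possibilistic membership behaves, the following sketch computes $\beta$ as the mean squared distance of the samples to their mean. It also evaluates the closed-form membership $\exp(-m\sqrt{c}\,\lVert x - o\rVert^{2}/\beta)$ that results from setting the gradient of a PCA-type objective to zero. Both function names are hypothetical, and the membership formula is stated here as an assumption, since the text above does not reproduce it.

```python
import math

def sample_beta(data):
    """Mean squared Euclidean distance of the samples to their mean."""
    n, dim = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(dim)]
    return sum((x[j] - mean[j]) ** 2 for x in data for j in range(dim)) / n

def possibilistic_membership(sq_dist, beta, m, c):
    """Assumed PCA-type membership: decays exponentially with squared distance."""
    return math.exp(-m * math.sqrt(c) * sq_dist / beta)

beta = sample_beta([(0.0, 0.0), (2.0, 0.0)])             # mean is (1, 0)
near = possibilistic_membership(0.0, beta, m=2.0, c=2)   # sample on a prototype
far = possibilistic_membership(9.0, beta, m=2.0, c=2)    # sample far from it
```

Unlike FCM memberships, these values are not forced to sum to one over the clusters, so a sample far from every prototype receives a low membership everywhere.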

Semisupervised Outlier Detection Framework Based on Feature Weighted Clustering
3.1. Model Formulation. In this section, we introduce prior knowledge into the possibilistic c-means clustering method to improve the performance of outlier detection. First, a small subset of samples in a given dataset $X = \{x_1, x_2, \ldots, x_n\}$ is labeled as normal or outlier objects. Each labeled normal object carries the label of the class it belongs to. A semisupervised indicator matrix $A = (a_{ik})_{n \times c}$ is constructed to describe the semisupervised information, and its entries are defined as follows: (i) if an object $x_i$ is labeled as a normal point and belongs to the $k$th cluster, then $a_{ik} = -1$, and $a_{il} = 1$ for all $l = 1, 2, \ldots, c$ with $l \ne k$; (ii) if $x_i$ is labeled as an outlier, then $a_{ik} = 1$ for all $k = 1, 2, \ldots, c$; (iii) if $x_i$ is unlabeled, then $a_{ik} = 0$ for all $k = 1, 2, \ldots, c$.
Data often contain a number of redundant features. The cluster structure in a given dataset is often confined to a subset of features rather than the entire feature set, and irrelevant features only obscure the discovery of that structure by a clustering algorithm. A genuine outlier is then easily overlooked because the cluster structure is vague. Figure 1 presents an example of a three-dimensional dataset. The dataset has two clusters ($C_1$ and $C_2$) and three features ($f_1$, $f_2$, and $f_3$). In the full feature space ($f_1$, $f_2$, $f_3$), neither of the clusters is discovered (see Figure 1(a)). In the subspace ($f_1$, $f_2$), cluster $C_1$ can be found, but $C_2$ cannot (see Figure 1(b)). Conversely, only cluster $C_2$ is clearly visible in ($f_2$, $f_3$) (see Figure 1(c)). Therefore, if we assign weights 0.47, 0.45, and 0.08 to features $f_1$, $f_2$, and $f_3$, respectively, cluster $C_1$ will be recovered by a clustering algorithm. If the weights of features $f_1$, $f_2$, and $f_3$ are assigned as 0.13, 0.46, and 0.41, respectively, cluster $C_2$ will be recovered. In this sense, each cluster is relevant to a different subset of features, and the same feature may have different importance in different clusters.
In our research, let $v_{jk}$ be the weight of the $j$th ($1 \le j \le N$) feature with respect to the $k$th ($1 \le k \le c$) cluster, which satisfies $\sum_{j=1}^{N} v_{jk} = 1$; then the feature weighted distance $d_{ik}$ between the $i$th object and the $k$th cluster prototype is defined as

$$d_{ik}^{2} = \sum_{j=1}^{N} v_{jk}^{q} \left(x_{ij} - o_{kj}\right)^{2}, \tag{7}$$

where the parameter $q > 1$ is the feature weight index.
Points within a cluster are usually strongly correlated, whereas outliers are only weakly correlated with one another. That is, normal points belong to one of the $c$ clusters and outliers do not belong to any cluster. Therefore, a normal point should have a high membership to the cluster it belongs to, and an outlier should have a low membership to all clusters. Based on this idea, we define a new objective function and minimize it as follows:

$$J_{\text{SSOD-AFW}}(U, V, O) = \sum_{i=1}^{n}\sum_{k=1}^{c} u_{ik}^{m} d_{ik}^{2} + \frac{\beta}{m^{2}\sqrt{c}} \sum_{i=1}^{n}\sum_{k=1}^{c} \left(u_{ik}^{m}\log u_{ik}^{m} - u_{ik}^{m}\right) + \sum_{i=1}^{n}\sum_{k=1}^{c} \alpha_{ik} a_{ik} u_{ik}^{m} \tag{8}$$

subject to

$$0 \le u_{ik} \le 1, \quad 0 \le v_{jk} \le 1, \quad \sum_{j=1}^{N} v_{jk} = 1, \tag{9}$$

where $n$, $N$, and $c$ are the numbers of objects, features, and clusters, respectively. In $U = (u_{ik})_{n \times c}$, $u_{ik}$ is the membership degree of the $i$th object to the $k$th cluster. In $V = (v_{jk})_{N \times c}$, $v_{jk}$ denotes the weight of the $j$th feature with respect to the $k$th cluster. In $O = (o_{kj})_{c \times N}$, $o_{kj}$ is the $j$th feature value of the $k$th cluster prototype. $d_{ik}$ denotes the feature weighted distance between object $x_i$ and the $k$th cluster prototype, and $a_{ik} \in \{1, -1, 0\}$ is an element of the semisupervised indicator matrix A. $m > 0$ is the fuzzification coefficient, and the parameter $\beta$ can be fixed as the sample covariance according to (6). The positive coefficient $\alpha_{ik}$ adjusts the significance of the label information of the $i$th object with respect to the $k$th cluster in objective function (8): the larger $\alpha_{ik}$ is, the larger the influence of the label knowledge. The first term in (8) is equivalent to the FCM objective function, which requires the distances of objects from the cluster prototypes to be as small as possible. The second term is constructed to force $u_{ik}$ to be as large as possible. The third term, $\sum_{i=1}^{n}\sum_{k=1}^{c} \alpha_{ik} a_{ik} u_{ik}^{m}$, minimizes the membership degrees of a labeled outlier to all clusters and maximizes the membership degree of a labeled normal object to the cluster it belongs to. With a proper choice of $\alpha_{ik}$, we can balance the weight of the label information of every object and achieve the optimal fuzzy partition.
The role of the semisupervised indicator matrix A in objective function (8) can be elaborated as follows. Recalling the construction of A, note that if $x_i$ is known to belong to the $k$th cluster, then $a_{ik} = -1$ and all the other entries in the $i$th row equal 1. Thus, minimizing $\sum_{i=1}^{n}\sum_{k=1}^{c} \alpha_{ik} a_{ik} u_{ik}^{m}$ in (8) means maximizing the membership of $x_i$ to the $k$th cluster and simultaneously minimizing the memberships of $x_i$ to the other clusters. If $x_i$ is labeled as an outlier, so that all the elements in the $i$th row of A equal 1, then minimizing this term means minimizing the memberships of $x_i$ to all clusters, since an outlier does not belong to any cluster. If $x_i$ is unlabeled, so that $a_{ik} = 0$ for all $k = 1, 2, \ldots, c$, then the term has no impact on objective function (8) for that object.
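The construction of the indicator matrix described above can be sketched as follows. The function name and the input format (a dictionary from object index to cluster index for labeled normal objects, and a collection of indices for labeled outliers) are assumptions chosen for illustration.

```python
def build_indicator_matrix(n, c, normal_labels, outlier_indices):
    """Build the n-by-c semisupervised indicator matrix.

    normal_labels: dict mapping object index -> cluster index (labeled normal).
    outlier_indices: iterable of object indices labeled as outliers.
    Unlabeled objects keep an all-zero row.
    """
    A = [[0] * c for _ in range(n)]
    for i, k in normal_labels.items():
        A[i] = [1] * c      # discourage membership to the other clusters
        A[i][k] = -1        # encourage membership to the labeled cluster
    for i in outlier_indices:
        A[i] = [1] * c      # discourage membership to every cluster
    return A

# Object 0 is a labeled normal point of cluster 1, object 2 a labeled outlier.
A = build_indicator_matrix(n=4, c=2, normal_labels={0: 1}, outlier_indices=[2])
```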

Solutions to the Objective Function.
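In this subsection, an iterative algorithm for minimizing $J_{\text{SSOD-AFW}}$ with respect to $u_{ik}$, $v_{jk}$, and $o_{kj}$ is derived in the same alternating manner as classical FCM. First, to minimize $J_{\text{SSOD-AFW}}$ with respect to V, U and O are fixed and the parameters $\alpha_{ik}$ ($i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, c$) are treated as constants. The Lagrange function is constructed as follows:

$$J_1 = J_{\text{SSOD-AFW}} - \sum_{k=1}^{c} \lambda_k \left(\sum_{j=1}^{N} v_{jk} - 1\right), \tag{10}$$

where $\lambda_k$ ($k = 1, 2, \ldots, c$) are the Lagrange multipliers. By taking the gradient of $J_1$ with respect to $v_{jk}$ and setting it to zero, we obtain

$$\frac{\partial J_1}{\partial v_{jk}} = q\, v_{jk}^{q-1} \sum_{i=1}^{n} u_{ik}^{m} \left(x_{ij} - o_{kj}\right)^{2} - \lambda_k = 0. \tag{11}$$

Then

$$v_{jk} = \left(\frac{\lambda_k}{q \sum_{i=1}^{n} u_{ik}^{m} (x_{ij} - o_{kj})^{2}}\right)^{1/(q-1)}. \tag{12}$$

Substituting (12) into (9), we have

$$\sum_{l=1}^{N} \left(\frac{\lambda_k}{q \sum_{i=1}^{n} u_{ik}^{m} (x_{il} - o_{kl})^{2}}\right)^{1/(q-1)} = 1. \tag{13}$$

It follows that

$$\lambda_k^{1/(q-1)} = \frac{1}{\sum_{l=1}^{N} \left(\dfrac{1}{q \sum_{i=1}^{n} u_{ik}^{m} (x_{il} - o_{kl})^{2}}\right)^{1/(q-1)}}. \tag{14}$$

The updating formula of the feature weight $v_{jk}$ ($1 \le j \le N$, $1 \le k \le c$) is obtained:

$$v_{jk} = \frac{\left(\dfrac{1}{\sum_{i=1}^{n} u_{ik}^{m} (x_{ij} - o_{kj})^{2}}\right)^{1/(q-1)}}{\sum_{l=1}^{N} \left(\dfrac{1}{\sum_{i=1}^{n} u_{ik}^{m} (x_{il} - o_{kl})^{2}}\right)^{1/(q-1)}}. \tag{15}$$

The update of $v_{jk}$ implies that the larger the deviation of all samples from the $k$th cluster prototype in the $j$th feature, the smaller the weight of the $j$th feature. That is, if the data are distributed compactly around the $k$th cluster prototype in the $j$th feature, the $j$th feature plays a significant role in forming the $k$th cluster. Irrelevant features are thus assigned smaller weights, which reduces their negative impact on the clustering process.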
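The feature weight update can be sketched as below, under the reading that each weight is inversely related to the membership-weighted dispersion of that feature around the cluster prototype and that the weights are normalized per cluster. The function name, the `V[j][k]` layout, and the small `eps` guard against a zero dispersion are assumptions added for illustration.

```python
def update_feature_weights(data, prototypes, U, m, q, eps=1e-12):
    """Return V with V[j][k] = weight of feature j for cluster k.

    The weight is proportional to (1 / dispersion_jk) ** (1 / (q - 1)), where
    dispersion_jk = sum_i U[i][k]**m * (data[i][j] - prototypes[k][j])**2.
    eps is an assumed safeguard against division by zero.
    """
    n, N, c = len(data), len(data[0]), len(prototypes)
    V = [[0.0] * c for _ in range(N)]
    for k in range(c):
        disp = [max(sum(U[i][k] ** m * (data[i][j] - prototypes[k][j]) ** 2
                        for i in range(n)), eps)
                for j in range(N)]
        inv = [(1.0 / s) ** (1.0 / (q - 1.0)) for s in disp]
        total = sum(inv)
        for j in range(N):
            V[j][k] = inv[j] / total
    return V

# One cluster at the origin: the data are compact in feature 0 (dispersion 2)
# and spread out in feature 1 (dispersion 18).
data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 3.0), (0.0, -3.0)]
V = update_feature_weights(data, [(0.0, 0.0)], [[1.0]] * 4, m=2.0, q=2.0)
```

In this toy case the compact feature receives weight 0.9 and the dispersed feature 0.1, matching the interpretation given in the text.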
To find the optimal cluster prototypes O, we assume that U and V are fixed and that the parameters $\alpha_{ik}$ ($i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, c$) are constants. We take the gradient of $J_{\text{SSOD-AFW}}$ with respect to $o_{kj}$ and set it to zero:

$$\frac{\partial J_{\text{SSOD-AFW}}}{\partial o_{kj}} = -2\, v_{jk}^{q} \sum_{i=1}^{n} u_{ik}^{m} \left(x_{ij} - o_{kj}\right) = 0. \tag{16}$$

The updating formula of the cluster prototype $o_{kj}$ is obtained as follows:

$$o_{kj} = \frac{\sum_{i=1}^{n} u_{ik}^{m} x_{ij}}{\sum_{i=1}^{n} u_{ik}^{m}}. \tag{17}$$

To solve for the optimal fuzzy partition matrix U, we assume that O and V are fixed and that the parameters $\alpha_{ik}$ are again constants. We set the gradient of $J_{\text{SSOD-AFW}}$ with respect to $u_{ik}$ to zero:

$$\frac{\partial J_{\text{SSOD-AFW}}}{\partial u_{ik}} = m\, u_{ik}^{m-1} \left(d_{ik}^{2} + \frac{\beta}{m^{2}\sqrt{c}} \log u_{ik}^{m} + \alpha_{ik} a_{ik}\right) = 0. \tag{18}$$

The updating formula of $u_{ik}$ is derived as follows:

$$u_{ik} = \exp\left(-\frac{m\sqrt{c}}{\beta}\left(d_{ik}^{2} + \alpha_{ik} a_{ik}\right)\right). \tag{19}$$

Formula (19) indicates that a larger weighted distance $d_{ik}$ leads to a smaller value of $u_{ik}$, for all $1 \le i \le n$, $1 \le k \le c$. Note that the membership degree $u_{ik}$ also depends on the coefficient $\alpha_{ik}$. The choice of $\alpha_{ik}$ is important to the performance of the SSOD-AFW algorithm because it determines the importance of the third term relative to the other terms in objective function (8). If $\alpha_{ik}$ is too small, the third term is neglected and the labels do not help to improve the cluster structure. If $\alpha_{ik}$ is too large, the other terms are neglected and the negative influence of possibly mislabeled objects is amplified. The value of $\alpha_{ik}$ should therefore be chosen to have the same order of magnitude as the first term in (8). To determine $\alpha_{ik}$ adaptively, in all experiments described in this paper we choose $\alpha_{ik}$ proportional to $d_{ik}^{2}$ as follows:

$$\alpha_{ik} = K d_{ik}^{2}, \tag{20}$$

where $K \in (0, 1)$ is a constant. Since the weighted distance $d_{ik}$ is updated dynamically, the value of $\alpha_{ik}$ is adaptively updated in each iteration.
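The two updates discussed above can be sketched together: the prototype as a membership-weighted mean, and the membership as an exponential of the negative weighted distance, adjusted by the label term with the label coefficient chosen proportional to the squared distance. The function names are hypothetical, and the closed form used for the membership is our reading of the derivation, so it should be treated as an assumption.

```python
import math

def update_prototypes(data, U, m):
    """Prototype k is the mean of the data weighted by U[i][k] ** m."""
    n, N, c = len(data), len(data[0]), len(U[0])
    O = []
    for k in range(c):
        w = [U[i][k] ** m for i in range(n)]
        O.append([sum(w[i] * data[i][j] for i in range(n)) / sum(w)
                  for j in range(N)])
    return O

def update_memberships(sq_dist, A, beta, m, K):
    """Assumed closed form: u = exp(-(m*sqrt(c)/beta) * (d2 + alpha*a)),
    with alpha = K * d2 recomputed from the current squared distances."""
    c = len(sq_dist[0])
    scale = m * math.sqrt(c) / beta
    return [[math.exp(-scale * (d2 + K * d2 * a))
             for d2, a in zip(d_row, a_row)]
            for d_row, a_row in zip(sq_dist, A)]

# Three objects at the same squared distance 1.0 from a single prototype:
# labeled normal (a = -1), unlabeled (a = 0), labeled outlier (a = 1).
U_new = update_memberships([[1.0], [1.0], [1.0]], [[-1], [0], [1]],
                           beta=2.0, m=2.0, K=0.5)
O_new = update_prototypes([(0.0, 0.0), (2.0, 2.0)], [[1.0], [1.0]], m=2.0)
```

At equal distance, the labeled normal object obtains the highest membership and the labeled outlier the lowest, which is the intended effect of the label term. Because the constant lies in (0, 1), the exponent stays negative even for a labeled normal object.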

Criteria for Outlier Identification.
Based on the above analysis, outliers should hold low membership degrees to all clusters. Therefore, the sum of the memberships of an object to all clusters can be used to evaluate its outlying degree. For an object $x_i$, the outlying degree is defined as

$$\mathrm{OD}(x_i) = \sum_{k=1}^{c} u_{ik}. \tag{21}$$

Thus, a small value of $\mathrm{OD}(x_i)$ indicates a high outlying possibility of object $x_i$. The outlying degree of each sample in the dataset is calculated, and the samples are sorted in ascending order of outlying degree. The suspicious outliers are found by extracting the top $p$ objects in the sorted sequence, where $p$ is a given number of outliers contained in the dataset or the number of outliers one needs.
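The ranking step can be sketched in a few lines: sum each object's memberships over the clusters and report the objects with the smallest sums. The function names and the toy membership matrix are illustrative assumptions.

```python
def outlying_degrees(U):
    """Outlying degree of each object: the sum of its memberships to all clusters."""
    return [sum(row) for row in U]

def top_outliers(U, p):
    """Indices of the p objects with the smallest outlying degrees."""
    od = outlying_degrees(U)
    return sorted(range(len(od)), key=lambda i: od[i])[:p]

# Toy memberships of four objects to two clusters.
U = [[0.90, 0.10],   # clearly in cluster 0
     [0.05, 0.02],   # low membership everywhere
     [0.20, 0.70],   # clearly in cluster 1
     [0.10, 0.10]]   # low membership everywhere
suspects = top_outliers(U, p=2)
```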
In summary, the description of the SSOD-AFW algorithm is shown in Algorithm 1.
Finally, output the requested number of top outliers, that is, the objects with the smallest outlying degrees.
Computational complexity analysis: Step (2) needs $O(cNn)$ operations to compute the $c$ cluster prototypes. Computing the weights of the $N$ features in Step (3) costs $O(cNn)$. Step (4) requires $O(cNn)$ operations to compute the weighted distances of the $n$ objects to the $c$ cluster prototypes. Step (5) needs $O(cn)$ operations to compute the parameter $\alpha_{ik}$ of the $n$ objects with respect to the $c$ cluster prototypes. Moreover, Step (6) needs $O(cn)$ operations to calculate the memberships of the $n$ objects to the $c$ clusters. Therefore, the overall computational complexity of one iteration is $O(cNn)$, the same as that of the classical FCM algorithm.
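To show how the individual steps fit together, here is a compact, hedged sketch of one possible alternating loop. It is not the authors' Algorithm 1. The initialization from given prototypes with uniform feature weights, the fixed iteration count, the `eps` guard, the order of the updates inside the loop, and the closed-form membership update are all assumptions made for illustration.

```python
import math

def ssod_afw(data, A, init_prototypes, m=2.0, q=2.0, K=0.5, iters=20, eps=1e-12):
    """Sketch of an SSOD-AFW style loop; returns (od, U, V, O).

    data: list of feature tuples; A: n-by-c indicator matrix with entries
    in {-1, 0, 1}; init_prototypes: assumed starting prototypes.
    """
    n, N, c = len(data), len(data[0]), len(init_prototypes)
    mean = [sum(x[j] for x in data) / n for j in range(N)]
    beta = sum((x[j] - mean[j]) ** 2 for x in data for j in range(N)) / n
    scale = m * math.sqrt(c) / beta
    O = [list(o) for o in init_prototypes]
    V = [[1.0 / N] * c for _ in range(N)]        # V[j][k], uniform start
    U = []
    for _ in range(iters):
        # Weighted squared distances of every object to every prototype.
        D = [[sum(V[j][k] ** q * (x[j] - O[k][j]) ** 2 for j in range(N))
              for k in range(c)] for x in data]
        # Label coefficient alpha = K * d^2, then the membership update.
        U = [[math.exp(-scale * (D[i][k] + K * D[i][k] * A[i][k]))
              for k in range(c)] for i in range(n)]
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            total = max(sum(w), eps)
            # Prototype: membership-weighted mean of the data.
            O[k] = [sum(w[i] * data[i][j] for i in range(n)) / total
                    for j in range(N)]
            # Feature weights: inverse dispersion, normalized per cluster.
            disp = [max(sum(w[i] * (data[i][j] - O[k][j]) ** 2
                            for i in range(n)), eps) for j in range(N)]
            inv = [(1.0 / s) ** (1.0 / (q - 1.0)) for s in disp]
            inv_sum = sum(inv)
            for j in range(N):
                V[j][k] = inv[j] / inv_sum
    od = [sum(row) for row in U]                 # small value = likely outlier
    return od, U, V, O
```

Each pass touches every object, feature, and cluster a constant number of times, which is consistent with the per-iteration cost stated above. A possible usage is to build `A` from the available labels, call `ssod_afw`, and sort the returned outlying degrees in ascending order.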

Proof of Convergence.
In this section, we discuss the convergence of the proposed SSOD-AFW algorithm. To prove that objective function $J_{\text{SSOD-AFW}}$ in (8) converges when V, O, and U are iterated with formulas (15), (17), and (19), it suffices to prove that $J_{\text{SSOD-AFW}}$ is monotonically nonincreasing and bounded after a finite number of iterations. Lemmas 2, 3, and 4 below verify the nonincreasing property of $J_{\text{SSOD-AFW}}$ with respect to U, V, and O, respectively. Lemma 5 establishes the boundedness of $J_{\text{SSOD-AFW}}$.

Lemma 2. Objective function $J_{\text{SSOD-AFW}}$ in (8) is nonincreasing by updating $U = (u_{ik})_{n \times c}$ with formula (19).

Lemma 3. Objective function $J_{\text{SSOD-AFW}}$ in (8) is nonincreasing by updating $V = (v_{jk})_{N \times c}$ with formula (15).

Proof. Similar to Lemma 2, when U and O are fixed, we just need to prove that the Hessian matrix of the Lagrangian of $J_{\text{SSOD-AFW}}(V)$ at $V^{*}$ is positive definite, where $V^{*}$ is computed by (15). The Hessian matrix is denoted as $H(V) = (h_{jk,ls}(V))$, whose elements are expressed as follows:

$$h_{jk,ls}(V) = \frac{\partial^{2} J_{\text{SSOD-AFW}}}{\partial v_{jk}\,\partial v_{ls}} = \begin{cases} q(q-1)\, v_{jk}^{q-2} \sum_{i=1}^{n} u_{ik}^{m} (x_{ij} - o_{kj})^{2}, & j = l \text{ and } k = s, \\ 0, & \text{otherwise.} \end{cases}$$

Since $q > 1$ and $v_{jk} > 0$, the diagonal entries of this diagonal matrix are positive. Therefore, the Hessian matrix $H(V)$ is positive definite, and $J_{\text{SSOD-AFW}}(V)$ attains its local minimum at $V^{*}$ computed by (15). This completes the proof.

Lemma 4. Objective function $J_{\text{SSOD-AFW}}$ in (8) is nonincreasing when $O = (o_{kj})_{c \times N}$ is updated using (17).

The proof of Lemma 4 is similar to that of Lemma 2.

Lemma 5. Objective function $J_{\text{SSOD-AFW}}$ in (8) is bounded; that is, there exists a constant $M$ such that $J_{\text{SSOD-AFW}} < M$.

Proof of convergence. Lemmas 2, 3, and 4 verify that the objective function in (8) is nonincreasing under iterations according to (15), (17), and (19). Lemma 5 shows that $J_{\text{SSOD-AFW}}$ has a finite bound. Although the parameter $\alpha_{ik}$ is updated during the iteration process, it is treated as a constant when each subproblem is solved with the Lagrange multiplier technique, so $\alpha_{ik}$ does not affect the convergence of the SSOD-AFW algorithm. Combining these conclusions, $J_{\text{SSOD-AFW}}$ converges to a local minimum through iterations of V, O, and U by (15), (17), and (19).

Experiments and Analysis
Comprehensive experiments and analyses on a synthetic dataset and several real-world datasets are conducted to show the effectiveness and superiority of the proposed SSOD-AFW. We compare the proposed algorithm with two state-of-the-art unsupervised outlier detection algorithms, LOF [15] and $k$NN [13], one supervised method, SVDD [32], and one semisupervised method, EODSP [23].
Let $p$ be the number of true outliers that a dataset contains, and let $t$ denote the number of true outliers detected by an algorithm. In the experiments, the top $p$ most suspicious instances are reported. The accuracy is then given by

$$\text{accuracy} = \frac{t}{p}.$$

The receiver operating characteristic (ROC) curve represents the trade-off between the detection rate and the false alarm rate. In general, the area under the ROC curve (AUC) is used to measure the performance of an outlier detection method, and the AUC of an ideal detector is close to one.
For a given outlier detection algorithm, if the true outliers occupy the top positions relative to the nonoutliers among the $p$ suspicious instances, then the rank-power (RP) of the algorithm is said to be high. If $t$ is the number of true outliers found within the top $p$ instances and $R_i$ denotes the rank of the $i$th true outlier, then the rank-power is given by

$$\mathrm{RP} = \frac{t(t+1)}{2\sum_{i=1}^{t} R_i}.$$

RP reaches its maximum value 1 when all $t$ true outliers are in the top $t$ positions. A larger value of RP implies better performance of an algorithm.
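Both ranking-based metrics can be computed directly from an ordered list of suspects. The sketch below assumes that `ranked` lists object indices from most to least suspicious and that `true_outliers` is a set of indices; these names, and returning 0 for rank-power when no true outlier is found, are illustrative choices rather than definitions from the paper.

```python
def accuracy(ranked, true_outliers, p):
    """Fraction of the true outliers that appear among the top p suspects."""
    hits = len(set(ranked[:p]) & set(true_outliers))
    return hits / len(true_outliers)

def rank_power(ranked, true_outliers, p):
    """Rank-power: t(t+1) / (2 * sum of ranks) over the true outliers found
    in the top p positions (ranks start at 1)."""
    ranks = [r for r, idx in enumerate(ranked[:p], start=1)
             if idx in true_outliers]
    t = len(ranks)
    if t == 0:
        return 0.0   # assumed convention when nothing is found
    return t * (t + 1) / (2 * sum(ranks))

truth = {7, 5}
acc = accuracy([7, 3, 5, 1], truth, p=2)          # only object 7 is in the top 2
rp_mixed = rank_power([7, 3, 5, 1], truth, p=4)   # true outliers at ranks 1 and 3
rp_ideal = rank_power([7, 5, 3, 1], truth, p=4)   # true outliers at ranks 1 and 2
```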

Experiments on Synthetic Dataset.
A two-dimensional synthetic dataset with two cluster patterns is generated from Gaussian distributions to compare intuitively the outlier detection results of the proposed method against the other four algorithms mentioned above. The mean vectors of the two clusters are $\mu_1 = (7.5, 9)^{T}$ and $\mu_2 = (1, 3)^{T}$, respectively, and their covariance matrices are $\Sigma_1 = \Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$. As Figure 2(a) shows, the synthetic dataset contains a total of 199 samples, of which 183 are normal samples (within the two clusters) and 16 are outliers (scattered between the two clusters). Thirteen normal objects are labeled and marked with the symbol "×", and 5 outliers are labeled and marked with the symbol "*", while the remaining samples are unlabeled and marked with "⋅". Figures 2(b)-2(f) illustrate the outlier detection results on the synthetic dataset obtained with LOF, $k$NN, SVDD, EODSP, and SSOD-AFW, respectively, where the red symbols "∘" denote the detected suspicious outliers. Here, the parameter $k$ (size of the neighborhood) in LOF and $k$NN is set to 3. The Gaussian kernel function is chosen in SVDD, with bandwidth $\sigma = 0.3$ and trade-off coefficient $C = 0.45$. In addition, the Euclidean distance threshold in EODSP is set to 0.1 and the percentage of the negative set to 10%. The parameter settings of the proposed algorithm are $m = 2.3$, $K = 0.85$, and $q = 2.23$. Except for SVDD, the top 16 objects with the highest outlying scores are taken as the detection results of the algorithms.
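A dataset of this shape can be generated with the standard library, as sketched below. The text gives the cluster means, the diagonal covariance, and the totals (183 normal samples and 16 outliers), but not the per-cluster sizes or the way the outliers were placed. The split into 92 and 91 points, the uniform box used for the outliers, and the seed are therefore assumptions.

```python
import math
import random

def make_synthetic(seed=0, n1=92, n2=91, n_out=16):
    """Two Gaussian clusters with covariance diag(1, 2) plus scattered outliers.

    Assumptions: cluster sizes n1 and n2, and outliers drawn uniformly from a
    box lying between the two cluster centers.
    """
    rng = random.Random(seed)

    def cluster(mu, size):
        # Diagonal covariance diag(1, 2): standard deviations 1 and sqrt(2).
        return [(rng.gauss(mu[0], 1.0), rng.gauss(mu[1], math.sqrt(2.0)))
                for _ in range(size)]

    normal = cluster((7.5, 9.0), n1) + cluster((1.0, 3.0), n2)
    outliers = [(rng.uniform(2.0, 6.5), rng.uniform(4.0, 8.0))
                for _ in range(n_out)]
    data = normal + outliers
    is_outlier = [False] * len(normal) + [True] * n_out
    return data, is_outlier

data, is_outlier = make_synthetic()
```

Because the outliers here are sampled differently from the original experiment, results obtained on this sketch are not expected to reproduce the reported figures.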
Figure 2 shows that the unsupervised methods LOF and $k$NN, as well as the supervised SVDD, fail to detect all of the 5 labeled outliers. Moreover, they badly misjudge some normal points within the clusters as outliers. In contrast, the semisupervised EODSP algorithm and the proposed SSOD-AFW algorithm successfully detect all of the 5 labeled outliers. However, EODSP does not detect all the unlabeled true outliers, and it improperly identifies several true normal samples as outliers. We conclude from Figure 2 that the proposed algorithm finds all the true outliers in the synthetic dataset and excludes the normal samples, while the other methods do not.
Figure 3 numerically presents the performance evaluation of outlier detection using LOF, $k$NN, SVDD, EODSP, and SSOD-AFW on the synthetic dataset. From Figure 3 we see that the accuracy, AUC, and RP of the proposed algorithm all reach 1, outperforming the other methods.
Furthermore, in the experiment shown in Figure 3, the feature weights of the synthetic dataset learned by formula (15) in our method are $v_{11} = 0.6727$, $v_{12} = 0.5985$, $v_{21} = 0.3273$, and $v_{22} = 0.4015$. To demonstrate the effectiveness of the feature weights in the proposed SSOD-AFW algorithm, the weighted and nonweighted versions are compared on the synthetic dataset. The outlier detection result of the nonweighted version on the synthetic dataset is presented in Figure 4. As can be observed from Figure 4, the nonweighted SSOD-AFW ends up tagging 15 true outliers and 1 normal sample as outliers, with one unlabeled true outlier missed.

Introduction of Datasets.
To further verify the effectiveness of the proposed algorithm, five real-world datasets from the UCI Machine Learning Repository [34] (i.e., Iris, Abalone, Wine, Ecoli, and Breast Cancer Wisconsin (WDBC)) are employed to test its performance against LOF, $k$NN, SVDD, and EODSP. As mentioned by Aggarwal and Yu [35], one way to test the performance of an outlier detection algorithm is to run it on a dataset and calculate the percentage of detected points that belong to the rare classes. Therefore, for each of the five datasets, a small number of samples from the same class are randomly selected as outlying (target) objects. For instance, the original Iris dataset contains 150 objects, with 50 objects in each of three classes. We randomly select 26 objects from the class "Iris-virginica" as target outliers, and all objects in the other two classes are considered normal objects. The other four datasets are preprocessed similarly, and a more detailed description of the five real-world datasets is given in Table 1.

Experimental Result Analysis.
We compare the outlier detection performance of the proposed algorithm with LOF, $k$NN, SVDD, and EODSP on the real-world datasets. Each method has its own parameters, and the detailed parameter settings of each algorithm are as follows. The parameters of the proposed algorithm are $m = 2.1$, $K = 0.65$, and $q = 1.53$ for all five datasets. The strategy for selecting the parameters of SSOD-AFW is discussed in the later subsection on parameter analysis. For the other algorithms, the parameters are set as described in their references. It is well known that LOF and $k$NN depend strongly on the neighborhood parameter $k$. In this paper we set $k = 5$ for datasets Iris and WDBC, $k = 3$ for dataset Abalone, and $k = 10$ for datasets Wine and Ecoli. For SVDD, the Gaussian kernel function is employed, with bandwidth $\sigma = 0.45$ and trade-off coefficient $C = 0.5$ on all five real-world datasets. In EODSP, the Euclidean distance threshold is set to 0.1 and the percentage of the negative set to 30% for the Iris and Abalone datasets, while the threshold is set to 0.5 and the percentage to 30% for datasets Ecoli, Wine, and WDBC. Since the outliers are randomly selected from the target classes of each dataset, each experiment is repeated 10 times with the same number of different outliers. The average accuracy, AUC, and RP are calculated as the performance criteria of the detection methods.
Figure 5 illustrates the outlier detection results of the SSOD-AFW algorithm against LOF, $k$NN, SVDD, and EODSP on the five real-world datasets. As can be seen from Figure 5, the proposed algorithm can accurately identify outliers according to the cluster structure of a dataset, guided by the label knowledge. It shows distinct superiority over the unsupervised (LOF, $k$NN), semisupervised (EODSP), and supervised (SVDD) methods. In particular, the outlier detection accuracy of SSOD-AFW in Figure 5(a) is significantly higher than that of the others, especially for datasets Iris and Wine. Figure 5(b) shows that the AUC values of our method are higher than those of the others on all datasets except WDBC. In terms of RP, SSOD-AFW performs better than the other four algorithms on datasets Iris and Wine, whereas it is slightly poorer than SVDD on Abalone, poorer than LOF on Ecoli, and poorer than $k$NN on WDBC, as shown in Figure 5(c).
It is worth mentioning that the experiment of the proposed algorithm on WDBC involves a one-class clustering problem. Although a one-class clustering task is generally meaningless, one-class clustering-based outlier detection is meaningful and feasible in our proposal because our approach does not require the membership degrees to sum to 1. This is one of the important characteristics of the proposed algorithm.

The Influence of the Proportion of Labeled Data on Outlier Detection.
In this subsection, we investigate the influence of the proportion of labeled samples on the accuracy of our method. Two typical situations are considered and tested. In the first, the proportion of labeled outliers increases while the number of labeled normal objects is fixed at a constant. In the second, the percentage of labeled normal samples varies while the number of labeled outliers is fixed. Two groups of experiments are therefore designed to compare the accuracy of the proposed algorithm against EODSP for different percentages of labeled outliers and labeled normal samples, respectively, on the datasets Iris, Abalone, Wine, Ecoli, and WDBC. In both experiments, the percentage of labeled outliers or labeled normal samples ranges from 0% to 40% while the number of labeled objects of the other kind is fixed. We randomly select the required number of labeled outliers or normal samples from each dataset, repeat each experiment 10 times, and compute the average accuracies of SSOD-AFW and EODSP.
Figure 6 shows the results of the first group of experiments, where the percentage of labeled outliers varies from 0% to 40%. One can see from Figure 6 that the accuracies of the two semisupervised algorithms generally increase as the proportion of labeled outliers grows. This supports the idea that semisupervised outlier detection algorithms can improve detection accuracy by using prior information. Furthermore, SSOD-AFW achieves better accuracy than EODSP for the same proportion of labeled outliers on the five datasets. For Wine in particular, the accuracy of SSOD-AFW is 40% higher than that of EODSP. EODSP addresses the problem of detecting outliers with only a few labeled outliers as training data. Its labeled normal instances are extracted according to the maximum entropy principle, where the entropy is computed using only the distances between each testing sample and all the labeled outliers. This lack of information makes EODSP less flexible than our proposed method.
Figure 7 compares the accuracy of the proposed algorithm with that of EODSP when the proportion of labeled normal samples increases from 0% to 40% and the percentage of labeled outliers is fixed. Our method obtains better accuracy than EODSP on all five real-world datasets, and its accuracy increases as the percentage of labeled normal samples grows. As mentioned, EODSP performs semisupervised outlier detection with only a few labeled outliers in the initial dataset and does not consider any labeled normal objects. Therefore, the accuracy of EODSP remains unchanged across the proportions of labeled normal objects and always equals its accuracy at 0% labeled normal samples.

Parameter Analysis.
Three parameters affect the performance of SSOD-AFW: the fuzzification coefficient, the parameter that weights the label information, and the feature weight index. In this section, the influence of each of them on outlier detection accuracy is studied.
The first parameter is the fuzzification coefficient. Figure 8(a) shows the relationship between the outlier detection accuracy of our proposed algorithm and this coefficient as it varies from 1.5 to 5.0. The results indicate that the highest accuracy is achieved when the coefficient lies in [2, 4], so setting it to 2.1 in the experiments shown in Figure 5 is reasonable. The second parameter, which takes values in (0, 1), controls the importance of the label information in the outlier detection result. Outlier detection accuracies are tested by varying it from 0.1 to 0.9, as shown in Figure 8(b). The overall tendency is that accuracy increases with this parameter; the best results occur and remain stable once it reaches 0.7 or more. Finally, from Figure 8(c), we conclude that the feature weight index has little influence on the accuracy of SSOD-AFW when the other parameters keep the same settings, so the proposed algorithm is not sensitive to it. In general, we suggest that this parameter be chosen as a constant from (1, 4].

Execution Time Analysis.
Figure 9 compares the average running time of the proposed algorithm with that of the other algorithms on the five real-world datasets. The experimental environment is Windows XP, MATLAB 7.1, a 3 GHz CPU, and 2 GB of RAM. Because the Abalone dataset is far larger than the other four datasets, the running times differ markedly across datasets. To make the display clearer, the horizontal axis in Figure 9 is shifted downward by a short distance. The results indicate that the proposed algorithm takes less time than the other four typical outlier detection algorithms, except for NN on the Wine dataset. On the whole, the execution time of SSOD-AFW is comparable to that of NN and less than those of the other algorithms on most of the datasets.
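An average running-time comparison of this kind can be reproduced with a small timing harness. The sketch below is written in Python rather than the MATLAB environment used in the experiments, so absolute timings are not comparable with Figure 9; `average_runtime` and its arguments are hypothetical names introduced only for illustration.

```python
import time
import statistics

def average_runtime(fn, *args, n_runs=5, **kwargs):
    """Mean wall-clock time in seconds of fn(*args, **kwargs) over n_runs calls."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)

# Toy usage: time a placeholder workload instead of a real detector.
elapsed = average_runtime(sorted, list(range(10000, 0, -1)), n_runs=3)
```

Calling the harness once per algorithm and dataset, with the same inputs for every algorithm, yields one averaged value per bar of a comparison chart.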

Conclusions
To detect outliers more precisely, a semisupervised outlier detection algorithm based on adaptive feature weighted clustering, called SSOD-AFW, is proposed in this paper. Distinct weights of each feature with respect to different clusters are obtained by adaptive iteration, so that the negative effects of irrelevant features on outlier detection are weakened. Moreover, the proposed method makes full use of the prior knowledge contained in datasets and detects outliers by means of the cluster structure. A series of experiments verifies that the proposed SSOD-AFW algorithm is superior to other typical unsupervised, semisupervised, and supervised algorithms in both outlier detection precision and running speed.
In this paper, we present a new semisupervised outlier detection method that utilizes the labels of a small number of objects. However, our method assumes that the labels are reliable and does not include a penalty for mislabeled objects in the new objective function. Therefore, a robust version of the proposed method that deals with noisy or imperfect labels deserves further study. Moreover, since the Euclidean distance is the only dissimilarity measure discussed in our method, the SSOD-AFW algorithm is limited to outlier detection for numerical data. Future research aims at extending our method to mixed-attribute data in more real-life applications, such as fault diagnosis in industrial processes or network anomaly detection.

Figure 2: Outlier detection results of different algorithms on the two-dimensional synthetic dataset.

Figure 3: Performance comparison of different algorithms on the synthetic dataset.

Figure 4: Outlier detection result of the nonweighted SSOD-AFW on the synthetic dataset.

Figure 5: Performance comparison of various algorithms on the real-world datasets.

Figure 6: Accuracy analysis of algorithms EODSP and SSOD-AFW with different percent of labeled outliers on the real-world datasets.

Figure 7: Accuracy analysis of algorithms EODSP and SSOD-AFW with different percent of labeled normal samples on the real-world datasets.

Figure 8: Outlier detection accuracy of the proposed algorithm under various parameters on the real-world datasets.

Figure 9:

Table 1: Description of real-world datasets.