Machine Fault Classification Based on Local Discriminant Bases and Locality Preserving Projections

Machine fault classification is an important task for the intelligent identification of the health patterns of a monitored mechanical system. Effective feature extraction from vibration data is critical to the reliable classification of machine faults of different types and severities. In this paper, a new method is proposed to acquire sensitive features through a combination of local discriminant bases (LDB) and locality preserving projections (LPP). In the method, the LDB is employed to select, from the redundant wavelet packet (WP) library produced by the wavelet packet transform (WPT), the optimal WP nodes that exhibit high discrimination. Because the discriminatory features on these selected nodes characterize the class pattern with different sensitivity, the LPP is then applied to mine the inherent class pattern structure embedded in the raw features. The proposed feature extraction method combines the merits of LDB and LPP and extracts the inherent pattern structure embedded in the discriminatory feature values of samples in different classes. Therefore, the proposed feature considers not only the discriminatory features themselves but also the sensitive class pattern structure. The effectiveness of the proposed feature is verified by case studies on vibration-data-based classification of bearing fault types and severities.


Introduction
Machine fault classification is an important task for intelligent identification of the condition patterns of the system being monitored. For a mechanical system, vibration monitoring is often employed to evaluate the system dynamics. A specific application considered in this paper is monitoring the health condition of a machine or its components, such as bearings, to identify possible faults in time, which is increasingly important for reducing machine downtime and ensuring high productivity. Once a fault occurs in a machine, identifying its type and severity through vibration data analysis allows timely and safe intervention. There are many causes of machine failure. For instance, poor lubrication, acid corrosion, and plastic deformation can each cause a bearing to operate abnormally [1]. In addition, typical damage to a rolling bearing is located at the outer raceway, the inner raceway, or a rolling element. To effectively monitor and recognize the machine condition, the major challenge is to extract reliable features from vibration data, which are often disturbed by environmental noise. Traditional features, such as time-domain and frequency-domain features, are often applied to fault diagnosis [2][3][4][5][6]. However, vibration signals exhibit many nonlinear characteristics, and the methods mentioned above cannot extract these nonlinear features effectively for classifying fault types and severities. Therefore, this study aims to find a feature representation of the raw signals that yields higher discriminatory information.
The wavelet transform can represent nonstationary signals well and extract sensitive features with its multiresolution capability, and it has achieved great success in fault classification [7]. As one of the most widely used wavelet transform methods, the wavelet packet transform (WPT) is well known for its orthogonal, complete, and local properties [8]. The WPT decomposes a signal into a redundant binary tree of time-frequency subspaces, each spanned by a set of wavelet packet (WP) base vectors; the collection of all subspaces is called a WP library. Different WP bases give rise to different representations of a given signal, so it is important to select optimal WP bases out of the whole WP library for enhanced signal analysis or classification. For classification, the main objective is to find an optimal set of WP nodes that yields as much discriminatory information as possible for separating the different classes. This can be realized by the local discriminant bases (LDB) algorithm [9], which identifies optimal bases with high discriminatory information by applying a dissimilarity measure to the given dataset. Many works over the last two decades have demonstrated the effectiveness of the LDB in achieving good classification by selecting the optimal WP bases from the redundant WP subspaces [9][10][11][12][13][14][15][16][17][18]. Although discriminatory features can be obtained by selecting WP nodes via the LDB, different nodes display different sensitivity in characterizing class information. In machine learning-based classification, the classification accuracy mainly depends on the sensitive features. Current methods mainly employ the most sensitive bases for classification. We focus on another approach: using dimensionality reduction techniques to mine more sensitive features from the whole set of discriminatory features produced by the LDB. Therefore, one challenge in this study is how to extract the most useful and sensitive information hidden in high-dimensional data based on the selected WP nodes.
In the past few decades, many useful dimensionality reduction techniques have been employed for fault diagnosis and classification [2,3,[19][20][21][22][23][24][25][26][27]. These techniques can be broadly divided into linear and nonlinear approaches. Linear dimensionality reduction aims to find a set of low-dimensional bases for high-dimensional data through a linear transformation; two of the best-known linear methods are principal component analysis (PCA) [19,20] and linear discriminant analysis (LDA) [21,22]. Nonlinear dimensionality reduction, in contrast, searches for nonlinear structure hidden in high-dimensional data; its two main families are kernel-based techniques [2,23,24] and manifold learning techniques [25][26][27]. Manifold learning pursues the goal of embedding data that originally lies in a high-dimensional space into a lower-dimensional space while preserving local characteristic properties, for example, local geometric properties (Isomap [28]), local embedding structure (LLE [29]), local adjacency relations (LE [30]), and local tangent space information (LTSA [31]). Although these nonlinear manifold learning methods have been effectively applied to machine fault classification, they incur heavy computational cost and are difficult to extend to new data [25][26][27]32]. He and Niyogi [33] proposed a linear model, locality preserving projections (LPP), which can reveal the nonlinear manifold structure embedded in a dataset through a kernel that maintains local information. LPP has the remarkable advantage of providing an explicit map, which makes the manifold learning linear and easy to operate. Several works have indicated that LPP is beneficial for feature extraction in machine fault classification [3,24]. Hence, the LPP is employed in this study to extract the sensitive information hidden in the raw feature data from the selected WP nodes.
In this paper, based on the energy features of the nodes selected by the LDB algorithm from the WP library, a new effective feature is proposed that mines the nonlinear pattern information via the LPP, applied here to bearing fault classification. The proposed feature is intended to overcome the weakness that the discriminatory WP nodes characterize the fault pattern with different sensitivity. Specifically, vibration signals from bearings with different fault types and severities are first decomposed into the WP library, and the LDB is then applied to identify the optimal WP subspaces that supply maximum dissimilarity information among the classes. After that, the root energies of the selected nodes constitute a raw feature set. Because of the redundancy of these features in representing the fault pattern, some important sensitive information may be submerged among them. Therefore, the LPP is employed to extract the nonlinear sensitive pattern information embedded in the dataset. These sensitive features are finally chosen as inputs to a diagnostic classifier for characterizing bearing fault types and severities.
The rest of this paper is organized as follows. Section 2 describes the theoretical background and major principle of the proposed feature extraction method that combines the LDB and the LPP. In Section 3, experimental results on bearing fault classification are used to verify the effectiveness of the proposed method as compared to other traditional feature extraction methods. Finally, conclusions are provided in Section 4.

2.1. WPT for Signal Decomposition.
The WPT is an excellent signal decomposition tool with the well-known properties of being orthogonal, complete, and local [8]. In operation, the WPT recursively filters the signal being analyzed with a series of low-pass and high-pass filters. In this way, a signal x(t) is decomposed by the WPT into a set of WP nodes arranged as a full binary tree, each node possessing a specific time-frequency subspace. Let Ω_{0,0} denote the vector space ℝ^N corresponding to the root node of the tree. At each level, a vector space is split into two mutually orthogonal subspaces by a pair of low-pass and high-pass filters:

Ω_{j,k} = Ω_{j+1,2k} ⊕ Ω_{j+1,2k+1},  (1)

where j indicates the level of the tree and k represents the node index in level j, with k = 0, ..., 2^j − 1. This process is repeated until level J, giving rise to 2^J mutually orthogonal subspaces.
Each subspace Ω_{j,k} is spanned by a set of base vectors {w_{j,k,m}}_{m=0}^{2^{J_0−j}−1}, where J_0 = log_2 N ≥ J (N is the signal length and J_0 is the maximum level of signal decomposition). The vector w_{j,k,m} represents the WP base function indexed by the triplet (j, k, m), representing scale, frequency band (oscillation), and time position, respectively.
The WP coefficients of the signal x(t) are calculated as the inner product of the signal with each WP base function:

c_{j,k,m} = ⟨x(t), w_{j,k,m}(t)⟩,  (2)

where c_{j,k,m} denotes the kth set of WP coefficients at the jth scale and m is the translation parameter. In other words, the signal x(t) is decomposed into 2^j subspaces with coefficients {c_{j,k,m}}_{m=0}^{2^{J_0−j}−1} in each subspace. The signal x(t) can then be expressed as

x(t) = Σ_{(j,k)} Σ_m c_{j,k,m} w_{j,k,m}(t),  (3)

where, in the index (j, k, m), (j, k) corresponds to the terminal (leaf) nodes and c_{j,k,m} are the base vector coefficients at position m.
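To make the recursive split in (1) concrete, the following is a minimal numpy sketch of a full WP decomposition. It uses the Haar filter pair for simplicity (the paper's experiments use the Daubechies 8 wavelet); the function names and the dictionary tree layout keyed by (level, node) are illustrative choices, not from the paper.

```python
import numpy as np

def wp_split(x):
    """One WP split: Haar low-pass and high-pass filtering with downsampling by 2.
    This pair is orthogonal, so the split preserves signal energy exactly."""
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (low-frequency) half-band
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (high-frequency) half-band
    return lo, hi

def wpt(x, levels):
    """Full binary WP tree of a signal; returns {(j, k): coefficient array}.
    Assumes len(x) is divisible by 2**levels."""
    tree = {(0, 0): np.asarray(x, dtype=float)}
    for j in range(levels):
        for k in range(2 ** j):
            lo, hi = wp_split(tree[(j, k)])
            tree[(j + 1, 2 * k)] = lo        # node Omega_{j+1, 2k}
            tree[(j + 1, 2 * k + 1)] = hi    # node Omega_{j+1, 2k+1}
    return tree
```

Because the split is orthogonal, the total energy of the terminal-node coefficients at any level equals the energy of the original signal, which is the completeness property the text refers to.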

2.2. LDB for WP Selection.
The LDB is a pruning algorithm that identifies the subspaces and bases that exhibit high discrimination between signal classes according to a given dissimilarity measure [9]. The optimal selection of LDB subspaces for a given dataset is driven by the nature of the dataset and by the dissimilarity measure, which evaluates the "statistical distance" among the different classes at each WP node. Numerous dissimilarity measures have been developed, such as relative entropy, energy difference, correlation index, and nonstationarity. In this paper, relative entropy is adopted as the dissimilarity measure for identifying the optimal WP subspaces.
The LDB algorithm is used to identify the WP nodes that exhibit high discrimination, indicated by a large statistical distance between classes. A set of training signals for all C classes is decomposed into full binary WP trees of order J. Let each signal in the training set be denoted by x_i^{(c)}, where the indices i and c correspond to the ith training signal in the cth class. The WP tree is pruned by the LDB algorithm in such a way that a node is split only if the cumulative discriminant measure of its children is greater than that of the parent node, that is, only if the children have better discriminative power than the parent. As a result, the process ends with a subset of terminal WP nodes that maximize the statistical distance between the different classes.
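The bottom-up pruning described above can be sketched in a few lines of numpy. The sketch below assumes each node's discriminant measure has already been computed (here with a symmetrised relative-entropy measure over normalised per-class energy distributions, one option among those the text lists); the function names `node_discriminant` and `ldb_prune` are illustrative, not from the paper.

```python
import numpy as np

def node_discriminant(energy_maps):
    """Symmetrised relative entropy summed over all class pairs, computed on
    normalised class energy distributions for one WP node."""
    d = 0.0
    eps = 1e-12
    for a in range(len(energy_maps)):
        for b in range(a + 1, len(energy_maps)):
            p = energy_maps[a] / energy_maps[a].sum()
            q = energy_maps[b] / energy_maps[b].sum()
            d += np.sum(p * np.log((p + eps) / (q + eps)))
            d += np.sum(q * np.log((q + eps) / (p + eps)))
    return d

def ldb_prune(delta, levels):
    """Bottom-up pruning of a full binary WP tree.
    delta[(j, k)] holds each node's discriminant measure. A parent is kept as a
    terminal node if its measure >= the sum of its children's best measures;
    otherwise the subtree is split. Returns the selected terminal nodes."""
    best = dict(delta)            # best achievable measure within each subtree
    keep = {}                     # True -> node is terminal in the pruned tree
    for j in range(levels, -1, -1):
        for k in range(2 ** j):
            if j == levels:
                keep[(j, k)] = True
                continue
            child = best[(j + 1, 2 * k)] + best[(j + 1, 2 * k + 1)]
            if delta[(j, k)] >= child:
                keep[(j, k)] = True
            else:
                best[(j, k)] = child
                keep[(j, k)] = False
    # collect the terminal nodes by walking down from the root
    selected, stack = [], [(0, 0)]
    while stack:
        j, k = stack.pop()
        if keep[(j, k)]:
            selected.append((j, k))
        else:
            stack += [(j + 1, 2 * k), (j + 1, 2 * k + 1)]
    return selected
```

Keeping the parent on ties mirrors the rule in the text: a node is split only when its children carry strictly more discriminative power.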
Mathematically, the LDB selection process is described as follows. Suppose that B_{j,k} represents the desired local discriminant base restricted to the span of W_{j,k}, the set of base vectors at node (j, k), and that Δ_{j,k} is the array containing the discriminant measure of the same node.

Step 1. Choose a time-frequency decomposition method, such as the WPT, to decompose the signals contained in the training dataset.

Step 2. Construct a time-frequency energy map for each class from its training signals.

Step 3. Initialize the bottom level: set B_{J,k} = W_{J,k} and compute Δ_{J,k} from the class energy maps for k = 0, ..., 2^J − 1.

Step 4. Prune the tree from the bottom up: for j = J − 1, ..., 0 and k = 0, ..., 2^j − 1, set B_{j,k} = W_{j,k} if Δ_{j,k} ≥ Δ_{j+1,2k} + Δ_{j+1,2k+1}; otherwise set B_{j,k} = B_{j+1,2k} ⊕ B_{j+1,2k+1} and Δ_{j,k} = Δ_{j+1,2k} + Δ_{j+1,2k+1}.
Step 5. Order the chosen base functions by their power of discrimination.
Step 6. Use the k (normally much less than N) most discriminant base functions for constructing classifiers.
After Step 4 is performed, a complete set of orthogonal bases is constructed. Orthogonality of the bases ensures that the wavelet coefficients used as features during classification are as uncorrelated as possible. Subsequently, one can use all the WP coefficients from each terminal node of the pruned tree, use only the subset with the k highest discriminant measures as in Step 6, or employ a statistical method to produce low-dimensional features as the inputs of a classifier for discriminating different classes. In this paper, the WP coefficients of the selected optimal WP nodes are used to calculate the root energy contained in each node. Mathematically, for the WP coefficients c_{j,k}(m), m = 1, ..., 2^{J_0−j}, of a node (j, k), the root energy is calculated as

E_{j,k} = ( Σ_m c_{j,k}(m)² )^{1/2}.  (6)

In the LPP objective function (7) of Section 2.3, x_i is the one-dimensional representation of y_i and the matrix S ∈ ℝ^{n×n} is a similarity matrix. A possible way of defining S is

S_{ij} = exp(−‖y_i − y_j‖² / t) if ‖y_i − y_j‖² < ε, and S_{ij} = 0 otherwise,

where the parameter t ∈ ℝ and ε defines the radius of the local neighborhood, chosen sufficiently small but greater than 0. Two samples y_i and y_j are viewed as lying within a local ε-neighborhood provided that ‖y_i − y_j‖² < ε.
The objective function in (7), with the choice of symmetric weights S_{ij} (S_{ij} = S_{ji}), heavily penalizes neighboring points y_i and y_j that are mapped far apart, that is, when (x_i − x_j)² is large. Minimizing the objective function therefore ensures that if y_i and y_j are close, then x_i and x_j are close as well; in this way the local structure of the input data is preserved. Following some algebraic steps, we obtain

(1/2) Σ_{ij} (x_i − x_j)² S_{ij} = wᵀ Y L Yᵀ w,

where D is a diagonal matrix whose entries are the column (or row, since S is symmetric) sums of S, D_{ii} = Σ_j S_{ji}, and L = D − S is the Laplacian matrix. The bigger the value D_{ii} (corresponding to y_i) is, the more important y_i is. Therefore, the LPP algorithm imposes the constraint

wᵀ Y D Yᵀ w = 1.

The minimization problem then reduces to finding

arg min_{wᵀ Y D Yᵀ w = 1} wᵀ Y L Yᵀ w.

The transformation vector w that minimizes the objective function is finally given by the minimum-eigenvalue solution to the generalized eigenvalue problem

Y L Yᵀ w = λ Y D Yᵀ w,  (12)

where the matrices Y L Yᵀ and Y D Yᵀ are symmetric and positive semidefinite. The top several projective vectors that minimize the objective function are the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold, so they are capable of discovering the nonlinear manifold structure [33]. In this paper, the top several projective vectors are chosen as the mapping vectors applied to the LDB feature. They characterize the inherent class pattern and are thus expected to mine useful sensitive features for classification.
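The whole LPP computation, from the similarity matrix through the generalized eigenproblem (12), can be sketched as follows. This is a minimal numpy implementation under stated assumptions: heat-kernel weights inside an ε-neighborhood as in the text, a small ridge term added so that Y D Yᵀ is invertible, and the generalized problem reduced to a standard symmetric one; the function name and parameter defaults are illustrative.

```python
import numpy as np

def lpp(Y, eps_radius, t=1.0, n_components=2, reg=1e-9):
    """Locality preserving projections on a zero-mean column-sample matrix
    Y (d x n). Returns the projection matrix W (d x n_components) whose
    columns are the smallest-eigenvalue solutions of Y L Y^T w = lam Y D Y^T w."""
    d, n = Y.shape
    # similarity matrix S: heat-kernel weights inside the eps-neighborhood
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dist2 = np.sum((Y[:, i] - Y[:, j]) ** 2)
            if i != j and dist2 < eps_radius:
                S[i, j] = np.exp(-dist2 / t)
    D = np.diag(S.sum(axis=0))          # degree matrix
    L = D - S                           # graph Laplacian
    A = Y @ L @ Y.T
    B = Y @ D @ Y.T + reg * np.eye(d)   # ridge keeps B positive definite
    # reduce A w = lam B w to a standard symmetric eigenproblem via B^{-1/2}
    vals_B, vecs_B = np.linalg.eigh(B)
    B_inv_half = vecs_B @ np.diag(1.0 / np.sqrt(vals_B)) @ vecs_B.T
    vals, vecs = np.linalg.eigh(B_inv_half @ A @ B_inv_half)  # ascending order
    return B_inv_half @ vecs[:, :n_components]
```

The returned columns satisfy wᵀ Y D Yᵀ w ≈ 1 by construction, matching the constraint in the text; projecting a new sample y is simply Wᵀ y, which is the explicit linear map that distinguishes LPP from the purely nonlinear manifold methods.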

2.4. Proposed Feature Extraction Scheme for Data Classification.
Among the techniques discussed above, the LDB and the LPP have complementary merits for classification. Specifically, the LDB algorithm focuses on identifying optimal decomposition subspaces for discriminatory feature extraction, while the LPP addresses the nonlinear pattern structure that represents the inherent condition class pattern.
In other words, the LDB focuses on extracting optimal raw features, each of which characterizes the class pattern with different (local) sensitivity, while the LPP mainly addresses mining the inherent class pattern feature embedded in those raw features. Therefore, this paper proposes to combine the merits of the two techniques into a novel feature extraction. Specifically, the novel feature extracts the inherent pattern structure embedded in the optimal WP nodes, so it considers not only the static discriminatory WP node features themselves but also the dynamic sensitive class pattern structure embedded in the samples. The idea of the proposed feature is illustrated in Figure 1. Although the optimal WP nodes (filled in black in Figure 1) have been selected through the LDB algorithm, they have different sensitivity in characterizing the class pattern. After applying the LPP algorithm to the feature values, however, a new sensitive feature that clearly represents the class pattern is effectively extracted. In this process, the sensitive feature characterizes a nonlinear class pattern manifold embedded in the sample values of the raw features, which indicates that the LPP is beneficial for improving the class sensitivity of the selected WP node features. Based on this principle of combining the LDB and the LPP, the proposed feature extraction algorithm is described as follows.

LDB-LPP Feature Extraction Algorithm.
A training dataset consisting of C classes of signals and a testing dataset are given.
Step 1. Conduct the WPT to decompose the signals contained in the dataset into the WP library at level J via (2).
If the signal is from the training dataset, go to Step 2. Else, if the signal is from the testing dataset, then go to Step 3.
Step 2. Conduct the LDB algorithm to identify the optimal WP nodes that supply maximum dissimilarity information among the training dataset.
Step 3. Calculate the root energy of the coefficients of the selected WP nodes to constitute a raw feature set E_Λ via (6).
If the signal is from the training dataset, go to Step 4. Else, if the signal is from the testing dataset, then go to Step 5.
Step 4. Conduct the LPP algorithm on the raw feature value sets of the training dataset to obtain the mapping matrix by solving (12); then go to Step 5.
Step 5. Use the mapping matrix in Step 4 to calculate the new feature values of the dataset.
The proposed features have the most sensitive discriminatory capability and are thus chosen as inputs to a diagnostic classifier for characterizing data classes. For clarity, the flowchart of the proposed algorithm, together with the machine fault classification scheme, is shown in Figure 2. The scheme includes two parts: the LDB-LPP feature values are first extracted for both the training and testing signals, and a diagnostic classifier is then trained for classification of the fault signals.
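Step 3 above, which turns the selected WP nodes into the raw LDB feature vector E_Λ, is a direct application of the root energy formula (6). A minimal sketch, assuming the WP coefficients are stored in a dictionary keyed by (level, node) as produced by a WPT routine (the names are illustrative):

```python
import numpy as np

def ldb_feature(tree, selected_nodes):
    """Raw LDB feature E_Lambda: root energy sqrt(sum of squared coefficients)
    of each selected WP node, concatenated in node order."""
    return np.array([np.sqrt(np.sum(tree[node] ** 2)) for node in selected_nodes])
```

Each training or testing signal yields one such vector, and the LPP mapping matrix of Step 4 is then applied to these vectors.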

Experimental Results and Analysis
In order to evaluate the effectiveness of the feature extraction scheme proposed above for machine fault classification, bearing data with multiple faults from real bearing experiments are analyzed in this study. The experimental data are from the Case Western Reserve University Bearing Data Center [34]. The experimental setup consists of an induction motor, a dynamometer, a torque transducer, and control electronics. The resulting vibration was measured by an accelerometer mounted on the motor housing at the drive end of the motor, as illustrated in Figure 3. The accelerometer is a vibration sensor with a bandwidth up to 5000 Hz and a 1 V/g output. Single-point faults of size 0.007, 0.014, 0.021, and 0.028 inches were seeded on the drive-end bearings by electric discharge machining. These faults were set, respectively, on the rolling element, the inner raceway, and the outer raceway in the experiments. The sampling frequency of the data is 12 kHz, the sample length is 2000, and the motor speed was 1748 rev/min. Datasets A and B consist of ten classes (class labels are listed in Table 1) covering different bearing fault types and severities. In the datasets, there are four different fault types, namely, normal, outer-race fault, inner-race fault, and ball fault, and each of the last three fault types includes three different defect sizes of 0.007, 0.014, and 0.021 inches. In dataset A, the samples are split into 500 training samples (50 in each class) and 500 testing samples (50 in each class), while dataset B contains 250 training samples (25 in each class) and 250 testing samples (25 in each class). This is a complex ten-class problem of identifying both the fault type and the fault size of the operating bearing.

Feature Evaluation.
In this study, the decomposition level of the WPT is set to 6 and the Daubechies 8 wavelet is employed. The nodes selected by the LDB are shown in Figure 4. In the following, the root energy of the signal in each selected node is used as the raw feature of the proposed method, while the root energy in each node at the last level is used to form the traditional WPT feature for comparison.
To quantitatively evaluate the capability of the LDB feature in pattern classification, three common clustering evaluation metrics are analyzed as follows. The first is the widely used discriminant factor. Suppose that there is a feature vector f = [f_1, f_2, ..., f_n], where n is the dimension of the feature; then the discriminant factor is defined as

J = S_b / S_w,

where S_b indicates the between-class scatter, describing how scattered the different classes are from one another, while S_w is the within-class scatter, representing how concentrated the samples within each class are. These two scatters are, respectively, defined as

S_b = Σ_{c=1}^{C} N_c ‖m_c − m‖²,  S_w = Σ_{c=1}^{C} Σ_{i=1}^{N_c} ‖f_i^{(c)} − m_c‖²,

where N_c is the total number of samples in the cth class, m_c is the average feature vector of the samples in the cth class, and m is the overall average of the feature vectors over all classes. The discriminant factor J is thus a comprehensive indicator combining between-class and within-class scatter: a larger discriminant factor indicates a better discriminating capability of the given feature for classification. The other two clustering evaluation metrics are the cluster accuracy (ACC) and normalized mutual information (NMI) metrics [35], which are defined as follows.
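The discriminant factor above can be computed with a few lines of numpy; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def discriminant_factor(features, labels):
    """J = S_b / S_w: between-class scatter divided by within-class scatter.
    features: (N, d) sample matrix; labels: length-N class labels."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    grand_mean = features.mean(axis=0)
    s_b = 0.0
    s_w = 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        mean_c = cls.mean(axis=0)
        s_b += len(cls) * np.sum((mean_c - grand_mean) ** 2)  # between-class term
        s_w += np.sum((cls - mean_c) ** 2)                    # within-class term
    return s_b / s_w
```

Well-separated, tight clusters give a large J, while overlapping clusters push J toward zero, which is exactly the behavior used to rank the features in Tables 2 and 3.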
Cluster Accuracy (ACC). Assuming that r_i and l_i are the acquired cluster label and the provided true label of a given point x_i, the ACC is defined as

ACC = ( Σ_{i=1}^{N} δ(l_i, Map(r_i)) ) / N,

where N is the total number of samples, δ(a, b) equals 1 if a = b and 0 otherwise, and Map(·) is the optimal mapping function that matches each cluster label r_i to a true label, which can be found by the Kuhn-Munkres (KM) algorithm. Here, we assume that the correspondence between the identified clusters and the predefined classes is resolved in this way. A larger ACC value indicates better clustering and generally better classification.
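The optimal label mapping can be found with `scipy.optimize.linear_sum_assignment`, a standard implementation of the Kuhn-Munkres (Hungarian) algorithm; a sketch with an illustrative function name:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy(true_labels, cluster_labels):
    """ACC: fraction of samples correctly labeled under the best one-to-one
    mapping of cluster labels onto true labels (Kuhn-Munkres assignment)."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    clusters = np.unique(cluster_labels)
    classes = np.unique(true_labels)
    # contingency matrix: rows = clusters, columns = true classes
    count = np.zeros((len(clusters), len(classes)), dtype=int)
    for i, r in enumerate(clusters):
        for j, c in enumerate(classes):
            count[i, j] = np.sum((cluster_labels == r) & (true_labels == c))
    row, col = linear_sum_assignment(-count)  # negate to maximize matches
    return count[row, col].sum() / len(true_labels)
```

Negating the contingency matrix turns the cost-minimizing assignment into the match-maximizing one that ACC requires.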

Normalized Mutual Information (NMI).
It is a mutual information (MI) based metric, defined as

NMI = ( Σ_{i,j} n_{i,j} log( N n_{i,j} / (n_i n̂_j) ) ) / sqrt( ( Σ_i n_i log(n_i / N) ) ( Σ_j n̂_j log(n̂_j / N) ) ),

where n_i is the number of samples assigned to cluster C_i, n̂_j is the number of samples in the ground-truth class Ĉ_j, and n_{i,j} is the number of samples in the intersection of C_i and Ĉ_j. A larger NMI indicates better clustering performance, which is beneficial to classification. For visualization and a fair comparison, the dimensions of the LDB and the WPT features are both reduced to 3 by dimensionality reduction techniques, namely, the LPP and the traditional PCA. Note that the LDB feature followed by the LPP is exactly the proposed feature in this study. We then calculated the three clustering evaluation metrics mentioned above. Here, the k-means clustering method is applied to obtain the cluster labels in the reduced 3-dimensional feature space before calculating the ACC and NMI.
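The NMI above can be computed directly from the contingency counts; the following is a minimal numpy sketch (illustrative function name, natural logarithms throughout, matching the formula):

```python
import numpy as np

def nmi(true_labels, cluster_labels):
    """Normalised mutual information between a clustering and the ground truth:
    MI divided by the geometric mean of the two label entropies."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    N = len(true_labels)
    mi = 0.0
    for r in np.unique(cluster_labels):
        for c in np.unique(true_labels):
            n_rc = np.sum((cluster_labels == r) & (true_labels == c))
            if n_rc == 0:
                continue  # zero-count cells contribute nothing to MI
            n_r = np.sum(cluster_labels == r)
            n_c = np.sum(true_labels == c)
            mi += (n_rc / N) * np.log(N * n_rc / (n_r * n_c))
    h_r = -sum((np.sum(cluster_labels == r) / N) * np.log(np.sum(cluster_labels == r) / N)
               for r in np.unique(cluster_labels))
    h_c = -sum((np.sum(true_labels == c) / N) * np.log(np.sum(true_labels == c) / N)
               for c in np.unique(true_labels))
    return mi / np.sqrt(h_r * h_c)
```

A perfect clustering (the clusters coincide with the classes, up to label permutation) yields NMI = 1, and the measure is invariant to how the cluster labels are numbered, which is why it pairs naturally with k-means output.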
The number of clusters k used in the k-means clustering method is set to the number of classes. Moreover, to achieve efficient and stable convergence, the k initial points are set to the mean point of each class.
The neighborhood parameter of the LPP is set to 12. As an illustration, the scatter plots of the ten-class dataset A for the training data are drawn in Figure 5. It can be seen that the LDB feature shows a better classification capability than the WPT feature. It can also be found that the LPP has a much better classification capability than the PCA. Therefore, the LDB-LPP combination shows the best between-class and within-class scatter performance. The extracted feature patterns are also shown in Figure 6, where it can be clearly seen that the third LPP projection of the LDB feature values separates the classes better than the third LPP projection of the WPT feature values. Moreover, the quantitative results listed in Table 2 support the above statements: the clustering evaluation metrics J, ACC, and NMI of the LDB feature are higher than those of the WPT feature, and the LPP performs much better than the PCA. The combination of LDB and LPP shows the most beneficial performance for classification. The clustering evaluation of dataset B (with half the number of samples of dataset A) is also reported in Table 3. These clustering evaluation values show the same tendency and indicate that the LPP can learn a good nonlinear class pattern structure from the discriminatory LDB feature values. To examine the feature performance, the nearest mean classifier, one of the simplest and most intuitive statistical classifiers, is applied for classification in this study. This classifier is based on the closest Euclidean distance and the notion that similar patterns should be assigned to the same class. The mean vector of the training data in each class is used to represent that class, and samples are assigned according to the minimum-distance (maximum-similarity) criterion. Moreover, another, more advanced classifier, the Gaussian mixture model (GMM) classifier, is also applied in this study.
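The nearest mean classifier described above is simple enough to state in full; a minimal numpy sketch (the class name is illustrative):

```python
import numpy as np

class NearestMeanClassifier:
    """Assign each sample to the class whose training mean is closest
    in Euclidean distance (minimum-distance / maximum-similarity rule)."""

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # one mean vector per class, stacked as (n_classes, d)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # pairwise distances between samples (N, d) and class means (C, d)
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]
```

A GMM classifier generalizes this rule by modeling each class with a mixture of Gaussians and assigning by maximum likelihood rather than by distance to a single mean, which is why it can further improve the recognition rate reported in Table 4.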

Mathematical Problems in Engineering
The recognition accuracies of the proposed LDB-LPP feature and the comparison features (extracted from the WPT feature without node selection) are shown in Table 4. It can be seen that the proposed LDB-LPP feature outperforms the traditional features obtained by PCA, LDA, LPP, SLPP, LE, and LLE, which verifies the benefit of the LDB for choosing discriminatory features. Moreover, the GMM classifier further improves on the recognition rate of the nearest mean classifier. Note that the recognition accuracy on dataset A is generally higher than that on dataset B because dataset A contains more samples. The improvement of the recognition accuracy of the LDB-LPP feature over the other features is more obvious on dataset B than on dataset A for both classifiers. For instance, incorporating the LDB into the LPP feature extraction yields an average testing improvement of 1% on dataset A but 3% on dataset B (for both classifiers). Figures 7 and 8 intuitively show that the proposed LDB-LPP feature performs the best among all the comparison features. In this study, the testing recognition accuracy based on the LDB-LPP feature is equal or very close to 100% for both classifiers. These results imply that the proposed joint LDB-LPP feature extraction method can significantly improve classification accuracy.

Conclusions
This paper presents a feature extraction method that integrates the LDB and the LPP to explore useful and powerful characteristics for vibration-data-based machine fault classification. The LDB is used to select the most discriminant WP bases from a library of redundant, orthogonal time-frequency subspaces. The input features are produced from the selected optimal wavelet bases, but these features have different sensitivity in characterizing class information. The LPP is therefore employed to acquire a sensitive feature that characterizes the inherent class pattern embedded in the raw features, yielding a much better identification accuracy. The proposed feature extraction method combines the merits of the LDB and the LPP and thus displays valuable benefits for data classification. To verify the effectiveness of the proposed method, vibration data representing different bearing fault types and severities were analyzed in comparison with other features extracted from the WPT feature. The experimental results for bearing fault classification indicate that the LDB-LPP feature is more effective than feature extraction methods based on the WPT feature without base selection. The presented joint LDB-LPP feature extraction method is also expected to suit other machine fault classification tasks, such as for gears, spindles, and cutting tools, owing to its excellent feature representation of the class patterns. Moreover, the technical aspects of the proposed LDB-LPP feature extraction framework can be further improved and strengthened. First, this paper fairly compares the LDB feature and the WPT feature at the same decomposition level, which validates the benefit of the LDB in data classification; however, how to select a well-suited decomposition level for the LDB remains an open issue for further study. Second, the LPP is a typical and effective feature extraction method that captures the manifold structure with a linear projection. Although the LPP has been successfully used to overcome the weakness of the LDB in this study, it would be meaningful to apply newer, well-performing manifold learning methods in place of the LPP in the proposed framework to further enhance classification performance; this will also depend on the complexity of the data to be analyzed. Other possible applications to complex classification problems remain to be studied in the future.
LDB Algorithm. A training dataset consisting of C classes of signals {{x_i^{(c)}}_{i=1}^{N_c}}_{c=1}^{C}, with N_c being the total number of training signals in class c, is given.

Figure 1: Principle of the proposed feature extraction scheme based on combination of LDB and LPP.

Figure 2: Flowchart of the proposed LDB-LPP feature extraction algorithm and machine fault classification.

Figure 4: Optimal WP nodes selected by the LDB at level 6: (a) dataset A and (b) dataset B.

Figure 5: Representation of training samples of bearing dataset A: (a) PCA of WPT feature values, (b) PCA of LDB feature values, (c) LPP of WPT feature values, and (d) LPP of LDB feature values.

Figure 6: The first three projections of training samples of bearing dataset A: (a∼c) LPPs of WPT feature values and (d∼f) LPPs of LDB feature values.

Figure 7: Recognition accuracy of ten-class classification by the nearest mean classifier: (a) dataset A and (b) dataset B.

In total, the root energy values of all of the selected nodes are put together to form a vector denoted by E_Λ (where Λ = {(j, k)} is the index set of the selected WP nodes B_{j,k}), which for convenience is called the LDB feature. The set of root energy values of the WP nodes at the final level J of the WPT is called the WPT feature in this paper and denoted by E_W.

2.3. LPP for Feature Pattern Mining. In this section, we briefly describe the LPP algorithm for learning a locality preserving subspace from high-dimensional data containing the sample values of a feature vector E_Λ or E_W. Let Y = [y_1, ..., y_n] ∈ ℝ^{d×n} denote a data matrix representing a set of d-dimensional samples of size n with zero mean. Now, consider the problem of representing the data matrix Y by a single vector x = [x_1, ..., x_n] such that x_i represents y_i. We thus seek a linear mapping, denoted by a transformation vector w ∈ ℝ^d, from the d-dimensional space to a one-dimensional space, so that wᵀ y_i = x_i. The LPP is a technique that seeks to preserve the intrinsic geometry and local structure of the data. The criterion of the objective function for choosing a map of the LPP is as follows [33]:

min Σ_{ij} (x_i − x_j)² S_{ij}.  (7)

Table 1: Description of datasets for bearing data classification.

Table 2: Clustering evaluation of different features for dataset A.

Table 3: Clustering evaluation of different features for dataset B.

Table 4: Recognition accuracy (%) performance for dataset A and dataset B.