Simultaneous-Fault Diagnosis of Automotive Engine Ignition Systems Using Prior Domain Knowledge and Relevance Vector Machine

Engine ignition patterns can be analyzed to identify the engine fault according to both the specific prior domain knowledge and the shape features of the patterns. One of the challenges in ignition system diagnosis is that more than one fault may appear at a time. This kind of problem is referred to as simultaneous-fault diagnosis. Another challenge is that constructing the diagnostic system requires a large number of costly simultaneous-fault ignition patterns, because the number of training patterns depends on the number of combinations of different single faults. These problems could be resolved by the proposed framework, which combines feature extraction, probabilistic classification, and decision threshold optimization. With the proposed framework, the features of the single faults in a simultaneous-fault pattern are extracted and then detected using a new probabilistic classifier, namely, the pairwise coupled relevance vector machine, which is trained with single-fault patterns only. Therefore, a training dataset of simultaneous-fault patterns is not necessary. Experimental results show that the proposed framework performs well for both single-fault and simultaneous-fault diagnoses and is superior to the existing approach.


Introduction
1.1. Background of Engine Ignition Patterns.
Although automotive engine ignition systems vary in construction, they are similar in basic operation. All of them have a primary circuit that causes a spark in the secondary circuit, which is then delivered to the correct spark plug at the proper time. The conditions inside the ignition system and the cylinder also affect the ignition pattern in the secondary circuit. Consequently, ignition patterns reflect the conditions within the ignition system and help pinpoint its faults [1], such as wide or narrow spark-plug gaps and open spark-plug cables. After capturing the ignition pattern, the automotive mechanic compares the features of the captured pattern with samples from handbooks for diagnosis [2,3]. This procedure is called ignition system diagnosis. However, the automotive mechanic faces several challenges, as follows.
(1) The engine ignition pattern is time dependent. Different engine models produce ignition patterns of various amplitudes and durations for the same kind of fault. Even the same engine may produce slightly different ignition-pattern shapes in each engine cycle due to engine speed fluctuation and varying testing conditions. Therefore, the sample patterns in the handbooks have no exact scale or duration. Hence, traditional diagnosis relies merely on prior domain knowledge and the engineer's experience.
(2) In practice, engine ignition-system diagnosis is a simultaneous-fault problem, but many handbooks only provide single-fault patterns for reference. To determine simultaneous faults, the engineer can only extract and analyze some specific features of single-fault patterns from a simultaneous-fault pattern, such as frequency, firing voltage, and burn time, and decide on the presence of simultaneous faults according to experience and knowledge.
(3) As suggested in the existing literature [1][2][3], ignition-system diagnosis based on the shape features and the prior domain knowledge of the ignition pattern cannot reach a definite answer. This is because many possible faults may occur individually or simultaneously, and the handbooks do not rank the possible faults by probability. Therefore, to find a fault based on ignition patterns, many trials of disassembling and assembling engine parts are often necessary unless the engineer has very rich experience.
To tackle these challenges, an effective feature extraction method for engine ignition patterns is required, one that combines domain knowledge (DK), time-frequency decomposition, and dimensional reduction techniques. Moreover, an advanced probabilistic classifier is necessary to rank the possible faults and provide reliable diagnostic results. In recent years, some intelligent diagnostic methods based on pattern recognition have been developed for multiclass fault diagnosis (i.e., single-fault diagnosis, because only a single fault is identified) of mechanical systems [4][5][6][7][8][9]. Generally, these methods include two steps: feature extraction and classification.

Feature Extraction Methods.
Feature extraction is very important because the in-depth and hidden features of single-fault patterns can be detected through frequency subband decomposition. In the existing literature, many classical feature extraction techniques have been applied to fault diagnosis; the most typical one is the fast Fourier transform (FFT) [10][11][12][13]. However, its main drawback is that it is unsuitable for nonstationary patterns. The wavelet packet transform (WPT) [1,4,14-19] is another popular time-frequency localization method that has been widely used in the past decade. Through multiscale analysis, based on subband coding and a systematic decomposition of a pattern into its subband levels, WPT can be successfully applied to nonstationary patterns. Therefore, WPT is employed in this research for feature extraction.
Nevertheless, one drawback of WPT is that the size of the extracted features is larger than or equal to that of the original pattern. If the original pattern is high dimensional, the large number of extracted features may cause two issues: (1) high complexity of the trained classifiers because of the huge number of inputs; (2) many redundant and unimportant extracted features that may introduce noise. Both issues can degrade classifier performance. Therefore, compensating for this drawback with a dimensional reduction technique such as principal component analysis (PCA) [20][21][22] is suggested. In this research, PCA is selected as the dimensional reduction technique for simple illustration; more advanced techniques could be considered in the future. Compared with other dimensional reduction techniques, PCA has three advantages: (1) it has no hyperparameter; (2) it eliminates the interaction of variables because the principal components are mutually uncorrelated; (3) the principal components are sorted by their information weights, so unimportant principal components can be further removed. The WPT+PCA feature extraction approach can therefore transform an original ignition pattern into a reduced-dimensional feature vector while retaining most of the information content.
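To make the WPT+PCA pipeline concrete, the following NumPy sketch implements a full Haar wavelet packet decomposition (Haar is the mother wavelet used in the experiments) and a PCA fitted by singular value decomposition of the centered feature matrix. It is a minimal illustration, not the authors' Matlab implementation: the odd-length boundary rule (repeating the last sample), the natural packet ordering, mean-centering, and the function names `haar_wpt`, `wpt_features`, `pca_fit`, `pca_transform`, and `combined_feature` are all assumptions. It also shows the size drawback mentioned above: an 18,000-point pattern decomposed to level 9 yields 512 packets of 36 coefficients, that is, 18,432 features.

```python
import numpy as np


def haar_wpt(x, level):
    """Full Haar wavelet packet decomposition down to `level`.

    Returns the 2**level terminal packets in natural (not frequency) order.
    Odd-length nodes are extended by repeating their last sample (an assumed
    boundary rule), so each packet has ceil(len(x) / 2**level) coefficients.
    """
    packets = [np.asarray(x, dtype=float)]
    for _ in range(level):
        children = []
        for p in packets:
            if len(p) % 2:
                p = np.append(p, p[-1])
            even, odd = p[0::2], p[1::2]
            children.append((even + odd) / np.sqrt(2))  # low-pass (approximation)
            children.append((even - odd) / np.sqrt(2))  # high-pass (detail)
        packets = children
    return packets


def wpt_features(x, level):
    """Concatenate all terminal packets into one feature vector v = WPT(x)."""
    return np.concatenate(haar_wpt(x, level))


def pca_fit(V, m):
    """PCA on the rows of V (one WPT feature vector per row), via SVD.

    Returns the mean, H (columns = m leading eigenvectors of the covariance)
    and all normalized eigenvalues lambda_i / sum(lambda), in descending order.
    """
    V = np.asarray(V, dtype=float)
    mean = V.mean(axis=0)
    _, sv, Vt = np.linalg.svd(V - mean, full_matrices=False)
    eigval = sv ** 2 / (V.shape[0] - 1)  # covariance eigenvalues
    return mean, Vt[:m].T, eigval / eigval.sum()


def pca_transform(v, mean, H):
    """Reduced features f_PCA = H^T (v - mean) of one pattern."""
    return H.T @ (np.asarray(v, dtype=float) - mean)


def combined_feature(f_pca, d1, d2, d3):
    """Classifier input f = [f_PCA, d1, d2, d3]."""
    return np.concatenate([f_pca, [d1, d2, d3]])
```

Using the SVD avoids forming an 18,432 × 18,432 covariance matrix. In this sketch the number of retained components m is simply passed in; with the settings reported in Section 4 (level 9, 22 components), `combined_feature` would return a 25-dimensional vector.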

Classification Methods.
For classification, a fault can be considered as a label, whether it is a single fault or a simultaneous fault. To date, there are only a few studies on simultaneous-fault diagnosis. The typical classification method for simultaneous-fault diagnosis is to build a number of classifiers according to the combinations of all possible faults; this method is called monolabel classification [23]. However, it is practically difficult to obtain training data for all possible combinations, particularly for ignition patterns. Normally, the number of combinations of all faults in an engineering problem is so large that it affects the diagnostic accuracy, because the complexity of the classifiers also increases immensely. Moreover, if a new single fault is added in the future, the number of required simultaneous-fault training patterns grows significantly. To overcome this drawback, Yélamos et al. [23] proposed a binarization strategy using the support vector machine (SVM) and applied it to simultaneous-fault diagnosis of a simulated chemical process based on time-independent data, in which the labels of the single faults or simultaneous faults were processed as binary vectors, that is, vectors of 0s and 1s only. For each label, a binary classifier was constructed using SVM with the one-versus-all splitting strategy. Given an unknown pattern, the classifiers output a vector of binary results (0 or 1). In this approach, only single-fault patterns are used for training the classifiers, while simultaneous-fault patterns are not necessary. The experimental results showed that the overall accuracy of their binarization approach is almost the same as that of the traditional monolabel approach. This binarization approach is attractive but still suffers from several drawbacks: (1) the approach assumes that informative features are obvious and available, which is not always the case for time-dependent signal patterns, so the approach is not suitable for ignition patterns; (2) the one-versus-all strategy ignores the pairwise correlation between the labels, and hence the classification accuracy is often degraded; (3) the approach only indicates whether a fault is present; when the corresponding output is close to the classification margin, it gives no measure of confidence in the classification, that is, no degree of belief in the fault.
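The combinatorial argument above can be quantified with a few lines of Python. The sketch assumes, hypothetically, that every combination of up to three single faults would be its own monolabel class: 10 single faults would then already require 175 classes, and adding an 11th fault would raise this to 231, whereas one-versus-all binarization needs only one binary classifier per single fault. These counts are upper bounds, since many combinations are physically impossible (e.g., a wide and a narrow spark-plug gap at once); the function names are illustrative.

```python
from math import comb


def n_monolabel_classes(n_faults, max_simultaneous):
    """Classes needed if every combination of 1..max_simultaneous faults is its own label."""
    return sum(comb(n_faults, k) for k in range(1, max_simultaneous + 1))


def n_binary_classifiers(n_faults):
    """One-versus-all binarization: one binary classifier per single fault."""
    return n_faults


print(n_monolabel_classes(10, 3), n_monolabel_classes(11, 3))  # 175 231
print(n_binary_classifiers(10), n_binary_classifiers(11))      # 10 11
```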
From a practical point of view, a proper classifier has to offer the probabilities of all possible faults. Then, when the fault(s) predicted by the classifier are incorrect, the user can at least trace the other possible faults according to the rank of their probabilities. Therefore, it is better to employ a probabilistic classifier for simultaneous-fault diagnosis. The probabilistic structure is also suitable for faults with uncertainty, such as those in engine ignition-system diagnosis. A typical probabilistic classifier is the probabilistic neural network (PNN) [24,25]. It was shown in [24] that the performance of PNN is superior to that of an SVM-based method for multilabel classification. However, the main drawback of PNN lies in the limited number of inputs, because the complexity of the network and the training time are heavily related to the number of inputs. Recently, Widodo et al. [6] applied an advanced classifier, namely, the relevance vector machine (RVM), to fault diagnosis of low-speed bearings and showed that RVM is superior to SVM in terms of diagnostic accuracy. In addition, RVM can handle regression problems [26]. RVM is a statistical learning method proposed by Tipping [27], which trains a probabilistic classifier with a sparse model using a Bayesian framework. RVM can be extended to a multiclass version using the one-versus-all (1vA) strategy. However, this strategy has been shown to produce a large region of indecision [28,29]. In view of this drawback, this research is the first in the literature to incorporate pairwise coupling, that is, the one-versus-one (1v1) strategy, into RVM, namely, the pairwise coupled relevance vector machine (PCRVM). As PCRVM considers the correlation between every pair of fault labels, a more accurate estimate of the label probabilities for simultaneous-fault signals can be achieved.

Decision Threshold Optimization.
If probabilistic classification is applied to fault detection, the predicted fault is usually inferred as the one with the largest probability. An alternative approach is that the probabilistic classifier ranks all possible faults according to their probabilities and lets the engineer make a decision. These inference approaches work fine for single-fault detection but fail to determine which faults occur simultaneously in the simultaneous-fault problem. This is because the engineer cannot identify the number of simultaneous faults from the output probability of each label. For instance, consider the output probability vector [0.21, 0.5, 0.69, 0.01, 0.6] for five labels. In this example, it is difficult for the engineer to judge whether the simultaneous faults are labels 2, 3, and 5. To identify the number of simultaneous faults, a decision threshold must be introduced, and thus a new step of decision threshold optimization is proposed in the current framework in addition to feature extraction and probabilistic classification.
1.5. Research Objectives and the Proposed Framework.
Currently, very little research examines whether the features of single-fault ignition patterns are reflected in the ignition patterns of simultaneous faults. If so, some reasonable (not all) simultaneous faults are likely to be identifiable based on the prior domain knowledge and the features of single-fault ignition patterns. In other words, the features of the single faults in a simultaneous-fault pattern could be detected and then classified using a probabilistic classifier trained with single-fault patterns only. Under this concept, simultaneous-fault patterns are not necessary for training the classifiers. Once a new single fault is added in the future, the diagnostic system can be easily extended because the issue of combinations of single faults has been eliminated. To verify the feasibility and determine the best feature extraction method, this research proposes to extract the important knowledge-specific, time-domain, and frequency-domain features of the single-fault patterns using combinations of WPT+PCA, FFT, and DK. Then the pairwise coupled probabilistic classifier is trained using a training dataset of these extracted single-fault features in order to identify simultaneous faults in reasonable unseen patterns. Therefore, a feasibility study of this idea for simultaneous-fault diagnosis is an important contribution of this research. Another important contribution is the reduction in the number of training patterns required for simultaneous-fault diagnosis.
This paper is organized as follows. The proposed framework and the related techniques are described in Section 2. In Section 3, the experimental setup is presented, followed by the results and a comparison with the latest approach [23] in Section 4 and a discussion in Section 5. Finally, a conclusion is given in Section 6.

Proposed Framework and Related Techniques
The proposed diagnostic framework (Figure 1) includes three steps: feature extraction, classification, and threshold optimization. The framework is general, so different feature extraction, probabilistic classification, and threshold optimization techniques could be adopted. In this paper, FFT, WPT, and PCA are examined in the feature extraction step; their detailed descriptions can be found in [30], [31], and [22], respectively. In addition, each of these techniques is combined with time-related domain knowledge (DK) for a comprehensive comparison.

Formulation of the Proposed Framework.
Given a sample dataset $D = \{(\mathbf{x}_i, \mathbf{l}_i)\}$ of (single-fault or simultaneous-fault) patterns, $i = 1$ to $N_D$, where $\mathbf{x}_i \in \mathbb{R}^{n}$, $\mathbf{l}_i = [l_{i,1}, l_{i,2}, \ldots, l_{i,s}]$ is the label vector of pattern $\mathbf{x}_i$, and $s$ is the number of single faults. A pattern may contain more than one fault, so that $\sum_{j=1}^{s} l_{i,j} \geq 1$, with $l_{i,j} \in \{0, 1\}$ for $j = 1$ to $s$. In Figure 1, the sample dataset is divided into three groups, a training dataset, a validation dataset, and a test dataset, where the training dataset involves single-fault patterns only.
After feature extraction techniques are applied to the patterns $\{\mathbf{x}_i\}$, a set of feature vectors $F = \{(\mathbf{f}_i, \mathbf{l}_i)\}$ is produced.
A training dataset of single-fault patterns only (no simultaneous-fault patterns are necessary) is selected to train a multilabel classifier $M_{\text{class}}$ using a probabilistic classification algorithm. Then $M_{\text{class}}$ takes an unknown feature vector $\mathbf{f}$ as input and outputs a probability vector $\mathbf{p} = [p_1, p_2, \ldots, p_s]$, where $s$ is the number of single-fault labels. Here $p_j = P(j \mid \mathbf{f}) \in [0, 1]$ denotes the probability that $\mathbf{f}$ belongs to the $j$th label, for $j = 1$ to $s$. Since every $p_j$ is an independent probability, $\sum_j p_j$ is not necessarily equal to one. At this stage, the diagnostic system can provide the probability vector $\mathbf{p}$ to the user as a quantitative measure for reference and further use. Afterwards, the multilabel decision vector $\mathbf{y} = [y_1, y_2, \ldots, y_s]$ is constructed from $\mathbf{p}$ using (1):

$$y_j = \begin{cases} 1, & p_j \geq \theta, \\ 0, & \text{otherwise}, \end{cases} \qquad j = 1, \ldots, s, \tag{1}$$

where $\theta \in (0, 1)$ is a user-defined decision threshold and $y_j$ indicates whether or not $\mathbf{f}$ belongs to the $j$th label (Figure 2). For example, if $\theta = 0.5$ and $\mathbf{p} = M_{\text{class}}(\mathbf{f}) = [0.72, 0.42, 0.82, 0.28, 0.86]$, then $\mathbf{y} = [1, 0, 1, 0, 1]$. Therefore, $\mathbf{f}$ is diagnosed as the simultaneous fault (1, 3, 5). Notice that $\mathbf{y} = [0, 0, 0, 0, 0]$ indicates that no fault has been found, and hence the unseen instance $\mathbf{f}$ is diagnosed as a normal pattern.
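Decision rule (1) is a simple thresholding operation. The NumPy sketch below (the helper names `decide` and `diagnose` are illustrative) reproduces the worked example above and also applies the rule to the ambiguous vector [0.21, 0.5, 0.69, 0.01, 0.6] from Section 1, showing how strongly the detected fault set depends on $\theta$.

```python
import numpy as np


def decide(p, theta):
    """Decision rule (1): y_j = 1 if p_j >= theta, otherwise 0."""
    return (np.asarray(p, dtype=float) >= theta).astype(int)


def diagnose(p, theta):
    """1-based indices of the detected faults; an empty list means a normal pattern."""
    return [j + 1 for j, y_j in enumerate(decide(p, theta)) if y_j == 1]


print(diagnose([0.72, 0.42, 0.82, 0.28, 0.86], 0.5))  # [1, 3, 5]
ambiguous = [0.21, 0.5, 0.69, 0.01, 0.6]
for theta in (0.5, 0.65, 0.7):
    print(theta, diagnose(ambiguous, theta))  # [2, 3, 5], then [3], then []
```

The same probability vector yields a triple fault, a single fault, or a normal pattern depending on $\theta$, which is why the threshold is optimized rather than fixed.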

Extraction of Prior Domain Knowledge Features for Ignition Patterns.
When an engine starts firing, its secondary coil produces a rapid high voltage that causes the spark plug to spark. This high voltage is called the firing voltage. The spark voltage then decreases to zero. The spark voltage represents the voltage required to maintain the spark for the duration of the spark line, and this duration is called the burn time. After the burn time, the energy in the ignition coil is nearly exhausted, and the residual energy produces a slight oscillation in the ignition coil. The entire procedure is shown in Figure 3. Using the ignition pattern to diagnose engine faults is a common diagnostic method for automotive engineers. With reference to some handbooks [2,3], the following prior domain knowledge of a pattern can be observed for engine fault diagnosis (Figure 3): (1) firing voltage ($d_1$); (2) burn time ($d_2$); (3) average spark voltage ($d_3$).
In this study, all patterns start from the firing voltage ($d_1$), which is at the first sampling point:

$$d_1 = v_1, \tag{2}$$

where $v_1$ is the voltage of the first sampling point. Ideally, the burn time ($d_2$) starts from the spark voltage and ends at the position where the spark voltage falls to zero. In practice, however, the voltage may oscillate slightly after the burn time, so that an exact zero value may not be reached. In this study, when the voltage falls to 0.1% of the firing voltage, it is considered zero and the burn time ends. The feature $d_2$ can be obtained as illustrated in Figure 4, where $k$ indicates the end point of the burn time and $n$ is the length of the patterns. With the index $k$ and the time step $\Delta t$, the average spark voltage of the spark line ($d_3$) can be calculated as follows:
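The three domain-knowledge features can be sketched in Python as follows. Only $d_1 = v_1$ and the 0.1% stopping rule come from the text; the exact definitions of $d_2$ (Figure 4) and $d_3$ (equation (3)) are assumed here: the burn time is taken to start at the second sample, $d_2 = k\,\Delta t$ with $k$ the 0-based end index, and $d_3$ is the mean voltage over samples $1..k$. A positive-polarity pattern is also assumed. The default $\Delta t = 10^{-5}$ s corresponds to the 100 kHz sampling rate used in the experiments; the function name `dk_features` is hypothetical.

```python
import numpy as np


def dk_features(v, dt=1e-5, zero_ratio=0.001):
    """Domain-knowledge features (d1, d2, d3) of one ignition pattern.

    d1: firing voltage = first sample (as in the text).
    d2: burn time; ASSUMED to be k * dt, where k is the first index after the
        firing point at which the voltage falls to zero_ratio * d1 (0.1 %).
    d3: average spark voltage; ASSUMED to be the mean of samples 1..k.
    """
    v = np.asarray(v, dtype=float)
    d1 = v[0]
    below = np.nonzero(v[1:] <= zero_ratio * d1)[0]
    k = below[0] + 1 if below.size else len(v) - 1  # end of burn time (0-based)
    d2 = k * dt
    d3 = v[1:k + 1].mean()
    return d1, d2, d3
```

If the voltage never falls below the 0.1% level, this sketch treats the whole pattern after the firing point as the spark line.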

Feature Extraction Using WPT and PCA and Combined Feature Vector.
WPT is a generalization of wavelet decomposition that offers a richer signal analysis [31]. It is well known that WPT can extract time-frequency features of a signal pattern. Given a set of patterns $X = \{\mathbf{x}_i\}$, $i = 1$ to $N_D$, WPT at level $j$ transforms an ignition pattern $\mathbf{x}_i \in \mathbb{R}^{n}$ into a set of $2^j$ coefficient packets $\mathbf{c}_{i,q} \in \mathbb{R}^{\lceil n/2^j \rceil}$, $q = 1$ to $2^j$, where $\lceil \cdot \rceil$ is the ceiling function. These packets are then concatenated as $\mathbf{v}_i = [\mathbf{c}_{i,1}, \mathbf{c}_{i,2}, \ldots, \mathbf{c}_{i,2^j}]$, which forms the extracted features of the pattern $\mathbf{x}_i$. It is believed that the in-depth and hidden features of the single-fault patterns can be detected through the coefficient packets $\mathbf{v}_i$ after WPT decomposition. WPT is applied to every $\mathbf{x}_i$ to form a set of features $V = \{\mathbf{v}_i\}$, $i = 1$ to $N_D$. Usually, the dimension of $\mathbf{v}_i$ is large, and some of the features may be redundant. Therefore, PCA is employed to reduce the dimension of $\mathbf{v}$ while retaining its important information; the details of PCA can be found in [22]. Applying PCA to $V$ returns a set of eigenvectors $\mathbf{h}_i$ and eigenvalues $\lambda_i$, which represent the transformation vectors and the importance of the transformed dimensions, respectively, where

$$\bar{\lambda}_i = \frac{\lambda_i}{\sum_{r} \lambda_r}$$

is a normalized eigenvalue. Once the number $m$ of retained principal components is determined, the corresponding transformation matrix $H = [\mathbf{h}_1\ \mathbf{h}_2\ \cdots\ \mathbf{h}_m]$ is formed. Then $F_{\text{PCA}} = H^{T} V$ is the reduced feature dataset. For any unseen ignition pattern $\mathbf{x}$ in the future, its feature vector can be obtained by $\mathbf{f}_{\text{PCA}} = H^{T} \mathbf{v}$, where $\mathbf{v} = \mathrm{WPT}(\mathbf{x})$. Combining the prior domain knowledge, the final feature vector used as the classifier input is

$$\mathbf{f} = [\mathbf{f}_{\text{PCA}}, d_1, d_2, d_3].$$

2.4. Relevance Vector Machine.
The relevance vector machine [27] is a statistical learning method utilizing a Bayesian learning framework and popular kernel methods. In fault diagnosis, RVM is designed to predict the posterior probability of binary class membership (i.e., either positive or negative) for an unseen input $\mathbf{f}$, given a set of training data $(F, \mathbf{t}) = \{\mathbf{f}_n, t_n\}$, $n = 1$ to $N$, $t_n \in \{0, 1\}$, where $N$ is the number of training data. It follows the statistical convention and generalizes the linear model by applying the logistic sigmoid function $\sigma(y(\mathbf{f})) = 1/(1 + \exp(-y(\mathbf{f})))$ to the predicted decision $y(\mathbf{f})$ and adopting the Bernoulli distribution for $P(\mathbf{t} \mid F)$. The likelihood of the data is written as follows [27]:

$$P(\mathbf{t} \mid F, \mathbf{w}) = \prod_{n=1}^{N} \sigma\big(y(\mathbf{f}_n; \mathbf{w})\big)^{t_n} \big[1 - \sigma\big(y(\mathbf{f}_n; \mathbf{w})\big)\big]^{1 - t_n}, \tag{5}$$

where $y(\mathbf{f}; \mathbf{w}) = \sum_{n=1}^{N} w_n K(\mathbf{f}, \mathbf{f}_n) + w_0$, $\mathbf{w} = (w_0, w_1, \ldots, w_N)^{T}$ are the adjustable parameters, and a radial basis function (RBF) is typically chosen for the kernel $K(\cdot, \cdot)$.
The current objective is to find the optimal weight vector $\mathbf{w}$ in (5) for the given dataset $F$, which is equivalent to finding the $\mathbf{w}$ that maximizes the posterior $p(\mathbf{w} \mid \mathbf{t}, F, \boldsymbol{\alpha}) \propto P(\mathbf{t} \mid F, \mathbf{w})\, p(\mathbf{w} \mid \boldsymbol{\alpha})$, with $\boldsymbol{\alpha} = [\alpha_0, \alpha_1, \ldots, \alpha_N]$ a vector of $N + 1$ hyperparameters. However, the weights cannot be determined analytically, because closed-form expressions for either the marginal likelihood $P(\mathbf{t} \mid F, \boldsymbol{\alpha})$ or, equivalently, the weight posterior $p(\mathbf{w} \mid \mathbf{t}, F, \boldsymbol{\alpha})$ are unavailable. Thus, the following approximation procedure, based on Laplace's method, is adopted [32].
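(a) For the current fixed values of $\boldsymbol{\alpha}$, the most probable weights $\mathbf{w}_{\text{MP}}$, that is, the location of the posterior mode, are found. Since $p(\mathbf{w} \mid \mathbf{t}, F, \boldsymbol{\alpha}) \propto P(\mathbf{t} \mid F, \mathbf{w})\, p(\mathbf{w} \mid \boldsymbol{\alpha})$, this step is equivalent to the following maximization:

$$\mathbf{w}_{\text{MP}} = \arg\max_{\mathbf{w}} \Big\{ \sum_{n=1}^{N} \big[t_n \log \sigma_n + (1 - t_n) \log(1 - \sigma_n)\big] - \tfrac{1}{2} \mathbf{w}^{T} A \mathbf{w} \Big\}, \tag{6}$$

where $\sigma_n = \sigma(y(\mathbf{f}_n; \mathbf{w}))$ and $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.
(b) Laplace's method is simply a Gaussian approximation to the log-posterior around the mode of the weights $\mathbf{w}_{\text{MP}}$. Equation (6) is differentiated twice to give

$$\nabla_{\mathbf{w}} \nabla_{\mathbf{w}} \log p(\mathbf{w} \mid \mathbf{t}, F, \boldsymbol{\alpha}) \big|_{\mathbf{w}_{\text{MP}}} = -\left(\Phi^{T} B \Phi + A\right), \tag{7}$$

where $B = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_N)$ with $\beta_n = \sigma_n (1 - \sigma_n)$, and $\Phi$ is the $N \times (N + 1)$ design matrix with $\Phi_{n1} = 1$ and $\Phi_{nm} = K(\mathbf{f}_n, \mathbf{f}_{m-1})$ for $n = 1$ to $N$ and $m = 2$ to $N + 1$. By inverting (7), the covariance matrix is obtained:

$$\Sigma = \left(\Phi^{T} B \Phi + A\right)^{-1}.$$

(c) The hyperparameters $\boldsymbol{\alpha}$ are updated using an iterative reestimation equation. First, randomly guess $\alpha_i$ and calculate $\gamma_i = 1 - \alpha_i \Sigma_{ii}$, where $\Sigma_{ii}$ is the $i$th diagonal element of the covariance matrix $\Sigma$. Then reestimate $\alpha_i$ as follows:

$$\alpha_i^{\text{new}} = \frac{\gamma_i}{u_i^{2}},$$

where $\mathbf{u} = \mathbf{w}_{\text{MP}} = \Sigma \Phi^{T} B \mathbf{t}$. Set $\alpha_i \leftarrow \alpha_i^{\text{new}}$ and reestimate $\gamma_i$ and $\alpha_i^{\text{new}}$ again until convergence. Then $\mathbf{w} = \mathbf{w}_{\text{MP}}$ is estimated, giving the classification model $y(\mathbf{f}; \mathbf{w}) = \sum_{n=1}^{N} w_n K(\mathbf{f}, \mathbf{f}_n) + w_0$.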
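Steps (a) to (c) can be illustrated with a compact NumPy sketch of binary RVM training. It follows the structure described above (Newton ascent to $\mathbf{w}_{\text{MP}}$, the Laplace covariance $\Sigma = (\Phi^T B \Phi + A)^{-1}$, and the reestimation $\alpha_i \leftarrow \gamma_i / w_i^2$), but several details are assumptions rather than the paper's: a backtracking line search for robustness, a deterministic start $\alpha_i = 1$ instead of a random guess, clipping of $\alpha$ to $[10^{-6}, 10^{6}]$ in place of explicit pruning of basis functions, the RBF width, and a fixed iteration count instead of a convergence test. All function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    # numerically stable logistic function 1 / (1 + exp(-z))
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def rbf_kernel(A, B, width):
    # K(a, b) = exp(-||a - b||^2 / width^2); `width` is an assumed parameter
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / width ** 2)


def log_posterior(w, Phi, t, alpha):
    # objective of (6): sum_n [t_n log s_n + (1 - t_n) log(1 - s_n)] - w^T A w / 2
    y = Phi @ w
    return np.sum(t * y - np.logaddexp(0.0, y)) - 0.5 * np.sum(alpha * w ** 2)


def laplace_step(Phi, t, alpha, w, iters=50, tol=1e-8):
    """Steps (a)-(b): Newton ascent to w_MP, then Sigma = (Phi^T B Phi + A)^-1."""
    for _ in range(iters):
        s = sigmoid(Phi @ w)
        grad = Phi.T @ (t - s) - alpha * w
        H = (Phi.T * (s * (1 - s))) @ Phi + np.diag(alpha)  # minus the Hessian (7)
        step = np.linalg.solve(H, grad)
        eta, f0 = 1.0, log_posterior(w, Phi, t, alpha)
        while log_posterior(w + eta * step, Phi, t, alpha) < f0 and eta > 1e-10:
            eta *= 0.5  # backtracking line search
        w = w + eta * step
        if np.max(np.abs(eta * step)) < tol:
            break
    s = sigmoid(Phi @ w)
    Sigma = np.linalg.inv((Phi.T * (s * (1 - s))) @ Phi + np.diag(alpha))
    return w, Sigma


def fit_rvm(F, t, width=1.0, n_iter=30, alpha_bounds=(1e-6, 1e6)):
    """Step (c): alternate the Laplace step with alpha_i <- gamma_i / w_i^2."""
    N = len(F)
    Phi = np.hstack([np.ones((N, 1)), rbf_kernel(F, F, width)])  # N x (N+1)
    alpha = np.ones(N + 1)
    w = np.zeros(N + 1)
    for _ in range(n_iter):
        w, Sigma = laplace_step(Phi, t, alpha, w)
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = np.clip(gamma / (w ** 2 + 1e-12), *alpha_bounds)
    w, _ = laplace_step(Phi, t, alpha, w)
    return w, alpha


def predict_proba(F_new, F_train, w, width=1.0):
    """P(positive | f) = sigma(w_0 + sum_n w_n K(f, f_n))."""
    Phi = np.hstack([np.ones((len(F_new), 1)), rbf_kernel(F_new, F_train, width)])
    return sigmoid(Phi @ w)
```

A full implementation would also remove basis functions whose $\alpha_i$ diverge, which is what makes the trained RVM sparse; clipping only drives the corresponding weights toward zero.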

Pairwise Coupled RVM.
The traditional RVM formulation is designed only for binary classification; that is, the output is either positive or negative. In order to resolve the current simultaneous-fault problem, the multiclass strategies of one-versus-all (1vA) and one-versus-one (1v1, also called pairwise coupling) [28] can be employed. The 1vA strategy is simple and easy to implement. However, it generally gives poor results [29,33,34], since 1vA does not consider the pairwise correlation and hence induces a much larger indecisive region than 1v1, as shown in Figure 5.
In this study, each pairwise classifier $M_{ij}$ can be an RVM classifier, which estimates the pairwise probability that an unknown instance $\mathbf{f}$ belongs to the $i$th label against the $j$th label, that is, $M_{ij}(\mathbf{f}) = P(i \mid \mathbf{f}, i \text{ or } j)$. There are several methods for the pairwise coupling strategy [28]; however, they are suitable only for multiclass diagnosis because of the constraint $\sum_i p_i = 1$. Note that, by the nature of simultaneous-fault diagnosis, $\sum_i p_i$ is not necessarily equal to 1. Therefore, the following simple pairwise coupling strategy for simultaneous-fault diagnosis is proposed.
Every $M_{ij}$ is trained only with the training data of the $i$th and $j$th labels. Let $r_{ij} = M_{ij}(\mathbf{f}) = P(i \mid \mathbf{f}, i \text{ or } j)$ be the pairwise probability of the $i$th label against the $j$th label for an unknown instance $\mathbf{f}$, where $M_{ij}(\mathbf{f})$ is estimated using RVM. Then $p_i$ is calculated from the pairwise probabilities $r_{ij}$ ($j \neq i$) and the counts $n_{ij}$, where $n_{ij}$ is the number of training data with the $i$th and $j$th labels. Hence, the probability $p_i$ can be estimated more accurately from $r_{ij} = M_{ij}(\mathbf{f})$, because the pairwise correlation between the labels is taken into account. With this pairwise coupling strategy, PCRVM can estimate the probability vector $\mathbf{p}$ more accurately and hence achieve higher classification accuracy for simultaneous-fault diagnosis.
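The pairwise structure can be sketched as follows. For $s$ labels, PCRVM trains $s(s-1)/2$ binary classifiers $M_{ij}$ (45 for the 10 single faults used in the experiments), each on the data of labels $i$ and $j$ only. Because the combining equation for $p_i$ is not spelled out above, the `couple` function uses a hypothetical rule, a count-weighted average $p_i = \sum_{j \neq i} n_{ij} r_{ij} / \sum_{j \neq i} n_{ij}$, purely to illustrate how pairwise outputs $r_{ij}$ and counts $n_{ij}$ can be combined without forcing $\sum_i p_i = 1$; it should not be read as the authors' formula. All function names are illustrative.

```python
from itertools import combinations

import numpy as np


def pairwise_pairs(s):
    """Label pairs (i, j), i < j; PCRVM trains one binary RVM M_ij per pair."""
    return list(combinations(range(s), 2))


def pairwise_matrix(s, r_upper):
    """Build r from {(i, j): r_ij} with i < j, filling r_ji = 1 - r_ij."""
    r = np.zeros((s, s))
    for (i, j), r_ij in r_upper.items():
        r[i, j], r[j, i] = r_ij, 1.0 - r_ij
    return r


def couple(r, n):
    """HYPOTHETICAL coupling rule (not the paper's equation):
    p_i = sum_{j != i} n_ij r_ij / sum_{j != i} n_ij.
    The result is not normalized, so sum(p) need not equal 1."""
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    s = r.shape[0]
    p = np.empty(s)
    for i in range(s):
        others = [j for j in range(s) if j != i]
        p[i] = np.dot(n[i, others], r[i, others]) / n[i, others].sum()
    return p


r = pairwise_matrix(3, {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.3})
print(len(pairwise_pairs(10)))     # 45
print(couple(r, np.ones((3, 3))))  # approximately [0.85, 0.2, 0.45]
```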

Figure 5: Indecisive regions (shaded area) using 1vA (a) and pairwise coupling (1v1) (b) [29].

Decision Threshold Optimization.
The probabilistic classifier outputs the probability vector $\mathbf{p} = [p_1, p_2, \ldots, p_s]$ of the single-fault labels, but the desired result is the classification vector $\mathbf{y} = [y_1, y_2, \ldots, y_s]$, where each $y_j$ is obtained from $p_j$ using the decision rule (1) with threshold $\theta$. Clearly, the value of the decision threshold $\theta$ greatly affects the classification accuracy. Without any prior information, the best estimate of $\theta$ may simply be 0.5; that is, a fault is considered present if its probability is at least 0.5. However, $\theta$ should be optimized according to the classification accuracy; in other words, $\theta$ should be chosen to produce the highest classification accuracy over a validation dataset.
In addition, the traditional evaluation of classification accuracy only considers exact matching of the predicted label vector $\mathbf{y}$ against the true label vector $\mathbf{l}$. However, this evaluation is not suitable for simultaneous-fault diagnosis, where partial matching is preferred. Therefore, a common evaluation measure called the F-measure is employed.
The F-measure [35] is commonly used for performance evaluation of information retrieval systems, where a document may carry a single tag or multiple tags simultaneously. This is very similar to the current application, which contains a mixture of single-fault and simultaneous-fault patterns. With the F-measure, single-fault and simultaneous-fault test patterns can be evaluated appropriately at one time. To define the F-measure $F_{\text{me}}$, the two concepts of precision ($\pi$) and recall ($\rho$) are used, so that

$$F_{\text{me}} = \frac{2 \pi \rho}{\pi + \rho}, \tag{10}$$

where $\pi$ and $\rho$ were originally designed for single-fault patterns only but can be extended to handle simultaneous-fault patterns. For $N_t$ single-fault and simultaneous-fault test data, $\pi$ and $\rho$ are given in (11); substituting (11) into (10) gives the final F-measure equation (12). The larger the F-measure value, the higher the diagnostic accuracy. With the F-measure, $\theta$ can be optimized using typical direct search techniques such as Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) [36].
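Equations (11) and (12) extend precision and recall to multilabel outputs. A common choice, assumed here and not necessarily identical to (11), is micro-averaging: count the correctly predicted fault labels over all test patterns and labels, divide by the number of predicted labels for $\pi$ and by the number of true labels for $\rho$, and combine them with (10). The sketch also shows why partial matching matters: in the example, no pattern is matched exactly, yet the F-measure is 2/3. The function names are illustrative.

```python
import numpy as np


def f_measure(Y, L):
    """Micro-averaged F-measure (ASSUMED form of (11)-(12)).

    Y, L: 0/1 arrays of shape (N_t, s) with predicted and true label vectors.
    """
    Y, L = np.asarray(Y), np.asarray(L)
    tp = np.sum(Y * L)  # correctly predicted fault labels
    if tp == 0:
        return 0.0
    precision = tp / Y.sum()  # pi: fraction of predicted labels that are true
    recall = tp / L.sum()  # rho: fraction of true labels that are predicted
    return 2 * precision * recall / (precision + recall)  # equation (10)


def exact_match(Y, L):
    """Fraction of patterns whose whole label vector is predicted exactly."""
    return np.mean(np.all(np.asarray(Y) == np.asarray(L), axis=1))


L = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])  # a simultaneous fault and a single fault
Y = np.array([[1, 0, 0, 0], [0, 1, 1, 0]])  # each prediction is only partly right
print(exact_match(Y, L), f_measure(Y, L))  # 0.0 and about 0.667
```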

Principle of Detection of Single Faults and Simultaneous Faults.
After an unknown instance $\mathbf{f}$ is passed to the above system, a probability vector $\mathbf{p}$ is produced. If $\mathbf{f}$ is caused by a single fault (e.g., the $i$th fault), $\mathbf{f}$ contains only the symptoms of the $i$th fault. Then, in $\mathbf{p}$, the corresponding probability $p_i \geq \theta^*$, so that $y_i = 1$ in the decision vector $\mathbf{y}$, while all other $y_j = 0$, $j \neq i$. In other words, $\sum_j y_j = 1$, and hence a single fault is detected.
If $\mathbf{f}$ is caused by two simultaneous faults (e.g., the $i$th and $j$th faults), $\mathbf{f}$ consists of the symptoms of the $i$th and $j$th faults. These symptoms may overlap or distort each other. In the current diagnostic system, the probabilities $p_i$ and $p_j$ express the similarity of $\mathbf{f}$ to the $i$th and $j$th faults, respectively. If their symptoms do not heavily overlap or distort each other, there is a high chance that the corresponding probabilities satisfy $p_i, p_j \geq \theta^*$. Under this circumstance, $y_i = 1$ and $y_j = 1$, making $\sum_j y_j \geq 2$, so that a simultaneous fault can be detected. The mechanism is similar for three or more simultaneous faults. By combining these cases, the proposed system can diagnose both single faults and simultaneous faults using classifiers trained with single faults only.
Then the classifier is passed to an optimizer to search for the optimal decision threshold based on a validation set $\text{VALID}_F$ and the F-measure $F_{\text{me}}$, as shown in Figure 7(c), where $M_{\text{class}}$ outputs the probability vector $\mathbf{p} = [p_1, p_2, \ldots, p_s]$ for each case in $\text{VALID}_F$. To optimize the threshold, $F_{\text{me}}$ over $\text{VALID}_F$ is evaluated as the fitness value. Since direct search techniques are easily trapped in local optima, the optimization step in Figure 7(c) is run $M$ times to mitigate this issue. For testing and running, the step in Figure 7(d) is very similar to Figure 7(c), except that the optimal threshold $\theta_{\text{opt}}$ has already been determined. The choice of parameters for the feature extraction, classification, and direct search techniques is discussed in Section 4.
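The threshold search of Figure 7(c) can be sketched as follows. The paper uses GA or PSO; as a simpler stand-in, this sketch runs a pattern search over $\theta \in (0, 1)$ from $M$ starting points (the first being the default $\theta = 0.5$, the others random) and keeps the threshold with the highest F-measure on the validation set. The micro-averaged F-measure and all names and parameters (`optimize_threshold`, `step0`, `min_step`, `seed`) are assumptions. Because the F-measure is piecewise constant in $\theta$, a local search can stall on plateaus, which is one reason for the restarts.

```python
import numpy as np


def f_measure(Y, L):
    """Micro-averaged F-measure, 2*TP / (#predicted + #true) (assumed form)."""
    tp = np.sum(Y * L)
    return 0.0 if tp == 0 else 2 * tp / (Y.sum() + L.sum())


def optimize_threshold(P, L, M=5, seed=0, step0=0.1, min_step=1e-3):
    """Random-restart pattern search for the theta maximizing the F-measure.

    P: validation probability matrix (N_v x s); L: true 0/1 label matrix.
    Stand-in for GA/PSO; the first of the M starts is the default theta = 0.5.
    """
    P, L = np.asarray(P, dtype=float), np.asarray(L)
    rng = np.random.default_rng(seed)

    def fitness(theta):
        return f_measure((P >= theta).astype(int), L)

    starts = [0.5] + list(rng.uniform(0.05, 0.95, size=M - 1))
    best_theta, best_f = None, -1.0
    for theta in starts:
        f, step = fitness(theta), step0
        while step >= min_step:
            for cand in (theta - step, theta + step):
                if 0.0 < cand < 1.0 and fitness(cand) > f:
                    theta, f = cand, fitness(cand)
                    break
            else:
                step /= 2  # no improving neighbour: refine the step
        if f > best_f:  # keep the best of the M runs
            best_theta, best_f = theta, f
    return best_theta, best_f
```

Including $\theta = 0.5$ among the starts guarantees that the returned threshold is never worse on the validation set than the default choice.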

Experimental Setup
To verify the effectiveness of the proposed methodology, an experiment was set up for sample data acquisition and evaluation tests.The details of the experimental setup and preparation of datasets are presented in the following subsections.
3.1. Data Sampling.
In total, a set of single faults and simultaneous faults were imitated and selected as demonstration examples. There are 10 kinds of single faults, as described in Table 1, and 4 kinds of simultaneous faults, as described in Table 2. Note that the simultaneous-fault patterns are not caused by random combinations of single faults but by some reasonable combinations (e.g., it is impossible to have a wide spark-plug gap and a narrow spark-plug gap at the same time). Moreover, the experimental data show that a simultaneous-fault ignition pattern is caused by a combination of at most three single faults. Beyond these constraints, the ignition patterns cannot be captured because the engine stalls. Some sample ignition patterns of these single faults and reasonable simultaneous faults are shown in Figures 8 and 9, respectively.
In this study, five well-known inline 4-cylinder electronic ignition engines, namely, HONDA B18C, HONDA D15B, HONDA K20A, TOYOTA 2NZ-FE, and MITSUBISHI 4G15, were employed as the experimental platforms, and a computer-linked automotive scope meter (Figure 10) was used to capture raw ignition patterns. Different engine models were used for training in order to enhance the generalization of the classifier. To capture ignition patterns, the sampling frequency of the scope meter was set to a high rate of 100 kHz, that is, 100,000 sampling points per second. Using the software provided with the scope meter, ignition patterns were recorded on a PC and converted into Excel files for processing and analysis.
For each case (single fault or simultaneous faults in Tables 1 and 2) in every test engine, sixteen ignition patterns (four patterns for each cylinder) were captured under each of two engine testing conditions (1200 rpm and 2000 rpm), according to the standard procedure in [3]. As the pattern obtained in each cylinder per engine cycle is somewhat unrepeatable, four patterns per cylinder are required. The patterns are unrepeatable because a constant engine speed is difficult to hold during sampling; furthermore, each cylinder has its own manufacturing error, different inlet and exhaust flow characteristics, and so forth. Finally, there were 1600 ignition patterns of single faults (i.e., 10 labels × 4 patterns × 4 cylinders × 2 testing conditions × 5 engines) and 800 ignition

(b) Calculate the F-measure $F_{\text{me}}$ with $\mathbf{y}(\mathbf{f})$ and $\mathbf{l}(\mathbf{f})$ using (12), that is, find $F_{\text{me}}$ over $\text{VALID}_F$, where $\mathbf{l}(\mathbf{f}) = [l_1, l_2, \ldots, l_s]$ is the true classification vector for input $\mathbf{f}$ provided by $\text{VALID}_F$.
(c) Produce the next generation of $\theta$.
Until convergence or a stopping criterion is met, return the best solution $\theta$ as $\theta_k$.
(iv) Among all $\theta_k$, $k = 1$ to $M$, choose the one producing the highest F-measure $F_{\text{me}}$ as the optimal decision threshold $\theta_{\text{opt}}$.
(v) Return the trained probabilistic classifier $M_{\text{class}}$ and the optimized decision threshold $\theta_{\text{opt}}$ as the main components of the intelligent diagnostic system.
(vi) Evaluate the performance of $M_{\text{class}}$ and $\theta_{\text{opt}}$ with $\text{TEST}_F$ and $F_{\text{me}}$, as illustrated in Figure 7(d).
Algorithm 1: Algorithm of the proposed framework for simultaneous-fault diagnosis of time-dependent ignition patterns.

Data Normalization.
As the number of sampling points is not exactly the same for every captured pattern due to engine speed fluctuation, the number of sampling points for all patterns was set to 18,000 in order not to lose any exceptional information. To standardize the duration of all patterns, steady-state values can be appended to the rear part of the patterns if necessary. Normally, the steady-state value of the ignition pattern is zero (0 V), so zeros are appended to patterns having fewer than 18,000 data points. In this way, the durations of all sample patterns were normalized before feature extraction using WPT+PCA.
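The length normalization amounts to appending the steady-state value (0 V) until every pattern has 18,000 samples. The sketch below assumes, as the choice of 18,000 implies, that no captured pattern is longer than the target, and raises an error instead of silently truncating; the function name is illustrative.

```python
import numpy as np

TARGET_LEN = 18_000  # common pattern length used in the experiments


def normalize_length(pattern, target_len=TARGET_LEN, steady_state=0.0):
    """Append the steady-state value (0 V) so that the pattern has target_len samples."""
    x = np.asarray(pattern, dtype=float)
    if len(x) > target_len:
        # Assumption: the target length is chosen to be at least the longest pattern.
        raise ValueError(f"pattern has {len(x)} samples, more than {target_len}")
    return np.concatenate([x, np.full(target_len - len(x), steady_state)])
```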

Allocation of Datasets.
In order to test the diagnostic performance for both single faults and simultaneous faults,

Experimental Results
All experiments were carried out on a PC with a Core i5 @ 3.20 GHz and 4 GB RAM. All the proposed techniques were implemented in MATLAB R2008a.

Results of Various Combinations of Feature Extraction and Classification Techniques.
The reasonable combinations of DK, FFT, and WPT+PCA for feature extraction were tested, as shown in Table 3 along with the corresponding evaluation.
The classification techniques used in the experiment include PNN, RVM, and PCRVM. PNN [24] was selected for comparison because it is a traditional probabilistic classifier using a radial basis (Gaussian) kernel. The input dimension of the classifiers depends on the feature extraction technique. For WPT, PCA, and DK, WPT transforms the original patterns of 18,000 points into different packets at a certain decomposition level, which can be determined using entropy information. The built-in function bestlev (meaning best level) in the MATLAB Wavelet Toolbox is available for this purpose. After many experiments with bestlev, the level was found to be 9 for the sample dataset of ignition patterns. In this study, the common Haar mother wavelet was selected for the purposes of illustration and comparison of different feature extraction techniques; other mother wavelets could be evaluated in the future for better performance. After PCA, the 22 most important dimensions were retained, as described in Section 2.3. Therefore, the input dimension is 22 plus the three domain features, that is, 25 in total. For FFT and DK, the input dimensions are 18,000 and 3, respectively.
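The sketch below illustrates how a 25-dimensional input can be assembled from 22 principal components plus 3 DK features. It is written in Python/NumPy rather than with the MATLAB toolbox used in the study, and it rests on assumptions: the Haar wavelet packet transform is hand-rolled with zero padding for odd-length nodes, the feature taken from each terminal packet is its energy (this section does not say which packet statistic is used), and the DK features are placeholders. Under these assumptions, a level-9 decomposition yields 2^9 = 512 packet energies before PCA.

```python
import numpy as np


def haar_packet_energies(x, level):
    """Full Haar wavelet packet decomposition of a 1-D signal.

    Returns the energy (sum of squared coefficients) of each of the
    2**level terminal packets. Odd-length nodes are zero-padded, which
    leaves the total energy unchanged because the Haar step is orthonormal.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        children = []
        for node in nodes:
            if node.size % 2:
                node = np.append(node, 0.0)
            even, odd = node[0::2], node[1::2]
            children.append((even + odd) / np.sqrt(2.0))  # low-pass (approximation)
            children.append((even - odd) / np.sqrt(2.0))  # high-pass (detail)
        nodes = children
    return np.array([np.sum(n ** 2) for n in nodes])


def fit_pca(X, k):
    """Fit PCA by SVD on the training rows X and keep the k leading components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def build_features(packet_feats, dk_feats, mean, components):
    """Project packet features onto the PCA basis, then append the DK features."""
    projected = (packet_feats - mean) @ components.T
    return np.hstack([projected, dk_feats])


# Synthetic stand-in data: 40 patterns of 1,024 points at level 5 (32 packets).
# The study itself used 18,000-point patterns at level 9.
rng = np.random.default_rng(0)
signals = rng.normal(size=(40, 1024))
packets = np.array([haar_packet_energies(s, 5) for s in signals])
mean, components = fit_pca(packets, k=22)
dk = rng.normal(size=(40, 3))  # placeholder for the three DK features
features = build_features(packets, dk, mean, components)
print(features.shape)  # (40, 25)
```

The PCA basis is fitted on training data only and then reused to project validation and test patterns, so all three datasets share the same 25-dimensional feature space.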
In constructing the intelligent engine diagnostic systems with different techniques for comparison, each feature extraction technique was first employed to preprocess the training dataset TRAIN, and then the different classification techniques were applied. The performance of every combination was evaluated over TEST using the F-measure. To reflect the effectiveness of the feature extraction, the classification techniques applied to TRAIN without any preprocessing were also examined. Therefore, there were 18 combinations of feature extraction and classification techniques in total, as shown in Table 3.
For the classification techniques PNN, RVM, and PCRVM, a few settings must be specified. PNN requires a hyperparameter called the smoothing factor or spread, which is equivalent to the width of the Gaussian kernel within PNN. If the spread is set too small, the kernels become very narrow, so the trained classifier may easily overfit the training patterns and generalize poorly. In the case study, the spread for PNN was simply set to 0.2 according to a rule of thumb [37]. RVM and PCRVM employ different classification strategies (1vA versus 1v1), but they share the same set of hyperparameters, namely, the type of kernel function and the corresponding kernel parameters. For illustration purposes, the Gaussian kernel was selected as the kernel function, and its kernel width was set to 1.0 to calculate the design matrix Φ in (7). The experimental results of the various combinations of feature extraction and classification techniques are shown in Table 3. To evaluate the F-measures under different combinations of preprocessing and classification, the decision threshold was simply set to 0.5 for a simple and fair comparison in this phase.
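To make these two settings concrete, the following sketch builds a Gaussian-kernel design matrix with width 1.0 and applies a fixed 0.5 decision threshold to a probability vector. Two details are assumptions rather than statements from this section: the kernel form exp(−‖x − x′‖² / (2σ²)) (some RVM implementations divide by σ² instead), and the leading bias column of ones, as in Tipping's original RVM formulation; equation (7) itself is not reproduced here. The decide helper assumes each label is switched on independently when its probability reaches the threshold.

```python
import numpy as np


def gaussian_design_matrix(X, width=1.0, bias=True):
    """Design matrix of Gaussian kernel values between all training inputs.

    Assumed kernel: K(x, x') = exp(-||x - x'||^2 / (2 * width**2)).
    With bias=True a leading column of ones is added (Tipping-style RVM).
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    K = np.exp(-d2 / (2.0 * width ** 2))
    if bias:
        K = np.hstack([np.ones((X.shape[0], 1)), K])
    return K


def decide(probabilities, threshold=0.5):
    """Switch on every label whose probability reaches the threshold."""
    return (np.asarray(probabilities, dtype=float) >= threshold).astype(int)


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Phi = gaussian_design_matrix(X, width=1.0)
print(Phi.shape)                # (3, 4)
print(decide([0.8, 0.3, 0.6]))  # [1 0 1]
```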

Results with Threshold Optimization.
Genetic Algorithms (GA) are a classical direct search technique, while Particle Swarm Optimization (PSO) is another popular choice. Both were tested for optimizing the decision threshold, and they share the same objective function. Since the F-measure F_me ∈ (0, 1), it can be used directly as the objective function to be maximized:
$$\text{fitness} = F_{me}.$$
The higher the F_me, the better the optimization result.
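For reference, one common form of the F-measure that is consistent with the labels y_j^i (predicted) and l_j^i (true) of the ith test case and jth label is given below. It assumes that (12) is the harmonic mean of precision π and recall ρ pooled over all N_t test cases and N labels; the symbols π, ρ, and N_t are our notation.

$$\pi = \frac{\sum_{i=1}^{N_t}\sum_{j=1}^{N} y_j^i\, l_j^i}{\sum_{i=1}^{N_t}\sum_{j=1}^{N} y_j^i}, \qquad \rho = \frac{\sum_{i=1}^{N_t}\sum_{j=1}^{N} y_j^i\, l_j^i}{\sum_{i=1}^{N_t}\sum_{j=1}^{N} l_j^i}$$

$$F_{me} = \frac{2\pi\rho}{\pi + \rho} = \frac{2\sum_{i}\sum_{j} y_j^i\, l_j^i}{\sum_{i}\sum_{j} y_j^i + \sum_{i}\sum_{j} l_j^i}$$

In this form, partial matches earn partial credit: for a single test case with true vector l = [1, 0, 1] and prediction y = [1, 0, 0], F_me = 2·1/(1 + 2) = 2/3. Because F_me is bounded and grows with better agreement, it can serve as the fitness value without further transformation.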
The optimization procedure follows Algorithm 1, where the number of runs M was set to 20. Tables 4 and 5 show the detailed settings of the GA and PSO operators and parameters, respectively, according to the literature [36]. For every combination of feature extraction and classification techniques, the best optimized threshold among the 20 runs and its corresponding F_me are shown in Tables 6 (GA) and 7 (PSO).
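As an illustration of the search loop in Algorithm 1, the sketch below runs a basic global-best PSO over a scalar threshold in [0, 1], scores each candidate by the F-measure of the thresholded validation probabilities, and keeps the best of M independent runs. The inertia and acceleration coefficients, swarm size, and iteration count are illustrative assumptions, not the settings in Table 5, and the pooled F-measure is an assumed form of (12).

```python
import numpy as np


def f_measure(Y, L):
    """F-measure pooled over all cases and labels (assumed form of (12))."""
    Y, L = np.asarray(Y), np.asarray(L)
    denom = Y.sum() + L.sum()
    return 2.0 * np.sum(Y * L) / denom if denom else 0.0


def fitness(threshold, P, L):
    """Objective: F-measure of the thresholded validation probabilities."""
    return f_measure((P >= threshold).astype(int), L)


def pso_threshold(P, L, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=None):
    """Global-best PSO over a scalar decision threshold in [0, 1]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_particles)  # particle positions = candidate thresholds
    v = np.zeros(n_particles)
    pbest = x.copy()
    pbest_f = np.array([fitness(t, P, L) for t in x])
    gbest = pbest[np.argmax(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, 0.0, 1.0)
        f = np.array([fitness(t, P, L) for t in x])
        better = f > pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmax(pbest_f)]
    return gbest, pbest_f.max()


def best_of_runs(P, L, runs=20):
    """Repeat the search `runs` times (M = 20 in the study) and keep the best."""
    results = [pso_threshold(P, L, seed=k) for k in range(runs)]
    return max(results, key=lambda r: r[1])


# Toy validation data: any threshold in (0.3, 0.7] reproduces L_valid exactly.
P_valid = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.3]])
L_valid = np.array([[1, 0, 1], [0, 1, 0]])
threshold, score = best_of_runs(P_valid, L_valid, runs=3)
print(score)  # 1.0
```

Because the F-measure is piecewise constant in the threshold, a gradient-free search such as PSO or a GA is a natural fit; the spread of the best scores across runs is what the standard deviations reported for the GA and PSO measure.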

Individual Results of Single- and Simultaneous-Fault Diagnosis.
The objective of this research is to train a probabilistic classifier using single-fault patterns and then predict both single and simultaneous faults. However, the results in Section 4.2 do not show how well the trained probabilistic classifier performs on simultaneous faults alone, because the classification results of the different combinations of techniques were all evaluated over the whole test dataset TEST, which contains both single-fault and simultaneous-fault patterns. To better illustrate the performance of the proposed method, TEST was further separated into two groups, one containing purely single-fault patterns and another containing purely simultaneous-fault patterns. All evaluation tests were done using DK+WPT+PCA for feature extraction and the PSO-optimized threshold of 0.7147, because Tables 6 and 7 show that this combination produces the best F-measure. The F-measures for purely single faults and purely simultaneous faults, calculated using (12) with the related faults, are shown in Tables 8 and 9, respectively. For example, for Fault 1, F_me is evaluated on the test cases of Fault 1 only. For the simultaneous-fault combination (5, 7), prediction yields a classification vector y = [y_1, ..., y_10] alongside the true vector l = [l_1, ..., l_10]; then y_5 and y_7, together with the true values l_5 and l_7 from the test cases, are used to compute two separate F_me values for detailed analysis.
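The per-fault evaluation described above can be sketched as follows, assuming that each F_me in Tables 8 and 9 applies the same pooled F-measure form as (12), restricted to one label column and to the test cases of that fault or combination. The predictions below are toy values, not results from the study.

```python
import numpy as np


def label_f_measure(Y, L, j):
    """F-measure for label j alone (0-based column) over the given test cases."""
    y, l = np.asarray(Y)[:, j], np.asarray(L)[:, j]
    denom = y.sum() + l.sum()
    return 2.0 * np.sum(y * l) / denom if denom else 0.0


# Toy test cases for the combination (5, 7): Faults 5 and 7 are columns 4 and 6.
L = np.zeros((4, 10), dtype=int)
L[:, [4, 6]] = 1
Y = L.copy()
Y[0, 6] = 0  # one missed detection of Fault 7

print(label_f_measure(Y, L, 4))  # 1.0
print(label_f_measure(Y, L, 6))  # 0.857... (= 6/7)
```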

Results Comparison with the Latest Technique.
To further verify the effectiveness of the presented framework, the existing binarization approach using SVM [23] was applied to ignition system diagnosis for comparison. The binarization approach builds classifiers directly on the raw ignition patterns, so there is no feature extraction step. In this approach, N binary classifiers g_j(⋅), j = 1 to N, were constructed using support vector machines (SVM) with the one-versus-all splitting strategy, where N is again the number of single faults. For an unknown pattern x, a decision vector y = [y_1, ..., y_N] is obtained, where g_j(x) ∈ ℝ is the raw output value of the jth SVM classifier, y_j = 1 if g_j(x) ≥ 0, and y_j = 0 otherwise. In this approach, too, only single-fault patterns were used to train the binary classifiers, so simultaneous-fault training patterns are not necessary. Since the binarization approach generates only a binary decision vector rather than a probabilistic output, no decision threshold optimization is needed in this experiment.
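The decision rule of the binarization approach reduces to thresholding each raw SVM output at zero. The sketch below assumes hypothetical linear decision functions g_j(x) = w_j · x + b_j with made-up weights, used only to keep the example short; in [23] the classifiers are SVMs trained on the raw patterns.

```python
import numpy as np


def binarize(raw_outputs):
    """Map raw one-versus-all SVM outputs g_j(x) to the decision vector y."""
    return (np.asarray(raw_outputs, dtype=float) >= 0.0).astype(int)


# Hypothetical linear one-versus-all SVMs g_j(x) = w_j . x + b_j for N = 3 faults.
W = np.array([[1.0, -1.0], [-1.0, 0.5], [0.2, 0.2]])
b = np.array([0.0, -0.1, -0.5])
x = np.array([2.0, 1.0])

raw = W @ x + b   # approximately [1.0, -1.6, 0.1]
y = binarize(raw)
print(y)          # [1 0 1]
```

Here two classifiers respond positively, so the pattern is reported as a simultaneous fault of labels 1 and 3; because the cut-off is fixed at zero, there is no threshold left to tune.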
The results of the binarization approach are shown in Table 10.

Effect of Feature Extraction and Pairwise Probabilistic Classification.
The experimental results presented in Section 4 are discussed in this section.
The same holds for the tests with the optimized decision threshold shown in Tables 6 and 7. Therefore, the proposed PCRVM is an effective and promising classification technique.

Effect of Decision Threshold Optimization.
Tables 3, 6, and 7 show that the GA and PSO improve the overall accuracy by 3.48% and 3.5%, respectively, compared with the fixed decision threshold of 0.5, and the two techniques give nearly the same threshold and F_me. This is because the experiment was run 20 times for both the GA and PSO, and the pair of results with the highest F_me was returned. However, the standard deviations of the 20 results for the GA and PSO are 1.02E-3 and 3.23E-4, respectively, so the GA shows the larger spread in this case study. This result indicates that PSO is more stable than the GA and should, in principle, need fewer runs to reach a near-optimal result. According to [36], this is because PSO is relatively insensitive to the initial values, whereas the GA is initialized with random start points within the search space and its search result is very sensitive to those initial values. Consequently, PSO is recommended for this application.
(1) the proposed framework can alleviate the problem of exponential growth of the training dataset for simultaneous-fault ignition patterns by training the probabilistic classifier using single-fault patterns only. Tables 8 and 9 provide evidence for this: the single-fault patterns are almost all classified correctly, while the overall classification accuracy for simultaneous-fault ignition patterns is still satisfactory;

Diagnosis of Simultaneous Faults.
(2) the feature extraction techniques of DK combined with WPT+PCA can effectively capture the time-related and frequency-related features from single-fault and simultaneous-fault ignition patterns;
(3) the features of single-fault ignition patterns can indeed be detected in some feasible simultaneous-fault ignition patterns, which opens a new research direction for automotive engine diagnosis.
This study also shows that the decision threshold for identifying the number of simultaneous faults can be optimized over the F-measure using direct search techniques, such as the GA and PSO. Both the GA and PSO generate almost the same decision threshold, but PSO requires less computational time and is more stable, as shown by its lower standard deviation over multiple runs. Moreover, PSO has fewer operators and hence fewer adjustable parameters, which further reduces the burden on the user. Overall, PSO should be the first choice of threshold optimization technique in the current application.
To further verify the effectiveness of the proposed framework, the most recent method, the binarization approach using SVM, was also employed to diagnose the simultaneous faults. The results show that the diagnostic accuracy of the binarization approach is lower than that of the proposed framework. Therefore, the proposed framework is well suited to engine ignition-system fault diagnosis. Since the proposed framework for simultaneous-fault diagnosis is general, it can be adapted to other similar applications. Finally, the original contributions of the research are summarized as follows.
(1) The research is a first attempt at integrating DK+WPT+PCA, PCRVM, and direct search techniques into a general framework for simultaneous-fault diagnosis of automotive ignition systems.
(2) The proposed diagnostic system is the first in the literature that can be trained with single-fault signal patterns (i.e., single-fault time-dependent patterns) only, while still diagnosing simultaneous-fault signal patterns.
(3) This paper is also the first in the literature to report that the features of single-fault ignition patterns can be detected in some feasible simultaneous-fault ignition patterns. This finding is an important contribution to automotive engine diagnosis.
(4) The integration of the pairwise coupling (1v1) strategy into RVM is original, and the 1v1 strategy improves the classification accuracy of RVM.

Figure 1: Proposed framework of the simultaneous engine ignition-fault diagnosis system and its evaluation.

Figure 2: Decision function based on the decision threshold.

Figure 3: Key domain knowledge features of the normal engine ignition signal.
Optimization and F-Measure.
PCRVM can only provide the probability vector …

where y_j^i and l_j^i are, respectively, the jth predicted label and the jth true label in the ith test data, y_j^i, l_j^i ∈ {0, 1}. Substituting (…
Framework and Techniques.
The previous framework and techniques are summarized in Algorithm 1.

Figure 7(a) shows the workflow of using DK and WPT+PCA for feature extraction. Every dataset for training, validation, and test must go through the feature extraction step. Figure 7(b) shows the construction of the probabilistic classifier, which has the pairwise coupling architecture depicted in Figure 6(b).

Figure 7: Workflow of feature extraction, decision threshold optimization, diagnostic system training, and testing.

About 3/4 of the single-fault patterns were taken as the training data TRAIN. The validation dataset VALID contained 1/16 of the single-fault patterns and 1/5 of the simultaneous-fault patterns, while the remaining 3/16 of the single-fault patterns and 4/5 of the simultaneous-fault patterns were used as the test dataset TEST.
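Taking the stated fractions literally and applying them to the 1600 single-fault and 800 simultaneous-fault patterns gives the dataset sizes computed below. This is a sketch of the arithmetic only; the text says "about 3/4", so the exact counts used in the study may differ slightly.

```python
from fractions import Fraction

SINGLE_FAULT_PATTERNS = 1600
SIMULTANEOUS_FAULT_PATTERNS = 800

# (fraction of single-fault patterns, fraction of simultaneous-fault patterns)
ALLOCATION = {
    "TRAIN": (Fraction(3, 4), Fraction(0)),
    "VALID": (Fraction(1, 16), Fraction(1, 5)),
    "TEST": (Fraction(3, 16), Fraction(4, 5)),
}


def allocate(single=SINGLE_FAULT_PATTERNS, simultaneous=SIMULTANEOUS_FAULT_PATTERNS):
    """Number of (single-fault, simultaneous-fault) patterns in each dataset."""
    return {
        name: (int(f_single * single), int(f_sim * simultaneous))
        for name, (f_single, f_sim) in ALLOCATION.items()
    }


sizes = allocate()
for name, (n_single, n_sim) in sizes.items():
    print(f"{name}: {n_single} single-fault + {n_sim} simultaneous-fault patterns")
# TRAIN: 1200 single-fault + 0 simultaneous-fault patterns
# VALID: 100 single-fault + 160 simultaneous-fault patterns
# TEST: 300 single-fault + 640 simultaneous-fault patterns
```

Note that no simultaneous-fault pattern reaches TRAIN: those patterns are used only to tune the threshold (VALID) and to evaluate the system (TEST).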
To select the best combination of techniques for feature extraction, classification, and threshold optimization, many experiments based on the sample dataset were conducted. The sample dataset was separated into three groups: TRAIN for training the classifier, VALID for threshold optimization and the selection of direct search techniques, and TEST for evaluating the performance of different combinations of feature extraction, classification, and threshold optimization techniques. The performance evaluation over TEST is based on the F-measure, which can evaluate single-fault and simultaneous-fault patterns at the same time according to a partial matching criterion.

Figure 8: Sample ignition patterns and their corresponding single engine faults.

Figure 10: Collection of ignition patterns from a test engine using a computer-linked automotive scope meter.
can be employed. Traditionally, the 1vA strategy constructs a group of N probabilistic classifiers, one per label, for an N-label classification problem. For any unknown input f, the classification vector is y = [y_1, y_2, ..., y_N], where each element is derived from the output of the corresponding probabilistic classifier.
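The 1v1 alternative trains one pairwise classifier per pair of labels (N(N − 1)/2 = 45 classifiers for the 10 single faults here, versus 10 for 1vA) and then couples the pairwise probabilities into one probability per label. This section does not state the coupling rule used inside PCRVM, so the sketch below substitutes the well-known rule of Price et al. (1995), which is exact whenever the pairwise probabilities are mutually consistent. Treat it as an illustration of pairwise coupling, not as the PCRVM implementation.

```python
import numpy as np


def classifier_counts(n_labels):
    """Number of binary classifiers needed by 1vA and by 1v1 (pairwise)."""
    return n_labels, n_labels * (n_labels - 1) // 2


def couple_pairwise(R, eps=1e-12):
    """Combine pairwise probabilities into one probability per label.

    R[i, j] estimates P(label i | label i or label j, f) for i != j;
    the diagonal is ignored. Uses the rule of Price et al. (1995):
        p_i = 1 / (sum_{j != i} 1 / R[i, j] - (K - 2)),
    then renormalizes so that the p_i sum to one.
    """
    R = np.clip(np.asarray(R, dtype=float), eps, 1.0)  # avoid division by zero
    K = R.shape[0]
    off_diag = ~np.eye(K, dtype=bool)
    inv_sums = np.array([np.sum(1.0 / R[i, off_diag[i]]) for i in range(K)])
    p = 1.0 / (inv_sums - (K - 2))
    return p / p.sum()


print(classifier_counts(10))  # (10, 45)

# Consistent pairwise probabilities built from p = [0.5, 0.3, 0.2] couple back exactly.
p_true = np.array([0.5, 0.3, 0.2])
R = p_true[:, None] / (p_true[:, None] + p_true[None, :])
print(np.round(couple_pairwise(R), 6))  # [0.5 0.3 0.2]
```

The coupled vector sums to one; how PCRVM scales its probability vector before the decision threshold is applied is not shown in this section.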

Table 1: Sample single faults of engine trouble reflected by ignition patterns.

Table 2: Sample possible simultaneous faults of engine trouble reflected by ignition patterns.
speed fluctuation and various testing conditions, all patterns were normalized to the same range so that they match the number of inputs of the classifier. Normalization of the ignition patterns was done in terms of duration. In this study, every pattern had fewer than 17,000 sampling points. To be conservative, a standard number of
Mathematical Problems in Engineering

Table 3: Evaluation of different combinations of techniques.

Table 4: GA operators and parameters.

Table 5: PSO operators and parameters.

Table 3 also indicates that, no matter which classification technique is employed, the integration of DK and WPT+PCA for feature extraction gives the best accuracy. In addition, the three classification techniques are compared using the F-measure as well. Both PNN and RVM employ the 1vA strategy for probabilistic classification. In other words, only N binary classifiers were constructed for N labels, so large indecision regions can remain between pairs of classes. When a test case lies in these regions, PNN and RVM mostly fail to classify the faults correctly. PCRVM, however, employs the 1v1 strategy, which minimizes those indecision regions. Table 3 verifies the effectiveness of the 1v1 strategy, because PCRVM outperforms the other two classification techniques. The same pattern holds for the tests with the optimized decision threshold shown in Tables 6 and 7.

Table 8 reveals that the classifiers trained using PNN, RVM, and PCRVM perform well because these test cases contain single-fault patterns only. Owing to the advantage of pairwise coupling, PCRVM performs best among the three classification techniques. For the test cases of simultaneous-fault patterns, there are only five reasonable combinations of simultaneous faults, because not every combination is possible. Since a simultaneous-fault pattern is caused by different single faults, some

Table 6: Evaluation of different combinations of techniques using GA-optimized threshold. The experiment was run 20 times, and the best optimized threshold and F_me were returned.

Table 7: Evaluation of different combinations of techniques using PSO-optimized threshold.
*The experiment was run 20 times, and the best optimized threshold and F_me were returned.
of the time-related and frequency-related features may be distorted or may even vanish. Therefore, feature extraction using DK and WPT+PCA does not work as well, and hence the values of F_me in Table 9 drop slightly compared with those in Table 8, but they can still provide an accuracy

Diagonal matrix in RVM
jth binary classifier
jth probabilistic classifier
Probability of f belonging to the jth label
Pairwise classifier
Pairwise probability of f belonging to the ith label against the jth label
Number of training data
Number of cases in sample dataset
Number of test data
Number of training data with the ith and jth labels
Probability of f belonging to …
Probability vector
Probability of the jth label
Pairwise probability of the ith label against the jth label
Σ: Covariance matrix in RVM
jth diagonal element of covariance matrix Σ
Logistic sigmoid function
Recall
Initial population
Φ: Design matrix in RVM.