Fault Diagnosis System of Induction Motors Based on Neural Network and Genetic Algorithm Using Stator Current Signals

This paper proposes an online fault diagnosis system for induction motors built on the combination of discrete wavelet transform (DWT), feature extraction, genetic algorithm (GA), and the ART-Kohonen neural network (ART-KNN), using stator current signals.


INTRODUCTION
As the prime movers of much of industry, induction motors play an important role in manufacturing, transportation, and so forth, due to their reliability and simplicity of construction. Although induction motors are reliable, the possibility of faults is unavoidable. These failures may be inherent to the machine itself or caused by operating conditions [1]. Early fault diagnosis and condition monitoring can increase machinery availability and performance, reduce consequential damage, prolong machine life, and reduce spare parts inventories and breakdown maintenance. Therefore, fault diagnosis of induction motors has received considerable attention in recent years.
Statistical studies by EPRI and IEEE on motor faults are cited in [2]. Under EPRI sponsorship of industry assessments, a study was conducted by General Electric Co. to evaluate the reliability of powerhouse motors and identify their operating characteristics. Part of this study was to specify the reasons behind motor failures. The IEEE-IGA study was carried out on the basis of opinions reported by motor manufacturers. The percentages of the main motor faults are shown in Table 1. These two studies show that bearings are the weakest component in induction motors, followed by the stator, the rotor, and other parts.
Corresponding to the above-mentioned faults, many techniques have been proposed for motor fault detection and diagnosis. These techniques include vibration monitoring, motor current signature analysis (MCSA) [3][4][5][6], electromagnetic field monitoring [7], chemical analysis, temperature measurement [8,9], infrared measurement, acoustic noise analysis [10], and partial discharge measurement [11,12]. Among these methods, vibration analysis and current analysis are the most popular due to their easy measurability, high accuracy, and reliability.
In many situations, vibration methods are effective in detecting the presence of faults in motors. However, vibration sensors, such as accelerometers, are generally installed only on the most expensive and load-critical machines, where the cost of continuous monitoring can be justified. Additionally, the sensitivity of these sensors to environmental factors can cause them to provide unreliable readings. Furthermore, mechanical sensors are limited in their ability to detect electrical faults, such as stator faults. In particular, during online monitoring and remote fault diagnosis, the faulty or normal condition of the vibration sensors themselves must be checked, which complicates the whole procedure and increases the system cost. Electrical techniques can overcome these shortcomings of vibration monitoring. Recently, MCSA has received much attention, in particular for motor fault detection [3]. Current monitoring can be implemented inexpensively on most machines by utilizing the current transformers placed in the motor control centers or switchgear. The use of current signals is convenient for monitoring large numbers of motors remotely from one location. Furthermore, the fault patterns in the current signal are distinctive and little affected by the working environment. Many authors have verified the reliability of this technique using the stator current signal. Examples include air-gap eccentricity [4], stator faults [5], broken rotor bars [3], and motor bearing damage [6].
Additionally, artificial intelligence (AI) techniques, such as expert systems, artificial neural networks (ANNs), fuzzy logic systems, and genetic algorithms (GAs), have been employed to assist the diagnosis and condition monitoring task by correctly interpreting the fault data [13]. The ANN has gained popularity over other techniques, as it is efficient in discovering similarities among large bodies of data. An ANN is a functional imitation of the human brain, which simulates human decision-making and draws conclusions even when presented with complex, noisy, or irrelevant information. ANNs can represent any nonlinear model without knowledge of its actual structure and can give results in a short time during the recall phase. Research on ANNs has been carried out successfully for fault diagnosis, and the results are promising [14][15][16][17][18][19].
If we want an intelligent system capable of adapting online to changes in the environment, the system should be able to deal with the so-called stability-plasticity dilemma [20]. That is, the system should have some degree of plasticity, to learn new events in a continuous manner, and should be stable enough to preserve its previous knowledge and prevent new events from destroying the memories of prior training. However, most ANNs, such as self-organizing feature maps (SOFM), learning vector quantization (LVQ), and radial basis function (RBF) networks, are unable to adapt well to unexpected changes in the environment. When new conditions occur, such an "off-line" network requires retraining on the complete dataset, which can be a time-consuming and costly process [21]. As a solution to this problem, the adaptive resonance theory (ART) network [20,[22][23][24] has been developed. It can self-organize stable recognition codes in real time in response to arbitrary sequences of input patterns, and it is a vector classifier used as a mathematical model of fundamental behavioral functions of the biological brain, such as learning, parallel and distributed information storage, short- and long-term memory, and pattern recognition. In this paper, the ART-Kohonen neural network (ART-KNN) [25] is used as the classifier. ART-KNN synthesizes the theory of ART and the learning strategy of the Kohonen neural network (KNN). It is able to carry out online learning without forgetting previously learned knowledge (stable training), it can recode previously learned categories adaptively to changes in the environment, and it is self-organizing. Its rapid calculation speed and high success rate make it suitable for real applications.
The main problems facing the use of an ANN are the selection of the best inputs and the choice of the ANN parameters that make the structure compact while creating a highly accurate network. For the proposed system, feature selection is also an important process, since many features remain after feature extraction. A large set of input features requires a significant computational effort to calculate and may result in a low success rate. To speed up operation, and also to increase the accuracy of classification, a GA-based feature selection process is used to isolate the features providing the most significant information for the neural network, whilst cutting down the number of features required by the network. During the selection process, the network structure parameter is optimized as well.
There is some justification for using GA-based feature selection over other available methods, such as principal component analysis (PCA), which can be much less computationally intensive than a GA-based approach. The downside to PCA is that all the available features are required by the transformation matrix that creates the rotated feature space. It must be remembered that the motivation behind the feature selection process is to create a small system that requires as little processing as possible, whilst maintaining a high level of accuracy. PCA still requires the calculation of all the available features before the transformation matrix can be applied. Hence it demands more computing power on board the hypothetical smart sensor than would be needed by a GA that selects only the best features. The computational cost of the GA will be much higher than that of a method like PCA during training and feature selection. However, this is offset by the lower computational power required on the sensor, and hence the lower manufacturing cost. Another alternative for feature selection would be forward selection [26]. One problem with forward selection arises when two features that act relatively poorly as individuals give, when used together, a much better result than the two best features found by forward selection. The use of a GA has no such problem, as the features are selected as a unit, and the interaction between the different features as a group is tested, rather than the features individually. Accordingly, the GA is allowed to select subsets of various sizes to determine the optimum combination and number of inputs to the network.
In this paper, a fault diagnosis system for induction motors is proposed by combining advanced techniques: wavelet transform, feature extraction, GA, and ART-KNN, using the stator current signal. All the experiments were implemented on a self-designed test rig. The results show that the proposed system is efficient and promising for real-time applications.

PROPOSED FAULT DIAGNOSIS SYSTEM
This section describes the proposed system and the overall theoretical background. The architecture of the proposed system is shown in Figure 1. The original stator current signals, acquired by AC current probes from the test induction motors, are preprocessed by the discrete wavelet transform. The features of the transformed data are extracted from the database using statistical parameters, such as RMS, histogram, and so forth. Then the GA is used as a feature selector and network optimizer. The optimized neural network is able to operate online, carrying out learning without losing previous knowledge, which makes it suitable for online condition monitoring and fault diagnosis in real-time applications.

Wavelet transform
When current signals show nonstationary or transient conditions, the conventional Fourier transform technique is not suitable.The analysis of non-stationary signals can be performed using time-frequency techniques (short-time Fourier transform) or time-scale techniques (wavelet transform).
The discrete wavelet transform (DWT) permits a systematic decomposition of a signal into its subband levels and serves as the preprocessing stage of the system. Since different faults affect the stator currents differently, the wavelet transform can expose these effects, which provides a good basis for the subsequent feature extraction. The DWT expansion of a signal is defined by the following equation:

W(t) = Σ_j Σ_k a_jk ψ_jk(t),    ψ_jk(t) = 2^(j/2) ψ(2^j t − k),

where W(t) is the wavelet representation of the signal, a_jk are the discrete wavelet transform coefficients, and ψ_jk is the wavelet expansion function; k is the translation parameter and j the dilation or compression parameter.
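The recursive filter-bank structure behind a multilevel DWT can be sketched in a few lines. The example below uses the simple Haar wavelet rather than the db8 basis employed later in the paper, purely to keep the code self-contained; the subband logic (repeated splitting of the approximation into approximation and detail halves) is the same.

```python
import numpy as np

def haar_dwt(x):
    """One analysis step: split x into approximation and detail halves."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def wavedec(x, levels):
    """Multilevel decomposition: returns [a_J, d_J, ..., d_1]."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)   # coarser details go to the front
    coeffs.insert(0, approx)
    return coeffs

# Decompose a 16384-sample signal into 8 subband levels,
# matching the setup used for the stator current signals.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16384)
coeffs = wavedec(signal, 8)

# The orthogonal transform preserves signal energy (Parseval).
energy_in = np.sum(signal ** 2)
energy_out = sum(np.sum(c ** 2) for c in coeffs)
```

Because the transform is orthogonal, the energies of the subband coefficients partition the energy of the original signal, which is what makes per-subband features meaningful.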

Feature extraction
Recently, online diagnosis systems have become popular because they can detect incipient faults at an early stage. However, directly measured signals are not suitable for online use, since a small number of samples is insufficient for diagnosis, while a large number of samples is a burden to transfer and calculate. Feature extraction from the signal is therefore a critical initial step in any monitoring and fault diagnosis system; its accuracy directly affects the final monitoring results. Thus, the feature extraction should preserve the critical information for decision-making. In this paper, the features of the signals are extracted from the time domain and the frequency domain [27].

Cumulants
The features described here are termed statistics because they are based only on the distribution of signal samples, with the time series treated as a random variable. Many of these features are based on moments or cumulants. In most cases, the probability density function (pdf) can be decomposed into its constituent moments. If a change in condition causes a change in the probability density function of the signal, then the moments may also change; therefore, monitoring them can provide diagnostic information.
The moment coefficients of the time-waveform data at each frequency subband are calculated by

m_n = E{x_i^n} = (1/N) Σ_{i=1}^{N} x_i^n,

where E{·} represents the expected value of the function, x_i is the ith time-history data point, and N is the number of data points.
The first four cumulants, namely the mean c_1, the second cumulant c_2 (the variance, whose square root is the standard deviation), the skewness c_3, and the kurtosis c_4, can be computed from the first four moments using the following relationships:

c_1 = m_1,
c_2 = m_2 − m_1²,
c_3 = m_3 − 3 m_1 m_2 + 2 m_1³,
c_4 = m_4 − 4 m_1 m_3 − 3 m_2² + 12 m_1² m_2 − 6 m_1⁴.

In addition, nondimensional feature parameters in the time domain are popular, such as the shape factor SF and the crest factor CF:

SF = x_rms / x_abs,    CF = x_p / x_rms,

where x_rms, x_abs, and x_p are the root-mean-square value, the mean absolute value, and the peak value of the signal, respectively.
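The time-domain features above can be computed directly from a sampled signal. The sketch below uses the standardized (moment-ratio) forms of skewness and kurtosis and interprets x_abs as the mean absolute value, which is the usual convention for the shape factor; a pure sine wave serves as a check, since its crest factor (√2) and kurtosis (1.5) are known in closed form.

```python
import numpy as np

def time_domain_features(x):
    """Cumulant-based and nondimensional time-domain features of a signal."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()                                    # c1
    std = x.std()                                      # sqrt(c2)
    skewness = np.mean((x - mean) ** 3) / std ** 3     # standardized c3
    kurtosis = np.mean((x - mean) ** 4) / std ** 4     # standardized (non-excess)
    x_rms = np.sqrt(np.mean(x ** 2))                   # root-mean-square value
    x_abs = np.mean(np.abs(x))                         # mean absolute value
    x_p = np.max(np.abs(x))                            # peak value
    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "shape_factor": x_rms / x_abs,
        "crest_factor": x_p / x_rms,
    }

# Reference values for a pure sine: CF = sqrt(2), kurtosis = 1.5,
# SF = pi / (2 * sqrt(2)) ~ 1.1107, zero mean and skewness.
t = np.linspace(0.0, 1.0, 10_000, endpoint=False)
feats = time_domain_features(np.sin(2 * np.pi * 50 * t))
```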

Upper and lower bounds of histogram
The histogram, which can be thought of as a discrete probability density function, is calculated in the following way.
Let d be the number of divisions into which we wish to divide the range, and let h_i with 0 ≤ i < d be the columns of the histogram.

International Journal of Rotating Machinery
Assume the histogram is computed for the time signal x_i alone. Each sample falls into exactly one column, and each occurrence contributes 1/N to its column, so the net effect is that Σ h_i = 1 (i = 0, ..., d − 1); the histogram is thereby normalized by the length of the sequence. The lower bound h_L and upper bound h_U of the histogram are defined by the extreme values of the signal, which fix the bounding range over which the columns are divided. The column divisions are thus relative to this bounding range, and most of the h_i above will be nonzero. This is desirable, since it essentially removes the issue of the absolute scale of the signal and avoids low-resolution histograms with many empty columns. The alternative would be to use absolute column locations, which would be nowhere near as closely correlated with the information in the signal itself.
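A minimal sketch of these histogram features, under the assumption stated above that the bounding range is taken from the signal's own extreme values:

```python
import numpy as np

def histogram_features(x, d=10):
    """Normalized d-column histogram of a signal plus its range bounds.

    The bounding range is taken from the signal's own extreme values,
    so the column divisions are relative rather than absolute."""
    x = np.asarray(x, dtype=float)
    h_lower, h_upper = x.min(), x.max()     # bounds of the histogram
    counts, _ = np.histogram(x, bins=d, range=(h_lower, h_upper))
    h = counts / len(x)                     # each sample adds 1/N, so h sums to 1
    return h, h_lower, h_upper

rng = np.random.default_rng(1)
h, lo, hi = histogram_features(rng.standard_normal(4096), d=10)
```

The columns h_0, ..., h_{d−1} together with h_L and h_U then enter the feature set for each subband.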

Entropy estimation and error
In information theory, uncertainty can be measured by entropy. The entropy of a distribution is the amount of randomness of that distribution. Entropy estimation is a two-stage process: first a histogram is estimated, and thereafter the entropy is calculated. The entropy estimate E_s(x) and its standard error E_e(x) are defined as

E_s(x) = − Σ_x P(x) log_2 P(x),    E_e(x) = sqrt( ( Σ_x P(x) [log_2 P(x)]² − E_s(x)² ) / N ),

where x is the discrete time signal and P(x) is the distribution over the whole signal. Here, we estimate the entropy of the stator current signals using an unbiased estimation approach.
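The two-stage estimate (histogram first, then entropy) can be sketched as follows. The standard-error form used here, the spread of −log_2 P around E_s scaled by 1/√N, is one common choice and is stated as an assumption; a uniform distribution over d bins provides a check, since its entropy is log_2(d).

```python
import numpy as np

def entropy_estimate(x, d=32):
    """Histogram-based entropy estimate (in bits) and its standard error."""
    x = np.asarray(x, dtype=float)
    counts, _ = np.histogram(x, bins=d)
    p = counts[counts > 0] / len(x)       # empirical distribution P(x)
    log_p = np.log2(p)
    e_s = -np.sum(p * log_p)              # entropy estimate E_s(x)
    # assumed form: sd of -log2 P(x) under P, shrinking with 1/sqrt(N)
    e_e = np.sqrt(max(np.sum(p * log_p ** 2) - e_s ** 2, 0.0) / len(x))
    return e_s, e_e

# A uniform distribution over d = 32 bins has entropy log2(32) = 5 bits.
rng = np.random.default_rng(2)
e_s, e_e = entropy_estimate(rng.uniform(0.0, 1.0, 100_000), d=32)
```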

Autoregression coefficients
Since different faults display different characteristics in the time series, autoregression is used to establish a model for each fault, and the autoregressive coefficients are extracted as fault features. The coefficients of an 8th-order AR model are computed by Burg's lattice-based method, which uses the forward and backward squared prediction errors [28]. The definition used here is

x_t = Σ_{i=1}^{N} a_i x_{t−i} + ε_t,

where a_i are the autoregression coefficients, x_t is the series under investigation, and N is the order of the model (N = 8). The noise term or residual ε_t is almost always assumed to be Gaussian white noise.
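Burg's lattice method can be sketched as below: at each stage a reflection coefficient is chosen to minimize the sum of forward and backward squared prediction errors, and the AR polynomial is grown by the Levinson recursion. The sign convention is chosen so that the returned coefficients satisfy x_t ≈ Σ a_i x_{t−i}, matching the model equation above; a synthetic AR(1) series provides a check.

```python
import numpy as np

def burg_ar(x, order):
    """Burg's lattice method: AR coefficients a_i with x_t ~ sum_i a_i x_{t-i}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    f = x.copy()                 # forward prediction errors
    b = x.copy()                 # backward prediction errors
    a = np.array([1.0])          # AR polynomial, a[0] = 1
    for m in range(order):
        fk = f[m + 1:].copy()
        bk = b[m:n - 1].copy()
        # reflection coefficient minimizing forward + backward error power
        k = -2.0 * np.dot(fk, bk) / (np.dot(fk, fk) + np.dot(bk, bk))
        f[m + 1:] = fk + k * bk              # update forward errors
        b[m + 1:] = bk + k * fk              # update (shifted) backward errors
        # Levinson recursion for the AR polynomial
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return -a[1:]

# Recover the coefficient of a synthetic AR(1) process x_t = 0.8 x_{t-1} + e_t.
rng = np.random.default_rng(3)
e = rng.standard_normal(5000)
x = np.empty(5000)
x[0] = e[0]
for t in range(1, 5000):
    x[t] = 0.8 * x[t - 1] + e[t]
coeffs = burg_ar(x, 8)           # the paper uses an 8th-order model
```

For the AR(1) input, the first coefficient lands near 0.8 and the remaining seven stay near zero, which is the behavior expected when the model order exceeds the true order.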

Feature extraction in the frequency domain
The frequency domain is another description of a signal; it can reveal information that cannot be found in the time domain [29,30]. The problem is how to express this information with parametric features. In this study, the frequency center FC, the root-mean-square frequency RMSF, and the root variance frequency RVF are introduced; they are analogous to the RMS value and the standard deviation in the time domain:

FC = ∫ f s(f) df / ∫ s(f) df,
RMSF = [ ∫ f² s(f) df / ∫ s(f) df ]^(1/2),
RVF = [ ∫ (f − FC)² s(f) df / ∫ s(f) df ]^(1/2),

where s(f) is the signal power spectrum. FC and RMSF show the position change of the main frequencies, while RVF describes the concentration of the spectral power.
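With the power spectrum estimated from an FFT, the three spectral features are weighted averages over the frequency axis. A pure 50 Hz tone provides a check: all spectral power sits at 50 Hz, so FC and RMSF equal 50 Hz and RVF is numerically zero.

```python
import numpy as np

def frequency_domain_features(x, fs):
    """Frequency center FC, RMS frequency RMSF, and root variance frequency RVF."""
    x = np.asarray(x, dtype=float)
    s = np.abs(np.fft.rfft(x)) ** 2                 # power spectrum s(f)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)         # frequency axis in Hz
    total = s.sum()
    fc = np.sum(f * s) / total                      # position of the spectral mass
    rmsf = np.sqrt(np.sum(f ** 2 * s) / total)
    rvf = np.sqrt(np.sum((f - fc) ** 2 * s) / total)  # spread around FC
    return fc, rmsf, rvf

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
fc, rmsf, rvf = frequency_domain_features(np.sin(2 * np.pi * 50 * t), fs)
```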

Selection based on genetic algorithm
While any successful application of GAs to a problem depends greatly on finding a suitable encoding, the creation of a fitness function to rank the performance of a particular genome is equally important for the success of the training process. The GA will rate its own performance against the fitness function. Consequently, if the fitness function does not adequately take into account the desired performance features, the GA will be unable to meet the requirements of the user. A simple GA, as proposed by Goldberg [31], is used as the feature selector in this paper, with a simple binary genome string. The genome is composed of two parts: one part determines which features are selected as the input subset from the whole database ("0" represents feature absence, "1" represents feature presence), and the other part is used to choose the network structure parameter.
There are three fundamental GA operators: selection, crossover, and mutation. The aim of the selection procedure is to reproduce more copies of the individuals whose fitness values are higher than others'. This procedure has a significant influence on driving the search towards a promising area and finding good solutions in a short time. Roulette wheel selection is used for individual selection. The selection probability P_s(s_i) of the ith individual is expressed as

P_s(s_i) = f(s_i) / Σ_{j=1}^{N} f(s_j),

where s_i is an individual, f(s_i) is the fitness value of the ith individual, and N is the number of individuals. According to the values of P_s(s), each individual is allotted a slot of corresponding width on the wheel. The crossover operator creates two new individuals (children or offspring) from two existing individuals (parents) picked from the current population by the selection operation. There are several ways of doing this; one-point crossover is used here. After that, all individuals in the population are checked bit by bit, and the bit values are randomly reversed according to a specified rate.
The mutation operator helps the GA avoid premature convergence and find the global optimal solution. In binary coding, this simply means changing a 1 to a 0 and vice versa. In the standard GA, the probability of mutation is set to a constant. However, examining the convergence characteristics of GAs makes clear that what is actually desired is a probability of mutation that varies during generational processing. In early generations, the population is diverse and mutation may actually destroy some of the benefits gained by crossover; thus, a low probability of mutation is desirable early on. In later generations, the population loses diversity as all members move close to the optimal solution, and a higher probability of mutation is needed to maintain the search over the entire design space. The selection of the mutation probability must therefore balance these two conflicting requirements. The mutation probability P_m(s_i) is accordingly tied to this diversity measure through an exponential function of the generation ratio N_i/N_t, where N_i and N_t are the current generation number and the total number of generations, respectively. Figure 2 shows the mutation probability curve changing with the generation for a total of 200 generations.
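The three operators can be sketched as below. The exponential mutation schedule shown is an illustrative choice consistent with the description (near zero in early generations, rising toward a maximum rate p_max later); the exact curve of Figure 2 is not reproduced here.

```python
import random

def roulette_select(population, fitness, rng):
    """Roulette wheel selection: pick an individual proportional to fitness."""
    total = sum(fitness)
    r = rng.random() * total
    acc = 0.0
    for individual, f in zip(population, fitness):
        acc += f
        if acc >= r:
            return individual
    return population[-1]

def one_point_crossover(p1, p2, rng):
    """One-point crossover of two binary genomes."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(genome, n_i, n_t, rng, p_max=0.1):
    """Bit-flip mutation with a rate that grows over generations
    (illustrative exponential schedule, zero at generation 0)."""
    p_m = p_max * (2.0 ** (n_i / n_t) - 1.0)
    return [1 - g if rng.random() < p_m else g for g in genome]

rng = random.Random(4)
pop = [[rng.randint(0, 1) for _ in range(10)] for _ in range(6)]
fit = [1.0, 1.0, 1.0, 1.0, 1.0, 10.0]   # last individual is much fitter
picks = [roulette_select(pop, fit, rng) for _ in range(1000)]
share = sum(p is pop[-1] for p in picks) / 1000.0   # ~10/15 of the wheel
c1, c2 = one_point_crossover(pop[0], pop[1], rng)
```

One-point crossover exchanges suffixes, so the two children jointly contain exactly the parents' bits, and the fittest individual dominates the selection draws.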
Since the GA is used for feature selection and for neural network optimization according to the selected features, the objective function must involve both the features and the network structure parameters. In real applications, smaller is better in terms of the number of features and neurons and the value of the network parameter: fewer features and neurons reduce the calculation time and keep the network structure compact. The objective function therefore combines the number of selected features F_n, the number of neurons N_n, and the network similarity ρ, all of which are to be minimized. The ranges of F_n and ρ are 0-126 and 0-1, respectively; the number of neurons N_n is determined by F_n and ρ. The maximum number of neurons N_max is equal to the number of training data, and the total feature number F_T is 126. The minimum function value f(s) is searched for by the GA under the constraint of 100% classification accuracy.

ART-Kohonen neural network (ART-KNN)
The architecture of ART-KNN [25] is shown in Figure 3. It is similar to that of ART1, excluding the adaptive filter. ART-KNN is also formed by two major subsystems: the attentional subsystem and the orienting subsystem. Two interconnected layers, the discernment layer and the comparison layer, which are fully connected both bottom-up and top-down, comprise the attentional subsystem. The application of a single input vector leads to patterns of neural activity in both layers. The activity in discernment nodes reinforces the activity in comparison nodes due to the top-down connections. The interchange of bottom-up and top-down information leads to a resonance in neural activity; as a result, critical features in the comparison layer are reinforced and have the greatest activity. The orienting subsystem is responsible for generating a reset signal to the discernment layer when the bottom-up input pattern and the top-down template pattern mismatch at the comparison layer, according to a similarity criterion. In other words, once it has detected that the input pattern is novel, the orienting subsystem must prevent the previously organized category neurons in the discernment layer from learning this pattern (via a reset signal); otherwise, the categories would become increasingly nonspecific. When a mismatch is detected, the network adapts its structure by immediately storing the novelty in additional weights. The similarity criterion is set by the value of the similarity parameter. A high value of the similarity parameter means that only a slight mismatch will be tolerated before a reset signal is emitted; a small value means that large mismatches will be tolerated. After the resonance check, if a pattern match is detected according to the similarity parameter, the network adjusts the weights of the winning node.
The learning strategy is adopted from the Kohonen neural network. The Euclidean distances between the input vector X and the weights of all neurons of the discernment layer are evaluated, and the neuron with the smallest distance becomes the winning neuron:

J = arg min_j ||X − B_j||,

where B_j is the weight of the jth neuron in the discernment layer and B_J is the weight of the winning neuron. After the winning neuron is produced, the input vector X returns to the comparison layer, and the absolute similarity S is calculated; a form consistent with the properties described below is

S = 1 − ||B_J − X|| / (||B_J|| + ||X||).

If B_J and X are the same, ||B_J − X|| is equal to 0 and S is 1; the larger the Euclidean distance between B_J and X, the smaller S is. A parameter ρ is introduced as the evaluation criterion of similarity. If S > ρ, the Jth cluster is sufficiently similar to X, so X belongs to the Jth cluster.
In order to make the weight represent the corresponding cluster more accurately, the weight of the Jth cluster is updated by a running average of the patterns assigned to it:

B_J = B_J0 + (X − B_J0) / (n + 1),

where B_J is the enhanced weight, B_J0 is the original weight, and n is the number of times the weight has been changed. On the contrary, if S < ρ, X differs substantially from the Jth cluster, and no cluster in the original network matches X. The network then needs one more neuron to remember this new case by resetting in the discernment layer; the weight of the new neuron is initialized to the input itself:

B_new = X.
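A minimal sketch of this training loop is given below. The similarity function S = 1 − ||B_J − X||/(||B_J|| + ||X||) and the running-average weight update are the assumed forms discussed above, not a verbatim reproduction of [25]; the sketch shows the essential ART-KNN behavior of matching by nearest weight, updating on resonance, and committing a new neuron on mismatch.

```python
import numpy as np

class ARTKNN:
    """Sketch of ART-KNN training: Kohonen-style winner search plus an
    ART-style similarity (vigilance) check with parameter rho."""

    def __init__(self, rho):
        self.rho = rho
        self.weights = []        # B_j, one weight vector per category neuron
        self.counts = []         # n, how often each weight has been updated

    def similarity(self, b, x):
        # S = 1 when b == x; decreases as the Euclidean distance grows.
        return 1.0 - np.linalg.norm(b - x) / (np.linalg.norm(b) + np.linalg.norm(x))

    def train(self, x):
        x = np.asarray(x, dtype=float)
        if self.weights:
            dists = [np.linalg.norm(b - x) for b in self.weights]
            j = int(np.argmin(dists))            # winning neuron J
            if self.similarity(self.weights[j], x) > self.rho:
                n = self.counts[j]
                # running-average update toward the new pattern
                self.weights[j] = self.weights[j] + (x - self.weights[j]) / (n + 1)
                self.counts[j] += 1
                return j
        # mismatch (or empty network): commit a new neuron with weight X
        self.weights.append(x)
        self.counts.append(1)
        return len(self.weights) - 1

# Two well-separated clusters should yield exactly two category neurons.
net = ARTKNN(rho=0.9)
cluster_a = [np.array([1.0, 0.0]) + 0.01 * i for i in range(5)]
cluster_b = [np.array([0.0, 5.0]) + 0.01 * i for i in range(5)]
labels = [net.train(x) for x in cluster_a + cluster_b]
```

Because learning only ever refines or adds weights, previously formed categories are never overwritten, which is the stable online behavior the paper relies on.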

EXPERIMENT PROCESS AND RESULTS
The experiment was carried out on a self-designed test rig, which is mainly composed of a motor, pulleys, a belt, a shaft, and a fan with changeable-pitch blades, as shown in Figure 4. Six 0.5 kW, 60 Hz, 4-pole induction motors were used to create the data needed under no-load and full-load conditions. One of the motors is normal (healthy) and is considered the benchmark for comparison with the faulty motors. The others are faulty: broken rotor bar, bowed rotor, bearing outer race fault, rotor unbalance, and adjustable eccentricity (misalignment), as shown in Figure 5. The conditions of the faulty induction motors are described in Table 2. The load of the motors was changed by adjusting the blade angle or the number of blades.
Three AC current probes were used to measure the stator current signals for testing the fault diagnosis system. The maximum frequency of the signal used was 5 kHz, and the number of sampled data points was 16384. Typical stator current signals under no-load and full-load conditions are shown in Figure 6. Since the slip is almost zero under the no-load condition, the waveforms of all conditions are very similar to the normal motor signals. By contrast, due to the faults, the current waveforms change somewhat under the full-load condition. From the time waveform alone, no conspicuous difference exists among the different conditions, so a feature extraction method is needed to classify them.
In order to extract the differences between them, the DWT was used for preprocessing. The analysis of the data from the induction motors was performed using the MATLAB 5.1 Wavelet Toolbox [32]. The wavelet basis function chosen was Daubechies-8 (db8) [33] to estimate the condition of each designated motor. The subband (level) decomposition, or multiresolution analysis (MRA), divided the signals into eight subbands in the frequency range from 0 to 5 kHz, as shown in Table 3. Figures 7 and 8 show the results of the MRA of the current signals under no-load and full-load conditions. Levels 2 to 6 (78.125-2500 Hz) are the most dominant bands in the MRA; the other subbands cannot reveal the difference between healthy and faulty motors. Hence, feature extraction from levels 2 to 6 can be realized very effectively using the multiresolution wavelet analysis technique.
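The frequency span of each detail level follows directly from the dyadic structure of the decomposition: with a maximum frequency f_max, the detail at level j covers (f_max/2^j, f_max/2^(j−1)). A quick check reproduces the 78.125-2500 Hz range quoted for levels 2 to 6:

```python
# Frequency span of each detail level in a dyadic (MRA) decomposition,
# for a signal whose maximum frequency is 5 kHz (as in Table 3).
f_max = 5000.0
bands = {level: (f_max / 2 ** level, f_max / 2 ** (level - 1))
         for level in range(1, 9)}

low = bands[6][0]    # lower edge of level 6
high = bands[2][1]   # upper edge of level 2
```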
After preprocessing, the detail coefficients (levels 2-6) were processed with the 21 statistical parameters to extract the features, such as the mean, RMS, skewness, kurtosis, shape factor, crest factor, frequency center, entropy estimate, histogram bounds, and so forth. Some examples of typical features are shown in Figure 9; the distances between different conditions indicate the efficiency of the features, and the efficient features separate the motor conditions well. One problem appears after the feature extraction: there are too many input features (6 × 21 = 126), which would require a significant computational effort to calculate and might result in low accuracy of the monitoring and fault diagnosis. Thus, the GA was used for feature selection, to isolate the features providing the most significant information for the neural network whilst cutting down the number of inputs required by the network. The GA parameter settings are listed in Table 4.
The optimization process for feature selection and neural network design using the GA is shown in Figure 10. We notice that the convergence speed is similar under the no-load and full-load conditions; different GA parameter settings can give different results. Under the given GA settings, the parameters of the best systems for the no-load and full-load conditions are listed in Table 5. In Table 6, the number of features under no-load is larger than that under the full-load condition. This can be explained by the fact that the fault characteristics are not clear in the signals when there is no load, and the differences among the faults are comparatively vague, which coincides with the time waveforms; thus more features are needed. Under the full-load condition, the fault characteristics are prominent, while other components, such as mechanical ones, also appear, which requires a higher similarity parameter. Figure 11 shows the relationships between the calculation time, the objective function value, and the number of features. In order to demonstrate the efficiency of the wavelet transform and the feature selection, Tables 7 and 8 report results obtained using only time-domain features, without wavelet transform or feature selection. Each column of the tables shows the classifications made by the ART-KNN for a given condition; each row in the column shows how the neural network perceived the cases, expressed as a percentage of the total number of cases for that condition. Most conditions achieve an accuracy of 100% in Tables 7 and 8, excluding the bowed rotor and rotor unbalance, whose signatures are similar to the normal condition and comparatively weak in the stator current signal.

SUMMARY AND CONCLUSIONS
In this paper, a fault diagnosis system for induction motors was proposed. The proposed system uses the discrete wavelet transform and feature extraction techniques to extract features from the stator current signal of the electric motor. The input features selected by the genetic algorithm then form the input vectors of the ART-KNN for training and testing. Since the network can operate online, the system can learn and classify at the same time. The proposed system was tested using signals obtained from six induction motors under no-load and full-load conditions: one normal motor, and five subject to faults, namely broken rotor bar, faulty bearing (outer race), rotor unbalance, bowed rotor, and misalignment. The test results are very satisfying, and the system is promising for real-time applications. The results of this study allow us to offer the following conclusions: the features providing the most significant information were selected, and the number of network inputs was reduced, using the GA; also, the difficulty of neural network parameter setting was solved through GA optimization.

Figure 1: Architecture of the diagnosis system for induction motors.

Figure 5: Faults on the induction motors.

Figure 9: Typical feature parameters after feature extraction. (a) Kurtosis and skewness, (b) upper bound of histogram and entropy error.

Figure 10: Convergence curves of GA under no-load and full-load conditions.

Figure 11: The relationships between calculation time, objective function value, and the number of features under no-load condition.

Table 3: Frequency levels of the motor stator current signal.

Table 4: Binary genetic algorithm parameter settings for feature selection.

Table 5: Best results after feature selection and network optimization using GA.

Table 8: Success rate under full-load condition using only time-domain features (ρ = 0.910).