Investigations on Incipient Fault Diagnosis of Power Transformer Using Neural Networks and Adaptive Neurofuzzy Inference System



Introduction
The power transformer is one of the most important and costly elements of a power system, and the reliability of the system depends on its well-being. Close, continuous monitoring and maintenance help preserve its service condition. Thermal and electrical stresses can cause incipient faults, which may further lead to failure of the equipment, so detecting faults at an early stage can save the equipment. The most important tool for diagnosing these faults is dissolved gas analysis (DGA). The Rogers ratio, Doernenburg ratio, IEC ratio, and Duval triangle methods are some of the established standards for diagnosis. The ratio methods are based on single-fault prediction, so their diagnoses become erroneous when multiple faults are present. Among the existing methods for identifying incipient faults, DGA is the most popular and successful [1][2][3].
When a fault such as overheating or a discharge occurs inside the transformer, it produces a corresponding characteristic amount of gases in the transformer oil. This is the underlying principle of DGA: by analyzing the concentrations of the dissolved gases, their gassing rates, and the ratios of certain gases, the DGA method can determine the type of transformer fault. The gases commonly collected and analyzed are H2, CH4, C2H2, C2H4, C2H6, CO2, and CO. The ANSI/IEEE standard and IEC Publication 599 [4,5] describe three DGA approaches: (1) the key gas method; (2) the Rogers ratio method; and (3) the Doernenburg ratio method. All three methods are computationally straightforward. However, in some cases they give an erroneous diagnosis or reach no conclusion about the fault type. The key gas method provides a qualitative determination of fault types from the gases that are typical or predominant at various temperatures. Even when a fault is very severe and all of the gas concentrations are high, they may still be insufficient to register a fault using the values specified in the IEEE standard [2]. In addition, the gas ratios obtained for a particular transformer sample may not fall within the ANSI/IEEE-specified ranges, causing the ratio methods to fail [6]. In recent years, many researchers have studied the application of artificial intelligence techniques, such as neural networks and fuzzy set theory, to increase diagnostic accuracy [6][7][8][9][10][11][12][13][14][15].
Fuzzy systems, though good at handling uncertainty, cannot learn from previous diagnosis results and hence cannot adjust their diagnostic rules automatically [10][11][12][13]. To account for uncertainty, artificial neural networks (ANNs) have been proposed for diagnosing transformer faults because of their superior learning capabilities [6][7][8][9]. In general, fuzzy systems and neural networks deal efficiently with two different areas of information processing: fuzzy systems are good at representing uncertain knowledge, while neural networks are efficient structures capable of learning from examples. The two techniques therefore complement each other. A generalized regression neural network was used in [14], but because it is a one-pass network, its efficiency for fault detection is somewhat low. An application of fuzzy clustering and a radial basis function neural network has been reported [15]; however, when one type of fault lies in the neighborhood of another, the chance of a false diagnosis may increase.
In this paper, transformer fault diagnosis using supervised neural networks and ANFIS is investigated. In the initial work, the diagnosis was carried out using backpropagation (BP) and radial basis function (RBF) neural networks, both of which belong to the category of supervised networks; these are presented in Section 2. At a later stage, the diagnosis was carried out with the TSK model of the adaptive neurofuzzy inference system (ANFIS), presented in Section 3. Section 4 provides the diagnosis results obtained with all of the above methods.

Supervised Neural Networks and Training Algorithms
Network design includes the selection of the input, output, and hidden layers, the network topology, and the weighted connections between nodes. The corresponding connection weights are also determined in the process.
Figure 1 presents the artificial neural network used for fault diagnosis of power transformers; it has a feed-forward structure with input, hidden, and output layers. Only one hidden layer is shown to illustrate the architecture; the designed network, however, has three hidden layers. The nodes in each layer receive input signals from the previous layer and pass their outputs to the subsequent layer. The nodes of the input layer receive a set of input signals from outside the system and deliver the input data directly to the hidden layer through the weighted links.
Figure 1: Feed-forward neural network (input layer, hidden layer, output layer, weights, and biases).
The network is designed with seven inputs, the concentrations of the gases, and one output corresponding to the fault. Three hidden layers of 7, 7, and 1 neurons were selected for a better design, so as to reveal the hidden relationship between the faults and the gas composition.

Levenberg-Marquardt Algorithm.
The proposed network is trained and tested using the Levenberg-Marquardt algorithm. This algorithm is fast compared with gradient descent and other algorithms, although it requires more memory because it stores the Jacobian matrix. Each learning iteration (epoch) consists of the following basic steps:
(1) compute the Jacobian matrix $\mathbf{J}$ (by using finite differences or the chain rule);
(2) compute the error gradient $\mathbf{g} = \mathbf{J}^T\mathbf{E}$;
(3) approximate the Hessian using the cross-product Jacobian, $\mathbf{H} = \mathbf{J}^T\mathbf{J}$;
(4) solve $(\mathbf{H} + \lambda\mathbf{I})\,\boldsymbol{\delta} = \mathbf{g}$ to find $\boldsymbol{\delta}$;
(5) update the network weights $\mathbf{w}$ using $\boldsymbol{\delta}$;
(6) recalculate the sum of squared errors using the updated weights;
(7) if the sum of squared errors has not decreased, discard the new weights, increase $\lambda$ by the factor $\nu$, and go to step (4);
(8) otherwise, decrease $\lambda$ by the factor $\nu$ and stop.
Here $\mathbf{E}$ is the vector of network errors, $\lambda$ is the damping (scaling) factor, $\nu$ is the factor by which $\lambda$ is adjusted, $\mathbf{I}$ is the identity matrix, and $\boldsymbol{\delta}$ is the weight increment at each iteration.
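The eight steps above can be sketched as a generic least-squares solver. This is an illustrative NumPy sketch under stated assumptions: $\mathbf{J}$ is taken as the Jacobian of the error vector $\mathbf{E}$ with respect to the weights (so step (5) subtracts $\boldsymbol{\delta}$), the Jacobian is computed by forward finite differences, and the defaults $\lambda = 0.01$ and $\nu = 10$ are hypothetical. A toy exponential curve fit stands in for network training.

```python
import numpy as np


def numerical_jacobian(error_fn, w, eps=1e-6):
    """Step (1): J[n, k] = dE_n / dw_k by forward finite differences."""
    e0 = error_fn(w)
    J = np.empty((e0.size, w.size))
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        J[:, k] = (error_fn(w + step) - e0) / eps
    return J


def levenberg_marquardt(error_fn, w0, lam=1e-2, nu=10.0, max_epochs=100, tol=1e-12):
    """Minimise SSE = E . E, where error_fn(w) returns the error vector E."""
    w = np.asarray(w0, dtype=float)
    E = error_fn(w)
    sse = float(E @ E)
    for _ in range(max_epochs):
        J = numerical_jacobian(error_fn, w)   # (1) Jacobian of the errors
        g = J.T @ E                           # (2) error gradient (half of d SSE / dw)
        H = J.T @ J                           # (3) cross-product Hessian approximation
        eye = np.eye(w.size)
        while True:
            delta = np.linalg.solve(H + lam * eye, g)   # (4)
            w_new = w - delta                            # (5) move against the gradient
            E_new = error_fn(w_new)                      # (6) recompute errors and SSE
            sse_new = float(E_new @ E_new)
            if sse_new < sse:                            # (8) accept, relax damping
                lam /= nu
                break
            lam *= nu                                    # (7) reject, increase damping
            if lam > 1e10:                               # no further progress possible
                return w, sse
        improvement = sse - sse_new
        w, E, sse = w_new, E_new, sse_new
        if improvement < tol:
            break
    return w, sse


# Toy check on noiseless data y = 2 exp(-1.5 x) (hypothetical, not DGA data).
x_toy = np.linspace(0.0, 2.0, 20)
y_toy = 2.0 * np.exp(-1.5 * x_toy)


def toy_errors(w):
    return y_toy - w[0] * np.exp(w[1] * x_toy)


w_fit, sse_fit = levenberg_marquardt(toy_errors, [1.0, 0.0])
```

For a network, `error_fn` would flatten all weights into `w` and return target minus network output over the training set; the damping factor moves each step between a Gauss-Newton step (small $\lambda$) and a short gradient-descent step (large $\lambda$).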

Radial Basis Function (RBF) Network.
As shown in Figure 2, the network consists of three layers (input, hidden, and output). The input layer is made up of nodes that connect the network to its environment.
Figure 2: RBF neural network (input layer, hidden layer).
At the input of each hidden-layer neuron, the distance between the neuron's center and the input vector is calculated, and a Gaussian bell function is applied to it to form the neuron's output. The output layer is linear and supplies the response of the network to the applied input. Selecting the radial basis function width (spread) parameter and the number of radial basis neurons in the hidden layer is an important step: a larger width results in a smaller network and faster execution. The maximum number of neurons can be the number of training input vectors, while the minimum number can be determined experimentally [10]. The network structure depends solely on the number of neurons in the hidden layer. Training the network with the specified performance parameters yields the number of neurons and the diagnosis error. The learning strategies cover the learning of the centers, the spreads, and the output-layer weights. The centers can be fixed at random, selected in a self-organized manner, or chosen by supervised selection.
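Fixing the centers at randomly chosen training samples turns RBF training into a linear problem for the output weights. The sketch below is an assumption-laden illustration: it uses the Gaussian form $\exp(-\lVert \mathbf{x}-\mathbf{c}\rVert^2/r^2)$ with one shared width $r$, and it solves for the output weights in one batch with `numpy.linalg.lstsq`, a least-squares solve swapped in for the iterative LMS rule. The function names are hypothetical.

```python
import numpy as np


def gaussian_design(X, centers, r):
    """H[j, i] = exp(-||x_j - c_i||^2 / r^2) for every sample j and center i."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / r ** 2)


def fit_rbf_random_centers(X, d, n_centers, r, seed=0):
    """Pick centers at random among the training samples, then fit linear weights."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_centers, replace=False)
    centers = X[idx]
    H = gaussian_design(X, centers, r)
    w, *_ = np.linalg.lstsq(H, d, rcond=None)  # output-layer (linear) weights
    return centers, w


def predict_rbf(X, centers, w, r):
    """Linear output layer: F(x) = sum_i w_i h_i(x)."""
    return gaussian_design(X, centers, r) @ w


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 7))   # hypothetical 7-gas feature vectors
d_demo = rng.normal(size=20)        # hypothetical target codes
C5, w5 = fit_rbf_random_centers(X_demo, d_demo, n_centers=5, r=1.0)
```

With as many centers as training samples the Gaussian matrix is square and nonsingular for distinct samples, so the network interpolates the training data exactly; this is why the number of training vectors bounds the useful number of hidden neurons.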
Clustering can also be performed in self-organized learning. Supervised learning of the RBF network is performed using the least mean square (LMS) algorithm. RBF training with supervised selection of the centers and spreads is done using the following equations.
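Output layer weights (linear weights):
$$w_i(n+1) = w_i(n) - \eta_1 \frac{\partial E(n)}{\partial w_i(n)}$$
Positions of the centers (hidden layer):
$$\mathbf{c}_i(n+1) = \mathbf{c}_i(n) - \eta_2 \frac{\partial E(n)}{\partial \mathbf{c}_i(n)}$$
Spreads of the centers (hidden layer):
$$\Sigma_i^{-1}(n+1) = \Sigma_i^{-1}(n) - \eta_3 \frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)}$$
where $w_i(n)$ is $1 \times 1$, $\mathbf{c}_i(n)$ is a $1 \times m$ weight vector (the center), $\Sigma_i^{-1}(n)$ is an $m \times m$ matrix, and $m$ is the feature dimension. $\eta_1$, $\eta_2$, and $\eta_3$ are the step sizes. $\partial E(n)/\partial w_i(n)$ is the change in error with respect to the weight at each iteration, and $\partial E(n)/\partial \mathbf{c}_i(n)$ is the change in error with respect to the center.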
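The network output is the linear combination $F(\mathbf{x}) = \sum_{i=1}^{M} w_i\, h_i(\mathbf{x})$, where $M$ is the number of hidden neurons and each $h_i(\mathbf{x})$ is a Gaussian function:
$$h_i(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{c}_i \rVert^2}{r_i^2}\right)$$
where $\mathbf{c}_i$ is the center vector of a region, $\mathbf{x}$ is an input vector, and $r_i$ is the radius or width of the receptive field. The sum of squared errors between the network output and the target, which is to be minimized, is
$$E = \sum_{j=1}^{N} \left(d_j - F(\mathbf{x}_j)\right)^2$$
where $d_j$ is the desired output and $F(\mathbf{x}_j)$ is the network output for the $j$th of the $N$ training samples.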
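The supervised update rules for the weights and centers can be made concrete by differentiating the sum of squared errors $E$ given above. This is a minimal sketch, assuming one scalar width $r_i$ per center (a simplified stand-in for the matrix $\Sigma_i^{-1}$) and batch rather than per-sample updates; `sse_and_grads` and `train_step` are hypothetical names.

```python
import numpy as np


def sse_and_grads(X, d, w, C, r):
    """Return E = sum_j (d_j - F(x_j))^2 and dE/dw, dE/dC, dE/dr.

    F(x) = sum_i w_i exp(-||x - c_i||^2 / r_i^2). Each center has its own
    scalar width r_i, a simplified stand-in for the matrix Sigma_i^{-1}.
    """
    diff = X[:, None, :] - C[None, :, :]        # (N, M, m): x_j - c_i
    D = np.sum(diff ** 2, axis=2)               # (N, M): squared distances
    Hm = np.exp(-D / r ** 2)                    # (N, M): h_i(x_j)
    e = d - Hm @ w                              # (N,): d_j - F(x_j)
    E = float(e @ e)
    grad_w = -2.0 * Hm.T @ e                    # dE/dw_i
    coef = -2.0 * e[:, None] * w[None, :] * Hm  # (dE/dh_i(x_j)) * h_i(x_j)
    grad_C = np.sum(coef[:, :, None] * 2.0 * diff / r[None, :, None] ** 2, axis=0)
    grad_r = np.sum(coef * 2.0 * D / r[None, :] ** 3, axis=0)
    return E, grad_w, grad_C, grad_r


def train_step(X, d, w, C, r, eta1=1e-3, eta2=1e-3, eta3=1e-3):
    """One batch gradient step on w, C and r with step sizes eta1, eta2, eta3."""
    E, gw, gC, gr = sse_and_grads(X, d, w, C, r)
    return w - eta1 * gw, C - eta2 * gC, r - eta3 * gr, E


rng = np.random.default_rng(0)
X_tr = rng.normal(size=(15, 3))                 # hypothetical training inputs
d_tr = rng.normal(size=15)                      # hypothetical targets
w, C, r = rng.normal(size=4), rng.normal(size=(4, 3)), np.full(4, 1.5)
E_start = sse_and_grads(X_tr, d_tr, w, C, r)[0]
for _ in range(50):
    w, C, r, _E = train_step(X_tr, d_tr, w, C, r)
E_end = sse_and_grads(X_tr, d_tr, w, C, r)[0]
```

The three step sizes play the roles of $\eta_1$, $\eta_2$, and $\eta_3$; in practice they need tuning, since a step that is too large for the centers or widths can make $E$ increase.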
In [12], an orthogonal least squares (OLS)-based RBF neural network (RBFNN) is proposed to optimize the network parameters for transformer fault diagnosis. The authors selected sufficient training exemplars from the previous literature and report the performance of the network in terms of misclassification and the number of hidden neurons. A method based on the k-means clustering algorithm and an RBF neural network is proposed in [13], achieving an accuracy of 82.2% with 78 neurons in the hidden layer on a database compiled from research papers. A self-organizing map (SOM) cell-splitting algorithm is used in [14] to obtain an optimal RBF network architecture for fault classification of power transformers.

Fuzzy Inference System (FIS).
It is generally difficult to determine the hidden relationship between the gas concentrations and the fault type. Fuzzy set theory can be used to handle this type of uncertainty. In the proposed methodology, each gas concentration is assigned, according to its range, to one of the linguistic variables low (L), medium (M), and high (H). The bell-shaped membership function is used for all input gases, and fuzzy inference rules are then developed. Each FIS rule consists of an antecedent (if) part and a consequent (then) part, and the rules are of the following form:
If MH = M and AE = M and EE = L and EM = H, then the condition -Rule 1.
Similarly, many such rules can be formulated from the same gas ratios with linguistic variables other than those used in Rule 1, according to the experience of the researcher. Likewise, various rules can be generated from the concentrations of the five prominent gases with their assigned linguistic variables and membership functions.
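To show how an antecedent such as Rule 1 is evaluated, the sketch below fuzzifies each input with a generalized bell membership function and combines the clauses with the min operator (fuzzy AND). It is a hypothetical illustration: the inputs are assumed to be normalized to [0, 1], and the bell parameters $(a, b, c)$ for L, M, and H are invented for the example, not taken from this work.

```python
import numpy as np


def gbellmf(x, a, b, c):
    """Generalised bell membership function 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + np.abs((np.asarray(x, dtype=float) - c) / a) ** (2 * b))


# Hypothetical (a, b, c) parameters for the labels L, M, H on a normalised axis.
LABELS = {"L": (0.2, 2.0, 0.0), "M": (0.2, 2.0, 0.5), "H": (0.2, 2.0, 1.0)}

# Antecedent of Rule 1: MH = M and AE = M and EE = L and EM = H.
RULE_1 = {"MH": "M", "AE": "M", "EE": "L", "EM": "H"}


def firing_strength(inputs, antecedent):
    """Degree to which the inputs satisfy the rule, using min as the AND."""
    return min(gbellmf(inputs[name], *LABELS[label])
               for name, label in antecedent.items())


at_centers = {"MH": 0.5, "AE": 0.5, "EE": 0.0, "EM": 1.0}
print(firing_strength(at_centers, RULE_1))  # 1.0: every clause fully satisfied
```

Moving one input one width $a$ away from its label center (for example MH = 0.7) halves that clause's membership, and the min operator then limits the whole rule to about 0.5.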
The fuzzy inference over the antecedents of these rules uses max/min composition. ANFIS combines the best features of fuzzy systems and neural networks: the fuzzy system represents prior knowledge as a set of constraints (the network topology) that reduces the optimization search space, and the neural network adapts backpropagation to the structured network to automate the tuning of the fuzzy controller parameters. Fuzzy inference is the process of mapping a given input to an output using fuzzy logic; it involves membership functions for the inputs and outputs, fuzzy logic operators, and if-then rules. The architecture of the fuzzy inference system is shown in Figure 3.
The process involves fuzzification, the inference engine (rules), and defuzzification. The crisp inputs are fuzzified into membership degrees in the range 0 to 1 using different membership functions for each linguistic label [15]. In [16], a fuzzy logic system built on the International Electrotechnical Commission (IEC) codes and the Central Electricity Generating Board (CEGB) and American Society for Testing and Materials (ASTM) standards is proposed as a case study on DGA data of power transformers, in which both crisp logic and fuzzy logic are used to interpret the fault type.
In [17], input features are selected by competitive learning, and a neuro-fuzzy model is used in which the fuzzy rule base for fault identification is designed with the subtractive clustering method, which handles noisy input data very well. The approach was verified on standard and practical data; by using the radius parameter of subtractive clustering, it achieved a diagnosis accuracy of 96.7% and was shown to be more efficient than the Rogers ratio method and other neuro-fuzzy techniques.
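The role of the radius parameter in subtractive clustering can be seen in a simplified version of the algorithm (after Chiu): every sample gets a "potential" that measures how many neighbors lie within the radius $r_a$, the highest-potential sample becomes a cluster center, and potential is then subtracted around it. This sketch is not the implementation of [17]; it omits the accept/reject grey zone of the full method, and the defaults ($r_b = 1.5\,r_a$, a rejection ratio of 0.15) and the function name are assumptions. Data are assumed scaled to [0, 1].

```python
import numpy as np


def subtractive_clustering(X, ra=0.5, squash=1.5, reject_ratio=0.15):
    """Simplified subtractive clustering on data scaled to [0, 1].

    ra: cluster radius; rb = squash * ra sets how far potential is reduced
    around an accepted center. Stop when the best remaining potential falls
    below reject_ratio times the potential of the first center.
    """
    X = np.asarray(X, dtype=float)
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    potential = np.sum(np.exp(-alpha * d2), axis=1)  # neighbor density per sample
    first = potential.max()
    centers = []
    while True:
        k = int(np.argmax(potential))
        if potential[k] < reject_ratio * first:
            break
        centers.append(X[k])
        # Remove the accepted center's influence from every sample's potential.
        potential = potential - potential[k] * np.exp(-beta * d2[k])
    return np.array(centers)


rng = np.random.default_rng(3)
blob_a = 0.1 + 0.01 * rng.normal(size=(10, 2))   # hypothetical cluster near 0.1
blob_b = 0.9 + 0.01 * rng.normal(size=(10, 2))   # hypothetical cluster near 0.9
centers = subtractive_clustering(np.vstack([blob_a, blob_b]))
```

A smaller radius $r_a$ yields more centers, and in a neuro-fuzzy model each center typically seeds one fuzzy rule, which is why the radius controls the size of the rule base.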
The two most important methods used in FIS are the Mamdani and Takagi-Sugeno-Kang (TSK) methods; the main difference between them lies in the consequents of the fuzzy rules. In the proposed work, the TSK method is used in the Fuzzy Logic Toolbox of MATLAB, in which the fuzzy rules are generated from the input-output dataset of 563 power transformer oil samples.
The TSK model combines fuzzy sets in the antecedents with a crisp function in the consequent:
if $x_1$ is $A$ and $x_2$ is $B$, then $y = f(x_1, x_2)$,
where $A$ and $B$ are the fuzzy sets in the antecedent, and $y = f(x_1, x_2)$ is a crisp function in the consequent, a polynomial in the input variables $x_1$ and $x_2$. Small, medium, and high are the linguistic labels (fuzzy sets) whose membership functions are used in the present work.
The TSK ANFIS architecture has five layers, each performing a different function. The nodes in layers 1 and 4 are adaptive and are represented by node functions, while the nodes in layers 2, 3, and 5 are fixed. The single node in layer 5 computes the overall output as the sum of all incoming signals:
$$O^5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}$$
where $w_i$ is the firing strength of the $i$th rule, $\bar{w}_i$ is the normalized firing strength from layer 3, and $f_i$ is the output of the $i$th rule.
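The five layers described above can be traced in a forward pass of a first-order TSK ANFIS. This is a minimal sketch rather than the MATLAB toolbox implementation: it assumes generalized bell membership functions, grid-partitioned rules (one rule per combination of membership functions), and the product as the AND operator in layer 2; the function names and example parameters are hypothetical.

```python
import itertools

import numpy as np


def gbellmf(x, a, b, c):
    """Generalised bell membership function 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))


def anfis_forward(x, mf_params, consequents):
    """Five-layer first-order TSK ANFIS forward pass (grid-partitioned rules).

    x: n inputs; mf_params[k]: list of (a, b, c) tuples for input k;
    consequents: (R, n + 1) array, row r = [p_1, ..., p_n, q] so that
    f_r = p . x + q. Rules are ordered as itertools.product over the MFs.
    """
    x = np.asarray(x, dtype=float)
    # Layer 1 (adaptive): membership grade of every input in every MF.
    mu = [[gbellmf(x[k], *p) for p in mf_params[k]] for k in range(x.size)]
    # Layer 2 (fixed): firing strength w_r, product used as the AND operator.
    w = np.array([np.prod(combo) for combo in itertools.product(*mu)])
    # Layer 3 (fixed): normalised firing strengths w_bar_r.
    w_bar = w / w.sum()
    # Layer 4 (adaptive): first-order consequents f_r, weighted by w_bar_r.
    f = consequents[:, :-1] @ x + consequents[:, -1]
    # Layer 5 (fixed): overall output, the sum of all incoming signals.
    return float(np.sum(w_bar * f))


# Hypothetical: 2 inputs on [0, 1], 3 bell MFs each -> 3 ** 2 = 9 rules.
MF_PARAMS = [[(0.25, 2.0, 0.0), (0.25, 2.0, 0.5), (0.25, 2.0, 1.0)]] * 2
consequents = np.tile([1.0, 1.0, 0.0], (9, 1))   # every rule: f = x1 + x2
print(anfis_forward([0.2, 0.7], MF_PARAMS, consequents))  # approximately 0.9
```

Because the normalized firing strengths sum to one, the output is a weighted average of the rule outputs; training (as in ANFIS hybrid learning) adjusts the bell parameters in layer 1 and the consequent coefficients in layer 4.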

Results and Discussion
In Section 2, the architecture, design, and training algorithms of the BP and RBF networks are discussed in detail. In Section 3, the FIS methodology based on the gas ratios or the gas concentrations is highlighted, and the generated FIS with its input, output, and rule structure is presented.
In this diagnosis, eight fault conditions, namely arcing, corona, low energy discharge (D1), high energy discharge (D2), thermal fault at 150-300 °C, thermal fault at 300-700 °C, thermal fault above 700 °C, and corona with solid insulation degradation, as well as the normal (healthy) condition, are considered. The incipient fault conditions are characterized by the energy and temperature at which the seven prominent gases, H2, CH4, C2H6, C2H4, C2H2, CO, and CO2, evolve. CO and CO2 are generally associated with solid insulation degradation. Because failure of the equipment due to solid insulation degradation is less likely, five gases are enough to make the final diagnosis. Nevertheless, all seven gas concentrations are considered, and the additional combined fault, corona with solid insulation degradation, is given due consideration.
DGA interpretation is used as the basis for dealing with all the fault conditions. A total of 563 DGA samples of power transformers, obtained from a reputed ISO-certified testing laboratory, were used in the database. Of the 563 samples, 40%, 30%, and 30% were used for training, testing, and model validation, respectively. The network structure and the diagnosis results, with the error as the performance measure, were carefully studied, and the comparative performance of the networks is presented.
… time for execution, but the number of neurons in the hidden layer, as finally determined during the experimentation, was 135. More samples in the database led to more neurons and hence to better diagnostic accuracy. The network performance appears superior for this problem. Network training in terms of SSE, number of neurons, and epochs is shown in Figure 6, and the error between the actual network output and the target and the performance curve are shown in Figures 7 and 8, respectively.
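The 40/30/30 partition of the 563 samples can be reproduced with a shuffled index split. The paper does not say how fractional counts were rounded or whether the split was random, so the rounding rule, the fixed seed, and the helper name `split_indices` below are assumptions.

```python
import numpy as np


def split_indices(n, fractions=(0.4, 0.3, 0.3), seed=0):
    """Shuffle sample indices and cut them into train / test / validation sets."""
    assert abs(sum(fractions) - 1.0) < 1e-9
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))   # 0.4 * 563 = 225.2 -> 225
    n_test = int(round(fractions[1] * n))    # 0.3 * 563 = 168.9 -> 169
    return perm[:n_train], perm[n_train:n_train + n_test], perm[n_train + n_test:]


train_idx, test_idx, val_idx = split_indices(563)
print(len(train_idx), len(test_idx), len(val_idx))  # 225 169 169
```

Since 563 is not divisible by 10, the three subsets cannot be exactly 40%, 30%, and 30%; any split is off by a fraction of a sample.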

Diagnosis by ANFIS.
The ANFIS was trained with the number of epochs set to 3000 and the error goal set to 0. ANFIS training on the 563-sample dataset used three generalized bell (gbell) membership functions for each input and a linear function for the output. The AND method is used to combine the inputs, and weighted averaging is used for defuzzification of the output. The TSK model was trained on 40% of the samples, and testing and validation were each performed on 30% of the samples. The trained ANFIS gave a root mean squared error (RMSE) of 0.28 in diagnosis, indicating an accuracy of 93.83%, and the best validation performance was obtained with an accuracy of 92.33%.
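The two performance measures reported above, RMSE and diagnostic accuracy, can be computed from the network outputs as follows. The paper does not state how a continuous ANFIS output is converted into a fault class, so rounding to the nearest integer fault code in `code_accuracy` is purely an assumption for illustration.

```python
import numpy as np


def rmse(targets, outputs):
    """Root mean squared error between target codes and network outputs."""
    t = np.asarray(targets, dtype=float)
    o = np.asarray(outputs, dtype=float)
    return float(np.sqrt(np.mean((t - o) ** 2)))


def code_accuracy(targets, outputs):
    """Percent of samples whose output, rounded to the nearest code, is correct.

    Rounding to an integer fault code is an assumed decision rule.
    """
    t = np.asarray(targets)
    o = np.rint(np.asarray(outputs, dtype=float))
    return 100.0 * float(np.mean(o == t))


print(code_accuracy([1, 2, 3, 4], [1.2, 2.4, 3.6, 4.0]))  # 75.0
```

RMSE penalizes how far each output lies from its target code, whereas the accuracy only counts whether the output falls on the correct side of the decision boundaries, so the two measures need not move together.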
The input membership functions are shown in Figure 9. The ANFIS model with 7 inputs, 1 output, and 2187 rules, automatically generated by the trained system, is shown in Figure 10. Since there are 7 inputs with 3 membership functions each, $3^7 = 2187$ rules are generated. The rule viewer, with the rules for the 7 inputs and the output of the system, is shown in Figure 11. Figure 12 shows the performance curve of the root mean squared error (RMSE) against the number of training epochs. The trained ANFIS was observed to give its best performance at epoch 3000. The FIS used in this work generates many rules, some of which may be redundant, but its performance is reasonably good. Other membership functions, for example, triangular, trapezoidal, and sigmoid, were also tried, but their diagnosis errors were too high; they are therefore excluded from the present study, and only the results for the generalized bell membership function are presented.
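The rule count $3^7 = 2187$ follows from taking every combination of one membership function per input. The sketch below repeats that arithmetic and, assuming first-order (linear) consequents and three parameters $(a, b, c)$ per bell membership function, also counts the trainable parameters, which helps explain the memory cost of ANFIS noted in the comparison of methods.

```python
from math import prod


def grid_rule_count(mfs_per_input):
    """One rule per combination of one membership function from each input."""
    return prod(mfs_per_input)


def first_order_tsk_param_count(mfs_per_input, params_per_mf=3):
    """(rules, premise parameters, consequent parameters) for a grid-partitioned FIS."""
    n_inputs = len(mfs_per_input)
    rules = grid_rule_count(mfs_per_input)
    premise = params_per_mf * sum(mfs_per_input)  # (a, b, c) per bell MF
    consequent = rules * (n_inputs + 1)           # p_1..p_n and q per rule
    return rules, premise, consequent


print(first_order_tsk_param_count([3] * 7))  # (2187, 63, 17496)
```

The consequent parameters grow exponentially with the number of inputs, so adding an eighth gas with three membership functions would triple the rule base.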
The comparative diagnosis performance of the methods is shown in Table 3. ANFIS may be the best choice for overcoming the drawbacks of neural networks stated earlier. ANFIS converges more slowly than the RBF network and occupies more memory, but because it combines the advantages of least squares and gradient descent, it showed better performance in this investigation.
Table 3: Results of diagnosis on transformer oil samples. Total number of DGA samples in database = 563.
Diagnosis Using Rogers Ratio Method.
The Rogers ratio method uses four gas ratios, CH4/H2, C2H6/CH4, C2H4/C2H6, and C2H2/C2H4, coded as i, j, k, and l, respectively; the ranges of the ratios are shown in Table 1. The fault diagnosis suggested by the gas ratio codes is shown in Table 2.
Table 2: Fault diagnosis based on Rogers ratio codes.
MATLAB code was used to match the ratio codes to the related faults. Of the 563 available samples, 55 DGA samples were used, and only 29 were classified correctly, an accuracy of 52.73%. This method is not very accurate and gives no diagnosis in many cases; it cannot cover the entire range of the input space.
Diagnosis Using Duval Triangle.
Michael Duval developed the Duval triangle using a database of thousands of DGA results from transformers. The Duval triangle is shown in Figure 13. This method has proved to be accurate, but it depends mainly on the gas concentrations, and concentrations at medium and low levels affect the diagnosis.
When the Duval triangle is used for diagnosis, the C2H2, C2H4, and CH4 values from the testing laboratory are plotted, and the particular fault is determined by the fault zone of the triangle in which the point lies; rarely, the point may fall on the borderline between two fault zones.

Conclusion
A comparison of the diagnostic ability of backpropagation (BP), radial basis function (RBF) neural network, and adaptive neurofuzzy inference system (ANFIS) has been investigated, and the diagnosis results in terms of error measure, …

Figure 6: Network training showing SSE and epochs.
Figure 7: Error between actual network output and target.
Figure 12: Performance curve showing the RMSE and number of epochs.
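The inputs to the Rogers ratio method and the Duval triangle described above are simple arithmetic on the dissolved gas concentrations. The sketch below computes the four Rogers ratios (i, j, k, l) and the relative CH4, C2H4, and C2H2 percentages that locate a point in the Duval triangle, plus the accuracy figure quoted for the Rogers method. It does not code the Table 1 ratio ranges or the triangle's zone boundaries, which are not reproduced here; the gas values are hypothetical and zero concentrations are not handled.

```python
def rogers_ratios(gas):
    """Four Rogers ratios, coded i, j, k, l in the text (gas: dict of ppm values)."""
    return {
        "i": gas["CH4"] / gas["H2"],
        "j": gas["C2H6"] / gas["CH4"],
        "k": gas["C2H4"] / gas["C2H6"],
        "l": gas["C2H2"] / gas["C2H4"],
    }


def duval_percentages(gas):
    """Relative %CH4, %C2H4 and %C2H2 used as Duval triangle coordinates."""
    total = gas["CH4"] + gas["C2H4"] + gas["C2H2"]
    return {g: 100.0 * gas[g] / total for g in ("CH4", "C2H4", "C2H2")}


def accuracy_percent(correct, total):
    """Percentage of correctly classified samples, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


sample = {"H2": 100, "CH4": 50, "C2H6": 25, "C2H4": 50, "C2H2": 25}  # hypothetical ppm
print(rogers_ratios(sample))      # {'i': 0.5, 'j': 0.5, 'k': 2.0, 'l': 0.5}
print(duval_percentages(sample))  # {'CH4': 40.0, 'C2H4': 40.0, 'C2H2': 20.0}
print(accuracy_percent(29, 55))   # 52.73
```

Because the three Duval percentages always sum to 100, only two of them are independent, which is why the diagnosis can be drawn on a two-dimensional triangle.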