Calculation of a Health Index of Oil-Paper Transformers Insulation with Binary Logistic Regression

This paper presents a new method for calculating the insulation health index (HI) of oil-paper transformers rated under 110 kV, using binary logistic regression to provide a snapshot of their health condition. Oil breakdown voltage (BDV), total acidity of oil, 2-Furfuraldehyde content, and dissolved gas analysis (DGA) are selected as the input data for determining the HI. A sample of transformers is used to test the proposed method, and the results are compared with those calculated for the same transformers using fuzzy logic. The comparison shows that the proposed method is reliable and effective in evaluating transformer health condition.


Introduction
Distribution transformers are among the most important components of the power grid. Knowledge of a transformer's insulation health condition is essential for making appropriate asset management decisions. The health index (HI) has become a useful tool for this purpose: it combines the available information about a transformer (such as operating observations, field inspections, and laboratory tests) into a single quantitative index of its overall health condition. The HI supports the planning of routine maintenance, reflects the failure rate, and gives the operator a clear idea of which transformers are approaching the end of life [1,2]. However, two problems arise when computing the HI of in-service transformers in power distribution networks.
The first problem is to obtain an accurate HI from fewer condition monitoring tests, which improves efficiency because the volume of test data is large. Various online and offline measurements, broadly categorized as electrical, mechanical, thermal, and oil tests, are carried out on a transformer, and a quantitative HI is calculated according to the relevant industrial standards. Although combining all the measurement results can provide a better assessment of transformer condition [3], this approach is not suitable for distribution transformers, because online condition monitoring systems may not be economically justifiable for them [4]. Moreover, working with a large number of offline measurement results is inefficient, since some of them are correlated. For example, the moisture content and the breakdown voltage of the transformer oil are strongly correlated [5]. It is therefore necessary to identify the most significant measurements, those that clearly represent the health condition of the transformer, in order to improve the efficiency of the HI calculation.
The second problem is how to incorporate the significant measurement results into the transformer HI calculation. One relatively simple approach relies on industry standards, such as IEC and IEEE standards and CIGRE recommendations, and assigns a value and a weighting factor to each parameter [6,7]. Fuzzy set theory, membership functions, and expert rules can also be used to produce the transformer HI [1]. Since the weighting factors and the membership functions are subjective, the calculated health indexes might not be consistent.
The purpose of calculating the HI of a transformer is to determine its health status. In [1], the health status is categorized as very good, good, moderate, bad, and very bad. EA Technology, a UK specialist in asset management solutions for owners and operators of electrical assets, classifies it as slightly aging, obviously aging, aging beyond the normal range, and extremely poor state. A broader classification into good, moderate, and bad is used by AMHA, an asset management and health assessment consulting company, as reported in [1]. In this paper, the HI of the transformer is simply categorized as healthy or unhealthy. Healthy means that the transformer has just been put into operation; unhealthy means that the transformer is in an extremely poor state and must be replaced. Three main methods are used for binary classification: support vector machines, neural networks, and logistic regression, each with its own advantages and disadvantages [8]. Neural networks [9] and support vector machines [10] have been proposed to determine the transformer HI. To the best of our knowledge, the use of binary logistic regression for evaluating the health condition of power transformers has not been reported in the literature.
In this paper, binary logistic regression is preferred for classifying the data to obtain the health status of a sample of transformers rated under 110 kV, for the following reasons.
(1) Binary logistic regression has been widely used in many fields, such as social science and medicine. Compared with support vector machines and neural networks, its advantage is that it gives a direct estimate of the class probabilities [11,12]. Just as a doctor analyzes different symptoms to decide whether a person has a disease and to suggest a cure, the operational status of a transformer can be predicted from its internal and external parameters, and a proper maintenance decision can be made accordingly. By analogy with medicine, binary logistic regression may therefore be used to determine the health condition of a transformer.
(2) Binary logistic regression supports statistical hypothesis testing to determine whether the independent variables in the model are significantly related to the outcome variable [13]. Provided that the transformer's health condition is still evaluated correctly, insignificant variables can be eliminated and the most informative ones retained, which improves the overall calculation efficiency.
(3) The cumulative probability distribution of failure of the electrical components in distribution networks summarized in [14] is consistent with logistic curves. The generalized logistic distribution function has been assumed for the logarithm of the time to failure of electrical insulation equipment [15]. Human growth and death rates with aging also follow logistic curves. Considering the tight correlation between the transformer HI and the transformer failure rate, and the analogy between the health condition of a transformer and that of a human being, logistic regression may be used to determine the health condition of the transformer.
The remainder of this paper is organized as follows. Section 2 explains the key parameters representing the health condition of the transformer. Section 3 introduces binary logistic regression and how it is used to calculate the HI of the transformer. Section 4 presents the case study and the discussion. Section 5 concludes the paper.

Input Data for HI Calculation
Various tests can be used to determine the insulation health condition of a transformer, such as dissolved gas analysis (DGA), oil quality tests, dissipation factor, and insulation resistance. In this paper, water content in transformer oil, total acidity of oil, dissolved combustible gases (DCG), oil breakdown voltage (BDV), dissipation factor (DF), and 2-Furfuraldehyde content are selected as the input data for the HI calculation, as shown in Table 1. The input data are the same as in [1], for the following reasons.
Over the past decades, the DGA technique has been applied to detect transformer fault conditions [16] such as overheating, internal arcing, bad electrical contacts, and partial discharge. It is widely used to provide information for evaluating the integrity of a transformer. However, it is difficult to calculate the health status of the transformer from all of the individual gas values. To improve the calculation efficiency, the dissolved combustible gases (DCG), that is, the sum of the levels of H2, CH4, C2H4, C2H2, and C2H6, are used for determining the HI in this paper.
Measuring the water content and the breakdown voltage of the transformer oil gives a good indication of the overall health condition of the oil [1]. Contaminating particles and free water in the transformer oil reduce the BDV and may increase partial discharges, which accelerates the aging of the transformer [16,17]. Therefore, the water content in oil and the BDV are used for the HI calculation.
DF measures the power lost in the transformer oil during operation [1]. The dissipated power, transferred to the oil in the form of heat, raises the temperature of the insulating materials and accelerates their aging. In oil-paper transformers, much attention must be paid to diagnosing the condition of the solid insulation (paper) [14], whose degradation can be considered the primary reason for the end of life [9]. The 2-Furfuraldehyde content, an important indicator of the degree of polymerization, directly assesses the health of the solid insulation [1]. Therefore, DF and 2-Furfuraldehyde content are considered as input data for determining the HI. Since the deterioration of the transformer's overall insulation is very important and also increases the acidity of the transformer oil [1], the total acidity of the transformer oil is also adopted in this paper to reflect, to some extent, the condition of the paper insulation.

Logistic Regression for HI Calculation
Binary logistic regression is a probabilistic statistical classification model that measures the relationship between a categorical dependent variable and several independent variables, using probability scores as the predicted values of the dependent variable [18,19]. In this section, it is used to find the best-fitting model for the relationship between the transformer HI and a set of independent variables, such as BDV, total acidity of oil, 2-Furfuraldehyde content, and DCG, in order to determine the health condition of the transformer.
Assume we have a number of transformer classification samples from condition monitoring experiments. Each sample belongs to one of two classes: class 0 (healthy) and class 1 (unhealthy). As mentioned before, healthy means that the transformer has just been put into operation, and unhealthy means that the transformer is in an extremely poor state and must be replaced. A rule based on binary logistic regression is used to determine the probability that a sample belongs to one of the two classes.
Let HI be a variable indicating the health condition of the transformer: HI = 0 means that the sample transformer belongs to class 0 (healthy) and HI = 1 means that it belongs to class 1 (unhealthy). The calculated HI is thus the probability that the transformer belongs to the unhealthy class. Let $x_i$ ($i = 1, \ldots, m$) denote the independent variables, such as BDV, total acidity of oil, 2-Furfuraldehyde content, and so forth. The transformer health index for different input values $x_i$, illustrated in Figure 1, follows a logistic regression model expressed as [13]

$$\mathrm{HI} = \frac{\exp\left(\beta_0 + \sum_{i=1}^{m} \beta_i x_i\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^{m} \beta_i x_i\right)}, \tag{1}$$

where $\beta_0$ is a constant and $\beta_i$ is the coefficient reflecting the contribution of the independent variable $x_i$ to the HI. That is to say, $\beta_i$ is the weighting factor assigned to the input data $x_i$. The weighting factors are therefore objective, and the proposed method can overcome the shortcoming of fuzzy set theory and expert systems, whose calculated health indexes might not be consistent because they are subjective, are based on the experience of the transformer expert, and might differ from one expert to another.
The unknown parameters $\beta_0$ and $\beta_i$ in formula (1) can be estimated by the maximum likelihood criterion. Suppose the total number of transformers is $n$, $y_j = 0$ or $y_j = 1$ indicates that the $j$th transformer is healthy or unhealthy, respectively, and $\mathrm{HI}_j$ is the health index of the $j$th transformer. The likelihood function used to determine the unknown parameters $\beta_0$ and $\beta_i$ is as follows [8,11-13,20]:

$$L(\beta_0, \beta_1, \ldots, \beta_m) = \prod_{j=1}^{n} \mathrm{HI}_j^{\,y_j} \left(1 - \mathrm{HI}_j\right)^{1 - y_j}. \tag{2}$$

Maximizing formula (2) is equivalent to maximizing the following formulation:

$$\ln L = \sum_{j=1}^{n} \left[ y_j \ln \mathrm{HI}_j + \left(1 - y_j\right) \ln\left(1 - \mathrm{HI}_j\right) \right]. \tag{3}$$

The Newton-Raphson algorithm can be used to obtain the values of the unknown parameters $\beta_0$ and $\beta_i$ from the following likelihood equations [20,21]:

$$\frac{\partial \ln L}{\partial \beta_0} = \sum_{j=1}^{n} \left(y_j - \mathrm{HI}_j\right) = 0, \qquad \frac{\partial \ln L}{\partial \beta_i} = \sum_{j=1}^{n} \left(y_j - \mathrm{HI}_j\right) x_{ij} = 0, \quad i = 1, \ldots, m, \tag{4}$$

where $x_{ij}$ is the value of $x_i$ for the $j$th transformer.
The Wald value, similar to the $t$ value in multiple regression, is used to assess the significance of each coefficient in logistic regression. More information about it can be found in Section 2.4 of [13]. The larger the Wald value is, the more significantly the variable is related to the health index. Provided that the transformer's health condition is still evaluated correctly, insignificant variables with small Wald values can be eliminated and the most informative variables retained with this method.
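As an illustration of the estimation step, the following is a minimal Python sketch of Newton-Raphson maximum likelihood estimation for a binary logistic model, with the Wald value computed as the squared ratio of each coefficient to its standard error. It is not the SPSS procedure used in this paper. The function names (`sigmoid`, `solve`, `score_and_information`, `fit_logistic`) and the synthetic data are assumptions introduced only for illustration; no transformer measurements are used.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def score_and_information(rows, y, beta):
    """Gradient of the log-likelihood and the information matrix at beta."""
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for r, yj in zip(rows, y):
        hi = sigmoid(sum(b * v for b, v in zip(beta, r)))
        w = hi * (1.0 - hi)
        for a in range(d):
            grad[a] += (yj - hi) * r[a]
            for c in range(d):
                info[a][c] += w * r[a] * r[c]
    return grad, info

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Return (beta, standard errors, Wald values); beta[0] is the constant."""
    rows = [[1.0] + list(x) for x in X]  # prepend the constant term
    d = len(rows[0])
    beta = [0.0] * d
    for _ in range(max_iter):
        grad, info = score_and_information(rows, y, beta)
        step = solve(info, grad)  # Newton-Raphson update direction
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(rows, y, beta)
    se = []
    for k in range(d):
        unit = [1.0 if j == k else 0.0 for j in range(d)]
        se.append(math.sqrt(solve(info, unit)[k]))  # sqrt of inverse diagonal
    wald = [(b / s) ** 2 for b, s in zip(beta, se)]
    return beta, se, wald

# Synthetic demonstration data (hypothetical, not transformer measurements).
rng = random.Random(0)
X_demo = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
y_demo = [1 if rng.random() < sigmoid(-0.5 + x[0] - 2.0 * x[1]) else 0
          for x in X_demo]
beta_demo, se_demo, wald_demo = fit_logistic(X_demo, y_demo)
print("coefficients:", [round(b, 3) for b in beta_demo])
print("Wald values:", [round(w, 2) for w in wald_demo])
```

In such a sketch, a variable with a small Wald value would be the candidate for elimination. Real condition monitoring data with few samples can be close to separable, in which case a plain Newton-Raphson iteration may fail to converge and a statistical package is the safer choice.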
Statistical Product and Service Solutions (SPSS) can be employed to carry out the health index calculation. The overall validity of this health index calculation method can be represented by the Nagelkerke $R^2$, which ranges between zero and unity and measures the proportion of data variation explained by the independent variables [13,22,23]. The proposed method is considered effective when the Nagelkerke $R^2$ is larger than 0.5. The classification table, shown in Table 3, is used in this paper to evaluate the goodness-of-fit of the health index calculation formula (1) [13,22]. The larger the percentage correctly classified is, the better the goodness-of-fit of the health index calculation model is. For detailed information about the Wald value, the Nagelkerke $R^2$, and the classification table, please refer to the relevant literature. The flowchart for calculating the health index is shown in Figure 2.
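The two validity measures can be sketched in a few lines of Python. The code below uses the standard definitions of the Cox & Snell and Nagelkerke $R^2$ and a classification table with an assumed cutoff of 0.5; it is not taken from SPSS. The function names are hypothetical, and the example probabilities are made up: only the class counts (21 healthy, 9 unhealthy, with 2 healthy units misclassified) are chosen so that the percentages match those reported for the six-factor model, which is an inference from the percentages rather than data from the paper.

```python
import math

def log_likelihood(y, p):
    """Binary log-likelihood of labels y under predicted probabilities p."""
    return sum(math.log(pj) if yj == 1 else math.log(1.0 - pj)
               for yj, pj in zip(y, p))

def nagelkerke_r2(y, p):
    """Nagelkerke R^2: Cox & Snell R^2 rescaled by its maximum value."""
    n = len(y)
    ll_model = log_likelihood(y, p)
    base_rate = sum(y) / n  # constant-only model predicts the class-1 share
    ll_null = log_likelihood(y, [base_rate] * n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

def classification_table(y, p, cutoff=0.5):
    """Percentage correct for class 0, class 1, and overall (cutoff assumed)."""
    pred = [1 if pj >= cutoff else 0 for pj in p]
    ok0 = sum(1 for yj, pr in zip(y, pred) if yj == 0 and pr == 0)
    ok1 = sum(1 for yj, pr in zip(y, pred) if yj == 1 and pr == 1)
    n0 = sum(1 for yj in y if yj == 0)
    n1 = len(y) - n0
    return (100.0 * ok0 / n0, 100.0 * ok1 / n1, 100.0 * (ok0 + ok1) / len(y))

# Hypothetical example: 21 healthy and 9 unhealthy units, 2 healthy misclassified.
y_example = [0] * 21 + [1] * 9
p_example = [0.1] * 19 + [0.9] * 2 + [0.9] * 9  # made-up probabilities
healthy, unhealthy, overall = classification_table(y_example, p_example)
print(round(healthy, 1), round(unhealthy, 1), round(overall, 1))  # 90.5 100.0 93.3
print(nagelkerke_r2(y_example, p_example) > 0.5)
```

The rescaling in `nagelkerke_r2` is what allows the measure to reach unity, which the Cox & Snell value alone cannot do for a binary outcome.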

Case Study
Figure 2: Flowchart for calculating the health index: select the independent variables; calculate the health index with the maximum likelihood criterion; test the significance of the independent variables using the Wald value; eliminate insignificant variables on the basis of correct transformer health condition results; test the whole validity of the method using the Nagelkerke $R^2$; test the goodness-of-fit of the method using the classification table.

A sample of 30 transformers from [1] is used as the input data, and SPSS is employed to carry out the health index calculation using binary logistic regression. A transformer whose health index calculated in [1] is less (larger) than 0.5 is categorized as healthy (unhealthy) in this paper. The results of the diagnostic tests of the 30 transformers in [1] are shown in the appendix. In [1], fuzzy set theory with six membership functions and 33 expert rules was used to produce the transformer health index, and the results were compared with those of AMHA. The health index in [1] ranges between zero and unity and is classified into five conditions: very good, good, moderate, bad, and very bad, as shown in Figure 3. According to AMHA, health index values between zero and 0.4 are good, between 0.4 and 0.7 are moderate, and between 0.7 and unity are bad. In this paper, the health index also ranges between zero and unity, and four health conditions are adopted according to EA Technology: zero to 0.2 is good, 0.2 to 0.45 is moderate, 0.45 to 0.8 is bad, and 0.8 to unity is very bad, as shown in Figure 4.
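The two sets of thresholds quoted above can be expressed as a small lookup, sketched here in Python. The band edges and labels are taken from the text; the function name `health_condition`, the scheme keys, and the treatment of a value lying exactly on a band edge (assigned to the higher band) are assumptions of this sketch.

```python
from bisect import bisect_right

# Band edges and labels as quoted in the text. How a value exactly on an
# edge is handled is not stated in the text and is an assumption here.
SCHEMES = {
    "EA": ([0.2, 0.45, 0.8], ["good", "moderate", "bad", "very bad"]),
    "AMHA": ([0.4, 0.7], ["good", "moderate", "bad"]),
}

def health_condition(hi, scheme="EA"):
    """Map a health index in [0, 1] to a condition label."""
    if not 0.0 <= hi <= 1.0:
        raise ValueError("health index must lie between zero and unity")
    edges, labels = SCHEMES[scheme]
    return labels[bisect_right(edges, hi)]

print(health_condition(0.007))          # good
print(health_condition(0.181))          # good
print(health_condition(0.377, "AMHA"))  # good
```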

Six-Feature Factors for HI Calculation.
As in [1], the test results of the water content, acidity content, BDV, dissipation factor (DF), DCG, and 2-Furfuraldehyde content are taken as the input data for the proposed HI calculation method in this section. The unknown parameters $\beta_0$ and $\beta_i$ in formula (1) and the Wald values are shown in Table 4. The Nagelkerke $R^2$ and the classification table are shown in Tables 2 and 3, respectively. The HI calculation method using logistic regression is effective, since the Nagelkerke $R^2$ in Table 2 is 0.79, larger than 0.5. From Table 3, the percentage correct for the healthy (unhealthy) transformers is 90.5% (100%) and the overall percentage is 93.3%. The goodness-of-fit of the HI calculation model is good. According to [24], the dissipation factor and the total acidity of oil are strongly correlated, and the relationship between them is a log function. Therefore, in order to optimize and simplify the input data, the less important of the two parameters may be deleted. Additionally, from Table 4, the Wald value of the dissipation factor is 0.053, the smallest of all six input parameters, which means that it is the least important parameter for calculating the insulation health index of the transformer with the proposed method. Therefore, the dissipation factor is removed from the input data.
Note to Table 4: Sig. is also used to assess the significance of each coefficient in logistic regression [13]. The smaller the Sig. value is, the more significantly the variable is related to the logistic regression.

Five-Feature Factors for HI Calculation.
After the dissipation factor is deleted, the test results of the water content, acidity content, BDV, DCG, and 2-Furfuraldehyde content are used as the input data for the proposed HI calculation method. As shown in Tables 5 and 6, the Nagelkerke $R^2$ is 0.782 and the overall percentage is 90%. The health index method is effective and the goodness-of-fit is good. Since the moisture content and the oil breakdown voltage are strongly correlated, and the Wald value of the water content is 1.065 (Table 7), smaller than those of the other four test results, the water content can be removed from the input data for the next calculation.

Four-Feature Factors for HI Calculation.
After the water content is deleted, the test results of the acidity content, BDV, DCG, and 2-Furfuraldehyde content are used as the input data for the proposed health index method in this section. The unknown parameters $\beta_0$ and $\beta_i$ in formula (1) and the Wald values are shown in Table 10. The Nagelkerke $R^2$ and the classification table are shown in Tables 8 and 9, respectively. The health index method using binary logistic regression is effective, since the Nagelkerke $R^2$ is 0.754, larger than 0.5. From Table 9, the percentage correct for the healthy (unhealthy) transformers is 95.2% (88.9%) and the overall percentage is 93.3%. The goodness-of-fit of the health index model is good. Substituting the coefficients from Table 10 into formula (1) gives the health index calculation model

$$\mathrm{HI} = \frac{\exp\left(-1.291 + 10.870\,x_2 - 0.08\,x_3 + 0.006\,x_5 + 0.542\,x_6\right)}{1 + \exp\left(-1.291 + 10.870\,x_2 - 0.08\,x_3 + 0.006\,x_5 + 0.542\,x_6\right)}, \tag{5}$$

where $x_2$, $x_3$, $x_5$, and $x_6$ represent acidity content, BDV, DCG, and 2-Furfuraldehyde content, respectively. The health index results of the 30 transformers are shown in Table 11.
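The four-factor model can be evaluated directly from the coefficients reported for Table 10 (−1.291, 10.870, −0.08, 0.006, and 0.542). The following Python sketch does so; the function and parameter names are hypothetical, and the units are assumed to be those of the test data quoted in the Application subsection (acid value in mgKOH/g, BDV in kV, DCG in μL/L, 2-Furfuraldehyde in mg/L). The usage example applies it to that transformer's data.

```python
import math

# Coefficients as reported in the text for the four-factor model.
BETA_0 = -1.291
BETA_ACIDITY = 10.870   # x2, acid value
BETA_BDV = -0.08        # x3, oil breakdown voltage
BETA_DCG = 0.006        # x5, dissolved combustible gases
BETA_FURFURAL = 0.542   # x6, 2-Furfuraldehyde content

def health_index(acidity, bdv, dcg, furfural):
    """Health index from the four-factor logistic model (sketch)."""
    z = (BETA_0 + BETA_ACIDITY * acidity + BETA_BDV * bdv
         + BETA_DCG * dcg + BETA_FURFURAL * furfural)
    return 1.0 / (1.0 + math.exp(-z))  # same value as exp(z) / (1 + exp(z))

# Test data of the transformer described in the Application subsection.
hi = health_index(acidity=0.172, bdv=39, dcg=359.72, furfural=3.46)
print(round(hi, 3))  # 0.816
```

The negative BDV coefficient means that a higher breakdown voltage lowers the index, while higher acidity, DCG, and 2-Furfuraldehyde content raise it.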

Discussion of the Calculation Results.
Table 11 shows the health indices given by the proposed method, by the fuzzy-logic method in [1], and by AMHA. To facilitate the comparison, the good health conditions produced by the logistic regression method are compared with the very good and good categories of the fuzzy-logic method and with the good category of AMHA in [1]. The bad and very bad health conditions produced by the logistic regression method are compared with the bad and very bad categories of the fuzzy-logic method and with the bad category of AMHA in [1], respectively.
Some numerical values of the health index differ between methods. For example, the health index of transformer number 3 is 0.007 with the proposed method and 0.3 with the fuzzy-logic method. These values are not contradictory, since the proposed method is based on the logistic curve, whose initial stage tends to zero. Such low values are therefore a common outcome for transformers whose health condition is good or very good.
From Table 11, the health conditions evaluated by the proposed method differ from the results in [1] for three transformers: number 1, number 17, and number 25. These differences can be explained as follows.
For transformer number 1, the health index calculated by AMHA is 0.377, and its condition is classified as good. According to AMHA, however, the health condition is moderate when the health index is between 0.4 and 0.7, and 0.377 is close to the lower bound of this range. The condition of transformer number 1 therefore tends toward moderate although it is classified as good, and categorizing it as moderate with the proposed method is not necessarily incorrect.
For transformer number 17, the health condition is categorized as bad with the proposed method. The health index calculated in [1] is 0.53, which is classified as moderate; according to Figure 3, the condition in [1] is bad when the health index is between 0.6 and 0.8. The condition of transformer number 17 tends toward bad although it is classified as moderate in [1], so categorizing it as bad with the proposed method is not necessarily incorrect.
For transformer number 25, the health condition is classified as moderate in [1]. In this paper, its health index is 0.181 and its health condition is categorized as good. With the proposed method, the health condition is moderate when the health index is between 0.2 and 0.45, and 0.181 is close to the lower bound of this range. The condition of transformer number 25 therefore tends toward moderate although it is classified as good, and categorizing it as good with the proposed method is not necessarily incorrect.
Table 11 also shows that the evaluation results for the remaining 27 transformers obtained with the proposed method are consistent with the evaluation results in [1]. Therefore, binary logistic regression can be used to determine the correct health condition of the transformer.

Discussion of the Selected Input Variable.
The first of the 33 expert rules determined in [1] is that if 2-Furfuraldehyde is very bad, then the health index is very bad. As mentioned before, the 2-Furfuraldehyde content, an important indicator of the degree of polymerization, directly assesses the health of the solid insulation. From Table 10, the Wald value of the 2-Furfuraldehyde content is 3.393, the maximum among the four input variables. This clearly shows that the 2-Furfuraldehyde content is the most important parameter in the proposed method, which is the same conclusion as in [1].
Dissipation factor and water content are removed in this proposed method.The reasons are as follows.
The moisture content and the BDV are strongly correlated. The dissipation factor and the total acidity of oil are also strongly correlated, and the relationship between them is a log function [24]. Since the BDV and the total acidity of oil are adopted in this paper, the dissipation factor and the water content can be removed. The BDV and the acidity of the transformer oil provide a good indication of the health condition of the oil. 2-Furfuraldehyde analysis gives a highly convincing transformer health index result, since it is related to the degree of polymerization of the paper. Analyzing the DCG in the oil also provides a good indicator of transformer aging. Therefore, 2-Furfuraldehyde, DCG, BDV, and acidity, selected by the binary logistic regression, carry the most informative data for determining the oil-paper transformer insulation health index.
For reasons discussed above, binary logistic regression can be utilized to determine the correct health condition of the transformer insulation.It can also be used to identify the important available tests for calculating the distribution transformer health index in order to improve the calculation efficiency.
Unlike discriminant analysis, logistic regression does not rely on strict assumptions such as multivariate normality, and it is much more robust when these assumptions are not met, which makes it applicable in many situations [22]. Technically, therefore, many parameters can be handled with logistic regression. For example, in [22], 18 parameters were used as the input data for logistic regression. It is thus possible that more parameters, not only those related to the oil-paper transformer insulation, can be used to determine the health condition of transformers with the proposed method.
4.5. Application.
Another 110 kV transformer, described in [25], had been in service in China for 25 years as of April 2014. It has operated in a good environment without any accident, at a load rate of about 0.8. The test data are as follows: the acid value is 0.172 mgKOH/g; the BDV is 39 kV; the DCG is 359.72 μL/L; the 2-Furfuraldehyde content is 3.46 mg/L. The health index calculated with formula (5) is 0.816. According to Figure 4, the health condition of the transformer is very bad and the transformer is nearly out of service, which is the same conclusion as in [25], reached with higher calculation efficiency.
A total of one hundred 10 kV oil-paper transformers were also collected in this paper. Their health conditions were determined with the method in [26], the enterprise standard of SGCC, and the results are shown in the last column of Table 12. To facilitate the comparison between the two methods, the good condition produced by logistic regression is compared with the normal condition category of the enterprise standard of SGCC. Similarly, the moderate, bad, and very bad conditions produced by the proposed method are compared with the attention, abnormal, and severe condition categories of the enterprise standard of SGCC, respectively. Table 12 summarizes this comparison and shows that eight results of the proposed method are not identical to those of the enterprise standard of SGCC. For example, 2 transformers, shown in the third row of column 2 in Table 12, were categorized as attention condition by the enterprise standard of SGCC but as good condition by the proposed method. Overall, 92 of the 100 transformers are classified identically, a 92% match.

Table 12: Comparison of the health conditions produced by the presented method and the health conditions produced by the method in [26] (columns: method in [26]; proposed method; total by method in [26]).

Conclusion
A procedure for calculating a health index that determines the insulation health condition of oil-paper transformers using binary logistic regression is introduced in this paper. Acid value, BDV, 2-Furfuraldehyde, and DGA are selected as the inputs to the binary logistic regression for calculating the health index of a sample of oil-paper transformers. The presented method is tested using the field data of the 30 transformers in [1]. The health index results are similar to those in [1], which shows the effectiveness of the proposed method. The calculation efficiency may be improved and the overhaul cost of transformers may be reduced, since the proposed method deletes two pieces of input data compared with [1], namely, dissipation factor and water content. Further study will test the method with more field data.

Table 1: Input data for oil-paper transformer insulation HI calculation.

Table 2: Model summary with six factors.
Note: Cox & Snell R Square, similar to Nagelkerke R Square, represents the whole validity of the logistic regression model and measures the proportion of data variation explained by the independent variables in the logistic model; more information about it can be found in [23].

Table 3: Classification table with six factors.

Table 4: Variables evaluation with six factors.
Note: $\beta$: the estimated coefficient; SE: standard error of the estimate; Wald: Wald value; Df: degree of freedom; Sig.: significance value.

Table 5: Model summary with five factors.
From Table 10, the unknown parameters $\beta_0$ and $\beta_i$ are −1.291, 10.870, −0.08, 0.006, and 0.542, respectively; these are the coefficients of the health index calculation model using binary logistic regression given as formula (5).

Table 6: Classification table with five factors.

Table 7: Variables evaluation with five factors.

Table 8: Model summary with four factors.

Table 9: Classification table with four factors.

Table 10: Variables evaluation with four factors.

Table 11: Health index and health condition of transformer.