Dissolved Gas Analysis of Insulating Oil in Electric Power Transformers: A Case Study Using SDAE-LSTM

Dissolved gas analysis (DGA) is the most important tool for fault diagnosis in electric power transformers. To improve diagnostic accuracy, this paper proposes a new model (SDAE-LSTM) to identify the dissolved gases in the insulating oil of power transformers and perform parameter analysis. The performance evaluation is carried out through case studies in terms of recognition accuracy, precision ratio, and recall ratio. Experimental results show that the SDAE-LSTM model performs better than other models under different input conditions. As evidenced by the analyses, the proposed model achieves considerable recognition accuracy (95.86%), precision ratio (95.79%), and recall ratio (97.51%). It can be confirmed that the SDAE-LSTM model, which uses the gases dissolved in transformer oil for fault diagnosis and analysis, has great research prospects.


Introduction
Power transformers are considered the core of electric power systems, and their running state determines whether the power network is controllable or not. Power transformers tend to decompose and produce gases during operation. These gases dissolve in the insulating oil due to stress (electrical, mechanical, and thermal stress, respectively) [1][2][3]. Generally, many kinds of gases are dissolved in insulating oil, such as hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon monoxide (CO), and carbon dioxide (CO2) [4]. In fact, the dissolved concentrations of these gases are closely associated with the running state of the power transformer. Dissolved gas analysis (DGA) is widely considered a potent way to diagnose essential faults both in the interior and the exterior of the power transformer and to predict how the transformer will evolve [5][6][7][8]. Therefore, the result of DGA directly provides effective information for diagnosing the running state of power transformers.
Over the past several decades, two types of approaches for dissolved gas analysis have been proposed. The first type comprises mathematical approaches, including the International Electrotechnical Commission (IEC) ratios, Doernenburg ratios, Duval triangle method, and Rogers ratios. However, these approaches have limitations, such as incomplete encoding and overly rigid fault-boundary distinctions. Because of these limitations, they often fail to describe the running state and trend of power transformers exactly, even though they are simple to operate. Meanwhile, the mathematical approaches cannot provide an interpretation of faults or possible trends [9]. The second type applies machine learning methods, which stem from artificial intelligence, to overcome these issues: such approaches use machine learning to diagnose the running state of power transformers and to reveal their trends. Compared with mathematical approaches, machine-learning-based approaches achieve better diagnostic accuracy thanks to their excellent feature-learning capabilities. In addition, the SDAE contains a noise-reduction mechanism, which avoids noise interference. To enhance the model's ability to extract fault-data features, an improved SDAE-LSTM transformer fault diagnosis method is proposed. LSTM overcomes the gradient vanishing problem of traditional neural networks in information processing. Therefore, the SDAE-LSTM transformer fault diagnosis method can not only effectively enhance the model's ability to extract fault features but also better track the variation of gas concentration in oil over time, improve diagnostic accuracy, and provide a strong guarantee for the safe and stable operation of power transformers.
In this study, an LSTM-based model is developed to diagnose essential faults from DGA data. Since traditional transformer fault diagnosis methods find it difficult to handle fault feature information effectively, using the SDAE to fully mine the fault features of the transformer helps improve the accuracy of transformer fault diagnosis. Owing to the interference of the monitoring device, ambient temperature, personnel operation, and other factors on the dissolved gas content in transformer oil, it is necessary to extract features from the original data. Feature extraction reduces the impact of raw data on model performance and improves the training speed and diagnostic accuracy of the model. The processed data are used as the input of the LSTM, which then performs further analysis. To test the proposed model, its diagnostic accuracy is obtained from case studies. Furthermore, three other approaches, namely BPANN, SVM, and RF, are compared with the proposed model. The rest of this study is organized as follows. Sections 2.1 and 2.2 briefly describe the required methods, SDAE and LSTM. Section 2.3 describes the proposed model. The experimental results of the case study are presented in Section 3. Section 4 gives conclusions and extensive discussions.

Proposed Model
2.1. Stacked Denoising Autoencoder. A deep neural network (DNN), which stems from the artificial neural network (ANN), is a classifier that attempts to learn high-level representations of given data with very deep networks, usually deeper than three layers.
A typical DNN has three parts: an input layer, an output layer, and many hidden layers stacked between them. The network is first initialized layerwise by unsupervised training and then switched to a supervised mode. A large number of layers, each consisting of a finite number of nodes, are introduced to realize highly nonlinear functions in DNNs, which can grasp the statistical regularities of the data. The resulting representation features have many virtues for classification, as is common in deep learning [15]. Figure 1 presents the difference between typical machine learning and deep learning, where SVM (support vector machine), RVM (relevance vector machine), and RF (random forest) are widely used shallow classifiers, and PCA (principal component analysis) is one of the most widely used dimensionality-reduction algorithms. The main idea of PCA is to map the n-dimensional features to k dimensions, a new set of orthogonal features also known as the principal components: a k-dimensional feature representation reconstructed from the original n-dimensional features.
An AE is a three-layer network consisting of an encoder and a decoder. On the one hand, the encoder maps the input data from a high-dimensional space to a low-dimensional space; on the other hand, the decoder reconstructs the input data from the corresponding codes. Given training samples $x_i \in \mathbb{R}^n$, the encoder transforms the input vector $x_i$ into a hidden representation $h_i \in \mathbb{R}^m$ through a nonlinear mapping as in the following equation:

$$h_i = \alpha\left(W_1 x_i + b_1\right), \qquad (1)$$

where $\alpha(x) = 1/[1 + \exp(-x)]$ is a nonlinear activation function used for the nonlinear deterministic mapping. Then, the decoder maps the hidden representation back to the original representation in a similar way:

$$g_i = \alpha\left(W_2 h_i + b_2\right). \qquad (2)$$

The training process of an AE optimizes the parameter set $\theta = \{W_1, b_1, W_2, b_2\}$, which is evaluated by minimizing the reconstruction error between $g_i$ and $x_i$.
Generally, the average reconstruction error is measured by the mean square error (MSE), calculated as

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - g_i \right\|^2 .$$

An AE attempts to learn an identity map approximating the input with the output during training. However, merely retaining the information of the original input does not guarantee that the useful information is effectively separated from the noise; in other words, it does not help extract useful features from the original input.
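The encoder-decoder mapping and the MSE criterion above can be sketched numerically. The following minimal NumPy example uses hypothetical random weights W1, b1, W2, b2 (not the trained parameters of the proposed model) to pass one sample through an untrained AE and compute its reconstruction error:

```python
import numpy as np

def sigmoid(z):
    # alpha(x) = 1 / (1 + exp(-x)), the nonlinear activation
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W1, b1):
    # h = alpha(W1 x + b1): map the input to the hidden code
    return sigmoid(W1 @ x + b1)

def decode(h, W2, b2):
    # g = alpha(W2 h + b2): reconstruct the input from the code
    return sigmoid(W2 @ h + b2)

def mse(X, G):
    # average squared reconstruction error over N samples
    return np.mean(np.sum((X - G) ** 2, axis=1))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 5)) * 0.1, np.zeros(3)  # 5-dim input -> 3-dim code
W2, b2 = rng.normal(size=(5, 3)) * 0.1, np.zeros(5)

x = rng.random(5)
g = decode(encode(x, W1, b1), W2, b2)
print(mse(x[None, :], g[None, :]))
```

Training would then adjust θ = {W1, b1, W2, b2} by gradient descent on this MSE.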
Vincent first proposed the SDAE model to escape learning a merely equivalent representation of the original input. An SDAE is a stack of single-layer DAEs, each of which takes a corrupted version of the data as input and reconstructs, or denoises, the original input. Figure 2 presents the construction of the SDAE [16]. First, a DAE randomly corrupts the original input x according to a corruption process q_D, yielding a corrupted version x̃. The encoding function f maps x̃ to the hidden representation y, and the decoding function g finally reconstructs y into z. Here, q_D is one of several kinds of corruption, such as Gaussian noise, masking noise, or salt-and-pepper noise. Likewise, the encoding and decoding functions are both activation functions (i.e., the sigmoid, tanh, or ReLU function). Generally, the reconstruction error is minimized to acquire the appropriate parameters of the DAE. The well-trained encoding function is applied to the original input to generate the feature representation, which serves as the input of the next layer [17]. The SDAE can thus be built quite mechanically by repeating this straightforward process over and over again, as shown in Figure 2. By adding noise to the training data and requiring the AE to reconstruct the original input, the DAE not only overcomes the interference of the noise but also captures more effective features and improves the generalization ability of the model.
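As an illustration of the corruption process q_D, a masking-noise sketch can be written as follows; the corruption level of 0.3 is an arbitrary choice for the example:

```python
import numpy as np

def masking_noise(x, corruption_level=0.3, rng=None):
    """Corrupt x as q_D does with masking noise: randomly zero out a
    fraction of the entries; the DAE is then trained to reconstruct
    the clean x from this corrupted version."""
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(x.shape) >= corruption_level  # True = entry survives
    return x * keep

rng = np.random.default_rng(42)
x = rng.random(10)          # clean input vector
x_tilde = masking_noise(x, corruption_level=0.3, rng=rng)
print(x_tilde)
```

Gaussian or salt-and-pepper corruption would replace the zeroing with additive noise or with random extreme values, respectively.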
In general, the training process of the SDAE can be divided into two stages: unsupervised pretraining and supervised fine-tuning. In unsupervised pretraining, the network parameters are initialized by training each layer as a DAE that minimizes the error in reconstructing its input. After that, the network goes through supervised fine-tuning by adding a logistic regression layer on top of the network. As a result, the generalization performance of the SDAE improves dramatically.

2.2. Long Short-Term Memory. For information processing tasks, LSTM is very suitable for problems that are highly related to time series and performs very well on time-series classification. Because LSTM solves the gradient dissipation problem of traditional neural networks in information processing, it works better than the RNN on problems that require long-term storage; therefore, this article uses the LSTM model for data processing. An LSTM is a collection of basic units, the memory blocks, each consisting of a memory cell and three gates controlling the memory cell state: the forget gate, the input gate, and the output gate. The forget gate plays an important role in removing unnecessary information, the input gate determines the effect of the input on the state of the memory cell, and the output gate selects and sends out the output. Figure 3 presents the structure of the memory block. Given the input vector x_t at time t and the output h_{t-1} at time t - 1, we define W, U, and b as the weight matrices and bias vectors. The memory block refreshes the state and decides the output.
Firstly, the forget gate removes the historical and futile information:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right).$$

Then, on the basis of the input and the historical information, the input gate refreshes the state:

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right),$$
$$\tilde{C}_t = \tanh\left(W_C x_t + U_C h_{t-1} + b_C\right),$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$

Finally, the output gate gives the current information:

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right),$$
$$h_t = o_t \odot \tanh\left(C_t\right),$$

where σ represents the logistic sigmoid function and C_t the state of the memory cell at time t; f_t, i_t, and o_t represent the outputs of the forget gate, input gate, and output gate at time t, respectively. LSTM improves on the traditional memory block of the RNN model (a single tanh or sigmoid layer), which overcomes the limitation of long-term information preservation [18]. Through the design of the gate structure and the memory cell, LSTM can effectively update and transmit the key information in a time series. Compared with the traditional RNN model, LSTM performs better in information selection and in learning the characteristics of time series, applying long-distance information to the current prediction. The internal loop structure of the LSTM neural network is shown in Figure 4 [19]. As can be seen there, the input at each time step is a vector, and the state of the hidden layer moves step by step along the time series.
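A single memory-block update following the gate equations above can be sketched in NumPy; the weight shapes, dictionary layout, and random initialization are hypothetical, for illustration only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, U, b):
    """One memory-block update; W, U, b are dicts keyed by gate name."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])     # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])     # input gate
    C_cand = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate state
    C_t = f_t * C_prev + i_t * C_cand                          # refreshed cell state
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])     # output gate
    h_t = o_t * np.tanh(C_t)                                   # block output
    return h_t, C_t

rng = np.random.default_rng(1)
n_in, n_hid = 4, 3
W = {k: rng.normal(size=(n_hid, n_in)) * 0.1 for k in 'fico'}
U = {k: rng.normal(size=(n_hid, n_hid)) * 0.1 for k in 'fico'}
b = {k: np.zeros(n_hid) for k in 'fico'}

h, C = np.zeros(n_hid), np.zeros(n_hid)   # initial hidden and cell state
h, C = lstm_step(rng.random(n_in), h, C, W, U, b)
print(h.shape, C.shape)
```

In practice a framework layer (e.g., TensorFlow's LSTM) performs this update over the whole sequence; the sketch only makes the gate arithmetic explicit.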
As a special structure, the LSTM requires as input a sequence of feature vectors consisting of continuous feature vectors over M time steps, so the input sequence must be constructed before training. Given the input vector x_t at time step t, the constructed sequence is x_{t-M+1}, x_{t-M+2}, ..., x_t. More specifically, the first sequence is x_1, x_2, ..., x_M, the second sequence is x_2, x_3, ..., x_{M+1}, and the other sequences are acquired likewise.
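The sliding-window construction described above can be sketched as:

```python
def build_sequences(x, M):
    """Slide a window of M time steps over the series: the first
    sequence is x[0:M], the second x[1:M+1], and so on."""
    return [x[i:i + M] for i in range(len(x) - M + 1)]

x = [1, 2, 3, 4, 5, 6]          # stand-in for the feature-vector series
seqs = build_sequences(x, M=3)
print(seqs)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

Each window then forms one LSTM input sample; with vector-valued x_t the same indexing applies row-wise.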

2.3. SDAE-LSTM Model.
In this section, we apply a SDAE-LSTM model consisting of a deep neural network and an LSTM to diagnose the faults of the power transformer. The framework of the SDAE-LSTM model is shown in Figure 5. As seen there, the first step is to extract features from the original data vector with the SDAE. Then, the last hidden layer of the SDAE, which is an effective representation of the original data, is used as the input of the LSTM.
According to the common faults in daily monitoring, this paper establishes the following data tags of power transformer status: normal pattern (F1), discharge with high energy (F2), discharge with low energy (F3), partial discharge (F4), high-temperature thermal (F5), middle-temperature thermal (F6), and low-temperature thermal (F7).
In this study, we propose the SDAE-LSTM model for transformer incipient fault diagnosis. As a typical unsupervised deep learning network, the SDAE can dig out the deep-level features of unlabeled data: without adding any labels to the input data, training derives the intrinsic features of the input, so that high-dimensional features can be extracted and the potential value within the data tapped.
Through the monitoring of relevant data during transformer failure, it can be found that there are large numerical differences between different parameters. If the original gas-concentration data were placed directly into the model for training, the weight setting could be affected by these large differences: owing to the different orders of magnitude, larger values would dominate the training and distort the training results of the entire network. To avoid the influence of the different orders of magnitude of the original data on the training weights, it is necessary to preprocess the gas data in transformer oil. Because the data range is relatively concentrated, this article uses a standardized min-max method to normalize the input vector:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where x denotes the original value, x* the standardized value, and x_max and x_min the maximum and minimum, respectively. After the LSTM training, the output results give the final classification of the original input features.
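A minimal sketch of this min-max standardization, applied to a hypothetical column of H2 concentrations (the values are made up for illustration):

```python
def normalize(x):
    """Min-max normalization: x* = (x - x_min) / (x_max - x_min),
    mapping each feature into [0, 1]."""
    x_min, x_max = min(x), max(x)
    return [(v - x_min) / (x_max - x_min) for v in x]

h2_ppm = [12.0, 45.0, 30.0, 78.0]   # hypothetical H2 concentrations
print(normalize(h2_ppm))            # 12 -> 0.0, 78 -> 1.0
```

In practice each gas column is normalized with its own x_min and x_max, which must also be applied unchanged to the test data.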

Case Studies
In this section, case experiments confirm that the proposed model has good classification performance. All experiments are coded in Python 3.7 and run on an Intel Core i7-6200 CPU at 2.30 GHz with 8 GB of memory. The SDAE and LSTM are implemented with TensorFlow.
In this case, we collected practical data consisting of normal and abnormal states from China National Grid and China Southern Power Grid; this is called the original dataset.
The dataset is composed of 1723 samples, randomly divided into a training set (1378 samples, 80%) and a testing set (345 samples, 20%). On that basis, we conducted experiments in this section to validate the efficiency of the proposed model.
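An 80/20 split of this kind can be sketched as follows; the shuffling seed and the stand-in sample list are illustrative only:

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle and split the samples: 80% training, 20% testing,
    as in the case study."""
    shuffled = samples[:]                       # keep the original intact
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

dataset = list(range(1723))                     # stand-in for the 1723 DGA samples
train, test = split_dataset(dataset)
print(len(train), len(test))  # 1378 345
```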

Evaluation of Fault Diagnosis Accuracy.
The fault diagnosis accuracy is quantified by the recognition accuracy (RA), which is widely used for machine learning models. This criterion is the proportion of samples recognized correctly among all samples, representing the overall performance of the SDAE-LSTM model:

$$\mathrm{RA} = \frac{a_f}{c_f} \times 100\%,$$

where a_f is the number of samples recognized correctly and c_f the number of all samples. The fault diagnosis accuracy can also be evaluated by the precision ratio (PR), the percentage of samples of a particular pattern recognized correctly among all samples of the same pattern recognized. This criterion reflects the performance in correctly recognizing particular patterns:

$$\mathrm{PR} = \frac{b_{f1}}{b_f} \times 100\%,$$

where b_f is the number of samples of the same pattern and b_{f1} the number of samples recognized correctly among them.
In addition, the recall ratio (RR) is an effective index for measuring the fault diagnosis performance of the proposed model. It is the percentage of a particular pattern recognized correctly among the whole number of samples of that pattern; moreover, the recall ratio and the precision ratio are regarded as two mutually constraining measures:

$$\mathrm{RR} = \frac{d_{f1}}{d_f} \times 100\%,$$

where d_f is the whole number of samples of a particular pattern and d_{f1} the number of them recognized correctly. In this study, RA is not the only measuring indicator; beyond that, PR and RR are effective indexes for evaluating the fault diagnosis, so that the performance of the proposed model can be quantified objectively and comprehensively.
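The three criteria can be sketched as follows, assuming the standard per-pattern precision/recall reading of the definitions above; the toy F1-F3 labels are illustrative only:

```python
def recognition_accuracy(y_true, y_pred):
    """RA = a_f / c_f: correctly recognized samples over all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision_ratio(y_true, y_pred, pattern):
    """PR for one pattern: correct among all samples recognized as it."""
    recognized = [t for t, p in zip(y_true, y_pred) if p == pattern]
    return recognized.count(pattern) / len(recognized) if recognized else 0.0

def recall_ratio(y_true, y_pred, pattern):
    """RR for one pattern: correct among all samples truly belonging to it."""
    actual = [p for t, p in zip(y_true, y_pred) if t == pattern]
    return actual.count(pattern) / len(actual) if actual else 0.0

y_true = ['F1', 'F1', 'F2', 'F2', 'F3']
y_pred = ['F1', 'F2', 'F2', 'F2', 'F3']
print(recognition_accuracy(y_true, y_pred))   # 0.8
print(precision_ratio(y_true, y_pred, 'F2'))  # ~0.667
print(recall_ratio(y_true, y_pred, 'F2'))     # 1.0
```

The mutual constraint between PR and RR is visible here: predicting F2 liberally raises its recall but drags its precision down.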

Comparison with Different Inputs.
The RAs, PRs, and RRs in this section are obtained from the same model (the LSTM model) with different inputs, classified into DGA, composition ratio, IEC three-ratio codes, Rogers four-ratio codes, noncode ratio, and SDAE. That is to say, the experiments are conducted to verify the importance of the SDAE for the LSTM. Table 1 gives the RAs obtained after feeding the different input vectors to the same model.
As is evident from Table 1, the model achieves the best recognition accuracy with the SDAE input, attaining 95.86% on average. Four of the seven patterns reach recognition accuracies above 95%, and two of them even exceed 98%. Moreover, the model performs well on average for three other inputs, which are only moderately worse than the SDAE-LSTM model: 93.49% for the composition ratio, 93.85% for the Rogers four-ratio codes, and 93.87% for the noncode ratio. Every RA is no less than 90%, while no more than two patterns' recognition accuracies exceed 95%. In stark contrast, negative results are found for the model with the DGA input: the average recognition accuracy is merely 89.74%, and most patterns are recognized with less than 90% accuracy. This is not surprising, since the DGA input consists of the raw data. Hence, the SDAE is important for the recognition ability of the model. Table 2 presents the precision ratios for different inputs of the proposed model. A thorough analysis of the table shows that the precision ratios of the normal pattern exceed 95% for all inputs. For the other patterns, the model performs worse with the former five inputs, where at least one pattern's precision ratio falls below 90%; in particular, most precision ratios are below 90% when DGA is the input. Interestingly, the most outstanding average precision ratio is obtained for the SDAE input, a considerable 95.79%. The next ones are lower by roughly 1-3% on average: 94.84% for the noncode ratio, 92.59% for the Rogers four-ratio codes, 92.67% for the IEC three-ratio codes, and 93.46% for the composition ratio, respectively. The result for DGA, merely 89.83% on average, is clearly not a good one. As a result, the proposed model shows its best precision-ratio performance for the SDAE input.
Table 3 illustrates the recall ratios for different inputs. As shown there, the LSTM model with the SDAE input achieves better recall ratios than with the other inputs, attaining 97.51% on average, even though the recall ratio and the precision ratio are regarded as two mutually constraining measures. More specifically, almost every pattern's recall ratio exceeds 95%, which is considerable. On the contrary, the model shows no great generalization capacity in terms of average recall ratio for the former five inputs: 91.92% for the noncode ratio, 89.58% for the Rogers four-ratio codes, 83.44% for the IEC three-ratio codes, 76.88% for the composition ratio, and 82.41% for DGA. For these five inputs, most recall ratios of the seven patterns are below 80%. Accordingly, the RR comparison for the different inputs shows that the model recognizes the seven patterns of electric power transformers most sensitively with the SDAE input.

Comparison with Other Methods.
Backpropagation artificial neural network (BPANN), support vector machine (SVM), and random forest (RF) are among the most traditional machine learning methods for pattern recognition in this field. In this section, comparative experiments are conducted to illustrate the merits of the SDAE-LSTM model by comparing its RAs, PRs, and RRs with those of these three common methods. The parameters of the three models are shown in Table 4. Table 5 gives the recognition accuracies of the different models for the dissolved gas analysis of insulating oil in electric power transformers. As shown there, the SDAE-LSTM model achieves the best average recognition accuracy, 95.86%. The recognition accuracies of all but three patterns surpass 95%; in particular, those of the normal, discharge-with-high-energy, and middle-temperature thermal patterns exceed 97%. The next best is the RF model, whose average RA is moderately lower, by 3.6%. Interestingly, two of its seven patterns perform well, 95.89% for the normal pattern and 95.13% for the partial discharge pattern; the rest occupy middle positions, and the RA of the low-temperature thermal pattern is merely 88.54%. The BPANN and SVM have the worst average RAs, 82.90% and 86.98%, respectively; their RAs are below 90% for all but one pattern, sometimes even below 80%.
Therefore, the proposed model outperforms the three other common approaches in recognition accuracy. Table 6 presents detailed results of the different models in terms of precision ratio. Most evidently, the SDAE-LSTM model attains the largest average precision ratio, 95.79%. More specifically, its precision ratios considerably exceed 95%, except for the partial discharge pattern (93.86%) and the low-temperature thermal pattern (93.97%). Moreover, the average precision ratio of the RF model reaches 91.04%, 4.75 percentage points lower, with most of the seven patterns' precision ratios above 90%. In addition, the SVM model yields a worse average precision ratio (85.83%), since four patterns' precision ratios are below 85%; in particular, the SVM model records the smallest precision ratio among the twenty-eight values, 70% for the discharge-with-low-energy pattern. Furthermore, the average precision ratio of the BPANN model is the smallest of the four models, merely 81.34%. Consequently, these experimental results reveal that the SDAE-LSTM model performs best among the four models in terms of precision ratio. The recall ratios of the different models are compared in Table 7. To compare the SDAE-LSTM model rationally with the other approaches, recall ratios are evaluated as well, owing to the mutually constraining relationship between precision ratio and recall ratio. As can be seen in Table 7, the SDAE-LSTM model achieves better recall ratios than the three traditional machine learning models: every single recall ratio is greater than 94.4%, and the average recall ratio reaches 97.51%. The next is the RF model, with an average recall ratio of 91.64%; interestingly, although one pattern reaches merely 87.50% (discharge with low energy), the rest exceed 90%.
Compared with the RF and SDAE-LSTM models, the average recall ratios of the BPANN and SVM models stand in stark contrast, 78.57% and 84.24%, respectively. Sharp declines are found among these fourteen values, such as the discharge-with-low-energy pattern in the BPANN model (46.88%) and the SVM model (65.63%). From the analyses presented, the SDAE-LSTM model performs best in terms of recall ratio.
In conclusion, the merits of the SDAE-LSTM model are revealed in three aspects in comparison with the three traditional machine learning models (BPANN, SVM, and RF): recognition accuracy, precision ratio, and recall ratio. More specifically, the recognition accuracies of all seven patterns reach a high level, especially for the normal and middle-temperature thermal patterns. Similarly, higher precision ratios and recall ratios are found for the SDAE-LSTM model than for the other three models. In a nutshell, the SDAE-LSTM model is the more effective model for pattern recognition in the dissolved gas analysis of insulating oil in electric power transformers.

Conclusions
This study proposes a new LSTM-based model to analyze the dissolved gas in the insulating oil of electric power transformers. Two groups of experiments were conducted to evaluate the proposed model. As the comparison experiments for different inputs (DGA, composition ratio, IEC three-ratio codes, Rogers four-ratio codes, noncode ratio, and SDAE) show, the SDAE-LSTM model is superior in recognition accuracy, precision ratio, and recall ratio [20,21]. Furthermore, another comparison demonstrated that the proposed model performs better than three traditional machine learning models (BPANN, SVM, and RF). In recognition accuracy, the SDAE-LSTM attains 95.86%, clearly higher than BPANN (82.90%), SVM (86.96%), and RF (92.17%). Similarly, the SDAE-LSTM model gains a better precision ratio (95.79%) than BPANN (81.34%), SVM (85.83%), and RF (91.04%). An even larger advantage is found in recall ratio, where the proposed model reaches 97.51%, against 78.57% for BPANN, 84.24% for SVM, and 91.64% for RF.
That means the SDAE-LSTM model plays a more promising role in recognizing patterns in the dissolved gas analysis of insulating oil in electric power transformers.
Considering that the SDAE-LSTM model is a promising model for analyzing the dissolved gas in the insulating oil of power transformers by identifying patterns, we can further study its application in online monitoring of power transformers. In addition, comparisons across different sample sizes are also of interest.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.