Transformer Fault Identification with an IF-1DCNN Based on Informative Integration of Heterogeneous Sources

Using a single type of feature information as the input cannot fully characterize transformer faults, which limits the accuracy of transformer fault diagnosis. To address this problem, a convolutional neural network model is applied to transformer fault assessment. The model is designed as an end-to-end pipeline of “different-space feature extraction + transformer state diagnosis classification” so that information from possibly heterogeneous sources can be integrated. The method combines various kinds of feature information on the operating state of the power transformer into an isomeric feature, and the model automatically extracts the information of the different feature spaces from the isomeric feature through its one-dimensional convolution and pooling operations. The performance of the proposed approach is compared with that of other models, such as a support vector machine (SVM), a backpropagation neural network (BPNN), and a deep belief network (DBN). The experimental results show that the proposed one-dimensional convolutional neural network based on an isomeric feature (IF-1DCNN) can accurately classify the fault state of the transformer and reduce the adverse interaction between the information of different feature spaces in the mixed feature, which suggests good prospects for engineering application.


Introduction
Power systems are an important basic engineering tool for social and economic development; large power transformers are very expensive and vital components of electric power systems [1], and the working state of power transformers affects the operational stability of power systems [2,3]. Therefore, their normal and continuous service is vital [4]. Over the long service life of transformer equipment, ageing and latent failures are inevitable [5]. Moreover, the cost of a power transformer is high, so it is necessary to diagnose the state of the transformer and take effective measures against incipient faults to ensure reliable operation and reduce the occurrence of faults [6]. The available diagnostic methods include electrical, chemical, mechanical, acoustic, and other methods [7][8][9].
At present, dissolved gas analysis (DGA) is one of the best methods for detecting abnormal conditions in a transformer [10][11][12] and has been widely used to monitor the state of power transformers. The ratio of gas contents in DGA is closely related to the type of transformer fault [13]. At the same time, data from two different characteristic spaces can reflect the operation state of the transformer from different angles [14]. Early methods for interpreting DGA data include the Doernenburg ratio method, Rogers' ratio method, and IEC ratio method, which have been developed and validated using large datasets of equipment in service. In these methods, multiple numeric thresholds and gas boundaries are commonly set to classify features of the dissolved gas data. However, insufficient ratio combinations or "code absence" may invalidate the interpretation of DGA [15,16]. Therefore, the fault diagnosis accuracy of these methods is relatively low [12,17].
With the rise of machine learning, machine learning-related algorithms have been applied to transformer fault diagnosis. In the early stage, neural networks [18][19][20], support vector machines (SVMs) [14,21,22], and other algorithms [23][24][25] improved the accuracy of transformer fault diagnosis. Jia et al. [19] proposed a wavelet neural network diagnosis model based on an improved artificial fish-swarm algorithm, with the gas content used as the input of the diagnosis model. Equbal et al. [20] trained an artificial neural network on weighted fault gas concentrations for transformer incipient fault diagnosis. Li et al. [25] proposed a clustering-based method for transformer fault diagnosis with DGA that, in its initialization process, calculates the membership of a sample to the reference faults from a single feature.
The correlation between single feature information and a fault is limited, whereas a hybrid feature can reflect the operation state of the transformer from different angles. In reference [22], a set of new feature combinations is selected as the input from the mixed feature quantity by a genetic algorithm. However, different feature combinations may be obtained from different sample data. At the same time, these methods have their own shortcomings. For example, a neural network learns poorly from small-sample data, and its convergence is slow on large-sample data [26]. Although SVMs can show outstanding performance in processing small-sample data [27], they were essentially proposed for binary classification problems. When dealing with the multiclass classification problem of transformer state diagnosis, the overall efficiency of the algorithm is not high, and the parameter optimization of the kernel function is relatively difficult. Such parameter optimization problems are NP-hard, and some intelligent optimization methods have been proposed to solve them [28], such as the genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution (DE) [14,19,22,29,30,31].
In recent years, the application of deep belief networks (DBNs) in transformer fault diagnosis models [32][33][34][35] has achieved high precision. However, the network structure and parameters of a DBN are basically determined by experience, and an evolutionary algorithm is needed to avoid premature convergence and improve the global search ability [36]. In addition, DBNs require a large amount of unlabeled data for pretraining. In references [24,32,34], the models classify transformer faults based on hybrid DGA features. Convolutional neural networks (CNNs) [37][38][39] have also been used in research on transformer fault diagnosis methods. In references [33,35,37,38,39], single-characteristic information that reflects the operation state of a transformer is used as the input of the diagnosis model, but a hybrid feature combining multiple kinds of feature information is not considered as the input.
At present, transformer state diagnosis methods are mainly based on the single feature or hybrid feature. However, the single-characteristic information that can reflect the operation state of a transformer cannot fully reflect the transformer fault classification, and it is often difficult to make a more correct diagnosis of the fault. For the hybrid feature, these new learning models do not specially distinguish different feature space in hybrid features to train models. At the same time, research on CNNs in power transformer fault diagnosis needs further study.
This paper presents a transformer fault diagnosis model based on one-dimensional convolutional neural networks (1DCNN) that enables the informative integration of possibly heterogeneous sources. To keep the features extracted by the convolution and pooling operations in the model independent across the different kinds of feature information, the mixed feature is transformed into an isomeric feature before it is input into the 1DCNN model. A schematic diagram of the 1DCNN based on an isomeric feature (IF-1DCNN) is shown in Figure 1.

1DCNN Network Model Theory
CNNs have unique network layers: convolutional layers and pooling layers. Under the interaction of the two layers, the features of the input data can be extracted automatically, and the dimension of the data features can be reduced at the same time. CNNs have many different network structures. A classic LeNet-5 CNN structure is shown in Figure 2.

1DCNN
2.1.1. One-Dimensional Convolution. In general, the convolutional layer uses a convolution kernel to process a one-dimensional feature $X = [x_1, x_2, x_3, \ldots, x_N]$ and map it to a new feature. The one-dimensional output is the feature extracted from X. The process can be expressed by the following formula:

$$s(i) = \sum_{m=1}^{M} w_m \, x_{i-1+m}, \tag{1}$$

where s is the one-dimensional output feature, $s(i)$ is the ith output feature element, W is the convolution kernel of order M, $w_m$ is the mth element of W, and $x_{i-1+m}$ is the (i − 1 + m)th element of the input X.
An activation function needs to be added after the convolution operation to introduce nonlinear factors into the neuron node and transform the learning model into a nonlinear model. Activation functions include the sigmoid, tanh, and rectified linear unit (ReLU) functions. In this paper, the current mainstream ReLU function is used, which is based on bionic principles. It can more effectively carry out gradient descent and backpropagation, avoid gradient disappearance, and reduce the temporal and spatial complexity. The ReLU function can be expressed by Figure 3 and formula (2):

$$f(x) = \max(0, x). \tag{2}$$
It can be seen from the formula that when the input is positive, the output remains the same; when the input is nonpositive, the output becomes zero, which makes the neurons in the model network have sparse activation.
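The one-dimensional convolution and the ReLU activation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: no padding, a stride of 1, and a single kernel. The function names `conv1d` and `relu` and the sample values are hypothetical and do not come from the paper.

```python
def conv1d(x, w):
    """Valid 1D convolution: s(i) = sum over m of w_m * x_(i-1+m), stride 1, no padding."""
    m = len(w)
    return [sum(w[k] * x[i + k] for k in range(m)) for i in range(len(x) - m + 1)]


def relu(values):
    """ReLU: positive inputs pass through, nonpositive inputs become zero."""
    return [max(0.0, v) for v in values]


x = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 4.0]  # illustrative 1 x 7 input feature
w = [1.0, 0.0, -1.0]                        # illustrative 1 x 3 convolution kernel
s = conv1d(x, w)   # five output elements, one per kernel position
a = relu(s)        # only the positive responses survive
```

With these values, only one of the five kernel positions gives a positive response, so the activated output is sparse, which is the behaviour the text attributes to ReLU.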
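All-0 filling (zero padding) keeps the size of the output matrix equal to that of the input when the stride is 1, whereas filling that is not all-0 changes the size of the output matrix. Formulas (3) and (4) give the side length of the output matrix with all-0 filling and without all-0 filling, respectively:

$$\mathrm{length}_{\mathrm{out}} = \left\lceil \frac{\mathrm{length}_{\mathrm{in}}}{\mathrm{length}_{\mathrm{stride}}} \right\rceil, \qquad \mathrm{width}_{\mathrm{out}} = \left\lceil \frac{\mathrm{width}_{\mathrm{in}}}{\mathrm{width}_{\mathrm{stride}}} \right\rceil, \tag{3}$$

$$\mathrm{length}_{\mathrm{out}} = \left\lceil \frac{\mathrm{length}_{\mathrm{in}} - \mathrm{length}_{\mathrm{filter}} + 1}{\mathrm{length}_{\mathrm{stride}}} \right\rceil, \qquad \mathrm{width}_{\mathrm{out}} = \left\lceil \frac{\mathrm{width}_{\mathrm{in}} - \mathrm{width}_{\mathrm{filter}} + 1}{\mathrm{width}_{\mathrm{stride}}} \right\rceil. \tag{4}$$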
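As a quick check of the two filling cases above, the following sketch computes the output side length for both. It assumes the ceiling convention commonly used by deep learning frameworks; the function name `output_length` and its arguments are illustrative and not taken from the paper.

```python
import math


def output_length(length_in, length_filter, stride=1, zero_padding=True):
    """Side length of the output after a convolution or pooling step."""
    if zero_padding:
        # All-0 filling: the filter size does not affect the output length.
        return math.ceil(length_in / stride)
    # No all-0 filling: only positions where the filter fits entirely are kept.
    return math.ceil((length_in - length_filter + 1) / stride)


without_padding = output_length(7, 3, stride=1, zero_padding=False)
with_padding = output_length(7, 3, stride=1, zero_padding=True)
```

For a 1 × 7 input and a 1 × 3 kernel with stride 1, the output has 5 elements without all-0 filling and 7 elements with it.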
The working process of the 1DCNN is shown in Figure 4. The figure shows a case in which the input feature size is 1 × 7, the network filling is not all-0 filling, and the size of the convolution kernel of each layer is 1 × 3. The data feature changes after convolution, as shown in the figure.
Mathematical Problems in Engineering
Pooling reduces the amount of calculation and helps prevent overfitting. Pooling adds an infinitely strong prior to the network. Because it is invariant to small translations, pooling can greatly improve the statistical efficiency of the network [40]. The most common pooling operations are average pooling and maximum (max) pooling. In one-dimensional pooling, max pooling takes the largest element in the pool area, while average pooling takes the average value of the elements in the pool area.
The max pooling formula is as follows:

$$p = \max_{x_k \in R} x_k,$$

where R is the corresponding pool area. The average pooling formula is as follows:

$$p = \frac{1}{n} \sum_{k=1}^{n} x_k,$$

where n is the number of elements in the corresponding pool area and $x_k$ is the kth element in the pool area. A schematic diagram of these pooling modes is shown in Figure 5.
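The two pooling modes can be illustrated with a short sketch. It assumes non-overlapping pool areas of two elements; the function name `pool1d`, its parameters, and the sample feature are hypothetical.

```python
def pool1d(x, size=2, stride=2, mode="max"):
    """Apply max or average pooling over consecutive pool areas of a 1D feature."""
    out = []
    for start in range(0, len(x) - size + 1, stride):
        region = x[start:start + size]  # the pool area R
        if mode == "max":
            out.append(max(region))
        else:
            out.append(sum(region) / len(region))
    return out


feature = [1.0, 3.0, 2.0, 8.0, 4.0, 6.0]
max_pooled = pool1d(feature, mode="max")   # largest element of each pool area
avg_pooled = pool1d(feature, mode="avg")   # mean of each pool area
```

Both calls halve the feature length from 6 to 3; they differ only in how each pool area is summarized.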

Output Layer.
The convolved and pooled neurons are flattened into a one-dimensional feature vector, which is the input of the fully connected layer. The activation function of the fully connected layer is also the ReLU function. The output of the fully connected layer is finally used for classification. Transformer fault classification is a multiclass classification problem, so the softmax function is used. Each output of the function is a real number between 0.0 and 1.0, and the outputs of the softmax function sum to 1. The higher the estimated probability of a certain category, the more likely the sample is to be classified into this category. It can achieve better performance than other classifiers. The detailed process is shown in Figure 6. The softmax function is expressed in the following equation:

$$y_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}},$$

where $x_i$ is the ith input feature of the softmax activation function (input $x = (x_1, x_2, \ldots, x_n)$) and $y_i$ is the estimated probability that observation x belongs to the ith class (output $y = (y_1, y_2, \ldots, y_n)$). The softmax loss function corresponding to this model is the cross-entropy loss function:

$$L = -\frac{1}{N} \sum_{i=1}^{N} y_i' \log y_i,$$

where N is the number of training samples, $y_i'$ is the expected output corresponding to the input sample $x_i$, that is, the actual label of the input, and $y_i$ is the corresponding model output.
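The softmax output and the cross-entropy loss described above can be sketched as follows. The sketch assumes one-hot labels and a loss averaged over the samples; the function names and the six illustrative scores are assumptions, not values from the paper.

```python
import math


def softmax(x):
    """Map raw scores to probabilities that sum to 1."""
    shift = max(x)  # subtracting the maximum avoids overflow; the result is unchanged
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, labels):
    """Mean cross-entropy over samples; each label is a one-hot vector."""
    losses = []
    for p, y in zip(probabilities, labels):
        losses.append(-sum(yk * math.log(pk) for pk, yk in zip(p, y) if yk > 0))
    return sum(losses) / len(losses)


scores = [2.0, 1.0, 0.1, 0.0, -1.0, 0.5]  # illustrative scores for six transformer states
probs = softmax(scores)
loss = cross_entropy([probs], [[1, 0, 0, 0, 0, 0]])  # true class is the first state
```

The predicted class is the index of the largest probability, and the loss shrinks as the probability assigned to the true class approaches 1.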

Regularization.
Compared with traditional machine learning, deep learning is more prone to overfitting, so regularization is required to improve the network model. In this way, we can build a model that performs well on the training set and generalizes well. The regularization methods are as follows:
(1) Dropout [41] is a convenient and powerful tool. In each training iteration of the neural network, a certain proportion of neurons is randomly ignored. In this way, each neuron must perform well on its own, which reduces the complex coadaptation between neurons. The effect is best when the sampling probability of the hidden nodes is 0.5, because the number of randomly generated network structures is then the largest. Generally, dropout is used only in the fully connected layer, not in the convolutional layer or pooling layer. A dropout neural network model is shown in Figure 7.
(2) L2 parameter regularization [42] prevents the weight matrix of a network layer from becoming too large and reduces the complexity of the model. Therefore, it can make the model fit more reasonably and improve the interpretability of the model.
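The two regularization methods above can be sketched as follows. The sketch assumes the common "inverted" dropout variant, in which the kept activations are rescaled during training, and an L2 term of the form λ·Σw². The function names and the value of `lam` are hypothetical; the paper does not report them here.

```python
import random


def dropout(activations, keep_prob=0.5, rng=random):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and rescale the rest."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


def l2_penalty(weights, lam=0.01):
    """L2 regularization term lam * sum(w^2) that is added to the loss."""
    return lam * sum(w * w for w in weights)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.2, 1.5, 0.7, 0.9]
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
penalty = l2_penalty([0.5, -1.0, 2.0], lam=0.01)
```

Dropout is applied only during training; at test time all units are kept, which corresponds to `keep_prob=1.0` in this sketch.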

Adaptive Learning Rate Algorithm.
The choice of learning rate algorithm has a significant impact on model optimization. At present, commonly used optimization algorithms include Momentum [43], RMSProp [44], and AdaDelta [45]. In this paper, the network model uses the current mainstream algorithm, Adam. First, in the Adam algorithm, momentum is incorporated directly as an estimate of the first-order moment of the gradient. Second, the Adam algorithm includes bias correction, which makes it more robust and gives it good processing ability for sparse data and noisy sample data. It is also suitable for dealing with nonstationary targets [40].
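A single Adam update, with the first-moment estimate and the bias correction mentioned above, can be sketched as follows. The default values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumptions here; the function name `adam_step` and the toy objective are illustrative only.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; returns the new (theta, m, v)."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g       # first-moment (momentum) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)            # bias correction of both moments
        v_hat = v_i / (1 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Toy example: minimize f(theta) = theta^2, whose gradient is 2 * theta.
first_step, _, _ = adam_step([1.0], [2.0], [0.0], [0.0], t=1, lr=0.05)

theta, m, v = [1.0], [0.0], [0.0]
for t in range(1, 201):
    theta, m, v = adam_step(theta, [2 * theta[0]], m, v, t, lr=0.05)
```

Because of the bias correction, the very first update has a magnitude close to the learning rate regardless of the gradient scale, which is part of what makes the method robust to noisy or sparse gradients.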

Transformer Fault Diagnosis Model Based on a 1DCNN and an Isomeric Feature.
The IF-1DCNN model can fuse not only different feature spaces extracted from the same inspection/monitoring data but also different feature spaces extracted from different inspection/monitoring data. Therefore, any inspection/monitoring data that reflect the operation status of the transformer can be selected. In the former case, for example, the model fuses different feature space information extracted from DGA data that reflects the operation status of the transformer in different aspects, such as a feature space composed of a set of characteristic variables for the dissolved gas content, a feature space composed of a group of characteristic variables for the gas content ratio, and a feature space composed of a group of characteristic variables for the gas production rate. The model is shown in Figure 8. In the latter case, for example, the model fuses different feature spaces composed of different characteristic variable groups that reflect the operation state of the transformer from different aspects, which are extracted from the pulse current detection/monitoring data of the transformer, ultrahigh-frequency partial discharge inspection/monitoring data, ultrasonic partial discharge inspection/monitoring data, etc. In this paper, DGA data are used as the inspection/monitoring data. Two groups of characteristic variables, namely, the content of the characteristic gases and the ratio of the characteristic gas contents, are selected.

Feature Input Selection and Processing.
When a transformer breaks down, it produces CH₄, H₂, C₂H₂, C₂H₄, C₂H₆, CO₂, CO, and N₂ [25]. Considering the current research status of the diagnosis effect, the five key gases CH₄, H₂, C₂H₂, C₂H₄, and C₂H₆ are selected as the research object.
The feature engineering reference value is therefore the sum of the five key gas volumes of a sample, and the ratio features are the relative contents CH₄/S, H₂/S, C₂H₂/S, C₂H₄/S, and C₂H₆/S, where S is the total volume of the five key gases. The contents of the dissolved gases are thus converted into relative contents in the range [0, 1], which can reduce the mutual exclusion between gases and provide different characteristic information. Moreover, to reduce the difference between the content values of different characteristic gases and make the gas content data follow the same distribution, the original DGA data are processed by min-max normalization.
When the data are input into the model, the two kinds of feature data are fused into an isomeric feature. The structure of the isomeric feature is different from that of the mixed feature: the isomeric feature is two-dimensional, while the mixed feature is one-dimensional. Each one-dimensional row of an isomeric feature carries the data of its own feature space.
Thus, the one-dimensional convolutional layer and the pooling layer can learn and extract the characteristics of the dissolved gas content and of the dissolved gas content ratio separately. During one-dimensional convolution and pooling, the data in different feature spaces do not affect each other, so the features of each feature space are extracted independently.
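In summary, the characteristic information of the gas content of the transformer samples is arranged as follows:

$$\begin{bmatrix} c_1^1 & c_1^2 & c_1^3 & c_1^4 & c_1^5 \\ c_2^1 & c_2^2 & c_2^3 & c_2^4 & c_2^5 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ c_N^1 & c_N^2 & c_N^3 & c_N^4 & c_N^5 \end{bmatrix},$$

where N is the number of samples (the same below) and $[c_j^1, c_j^2, c_j^3, c_j^4, c_j^5]$ are the five normalized gas content values of sample j. Among them,

$$c_j^i = \frac{x_j^i - x_{\min}^i}{x_{\max}^i - x_{\min}^i},$$

where $x_j^i$ is the original content value of characteristic gas i in sample j and $x_{\min}^i$ and $x_{\max}^i$ are the minimum and maximum content values, respectively, of gas i in the training samples. In addition, the values of $x_{\min}^i$ and $x_{\max}^i$ obtained from the training set must be saved and used to normalize the validation set samples before testing.
Then, the characteristic information of the gas content ratio of the transformer samples is arranged as follows:

$$\begin{bmatrix} r_1^1 & r_1^2 & r_1^3 & r_1^4 & r_1^5 \\ r_2^1 & r_2^2 & r_2^3 & r_2^4 & r_2^5 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ r_N^1 & r_N^2 & r_N^3 & r_N^4 & r_N^5 \end{bmatrix},$$

where $[r_j^1, r_j^2, r_j^3, r_j^4, r_j^5]$ is the set of the five gas content ratio values for sample j, and each ratio value corresponds to one gas content value.
The two characteristic data sets are stacked to form a 2 × 5 isomeric feature, and the final arrangement of the gas characteristic information of transformer sample j is obtained:

$$\begin{bmatrix} c_j^1 & c_j^2 & c_j^3 & c_j^4 & c_j^5 \\ r_j^1 & r_j^2 & r_j^3 & r_j^4 & r_j^5 \end{bmatrix}.$$

The feature processing procedure is shown in Figure 9.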
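The feature processing described in this section can be sketched as follows: min-max normalization fitted on the training samples, ratios relative to the total of the five key gases, and stacking of the two rows into a 2 × 5 isomeric feature. The function names and the gas content values are hypothetical and serve only as an illustration.

```python
def fit_min_max(training_samples):
    """Per-gas minimum and maximum over the training samples."""
    columns = list(zip(*training_samples))
    return [min(c) for c in columns], [max(c) for c in columns]


def isomeric_feature(sample, mins, maxs):
    """Build the 2 x 5 isomeric feature: normalized contents on top, ratios below."""
    content = [(x - lo) / (hi - lo) for x, lo, hi in zip(sample, mins, maxs)]
    total = sum(sample)  # S, the total volume of the five key gases
    ratio = [x / total for x in sample]
    return [content, ratio]


# Hypothetical gas contents in the order CH4, H2, C2H2, C2H4, C2H6.
train = [
    [10.0, 50.0, 0.0, 5.0, 5.0],
    [40.0, 10.0, 2.0, 25.0, 15.0],
    [20.0, 30.0, 1.0, 15.0, 10.0],
]
mins, maxs = fit_min_max(train)  # saved and reused for validation samples
feature = isomeric_feature(train[2], mins, maxs)
```

The same `mins` and `maxs` would be applied to validation samples, so a validation value outside the training range can fall outside [0, 1]. A practical implementation would also need to guard against a gas whose training minimum equals its maximum and against a sample whose total is zero.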

Fault Output Label and Sample Data.
The output of the network corresponds to the state type of the transformer. In addition to the normal state, the transformer state is divided into five fault types according to the faults that occur in the operation of power transformers: discharges of low energy (D1), discharges of high energy (D2), thermal faults <700°C (T12), thermal faults >700°C (T3), and partial discharges (PD).
In this paper, DGA samples were collected from the recent relevant literature and transformer fault databases to verify the performance of the IF-1DCNN method. The dataset contains 525 samples, of which 428 are used as the neural network training set and 97 are used as the test set.
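The label coding and the 428/97 division of the 525 samples can be sketched as follows. The order of the states, the one-hot coding, and the random shuffle are assumptions for illustration; the paper does not state how the samples were assigned to the two sets.

```python
import random

# Six transformer states; the order is an assumption made for this sketch.
STATES = ["Normal", "D1", "D2", "T12", "T3", "PD"]


def one_hot(state):
    """Code a transformer state as a six-element one-hot vector."""
    return [1 if s == state else 0 for s in STATES]


def split_indices(n_samples=525, n_train=428, seed=0):
    """Shuffle sample indices and split them into training and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices()
t3_label = one_hot("T3")
```

Fixing the seed keeps the split reproducible, which matters when several models are compared on the same training and test sets.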

IF-1DCNN Diagnostic Model Architecture.
The input of the IF-1DCNN model used in this paper consists of the gas content and gas ratio features of the transformer, and the feature dimension of the input layer is 2 × 5. According to the characteristics of the input feature, this paper designs a 1DCNN transformer condition evaluation model with two convolutional layers. Each convolutional layer is connected to a pooling layer, and the two layers are stacked to form a convolutional structure. The diagnosis process of the IF-1DCNN is shown in Figure 10.
The IF-1DCNN is composed of one feature extraction layer and one classification layer. In the feature extraction layer, the first convolutional structure consists of a stack of one convolutional layer (Conv-1) and an average pooling layer (Pooling-1) with a 2 × 1 filter, and the second convolutional structure consists of a stack of one convolutional layer (Conv-2) and an average pooling layer (Pooling-2). Stacking the two convolutional structures makes the model deeper, which helps to acquire good representations of the input signals and improve the performance of the network. The feature extraction layer reduces the 2 × 5 low-level features to 2 × 2 high-level features with the two convolutional layers and then inputs them into the classification layer for classification. Because the 5-dimensional information of a single feature space has already been reduced to 2-dimensional information, more network layers are not needed. The classification layer consists of a fully connected layer and a final output layer. The details of the parameters of the IF-1DCNN used in the experiments are listed in Table 1. In Table 1, the kernel size is noted as D × W × H, where D indicates the channel size of the kernels, W indicates the width of the kernel, and H indicates the height of the kernel.
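The reduction from a 2 × 5 input to 2 × 2 high-level features, with each feature space processed on its own row, can be sketched as follows. Everything specific in this sketch is an assumption, since Table 1 is not reproduced here: a single channel, 1 × 3 kernels with all-0 filling, the kernel values, and average pooling of size 2 in which a short last window is averaged as is.

```python
def conv_same(row, kernel):
    """1D convolution with all-0 filling, so the output length equals the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(row) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(row))]


def avg_pool(row, size=2):
    """Average pooling with stride equal to the pool size; a short last window is averaged as is."""
    return [sum(row[i:i + size]) / len(row[i:i + size]) for i in range(0, len(row), size)]


def extract(feature, kernel1, kernel2):
    """Apply conv -> ReLU -> pool twice to each row of the isomeric feature separately."""
    out = []
    for row in feature:
        h = avg_pool([max(0.0, v) for v in conv_same(row, kernel1)])  # 5 -> 5 -> 3
        h = avg_pool([max(0.0, v) for v in conv_same(h, kernel2)])    # 3 -> 3 -> 2
        out.append(h)
    return out


feature = [[0.3, 0.5, 0.5, 0.5, 0.5], [0.26, 0.39, 0.01, 0.2, 0.14]]  # hypothetical 2 x 5 input
kernels = {"kernel1": [0.2, 0.5, 0.3], "kernel2": [0.4, 0.4, 0.2]}      # hypothetical weights
high_level = extract(feature, **kernels)
# Changing the ratio row leaves the features of the content row untouched.
changed = extract([feature[0], [9.0] * 5], **kernels)
```

Because the kernel spans only one row at a time, the features extracted from the gas content row do not depend on the gas ratio row, which is the property the isomeric feature is meant to provide.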

Application Steps of the 1DCNN Diagnostic Model.
The transformer fault diagnosis model based on a CNN is shown in Figure 11. The specific application steps are as follows:
(1) The gas content and gas content ratio of the transformer are selected as the characteristic parameters of the model.
(2) The fault type of the power transformer is coded.
(3) The feature input parameters are preprocessed.
(4) The sample data are divided into a training set and a validation set.
(5) The 1DCNN fault diagnosis model of the transformer is trained and tested.
2.9. Example Analysis. In this section, the proposed fault diagnosis method is compared with other CNN-based models to verify its effectiveness. The proposed method was implemented in Python and run on a desktop computer with an Intel Core i7-9750H CPU and 16 GB of RAM.

Visualization of the Network Learning Process.
The 1DCNN used in this paper has two CNN modules. Each CNN module has a convolutional layer for feature extraction followed by a pooling layer that further extracts the most important features from the convolutional layer and reduces the feature dimension by half. During the model training and verification steps, the learning rate was set to 0.001, and the activation function was the ReLU function. In total, 500 rounds of training were conducted. The 1DCNN learns from the training samples, and the validation set is input into the model during training for verification.
Because a CNN behaves like a black box, its internal working principle is difficult to explain. Therefore, to investigate the potential mechanism and present the features extracted by each layer of the IF-1DCNN, the t-distributed stochastic neighbor embedding (t-SNE) method was used to visualize the classification effect in 3D space, as shown in Figure 12. Figure 12 shows the data feature distribution of the input layer, convolutional layer 1, convolutional layer 2, the dense layer, and the softmax output layer. In the input layer, the six states of the transformer are hard to tell apart, and the characteristics of most of the raw data samples are mixed with each other. In convolutional layer 2, the feature extraction ability is increased: the distance between features of different types becomes larger, while the distance between features with the same label decreases. As a result, clusters appear, which shows that the CNN can effectively extract information related to the category mapping. After the fully connected layer, features of the same type are gathered together more clearly. Finally, after the softmax function is applied, the features of the same label are clustered into one group. The 3D visualization of the classification results shows that the trained model has good feature extraction and nonlinear mapping abilities.
As seen from Figure 12, a few samples are still misclassified. Some fault types are separated by a threshold on the same index. For example, thermal faults are divided into T12 and T3 according to whether the fault temperature exceeds 700°C, so faults near this threshold are easily misclassified. At the same time, Figure 12 shows that the samples of different faults of the same nature are relatively close, while the samples of faults of different natures are far apart, which is consistent with the theoretical basis.

Comparison of Different Feature Information Processing Methods.
To verify the advantages of the proposed isomeric feature processing method, it is compared with other feature information processing methods. The workflows of the other models are as follows:
F1-1DCNN: the single feature of the gas content of the transformer is taken as the input feature information. Since the input feature is a 1 × 5 feature, the 1DCNN model remains unchanged.
F2-1DCNN [39]: the single feature of the transformer gas ratio is input into the model. Similarly, the 1DCNN model remains unchanged.
F3-1DCNN: a mixed feature, which combines the characteristics of the gas content and gas ratio, is input into the model. Compared with the proposed isomeric feature, this feature is one-dimensional and has no isomeric processing. To keep the model similar to the one in this paper, the size of the first convolution kernel is changed to 1 × 9, the size of the second convolution kernel is changed to 1 × 4, and the other parameters remain unchanged.
The above three network models and the proposed model are trained, and the best model of each is saved. The accuracies of the saved models are compared, and the results are shown in Figure 13.
From Figure 13, comparing the models trained on the two kinds of single feature space data, we find that the 1DCNN model learns classification features better from the gas content ratio features. The accuracy of the network model improves from 74.23% to 82.47% when the input feature is a mixed feature. This finding suggests that it is difficult to improve the accuracy of the 1DCNN model for transformer fault diagnosis with single feature information, whereas the 1DCNN model can extract more information from the mixed features to distinguish the fault types. When the isomeric features are input into the model, the accuracy is further improved to 86.59%, which supports the rationality of the proposed IF-1DCNN model.
To further verify the superiority of the IF-1DCNN model, the average training time per epoch and the test time of each model are compared. The comparison results are shown in Figure 14.
It can be seen from Figure 14 that the first three models behave very similarly: the gaps in average epoch time and test time among the three models are very small. From Figures 13 and 14, the F3-1DCNN model outperforms the F1-1DCNN and F2-1DCNN models, but the average epoch time and test time it requires are also substantially increased. Compared with the F3-1DCNN, the IF-1DCNN model is more accurate, and the time it requires does not increase significantly. This finding shows that the proposed IF-1DCNN can improve the diagnosis ability of the convolutional network for transformers.

Comparison of the Fault Diagnosis Accuracies of the Different Models.
The proposed method and other machine learning models are used for the fault diagnosis of power transformers and compared. Without changing the training set and validation set, the normalized transformer gas content feature (Feature 1), the gas ratio feature (Feature 2), and the mixed feature (Feature 3) combining the gas content and gas ratio features are input into the traditional machine learning models, and each model is then tested on the validation set. The machine learning models are genetic algorithm-extreme gradient boosting (GA-XGBoost) [23], particle swarm optimization-support vector machine (PSO-SVM), backpropagation neural network (BPNN), gradient boosted decision tree (GBDT), and DBN models. GA-XGBoost uses the genetic algorithm to optimize several parameters of the model. PSO-SVM uses the Gaussian radial basis function as the kernel function, and the search range of the kernel function parameters c and G is determined by particle swarm optimization. The hidden-layer structure of the BPNN is (1024-1024-512), the activation function is the ReLU function, the learning rate algorithm is Adam, the learning rate is 0.01, and the number of training cycles is 1000. The number of decision trees in the GBDT model is 100, the depth of each tree is 6, and the learning rate is 0.1. The structure of the DBN model designed in this paper is (256-256), with 50 pretraining epochs and a pretraining learning rate of 0.05; the number of training rounds is 400, the learning rate is 0.1, and the activation function is also the ReLU function [30,31]. Each model was run ten times, and the highest accuracy on the validation set is recorded in Figure 15. Figure 15 shows that the accuracy of each model is higher when it is trained on Feature 2. This shows that classification algorithms often rely on well-designed feature extractors.
The final classification effect is closely related to whether the designed feature extractors describe the classified objects well. Except for the GBDT and PSO-SVM models, the accuracies of the other models are further improved when trained on Feature 3. This shows that the mixed feature information enables most models to learn more fault classification features, which further improves the transformer fault diagnosis accuracy. However, the performance of some models decreases because of the influence of the different data distributions of the different features. From Figure 12 and previous studies, the transformer fault diagnosis accuracy can be effectively improved by establishing the isomeric feature of the transformer gas content and gas ratio and training a 1DCNN model.
To show that this method can overcome the adverse influence of data from different feature spaces and to demonstrate the significance of the double convolutional layer, the isomeric features of the transformer samples are input into the best IF-1DCNN model, and the outputs of the first convolutional layer (conv1-out) and the second convolutional layer (conv2-out) of the model are taken as input features. These input features and the raw data are each input into the GBDT for training and testing, and the accuracy on the validation samples is observed; the results are shown in Figure 16.
From the figure, we can observe that the accuracy of the model is greatly increased when conv1-out is used as the input, and the performance is further improved when the input is conv2-out. Combined with Figure 15, the accuracy in both cases is higher than the highest accuracy of the model under the single feature or mixed feature. This finding shows that the proposed method can effectively reduce the adverse effects of the spatial data distribution of mixed features and improve the accuracy of the model.
The better performance with conv2-out than with conv1-out shows that the double convolutional layer can further extract the isomeric feature information and that the IF-1DCNN model design is reasonable. The conv2-out features and the raw data are also input into the other models for training and testing. For the PSO-SVM model, the dimension of conv2-out is first reduced to three by t-SNE before training. The accuracies on the validation samples are shown in Figure 17. The accuracy of the other models is improved to a certain extent. The results show that the proposed IF-1DCNN model can overcome the adverse influence of data from different feature spaces to a certain extent and thereby improve accuracy.

Conclusion
To improve the transformer fault diagnosis accuracy, this paper proposes a 1DCNN transformer condition diagnosis method based on an isomeric feature. The following conclusions can be drawn from this study:
(1) The characteristic information of transformer gas in different characteristic spaces is processed according to the method in this paper and then input into the model. The convolutional and pooling layers of the 1DCNN are used to process the characteristic information; the experimental results show that this method can improve the performance of the CNN model.
(2) Compared with the normalized gas content data, most models can learn better feature information from the gas content ratio data, which improves the classification effect of the models.
(3) With the transformer fault sample data and the characteristic information of each gas processed as described in this paper, the diagnostic accuracy of the transformer fault diagnosis method based on a 1DCNN is higher than that of the other machine learning methods compared.
(4) The double-convolutional-layer IF-1DCNN proposed in this paper can overcome the adverse influence of data from different feature spaces to a certain extent and thereby improve the transformer fault classification accuracy.
The transformer state diagnosis model proposed in this paper provides a novel idea. The proposed method can be further extended to other power equipment and power system fault diagnosis tasks and has certain application prospects. However, there are many other data closely related to transformer fault types, such as the gas production rate, which is a different characteristic space of the same data category, and the electrical test data, oil temperature data, and frequency response data of transformers, which are characteristic spaces of different data categories. With the continuous improvement of data mining technology, it is possible to obtain more accurate results for transformer fault diagnosis based on multidimensional data.

Data Availability
The DGA samples used to verify the performance of the IF-1DCNN method were collected from the recent relevant literature and transformer fault databases.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.