Principal Component Analysis Based Dynamic Fuzzy Neural Network for Internal Corrosion Rate Prediction of Gas Pipelines

The internal corrosion rate of gas pipelines changes nonlinearly, and artificial neural networks easily fall into local optima. To address these problems, this paper proposes a model that combines a principal component analysis (PCA) algorithm with a dynamic fuzzy neural network (D-FNN). The PCA algorithm is used for dimensionality reduction and feature extraction, and the D-FNN model performs the prediction. The PCA-D-FNN is applied to corrosion data from a real pipeline, and its results are compared with those of artificial neural network, fuzzy neural network, and D-FNN models. The results verify the effectiveness of the model and algorithm for internal corrosion rate prediction.


Introduction
Due to the influence of the medium composition, temperature, terrain, and other factors, corrosive substances are easily produced in steel gas pipelines, which can lead to internal corrosion. Internal corrosion is one of the causes of aging in natural gas pipeline systems. Corrosion thins the inner wall of the pipeline and reduces its structural strength, which can lead to natural gas leakage and seriously threaten the safety, integrity, and economy of the whole gas transmission system [1][2][3]. To prevent these phenomena, some in-line inspection instruments and internal detection instruments have been developed. However, these methods are not only complex but also costly. It has been reported that less than 50% of the existing pipelines worldwide can be inspected with in-line inspection instruments [4]. For small-diameter pipelines, it is difficult to carry out internal detection with commonly used internal detection instruments [5]. Therefore, it is important to establish a reliable prediction model of the internal corrosion rate based on easily measurable parameters in order to study the rules of internal corrosion.
A number of modeling approaches have been used for corrosion rate prediction. Chou et al. compared the prediction accuracy of the carbon steel corrosion rate in marine environments based on an artificial neural network (ANN), support vector machine (SVM), classification and regression tree (CART), linear regression (LR), and hybrid metaheuristic regression models, and the results showed that the hybrid metaheuristic regression model had superior prediction accuracy in this case [6]. Jain et al. proposed a quantitative evaluation model based on Bayesian networks for the external corrosion rate of oil and gas pipelines [7]. Rocabruno-Valdés et al. developed an ANN model with three layers to predict the corrosion rate of metals in different biodiesels [8]. Wen et al. combined support vector regression (SVR) with particle swarm optimization (PSO) to establish a model for predicting the corrosion rate of 3C steel under five different seawater environment factors [9]. Abbas et al. developed neural networks (NN) to predict CO2 corrosion of pipelines at high partial pressures and assessed their suitability for CO2 corrosion rate prediction [10]. Hu et al. combined the design of experiment (DOE) approach with an ANN to discuss the effects of deep-sea environmental factors on the corrosion behavior of Ni-Cr-Mo-V high strength steel [11]. Although these works have shown great advantages and potential in solving highly nonlinear problems, the models have shortcomings, such as poor fuzzy logic inference ability when completing the "black box" nonlinear mapping from the input to the output. Improving the applicability, robustness, and generalizability of the prediction model is still the most important target.
Fuzzy logic, introduced by Zadeh, has three features: modeling of nonlinear processes by using IF-THEN rules, employing linguistic variables instead of or in addition to numerical variables, and using approximate reasoning algorithms to formulate complex relationships [12,13]. A model that combines neural networks with fuzzy logic systems was first proposed by Takagi and Hayashi [14]. This model realizes fuzzy reasoning so that the weights of a traditional neural network, which have no explicit physical meaning, take on the physical meaning of the reasoning parameters in fuzzy logic, which describes the relationship between the input and output more accurately [15]. In research on fuzzy neural networks for corrosion rate prediction, Biezma et al. proposed a method based on a fuzzy neural network (FNN) to predict the corrosion rate of buried pipelines with limited detection data, considering the factors that affect soil corrosion [16]. Najjaran et al. proposed two FNN models with different numbers of inputs to describe the soil corrosiveness of buried pipelines [17].
In existing studies, the FNN only learns and optimizes the parameters of the fuzzy system and adapts them within a preset neural network structure, which is time consuming and leads to low-accuracy structure identification [18][19][20]. For neural networks, the core indicator of model performance is the generalization ability, which is mainly affected by the choice of structure. Too few nodes result in large learning errors, while too many nodes cause overfitting or overtraining, which directly degrades the generalization ability of the network. To overcome these problems, this paper proposes an internal corrosion rate forecasting model based on the dynamic fuzzy neural network (D-FNN). The prediction performance of the model is largely determined by the correlation between its input and output data. In addition to temperature, pH, and flow rate, the internal corrosion rate is affected by oxygen content, pressure, and other factors. Consequently, we introduce these additional factors to improve the prediction. However, if all of these factors are used directly as inputs to the D-FNN, their redundant information will make the prediction inaccurate. A common way to deal with this problem is principal component analysis (PCA), a statistical method used for dimensionality reduction and feature extraction. This method is particularly suitable for situations in which the factors are highly interrelated [21][22][23][24]. Therefore, this paper proposes a method in which the inputs of the D-FNN are screened by a PCA algorithm, called PCA-D-FNN, to forecast the internal corrosion rate. The model generates fuzzy rules during the dynamic learning process, so the number of rules does not grow exponentially with the number of input variables, which improves the generalization ability of the network [25][26][27].
This paper is structured as follows. In Section 2, the basic concepts of PCA are described. In Section 3, the D-FNN modeling method is introduced in detail and the proposed hybrid model is described. In Section 4, an application of the proposed method is presented. Finally, the conclusions are stated in Section 5.

The Method of Principal Component Analysis
PCA is a common multivariate statistical method used for feature extraction and dimensionality reduction. The method uses a linear projection to map high-dimensional data to a low-dimensional space in which the variance of the projected data is maximized, so that fewer dimensions are used while most of the original information is retained [28][29][30]. Compared with a univariate approach, this exploratory method makes it possible to analyze all the variables acting on a process together and to isolate only the relevant information, minimizing redundant data [31]. Many factors influence corrosion in pipelines, and the relationships among them are complex. Therefore, the PCA algorithm can effectively screen corrosion factors, reduce unnecessary analysis, and provide reasonable initial values for the subsequent construction of the D-FNN.
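Assume there is a $P$-dimensional random vector $X = (X_1, X_2, \ldots, X_P)^{T}$. The elements of the sample array are first normalized to obtain a new matrix $Z$, from which the correlation matrix $R$ is formed. The eigenvalues $\lambda_j$ of $R$ are obtained from the characteristic equation $|R - \lambda_j E| = 0$, where $E$ is the identity matrix. The principal components are then selected by the formula
$$G(k) = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{P} \lambda_j},$$
where the eigenvalues are sorted in descending order and $G$ is the cumulative contribution rate of the first $k$ components; the components are retained up to the smallest $k$ for which $G \geq 85\%$. Finally, the principal component loadings are calculated for the retained components.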
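The selection rule can be illustrated with a minimal Python sketch. It assumes that the eigenvalues of R have already been computed and are passed in as a list; the function names `standardize` and `select_components` and the example eigenvalues are hypothetical, chosen only for illustration, and are not taken from the data of this study.

```python
import statistics


def standardize(column):
    """Z-score one variable: zero mean and unit sample standard deviation."""
    mean = statistics.mean(column)
    std = statistics.stdev(column)  # sample standard deviation (n - 1)
    return [(value - mean) / std for value in column]


def select_components(eigenvalues, threshold=0.85):
    """Return the smallest k whose cumulative contribution rate reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues, for illustration only (not the values behind Table 1).
example_eigenvalues = [4.0, 2.0, 1.5, 0.8, 0.4, 0.3]
print(select_components(example_eigenvalues))
```

The 85% threshold is the one stated in the text; other thresholds can be passed through the `threshold` argument.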

The Structure of the D-FNN.
The D-FNN combines the advantages of fuzzy systems and neural networks. The D-FNN is based on extended radial basis function (RBF) neural networks, and its essence is a fuzzy system based on the Takagi-Sugeno-Kang (TSK) model [32].
The D-FNN model consists of five layers of information processing units: the input layer, fuzzification layer, fuzzy reasoning layer, defuzzification layer, and output layer. These layers are described in detail below. The topological structure is shown in Figure 1.
Layer 1 (input layer): $x_1, x_2, \ldots, x_m$ represent the input variables, where $m$ is the number of input variables.
Layer 2 (membership function layer): each node represents a membership function, which is taken to be a Gaussian function:
$$\mu_{ij}(x_i) = \exp\left[-\frac{(x_i - c_{ij})^2}{\sigma_j^2}\right], \quad i = 1, 2, \ldots, m, \; j = 1, 2, \ldots, n,$$
where $\mu_{ij}$ is the $j$th membership function of the input variable $x_i$, $n$ is the number of membership functions, and $c_{ij}$ and $\sigma_j$ are the center and width of the $j$th membership function of the input variable $x_i$, respectively.
Layer 3 (T-norm layer): each node represents a fuzzy rule and is equivalent to the IF part of a possible fuzzy rule. Therefore, the number of nodes in this layer also reflects the total number of fuzzy rules of the system, and the output of the $j$th rule is as follows:
$$\varphi_j = \exp\left[-\frac{\sum_{i=1}^{m} (x_i - c_{ij})^2}{\sigma_j^2}\right] = \exp\left[-\frac{\|X - C_j\|^2}{\sigma_j^2}\right], \quad j = 1, 2, \ldots, n,$$
where $X = (x_1, x_2, \ldots, x_m)^{T}$ and $C_j = (c_{1j}, c_{2j}, \ldots, c_{mj})^{T}$ is the center of the $j$th RBF unit. The RBF unit is used in this layer because the structure of an RBF network can be adjusted adaptively during the training phase according to the specific scenario and does not have to be determined before training. This structure simulates the local adjustment and interaction characteristics of the human brain and improves the approximation ability of the system. This kind of neural network is well suited to the corrosion rate prediction model in this paper and avoids the complicated problems of membership function selection, rule selection, and weight distribution.
Layer 4 (defuzzification layer, also known as the normalized layer): this layer performs a normalization, and the number of nodes in this layer is equal to the number of fuzzy rules. The output of the $j$th node is
$$\psi_j = \frac{\varphi_j}{\sum_{k=1}^{n} \varphi_k}, \quad j = 1, 2, \ldots, n.$$
Layer 5 (output layer): each node in this layer represents an output variable, and the output is the weighted sum of all the input signals:
$$y(X) = \sum_{j=1}^{n} w_j \psi_j,$$
where $y$ is the output variable, $w_j$ is the THEN part, or connection weight, of the $j$th rule, and $n$ is the total number of fuzzy rules. The weights have a linear structure and can be expressed as follows:
$$w_j = a_{j0} + a_{j1} x_1 + \cdots + a_{jm} x_m, \quad j = 1, 2, \ldots, n,$$
where $a_{j0}, a_{j1}, \ldots, a_{jm}$ are real-valued parameters.
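The five layers can be sketched as a single forward pass. The following Python fragment is a minimal illustration, not the authors' implementation: it assumes Gaussian RBF units of the form exp(-||X - C_j||^2 / sigma_j^2) and TSK weights that are linear in the inputs, and the function name `dfnn_forward` and its argument layout are hypothetical.

```python
import math


def dfnn_forward(x, centers, widths, coeffs):
    """Forward pass of a TSK-type fuzzy network with Gaussian RBF rules.

    x       : list of m inputs
    centers : n lists of m centers (one list per rule)
    widths  : n widths (one per rule)
    coeffs  : n lists [a_j0, a_j1, ..., a_jm] (one list per rule)
    """
    # Layers 2 and 3: firing strength of each rule (product of Gaussian memberships).
    phi = []
    for c_j, sigma_j in zip(centers, widths):
        sq_dist = sum((x_i - c_ij) ** 2 for x_i, c_ij in zip(x, c_j))
        phi.append(math.exp(-sq_dist / sigma_j ** 2))

    # Layer 4: normalization of the firing strengths.
    total = sum(phi)
    if total == 0.0:
        raise ValueError("input is too far from every rule center")
    psi = [p / total for p in phi]

    # Layer 5: weighted sum with input-dependent (TSK) weights.
    weights = [a[0] + sum(a_k * x_k for a_k, x_k in zip(a[1:], x)) for a in coeffs]
    return sum(w * p for w, p in zip(weights, psi))


# Two symmetric rules with constant weights 1 and 3, evaluated midway between them.
print(dfnn_forward([0.0], [[-1.0], [1.0]], [1.0, 1.0], [[1.0, 0.0], [3.0, 0.0]]))
```

In the usage line, both rules fire equally, so the output is the average of the two rule weights.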

Learning Optimization Process of D-FNN.
The structure of the D-FNN is not preset but is built up gradually during the learning process. Therefore, the learning algorithm of the system mainly includes the generation of fuzzy rules, the determination of the premise parameters, the determination of the weights, and the pruning of rules, so as to achieve the specific performance required of the system [34,35].

The Generation of Fuzzy Rules.
Determining the structure of the network is one of the main purposes of the training algorithm. Whether a new rule is added depends mainly on two indicators: the accommodating boundary and the system error. The accommodating boundary characterizes the coverage of a membership function; together, the existing membership functions partition the entire input space. Therefore, if a new sample falls within the coverage of an existing Gaussian membership function, it can be represented by that function, and there is no need to add a new rule or RBF unit to accommodate it. The criterion for adding rules based on the accommodating boundary is as follows.
For the $i$th observed data $(X_i, t_i)$, the distance between the input vector $X_i$ and the center $C_j$ of each existing RBF unit is
$$d_i(j) = \|X_i - C_j\|, \quad j = 1, 2, \ldots, n_c,$$
where $n_c$ is the number of current fuzzy rules. Define $k_d$ as the effective radius of the accommodating boundary; if $d_{\min} = \min_j d_i(j) > k_d$, then no existing Gaussian function can represent the new sample, and a new fuzzy rule should be added.
Figure 1: The architecture of a dynamic fuzzy neural network [33].

Mathematical Problems in Engineering
In addition to the accommodating boundary, the system error needs to be considered. Too many rules add unnecessary complexity, and too few rules leave the system unable to fit the data; both worsen the system performance and reduce its generalization ability. Thus, the system error is a vital factor in deciding whether to add new rules.
For the $i$th observed data $(X_i, t_i)$, where $X_i$ is the input vector and $t_i$ is the expected output, we define the output of the D-FNN as $y_i$, and the system error is
$$\|e_i\| = \|t_i - y_i\|.$$
When $\|e_i\| > k_e$, a new fuzzy rule should be considered.
The thresholds $k_d$ and $k_e$ are not fixed during training; as learning proceeds, they gradually decrease, and local detailed learning is performed. They are defined as follows:
$$k_d = \max\left[d_{\max} \gamma^{i},\; d_{\min}\right], \qquad k_e = \max\left[e_{\max} \beta^{i},\; e_{\min}\right],$$
where $d_{\max}$ is the largest length of the input space, $d_{\min}$ is the smallest length expected in the experiment, $\gamma$ $(0 < \gamma < 1)$ is the attenuation coefficient, $\beta$ $(0 < \beta < 1)$ is the convergence coefficient, $e_{\max}$ is the predetermined maximum error of the system, and $e_{\min}$ is the expected accuracy of the system. The width of the RBF unit affects the generalization ability of the system. Therefore, the center and width of the RBF unit of a newly generated rule need to be set appropriately. They are assigned as follows:
$$C_{n_c+1} = X_i, \qquad \sigma_{n_c+1} = k \, d_{\min},$$
where $k$ is the overlap factor and $d_{\min}$ here denotes the minimum distance between $X_i$ and the existing centers. When the first sample $(X_1, t_1)$ is obtained, the network has not yet been established, so the first fuzzy rule is set to
$$C_1 = X_1, \qquad \sigma_1 = \sigma_0,$$
where $\sigma_0$ is the predetermined constant.
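The rule-generation test can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: it assumes the decay schedule commonly used for D-FNNs, in which each threshold is the larger of a geometrically decaying value and a floor, and it assumes that a rule is added only when both the distance criterion and the error criterion hold (the text states the two criteria separately). The names `thresholds` and `needs_new_rule` are hypothetical; `gamma` stands for the attenuation coefficient and `beta` for the convergence coefficient.

```python
import math


def thresholds(i, d_max, d_min, gamma, e_max, e_min, beta):
    """Decaying thresholds for the i-th sample (assumed geometric decay with a floor)."""
    k_d = max(d_max * gamma ** i, d_min)
    k_e = max(e_max * beta ** i, e_min)
    return k_d, k_e


def needs_new_rule(x, t, y, centers, k_d, k_e):
    """True if the sample is outside every accommodating boundary and the error is large."""
    nearest = min(math.dist(x, c) for c in centers)
    return nearest > k_d and abs(t - y) > k_e


# Parameter values as reported for the experiments in this paper.
k_d, k_e = thresholds(1, d_max=4.0, d_min=0.2, gamma=0.955, e_max=1.1, e_min=0.02, beta=0.5)
print(needs_new_rule([3.0, 4.0], t=2.0, y=1.0, centers=[[0.0, 0.0]], k_d=k_d, k_e=k_e))
```

Early in training the thresholds are large, so only samples far from all existing rules trigger a new rule; later the floors `d_min` and `e_min` take over and finer rules can be added.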

Generation of Weights.
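Assuming that $p$ observed data generate $n$ fuzzy rules, the outputs of the $n$ normalized nodes for the $p$ data are collected, according to the rule generation criterion, as
$$\begin{bmatrix} \psi_{11} & \cdots & \psi_{1p} \\ \vdots & \ddots & \vdots \\ \psi_{n1} & \cdots & \psi_{np} \end{bmatrix},$$
where $\psi_{jf}$ is the output of the $j$th normalized node for the $f$th sample. For any input $(x_{f1}, x_{f2}, \ldots, x_{fm})^{T}$, the system output $y_f$ can be represented as
$$y_f = \sum_{j=1}^{n} w_j \psi_{jf} = \sum_{j=1}^{n} \left(a_{j0} + a_{j1} x_{f1} + \cdots + a_{jm} x_{fm}\right) \psi_{jf}.$$
Rewriting this output in matrix form gives
$$Y = W \Psi, \qquad W = \left[a_{10} \cdots a_{n0} \;\; a_{11} \cdots a_{n1} \;\; \cdots \;\; a_{1m} \cdots a_{nm}\right],$$
where the $f$th column of $\Psi$ is $(\psi_{1f}, \ldots, \psi_{nf}, \psi_{1f} x_{f1}, \ldots, \psi_{nf} x_{f1}, \ldots, \psi_{1f} x_{fm}, \ldots, \psi_{nf} x_{fm})^{T}$. The relationship between the expected output $T = (t_1, t_2, t_3, \ldots, t_p)$ and $\Psi$ is
$$T = W \Psi + E,$$
where $E$ is the error vector. The goal is to find the parameter vector $W$ that minimizes the squared error $\|E\|^2$. The recursive least squares algorithm is used to solve this problem:
$$S_i = S_{i-1} - \frac{S_{i-1} \psi_i \psi_i^{T} S_{i-1}}{1 + \psi_i^{T} S_{i-1} \psi_i}, \qquad W_i = W_{i-1} + \left(t_i - W_{i-1} \psi_i\right) \psi_i^{T} S_i, \quad i = 1, 2, \ldots, p,$$
where $S_i$ is the error covariance matrix for the $i$th training data, $\psi_i$ is the $i$th column of $\Psi$, and $W_i$ is the weight vector obtained after the $i$th iteration. The initial parameters are set as $W_0 = 0$ and $S_0 = \chi I$, where $\chi$ is a sufficiently large positive number and $I$ is the identity matrix.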
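The weight estimation step can be illustrated with a small recursive least squares routine. The sketch below is written in plain Python under the assumption that the standard recursive least squares update is intended (the initialization with zero weights and a large multiple of the identity matrix follows the text); the function name `rls_update` and the toy data are hypothetical.

```python
def rls_update(W, S, psi, t):
    """One recursive least squares step for a single regressor column psi and target t.

    W : list of current weights, S : covariance matrix as a list of rows (symmetric).
    Returns the updated (W, S).
    """
    n = len(psi)
    # S * psi (S is symmetric, so this also serves as psi^T * S).
    s_psi = [sum(S[r][c] * psi[c] for c in range(n)) for r in range(n)]
    denom = 1.0 + sum(psi[r] * s_psi[r] for r in range(n))
    # Covariance update: S - (S psi psi^T S) / (1 + psi^T S psi).
    S_new = [[S[r][c] - s_psi[r] * s_psi[c] / denom for c in range(n)] for r in range(n)]
    # Prediction error with the previous weights.
    err = t - sum(W[r] * psi[r] for r in range(n))
    # Gain S_new * psi, then weight update.
    gain = [sum(S_new[r][c] * psi[c] for c in range(n)) for r in range(n)]
    W_new = [W[r] + gain[r] * err for r in range(n)]
    return W_new, S_new


# Toy data generated by t = 2 * psi[0] + 3 * psi[1] (hypothetical, for illustration).
W = [0.0, 0.0]
S = [[1e6, 0.0], [0.0, 1e6]]  # chi = 1e6, a "sufficiently large" positive number
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0)]
for psi, t in samples:
    W, S = rls_update(W, S, psi, t)
print(W)
```

Processing the samples one at a time avoids forming and inverting the full normal equations, which suits a network whose regressor matrix grows as rules are added.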

Pruning Algorithm.
In this paper, we prune the fuzzy rules in the third layer by using the error reduction ratio (ERR). This algorithm decomposes the output of the fourth layer into an orthogonal basis matrix and an upper triangular matrix by QR decomposition. Then, the ERR is calculated from the orthogonal basis matrix. Using the pruning algorithm, significant neurons are selected so that a parsimonious structure with high performance can be achieved [36]. The value $\eta_i$ reflects the importance of the $i$th fuzzy rule: the larger $\eta_i$ is, the greater the influence of the corresponding RBF unit on the entire network, and the smaller $\eta_i$ is, the smaller its influence. If $\eta_i < k_{err}$, where $k_{err}$ is the preset threshold value, the $i$th rule is deleted.
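The pruning criterion can be sketched as follows. The text does not give the formulas, so this Python fragment assumes the formulation commonly used for D-FNNs: the regressor columns are orthogonalized (here by Gram-Schmidt, which spans the same orthogonal basis as a QR decomposition), the ERR of each column is its squared projection onto the target divided by the product of the squared norms, and the significance of a rule is the root mean square of the ERR values of its own columns. The column ordering (all constant terms first, then the terms of each input) and the names `error_reduction_ratios` and `rule_significance` are assumptions for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def error_reduction_ratios(columns, target):
    """Orthogonalize the regressor columns in order and return one ERR per column."""
    basis, ratios = [], []
    tt = dot(target, target)
    for h in columns:
        w = list(h)
        for q in basis:
            qq = dot(q, q)
            if qq > 0.0:
                coef = dot(q, h) / qq
                w = [w_i - coef * q_i for w_i, q_i in zip(w, q)]
        basis.append(w)
        ww = dot(w, w)
        ratios.append(dot(w, target) ** 2 / (ww * tt) if ww > 0.0 else 0.0)
    return ratios


def rule_significance(ratios, n_rules, n_inputs):
    """Root mean square of the ERR values that belong to each rule."""
    eta = []
    for j in range(n_rules):
        own = [ratios[k * n_rules + j] for k in range(n_inputs + 1)]
        eta.append(math.sqrt(sum(r * r for r in own) / (n_inputs + 1)))
    return eta


# Hypothetical regressors for 2 rules and 1 input; the target depends on rule 1 only.
cols = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0], [2.0, 1.0, 0.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
target = [3.0, 6.0, 9.0, 12.0]
eta = rule_significance(error_reduction_ratios(cols, target), n_rules=2, n_inputs=1)
keep = [value >= 0.0015 for value in eta]  # k_err = 0.0015 as reported in the experiments
print(keep)
```

Note that the ERR values depend on the order in which the columns are orthogonalized, so the ordering convention should be fixed before rules are compared.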

Proposed Hybrid Model.
The proposed hybrid model inherits the merits of the independent models and improves the internal corrosion rate prediction compared with previous models. The algorithm consists of two parts: PCA and the D-FNN. The flow chart of the PCA-D-FNN is shown in Figure 2. The PCA method is used to analyze the input variables, and a few principal components that represent most of the information are extracted, which reduces the input dimension of the model and improves the prediction accuracy. The D-FNN, with a compact structure and high performance, is used as the prediction model for the internal corrosion rate. The input variables of the D-FNN are screened by PCA, and the model generates fuzzy rules during the dynamic learning process, so the number of rules does not grow exponentially with the number of input variables, and the model therefore has lower computational complexity.

Dataset.
Natural gas should be purified to remove impurities, such as H2O and H2S, before entering the pipeline. However, it is difficult to remove these impurities completely. Therefore, the inner wall of the pipeline will be corroded during long-term operation or under special working conditions. Corrosion in pipelines is affected by many factors, and the process is complex. Qiao et al. used computational fluid dynamics (CFD) simulation analysis to conclude that the solid particles in the natural gas flow were the main cause of corrosion in the elbow of the gas pipeline [37]. Pfennig et al. found that the presence of CO2 had a greater corrosive effect on steel pipes at high temperatures (40°C-60°C) [38]. Mansoori et al. used scanning electron microscopy (SEM) and X-ray diffraction (XRD) to characterize the corrosion products near the damaged part of a gathering pipeline and concluded that calcium carbonate easily precipitated on the inner surface of the pipeline when the Ca2+ concentration and pH value were high [39]. Javidi et al. concluded that pH, temperature, flow rate, CO2, corrosion products, and H2S had a great influence on the corrosion of gas pipelines [40]. To develop a prediction model of the internal corrosion rate, a total of 9 variables (CO2 content, H2S content, Cl− content, moisture content, pH, flow rate, temperature, pressure, and oxygen content) are chosen according to the experience of field workers. The corrosion rate is obtained from an online monitoring system, which is shown in Figure 3. The nine natural gas parameters measured in the 34 samples are introduced into the following models.

The PCA Method.
The PCA method is used to analyze the above features, and a few principal components that represent most of the information are extracted, which reduces the input dimension of the model and improves the prediction accuracy. Using the PCA algorithm described in Section 2, the nine natural gas parameters measured in the 34 samples were analyzed, and the results are shown in Table 1. Table 1 shows that the cumulative contribution rate of the first four principal components is 86.62%, which covers most of the internal corrosion information. Among the variables, H2S content has a high loading on the first principal component, CO2 content on the second, moisture content on the third, and flow rate on the fourth. Therefore, we choose H2S content, CO2 content, moisture content, and flow rate as the inputs of the D-FNN prediction model.

PCA-D-FNN Parameter Setting and Result Analysis.
In this paper, the D-FNN model is established to predict the internal corrosion rate of the pipeline. The model has four input nodes, which are screened out by the PCA algorithm. Of the 34 pairs of input and output data used in this research, 24 pairs form the training dataset and the remaining 10 pairs form the test dataset. The target precision of the model is set to 0.05: training terminates when the training error falls below 0.05 or when the number of iterations reaches the maximum of 80. The initial parameters of the D-FNN are $d_{\max} = 4$, $d_{\min} = 0.2$, $\gamma = 0.955$ (the attenuation coefficient), $e_{\max} = 1.1$, $e_{\min} = 0.02$, $\beta = 0.5$, $\sigma_0 = 1.1$, $k = 1.1$, $k_d = 1$, and $k_{err} = 0.0015$. The D-FNN generates 6 rules, and its mean square error gradually decreases during training, which indicates that the structure of the network is basically stable. The D-FNN is trained with the 24 training samples, and the network converges after 20 iterations. The results are shown in Figure 4.
To study the prediction accuracy of the proposed model, the root mean square error (RMSE), the mean absolute percentage error (MAPE), and Theil's inequality coefficient (TIC) are employed to evaluate the model performance in this paper. The RMSE measures the difference between the predicted values and the actual values, the MAPE is a commonly accepted relative error metric, and the TIC indicates the level of agreement between the studied process and the proposed model [41]. The calculation methods are defined in Table 2. The ANN, FNN, and D-FNN models are chosen for comparison with the PCA-D-FNN model. In the comparative experiment, all models were trained on the 24 training pairs, with the remaining 10 pairs used as the test dataset. The ANN consists of four input nodes, one hidden layer with seven nodes, and one output layer, and its transfer function is tansig. In the FNN experiment, each of the four input nodes has four grades of membership of the fuzzy sets, the membership generation layer consists of 16 nodes, and the model becomes stable after generating 6 rules. The D-FNN has nine input nodes, and its remaining calculation steps are the same as those of the PCA-D-FNN. The target training error was set to 0.05, and the prediction results of the different models on the test dataset are shown in Table 3. The leave-one-out cross-validation (LOOCV) method is applied to all models to investigate the generalization ability of each algorithm, and the RMSE and MAPE are used to characterize the LOOCV results. The results are shown in Table 4.
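For reference, the three metrics can be computed as in the following Python sketch. Table 2 is not reproduced here, so the sketch assumes the standard definitions: the RMSE is the square root of the mean squared error, the MAPE is the mean of the absolute relative errors expressed as a percentage, and the TIC is the RMSE divided by the sum of the root mean squares of the actual and predicted series. The function names and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def tic(actual, predicted):
    """Theil's inequality coefficient (0 means a perfect fit)."""
    n = len(actual)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_predicted = math.sqrt(sum(p * p for p in predicted) / n)
    return rmse(actual, predicted) / (rms_actual + rms_predicted)


# Hypothetical values, for illustration only (not the measured corrosion rates).
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
print(rmse(actual, predicted), mape(actual, predicted), tic(actual, predicted))
```

Because the MAPE divides by the actual values, it is undefined when an observed corrosion rate is exactly zero; the RMSE and TIC do not have this restriction.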
From Table 3 [...] between the internal corrosion rate and the influencing factors. Because the ANN is a pure neural network model, its fitting ability and generalization stability are inferior to those of the PCA-D-FNN, D-FNN, and FNN when dealing with small samples. The FNN model first needs to convert the variables into grades of membership of the fuzzy sets, a task that depends heavily on the researcher's experience, which affects the accuracy of the model. The D-FNN model without input dimension reduction has too many input variables, which can harm the generalizability of the model. The PCA-D-FNN shows advantages in modeling the internal corrosion rate sample sets: it improves the robustness and generalizability of the model and achieves a more accurate result. The computation times of the ANN, FNN, D-FNN, and PCA-D-FNN models are 1.923 s, 2.341 s, 2.571 s, and 1.621 s, respectively. The proposed method can be used for internal corrosion rate prediction of gas pipelines.

Conclusion
The internal corrosion rate of gas pipelines is affected by many factors, and the reliability of a pipeline is greatly affected by internal corrosion. Accurate forecasting of the internal corrosion rate is therefore especially important, and a hybrid model called the PCA-D-FNN is proposed in this paper. PCA is an effective method for extracting features and reducing the dimension of the original sample, and four factors, accounting for 86.62% of the original information, are extracted. Then, the D-FNN is used to conduct the prediction; it combines the advantages of fuzzy rules and ANNs to overcome the drawbacks of the single methods.
This method generates fuzzy rules in the dynamic learning process, so the number of rules does not grow exponentially with the number of input variables, which improves the generalization ability of the network. The experimental results on the collected corrosion data demonstrate the effectiveness of the hybrid model. In a comparison with the ANN, FNN, and D-FNN models, the PCA-D-FNN model predicts the internal corrosion rate with an RMSE of 0.4232, an MAPE of 5.91%, and a TIC of 0.2352 on the test dataset, which is more accurate than the other models. The LOOCV results of the different models also show that the PCA-D-FNN performs better than the other algorithms. The PCA-D-FNN also obtains the best forecasting performance with a fast convergence rate and a high ability to search for global optima. Therefore, the proposed model demonstrates great potential in applications concerned with the internal corrosion rate of pipelines.

Data Availability
The data used to support the findings of this study have not been made available because they are currently under embargo while the research findings are commercialized. Requests for data, 10 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.