Graph Feature Fusion-Driven Fault Diagnosis of Complex Process Industrial System Based on Multivariate Heterogeneous Data

The stable operation of process industrial systems, which integrate a variety of complex equipment, is a prerequisite for production and therefore requires condition monitoring and fault diagnosis of the system. Recently, the continuous development of deep learning (DL) has advanced research on intelligent diagnosis in process industry systems.


Introduction
With the advent of the Industry 4.0 era, industrial processes are becoming increasingly sophisticated and complex, which places higher demands on the intelligent control of complex systems [1]. To achieve stable production, maximum economic profit, and green energy saving, it is increasingly important to develop an efficient and reliable health monitoring system for complex systems [2,3]. Building such a system depends on a large number of sensing and monitoring devices of many types, and the most important task in a health monitoring system is fault detection and diagnosis (FDD) [4]. As a critical task, FDD has attracted increasing attention from researchers in recent years.
Machine learning (ML)-based methods [5][6][7] and deep learning (DL)-based methods [8][9][10] achieve good results in FDD tasks for process industry systems because of their excellent nonlinear data fitting capabilities [11]. However, these methods are challenged by the large number of heterogeneous sensors in complex process industry systems. The resulting multisource data are coupled in high dimensions and cause a curse of dimensionality, which weakens DL models [12]. To reduce this influence, feature fusion methods have been applied to FDD tasks in complex process industries. For example, Ye et al. constructed a feature fusion model by dividing multiple sensors into categories and processing each category separately [13]. Xu et al., considering both the internal correlation of and the distribution gap between different signals, proposed a hybrid fusion network model to improve diagnostic accuracy [14]. However, these methods ignore prior knowledge, so the resulting feature fusion models are not closely tied to the characteristics of complex process industry systems.
In recent years, the wave of deep learning has made it possible to combine data analysis tools with process knowledge to build robust and scalable models of process industry systems [15]. For example, Venkatasubramanian added mechanistic constraints based on first-principles knowledge of the process industry to purely data-driven models [16]. Bikmukhametov and Jäschke combined machine learning with process engineering physics to enhance the accuracy and explainability of data-driven models [17]. From the perspective of physical mechanisms, Ni et al. proposed a physics-informed residual network (PIResNet) that mines the machine's underlying physical characteristics from measured data [18]. However, most of these methods use prior knowledge to constrain established models rather than to build models directly. Meanwhile, manually selecting prior knowledge-based constraints also reduces the convenience of these models. As an alternative, the edge connections between nodes in graph-structured data can express the prior knowledge of the system.
Graph-based approaches are a type of deep learning model that converts data into graph structures in non-Euclidean space, and they have drawn much attention from researchers recently [19]. Graph data contain not only value-based information but also the structural and topological information hidden in the raw data, which is determined by the process knowledge of the system [20]. For example, Man et al. constructed a graph structure from the sensor layout of high-speed rail rotating machinery to mine the latent relationships between sensor signals [21]. Liu et al. divided time series signals into nodes and created a graph structure based on the similarity between nodes to address fault diagnosis under imbalanced samples [22][23][24]. Zhang et al. mined state representations of operating units from complex time-varying operating condition information [25]. To mine fault features from the constructed graph data, the graph convolutional network (GCN), which extends the convolution operation to the spectral domain, has been used successfully in many fields [26]. These studies also shed light on fault diagnosis in the process industry domain: the rich prior knowledge and spatial information of process industries provide the basis for graph-driven approaches.
In process industry FDD tasks, graph-based approaches are beginning to attract attention. There is a strong correlation between the upstream and downstream units of a process industry system, which means that a signal fluctuation in one link quickly spreads to the entire system [27]. As the production state of the system changes, the relationships between the sensing signals also change. To capture these relationships, Chen et al. used different edge connection functions to construct graphs that capture fault representation information [4]. Zhang and Yu designed a pruning GCN model that reduces noise in graphs composed on the basis of signal similarity [28]. These GCN-based models show the validity of graph theory in process industry FDD tasks, but shortcomings remain. Directly converting multisource heterogeneous monitoring data into a single graph couples data with different functions in high dimensions and reduces the quality of graph representation learning. Meanwhile, models built purely from monitoring data ignore the physical spatial information of the system, which is important process knowledge. Depending on the importance of each process, different numbers of sensors are arranged in different links. These heterogeneous sensors describe the same changing process from multiple perspectives, so it is necessary to analyze them jointly.
To solve the above problems, the original monitoring data are divided into two types and converted into graph-structured data by combining them with system process knowledge. Specifically, the first type is reaction process monitoring parameters, which record the working condition information of each working unit of the system; together, these data describe the production state of the system. This type of data is converted into physical space graphs (PSGs) by integrating the physical spatial layout of the system sensors. The second type is sampling quality index parameters, which directly determine the quality of the reaction products and are affected by the mechanisms of upstream and downstream processes. Pretrained networks are used to extract higher-order features from these data to capture the mechanistic knowledge flowing through the system, and the data are converted into process knowledge graphs (PKGs). Therefore, this paper explores fault diagnosis of complex industrial systems based on fused PSGs and PKGs. The major contributions are summarized as follows:

Preliminary
Graph Representation.
Mathematically, both undirected and directed graphs are denoted as $G = \{V, E, A, F\}$, as shown in Figure 1. $V = \{V_i\}$ represents the node set, which is formed from the measurements of the complex system. $E$ represents the edge connections between these nodes. $A = \{a_{i,j}\}$, $a_{i,j} \in \{0, 1\}$, is the adjacency matrix, where $a_{i,j} = 0$ indicates that there is no edge between node $i$ and node $j$, and $a_{i,j} = 1$ indicates that there is an edge. $F$ is the feature matrix composed of the feature vectors of all nodes.
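To make the notation concrete, the sketch below builds the adjacency matrix $A$ of a small toy graph from an edge list and stacks node features row-wise into $F$. The edge list, the feature values, and the helper name `adjacency_from_edges` are illustrative assumptions, not part of the original method.

```python
import numpy as np


def adjacency_from_edges(num_nodes, edges, directed=False):
    """Return A with a_ij = 1 if there is an edge from node i to node j, else 0."""
    A = np.zeros((num_nodes, num_nodes), dtype=int)
    for i, j in edges:
        A[i, j] = 1
        if not directed:
            A[j, i] = 1  # an undirected edge is stored in both directions
    return A


# Hypothetical toy graph: 4 nodes on a path, each with a 3-dimensional feature vector.
edges = [(0, 1), (1, 2), (2, 3)]
A = adjacency_from_edges(4, edges)
A_directed = adjacency_from_edges(4, edges, directed=True)
F = np.arange(12, dtype=float).reshape(4, 3)  # row i holds the features of node i

print(A)
```

For an undirected graph $A$ is symmetric; for a directed graph only the listed direction is set.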

Spectral Graph Convolution.
Traditional convolution operations cannot be applied to graph domains, so graph convolution theory has been proposed. As the basis of graph convolution theory, the normalized Laplacian matrix is used for feature representation and is defined as follows:
$$L = I_n - D^{-1/2} W D^{-1/2} \tag{1}$$
where $L \in \mathbb{R}^{N\times N}$ represents the Laplacian matrix, $I_n \in \mathbb{R}^{N\times N}$ represents the identity matrix, $D \in \mathbb{R}^{N\times N}$ denotes the degree matrix, and $W \in \mathbb{R}^{N\times N}$ denotes the weight matrix. On this basis, filtering in the eigenbasis of the Laplacian matrix realizes convolution on a graph signal $X \in \mathbb{R}^{N\times S}$ [29]:
$$Y = g_\theta(L) X = U g_\theta(\Lambda) U^{\top} X \tag{2}$$
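where $U \in \mathbb{R}^{N\times N}$ is the matrix of eigenvectors of the Laplacian matrix, $L = U \Lambda U^{\top}$, $\Lambda$ is the diagonal matrix of eigenvalues, $Y \in \mathbb{R}^{N\times S}$ is the output of the filter, and $g_\theta(\Lambda)$ is a filter parameterized by $\theta \in \mathbb{R}^n$. Because computing equation (2) requires a full eigendecomposition of $L$, which is expensive for large graphs, the Chebyshev convolution was formulated [30]:
$$g_\theta(\Lambda) \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{\Lambda}) \tag{3}$$
where $\Lambda$ is rescaled as $\tilde{\Lambda} = 2\Lambda/\lambda_{\max} - I_n$, $\lambda_{\max}$ denotes the largest element of $\Lambda$, $K$ is the order of the Chebyshev polynomials, $\theta_k$ is the $k$-th Chebyshev coefficient, and $T_k(\cdot)$ is the Chebyshev polynomial defined recursively in equation (4):
$$T_0(x) = 1, \quad T_1(x) = x, \quad T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x) \tag{4}$$
$T_k(\tilde{\Lambda})$ applies this polynomial to the diagonal elements of $\tilde{\Lambda}$.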
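The following NumPy sketch checks equations (1) to (4) numerically on a hypothetical 4-node path graph: it forms the normalized Laplacian, applies a spectral filter through the eigendecomposition, and confirms that a Chebyshev polynomial in the rescaled Laplacian $\tilde{L} = 2L/\lambda_{\max} - I_n$ gives the same operator as applying that polynomial to the rescaled eigenvalues. The graph, the coefficients $\theta_k$, and the function names are assumptions for illustration. Here $\lambda_{\max}$ is taken from the eigendecomposition only for the check; practical implementations often approximate it instead (for the normalized Laplacian, all eigenvalues lie in $[0, 2]$).

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval


def normalized_laplacian(W):
    """Equation (1): L = I_n - D^{-1/2} W D^{-1/2} for a symmetric weight matrix W."""
    deg = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5  # isolated nodes keep a zero entry
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def spectral_filter(L, X, g):
    """Equation (2): Y = U g(Lambda) U^T X, with g applied to each eigenvalue."""
    lam, U = np.linalg.eigh(L)  # L is symmetric, so its eigenpairs are real
    return U @ np.diag(g(lam)) @ U.T @ X


def chebyshev_basis(L_tilde, K):
    """Equation (4) applied to matrices: [T_0(L~), ..., T_{K-1}(L~)]."""
    T = [np.eye(len(L_tilde)), L_tilde]
    for _ in range(2, K):
        T.append(2 * L_tilde @ T[-1] - T[-2])
    return T[:K]


# Hypothetical 4-node path graph and random graph signal.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 2))  # N = 4 nodes, S = 2 channels

L = normalized_laplacian(W)
lam, U = np.linalg.eigh(L)
lam_max = lam.max()
L_tilde = 2 * L / lam_max - np.eye(4)

theta = np.array([0.5, -0.3, 0.2])  # hypothetical coefficients, K = 3
# Polynomial in L~: needs only matrix products once lam_max is known ...
poly_op = sum(t * Tk for t, Tk in zip(theta, chebyshev_basis(L_tilde, len(theta))))
# ... and equals the same polynomial applied to the rescaled eigenvalues (equation (3)).
spectral_op = U @ np.diag(chebval(2 * lam / lam_max - 1, theta)) @ U.T

Y_poly = poly_op @ X
Y_spec = spectral_filter(L, X, lambda ev: chebval(2 * ev / lam_max - 1, theta))
```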
The Chebyshev graph convolution derived from the above equations is defined as follows:
$$\mathrm{Cheb}(\tilde{L}, X) = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) X, \quad \tilde{L} = \frac{2L}{\lambda_{\max}} - I_n \tag{5}$$
A trainable weight matrix $W \in \mathbb{R}^{S\times M}$ is introduced to transform the feature matrix [31]. The output of the GCN layer, $X' \in \mathbb{R}^{N\times M}$, is:
$$X' = \mathrm{Cheb}(\tilde{L}, X)\, W \tag{6}$$
where $\mathrm{Cheb}(\cdot, \cdot)$ is the Chebyshev graph convolution and $W$ is the trainable weight matrix.
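Equations (5) and (6) can be sketched as a single forward pass. The recursion of equation (4) is applied to $X$ directly, so only products of $\tilde{L}$ with $N \times S$ matrices are needed. This sketch follows the formulation above, with scalar coefficients $\theta_k$ and one shared weight matrix $W$; library layers such as PyTorch Geometric's `ChebConv` instead learn a separate weight matrix per order. The graph, shapes, and random parameters are hypothetical, and no bias or nonlinearity is included.

```python
import numpy as np


def cheb_conv(X, L_tilde, theta, W):
    """X' = (sum_k theta_k T_k(L~) X) W, following equations (5) and (6)."""
    Tx_prev, Tx = X, L_tilde @ X          # T_0(L~) X and T_1(L~) X
    out = theta[0] * Tx_prev
    if len(theta) > 1:
        out = out + theta[1] * Tx
    for k in range(2, len(theta)):
        # T_k X = 2 L~ T_{k-1} X - T_{k-2} X
        Tx_prev, Tx = Tx, 2 * L_tilde @ Tx - Tx_prev
        out = out + theta[k] * Tx
    return out @ W


rng = np.random.default_rng(1)
N, S, M = 5, 3, 2
A = np.zeros((N, N))
for i in range(N - 1):                    # hypothetical path graph, no isolated nodes
    A[i, i + 1] = A[i + 1, i] = 1.0
d = A.sum(axis=1) ** -0.5
L = np.eye(N) - d[:, None] * A * d[None, :]
L_tilde = 2 * L / np.linalg.eigvalsh(L).max() - np.eye(N)

X = rng.normal(size=(N, S))
W = rng.normal(size=(S, M))               # trainable in practice; random here
X_out = cheb_conv(X, L_tilde, theta=[0.4, 0.3, -0.2], W=W)
print(X_out.shape)  # (5, 2)
```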

Proposed Method
The prior knowledge of a complex process industrial system includes its physical spatial layout and its reaction mechanism. The physical spatial layout reflects how key process controls are monitored, and the reaction mechanism determines how the correlations between signals change. On this basis, this study extracts prior knowledge from these two perspectives to construct the graph model.

Physical Space Graph.
In this section, physical space graphs (PSGs) are constructed from the sensor layout in the complex industrial system, as shown in Figure 2.

PSG Construction.
Because of their different system designs and sensor layouts, each chemical system has unique spatial information. To capture this spatial information, the physical spatial sensor layout of the system is transformed into a graph structure named the physical space graph (PSG). This information explicitly expresses the relationships between nodes and provides a basis for mining spatial information and fault representations.
Shock and Vibration
To express the sensor layout of the chemical system concretely, the relative position of each sensor is mapped directly from the real system into a coordinate system, which gives the spatial coordinates of each unit. On this basis, sensors are treated as nodes, and edge connections are established between nodes that are close to each other. Because densely placed sensors often describe the same important process from different perspectives, such as pressure, temperature, and power, analyzing them jointly is valuable. Each node is connected to its $k$ closest nodes, and the distance between nodes is calculated with the Euclidean distance formula:
$$D(v_i, v_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{7}$$
where $v_i(x_i, y_i)$ and $v_j(x_j, y_j)$ are two nodes in the coordinate system, and $D(v_i, v_j)$ is the Euclidean distance between the $i$-th node and the $j$-th node.
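A minimal sketch of the PSG edge construction, assuming the sensor coordinates are already available as an $(n, 2)$ array: each node is linked to its $k$ nearest nodes under equation (7). Two choices here are assumptions not stated in the text: ties are broken by index order through a stable sort, and edges are made symmetric (undirected), so a node can end up with more than $k$ neighbors. The coordinates in the example are hypothetical.

```python
import numpy as np


def psg_knn_adjacency(coords, k):
    """Connect every sensor node to its k nearest nodes by Euclidean distance."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # D(v_i, v_j), equation (7)
    np.fill_diagonal(dist, np.inf)                  # no self-loops
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    A = np.zeros(dist.shape, dtype=int)
    A[np.repeat(np.arange(len(coords)), k), nearest.ravel()] = 1
    return np.maximum(A, A.T)                       # assumption: symmetrise (undirected PSG)


# Hypothetical layout: three sensors around one unit, two around a distant unit.
coords = [(0, 0), (1, 0), (0, 2), (10, 10), (11, 10)]
A = psg_knn_adjacency(coords, k=1)
print(A)
```

With $k = 1$ the three nearby sensors and the two distant sensors form two separate groups, illustrating how a small $k$ keeps aggregation within one process section.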

Node Embedding.
With the edge connections between the nodes determined, node features must be embedded in each node. Data normalization, equation (8), is applied to the original measurement data $X = (x_1, x_2, \ldots, x_h)$, $x_i \in \mathbb{R}^n$, where $n$ is the number of sensors, yielding the normalized data $X_{nor}$. Subsequently, the unsupervised PCA algorithm is applied to $X_{nor}$ to reduce noise:
$$X_{pca} = \mathrm{PCA}(X_{nor}) \tag{9}$$
The denoised high-dimensional sensor data are then separated into $n$ sensor sequences. A sliding window of length $m$ divides these monitoring sequences into $s$ segments, giving the node set $L = (l_1, l_2, \ldots, l_{n \cdot s})$, $l_i \in \mathbb{R}^m$. Each time series segment of length $m$ is translated into one graph with the edge connections determined above.
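The node-embedding pipeline can be sketched in NumPy as below. Several details are assumptions because the text does not specify them: normalization (equation (8)) is taken to be per-sensor z-score standardization; PCA denoising is implemented by projecting onto the leading principal components and mapping back to the $n$ sensor dimensions, so the data can still be split into $n$ sensor sequences; and the windows do not overlap. The data are random placeholders.

```python
import numpy as np


def zscore(X):
    """Per-sensor standardisation of an (h, n) array (assumed form of equation (8))."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                     # guard against constant sensors
    return (X - X.mean(axis=0)) / std


def pca_denoise(X, n_components):
    """Keep the leading principal components and reconstruct in sensor space."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:n_components].T                 # (n, n_components) principal axes
    return (X - mean) @ V @ V.T + mean


def window_graph_nodes(X, m):
    """Cut (h, n) data into s non-overlapping windows -> array of shape (s, n, m).

    Each window becomes one PSG; its n nodes are the sensors, and node i
    carries the length-m segment of sensor i.
    """
    s = X.shape[0] // m
    return X[: s * m].reshape(s, m, X.shape[1]).transpose(0, 2, 1)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))               # h = 100 time steps, n = 6 sensors (placeholder)
X_nor = zscore(X)
X_pca = pca_denoise(X_nor, n_components=3)
nodes = window_graph_nodes(X_pca, m=20)
print(nodes.shape)  # (5, 6, 20): s = 5 graphs, n = 6 nodes each, segments of length 20
```

Keeping all $n$ components reproduces the input exactly, so `n_components` controls how much variance is discarded as noise.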

Process Knowledge Graph.
In this section, process knowledge graphs (PKGs) are constructed to capture the relationships between nodes that are determined by the reaction mechanism of the complex industrial system. The proposed process is depicted in Figure 3.

Data Segments as Nodes.
The $n$-dimensional monitoring signals output by the sensor units of a chemical system are divided into segments $X_{seg} = (x_1, x_2, \ldots, x_m)$, $X_{seg} \in \mathbb{R}^{m\times n}$, of length $m$, according to the characteristics of the production line and the practical needs of production. Each segment serves as one node. Note that, in this subtask, all of the input data together form only one graph.
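In contrast to the PSGs, each segment $X_{seg} \in \mathbb{R}^{m \times n}$ here becomes one node of a single graph. A minimal sketch, assuming non-overlapping segments and dropping any incomplete trailing segment (both assumptions); flattening each segment into one row is shown as one possible input format for a feature extractor such as LDA, not a detail taken from the text.

```python
import numpy as np


def segment_as_nodes(X, m):
    """Split an (h, n) record into non-overlapping segments of shape (m, n).

    Returns (segments, flat): segments has shape (s, m, n); flat has shape
    (s, m * n), one row per PKG node.
    """
    s = X.shape[0] // m
    segments = X[: s * m].reshape(s, m, X.shape[1])
    return segments, segments.reshape(s, -1)


X = np.arange(30, dtype=float).reshape(10, 3)   # placeholder: h = 10 steps, n = 3 sensors
segments, flat = segment_as_nodes(X, m=4)
print(segments.shape, flat.shape)  # (2, 4, 3) (2, 12)
```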

PKG Construction.
The monitoring signals are converted into process knowledge graphs (PKGs) under the assumption that the internal relationships between variables reflect the reaction mechanism and that a GCN can effectively mine fault representation information from these relationships. Data normalization is applied to the segmented multivariate sensor data $X = (x_1, x_2, \ldots, x_h)$, $x_i \in \mathbb{R}^{m\times n}$, and the normalized data $X_{nor}$ are calculated as in equation (8). Subsequently, the supervised linear discriminant analysis (LDA) algorithm is applied to $X_{nor}$, and the dimension-reduced data serve as the node features:
$$F_0 = \mathrm{LDA}(X_{nor}) \tag{10}$$
where $\mathrm{LDA}(\cdot)$ is the linear discriminant analysis algorithm. Specifically, the associated signals are converted into a graph by calculating the feature similarity between samples and connecting each sample to its $k$ nearest samples. Because the high-dimensional monitoring signals of the system degrade the reliability of the Euclidean distance, the Mahalanobis distance is used to calculate the sample distance [26]:
$$D_M(v_i, v_j) = \sqrt{(v_i - v_j)^{\top} \Sigma^{-1} (v_i - v_j)} \tag{11}$$
where $v_i$ and $v_j$ are the $i$-th and $j$-th nodes, respectively, $\Sigma$ is the covariance matrix of the multidimensional variables, and $\Sigma^{-1}$ is its inverse. At this point, the original graph is constructed, and the edge connections between nodes provide the relationships between signals that the graph neural network uses to capture details. However, because of the nature of the KNN algorithm, edge connections are computed independently for each node, so one node may acquire many edge connections. As the numbers of nodes and edges grow, the graph becomes bloated, which greatly increases the computational burden and degrades diagnostic performance. At the same time, because of noise, redundant edges established between dissimilar nodes provide false representations. To simplify the original graph while retaining truly valuable information, high-level node features are extracted by a pretrained GCN layer, a procedure named PKMP. The steps are as follows:
Step 1: The original graph is fed into the GCN, which is trained with the cross-entropy loss function.
Step 2: The higher-level features obtained from training are regarded as new node features, and the nearest-neighbor graph is constructed again. The reconstructed graph $G_{temp}$ has the same number of nodes as the original graph, but with new node features and edge connections.
The parameter $M$ is selected during training according to the characteristics of the data. Specifically, when five further reconstructions fail to improve the result, the total number of PKG reconstructions performed is taken as $M$. Unlike a PSG, which corresponds to a single time-series segment, the PKG corresponds to all of the input data, with each node corresponding to one time-series segment.
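The initial PKG of equations (10) and (11) can be sketched as follows: a Fisher LDA projection written directly in NumPy, followed by a $k$-nearest-neighbor graph under the Mahalanobis distance, with the covariance estimated from the node features themselves. The helper names, the pseudo-inverses used for numerical safety, the edge symmetrization, and the toy two-class data are assumptions; for brevity the sketch fits LDA on all labels, whereas a diagnosis setting would use training labels only.

```python
import numpy as np


def lda_fit_transform(X, y, n_components):
    """Fisher LDA: project X onto the leading eigenvectors of pinv(S_w) S_b."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)                  # within-class scatter
        diff = (mc - mean_all)[:, None]
        S_b += len(Xc) * diff @ diff.T                  # between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(evals.real)[::-1][:n_components]
    return X @ evecs[:, order].real


def mahalanobis_knn_adjacency(F, k):
    """k-NN graph on node features F using the distance of equation (11)."""
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(F, rowvar=False)))
    diff = F[:, None, :] - F[None, :, :]
    sq = np.einsum("ijp,pq,ijq->ij", diff, cov_inv, diff)
    dist = np.sqrt(np.clip(sq, 0.0, None))              # clip tiny negative round-off
    np.fill_diagonal(dist, np.inf)                      # no self-loops
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    A = np.zeros(dist.shape, dtype=int)
    A[np.repeat(np.arange(len(F)), k), nearest.ravel()] = 1
    return np.maximum(A, A.T)                           # assumption: undirected graph


rng = np.random.default_rng(0)
X_nor = np.vstack([rng.normal(0.0, 1.0, (20, 4)),      # toy class-0 segments (flattened)
                   rng.normal(3.0, 1.0, (20, 4))])     # toy class-1 segments
y = np.repeat([0, 1], 20)
F0 = lda_fit_transform(X_nor, y, n_components=1)
A0 = mahalanobis_knn_adjacency(F0, k=3)
same_class = y[:, None] == y[None, :]
print(A0[same_class].sum() / A0.sum())                 # share of edges joining same-class nodes
```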

Multichannel Graph Feature Fusion Model.
With the two types of subgraphs obtained, a multichannel graph feature fusion (MCGFF) model is designed. Because the two subgraphs pose tasks at different levels, a graph-level GCN and a node-level GCN are used for representation learning on the PSGs and the PKG, respectively. On this basis, the learned subgraph representations are weighted and fused through an attention mechanism.
The attention mechanism allows the model to focus on important representational information by assigning the raw data a unique attention vector. Its successful application has made the attention mechanism a classic enabling tool in deep learning [32]. Specifically, dynamic weights are used to reinforce important information while weakening useless information, as described in the following equation:
$$a_i = \mathrm{softmax}\big(s(h_i, q)\big) = \frac{\exp\big(s(h_i, q)\big)}{\sum_{j} \exp\big(s(h_j, q)\big)} \tag{12}$$
where $a_i$ is the attention distribution coefficient of the input vector $h_i$, $h$ is the original input vector, $q$ is the query vector, and $s(h_i, q)$ is the scoring function, defined as follows:
$$s(h_i, q) = v^{\top} \tanh(W h_i + U q) \tag{13}$$
where $W$ and $U$ are learnable parameter matrices, $v$ is a learnable parameter vector, and $\tanh(\cdot)$ is the hyperbolic tangent function.
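Equations (12) and (13) can be sketched for the two channel representations (PSG and PKG) as below. The fused global-graph representation is assumed here to be the attention-weighted sum $\sum_i a_i h_i$, which the text implies ("weighted and fused") but does not write out. All dimensions and the random parameters are hypothetical.

```python
import numpy as np


def attention_fuse(H, q, W, U, v):
    """Additive attention over channel vectors H (one row per channel).

    s(h_i, q) = v^T tanh(W h_i + U q); a = softmax(s); fused = sum_i a_i h_i.
    """
    scores = np.tanh(H @ W.T + U @ q) @ v      # one score per channel, equation (13)
    scores = scores - scores.max()             # stabilise the softmax
    a = np.exp(scores) / np.exp(scores).sum()  # equation (12)
    return a, a @ H                            # assumed fusion: weighted sum of channels


rng = np.random.default_rng(0)
d, d_q, d_a = 8, 4, 6
H = rng.normal(size=(2, d))    # row 0: PSG-channel vector, row 1: PKG-channel vector
q = rng.normal(size=d_q)
W, U, v = rng.normal(size=(d_a, d)), rng.normal(size=(d_a, d_q)), rng.normal(size=d_a)
a, fused = attention_fuse(H, q, W, U, v)
print(a.round(3), fused.shape)
```

If $v$ is all zeros every channel receives the same score, so the weights fall back to a uniform average; learning $v$, $W$, and $U$ is what lets the model favor the more informative subgraph.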
After subgraph feature fusion, the fused global-graph representation is used for fault diagnosis. The overall flowchart of the proposed MCGFF diagnosis framework is shown in Figure 4, and the algorithm is summarized in Algorithm 1.

Case Study
Two public datasets of process industrial systems, the Tennessee Eastman process (TEP) and the fed-batch fermentation penicillin process (FBFP), were used for experimental verification. All algorithms were written in Python 3.8 with PyTorch and run on a server with an NVIDIA GeForce RTX 3060 GPU and 16 GB of RAM.

Data Description.
As a classic chemical process simulation system [33], TEP is widely used in research on process system condition monitoring [34]. The system generates a total of 41 monitored variables with nonlinear relationships and produces monitoring sequences at three-minute intervals. To make the data analysis more representative, only monitoring data from the stable production stage were used in this study. The schematic of TEP is shown in Figure 5.
The experiment was carried out in mode 1 of the TEP system, with all twelve manually controlled variables in their initial states. Ten production statuses were simulated, comprising the normal status and nine different process disturbances that represent typical malfunctions in real practice. In addition to six disturbances with known variables and occurrence types, three unknown disturbances composed of two random disturbances at random times were set up to increase the complexity of the data; the details are shown in Table 1.
Each simulated batch lasted 48 hours with a sampling interval of 3 min, generating a total of 960 records. Upper and lower limits of reasonable operation were set for each device in the simulation model; once a limit is breached, the reaction stops to protect the system. Among all the introduced fault types, the feed A loss fault triggers such a shutdown.
One hour of data was segmented as one sample, and all samples from the abnormal and normal states were then mixed. Of these, 70% were randomly taken as the training set and the remaining 30% as the testing set.
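The sample bookkeeping above can be checked in a few lines: 48 h at a 3-min interval gives 960 records per batch, and a one-hour sample holds 20 records, so each batch yields 48 samples. The split function is a minimal sketch of a random 70/30 split; the random data, the seed, the function name, and rounding the training size are placeholders or assumptions.

```python
import numpy as np

HOURS, INTERVAL_MIN, N_VARS = 48, 3, 41
records_per_batch = HOURS * 60 // INTERVAL_MIN               # 960
records_per_sample = 60 // INTERVAL_MIN                      # 20 records = one hour
samples_per_batch = records_per_batch // records_per_sample  # 48


def random_split(samples, labels, train_frac=0.7, seed=0):
    """Shuffle samples and split them into training and testing sets."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    n_train = int(round(train_frac * len(samples)))
    tr, te = idx[:n_train], idx[n_train:]
    return samples[tr], labels[tr], samples[te], labels[te]


batch = np.random.default_rng(0).normal(size=(records_per_batch, N_VARS))  # placeholder data
samples = batch.reshape(samples_per_batch, records_per_sample, N_VARS)
labels = np.zeros(samples_per_batch, dtype=int)               # e.g. all "normal"
X_tr, y_tr, X_te, y_te = random_split(samples, labels)
print(len(X_tr), len(X_te))  # 34 14
```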

Result Analysis.
Comparison experiments based on classical data-driven methods were set up to verify the effectiveness of the proposed MCGFF-driven method. The statistical learning methods include PCA [5], LDA [35], and PCA + LDA [36]. The deep learning-based classification methods include CNN [8] and standard GCN [37]. The details of the proposed MCGFF model are shown in Table 2, and the initial learning rate was set to 0.01. All models were tested 10 times and their average accuracy was analyzed, as shown in Figure 6.
As shown in Figure 6, because of the nonlinear characteristics of complex process industry systems, the classical statistical learning methods (PCA, LDA, and PCA + LDA) cannot achieve good accuracy. The classical deep learning method CNN has some capability to capture nonlinearity, but it struggles to diagnose faults accurately because of the mutual coupling of multisource heterogeneous data. The original GCN improves diagnostic capability, which suggests that representing signals as graph data can enhance the relevant features. The graph spatial structure obtained by signal conversion gives the process system unique state representations, but the problem of data coupling persists, which limits the diagnostic capability of the model.
Compared with these models, MCGFF achieved the highest diagnostic accuracy of 94.33%, an improvement of 8.69% over the traditional GCN, which verifies the effectiveness of the proposed method. This effect stems from three aspects: (1) the heterogeneous monitoring signals and sampled data in the system were classified to reduce the risk of model confusion; (2) connections were established between multisource heterogeneous signals describing the same process, so that the state of key processes can be analyzed jointly; and (3) the flow of reaction knowledge in the process system was captured by the pretrained network, and higher-order features of the internal relationships of the process flow were used to represent the system state. In summary, the proposed MCGFF-based method performs accurate fault diagnosis for process industrial systems and effectively improves the accuracy of the GCN-based method.

Data Description.
The main body of the FBFP [38] experimental unit is a fermenter that performs the fermentation task of continuous production. Two proportional-integral-derivative (PID) control loops regulate production indices in the tank, such as temperature, acidity, and hot and cold flow. The whole production process can be divided into three sequential stages, namely, the cell growth stage, the penicillin synthesis stage, and the cell autolysis stage. The schematic diagram of this process is shown in Figure 7.
PSG construction algorithm: (1) Obtain the normalized signal $X_{nor}$; (2) Calculate the feature data: … (5) Obtain the set of the $k$ closest neighbor nodes of node … Establish the edge connections for every node; (7) …
The penicillin fermentation process system contains eight manually controlled variables, two automatically controlled variables, and eleven types of monitoring data. Furthermore, five process disturbances are introduced into different batches; the disturbance details are listed in Table 3 and plotted in Figure 8. In this experiment, each batch lasted 230 h, and the sampling interval of the monitoring unit was 12 min; thus, each batch contained 1150 original samples. Each disturbance was simulated in 10 batches, and a total of 11,500 anomalous samples were obtained. These abnormal samples were mixed with 100 batches of healthy samples, resulting in a total of 126,500 samples for experimental analysis.
PKG construction algorithm: … (5) Establish the edge connections for every node; (6) Obtain the original graph $G_0$ and the original feature matrix $F_0$; (7) for $i = 1, 2, \ldots, M$: Train the GCN model for $M$ epochs: …
To avoid possible information leakage, the samples were masked, and the data were then divided into training and test sets at a 7:3 ratio.
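The FBFP sample counts stated above follow from the batch length and the sampling interval. The arithmetic, together with an integer 7:3 split of the total (integer division is an assumption about how the split is rounded), is:

```python
BATCH_HOURS, INTERVAL_MIN = 230, 12

samples_per_batch = BATCH_HOURS * 60 // INTERVAL_MIN   # 1150 samples per batch
anomalous = 10 * samples_per_batch                     # 10 batches -> 11,500 anomalous samples, as stated
healthy = 100 * samples_per_batch                      # 100 healthy batches -> 115,000
total = healthy + anomalous                            # 126,500

n_train = total * 7 // 10                              # integer 7:3 split
n_test = total - n_train
print(samples_per_batch, anomalous, total, n_train, n_test)  # 1150 11500 126500 88550 37950
```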

Result Analysis.
The same comparative experiments as in the TEP case (Case I) were carried out.

The detailed settings of the proposed method were the same as in Table 2, except that the output size was set to 6 and the Chebyshev order to $K = 3$. The learning rate was set to 0.01, and the experimental results are shown in Figure 9.
The statistical learning models still performed worse than the deep learning models. However, because the FBFP system is less complex than the TEP system, the baseline models achieved better results than on the TEP system. The original GCN model achieved a diagnostic accuracy of 89.43%, which suggests that the system has a relatively simple spatial structure. On this basis, MCGFF improved the diagnostic accuracy to 92.46%. The smaller improvement is presumably due to the smaller number of sensors in the FBFP and its simpler spatial layout. In addition, the improvement in model performance is closely related to how appropriately the graph model aggregates node information; the most appropriate composition range and number of convolutional nodes need to be optimized according to the characteristics of each system's data. Nevertheless, the MCGFF model still achieves the best results, which verifies the validity of the proposed model.

Discussion
For graph-driven deep learning models, the quality of the constructed graph is the primary factor that determines model performance. Therefore, three factors affecting the effectiveness of the proposed method are discussed: the spatial aggregation range, the number of reconstructions, and the convolutional parameter.

Impact of Spatial Aggregation Range.
Depending on the spatial layout of each chemical process system, the number of neighbor nodes $k$ aggregated in the PSG is an important parameter. An appropriate $k$ ensures that nodes from the same process are included without aggregating nodes that are too far away. To this end, ten trials with different values of $k$ ($k = 2, \ldots, 7$) were conducted on both datasets, and the average results are shown in Figure 10.
As shown in Figure 10, the classification accuracy peaked at $k = 4$ on both datasets. This may be because the number of sensors in a critical step is usually 4; a larger $k$ leads to the aggregation of unrelated sensors and thus reduced accuracy. On the TEP dataset, increasing $k$ causes a significant drop in accuracy because TEP contains different reaction sections that differ substantially from one another. In contrast, the sensors in FBFP monitor a single reaction process, so extending the aggregation range does not cause a large loss of accuracy. Therefore, $k$ for the PSG was set to 4 on both datasets.

Impact of Reconstruction times.
The best choice of the PKMP parameter $M$ depends on the data characteristics. $M$ determines how many times the GCN layer extracts higher-level node features and reconstructs the graph. A small $M$ leaves more noise in the graph, whereas a large $M$ slows training and weakens the model. In this experiment, $M$ was varied from 1 (which degrades PKMP into an ordinary GCN) to 9, with ten comparison tests on each dataset; the results are shown in Table 4.
The results on the TEP dataset show that, although the accuracy fluctuation decreases when $M > 6$, the computational cost increases greatly. This is because proper reconstruction reduces noise, but excessive reconstruction also introduces new noise, which limits further improvement. The same pattern appears on the FBFP dataset, with the accuracy peaking at $M = 4$, presumably because simpler systems contain less noise and therefore require fewer reconstructions. To balance accuracy and computational efficiency, $M$ was set to 6 and 4 for the TEP and FBFP datasets, respectively.

Impact of Convolutional Parameter.
For ChebGCN, the choice of the Chebyshev polynomial order $K$ is particularly critical, because it determines the $K$-order neighborhood from which the convolution aggregates information for each node. Too small a $K$ limits the network's ability to mine information from graph-structured data, whereas too large a $K$ greatly increases the consumption of computing resources and introduces noise. To fully exploit the potential of the model, experiments were designed to determine the optimal $K$; the results are listed in Table 5.
As shown in Table 5, the accuracy on the TEP dataset keeps increasing as $K$ goes from 2 to 5 and stays at about 93% for $K > 5$. A similar pattern appears on the FBFP dataset, with the accuracy peaking at $K = 4$. Considering that every increase in $K$ consumes considerably more computing resources, $K$ was set to 5 and 3 for the TEP and FBFP datasets, respectively.

Conclusion
In this article, a graph feature fusion-driven fault diagnosis method for complex process industry systems based on multivariate heterogeneous data is proposed. First, the sensor layout of the process industrial system is transformed into a graph structure by distance measurement, establishing connections between multisource heterogeneous signals that describe the same process. Then, the process knowledge graph is established from the similarity between signals and refined by pretrained GCN layers. Furthermore, the multichannel graph feature fusion (MCGFF) model is proposed to mine fault representations from the two different subgraphs and fuse the subgraph features into global-graph features through an attention mechanism for fault diagnosis. Two publicly available chemical process datasets validate the effectiveness of the proposed method. However, in this research all neighbors of a node are treated as equally important, which is not realistic in practice. The monitoring of key processes often involves primary and auxiliary signals, which means that the edge connections between nodes should be weighted. To further improve the quality of the constructed graph, the information contained in edge connections needs to be investigated in future work. In addition, how to develop a general knowledge extraction and learning framework for different complex systems remains to be studied.

Figure 1: Undirected graph and the corresponding matrix representation.

Step 3: The reconstructed graph $G_{temp}$ is fed into the GCN for training again as a new graph, and the temporary reconstructed graph is updated from $G^{1}_{temp}$ to $G^{2}_{temp}$.
Step 4: Steps 2 and 3 are repeated $M$ times; the final output $G^{M}_{temp} = G_{PK}$ is the refined process knowledge graph.
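Steps 1 to 4, together with the stopping rule for $M$ described earlier (stop once five further reconstructions bring no improvement), can be sketched as a generic loop. Training the GCN and reading out its higher-level features is abstracted as a user-supplied `refine` callable, graph construction as `build_graph`, and validation performance as `evaluate`; these names, the `max_rounds` cap, and returning the best-scoring round are assumptions. The toy callables in the usage example are placeholders.

```python
def pkmp_refine(features, build_graph, refine, evaluate, patience=5, max_rounds=50):
    """Iteratively rebuild the nearest-neighbour graph from refined node features.

    Returns (best_score, best_graph, best_features, M), where M is the number of
    reconstructions after which the best score was reached.
    """
    graph = build_graph(features)                      # original graph G_0
    best = (evaluate(graph, features), graph, features, 0)
    rounds_without_gain = 0
    for m in range(1, max_rounds + 1):
        features = refine(graph, features)             # Steps 1/3: train GCN, take hidden features
        graph = build_graph(features)                  # Step 2: rebuild the k-NN graph
        score = evaluate(graph, features)
        if score > best[0]:
            best, rounds_without_gain = (score, graph, features, m), 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= patience:        # no gain for `patience` rounds: stop
                break
    return best


# Toy placeholders: features are a number, refinement adds 1, and the score peaks at 3.
score, graph, feats, M = pkmp_refine(
    0,
    build_graph=lambda f: f,
    refine=lambda g, f: f + 1,
    evaluate=lambda g, f: -abs(f - 3),
)
print(M)  # 3
```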

Figure 4: General process of the proposed diagnosis method.
ALGORITHM 1: MCGFF.
Input: … $G_{PS}$ and $G_{PK}$. Output: the health label $Z$.
(1) Divide the training set and testing set: $V_{train}$, $V_{test}$;
(2) Train the GCN model;
(3) for $V$ in $V_{train}$ do: $\mathrm{MCGFF}(V) \to Z$; $-\sum_{i=1}^{N_L} b_i \log(p_c) \to$ CE loss; update with backpropagation; end for
(4) Output the health label: $\mathrm{MCGFF}(V_{test}) \to Z$.

Figure 6: Classification results of the experimental models on TEP.

Figure 10: Experimental results for different numbers of k-nearest neighbors.
Input: original feature matrix $F_0$, training epoch $M$ for $G_{PK}$. Output: PKG $G_{PK}$ with high-level feature matrix $F$.
(1) Obtain the normalized signal $X_{nor}$;
(2) Calculate the feature matrix: $F_0 = \mathrm{LDA}(X_{nor})$;
(3) Calculate the Mahalanobis distance: $\mathrm{Dist}_M$

Table 2: The details of the proposed MCGFF model.

Table 4: Experimental results for different values of the parameter $M$.

Table 5: Experimental results for selecting the order $K$ of the Chebyshev convolution kernel.