Analytical Redundancy of Variable Cycle Engine Based on Proper Net considering Multiple Input Variables and the Whole Engine’s Degradation

In this paper, Proper net is proposed to construct the analytical redundancy of a variable cycle engine when all control variables and environmental variables change simultaneously, accompanied by degradation of the whole engine. In other words, Proper net is proposed to solve a multivariable, strongly nonlinear, dynamic, and time-varying problem. To make the topological structure of Proper net physically explainable, it is designed according to the physical relationships between variables, so that analytical redundancy based on Proper net achieves higher accuracy with less calculation time. Experiments compare the performance of analytical redundancy based on Proper net, seven convolutional neural network topological structures, and five shallow learning methods. Results demonstrate that, among the methods with an average relative error below 1.5%, Proper net is the most accurate and the least time-consuming, which supports both the effectiveness of Proper net and the feasibility of designing topological structures from physical relationships.


Introduction
To guarantee the safety of aeroengines, diagnostics and fault-tolerant control have been developed [1]. For example, provided that mass flow sensors and pressure sensors are installed at the entrance and exit of the compressor, when the aeroengine's rotation speed sensor fails, the rotation speed can be uniquely identified by interpolating the compressor's characteristic map with the mass flow and pressure ratio. Thus, analytical redundancy is usually defined as signals calculated by algorithms or methods rather than measured by physical sensors. In diagnostics [2], analytical redundancy can serve as redundant signals for voting on whether faults have occurred. In fault-tolerant control [3], analytical redundancy can replace the feedback signals of faulty sensors so that the closed loop of the control system stays intact and completes the control task.
Methods of constructing analytical redundancy can be divided into model-based methods and data-driven algorithms. Model-based analytical redundancy was proposed long ago and is theoretically mature; its representatives are the Kalman filter and its improved variants. However, in practice, strong nonlinearity can still make Newton-Raphson iteration or Euler iteration diverge when signals change rapidly or working points are far from the common operating line, which makes model-based analytical redundancy unsafe [4]. This problem could be solved by constraining the rate of change of environmental variables and control variables, but doing so conflicts with an emerging requirement, the high mobility of military planes. Besides, to keep model-based analytical redundancy accurate when aeroengines operate over a wide range of the flight envelope, numerous sets of modification coefficients must be used to correct the onboard mathematical model of the aeroengine [5].
To avoid these model-related problems, researchers have turned to data-driven analytical redundancy, which is built from historical data instead of prior knowledge. Zhao and Sun proposed a Support Vector Machine (SVM) to construct analytical redundancy [6]. By adopting greedy stagewise and iterative strategies, the SVM can estimate the parameters of complicated systems online. Zhou et al. proposed an extreme learning machine (ELM) to construct analytical redundancy for sensor fault diagnostics [1]. By selectively updating the output weights of the neural network according to prediction accuracy and the norms of the output weight vectors, the prediction capability of the ELM is enhanced.
However, in both Zhao's experiments and Zhou's simulations, the harshest situation is a dynamic process that involves only three control variables and the degradation of two components, always at zero altitude and zero Mach number. A question naturally arises: why do these methods not consider more control variables, more points in the flight envelope, and the degradation of the whole engine? The reason is that SVM and ELM are shallow learning methods, which lack sufficient nonlinear expressive capability. This also explains why most shallow learning methods have to run online or onboard: their parameters must be updated to adapt to the aeroengine's nonlinear characteristics at different working points.
Long Short-Term Memory neural network (LSTM) and convolutional neural network (CNN) are the most commonly used deep learning algorithms for strongly nonlinear problems. On one side, the forget-and-memory gate increases the nonlinear expressive capability of LSTM [7]. On the other side, it narrows the application fields of LSTM to sequence problems, such as natural language processing, where the input is a row vector. CNN originates from image processing but is widely used in language processing, classification, and regression problems. Babu et al. used CNN to estimate the remaining useful life (RUL) of aeroengines [8]. Through the deep architecture, the learned features are higher-level abstract representations of low-level raw sensor signals. Furthermore, feature learning and RUL estimation are mutually enhanced by supervised feedback. To accurately calculate fuel savings after aeroengine washing, Cui et al. used a convolutional neural network [9]. Their results demonstrate that prediction accuracy improves when the integral operation is replaced with a convolution operation. In Gou et al.'s research, a CNN model trained with preprocessed and labeled data sets is used to extract features of a time-frequency graph, based on which faults can be identified and isolated [10].
However, CNNs remain physically unexplainable [11]. On one side, being physically unexplainable may pose a safety hazard to safety-critical objects, such as aeroengines and cars. For example, if the mapping from input to analytical redundancy has an undetectable peak due to the black-box property of CNN, and the peak analytical redundancy signal is used as the reconstruction signal in the aeroengine's fault-tolerant control, then the control loop could become unstable or collapse, which may lead to the breakdown of the whole engine. Besides, without a solid and explainable theory to support the design of topological structures for actual physical problems, many useless topological structures and huge numbers of redundant parameters would be kept, which means unnecessary computing overhead or lower accuracy. It should be noted that, in applying classical topological structures such as Mobile net and Dense net to analytical redundancy, low accuracy and slow calculation speed were exactly the biggest obstacles the authors met.
Hence, on the basis of the physical relationships between variables, an explainable topological structure named Proper net is designed to construct the analytical redundancy of a variable cycle engine (VCE) when all control variables and environmental variables change simultaneously, accompanied by degradation of the whole engine. The VCE is chosen as the research object, and the whole engine's degradation, control variables, and environmental variables are all considered, only to make the problem more challenging, specifically to make it a multivariable, strongly nonlinear, dynamic, and time-varying problem. Section 2 introduces the multivariable dynamic degradation data set and the structure of Proper net. Following this, three experiments are conducted in Section 3 to demonstrate the superiority of analytical redundancy based on Proper net, and the results are discussed in the same section. Section 4 concludes the paper.

Figure 1 shows the structure diagram of the VCE used in this paper. The authors refer to the two-bypass VCE structure and the mathematical model proposed in Aygun and Turan's paper [12], which has been validated and referred to by many other researchers. Because the focus of this paper is not the modeling of the VCE, the specific mathematical formulas are not repeated here. The main components of the VCE include the inlet (Inl), fan (fan), core driven fan (cdf), high-pressure compressor (hpc), combustor (cbt), high-pressure turbine (hpt), low-pressure turbine (lpt), mixer (Mix), nozzle (Noz), and bypass (Bps). Besides, the mode switch valve (Msv), forward variable bypass ejector (Fvbe), and back variable bypass ejector (Bvbe) are used to change the flow area. If the openings of Msv and Fvbe are set to 0, the VCE's operating mode switches from turbofan to turbojet.

Method
Degradation coefficients are defined as follows:
where $D_{com,m}$ and $D_{com,e}$ represent a component's degradation coefficients for mass flow and efficiency, respectively. $m_{com,degradation}$ is the component's mass flow after degradation, and $m_{com,nominal}$ is the nominal mass flow before degradation. $e_{com,degradation}$ is the component's adiabatic efficiency after degradation, and $e_{com,nominal}$ is the nominal adiabatic efficiency before degradation. In this paper, ten degradation coefficients are defined and act on the VCE simultaneously: $D_{fan,m}$, $D_{fan,e}$, $D_{cdf,m}$, $D_{cdf,e}$, $D_{hpc,m}$, $D_{hpc,e}$, $D_{hpt,m}$, $D_{hpt,e}$, $D_{lpt,m}$, and $D_{lpt,e}$. Their ranges are all from 0.96 to 1.
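Since each coefficient relates a degraded quantity to its nominal value and ranges from 0.96 to 1, the definition is presumably a simple ratio. A form consistent with the symbols defined above, given here as an assumption, is:

```latex
D_{com,m} = \frac{m_{com,degradation}}{m_{com,nominal}}, \qquad
D_{com,e} = \frac{e_{com,degradation}}{e_{com,nominal}}
```

Under this reading, a coefficient of 1 means no degradation, and 0.96 means that the mass flow or efficiency has dropped to 96% of its nominal value.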
The original data set, sized 27 * 100000, contains multivariable dynamic degradation simulation data of the VCE. It has one row of time series with an interval of 0.02 s, 2 rows of environmental variables ($H$, $M_a$), 5 rows of control variables ($W_f$, $A_8$, $M_{sv,ope}$, $F_{vbe,ope}$, $B_{vbe,ope}$), and 19 rows of state variables ($T_1$, $P_1$, $T_2$, $P_2$, $T_{21}$, $P_{21}$, $T_{22}$, $P_{22}$, $T_7$, $P_7$, $T_8$, $P_8$, $P_5$, $T_6$, $N_l$, $N_h$, $P_3$, $T_5$, $P_6$). All variables are listed in Table 1. Rows 2 to 22 are used as input data, and $N_l$, $N_h$, $P_3$, $T_5$, and $P_6$ are the variables to be estimated. In this paper, only $N_l$ estimation is taken as an example.
As shown in Figure 2, a data map sized 21 * 42, covering the 2nd to 22nd rows and the moments $t_{m1}$ to $t_{m42}$, is first segmented from the original data set, and then the data map is duplicated and reassembled into an input data map sized 42 * 42. The output corresponding to this input data map is $N_l$ at moment $t_{m42}$. The other input data maps are made in the same way. After segmentation and reassembly, 99959 input data maps sized 42 * 42 are generated from the original data set. The corresponding outputs are the values of $N_l$ from the 42nd column to the 100000th column, amounting to 99959 columns. To be clear, the 99959 data maps are segmented from columns 1 to 42, 2 to 43, 3 to 44, ..., and 99959 to 100000. Among all data maps, 79959 are used for training, 10000 for validation, and 10000 for testing.
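The segmentation and reassembly can be sketched as a sliding window. The sketch below is illustrative only: the function name `make_data_maps` is hypothetical, random numbers stand in for the simulation data, and it assumes that "duplicated and reassembled" means stacking each 21 * 42 segment on top of a copy of itself, which is one reading of Figure 2.

```python
import numpy as np

def make_data_maps(inputs, target, window=42):
    """Slide a `window`-column view over `inputs` and stack each slice on itself."""
    n_rows, n_cols = inputs.shape
    n_maps = n_cols - window + 1
    maps = np.empty((n_maps, 2 * n_rows, window), dtype=inputs.dtype)
    for k in range(n_maps):
        segment = inputs[:, k:k + window]        # 21 x 42 slice of the input rows
        maps[k] = np.vstack([segment, segment])  # assumed duplication along rows -> 42 x 42
    labels = target[window - 1:]                 # N_l at the last column of each slice
    return maps, labels

# Toy data: 21 input rows and 100 time steps (the paper uses 100000 time steps).
rng = np.random.default_rng(0)
inputs = rng.random((21, 100))
target = rng.random(100)
maps, labels = make_data_maps(inputs, target)
print(maps.shape, labels.shape)  # (59, 42, 42) (59,)
```

With 100000 columns and a 42-column window, the same arithmetic gives 100000 - 42 + 1 = 99959 data maps, matching the count stated above.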
Traditional ELM or SVM methods use only the information of two or three control periods before the estimated moment, as in Zhao and Sun's research [6] and Zhou et al.'s research [1]. However, the delay caused by combustion, moment of inertia, or a large-volume mixer is usually longer than two or three control periods, which implies that a large amount of historical information has not been used efficiently.
The method proposed in this paper needs 21 input variables, which clearly differs from traditional methods. In many studies, only 5 or 6 sensors are used as inputs, and this is exactly why the authors argue that traditional methods cannot solve a problem as complicated as the one posed in this paper. No matter how good the traditional methods are, the information they can use is too limited to judge the degradation of the whole engine, let alone to construct analytical redundancy under degradation and multiple changing variables. It should be noted that the number of sensors used in Zedda and Singh's research had already reached 16 in 2002, which suggests that the number of sensors deployed in advanced aeroengines now or in the future can reach or even exceed 21 [13].
The reasons why data maps are segmented and reassembled as in Figure 2 are as follows:
(1) Data maps are made square for compatibility. In classical CNNs, input maps are usually square. Rectangular inputs and kernels were introduced only as CNNs developed, so rectangular inputs are not compatible with some classical CNNs, such as the Alex net used in this paper.
(2) Duplication of the original data helps improve accuracy and calculation speed. From the perspective of connectionism, connections between rows may form key features that increase accuracy, but it is not known in advance which features are more important. The general approach is to connect as many rows as possible directly. The input rows of the original data are rows 2 to 22, and it is difficult to connect row 2 with row 22 without a kernel sized 21 * 21. With such a kernel, the number of weights and biases would increase steeply and calculation would slow down. Thus, a relatively less time-consuming duplication is performed. As a result, the original data set sized 21 * 100000 is transformed into 42 * 100000, and a connection between any two rows can be made with a kernel sized only 11 * 11.
(3) Although the number of columns used in data maps is not determined analytically, it has to keep the information sufficient. To a great extent, useless information is filtered out automatically as weights and biases are adjusted during backpropagation. Besides, unlike simple applications, multivariable dynamic degradation analytical redundancy needs more of the significant information hidden in the data to form deep features.

2.2. Net Structure and Training Options
Feature extraction capability is directly related to macroscopic topological structures. Compared with Alex net [14], which has only one path, Dense net's macroscopic topological structure features two paths [15], which greatly improves Dense net's accuracy.
In addition, on the basis of Dense net's macroscopic topological structure, Mobile net's macroscopic topological structure only repeats the convolution layer and batch normalization layer one more time [16]. As a result, the number of layers in Mobile net falls to around one-fourth of that in Dense net, and Mobile net's calculation time also declines dramatically. These examples cover only part of what a CNN's topological structure does, but they show that the topological structure has a great impact on performance.

The topological structure of Proper net is shown in Figure 3. The whole structure can be divided into three levels. The first level is Proper inner-level data fusion (PINDF). The second level follows the classical interlevel data fusion structure of inception net [17]. The third level is an ordinary cascade structure consisting of convolutional layers, activation layers, and normalization layers. Finally, the net ends with two successive fully connected layers, and mean square error (MSE) is used as the loss function.
What needs to be specified is that most hyperparameters are empirical rather than analytic. Hyperparameters are parameters that cannot be adjusted automatically during training, such as the topological structure, the choice of activation layer, the number of kernels, and the number of neural nodes. In this paper, the number of kernels, kernel stride, normalization method, batch size, activation layer, and number of neural nodes in the fully connected layer are set to 32, 1 * 1, batch normalization, 128, LeakyReLU, and 100, respectively.

Convolution Layer
The input of a convolution layer [18] is convolved with several convolution kernels, and a bias is then added to the result. The output data map is calculated as follows:
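Assuming the standard multi-channel convolution, in which the contributions of all input channels are summed, a form consistent with the symbols defined next would be:

```latex
x_j^{l} = \sum_{i} x_i^{l-1} * k_{i,j}^{l} + s_j^{l}
```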
where $*$ denotes the convolution operator; $x_i^{l-1}$ and $x_j^{l}$ are the input and output of the convolution layer, standing for the $i$th channel of the $(l-1)$th layer and the $j$th channel of the $l$th layer, respectively; $k_{i,j}^{l}$ is the kernel of the $l$th layer that relates $x_i^{l-1}$ to $x_j^{l}$; and $s_j^{l}$ is the bias of the $j$th channel of the $l$th layer.

Batch Normalization Layer
A batch normalization layer [19] can prevent gradient explosion or vanishing, improve the robustness of a model, and keep the activation function away from its saturated region.
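Assuming the usual batch normalization, and assuming that the "corrected" standard deviation is the one with the correction factor added under the square root, the layer can be written with the symbols defined next as:

```latex
\begin{aligned}
\mu_{i,j}^{l-1} &= \frac{1}{b}\sum_{t=1}^{b} x_{i,j}^{l-1,t}, \\
\sigma_{i,j}^{l-1} &= \sqrt{\frac{1}{b}\sum_{t=1}^{b}\left(x_{i,j}^{l-1,t}-\mu_{i,j}^{l-1}\right)^{2}+\varepsilon_{i,j}^{l-1}}, \\
x_{i,j}^{l,t} &= \gamma_{i,j}^{l}\,\frac{x_{i,j}^{l-1,t}-\mu_{i,j}^{l-1}}{\sigma_{i,j}^{l-1}}+\beta_{i,j}^{l}
\end{aligned}
```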
where $b$ is the batch size; $\mu_{i,j}^{l-1}$ and $\sigma_{i,j}^{l-1}$ are the mean value and corrected standard deviation of $x_{i,j}^{l-1,t}$, which represents the $j$th value of the $i$th channel in the $(l-1)$th layer of the $t$th data map within a batch; $\varepsilon_{i,j}^{l-1}$ is the correction factor, set to 0.0001 for steady training; and $\gamma_{i,j}^{l}$ and $\beta_{i,j}^{l}$ are regulatory factors to be learned.

Activation Layer
This paper uses LeakyReLU as the activation layer [20], whose formula is given below.
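In its usual form, LeakyReLU passes non-negative inputs unchanged and scales negative inputs by a small positive factor. The symbol $\alpha$ and its value are assumptions here, as neither is stated in the text:

```latex
x_{i,j}^{l} =
\begin{cases}
x_{i,j}^{l-1}, & x_{i,j}^{l-1} \ge 0, \\
\alpha\, x_{i,j}^{l-1}, & x_{i,j}^{l-1} < 0
\end{cases}
```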
where $x_{i,j}^{l-1}$ and $x_{i,j}^{l}$ are the input and output elements of the activation layer, standing for the $j$th element in the $i$th channel of the $(l-1)$th layer and the corresponding element of the $l$th layer, respectively.

Addition Layer
The addition layer adds its inputs element by element.
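Written out, with $x_k^{l-1}$ as an assumed symbol for the $k$th input of the layer, element-wise addition is:

```latex
x^{l} = \sum_{k=1}^{m} x_k^{l-1}
```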
where m is the number of inputs.

Depth Concatenation Layer.
Depth concatenation layer connects all inputs along channels.
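With $x_k^{l-1}$ as an assumed symbol for the $k$th of $m$ inputs, the concatenation can be written as:

```latex
x^{l} = x_1^{l-1} \oplus x_2^{l-1} \oplus \cdots \oplus x_m^{l-1}
```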
where ⊕ is the operator that connects inputs along channels. For example, two inputs with 20 channels and 30 channels, respectively, are combined by ⊕ into an output with 50 channels.
2.2.6. Fully Connected Layer. In a fully connected layer, every node of one layer is connected to every node of the next layer.
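Assuming the standard affine form, a fully connected layer consistent with the symbols defined next is:

```latex
x_j^{l} = \sum_{i} w_{i,j}^{l}\, x_i^{l-1} + s_j^{l}
```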
where $x_i^{l-1}$ and $x_j^{l}$ are the $i$th node of the $(l-1)$th layer and the $j$th node of the $l$th layer, respectively; $w_{i,j}^{l}$ is the weight of the $l$th layer that relates $x_i^{l-1}$ to $x_j^{l}$; and $s_j^{l}$ is the bias of the $j$th node of the $l$th layer.

Regression Layer.
Regression layer defines the loss function. MSE is used in this paper.
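In its textbook form, and with the symbols defined next, the MSE loss reads as follows. The exact normalization used by the authors is not stated, so a constant factor may differ:

```latex
L = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - y_{i,a}\right)^{2}
```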
where $L$ is the loss, $y_i$ and $y_{i,a}$ are the estimated and actual output values, and $m$ is the number of outputs.

The topological structure of a CNN often includes hundreds of layers, so it is inconvenient to analyze the relationship between the whole net and the CNN's performance. Usually, one or two basic macroscopic topological structures are abstracted from the whole net, which means that a novel CNN structure always comes with a new macroscopic structure. For Proper net, although its second and third levels follow inception net and other nets, its first level, the Proper inner-level data fusion structure, is original, and the topological structure as a whole is unique. In Figure 4, PINDF, the macroscopic structure and highlight of Proper net, is shown and compared with OINDF, the macroscopic structure of Google net [21].
As shown in Figure 4, PINDF uses six kernel sizes to extract features at different scales. In many image-processing problems, the requirement for real-time calculation is not as strict as in aeroengines. So usually, two or three small kernel sizes are used to extract microscopic features, and then, through hundreds or even thousands of layers, these features are recombined to classify different objects. However, for aeroengines, microscopic and macroscopic features should be extracted directly and concisely, so that fewer layers are needed and calculation time is saved.
So, six kernel sizes are used in PINDF instead of the three kernel sizes used in OINDF. Considering the physical relationships between variables, the lower and upper bounds of these kernels were decided in Section 2.1 as 1 * 1 and 11 * 11, respectively. Of course, kernels sized from 2 * 2 to 12 * 12 can achieve similar effects, but at the price of more calculation time and storage space.
Another feature of PINDF is that the six paths are combined in pairs instead of all together. On one side, if all paths were combined together, features of different scales would be mixed. During backpropagation, it would then become harder to adjust the kernels' weights to differentiate the importance of features at different scales. On the other side, the kernels are paired at a constant size difference of 6, namely 1 * 1 with 7 * 7, 3 * 3 with 9 * 9, and 5 * 5 with 11 * 11. In physical terms, the larger the kernel, the more variables and sampling moments it covers, and the smaller the kernel, the fewer. When one or two variables change while the other variables stay static, the features formed by small kernels change by a greater magnitude, which means the output is more sensitive to smaller kernels. However, if the whole engine degrades and at least ten variables change together, larger kernels dominate the output. To make full use of the advantages of both smaller and larger kernels while saving calculation time, the paths are combined in equidistant pairs.
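The pairing rule can be stated compactly. The following is a minimal sketch of the bookkeeping only, not of the PINDF layers themselves, and the variable names are illustrative:

```python
# The six PINDF kernel sizes named in the text, from 1 * 1 up to 11 * 11.
kernel_sizes = [1, 3, 5, 7, 9, 11]

# Pair the k-th smallest kernel with the k-th of the three largest ones.
half = len(kernel_sizes) // 2
pairs = list(zip(kernel_sizes[:half], kernel_sizes[half:]))
gaps = [large - small for small, large in pairs]

print(pairs)  # [(1, 7), (3, 9), (5, 11)]
print(gaps)   # [6, 6, 6]
```

Each pair therefore joins one kernel from the smaller half with one from the larger half, so every combined path sees both a fine and a coarse scale.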

Proper Net's Top Accuracy and Fastest Calculation Speed.
Seven classical topological structures used in image processing, such as Res101 net [22], Squeeze net [23], and Vgg19 net [24], are modified appropriately to adapt them to analytical redundancy. This paper changes the classical nets' input size, number of output nodes, and output layer to 42 * 42, 1, and a regression layer, respectively, but keeps unchanged their macroscopic topological structures, which are critical to deep feature extraction.
All nets are trained with the options listed in Table 2. The learning strategy is piecewise, which in our case means that the learning rate is multiplied by 0.1 (LearningRateDropFactor) every 2 epochs (LearningRateDropPeriod). To increase fitting accuracy and generalization ability, all training data maps are shuffled every epoch. Besides, considering the changing learning rate, adaptive moment estimation (Adam) is used to update weights and biases [25]. The software and hardware environments for training are Matlab 2019a and an RTX2080Ti 11G, respectively.

Table 3 shows the performance of multivariable dynamic degradation analytical redundancy based on eight kinds of macroscopic topological structures. Simulation results are based on the test data set made in Section 2.1, and $N_l$ is picked as the analytical redundancy. For aeroengines, the tolerable ARE is around 1.5%, which means that nets with an ARE higher than 1.5% cannot be used for aeroengine analytical redundancy. Among all nets whose ARE is lower than 1.5%, Proper net has the lowest ARE (0.81%), the fewest layers (49), the shortest training time (25 minutes per 10 epochs), and the fastest calculation speed (0.3645 s per 1000 points). All four indicators show that analytical redundancy based on Proper net performs better than that based on the other nets.
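The piecewise schedule can be illustrated as follows. This is a sketch of the rule as described above; the function name and the initial learning rate of 1e-3 are hypothetical, since the actual initial value is given in Table 2.

```python
def piecewise_learning_rate(initial_rate, epoch, drop_factor=0.1, drop_period=2):
    """Learning rate in effect during `epoch` (1-based) under a piecewise schedule."""
    n_drops = (epoch - 1) // drop_period  # completed drop periods before this epoch
    return initial_rate * drop_factor ** n_drops

initial_rate = 1e-3  # hypothetical value for illustration only
for epoch in range(1, 7):
    print(epoch, piecewise_learning_rate(initial_rate, epoch))
```

Epochs 1 and 2 use the initial rate, epochs 3 and 4 use one-tenth of it, and epochs 5 and 6 use one-hundredth.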
In Table 3, nets whose ARE is lower than 1.5% have structures similar to Proper interlevel data fusion (PITDF), while nets whose ARE is higher than 1.5% do not. Thus, it can be concluded that PITDF helps improve the accuracy of the aeroengine's analytical redundancy. Besides, among all nets whose ARE is lower than 1.5%, Proper net is the only one that has both PINDF and PITDF, while Mobile net and Dense net have only PITDF, which is why Proper net performs better than Mobile net and Dense net. These results indicate that PINDF can extract deep features of aeroengines and further improve the performance of analytical redundancy.
In Table 4, the PINDF and PITDF of Proper net are replaced with the OINDF used in Google net and the over interlevel data fusion (OITDF) used in Mobile net. The nets after alteration are named Google-alter net and Mobile-alter net, respectively. The simulation is based on the multivariable dynamic degradation test data set made in Section 2.1, and the analytical redundancy is $N_l$. After alteration, Proper net's accuracy deteriorates dramatically. The ARE of Google-alter net is approximately twice that of Proper net. Although the ARE of Mobile-alter net is still lower than 1.5%, its training time is 27 minutes longer than that of Proper net. Therefore, it can be ascertained that OINDF and OITDF reduce the accuracy of analytical redundancy and increase training time.

Dynamic Performance of Multivariable Dynamic Degradation Analytical Redundancy.
To validate the effectiveness of Proper net, experiments have been conducted at more than 100 stable points and in 70 kinds of dynamic processes, which include small-step responses, large-step responses, slope responses, and sigmoid responses; changes in a single variable or in two or three variables; degradation of one or two components; and degradation of the whole engine. All results meet the requirements on accuracy and calculation speed. Because of space limitations, only the most representative and harshest situation is shown here. In this situation, seven step signals, $H$ (0~5 km), $M_a$ (0~1), $W_f$ (2~2.5 kg/s), $A_8$ (0.2~0.25 m$^2$), $M_{sv,ope}$ (100~75), $F_{vbe,ope}$ (100~75), and $B_{vbe,ope}$ (100~75), act on the VCE simultaneously at 5 s, while all ten degradation coefficients change from 1 to 0.96. In practice, all seventeen variables would not be allowed to change together by such large steps. According to experience, however, if an algorithm can cope with the harshest situation, the performance of analytical redundancy based on Proper net should be better in simpler situations, such as slope signals, small-magnitude steps, or only two or three signals changing at the same time.
In Figure 5, red lines are the actual $N_l$, and green lines are the estimated $N_l$.
As shown in Figure 5, when the step signals act on the VCE at 5 s, Proper net responds immediately, but for the other nets, there is an obvious delay between the actual $N_l$ and the estimated $N_l$. Besides, both the dynamic performance and the steady-state performance of analytical redundancy based on Proper net are the best. This is because Proper net is designed by considering the physical relationships between the aeroengine's variables, while the other nets are designed for image processing problems.

The Superiority of Proper Net to Other Data-Based Methods.
To demonstrate the superiority of Proper net, five other data-based methods are compared with Proper net under six conditions. The other data-driven methods used to estimate $N_l$ are SVM, Decision Tree (DT) [26], K Nearest Neighbor (KNN) [27], Fully Connected neural network (FC) [28], and Long Short-Term Memory neural network (LSTM) [29]. The six conditions are listed in Table 5. For point 1 (P1) and point 2 (P2), all environmental variables and control variables are set to constants. For P1, the training data and test data come from the same point, and P2 is trained and tested in the same way. Dynamic status means that data are sampled during a dynamic process in the way described in Section 2.1. The ranges of wide range 1 (WR1) and wide range 2 (WR2) are wider than those of SR1 and SR2.
The AREs of all six methods under the six conditions are shown in Table 6. The AREs of analytical redundancy based on Proper net are lower than those of every other method under every condition. As far as accuracy is concerned, analytical redundancy based on Proper net therefore outperforms analytical redundancy based on the other methods.
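ARE is read here as the average relative error mentioned in the abstract. The text does not spell out its formula, so the sketch below assumes the common definition, the mean of |estimate - actual| / |actual| expressed as a percentage. The function name and the sample values are illustrative and are not data from the paper.

```python
def average_relative_error(estimated, actual):
    """Mean of |estimated - actual| / |actual|, returned as a percentage."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - a) / abs(a) for e, a in zip(estimated, actual))
    return 100.0 * total / len(actual)

# Made-up normalized speeds for illustration only.
actual = [1.00, 0.80, 0.50]
estimated = [1.01, 0.80, 0.49]
are = average_relative_error(estimated, actual)
print(f"ARE = {are:.2f}%")
print("within the 1.5% tolerance" if are < 1.5 else "outside the 1.5% tolerance")
```

The final comparison mirrors the 1.5% tolerance used in this paper to decide whether a net is usable for analytical redundancy.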
In Table 6, as the conditions change from P1 to WR2 from left to right, the difficulty of estimation rises gradually. For P1 and P2, the AREs of analytical redundancy based on Proper net are similar to those based on the other methods, but as the conditions get harsher, the gap in ARE between Proper net and the other methods increases dramatically. For SR1, the gap between the lowest-ARE method and the highest-ARE method is 2.23%, while for WR2 the gap is 27.14%, about 12 times as large. It can be concluded that, as a deep learning method, Proper net performs better than the shallow learning methods.

Comparison between Different Sizes of Input Data Maps.
To quantitatively validate the effectiveness of the segmentation and reorganization of input data maps, data maps sized 21 * 21, 42 * 42, and 63 * 63 are tested with Proper net, Mobile net, Dense net, and Res101 net on $N_l$ test data sets. The 21 * 21 data maps contain 21 features over 21 sample periods. The 42 * 42 data maps, which include 42 sample periods, are made by one duplication along the row direction. The data maps sized 63 * 63 are made from original data maps sized 21 * 63 by two duplications along the row direction. All kernel sizes are consistent with those used in Section 3.1.

Table 7 shows the $N_l$ AREs of the three sizes of data maps tested with the four nets. For all four nets, the AREs of the 42 * 42 data maps are lower than those of the 21 * 21 data maps. When the size of the data maps expands from 42 * 42 to 63 * 63, the ARE remains essentially unchanged, with only a slight decrease of 0.01% for Dense net and a marginal rise of 0.01% for Mobile net.
In the actual physical process, features in the top rows and bottom rows of data maps act together on $N_l$, but as mentioned in Section 2.2, 21 * 21 data maps cannot build a direct relationship between top rows and bottom rows. This disconnection is why 21 * 21 data maps perform worse than 42 * 42 data maps. Because data maps sized 63 * 63 do not build any new direct connection between variables, it is reasonable that their accuracy is essentially the same as that of data maps sized 42 * 42.
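This argument can be checked with a short calculation. The sketch below considers only the row (variable) direction and assumes that duplication stacks identical copies of the 21 input rows; the function name is hypothetical. For each pair of variables, it finds the closest pair of rows that hold them and reports the kernel height needed to cover the worst case.

```python
def kernel_height_needed(n_vars, copies):
    """Smallest kernel height that puts every pair of variables in one window."""
    worst = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            # Rows occupied by variables i and j across all stacked copies.
            rows_i = [i + n_vars * c for c in range(copies)]
            rows_j = [j + n_vars * c for c in range(copies)]
            nearest = min(abs(a - b) for a in rows_i for b in rows_j)
            worst = max(worst, nearest)
    return worst + 1  # a kernel of height h spans rows up to h - 1 apart

for copies in (1, 2, 3):
    print(21 * copies, kernel_height_needed(21, copies))
# 21 21
# 42 11
# 63 11
```

Under these assumptions, one duplication lowers the required kernel height from 21 to 11, and a second duplication gives no further reduction, which agrees with the observation that 63 * 63 data maps add no new direct connections.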

Conclusions
(1) A novel convolutional neural network topological structure named Proper net is proposed to construct the analytical redundancy of a variable cycle engine when all control variables and environmental variables change simultaneously, accompanied by degradation of the whole engine.
(2) To sufficiently use the underlying information of the aeroengine's sensors, the original sensor data are segmented and reassembled into data maps that contain more historical information.
(3) On the basis of the physical relationships between the aeroengine's variables, the Proper inner-level data fusion structure is specially designed to improve the accuracy and calculation speed of the aeroengine's analytical redundancy.
(4) Compared with shallow learning methods and other convolutional neural network structures used for image processing, Proper net has the highest accuracy and the fastest calculation speed.

Data Availability
The data set used to support the findings of this study is currently under embargo while the research findings are commercialized. Requests for data, 12 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
International Journal of Aerospace Engineering