Fault Detection of Aircraft System with Random Forest Algorithm and Similarity Measure

A fault detection algorithm was developed using a similarity measure and the random forest algorithm (RFA). The algorithm was applied to an unmanned aerial vehicle (UAV) that we prepared. The similarity measure was designed from distance information, and its validity was verified by proof. The fault decision was made by calculating a weighted similarity measure. Twelve available coefficients from the healthy and faulty status data groups were used to make the decision. The weights of the similarity measure were obtained through RFA, which provides the priority of each data feature. To obtain a fast decision response, a limited number of coefficients was also considered. The relation between the detection rate and the amount of feature data was analyzed and illustrated. By repeated trials of the similarity calculation, the useful amount of data was obtained.


Introduction
Fault tolerance and adaptation of aircraft systems have been studied with actual faulty and healthy data. For the pilot or the flight control system to adapt under abnormal conditions, the decision to continue a critical mission or return to a safe region must be clear and correct [1]. First of all, data for the fault decision must be available; a data acquisition system is required to collect data accurately and quickly enough to guarantee a precise fault decision. A lightweight fault detection model of DURUMI-II, an unmanned aerial vehicle (UAV), was considered, and the fault decision process was carried out with the random forests algorithm (RFA) [2]. RFA is a state-of-the-art classification algorithm and has shown high classification accuracy. Additionally, RFA generates a priority for each feature. The proposed approach makes it possible to identify stable, important features with high detection rates. Accordingly, parameter optimization and feature selection were performed to secure high detection rates.
To obtain a fault detection result, a discrimination measure has to be considered. A similarity measure [3][4][5][6] represents the degree of similarity between comparable data; it has also been addressed in numerous works [1,2,7,8,9]. The similarity between two vague datasets depends roughly on the inverse of the distance between them. Therefore, similarity can be regarded as the information common to two data distributions, and the similarity measure obtained here is based on the distance measure. The similarity between two fuzzy sets can then be computed with the obtained similarity measure.
Twelve operating-data coefficients are used to design the discriminating measure for healthy and faulty conditions. Hence, the data are represented as a multivariate dataset, which can also be understood as multidimensional data. Each component has a different importance in determining whether the status is faulty or healthy. To reach a more accurate decision, a weighting factor must be considered for each of the 12 coefficients. Relying on an expert's opinion could be a strong candidate for solving this problem. However, as with any heuristic approach, the results depend on the individual, which means that consistency cannot be guaranteed. RFA provides an importance value for each feature, so a more rational weighting can be obtained through RFA.
The Scientific World Journal
In the previous research [1,10,11], fault detection and isolation (FDI) was accompanied by a fault-tolerant control system to handle process failure. Under the assumption that the UAV is controllable and trimmable at postfailure conditions, the aircraft can keep flying with the help of the flight control computer even when a control surface becomes stuck. The controllers are restructured and reconfigured according to the grade of system failure. To obtain a decision or analysis for faults, statistical information on elevator deflection, aileron deflection, and other quantities has been analyzed.
By applying RFA to the multivariate similarity measure, different weighting factors were considered. Detection results showed good performance with actual data. In particular, when data were placed within the overlapping region, RFA showed satisfactory performance compared to the unitary weighting case [7,8]. RFA is introduced in the next section. Normal operation and longitudinal faulty operation were performed, and operation data were obtained; the airplane model considered and its state equation are described in Section 3. The RFA procedure is applied in Section 4, where the importance values are obtained through RFA. The computation result is compared with the similarity measure calculation, and the similarity is also weighted with the variable importance. Finally, conclusions are drawn.

Random Forests Algorithm
Strong interest in ensemble learning algorithms has produced many results on classification and regression by way of boosting and bagging [2,7,8,12,13,14]. The random forests algorithm was proposed by Breiman [2]; the method is categorized as an "ensemble learning" method, and it adds an additional layer of randomness to bagging. Applying ensembles of trees can achieve important gains in classification and regression accuracy, and each tree in the ensemble is grown according to a random parameter. Each of these trees is generated by injecting randomness. Dietterich proposed the random split selection approach, where the split is selected randomly from among the best splits at each node [8].
The common element in all RFA procedures is the use of multiple trees. For the $k$th tree, a random vector $\Theta_k$ is generated, independent of the past random vectors $\Theta_1, \ldots, \Theta_{k-1}$ but with the same distribution. Following the derivation of Breiman, in bagging the random vector $\Theta$ is generated as the counts in $N$ boxes resulting from $N$ darts thrown at random at the boxes, where $N$ is the number of examples in the training set. In random split selection, $\Theta$ consists of a number of independent random integers between 1 and $K$. A tree is then grown using the training set and $\Theta_k$, resulting in a classifier $h(\mathbf{x}, \Theta_k)$, where $\mathbf{x}$ is an input vector [2]. With the definition of Breiman, RFA is defined in Definition 1.
Definition 1 (see [2]). A random forest is a classifier consisting of a collection of tree-structured classifiers $\{h(\mathbf{x}, \Theta_k), k = 1, \ldots\}$ where the $\{\Theta_k\}$ are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input $\mathbf{x}$.
Given Definition 1, two design choices are needed for RFA: the tree structure and the number of trees. There is no rule for designing the tree structure; it depends on the computational requirements and on the designer. The number of trees should be decided heuristically, for example by trial and error. A well-known tree algorithm is considered in reference [12].
The overall structure of RFA is shown in Figure 1. The tree-growing step is repeated once for each tree in the forest. As noted in reference [12], RFA has several advantages and disadvantages. It is an effective learning algorithm because it processes large databases conveniently and can handle many input variables. However, it can overfit some datasets, and its results are not easy to interpret. Even when RFA results are unsatisfactory, they still provide useful information such as variable importance, because the decision result is obtained as a probability.
The majority decision is obtained by averaging the class results of the individual trees. Overlapping the decisions of the trees also gives the ensemble mean of the decision results. Averaging the node outputs reduces uncertainty. Furthermore, if the feature vectors have sufficiently wide characteristics, ergodicity is expected to hold.
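The bagging random vector described above (counts of darts landing in boxes) can be illustrated with a minimal Python sketch. The function name `bootstrap_counts` and the seed are hypothetical choices for illustration; the sample size of 89 is taken from the dataset described in the numerical results. This is only a sketch of the sampling step, not the authors' implementation.

```python
import random

def bootstrap_counts(n_examples, rng):
    """Throw n_examples darts at n_examples boxes and return the hits per box.

    The returned list plays the role of the bagging random vector Theta:
    entry i is how many times training example i enters the bootstrap sample.
    """
    counts = [0] * n_examples
    for _ in range(n_examples):
        counts[rng.randrange(n_examples)] += 1
    return counts

rng = random.Random(42)          # hypothetical seed, for reproducibility only
theta = bootstrap_counts(89, rng)

# Examples with a zero count are "out of bag" for the tree grown from theta.
in_bag = sum(1 for c in theta if c > 0)
print("examples in bag:", in_bag, "of", len(theta))
```

On average about 63% of the examples (a fraction of roughly 1 - 1/e) receive at least one hit, so each tree sees a different subset of the training data.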
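The combination of tree decisions described above can be sketched as a simple vote count. The helper `majority_vote` and the labels `"fault"` and `"normal"` are hypothetical names used only for illustration; the returned vote fraction is one way to read the decision as a probability.

```python
from collections import Counter

def majority_vote(tree_votes):
    """Return the most voted class and the fraction of trees that voted for it."""
    counts = Counter(tree_votes)
    label, n_votes = counts.most_common(1)[0]
    return label, n_votes / len(tree_votes)

# Hypothetical votes from five trees for one input vector.
votes = ["fault", "fault", "normal", "fault", "normal"]
decision, share = majority_vote(votes)
print(decision, share)
```

A vote fraction close to 0.5 indicates an input that lies near the overlapping region of the two classes.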

Model for Similarity Measure
3.1. Airplane Model. Consider an aircraft system that combines the longitudinal mode and the lateral-directional mode. Its state space equation [1,10,11,15,16,17] relates the state vector x, the elevator control input u, and the output vector y, where the output vector y is the state x itself [11,15,16,17]. The state vector comprises the three angular velocities (p, q, and r), the angle of attack α, the sideslip angle β, and the roll and pitch angles (φ and θ).
The experimental model in Figure 2 consists of one elevator, one rudder, and two ailerons. To obtain two types of data, normal and fault, data acquisition was carried out twice. One or two of the elevator, aileron, and rudder were disabled intentionally after the normal operating data had been taken. To simulate the fault detection procedure, the one-piece elevator of the experimental model was separated into two parts, one normal and the other faulty, so that it was difficult to know from outside whether the elevator was faulty or not. A control surface was also added at the other vertical stabilizer. The UAV considered is illustrated in Figure 2; faults were applied artificially. Two different flight tests were carried out separately for normal and faulty operation [1,10,11].
To carry out the experiment, intentional damage was applied as stuck faults of the right elevator, the left rudder, and the left aileron, and combinations of these stuck surfaces were also considered. The flight test was scheduled so as to exclude conditions under which the aircraft would be uncontrollable or untrimmable after the failure. Two cases were considered: left rudder stuck only, and right elevator stuck together with the left rudder. The stuck angles in the first case (left rudder only) were −10°, −5°, 0°, +5°, and +10°. The same stuck angles were considered in the second case (right elevator with the left rudder stuck).
In the first flight test, the control input for real-time parameter estimation was applied with the knob and switching method, and the flight data were obtained from the dynamics of the UAV excited by this method [1,10,14]. However, the results showed that the applied time interval was slightly inaccurate. Therefore, in the second flight test, a control input device using an RF modem and an R/C transmitter was developed to realize a constant control input and time interval [1,10,11].

Parameters of Longitudinal Mode.
To control the UAV when a single surface or a combination of surfaces becomes stuck, the aircraft must remain controllable and trimmable under postfailure conditions. The flight test procedure includes the ability to recover from the fault state to the normal state [1,10,11]. Table 1 shows the experimental trim values of the available primary control surfaces of the UAV (DURUMI-II) at postfailure conditions. Blank entries indicate uncontrollable or untrimmable cases.
Longitudinal mode fault detection was studied in the previous research [10]. To determine the failure status of the elevator, analyses of three aerodynamic coefficients were essential for obtaining information on the aircraft performance characteristics. In that research, a similarity measure was proposed in place of statistical information such as the mean and variance, to overcome the ambiguity caused by a large standard deviation. Because a large standard deviation produces scattered results, the analysis of the stability and control derivatives is not clear.

Numerical Results and Their Analysis
A fault detection algorithm is proposed in this section. A total of 89 data samples were considered, 38 normal and 51 faulty. Each sample in the dataset includes twelve features.

Random Forest with Numerical Interpolation.
RFA requires two parameters: the number of variables in the random subset at each node ($m_{\mathrm{try}}$) and the number of trees in the forest ($n_{\mathrm{tree}}$). To obtain the best classification rate (correct decision rate), the optimal values of these two parameters must be determined. The requirements for deciding the operating condition are as follows.

Requirement 1.
Parameter optimization is conducted to guarantee high detection rates. To evaluate the feasibility of our approach, longitudinal experiments on the flight test data of Table 3 are considered [18].
In the sequence of Figure 3, the highest detection rate was achieved when $m_{\mathrm{try}}$ was set to only 2 features. For $n_{\mathrm{tree}}$, unlike $m_{\mathrm{try}}$, there is no specific function that identifies the optimal value. Thus, the value of $n_{\mathrm{tree}}$ was chosen as one that gives high and stable detection rates.
As a result of the experiments, the two optimized parameter values were $m_{\mathrm{try}} = 2$ and $n_{\mathrm{tree}} = 260$. With these two parameters, feature selection on the flight test data was carried out with the feature selection algorithm supported by RFA.
RFA provides the variable importance of each feature; the results are listed in Table 2. The proposed approach gives reasonable context information through the important features. The meanings of the coefficients and other parameters are given in references [10,11]. According to the results of Table 1, the pitching moment coefficient with respect to changes in the elevator deflection shows a bigger difference than the other parameters. However, to obtain more reliable data, the detection ratio versus the number of parameters was also examined.
This approach shows that feature selection is important for decision performance because the number of features determines the detection rate. The decision rate was obtained with combinations of the variables of highest importance. The results are illustrated in Figures 4 and 5. Numerically, the highest decision rate, 97.75%, was obtained when the two most important features were used.
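The parameter search described above can be sketched as a small grid search. The following is a deliberately simplified stand-in, not the authors' procedure: each "tree" is a depth-one stump chosen by misclassification count, the data are synthetic (one informative feature plus three noise features), and accuracy is estimated on out-of-bag samples. All function names, the grid values, and the seed are assumptions made for illustration.

```python
import random

def best_stump(X, y, features):
    """Return (feature, threshold, left_label, right_label) with the fewest
    training errors among the candidate features."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if row[f] <= t else right) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    if best is None:
        # Candidate features are constant in this sample: predict the majority class.
        majority = int(2 * sum(y) >= len(y))
        return (features[0], float("inf"), majority, majority)
    return best[1:]

def grow_forest(X, y, n_tree, m_try, rng):
    """Grow n_tree stumps, each on a bootstrap sample and m_try random features."""
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_tree):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), m_try)          # random feature subset
        stump = best_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        forest.append((stump, set(idx)))
    return forest

def oob_accuracy(forest, X, y):
    """Majority-vote accuracy using, for each sample, only trees that never saw it."""
    correct = counted = 0
    for i, (row, label) in enumerate(zip(X, y)):
        votes = [(left if row[f] <= t else right)
                 for (f, t, left, right), used in forest if i not in used]
        if votes:
            counted += 1
            correct += int(2 * sum(votes) > len(votes)) == label
    return correct / counted if counted else 0.0

rng = random.Random(0)                               # hypothetical seed
labels = [0] * 40 + [1] * 40                         # synthetic normal/fault labels
X = [[lab + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(3)]
     for lab in labels]

results = {}
for m_try in (1, 2, 4):
    for n_tree in (10, 50):
        forest = grow_forest(X, labels, n_tree, m_try, rng)
        results[(m_try, n_tree)] = oob_accuracy(forest, X, labels)

best_m_try, best_n_tree = max(results, key=results.get)
print("best (m_try, n_tree):", (best_m_try, best_n_tree))
```

In practice a full random forest implementation with deeper trees would be used; the sketch only shows how the two parameters enter the search and how a stable, high detection rate can be used as the selection criterion.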
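As a quick consistency check on the reported figure, the sketch below converts counts of correct decisions into percentages, assuming the rate is computed over all 89 samples (the evaluation protocol is not stated in the text, so this is an assumption).

```python
TOTAL_SAMPLES = 89  # 38 normal + 51 faulty, as reported in the dataset description

def detection_rate(n_correct, n_total=TOTAL_SAMPLES):
    """Detection rate in percent, rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)

for n_correct in (86, 87, 88):
    print(n_correct, detection_rate(n_correct))
```

Under that assumption, 97.75% corresponds to 87 correct decisions out of 89, that is, two misclassified samples.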
Fitting the data with a sixth-order polynomial gives
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_6 x^6, \quad (3)$$
where $x$ is the number of features and $y$ is the detection rate. The intercept $a_0$ and the coefficients $a_1$ to $a_6$ are listed in Table 3. Differentiating (3) gives three maxima, at 1.67, 6.00, and 11.22.
These values are numbers of features. Hence, selecting two features clearly guarantees the highest detection rate, as in Figure 4. Similarly, a 9th-order polynomial fit of the same data (detection rate $y$ versus number of features $x$) was obtained as
$$y = b_0 + b_1 x + b_2 x^2 + \cdots + b_9 x^9. \quad (4)$$
Its three maxima are 2.11, 6.11, and 10.11 (Table 4).
This agrees with the result of the 6th-order polynomial fit. Multidimensional scaling (MDS) provides a method for discovering "hidden" structures in multidimensional data. MDS is built on a similarity measure and maps the data onto a lower-dimensional spatial representation [12,13]. With three selected coefficients, the normal/fault patterns are visualized by MDS in Figure 6 [7]. The result was obtained with the open source R-project [12].
The above results show that the two most important parameters are sufficient and most efficient for deciding whether the status is faulty or not. The normal and fault patterns are illustrated via MDS, shown as blue triangles and red circles in Figure 6. This indicates that fault monitoring and flight control system organization can be made feasible by visualization, without an expert's knowledge.
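Locating the maxima of a fitted polynomial by differentiation can be sketched as follows. The coefficients below are hypothetical (the fitted values in Tables 3 and 4 are not reproduced here); the example polynomial is chosen so that its maxima are known to lie at x = 1 and x = 3. The routine scans for sign changes of the derivative from positive to non-positive and refines each one by bisection.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with ascending coefficients using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def poly_deriv(coeffs):
    """Ascending coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def local_maxima(coeffs, lo, hi, steps=1000):
    """Find local maxima in [lo, hi] where the derivative changes from + to -."""
    d = poly_deriv(coeffs)
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    maxima = []
    for a, b in zip(xs, xs[1:]):
        if poly_eval(d, a) > 0 and poly_eval(d, b) <= 0:
            for _ in range(60):              # bisection on the derivative
                mid = (a + b) / 2
                if poly_eval(d, mid) > 0:
                    a = mid
                else:
                    b = mid
            maxima.append((a + b) / 2)
    return maxima

# Hypothetical example: f(x) = 6x - 5.5x^2 + 2x^3 - 0.25x^4,
# whose derivative is -(x - 1)(x - 2)(x - 3).
example = [0.0, 6.0, -5.5, 2.0, -0.25]
peaks = local_maxima(example, 0.0, 4.0)
print(peaks)
```

Applied to the fitted detection-rate polynomial, the same routine would return the feature counts at which the fitted curve peaks; since the number of features is an integer, each peak would then be rounded to the nearest feasible count.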

Comparison with Similarity Measure Results.
The similarity measure is designed using the definition of Liu [19]. The following similarity measure is used to calculate the degree of similarity between normal and fault operating conditions. The proposed similarity measure has an advantage in computation compared with the measures in the literature [3][4][5][6][20][21][22], which require at least $2n$ comparisons and $2n$ additions for $n$ data points because of their formulation.
Theorem. The measure in (6) is a similarity measure between set $A$ and set $B$, where $d$ satisfies the Hamming distance measure.
Proof. The commutative property (S1) is easy to prove; it is clear from (6) itself. To show (S2), consider a crisp set and its complement. Because the two membership functions are complements, the distance between them always equals one, so the similarity is zero. (S3) is also easy to prove: the distance between a set and itself is zero, so the similarity of a set with itself attains the maximal value. Finally, the triangular inequality is obvious from the definition of Liu [19]; hence (S4) is also satisfied.
By applying this similarity measure, the calculation is reduced to $n$ comparisons and $n + 1$ additions. The two parameters and their membership functions are illustrated in Figures 7 and 8, where the normal and fault distributions are also shown. With similarity measure (6), the decision results are clearly discriminative. In Table 5, the calculation results of (6) are emphasized by using the variable importance of Table 2 as the weighting factor.
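A distance-based similarity measure and its importance weighting can be sketched as below. The specific form used here, one minus the normalized Hamming distance between membership vectors, is an assumption chosen to be consistent with the properties discussed in the proof; it is not necessarily identical to (6). The weighted combination (importance-weighted average over features) and all names and numbers are likewise hypothetical.

```python
def hamming_distance(mu_a, mu_b):
    """Normalized Hamming distance between two membership vectors in [0, 1]."""
    return sum(abs(a - b) for a, b in zip(mu_a, mu_b)) / len(mu_a)

def similarity(mu_a, mu_b):
    """Assumed form: similarity is one minus the normalized Hamming distance."""
    return 1.0 - hamming_distance(mu_a, mu_b)

def weighted_similarity(features_a, features_b, weights):
    """Importance-weighted average of per-feature similarities."""
    total = sum(weights)
    return sum(w * similarity(a, b)
               for w, a, b in zip(weights, features_a, features_b)) / total

# Hypothetical membership vectors for two features under two conditions.
normal = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]
fault = [[0.2, 0.3, 0.1], [0.5, 0.5, 0.5]]
importance = [3.0, 1.0]   # hypothetical variable importance values

print(weighted_similarity(normal, fault, importance))
```

With this form, a set compared with itself gives similarity one, and a crisp set compared with its complement gives similarity zero. Weighting by importance lets the more discriminative feature dominate the decision, which is the effect described for Table 5.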

Conclusions
The fault detection problem of an aircraft system was addressed with RFA, which was applied to build a fault detection methodology for the unmanned aircraft system DURUMI-II. RFA provides the importance of each parameter or feature. The feasibility of the fault detection algorithm with RFA was validated.
With experimental data from the flight test of DURUMI-II, the fault decision algorithm was shown to detect faults with high detection rates (Figure 5). Additionally, visualizing the normal and fault patterns using MDS made them easy to distinguish with context information. Similarity measure calculation weighted by the variable importance was applied to the detection problem. The decision results were more pronounced than with the similarity measure alone.
The RFA calculation provided the meaningful result that the detection algorithm was effective even with a limited amount of operation data. Consequently, the flight supporting control system can be reconfigured with the fault detection algorithm, and reliability increases without additional sensors such as a potentiometer on the control surface.