Aircraft Control Parameter Estimation Using Self-Adaptive Teaching-Learning-Based Optimization with an Acceptance Probability

This work presents a metaheuristic (MH) termed, self-adaptive teaching-learning-based optimization, with an acceptance probability for aircraft parameter estimation. An inverse optimization problem is presented for aircraft longitudinal parameter estimation. The problem is posed to find longitudinal aerodynamic parameters by minimising errors between real flight data and those calculated from the dynamic equations. The HANSA-3 aircraft is used for numerical validation. Several established MHs along with the proposed algorithm are used to solve the proposed optimization problem, while their search performance is investigated compared to a conventional output error method (OEM). The results show that the proposed algorithm is the best performer in terms of search convergence and consistency. This work is said to be the baseline for purely applying MHs for aircraft parameter estimation.


Introduction
Flight control is one of the most important parts in developing a new aircraft or improving an existing one. It is even more crucial when the applications of unmanned aerial vehicles (UAV) have been introduced. Traditionally, aircraft motion is modelled based on the equations of motion or Newton's second law, leading to a system of nonlinear equations. Often, such a system is linearized, resulting in a linear state-space control model. In order to reach the possibly highest performance for flight control, identification of aerodynamic parameters and the aircraft dynamic model needs to be accurate. Although aircraft aerodynamic parameters including aerodynamic, stability, and control derivatives can be evaluated from an empirical model [1], numerical models (a vortex lattice method) [2], computational fluid dynamics (CFD) [3]), and wind tunnel test, errors between the test data and the real aircraft data are still inevitable. e designed aircraft and the manufactured aircraft are always different, while it is even worse when the aircraft is subject to structural flexibility in real flight. is implies that accurate system identification of the real or manufactured aircraft is always required. In this regard, parameter estimation techniques are necessary and have been one of the most popular research topics in the field of flight dynamics and control. Conventional parameter estimation techniques that can be used to evaluate stability and control derivatives from flight information have been presented [4]. However, the traditional techniques are suitable for estimating stability derivatives in a linear trim condition for a stable and rigid aircraft. ey are insufficient for a highly manoeuvrable or unstable aircraft, and also expensive computation is required for estimating a large number of parameters for a full-order model of an aircraft. Also, some of the more efficient parameter estimation methods have been developed based on an implicated function technique in combination with optimization. ese include the Equation Error Method (EEM) [5][6][7][8], the Output Error Method (OEM) [9][10][11], and the Filter Error Method (FEM) [12][13][14][15]. However, those methods still require a predefined/ initial aircraft model. In this regard, development of the more efficient flight parameter estimation from flight test data without a predefined aircraft model is a challenging topic.
Recently, efficient flight parameter estimation techniques based on artificial intelligence (AI) and optimization tools have been proposed, e.g., artificial neural network (ANN) and fuzzy set theory [16][17][18]. e combination of AI and MH search has also been presented [19,20]. However, even all those techniques cannot be used without a predefined aircraft model. ANN and fuzzy require a large amount of training data and also rely on efficient parameter settings. Using standalone MH for such a problem as an inverse optimization problem solver, if successful, could be a good tool for aircraft flight control.
MH (also known as evolutionary algorithms) can be classified as optimization methods which are global and nongradient optimizers. Due to such advantages, they can deal with any kind of design variable, objective, and constraint functions, although they may be less efficient in some cases. ey can also explore a Pareto front within a single run, in cases of a multiobjective optimization problem. erefore, they are at present the most used and popular optimizers for real engineering design optimization problems [21][22][23][24]. For optimization applied to an inverse problem, successful use of MHs is reported worldwide, such as a damage detection problem [25,26], an inverse kinematic design of a robot [27], robot trajectory planning optimization [28], mechanism synthesis [29], and parameter identification of photovoltaic models [30,31]. For inverse problem optimization of aircraft parameters estimation using MH, to our knowledge studies are rare. Some MHs found being used for solving such a problem are a classical genetic algorithm (GA) [32]. Since investigation on the use of MHs for aircraft control system identification has been limited, it is thus a motivation for this work. Up to the present time, since one of the very first algorithms GA was invented, there have been over a thousand MHs and their variants presented in the literature. A comparative performance study of some established and newly invented MHs on the parameter estimation of an aircraft system is one interesting subject while the more challenging task is to develop a new powerful MH or to improve the performance of an existing MH for such an inverse problem. Among existing MHs, teaching-learning-based optimization (TLBO) is an outstanding MH, which is found to be one of the most powerful algorithms for solving inverse problems through optimization [25,[29][30][31]33]. However, its performance on the inverse problem of parameter estimation of an aircraft system has never been tested. Furthermore, although TLBO was found to be one of the best optimizers for solving inverse problems through optimization, the original version is developed for general optimization problem. erefore, enhancing the TLBO algorithm based on a novel technique for this specific optimization problem is challenging and will lead to a powerful tool for aircraft parameter estimation.

Formulation of Inverse Optimization Problem
Rigid aircraft flight dynamics are governed by the equations of motion or Newton's second law. e rigid aircraft has 6 degrees of freedom with 3 for translations and the other 3 for rotations. e conventional north-east-down coordinates can be used as an inertial reference frame. It is also convenient to use the body axes as shown in Figure 1. e equations of motion lead to a nonlinear flight dynamic model. In order to simplify the model, a small perturbation approach is employed to linearize the model. en, with the left right symmetry of an aircraft, the model can be separated into a longitudinal and lateral/directional motions. is leads to easy to handle aircraft dynamic and control models.
In order to examine the performance of the proposed parameter estimation approach, parameter estimation of the longitudinal flight control model is presented. e nonlinear flight control model for longitudinal motion of a conventional aircraft used in this study can be expressed as [4] 2 Computational Intelligence and Neuroscience e aerodynamic parameters Θ � C D0 , C Dα , C D δe , C L 0 , C L α , C L q , C L δe , C m 0 , C m α , C m q , C m δe } are assumed unknown and to be identified.
To apply an inverse optimization problem for longitudinal motion parameter estimation, the design problem is posed to find a set of aerodynamic parameters, in order to minimise errors between the longitudinal response of a real aircraft and the calculated response from the longitudinal dynamic equations. e optimization problem can be expressed as Subject to where x � Θ is a vector of design variables having L b and U b as the lower and upper bounds. e details of the design variables are shown in Table 1. e parameters r real,i and r estimate,i are respectively the i th real and estimated longitudinal motion parameter time responses, whereas l is the number of longitudinal parameters considered. Four longitudinal parameters include V, α, θ, and q. e parameters t 0 and t end are the initial and final times of the simulation, respectively. As there are several types of physical parameters with totally different units, e.g., aircraft velocity and angular position, each estimation error in (7) is therefore normalised before being summed up. is step can be easily achieved by dividing by the absolute value of the real time response.
In this study, the real time response from a flight test is simulated using the flight model of the HANSA-3 aircraft as shown in Table 2, while the target values of aerodynamic Table 3. e aircraft flight data are simulated with the elevator deflection of a 3-2-1-1 step input. e simulation is performed for a six-second time length (t) with a time step (Δt) of 0.025 second. Gaussian noise with zero mean is added to the system response with the degree of noise at 0%, 5%, and 10% with respect to the amplitude of the time response. e state time response of the longitudinal motion r � V, α, θ, q T can be numerically solved based on (9), while _ r can be calculated based on (1)-(6). e state time response used as real flight data is shown in Figure 2.

Self-Adaptive Teaching-Learning Based Optimization with an Acceptance Probability
A TLBO is a simple but efficient MH proposed by Rao et al. in 2011 [49]. e algorithm was inspired by the behaviour of teaching and learning in a class. e main search procedure of the TLBO consists of population initialisation, reproduction, and selection, while, at the reproduction process, there are two main phases called the teaching and learning phases. Each individual in the population is first updated in the teaching phase with the relation.
where x i � the ith individual in the population. x teacher � the best individual. x mean � mean value of other members in the population. rand � a uniform random number in the range of [0, 1]. T F � teaching factor, which can be either 1 or 2 at random.
Computational Intelligence and Neuroscience e offspring and parents are then put together while the best of them is selected and sent to the learner phase. In the learner reproduction phase, a particular offspring can be created as where x i1 and x i2 are two randomly selected individuals in the population. e greedy selection is then performed in the same manner as with the teaching phase. e computational steps are given in Algorithm 1. From Algorithm 1, the original TLBO teaching phase is performed using one teacher (the current best solution), while the learning phase is performed exploiting two randomly selected students (individuals). is, to some extent, leads to limitation in TLBO search exploration and exploitation. erefore, this work proposed an improved version of TLBO by introducing several numerical schemes. For the proposed self-adaptive teaching-learning based optimizer with acceptance probability algorithm, both teaching and learner phases are upgraded. Multiple teachers are assigned in the teacher phase while a three-student learning scheme is added to the learner phase to enhance its convergence rate. With several new numerical schemes being added, some control parameters are exploited in the new algorithm. While the original TLBO is said to be a derivative-free MH, the proposed algorithm applies selfadaptive strategies to the added control parameters.
In the teaching phase, a diversity archive is used to keep some promising solutions which have good balance between exploration and exploitation. ose solutions are assigned as teachers.
e archive is created and updated using the nondominated sorting technique to classify solutions to the archive. e nondominated sorting is operated based on simultaneously minimising the original objective function (f(x)) and the diversity objective function (f D (x)) [29]. e diversity function values are calculated based on a combination of the population in the current iteration and the populations from a few previous iterations. en, the diversity function of those individuals can be computed as where w 1 and w 2 are weighting coefficients with the condition w 1 + w 2 � 1, while w 1 is generated randomly within the range of [0,1]. e function f 2 can be calculated based on (11) e variable D ij is a Eulerian distance between individuals i and j, while max(0.0001, D ij ) is the maximum value between 0.0001 and D ij which is used to avoid a singularity occurring in the calculation. n P is the number of individuals in the pool. (11) is still used for the teaching reproduction phase, but the teacher can be selected between the best solution and those in the diversity archive with a given   Parameter probability. e selection of the x teacher is performed based on a probability of selection which can be expressed as where p T is the probability of selecting the best solution, while x D,ran d is an individual randomly selected in the diversity archive (archive of nondominated solutions obtained from f D and f 2 ). e probability of selecting the best solution for the teaching reproduction is made self-adaptive based on the accumulated data on each optimization run. ree subintervals for generating p T are defined as [0.4, 0.5], [0.5, 0.6], and [0.6, 0.7] where selection of the intervals is carried out using a roulette wheel selection technique. For example, if subinterval one is selected, the value of p T is generated as where rand ∈ [0,1] is a uniform random number. e j-th subinterval has the probability of being selected as p wj , which can be computed from Initially, two 1 × 3 vectors p T_success and p T_fail whose elements are all zeroes are created. During the teaching reproduction, if the j-th subinterval is used to generate the value of p T and the reproduced offspring is better or as good as its parent, the value of p T_success,j should be increased by adding a point to its j-th element. On the other hand, if it fails to surpass its parent, the value of p T_fail,j should be increased by one point. With such a concept, the value of p wj depends on the history of its use being either a success or a failure. Nevertheless, counting only success or failure can possibly lead to local optimum traps of the evolution of p wj ; as a result, the acceptance probability concept is used similarly to the Boltzmann's probability employed in simulated annealing. As a result, in cases that the j-th subinterval is used leading to an offspring x i teaching , the updating scheme for both p T_success,j and p T_fail,j can be expressed as If rand < p acc , p T success � p T success,j + 0.5, Else, where p acc is an acceptance probability set with a high value initially and reduced as the optimization run progresses. Although it has failed, the p T_success,j still has a chance to increase its score to 0.5 if the acceptance probability is passed. In this work, simple probability scheduling as displayed in Figure 3 is used, where t MAX is the maximum iteration number. For the learner phase, the 2-student learning in (11) is still used along with the 3-student learning strategy [33]. e selection of the two learning strategies relies on the probability of choosing the 2-student learning defined as p L . e 3-student learning is achieved by randomly selecting three individuals in the current population, then the search direction is computed in such a way that two other students are directed towards the best student. e new learner reproduction can then be written as where x i1 , x i2, x i3 are randomly selected from the current population and the last one is the best of them. e variable p L is generated in a similar way to p T . at means there are three subintervals for randomly generating p L , while the 1 × 3 vectors p L_success and p L_fail are used to memorize the successful and failed records of using the j-th subinterval. Similarly, the probabilities for the roulette wheel selection are computed using (16). e search process of SaTLBO-AP starts with initialising a population, a schedule of p acc , and the initial (zero) sets of p T_success , p T_fail p L_success and p L_fail . After objective functions of the current population are evaluated, the diversity archive is created. en, the reproduction process with the teaching and learner phases is performed, and the p T_success , p T_fail p L_success and p L_fail sets and the parameter p acc are updated. e search process is repeated until the termination criterion is satisfied. e computational steps of the proposed SaTLBO-AP are shown in Algorithm 2.
Each optimizer is used to solve the problems for 20 independent runs. e population size and maximum number of iterations are set to be 200 and 250, respectively. For any optimizer using a different population size, they will be terminated at the same number of function evaluations (FEs) of 200 × 250 � 50,000 FEs.
In addition, the conventional OEM has been used to compare with the MH approach. e OEM used in this study is based on the default code from the System Identification Program for Aircraft (SIDPAC) from NASA [50]. Due to the requirement of a predefined initial solution of OEM which is difficult to determine in the real situation, 20,000 initial solutions are generated based on the Latin hypercube sampling in this study. As a result, 20,000 runs from 20,000 initial solutions are performed for the OEM and the results obtained are discussed and compared with the proposed MH. In cases that the solution cannot converge, the OEM search process is terminated at 3,000,000 FEs. It should be noted that the numerical experiment is conducted via MATLAB 2020a with AMD Ryzen 9 5950 × 16-Core Processor 3.40 GHz with 32 GB of ram.

Results and Discussion
Having performed 20 independent runs of all MHs for solving the three cases of the optimization for aircraft control parameter estimation, the results are reported in Table 4. Table 4 shows the root mean square error (RMSE) between the longitudinal response of the real aircraft and the calculated response from the longitudinal dynamic equations at the optimum points. e mean values of the RMSE    0369 0.0024 0.0195 0.0083 1.25 0.1968 0.1899 0.1914 0.0017 1.2 0.4043 0.3960 0.3994 0.0016 1.15 * FR is the Friedman test score; lower is better.

Computational Intelligence and Neuroscience
Afterwards, WCA gets trapped at a local optimum after about 10,000 FEs. e proposed SaTLBO-AP seems to converge slower than the WCA initially, however, after 30,000 FEs, the proposed SaTLBO-AP is superior to the WCA and steadily approaches the global optimum. At the end of the optimization run, SaDE managed to reach near the global minimum point, but is still behind SaTLBO-AP. It can be concluded that the proposed SaTLBO-AP is obviously the best algorithm for this problem, is the most robust, and has good balance between search intensification and diversification. Table 5 shows the top 10 best solutions obtained from 20,000 initial solution of OEM. From the table, it is seen that for the parameter estimation problem without noise, there are only 7 of 20,000 initial solutions where the RMSEs are lower than 1. ere are only 4 out of 20,000 and 9 of 20,000 initial solutions with RMSE lower than 1 for the cases of 5% and 10% noise, respectively. When comparing the best obtained solution based on RMSE of the OEM and the proposed SaTLBO-AP, the OEM seems to have a slightly better RMSE for all cases, due to the use of a gradient-based optimizer. Nevertheless, such solutions can be found only 4-9 times out of 20,000 trials. When comparing the average RMSE obtained from 20 independent runs of the proposed SaTLBO-AP and average RMSE obtained from the top 10 best results classified from 20,000 solutions by using OEM, it is clearly seen that the proposed SaTLBO-AP is better for all cases. Although OEM is advantageous in terms of convergence rate with the use of gradient information, it requires a good predefined initial solution in order to obtain an acceptable solution, which is difficult to identify in a real situation. Based on this study, the possibility to obtain good results is lower than 10/20,000. However, for the proposed SaTLBO-AP based on solving the proposed inverse   Table 6 shows the comparison between the aerodynamic coefficients and derivatives obtained from using the three best MHs, i.e., SaTLBO-AP, SaDE and WCA and the target values.  e best obtained values (best run) and standard deviation (in parentheses) are presented in the table for each aerodynamic parameter. e results reveal that the aerodynamic coefficients and derivatives obtained from the proposed SaTLBO-AP and SaDE are close to the target values for all cases. Most of the standard deviation values for all aerodynamic coefficients and derivatives obtained from SaTLBO-AP are the lowest for all cases. e output responses obtained from the best run of the proposed SaTLBO-AP compared with target output responses are plotted in Figures 5 and 6.
Overall, it was found that the proposed SaTLBO-AP is the best performer for the case studies of aircraft parameter estimation. Applying the diversity technique, parameter selfadaption and the acceptance probability concept can increase the MH search performance.
e obtained values of the aerodynamic coefficients and derivatives may be slightly different from the actual or desired values; however, if a robust controller is used, such uncertainties can be coped with.

Conclusions
In this work, a new self-adaptive TLBO is proposed for aircraft parameter estimation. e method is based on integrating a diversity archive and a 3-student learning scheme into the teaching and learner phases respectively. A new selfadaptive strategy with the use of an acceptance probability is proposed. An inverse optimization problem is presented for aircraft longitudinal parameters estimation. e problem is posed to find longitudinal aerodynamic parameters by minimising errors between real flight data and those calculated from the dynamic equations. Several established MHs along with the proposed algorithm are used to solve the proposed optimization problem. e results shown that the top three best algorithms are SaTLOB-AP, SaDE, and WCA, while the proposed SaTLBO-AP is the best performer in both search convergence and consistency. When comparing the proposed SaTLBO-AP with the conventional OEM, the OEM gives better results if a good initial solution is used. Nevertheless, based on this study, only 1 of 20,000 trials for initial solutions of OEM leads to slightly better results than the proposed SaTLBO-AP, while the proposed SaTLBO-AP can obtain acceptable solutions without worrying about an initial guess. However, the proposed algorithm is shown to be suitable for the inverse problem of aircraft parameter estimation, while its performance on other inverse problems still needs investigation. is work is proposed to be the baseline of an investigation that only applies MHs for aircraft parameter estimation. For future work, performance enhancement of the proposed MH for solving the problem and for full model aircraft system identification should be studied.

Nomenclature
C D , C L , C m : Drag, lift, pitching moment coefficients C D 0 , C L 0 , C m 0 : Trim drag, lift, pitching moment coefficients p, q, r: Angular velocities in roll, pitch, and yaw axes ϕ, θ, ψ: Roll, pitch, and yaw angles u, v, w: Airspeed components in x, y, and z axes I x , I y , I z : Moment of inertia about x, y, and z axes Θ: Vector of unknown parameters α: Angle of attack C Dα , C D δe : zC D /zαzC D /zδ e L, M, N: Rolling, pitching, and yawing moments q: Dynamic pressure ρ: Air density V: True airspeed F eng : Engine thrust.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.