A Comprehensive Probabilistic Framework to Learn Air Data from Surface Pressure Measurements

Use of probabilistic techniques has been demonstrated to learn air data parameters from surface pressure measurements. Integration of numerical models with wind tunnel data and sequential experiment design of wind tunnel runs has been demonstrated in the calibration of a flush air data sensing anemometer system. Development and implementation of a metamodeling method, Sequential Function Approximation (SFA), are presented which lies at the core of the discussed probabilistic framework. SFA is presented as a tool capable of nonlinear statistical inference, uncertainty reduction by fusion of data with physical models of variable fidelity, and sequential experiment design. This work presents the development and application of these tools in the calibration of FADS for a Runway Assisted Landing Site (RALS) control tower. However, the multidisciplinary nature of this work is general in nature and is potentially applicable to a variety of mechanical and aerospace engineering problems.


Introduction
INTRUSIVE aircraft anemometer booms have long been successfully used to measure air data parameters for subsonic and supersonic flight. Rapid development in aerospace and naval technology has continuously demanded smaller, less intrusive, and more accurate air data systems. Accuracy of probe based air data systems suffers at high angles of attack or when dynamic maneuvering is required. These same conditions can lead to degraded flying handling qualities [1]. Errors due to vibration, poor alignment, and physical damage during operation or maintenance can also pose significant limitations on the use of protruding booms. Probe based air data systems cannot be used with stealthy aircrafts because they increase the total radar cross-section of the body which can potentially jeopardize the mission and risk human lives. Hypersonic flight regimes are another area where probe based air data systems cannot be used because the vehicle nose reaches extremely high temperatures that might melt the protruding boom. Also, such flow protruding booms cannot be used with research aircrafts without disturbing the airflow or the boundary layer. Finally, air data booms are too heavy and costly to use with unmanned micro air vehicles (MAVs) [2]. Reference [2] mentions that an 80% decrease in instrumentation weight and a 97% decrease in instrumentation cost can be gained by eliminating probe based air data systems from MAVs.
In order to overcome the above mentioned drawbacks, flush air data sensing (FADS) systems were developed. A popular flush air data system is a series of pressure taps mounted on a vehicle periphery that can be used to derive airflow parameters. FADS systems were first developed during the X-15 program. A hemispherical nose with pressure sensors was installed on this hypersonic aircraft to measure stagnation pressure and wind direction during reentry [3]. Further research on FADS for hypersonic vehicles was conducted during the Space Shuttle program [4]. After enjoying success on hypersonic flights the compatibility of FADS was tested on supersonic and subsonic flight regimes. The authors of [5] developed and flight-tested a flush air data sensing system for the F-18 Systems Research Aircraft (Figures 1 and 2). The authors concluded that the developed nonintrusive technique was clearly superior to the probe based air data systems. They showed that FADS were unaffected by dynamic flight maneuvers and they performed well in high angle of attack flights. FADS systems are also unaffected by vibration and  icing effects and are less prone to damage during operation and maintenance. They are an ideal choice for stealth vessels because they are self-sustained and give no additional radar cross-sectional area to the vessel signature.
Other nonintrusive air data systems include flush mounted optical air data systems and flush mounted heat films. The optical air data systems transmit lasers or acoustic signals generated by electromechanical transducers or loud speakers through a fluid medium to one or multiple receivers and measure the travel time to infer air data information. Even though these systems are nonintrusive, they are not selfsustained and are expensive and heavy to use [6].

Challenges in Estimating Air Data of a Flush Air Data Sens-
ing System. As previously mentioned, FADS systems infer the air data parameters from pressure measurements taken with an array of ports that are flush to the surface of the aircraft. However, because the locations of the pressure measurements are on the outer surface of the aircraft on geometrically simple components (e.g., hemispherical or conical nose), properties of the local flow fields like compressibility and flow separation can drastically affect these devices. Unlike aircraft, the FADS for bluff bodies like buildings must be placed completely around the structure and can routinely experience separated and unsteady cross-flow. Finally, to compete with conventional air data systems, the FADS must be able to estimate wind speeds ( ∞ ) over a range of 40 to 120 fps, ±3.4 fps (i.e., ±2 knots), and wind directions ( ) ranging from 0 to 360 degrees, ±2 degrees. For more details on assessment of conventional air data systems please refer to [7].
Using the incompressible flow about a right circular cylinder as an example of a bluff body (Figure 3), the problem of estimating relative wind speed and direction from the surface pressure at position can be framed as either a forward or inverse mapping problem. That is, the forward problem = ( ∞ , ∞ , , ) ; (1) the inverse problem ∞ = ( , ∞ , , ) , = ( , ∞ , ∞ , ) .
For simple geometries, closed-form semiempirical equations can be developed and used to solve the forward problem of estimating the pressure coefficient, , given the freestream wind speed and direction [8]. Panel methods [9] and other popular computational fluid dynamic (CFD) based approaches solve the forward problem on more complex geometries given the freestream wind speed and direction. However, this approach requires guessing the freestream velocity and solving for the pressure distribution until it matches the measured values. To solve the inverse problem, look-up tables from experimental fluid dynamics (EFD) can be constructed for the mapping function . Unfortunately, look-up tables suffer from a number of drawbacks when their real-time use is considered including the inability to handle high-dimensional inputs, noisy data, and nonlinear interpolation. It is proposed that machine learning methods, specifically Artificial Neural Networks (ANNs), be explored to solve the inverse problem. In addition, this paper will demonstrate how a particular form of ANN can also be used to determine the best locations for sensor placement, to maximize available FADS data by seamlessly combining CFD and EFD outputs, and to help guide physical experiments thereby minimizing the time needed in testing facilities.

Intersection of Statistical Learning and Fluid Dynamics.
In general, mechanical and aerospace engineering systems present a number of challenging problems especially in the field of fluid dynamics. Understanding pressure and velocity distributions of the fluid flow is essential to the design and control of complex multicomponent interacting systems. Experimental and computational methods in fluid dynamics have long been successfully used to model such engineering systems. However, intensive research in both fields spanning over several decades has highlighted several shortcomings [10]. Over the last few decades, as complexity of engineering systems increased, it became a necessity rather than academic curiosity to adopt interdisciplinary approaches to study a system. Intelligent data understanding tools like Artificial Neural Networks (ANNs) [11] and kernel methods from machine learning and statistics have enjoyed increased popularity in mechanical and aerospace engineering problems. However, the following issues remain.

Formalization of the Final Network.
A typical 3-layer ANN involves several parameters to be defined by the user including (a) number of neurons in the hidden layer, (b) initial weight matrix, (c) parameters of the basis function, and (d) values of the learning rates. Ad hoc procedures exist for choosing each of these parameters. However, lack of a principled approach to network construction generally leads to an overparameterized network. To optimize the structure of the network exhaustive grid search or cross-validation approaches are often needed. To attack this problem we focus on greedy scattered data approximation approaches [12]. In these approaches the user needs to load only the training and test sets and the algorithm solves for the required parameters in a self-adaptive greedy fashion. In other words, greedy scattered data approximation approaches provide a formal way to construct networks in an optimized manner.

Choice of Training Data.
It is often unclear how many training points would be sufficient for an acceptable generalization error and how these training data should be chosen. Traditionally, network training and testing is done only after all the data is collected. This is called passive learning in the machine learning literature, where the learning algorithm does not take part in the data collection procedure. Collecting data without taking into account the input-output functional relationship like the traditional Latin Hypercube Design [13] methods can result in choice of data that add little to no information to training of the network and can even hurt the generalization ability of the training method. This is especially a problem when data collection is costly, like flight tests or when the input domain is excessively large to sample. Active learning [14] methods are popular in the machine learning literature where the learning algorithm takes part in the data collection procedure and new input configurations are sampled such that they maximize some information gain criterion in a principled manner. These sampling procedures are essentially sequential in nature and not only help the engineers to accelerate through the test matrix, but also allow the learning algorithm to have better generalization capability.

Data-Model Fusion: Fusing Sparse High Fidelity Wind
Tunnel Data with Low Fidelity CFD Trends. ANNs provide a powerful tool to interpolate and extrapolate experimental data. However, it is beneficial to have a framework that can integrate experimental data with models of variable fidelity. Regularization approaches provide us with such a framework where it is possible to include physical knowledge of the system in the form of numerical simulations to be included as a priori information in the training of the network [15]. Such a data assimilation procedure can be seen as approximating noisy and scattered experimental data with physics based smoothness. It can also look as improving low fidelity numerical simulations with more accurate experimental data. This tool can result in acceptable flow field solutions by fusing fewer experimental data with possibly lower fidelity CFD data thereby saving precious resources. For the purpose of solving the inverse problem of FADS calibration discussed in Section 1 and similar engineering problems, a multidimensional learning tool should address the hyperparameter selection problem previously mentioned, operate as easily on high-dimensional data as it does on lowdimensional ones, and provide the user with an assurance that the tool will give its best performance on an unseen test set. The tool should also require a minimum of storage and require as little user interaction as possible. To this end, we use Sequential Function Approximation (SFA), an adaptive tool, developed earlier for solving regression and classification problems [16] with little user interaction to determine hyperparameters.
The first objective in this work is to develop a computational surrogate that can predict wind speed and yaw angle from surface pressure measurements. The surrogate should not have wind speed prediction errors greater than ±3.4 fps and yaw angle prediction errors greater than ±2 degrees. The second objective is to investigate how data-model fusion can help to better represent the coefficient of pressure distribution across the bluff body surface. The third objective is to develop a general intelligent sampling strategy with active learning and SFA on regression problems. Section 2 discusses estimation of air data parameters from surface pressure measurements in more detail. Section 2.2 presents freestream wind speed and direction estimation techniques and the results on the current problem. Section 3 briefly discusses the role of regularization techniques in fusing wind tunnel data with numerical models of variable fidelity. Section 4 begins by introducing the fields of active learning and optimum experiment design. Sections 4.1 and 4.2 discuss a G-optimal design procedure with SFA and present its implementation issues application on a simulated regression problem. Finally, Section 4.3 presents the application of this technique on the FADS problem.

Freestream Wind Speed and Direction Estimation
The Runway Arrested Landing Site (RALS) control tower is located at the Naval Air Warfare Center Lakehurst, New Jersey. Wind tunnel tests were conducted on a 1/72 scale model of the RALS control tower [7]. One pressure sensor was mounted on each face of the control tower as shown in Figure 4. A four-hole Cobra probe was used to measure the freestream wind speed and direction. The FADS system was tested on a variety of wind speeds ranging from 40 fps to 120 fps approaching the model from all 360 degrees of yaw. Variation of the wind incidence in three dimensions will be a  part of the future work. In addition, the scope of this work is to develop a computational framework to predict wind speed and direction from pressure measurements and not to conduct the wind tunnel experiments. The experimental setup is mentioned here briefly for completeness. Borrowing from flight mechanics literature and restricting our problem to two dimensions, is defined as the yaw angle and is measured in the anticlockwise sense with respect to the centerline of the model. So winds approaching the south port have = 0 degrees, for the east port = 90 degrees, for north port = 180 degrees, and for the west port = 270 degrees. In Figure 5 shown surface pressure is plotted for each pressure port. The pressure port on the south face bears positive pressure values when wind is incident headon at the pressure port and bears negative values for winds hitting the north face. Similarly the pressure port on the west face bears positive pressure values when the yaw angle is positive and bears negative values for negative yaw values. Also evident from Figure 5 is that the pressure ports do not show much variation when they are on the leeward side.

Forward and Inverse
Problems. Computational algorithms like neural networks that learn from data can be very effective in solving inverse problems and bear potential advantages in the construction of the FADS mapping function (Section 1.1) compared to look-up tables. Examples of neural networks advantages include high-dimensional mapping, smoother nonlinear control, intelligent empirical learning, and fewer memory requirements [17][18][19][20]. For speed of evaluation, reduced complexity, and the potential for graceful performance degradation with the failure of pressure sensors, the inverse problem is solved using only two sets of neural networks with pressures read from all surface sensors, between and can be safely assumed. The wind speed was predicted using only the surface pressure data while the wind direction was predicted using the pressure coefficient data derived from measured pressure and predicted wind speed values. It is noted that this coupling of networks put the burden of high accuracy on the wind speed predictor. Freestream air density was kept constant in the current problem which makes prediction of freestream wind speed equivalent to dynamic pressure. Sequential Function Approximation was used to construct one RBF network to model 1 and the corresponding wind speed surrogate is represented by The hypersurface for the current dynamic pressure prediction problem can be imagined as a hypercone with surface pressure as input dimensions and dynamic pressure as output. Dynamic pressure increases linearly with surface pressure given a yaw angle for all pressure sensor positions, . So each wind tunnel sweep at a constant speed would yield data points lying at the contours of the hypercone. Also, the slope of the hypercone corresponds to the coefficient of pressure at a given value of the yaw angle . Figures 6(a) and 6(b) show the aforementioned hypercone in two and one dimension, respectively. Looking at Figure 6(b), it is evident that the span of the cone includes all pressure measurements of the sensor at = 0 degrees at all wind speeds and yaw angles. The right edge of the cone corresponds to the case when the sensor faces directly the wind or = . As one moves from the right to the left edge of the cone the separation at the pressure sensor increases. A RBF network constructed by SFA can suitably approximate this hypercone; however it is possible that a better approach might exist for this approximation. This will constitute a part of the future work.

Wind Direction.
Once the wind speed predictor was available, the test surface pressure values were divided by the corresponding predicted dynamic pressure to obtain the test coefficient of pressure values. Several ways exist to predict the yaw angle . One straightforward way is to construct one RBF network for 2 given by (5), where However, unlike dynamic pressure, yaw angle prediction is a strongly nonunique problem. Figure 7 shows the variation of the yaw angle with bow side coefficient of pressure at 120 fps. It is evident from Figure 7 that at least two solutions for exist for most values of , and except when the pressure sensor faces separated flow. Here it is important to elaborate on the meaning of nonuniqueness when learning yaw angle as a function of pressure measurements. If we try to fit a metamodel for yaw angle as a function of coefficient of pressure, then from Figure 7 we realize that, for any given coefficient of pressure, say zero, there are at least two possible yaw angles that can yield coefficient of pressure of value zero on the wall. Similarly for a coefficient of pressure value of −0.5, there are more than 2 values of yaw angles that can result in coefficient of pressure of value −0.5. Since building metamodels for yaw angle as a function of coefficient of pressure is nonunique we try to build metamodels for the forward problem as described below. It is important to mention here that as we add more sensors the problem becomes less nonunique and we have more information to predict the yaw angle. In this problem four sensors are not enough to build accurate metamodels as shown in (5), motivating development of method shown below. It will be a very interesting problem to determine the minimum number of sensors and their location to maximize prediction accuracy and will constitute part of future work. A simple forward problem approach was developed to estimate the yaw angle which is discussed next.
In this approach, the forward problem of approximating as a function of the yaw angle is addressed. It is well known from fluid mechanics that in external flow is a function of both and the surface pressure sensor position, . Constructing a surrogate model of as a function of both and is equivalent to approximating for each pressure sensor as a function of just the yaw angle . Even though this increases the memory requirements, it yields faster and more accurate models for . Once a network has been constructed to represent each sensor, the objective function in (6) can be minimized to predict the yaw angle for a vector of test pressure coefficients: Here is the number of pressure sensors, , represents the approximated for the ith pressure sensor, test is the test vector of pressure values, and pre is the predicted dynamic pressure for test . It should be noted here that to predict wind direction, the predicted dynamic pressure is used and not the preknown dynamic pressure. A simple line search for would suffice for the above minimization problem. However, searching for over the entire range of [−180, 180] degrees would be wasteful, so could be searched over [ − , + ], where is the pressure sensor position that bears the maximum surface pressure value and is a value of the yaw angle chosen by the user so that range for the line search is sufficiently large. Because 2 norm is the most resilient to random noise it was chosen to express the differences between the test and predicted . Also, (6) allows the user to put weights on the contribution of each pressure sensor. Different weights could be helpful in situations where there is a combination of high and low fidelity sensors, or in the case of sensor damage or local flow field disturbance.
Besides ease of learning and better redundancy to noise this wind direction estimation approach uses all the available sensors for prediction instead of just those in a quadrant. The above mentioned algorithms have also been tested for a surface vessel and experimentally verified at the Fluid Mechanics Lab at the NASA Ames Research Center [16,21]. Besides learning the air data parameters, this work extends to sequential experiment design and data-model fusion for the current problem.

Results.
In this section wind speed and direction estimation accuracies are presented. Results for the RALS tower problem are presented using the estimation approaches discussed above. The SFA algorithm was implemented using the MATLAB programming environment on a Windowsconfigured PC with a Pentium 4 2.66 GHz processor and 1.0 GB of RAM.
Wind tunnel data for the RALS tower was available at speeds ranging from 40 fps to 120 fps as wind direction was varied from −180 to +180 degrees at increments of 2 degrees. All available data points were used in the test set, while a training set was created by taking data present at increments of 4 degrees at each wind speed. It should be noted here that, by this choice of training data, it consists only about half of the number of points in the test set. Once the training and test sets were constructed one network was constructed to predict dynamic pressure according to (4). The tolerance parameter was kept close to zero at a value of 1 − 6 so as to minimize any errors in the approximation of . Figure 8(a) shows the prediction errors if SFA was used to construct one RBF network for wind speed prediction as a function of surface pressure. However, there are several errors larger than the tolerance of 3.4 fps at 120 fps and on a closer look one realizes that most of the errors occur at yaw angles in the vicinity of 0, 90, 180, and −90 degrees. One possible explanation for the wind speed errors displaying a bias at the highest wind speed is shown in Figure 8(b). In this problem, a hypercone is approximated with radial basis functions (RBFs). RBFs model the smooth slopes of the hypercone; however they do not approximate the flat top of the hypercone that exists due to the wind speeds of the highest magnitude.
Next we present a heuristic which could be used to reduce the errors shown in Figure 9(a). As mentioned before most of the errors occur in the vicinity of the 0, 90, 180, and −90 degrees. For the following ranges of the yaw angle | | ≤ 25 ∘ , | −90 ∘ | ≤ 25 ∘ , | | ≥ 155 ∘ , and | +90 ∘ | ≤ 25 ∘ , the corresponding pressure sensors face a head-on freestream wind and bear a positive pressure value and the remaining sensors bear a negative pressure value. This a priori information was used in conjunction with the dynamic pressure model to improve prediction in the following manner. Training points lying in the above mentioned range were set aside before starting to construct (4). During testing, if a test point is such that only one sensor bears positive pressure value, the dynamic pressure of that test point was predicted using a simple nearest neighbor search from the training points that were set aside. Prediction of dynamic pressure is coupled with coefficient of pressure or the yaw angle. If of a test point is known a priori, linear dependence of dynamic pressure with surface pressure Once the wind speed predictor was in place, the predicted dynamic pressure values were used to obtain test coefficient of pressure values. The resulting test values were then input to the forward approach discussed in the previous section. To implement the forward problem approach first four models of pressure coefficient were constructed as a function of the yaw angle. Figures 10(a) and 10(b) show the residual error convergence and approximation of for each pressure sensor, respectively. Since sufficient noise-free training data was available, tolerance was kept low and so the models interpolate the wind tunnel data accurately.
Ability of SFA to handle noise and sparsity in training data will be covered in future work. Once the models have been constructed the objective function shown in (6) was minimized by a line search to predict the yaw angle for each test point. Figure 11(a) shows an example of the logarithm of the objective function for a test point with a yaw angle of zero degrees. Figure 11(b) shows yaw angle prediction errors on the RALS tower data using the forward problem approach.

Data-Model Fusion: Global Scaling via Surrogate Modeling
Estimating freestream wind speed and direction from surface pressure measurements is an ill-posed problem. The illposedness is induced due to the nonuniqueness of the problem which is worsened in the presence of noise in the pressure data. The inverse problem addressed in this paper is to reconstruct a complete pressure signal given sparse, noisy, and incomplete experimental data. This work develops an alternate method of fusing experimental and computational data to approximate pressure signals to estimate freestream wind speed and direction. The approach uses SFA as the inverse modeling tool and applies a simplified Tikhonovrelated regularization scheme to correct for the original data error. The purpose of this section is to introduce a fusing approach using SFA that maximizes the use of experimental data with the help of CFD data in approximating a smooth, continuous, and accurate pressure distribution. The fusing approach first involves calculating the error function of the CFD and experimental data defined by the following equation: for = 1, . . . , , where is the number of training data sets. The error vector, , is then used to train the SFA network to a predetermined tolerance, . The resulting error surface, ( ), will naturally involve some scatter directly related to the experimental data noise. Training the network to the given tolerance allows the SFA to regulate the noisy experimental data with a priori CFD information. Assuming the CFD surface is known, then the error surface approximation, SFA , can be subtracted from the CFD ( ) data to give the approximation in The value can be regarded as the regularization parameter and controls how well the approximations fit the experimental or CFD data. A very high tolerance value allows the training process to end prematurely with very few network units. As a result, the network "underlearns" the training data and the majority of the approximations reach a value of zero. For data points with an error value of zero, (8) shows that the approximation value will reproduce the CFD data. On the other hand, a very small tolerance value will force the network to use too many network units to reach the smallest possible tolerance. In this case, the network "overlearns" the training data and will fit even the experimental noise in the error surface. As a result, the approximations will reproduce the experimental data. The user must carefully choose the tolerance value to best fit the experimental data using the CFD information based on the noise in the data. Similar approaches have been developed in [22][23][24][25].
In external flow past bluff bodies, coefficient of pressure is a function of the yaw angle and local pressure sensor position . In the RALS tower problem wind tunnel data is available at only four values of = [−90, 0, 90, 180] degrees shown in Figure 12. A complete representation of as a function of and is necessary because it can possibly inform the experimentalist what locations should be chosen for pressure sensor installation to improve wind direction prediction accuracy.
Numerical solutions, however, provide full coefficient of pressure distribution as a function of yaw angle and pressure sensor position. The data-model fusion technique discussed here could similarly be applied to correct low fidelity solutions even at values of where no wind tunnel data is available. This could be done simply by calculating the differences between the wind tunnel and CFD solutions at = [−90, 0, 90, 180] degrees. Once the differences have been calculated, it can be treated as a function approximation problem by SFA with and as inputs and as output. Once a surrogate to the hypersurface of differences has been created, it could be added to the CFD surface to obtain a distribution shown in Figure 13. The numerical solutions were extracted at 56 values of for each yaw angle.

Sequential Experiment Design
In the previous sections, passive learning techniques were discussed for the FADS problem, where a learning algorithm, like SFA, acts only after the training data set has been collected. The learning algorithm does not take part in the data collection procedure. Traditionally, popular space filling experiment design strategies like Latin hypercube sampling (LHS) are used to plan the data collection procedure. Use of LHS is popular with Monte Carlo techniques to estimate uncertainty in computer models. It has been proven that fewer samples are required with LHS compared to random sampling when estimating the output with a given variance [13]. Latin hypercube sampling is a simple and effective space filling sampling technique which does not need or assume the input-output functional relationship. Even though LHS is better than random sampling, a significant improvement is possible if the functional form of the input-output relationship is assumed and data are sampled from those regions where the output variance of the assumed input-output function is maximum. It is thought that advances in experiment design strategies will not only benefit the training of FADS but the fields of EFD and CFD in general. Suboptimal designs    could be a serious problem if the input domain is excessively large or output determination is expensive. Careful selection of training points is also important to construct surrogates with low generalization error. Input sampling strategies where the learning algorithm actively takes part in the data collection procedure are studied in the field of active learning or query learning as suggested by its name. In statistics they are studied in the field of Optimal Experiment Design (OED) [26].

Active Learning for Regression Problems.
For continuous valued problems, significant research has been done in the field of statistics under the name of Optimal Experiment Design [26]. An acceptable active learning method for regression problems should be able to minimize mean squared error 0 100 200 0 100 200 P re ss u re p o si ti o n a n g le Y a w a n g le with dimensional inputs ∈ R , output ∈ R, and random errors which we assume to be independent and uncorrelated. Given a small, finite number of training samples , the parameters of this regression problem can be estimated that minimize the mean squared error given by where is the environmental probability. However, since the environment is generally realized by the user only via the number of training points, the following estimated mean squared error in (11) is minimized to estimate the parameters : Given a finite number of training samples and a learning algorithm to estimate the parameters , an optimum active learning scheme would compute a sample point where the expected mean squared error over the whole input domain is maximum. Estimation of the future generalization error could be a very computationally demanding task for both pool based and population based active learning methods where information about the conditional probability distribution is either known or assumed. Kindermann et al. [27] use a Bayesian theoretic framework to develop an active learning method where they use Markov Chain Monte Carlo methods to approximate the expected loss. Roy and Mccallum [28] estimate the expected error by adding all possible labels for each unlabeled sample to the training set. The proposed brute force approach is infeasible for regression problems and classification problems with many classes. The authors propose several ways to make this procedure efficient, for example, use of Naïve Bayes and SVMs where addition of a new training point does not need retraining, or estimate EMSE only on the neighborhood of the candidate points which compromises the generality of the approach. It is important to note here that methods such as Kriging/Gaussian Processes have the Expected Improvement criteria which can be used for adaptive sampling and estimating uncertainty in results. However, the authors wanted to show that a similar framework can be developed with other stateof-the-art metamodels. In the current FADS application, the number of training data points is high which would take longer for the Gaussian process or Kriging models to train. The SFA method trains quickly while preserving accuracy of the solution. And we show that with this new method we can also construct a framework for adaptive sampling and fusing high and low fidelity data. Since estimation of expected future loss is infeasible, one can rely on the bias-variance decomposition of EMSE to estimate a sample where the expected error would be maximum. For a given sample , let = ( , ), = ( , ), and = [ ] be the true output, estimated output, and the expected value of the estimated output, respectively. The expected value can be computed by averaging over different permutations of randomly chosen training sets of size . The expected mean squared error can be decomposed into bias and variance as shown in where the first term represents bias and the second term represents variance of the model. To elaborate, bias of a model can be understood as, for example, a quadratic function is being estimated by a linear model. The errors of the model are independent of the accuracy of the estimated parameters. The variance on the other hand is the error due to the errors between the estimated and the true parameters. For a simple regression problem shown below in where ( ) is a row of evaluated at . The standardized variance given in the above equation gives a way to calculate the estimated variance of a linear model whose parameters are computed given a finite training set. This means that for a model with insignificant bias, an optimum sample can be computed that maximizes (14). At this sample the expected mean squared error will be maximum making it an optimum choice for a training point. The set of training points that minimize the maximum standardized variance are called G-optimum. The previous analysis was derived for linear regression problems; an extension to nonlinear problems is possible using Taylor Let ( ) = ( ) − ( , ⃗ 0 ), = − 0 , and = ( / )| = 0 , which gives us the following linearized form of The variance of this new linearized form can be computed in a similar manner as described above for a linear regression problem with parameters. The standard variance of ( ) would be the same as ( ) and can be computed using (17) where And computing the derivatives for SFA,

Implementation Issues and Application.
There are several implementation issues that need to be carefully considered when developing a G-optimal design procedure using SFA with RBFs.

Singularity of Fisher Information
Matrix. The singularity of Fisher Information Matrix (FIM) poses a significant problem in the implementation of a G-optimal design procedure especially with RBFs. The FIM can become singular due to several reasons. If ( ) represents a row vector of the derivatives of the surrogate model with respect to its parameters evaluated at point , as shown in (19), FIM is constructed by computing ( ) * ( ) which is a × matrix and summing it over the available training points. It is important to realize that ( ) * ( ) is a matrix of rank 1 and that FIM gains rank as it is summed over increasing number of training points.
This could serve as a guideline to select the minimum number of initial training points necessary to start the incremental G-optimal design procedure. The initial number of training points ( ) should be greater than or equal to the number of parameters ( ). For greedy algorithms like SFA, this could also decide the stopping criterion to add basis functions.

4.2.2.
Overparameterization. Fukumizu [29] has shown that FIM can also become singular if any redundant basis functions are added to the approximation. FIM will become singular in the presence of singular basis functions even if it is constructed by summing over a large number of training points. This is also true for SFA in case a redundant basis function is added to the approximation when only noise is left in the residual error signal. A redundant RBF in a surrogate model constructed by SFA can easily be identified because its coefficient value will be very close to zero and the width of the RBF will also be very low, making the RBF appear as a spike. If this basis function is included in the surrogate model, the parameter derivatives and the FIM will bear this singularity resulting in spiking of the standardized variance values at the redundant basis function center. In order to avoid this problem, if the coefficient or the width of a RBF is unusually low then it should be eliminated from the surrogate model and the addition of basis functions should be stopped in order to preserve the smoothness of the surrogate model and the standard variance.

Initial Number of Training Points and Stopping Criteria.
The initial number of training points can be chosen at random as long as it is large enough to avoid construction of highly inaccurate surrogate models. As mentioned before, the number of training points ( ) should be larger than the number of parameters. A simple heuristic such as = 1.5 can be used to set the stopping criterion given the number of training points. However, this stopping criterion will continue adding an increasing number of basis functions as more training points are added, which will lead to overparameterization. This calls for an additional check that will stop the addition of basis functions if the coefficient, or the width, of the latest RBF falls below a user specified tolerance.

The Bias
Problem. The strategy of maximizing the standardized variance will result in an optimal design sample only if the model bias is insignificant. Previous work done by Kanamori and Sugiyama [30][31][32] has focused on using weighted maximum log-likelihood approaches to result in robust active learning schemes in case that the interpolation model has been misspecified. Sugiyama [31] and Hering anď Simandl [33] have also addressed the issue of using the conditional expectation of the generalization error instead of minimizing the expected loss in an asymptotic sense.
However, these efforts are only possible for population based active learning methods where information about the probability distribution of the parameters is either available a priori or is assumed. In this paper, attention is focused only on pool based active learning methods where no a priori information about the parameter probability distributions is available and sufficient initial data is not present to make good estimations about global parameter probability distribution functions. In this work, the strong approximation capabilities of universal approximators like radial basis functions are relied on to attenuate the bias problem. A possible strategy is to start the active learning procedure with an overparameterized model and remove any redundant basis functions before constructing the FIM. As the number of training points increases the number of redundant basis functions should diminish. Another constraint on the number of basis functions is that the resulting number of parameters (2 + 1) should be less than the number of training points to prevent the FIM from being singular.
The final steps to implement the G-optimal design procedure with SFA are as follows: (1) Use LHS method to pick a small number of input points and evaluate the output on them to construct the initial training set.

Sequential Experiment Design for FADS.
In this section, the G-optimal design procedure developed with SFA is tested on the FADS problem for the RALS tower. Wind tunnel tests for the FADS problem measure surface pressure on the surface at fixed sensor locations ( ). Test runs consist of sweeps where the Mach number or the freestream wind speed is fixed and the bluff body model is rotated resulting in pressure measurements at different incident yaw angles ( ). The time and effort of wind tunnel testing include changing pressure sensor locations ( ) and freestream velocity ( ∞ ). However, CFD simulations give pressure distribution for all for a given ( ∞ ) and yaw angle . Here the time consuming part is repeating the simulations for different . The time and effort required would increase drastically if 3D flow is considered and surface pressure is measured as a function of ∞ , yaw angle , and pitch . For 2D flow, if is measured at increments of 2 degrees, pressure measurements have to be made 180 times at each ∞ . For 3D flow, if and are measured at increments of 2 degrees, 180 × 180 pressure measurements have to be made at each ∞ . An experiment design strategy for the hypersurface = ( , ) could be helpful in reducing the time and cost of FADS. Optimal training data would also be able to give faster surrogates for and would accelerate the forward problem approach discussed in Section 2. For demonstration purposes, it is assumed that one can freely change and to arbitrary continuous values within the interval [−180, 180]. However, in reality wind tunnel tests can only freely change while CFD runs can only change freely. Including ∞ in the experiment design framework is left future work. Figure 13 showed the hypersurface of as a function and which will be used to create the test data set for the design procedure. The G-optimal design procedure will be compared against the traditional LHS design procedure. An initial sample of 100 training points was chosen by LHS to start the design procedure. Surrogates were constructed by SFA using increasing number of LHS training points with increment of 5 points till a training set size of 500 was reached. For each training set size the training-testing procedure was repeated 10 times to eliminate any bias due to selection of training points and the mean and standard error of prediction accuracies were computed. The test set was constructed by the data-model fusion procedure. CFD solutions were constructed with a increment of 1 degree and a increment of approximately 6 degrees. The wind tunnel experiment used increments of 4 degrees and four pressure sensors. The resulting test set had 20577 (= 361 × 57) points which captured the variation in detail. To implement the G-optimal design procedure, the stopping criteria discussed in Section 4 were used. This ensured that the FIM would not be singular. Also as the number of training points is increased the number of basis functions will also increase and address the bias problem. The lower bound on the RBF width parameter was constrained to 1 − 64 and the center selection heuristic corresponding to the maximum absolute value of the residual was chosen. The basis centers were not optimized to save computational expense. One point was added to the training set at each iteration till 500 training points were collected. Again, to put the performance of the active learning procedure in perspective, its performance was also compared to the scenario when it is assumed that the true model is known and one point is chosen at each iteration. This corresponds to the maximum squared error of the current surrogate model.
It was also realized that additional information could also be incorporated in the training set for this problem. 2 bears peaks that correspond to those combinations of and that result in maximum flow separation. Due to the geometry of the RALS tower bluff body, sensors on each wall measure the lowest pressures when the freestream wind is directed against the adjacent wall.
This physical phenomenon was used to enhance the information contained in the training set by adding points where = [−180, −90, 0, 90, 180] degrees. Result for this case, shown in Figure 14, displays that active learning improves upon the LHS data collection method. Figure 15 shows the 2 training points chosen by the G-optimal design procedure. The regions of the input domain corresponding to significant functional variation are efficiently covered by the design algorithm corroborating the claims of Optimal Experiment Design.

Conclusions
This work was devoted to studying the compatibility of flush air data systems with bluff bodies with arbitrary geometries. Nontrivial geometries and requirements for efficient real time usage resulted in a challenging inverse problem of determining air data parameters from surface pressure measurements that could be best solved by intelligent data driven algorithms. A data driven algorithm was developed based on the principles of scattered data approximation. It was proposed that mathematical models can be utilized as a surrogate that solves the forward problem of generating pressure values from known wind speed and direction configuration. This surrogate can be used in conjunction with an array of flush mounted pressure taps to act as an air data measurement system. The proposed scattered data approximation algorithm, called Sequential Function Approximation (SFA), is a greedy self-adaptive function approximation tool that results in reliable and robust estimates without any user dependent control or kernel hyperparameter selection. In particular, a nonparametric and adaptive scattered data approximation tool accurately and efficiently mapped wind speed and yaw angle to pressure measurements taken from all four sides of the body. SFA was also used to reduce uncertainty in the wind tunnel tests by fusing data with low fidelity and smooth CFD models. Finally, to accelerate through the test matrices an intelligent sampling strategy was developed using principles of sequential experiment design.

Sequential Function Approximation
Approximation of the unknown target function is constructed by noting that a continuous -dimensional function can be arbitrarily well approximated by a linear combination of radial basis functions . Sequential Function Approximation (SFA) was developed from mesh-free finite element research but shares similarities with the Boosting [34] and Matching Pursuit [35] algorithms. SFA was developed earlier for regression and classification problems [16] and it is presented here for completeness. Approximation of the target vector is constructed by utilizing the Gaussian radial basis function : for 0 < < 1.
Traditionally, Gaussian radial basis functions are written as Radial basis function is written as (A.1) in order to set up the optimization problem for as a bounded nonlinear line search instead of an unconstrained minimization problem for . The basic principles of our greedy algorithm are motivated by the similarities between the iterative optimization procedures of Jones [36,37] and Barron [38] and the Method of Weighted Residuals (MWR), specifically the Galerkin method [39]. The function residual vector ⃗ at the th stage of approximation can be written as in Using the Petrov-Galerkin approach, a coefficient is selected that will force the function residual to be orthogonal to the basis function and using the discrete inner product ⟨, ⟩ given by The discrete inner product ⟨ ⃗ , ⃗ ⟩ , which is equivalent to the square of the discrete 2 norm, can be rewritten, with the substitution of (A.8) and (A.9), as where is the angle between ⃗ and ⃗ −1 since ⟨ ⃗ −1 ⟩ / and ⟨ ⃗ ⟩ / are scalars. With (A.12) one can see that ‖ ⃗ ‖ 2 2, < ‖ ⃗ −1 ‖ 2 2, as long as ̸ = /2, which is a very robust condition for convergence. By inspection, the minimum of (A.12) is = 0, implying ( ( , ) + ) = −1 ( ) for = 1, . . . , . (A.13) Therefore, to force | ⃗ | 2 2, → 0 with as few stages as possible, a low dimensional function approximation problem must be solved at each stage. This involves a bounded nonlinear minimization of (A.10) to determine the two variables (0 < < 1) and index * ( * = ∈ R ) for the basis function center taken from the training set. The dimensionality of the nonlinear optimization problem is kept low since only one basis function needs to be solved at a time. To implement the SFA algorithm the user takes the following steps: Our SFA method is linear in storage with respect to since it needs to store only + vectors to compute the residuals: one vector of length ( ) and vectors of length ( 1 , . . . , ) where is the number of samples and is the number of dimensions. To complete the SFA surrogate requires two vectors of length ({ 1 , . . . , } and { 1 , . . . , }) and vectors of length + 1 ( 1 , . . . , ). The dimensionality of the nonlinear optimization problem is kept low since we are solving only one basis at a time. Also, Meade and Zeldin [40] showed that, for bell-shaped basis functions, the minimum convergence rate is for positive constants 1 and 2 . Equation (A.14) shows that, for minimum convergence, the logarithm of the inner product of the residual is a linear function of the number of bases ( ), which establishes an exponential convergence rate that is independent from the number of input dimensions ( ). For a detailed discussion and comparison of SFA with other state-of-the-art learning tools for regression and classification problems, please refer [41,42].

Subscripts
: Dummy index * : Component index of ( −1) with the maximum magnitude : Associated with the number of bases : Associated with the number of samples ∞: Associated with freestream conditions.

Superscripts
: Associated with the approximation : Associated with the input dimension.