A Radial Basis Function based Optimization Algorithm with Regular Simplex set geometry in Ellipsoidal Trust-Regions

We present a novel derivative-free interpolation based optimization algorithm. A trust-region method is used where a surrogate model is realized via an interpolation framework. The framework for interpolation is provided by Universal Kriging. A first contribution focuses on the development of an original sampling strategy. A valid model is guaranteed by maintaining a well-poised subset that exhibits the regular simplex geometry approximately. It follows that this strategy improves the scattering of points with respect to the state-of-the-art and, even importantly, assures that the surrogate model exhibits curvature. A second contribution focuses on the generalization of the spherical trust-region geometry to an ellipsoidal geometry, that to account for local anisotropy of the objective function and to improve the interpolation conditions as seen from the output space. The ensemble method is validated against its direct competitors on a set of multidimensional problems.

1. Introduction. We address the unconstrained minimization of objective function f : X ⊂ R n → R. We assume that f is continuously differentiable, smooth and bounded from below and that ∇f is Lipschitz continuous on subset X . We restrict the minimization to a bounded convex subset X ⊂ R n .
Such a convex subset can be defined through a number of inequality constraints. Yet we assume that none of these constraints will be active at the minimum. The subset X is thus mainly introduced to search for a local minimum within a range of practical interest. This may also be a means to delimit the search space to the definition domain of the function. To motivate the particular development of the algorithms presented we premise additional assumptions on the computational tractability of the objective function f .
As is the case in many engineering design applications, a single evaluation of the objective is assumed to be associated with the execution of a complex computer simulation or legacy code. Therefore, we assume it to be computationally cumbersome. Secondly, given the high degree of realism that is pursued by modern day simulations and the great level of expert knowledge that is required to fully comprehend the underlying physics, once such simulation codes reach the optimization stage, quite often they are treated as black boxes. This implies that little information about the inner working remains accessible except that a single simulation is known to be a time-consuming endeavour where sensitivity analyses need to be retrieved by examining input-output behaviour, hence also implying that derivative information is not accessible and becomes intractable to directly approximate.
Altogether we thus assume that the optimization needs to be carried out solely depending on direct evaluations of the objective function itself. The class of Derivative Free Optimization (DFO) methods is tailored exactly to problems that exhibit such challenging computational prerequisites. Among the most popular members of this class are the Nelder-Mead simplex method [1] and pattern search methods [2]. This article however focusses on the particular subclass of interpolation based optimization (IBO) methods [3,4,5,6].
Given the presumed lack of derivative information, DFO methods are forced to assess the behaviour of the optimization hypersurface indirectly. IBO methods rely on a surrogate model which is computationally simple to evaluate. Such surrogate models can be formed by interpolating a set of sufficiently scattered interpolation points. In our framework we use Universal Kriging as a generalisation of radial basis function (RBF) models [7,8,9]. A brute force approach involves sampling the entire feasible space [10], whereas limiting the surrogate modelling space to a compact neighborhood at each iterate of the IBO can reduce the overall computational time [11]. Trust-region algorithms have illustrated the benefits of following such a strategy.
When the surrogate satisfies Taylor-like error bounds on its function value and gradient for every iteration, convergence to first-order optimality can be guaranteed [12]. In the context of interpolation, such error bounds can be established depending on the geometry of the used interpolation points [13]. The geometry of points can be understood as the mutual scattering of points with respect to a certain region. The quality of a geometry can be quantified by a set dependent constant. An arbitrary set of points is well-poised for interpolation within the specified region if the corresponding quality measure turns out to be sufficiently large.
In order to guarantee a valid interpolation model suited for optimization within a specified trust-region, IBO methods maintain a subset of the available interpolation set for which the geometrical constant is maximized. Consequently, the well-poisedness of the subset geometry is assured and so is the model validity. In every iteration this constant is maximized by removing points from the interpolation set and generating new ones according to the current size and location of the trust-region; that whilst limiting the overall accumulation of objective evaluations. This delicate mechanism is referred to as the interpolation set management.
In [4,14,15] such a quality measure constant is described through a basis of polynomials, other authors have described this constant in function of the QR pivots of the Vandermonde matrix that depends on the interpolation points [3,13]. All the mentioned methods consider a subset of n + 1 interpolation points to guarantee a valid model, which they complement with interpolation points from previous iterations if the expanded geometry does not become illconditioned [3]. In any of these cases, the interpolation set management implies orthogonality between the n interpolation points with respect to the current trust-region centre. The centre then serves as the (n + 1) th interpolation point.
In light of this framework, the contribution of this article is twofold. Firstly, in our algorithm we rely on a subset of n + 2, instead of n + 1 interpolation points as in [3,4]. This way, we aim at achieving an improved scattering which is heuristically justified for low dimensional problems. Considering the RBF interpolation framework used, this also means that at every iteration the surrogate will exhibit curvature which is of vital essence for the well-functioning of our second contribution. This first, seemingly simple, modification has a large impact on the interpolation set management. An original sampling strategy is devised that is inspired by the volumetric extremity properties of the regular n-simplex. Details on the exact motivation and consequences are provided in the accompanying sections.
A second contribution focusses on a generalisation of the conventional spherical trustregion framework to an ellipsoidal trust-region geometry. The motivation to use an ellipsoidal geometry is again twofold. An ellipsoidal trust-region will adapt to the geometry of the objective function and will therefore allow larger steps towards more promising regions compared to using a spherical geometry. We will inspect the convergence properties by means of extensive numerical experiments. Secondly our interpolation set management is input space oriented. The introduction of an ellipsoidal trust-region, which shape reflects the objective's local output behaviour, allows to indirectly account for the output behaviour when executing the interpolation set management and will benefit the interpolation based metamodelling.

Outline.
To keep this article self-contained, we shortly resume the general trustregion framework for surrogate management in section 2. In section 3, we elaborate on the radial basis function interpolation framework and provide details on the implementation. Section 4 motivates our choice for using n + 2 interpolation points and introduces our interpolation set management. In section 5, we discuss the transition towards ellipsoidal trust regions. In section 6, we benchmark the proposed methods using extensive numerical experiments and in section 7 conclusions are drawn.
2. Trust-region methods. In this section we shortly review the trust-region methodology for DFO in a generic manner. Furthermore, we define fully linear surrogate models that generalise the characteristics of the surrogate model used by the trust-region methodology from derivative-based models to more general model types. As made apparent in the introduction, we are primarily interested in the use of interpolation models. Finally, we reach out towards the inclusion of the trust-region framework in the context of IBO.
2.1. Trust-region framework. The trust-region framework is a widely-used technique to attain global convergence of derivative-based optimization algorithms. At every iteration k, a derivative-based surrogate model, m k , is considered that approximates the actual objective within a neighbourhood of iterate x k . Traditionally, this neighbourhood, i.e. the trust-region, is modelled as the parameterized set, Here x k is the current iterate and ∆ k is referred to as the trust-region radius. The choice of the norm is arbitrary, commonly the standard Euclidean distance is used. In that case the trust-region becomes an n-dimensional hypersphere. A trial step, s k , is obtained by minimizing surrogate, m k , in the trust-region, B(x k , ∆ k ). This minimization problem is referred to as the trust-region subproblem. The resulting trial step, s k , gives rise to the trivial candidate iterate, x k+1 = x k + s k .
(2) s k = arg min In between iterations, the size and position of the trust-region is updated such that the generated sequence of iterates converges. The configuration of the trust-region, which is entirely determined by the couple (x k , ∆ k ), is updated according to a fundamental scheme based on the ratio of actual to predicted decrease, ρ k . The ratio constitutes a metric for the predictive quality of the surrogate with respect to the current trust-radius [11].
Based on the value of ρ k , the trust region is updated according to the following general rules. If the prediction of the surrogate is good, the radius is increased; in case of bad prediction, the radius is reduced. For other cases, the current radius is simply maintained.
To regulate the expansion and contraction of the trust-region, one takes values 0 < ∆ m ≤ ∆ M , 0 ≤ η 1 < η 2 < 1 and 0 < γ 1 < 1 < γ 2 . Here ∆ m and ∆ M represent a minimal and maximal allowed trust-region radius, respectively. The radius is then updated according to evaluate f (x k + s k ) and compute ρ k through (3) 6: update x k+1 according to (5) and ∆ k+1 according to (4) 7: end for 8: return x k To decide whether to accept or reject a trial step, s k , we adopt the rules from [11]. If the candidate iterate, x k + s k , is not accepted then the trust-radius shrinks given that ρ k < 0.
The generic trust-region algorithm is presented in Algorithm 1.

2.2.
Trust-region method using fully linear models. In the derivative-based version of the trust-region method, the surrogate, m k , is provided by a quadratic model of the form Here, g k and H k are evaluations of the first-and second-order derivatives of the objective function f in the iterate x k , respectively. According to Taylor's theorem, a region exists within the vicinity of x k where the assumption that model m k is a reliable approximation of f , is true. A region of reliable approximation can be defined as a region where the surrogate modelling approximation error, reflected by ρ k , is small enough such that descend in the surrogate model, m k , restricted to this region corresponds to descend of the objective fuction f . It is apparent that the strength of the trust-region method thus depends on the capacity of the updating rules to maintain a region where the surrogate modelling approximation error is sufficiently bounded, so that m k is suited for optimization purposes [16,17]. From that perspective the generic trust-region method can be decomposed into two crucial mechanisms that interact to generate the sequence of iterates. A procedure must be provided that can generate a quadratic approximation model m k of the goal function f for any given iterate x k . The updating rules will then maintain a series of regions where the corresponding and consecutive models, m k are suited to generate a descending sequence of iterates as the solution of subproblem (2). The latter mechanism is in that aspect generic and can be employed to maintain a series of regions given any type of model, also derivative-free.
In this work we are interested in the class of so called fully linear models. The definition of a fully linear model establishes Taylor-like bounds on arbitrary models and therefore generalises the notion of a reliable approximation. As a consequence, the trust-region framework can be generalised to include such fully linear models.
We adopt the definition from [12,13] For given κ f > 0, κ g > 0, and given Algorithm 2 Trust-region algorithm with fully linear models if m k is fully linear in B(x k , g 2κg ) then solve subproblem (2) to obtain s k 12: evaluate f (x k + s k ) and compute ρ k through (3) 13: update x k+1 according to (5) and ∆ k+1 according to (4) 14: end if 15: end for 16: return x k Let us assume two procedures that can either generate a fully linear surrogate model m k on B(x k , ∆ k ) on the one hand, or that can verify whether or not the surrogate model, m k , is fully linear on B(x k , ∆ k ) on the other, and that for any given x k ∈X and ∆ k ∈ (0, ∆ M ]. If two such procedures are available, Algorithm 1 can be modified to construct a trust-region algorithm with fully linear models that will converge to a first-order critical point of the function f [12]. Algorithm 2 presents such a trust-region algorithm using fully linear models. The main modification is the introduction of the termination criteria, ∇m k (x k ) ≤ g 2 . This condition is a reformulation of the first-order optimality condition on the objective function, ∇f (x k ) ≤ g . Assuming that m k is fully linear on B(x k , ∆ g k ), we have that Algorithm 2 assures that upon termination, model m k will be fully linear on B(x k , g 2κg ). Substituting g 2κg for ∆ g k in (8) reveals that this is equivalent with satisfying the first-order optimality criterion, ∇f (x k ) ≤ g . We remark that it is thus fundamental to provide procedures that can guarantee the tightest possible bound on κ f and κ g . In section 4, such procedures are developed considering that model m k is an interpolation model.

Interpolation models.
When first-order derivative information, g k , is not directly available, it is possible to construct a surrogate model based on strategically generated data, i.e. direct evaluations of the objective function. A model m k denotes an interpolation model if it satisfies the interpolation conditions in (9), for a given input set, Z = {z 1 , . . . , z q } with interpolation points z i ∈ R n , and corresponding output set, F = {f (z 1 ), . . . , f (z q )}.
A quadratic model, as in (6), is capable of modelling the curvature of the underlying function f and allows therefore to proof global convergence to second-order critical points [12]. Correspondingly, several DFO methods have been developed that operate with quadratic interpolation models [6,18,19]. To fully determine a quadratic model, the number of interpolation points, q, must either equal or exceed q = 1 2 (n+1)(n+2) requiring thus at least that number of objective function evaluations. This requirement contradicts with our intention to limit the number of objective function evaluations. A linear model can be constructed with as little as n + 1 interpolation points. Algorithms that operate with minimal interpolation sets of q = n + 1 are described by [3,4,20]. The drawback in this case is, however, that a linear model is not capable of representing any curvature present in the objective. In this work we impose the additional requirement that the minimal number of interpolation points must equal q = n + 2, because it is then guaranteed that the resulting model can represent curvature given a proper interpolation framework (see section 3). We will refer to such models as better-than-linear models, noting that these models classify as fully linear models but are constructed with at least one more interpolation point than the minimum for exact linear models q = n + 1. These better-than-linear models are further detailed and engaged in section 4 and section 5.

Interpolation based optimization.
Anticipating that a procedure is available that can generate an interpolation model, m k , that classifies as a fully linear model, an interpolation framework can be embedded in Algorithm 2.
Section 3 describes a multivariate interpolation framework that can model curvature and is flexible in the sense that it does not impose strict requirements on the number of interpolation points. Within the context of this framework, it will be shown that an interpolation model can be classified as a fully linear model if the set, Z, satisfies conditions on the geometry of the points with respect to the trust-region [13]. The geometry can be understood as the scattering of interpolation points inside a specified region. In section 4 the geometry of an interpolation set is quantified through a single set dependent constant that makes it possible to classify an interpolation model as a fully-linear model.
The procedures of verifying and generating fully linear interpolation models thus boil down to verifying whether the geometric conditions are satisfed with respect to having fullylinear model for a given interpolation set and to complement an existing or possibly empty set so that the condition is satisfied by construction. In the context of IBO, these two mechanisms are provided by a single procedure that is referred to as the interpolation set management or the sampling strategy. A sampling strategy for minimal interpolation sets of q = n + 2 is proposed in section 4.

Radial Basis Functions.
In IBO, the previously described Algorithm 2 embeds interpolation models (9). This requires a flexible, multivariate interpolation modelling framework that can capture curvature and does not impose strict conditions on the number of interpolation points since we will recycle interpolation points (and their evaluations) from previous iterations. The Radial Basis Function (RBF) interpolation framework has readily shown its utility within IBO [3,4,5]. The traditional RBF framework decides arbitrarily on the shape of the modelling RBFs by fixing a shape parameter a priori. Alternatively one can determine the shape parameter based on the interpolation data. In that case the RBF framework coincides with Universal Kriging [7] where the shape parameter is referred to as a hyperparameter. The modelling assumptions of the framework and the determination of the modeland hyperparameters are discussed next.
3.1. Modelling framework. Consider a given set of interpolation points Z and corresponding objective function evaluations F. The RBF multivariate interpolation framework then presumes a function model of the form where φ : R + → R is an element of the class of RBFs and p(x) is an element of an nvariate polynomial space P n d of at most degree d referred to as the polynomial tail. Let P n d = π 1 (x), . . . , πd(x) be a n-variate basis for P n d withd the size of this n-variate basis, then the polynomial tail can be represented by a linear combination of this basis, that is p(x) = πj ∈P n d β j π j (x) with β j the expansion coefficients. Some popular choices for the RBF are listed in Table 1. Notice that the function φ( x ) has isosurfaces coinciding with hyperspheres in R n , i.e. the function radiates out isotropically.
Introducing the vectors, φ(x) and α, ∈ R q and, π(x) and β, ∈ Rd which entries are given by φ( x − z i ), α i , π j (x) and β j respectively, the RBF model can be reformulated as (11) m Determination of model parameters. The RBF function model (11) is uniquely defined by the parameters α and β. These parameters can be determined as follows.
Let the Vandermonde matrices Φ ∈ R q×q and Π ∈ R q×d be defined as Φ ik = φ( z i − z k ) and Π ij = π j (z i ). The set of q interpolation conditions in (9) can then be written as Here the entries of f are defined as the function evaluations contained in F. These conditions do not render unique α and β. Therefored additional conditions are required to complement (12) and so that the assembled system matrix is nonsingular. To this end we require of m(x) that it is the smoothest function of the form (11) that complies to the interpolation condition (9). For RBFs, function smoothness can be defined in relation to the concept of conditional positive definiteness (CPD) [8]. RBFs have the specific property that to any RBF, φ, one can associate a value d 0 such that if the maximum degree d of the polynomial tail satisfies d ≥ d 0 , the matrix Φ will be positive definite with respect to all vectors α ∈ null (Π). Equivalently, we have that if d ≥ d 0 and Π α = 0 it holds that (−1) d0+1 α Φα > 0. In conclusion, when CPD is satified, i.e. α Φα > 0, it can be shown that the corresponding interpolator is the smoothest function of the form (11) satisfying (9). For an elaborate proof and additional details on CPD, we refer to [8].
Formally, the matrix representation of the interpolation condition can be complemented as such by the function smoothness condition so to obtain the linear system of equations The model parameters are determined uniquely if the system matrix is nonsingular. It is easily verified that if d ≥ d 0 , it is then sufficient that Π has full column rank. If so, the interpolation set is said to be well-poised for interpolation. Such corresponds with geometric conditions on the points in Z as will be discussed in section 4.
Based on the previous, the explicit solution of the model parameters is This result may also be obtained through the Universal Kriging (UK) framework. Here, a linear predictor of the form m(x) = η(x) f , is considered. The objective function is modelled as the realization of a polynomial tail and a random process, The process' stochasticity is quantified by a spatial correlation function that models the output correlation as a function of the spatial distance between input values. The correlation is thus expressed in terms of a RBF as corr(w(z i ), w(z j )) = φ( z i − z j ). It can be shown, e.g. [7,9], that the optimal linear unbiased predictor, η, is then determined by the following system of equations where ξ can be interpreted as a Lagrangian multiplier.
For additional details we refer to Appendix A.

Determination of model hyperparameters.
The value of the shape parameter γ (see Table 1) is arbitrary and affects the receding radial effect of the RBF contribution in (11). Heuristics may be provided, however they usually lack proper theoretical foundation. A suitable value is mostly determined by trial-and-error. Furthermore we emphasize that each component of the interpolation points z i contributes equally to the distance metric, · , i.e. the model presumes isotropic dependence of the objective function on all of its variables. Such isotropic behaviour can be achieved by normalizing the optimization space by performing a linear transformation of the input space. Nonetheless, such measure cannot account for any local deviations on the global trend. Since we are to construct local interpolation models which validity is not to stretch beyond the limits of the trust-region boundary, we could redefine this linear transformation locally.
To account for local anisotropy in the principle directions of the input space one may define the distance metric as (16). The RBFs in (11) are then redefined as φ( x − z i γ ). It is clear that in this case the univariate hyperparameter γ has become redundant. In practical implementations of the UK framework, the determination of hyperparameters γ i is automated through a maximum log likelihood estimate based on the observations, F.
We refer to Appendix A and [21] for elaborate detail.

Interpolation set management.
In section 2, Algorithm 2 described a trust-region method with fully linear models. It was demonstrated how a sequence of fully linear models m k can be utilised to converge to a first-order critical point of the objective function f . When considering interpolation models satisfying (9), the question whether m k is a fully linear model boils down to a proper management of the interpolation set Z used to construct m k . This so-called sampling strategy should maintain a proper geometry of the interpolation set.
In [13] the geometry of an arbitrary interpolation set, Z, was quantified by a single set dependent constant directly related to the conditions in (7) to define a fully linear model. The lemma in [13] however, only applies to interpolation models for which the set size equals q = n + 1 exactly. On this basis the authors in [3] devised an IBO algorithm employing RBF interpolation models given at least n + 1 interpolation points to construct a fully linear model. By applying a QR pivoting strategy directly related to the measure described by [13], their sampling strategy assures that a subset of n + 1 interpolation points (amongst the entire available data set) exhibits the required geometry to guarantee a fully linear interpolation model. Possibly additional samples need to be generated to have verification of the full linearity of the interpolation model. We henceforth refer to these points to verify that m k is fully linear, as the affine subset. Additional points from the entire data set are subsequently recycled to enhance the modelling capabilities of the RBF model as long as the well-poisedness of (13) is not corrupted.
Having an interpolation model m k representing curvature, requires an affine subset of q ≥ n + 2, cfr. subsection 3.2, that is adequately embedded in an interpolation based trustregion algorithm. In other words, we aim to ensure convergence of Algorithm 2 to secondorder critical points whilst only adding a single point to the affine subset with respect to [3,4]. Since an estimate of the curvature is now always available, the remark on the RBF hyperparameters in subsection 3.3 can be generalised to arbitrary directions. The latter is investigated in section 5.
In 4.1, we introduce a generalised version of the theorem in [13] for interpolation sets with size exceding n + 1. On that basis we identify the optimal affine subset geometry that corresponds with n + 2 interpolation points in 4.2. Finally in 4.3, we propose a procedure capable to maintain an interpolation set with an affine subset of n+2 points so that interpolation model, m k , is fully linear and exhibits curvature.
4.1. Better-than-linear models. Uniqueness of the RBF interpolation model parameters implied well-poisedness, i.e. full column rank, of the matrix Π . The size of the interpolation set must thus equal or exceed the size of the polynomial basis. Given that we adress a minimal interpolation set size n + 2 and thusd ≤ n + 2, the polynomial tail will be linear or equivalently p(x) ∈ P n 1 . A convenient choice for the basisP n 1 is then clearly π(x) = x 1 . Since we will embed the interpolation framework in a trust-region framework, the current iterate x k is always available for interpolation since f (x k ) is evaluated for calculating the predictive quality of the interpolation model (3). Henceforth, we will consider a trust-region centered at the origin, 0, by performing a shift of coordinates each time. We furthermore require that 0 is always an element of Z, i.e. Z = {0, z 1 , · · · , z q−1 }.
Introducing the interpolation set Vandermonde matrix, Z, matrix Π can be written as As mentioned, in [13], the conditions (7) were related to the interpolation set geometry and more specifically to the matrix Z. Here, a generalised lemma is presented establishing similar Taylor-like conditions for better-than-linear models, i.e. when q ≥ n+2. A derivation is included in Appendix B. LEMMA 2. Suppose that m is an RBF model with linear tail and RBF contribution α φ that satisfies the interpolation condition, m(z i ) = f (z i ), for the interpolation set Z := {0, z 1 , . . . , z q−1 } ⊂ B(0, ∆) where q ≥ n + 2 and that satisfies Z -1 ∆ ≤ Λ Z . Furthermore suppose that f and m are continuously differentiably in the hypersphere B(0, ∆) and that ∇f and ∇φ α are Lipshitz continuous in B(0, ∆) with Lipshitz constants γ f and γ φ respectively, then the following two inequalities are satisfied for any x ∈ B(0, ∆): These conditions are similar to (7). The Lemma provides therefore an admittedly conservative estimate of the constants κ f and κ g , but foremost it reveals the crucial role of the interpolation set geometry, which is reflected through the value of Z .
It follows that our interpolation set management must ensure boundedness of ∆ -1 k Z k from below by the a priori fixed value Λ -1 Z , at each iteration. 4.2. Optimal critical set geometry. The value of ∆ -1 Z is understood as a measure for the algebraic well-poisedness of a given set Z ⊂ B(0, ∆) within B(0, ∆) [13]. For any given ammount of interpolation points q we can define the optimal critical set geometry that corresponds with the largest value of Z corresponding the tightest bounds on κ f and κ g . Assuring set well-poisedness hence reduces to generating an affine subset that resembles the optimal set geometry as closely as possible whilst recycling as many points as possible. This strategy balances the requirement to minimize the total amount of function evaluations without corrupting the model quality. Figure 1 depicts optimal geometries for n = 2 dimensions, maximizing the measure Z for affine subsets of size q = n + 1, q = n + 2 and q = 2n + 1, respectively. The optimal set geometry for q = n + 1 complies with the normalized set ∆ -1 Z \ {0} being orthonormal with respect to the origin, or equivalently when Z coincides with a standard simplex. This set geometry is pursued by QR pivoting strategies as elaborated in [3], and Lagrange or Newton polynomials based strategies as elaborated in [4,13]. The optimal set geometry for q = n + 2 corresponds with the regular simplex geometry (plus the origin) [1,22] and that for q = 2n + 1 with that of an orthoplex (plus the origin) [10]. The orthoplex geometry has been recommended by Powell in [6]. This geometry also determines the fixed search directions in many pattern search optimization methods.
In this context it is relevant to inspect the spatial distribution of the 2D geometries in Figure 1. The optimal set geometries for q = n + 2 and q = 2n + 1 are unoriented: the centre of mass of the corresponding sets coincide with the origin. Such does not apply for q = n + 1 where the set is biased. Note that the above is valid for arbitrary n. To quantify this more rigorously, we introduce the metric, ϑ * (Z), defined as Table 2: ϑ(Z) for different optimal set geometries contained within B(0, 1).
Its value is equal to the maximum mean distance that any test point in the trust-region can be removed from the modelling set Z (including 0). The value gives an indication for the worst-case contribution that the radial basis term (that is directly related to the distance between the test point and the interpolation points) has compared to the linear tail. In Table 2, we compare ϑ * (Z) values for different optimal set geometries. It can be observed that ϑ * (Z) values can be improved by only adding a single point, referring to the shift from the standard to the regular simplex geometry. Contours of ϑ(x; Z) are presented in Figure 1 that allow to visualize which regions are not represented well by the optimal set geometries. For n → ∞, ϑ * (Z) tends to √ 2 for all three given configurations, implying that all are as good, or as worse, for infinite dimensional problems.
The discussion based on ϑ * (Z) provides an additional argument to choose the regular simplex over the configuration in Figure 1a, certainly for low dimensional problems.

Regular simplex affine subsets.
According to the previous subsection the optimal affine subset (without the origin) for q = n + 2 is a regular simplex. 0 remains an element of the affine subset which affects the definition of a regular simplex affine subset based on the n + 1 remaining interpolation points.
Formally, the set of all regular simplices lying on the surface ∂B(0, 1) is given by (20). l n (·) (see Appendix C) is a scaling factor depending on the radius of the n-sphere (here 1).
The algebraic problem of maximizing Z hence coincides with the geometric problem of maximizing the volume of the convex hull of Z. Approaching the problem from this point of view helps to devise an intuitive interpolation set management resulting in a critical subset that is (close to) an element of the set R n (x k , ∆ k ); whilst recycling interpolation points evaluated in previous iterations.
Based on the the hypervolume V (Z) of the subset Z a metric can be construed whether Z matches a regular simplex sufficiently. The principle idea is that V (Z) should be close to the theoretical maximal hypervolume of a regular simplex, V * , and this within a predefined relative error factor 0 ≤ µ 1 ≤ 1. If V (Z) ≥ µ 1 V * is not satisfied, new interpolation points need to be generated. This procedure guarantees ∆ -1 Z ≥ Λ -1 Z . Before we elaborate the particularities of our set management, we mention that each set of points R ∈ R n m (0, 1), is a lower order regular simplex itself where m is the dimension of the regular subsimplex considered and therefore m ≤ n. R n m (0, 1) is defined (22). This statement holds since any subset of a set R ∈ R n (0, 1) simply inherits the property that the distance between every two points is the same. Expand2volume. Based on the previous idea, Algorithm 3: Expand2volume, constructs an affine subset in a systematic manner so that m k classifies as a fully linear model for given Λ Z . The goal is to attain an affine subset with hypervolume V (Z) ≥ µ 1 V * whilst adding as few points as possible to the available set of interpolation points. The input is the set of all interpolations points available, scaled with respect to the trust radius, ∆ k , and unbiased with respect to the center, S k is the set incorporating every point evaluated so far. The current origin is excluded such that we remain with only the candidate interpolation points to form the affine subset.
The algorithm embarks on removing all points that are outside a certain periphery, parameterized by the variable θ 1 ≥ 1. Note that we artificially enlarge the trust-region since we allow parameter θ 1 to be greater than 1. (We can always assure that the interpolation model is fully linear by redefining κ f and κ g ). This ensures that in practice, the number of interpolation points actually accepted to be part of the affine subset is slightly increased, improving the overall convergence. Otherwise, the algorithm might end up expanding an empty set too regularly (certainly for larger n). If this operation yields a subset, Z 1 , that is still larger than the required affine subset size of n + 1 (which in practice will almost never occur), all points that are furthest away from the trust surface are additionally removed. If the size of the resulting subset, Z 1 , is smaller than n + 1, it is expanded by calling Simexpand (Algorithm 4).
The procedure described by Simexpand, exploits the idea that if the set Z 1 were to be an element of R n n−m (0, 1), we can expand it with an element of R n m−1 (0, 1) so that their union would then constitute an element from R n n (0, 1). The expansion element can be generated by rotating and translating a properly sized lower order regular simplex.
Our numerical implementation proceeds as follows. A set, X , is determined to be an element of R m−1 (0, 1) embedded in the n-dimensional workspace. The direction e is determined as the orthogonal projection direction of the origin on the subspace spanned by the vertices of Z 1 . Subsequently, a rotation matrix Q is determined so that the basis spanned by the rotated element QX is orthogonal to the union of the basis spanned by Z 1 and the direction e. In a final step the rotated element QX is scaled according to, r n m (1), and translated along the direction e over a distance d n m (1). The factors r n m (·) and d n m (·) are explained in Appendix C and will only maximize the volume in the occasional coincidence where Z 1 ∈ R n n−m (0, 1). Nonetheless, this same reasoning can be applied when Z 1 / ∈ R n n−m (0, 1). In that case an additional subproblem must be solved to identify the optimal value of σ so to maximize the volume of their union. Here σ is a scaling factor related to r n m (·) and d n m (·). In the description of the algorithm we make no distinction between a set X and its matrix column vector representation X.
We also use the following operators. Operator, [·], constructs a matrix containing a set of column vectors that span the space determined by the vertices in Z = {z 0 , . . . , z q }, i.e.
[Z] = z 1 − z 0 . . . z q − z 0 . The function, regsim(n, m), outputs a set of vertices,  Figure 2. Each subfigure depicts a set Z 1 that differs in size from the previous, the corresponding expansion set Z 2 , bases N and N X and the vector e.
If the union Z 1 ∪ Z 2 does not satisfy the criticality condition, V (Z 1 ∪ Z 2 ) ≥ µ 1 V * , an iterative procedure is initiated where points are removed from the set Z 1 systematically. The contracted set can then be expanded by applying Simexpand again. In order to avoid that expansion points are generated too close to a point that was removed in an earlier iteration, information from the removed point is transferred to the contracted set Z 1 . Information about a point that gets removed is stored by transferring it to a memory set Z . The mean of this memory set is used to complement the points in Z 1 that remain, before expanding the set with a new regular m + 1-simplex. In this fashion, the added simplex is generated away from the mass centre of the original set Z 1 .
The algorithm ends after at most n + 1 iterations, that is when the entire original set Z 1 is transferred to the memory set Z . We remark that in this case the mass center is expanded with an (n − 1)-regular simplex that is however not necessarily an element of R n n−1 (0, 1). As a consequence, the eventual affine subset needs not to be an element of R n (0, 1). As, this does not ensure that the sufficiency condition is satisfied, the algorithm closes with a final verification of the hypervolume and possible expansion of the set Z 2 .

Remarks.
With respect to Algorithm 3 with Algorithm 4 described above, we add the following. During the initialization of Expand2volume, all points that are furthest away from the trust-surface are removed. This is a heuristic. Alternatively, one might run through a more complex scheme where every possible n + 1 size set is verified with respect to their hypervolume and the largest set is selected. These points could also be added to the memory set. However, this is a situation that will almost never occur in practice, partially because the optimization algorithm will guide the iterations to unexplored regions of the parameter space, also because the trust region will contract when the sequence of iterates starts to converge. So by proper choice of µ 1 and θ 1 it can be avoided that the complete interpolation set of the previous iteration is within the current periphery.
Furthermore, we remark that alignment of the kernel associated to the space spanned

Algorithm 3 Expand2volume
Input: Z 0 = Z/0, θ 1 , µ 1 Output: Z 2 1: obtain Z 1 with elements z i ∈ Z 0 such that z i ≤ θ 1 2: define Z 2 = ∅ 3: if |Z 1 | > n + 1 then 4: order z i such that |1 − z 1 | ≤ · · · ≤ |1 − z q | 5: obtain Z 1 = Z 1 / {z n+2 , . . . , z q } 6: else if |Z 1 | < n + 1 then 7: get Z 2 = Simexpand(Z 1 ) 8: end if 9: find initial memory z = z i ∈ Z 1 : z i ≤ z j , ∀j 10: define memory set Z = z 11: while V (Z 1 ∪ Z 2 ) < µ 1 V * or |Z 1 | > 0 do 12: find z i ∈ Z 1 : z i − z ≤ z j − z , ∀j 13: contract by the union Z 1 ∪ 0 and the space spanned by the regular simplex X is underdetermined. This additional degree of freedom remains here unexploited. Alternatively, this degree of freedom could be employed by orienting the added simplex as such that the generated points are located furthest away from the available set of interpolation points or such that the added points correspond to a mean minimal predicted objective function value.
5. Ellipsoidal trust-region framework. The procedures described in the previous two sections can be employed to verify and construct a model, m k , that satisfies Definition 1 for a fully linear model in a given trust-region, B(x k , ∆ k ). Hence, these procedures can be used to embed an interpolation framework in the trust-region algorithm with fully linear models described by Algorithm 2, factually realising a IBO algorithm.
As for now the geometry of the interpolation points is considered only with regard to the input space and with respect to the trust-region defined in it. Given a spherical trust-region, our input space oriented sampling strategy will treat every direction equally in function of a well-conditioned system matrix Z. However, from the perspective of optimization, it might be of interest to account for the geometry of the objective function as well.
The gradient of the objective function will be more sensitive to certain search directions then to others as determined by the local curvature of the objective function. Currently the set management however treats all directions equally (as a result of the spherical shape of the trust-region) and will try to homogenize the distance from the trust-region center and in between interpolation points. Now as a result of the local curvature, the corresponding function values may vary faster either slower than can be expected when solely considering the local objective gradient. Hence the geometry of the objective function with respect to the arbitrary choice of the optimization space coordinates is reflected by its local curvature.
Therefore we propose to execute the set management in a transformed space by introducing an auxiliary optimization parameter, u. The goal of this transformation is to obtain a normalized curvature of the objective function in the auxiliary optimization space. We propose to use an adaptive affine mapping from the auxiliary space to the original space, that changes with the trust-region iterations. Notice that the origin in the auxiliary space corresponds with the trust-region centre in the original space. (23) x The trust-region will be stretched along those directions where the local gradient is varying slowly and shrinked in those directions where the local gradient is varying rapidly. As a consequence, objective function values evaluated along the trust-region boundary will increase or decrease in equal amounts with respect to the objective function value at the trustregion centre. Such will have a positive effect on the hyperparameter tuning described in 3.3 since the principle directions are now homogenized.
Another consequence is that a spherical trust-region in the auxiliary space z will now have an ellipsoidal shape in the original optimization space illustrated by Figure 3a and 3b. As a result, a step of length 1 taken in the auxiliary space represented in the original space may exceed the original spherical trust-region radius. Two situations can be distinguished when inspecting the directional curvature. When having a large directional curvature on the one hand, the ellipsoidal geometry will have a stabilizing effect considering that the set of admissible steps (i.e. the trust-region) becomes less aggressive in terms of large jumps in the objective function, which will also have an indirect effect on the modelling procedure. On the other hand when having a small directional curvature, the ellipsoidal trust region will allow the optimization procedure to take large steps in the direction of a slowly varying gradient. This way the original optimization space will be explored more rapidly. For instance in Figure 3b, the trust-region in the auxiliary space encloses a larger part of the inner contour that contains the actual minimum.

Curvature normalization.
The goal of the coordinate transformation is to normalize the local curvature of the problem in the new coordinate space. This condition can be expressed mathematically as given. For reasons that will become apparent, we introduce an additional scaling factor λ 2 k . (24) Suppose now that we have an estimate B k for the hessian ∇ 2 x f (x k ). We can formulate an expression for A k by considering the Singular Value Decomposition (SVD) of the matrix B k . The SVD decomposition of the positive definite matrix B k is given by U k Λ k U k . Here U k is an orthonormal matrix and Λ k is a diagonal matrix with the singular values σ 1 , · · · , σ n on its main diagonal. It is then easily verified that (25) A Finally we will utilise the scaling factor λ k to keep subproblem (2) as it manifests in the auxiliary coordinate space proportional to its manifestation in the original coordinate space. Mathematically this proportionality can be realised by determining a measure that remains invariant under the transformation. Recall that as a result of the affine coordinate transformation, a spherical trust-region in the auxiliary optimization space will adopt an ellipsoidal geometry in the original optimization space. We propose the following measure of proportionality: the value of λ k is determined such that the spherical and ellipsoidal trust-region representation remain isovolumetric.
Consider that the volume of an n-dimensional hypersphere with radius ∆ k , is equal to η(n) · ∆ n k , and that the volume of an n-dimensional ellipsoid with given half principle axis lengths, a 1 , . . . , a n , is equal to η(n) · i a i , where η n is a mutual scaling factor. We also have that by construction of A k , half the principle axis lengths can be written in function of the singular values as a i = λ k σ -1/2 i . An isovolumetric relation between the trust-region representations in both the original as the auxiliary coordinate space can be maintained by proper choice of the scaling factor λ k , that is The exact coordinate transformation of (23) can then be written explicitly as (27) x

Filtered Hessian approximation.
Several methods can be devised to provide an estimate of the Hessian, B k . Since our sample strategyensures that the local interpolation model is curved, we can calculate an approximation of the Hessian at the next iterate in the auxiliary coordinate space directly from the model, i.e. ∇ 2 u m k (u k+1 ). This estimate can be calculated back to the original coordinate space so to provide an estimate for the actual Hessian,B k+1 = A k ∇ 2 u m k (u k+1 )A k . In order to get a smooth approximation, a low-pass filter is proposed, parameterised by µ 2 . Note that if µ 2 = 1, this framework reduces to the original spherical trust-region framework.
Alternatively, the well-established BFGS-update can be used However we argue against the use of this strategy given the current framework. As a result of the method used to construct the surrogate model, large trust-regions can be anticipated which would detoriate the approximation principle of the BFGS update. Moreover, since the gradient calculated to compute y k are themselves an approximation of the real values, the resulting Hessian approximation becomes unreliable as confirmed by our numerical experiments.
6. Numerical results. In this section, we present numerical results of the presented derivative-free optimization algorithm applied on a set of unconstrained problems from the CUTEst test environment [24]. As indicated by the neighbourhood representation metric, ϑ(Z), the method should be especially suited for low dimensional problems and was developed from this point of view. We have applied the algorithm on a set of test functions with a maximum problem dimension n = 8. All tested algorithms were initialized from the traditional starting point [25]. The following benchmark procedure has been adopted and results are presented in Table 5.
The results as obtained with 4 different versions of the present algorithm have been compared to those obtained with its closest competitors in the context of derivative-free optimization, being ORBIT [3] and NMSMAX. Since ORBIT outperforms the NEWUOA algorithm (at least for small scale problems [3]) and given that no MATLAB implementation is available, we did not compare our algorithm with NEWUOA [6]. The details of ORBIT are elaborated in the associated paper and are essentially similar to the details of the presented algorithm when the spherical trust-region framework is used. That is apart from the novel regular simplex based sampling strategy. We used the implementation made available at [26]. The NMSMAX algorithm is a frequently practiced implementation of the Nelder-Mead simplex method [1] and is made available in the Matrix Computation Toolbox [27]. The NMSMAX algorithm was initialized with a regular simplex in accordance with the present algorithm.
Each algorithm was given a budget of 1000 function evaluations. Since every test function has a known global minimum, f * , the tolerance of having convergence of an algorithm was 10 −6 , i.e. f (x k )−f * < 10 −6 . The internal stopping conditions of each of the algorithms were set to 0 so that an algorithm only stops when this benchmark condition was satisfied or when it exceeded its evaluation budget. These internal stopping conditions include the gradient condition (8) and the minimal simplex size for NMSMAX. Only the total number of function evaluations and the difference between the final objective function value and the known minimum are recorded. We did not consider the CPU time as well as performance  and data profiles which are gaining currency in the optimization community [28,29]. Since the focus of the paper is primarily on the novelty of the sampling strategy and the ellipsoidal trust-region framework, we reason that the adopted approach was best suited to illustrate the elaborated methods. Each algorithm was applied using its standard settings to mimic realistic practical usage. The only parameter that was varied was the initial trust-region radius or in case of the NMS-MAX algorithm, the size of the initial simplex. This corresponds with an arbitrary uniform scaling of the original optimization space which is a standard practice when performing optimization routines. The standard parameter settings of the presented algorithm are given in Table 3, those of each of the competing algorithms can be found in the respective references.
We have considered 4 different versions of the present algorithm. Details about the numerical implemention are given in the next subsection. These 4 versions allow us to study the separate effects of the two presented mechanisms, namely the novel regular simplex based sampling strategy and the ellipsoidal trust-region mechanism. The different versions are defined in Table 4 and are related to the values of µ 1 and µ 2 from Algorithm 3 and (28) respectively. To study the effects of the regular simplex based sampling strategy apart from the ellipsoidal trust-region framework, version V1 can be compared with the ORBIT and NMS-MAX algorithm. Note that in this case the intepolation set management is purely input space oriented. In order to study the effect of the ellipsoidal trust-region framework version V2 can be compared with respect to V1. Recall that in this case the shape of the trust-region is adapted to the output behaviour of the objective function. In order to study the effect of the novel sampling strategy in combination with the ellipsoidal trust-region, versions V2 to V4 can be compared. Note that when parameter µ 2 (28) is equal to 1, this corresponds with a spherical trust-region framework. It is not possible to study the behaviour of the ellipsoidal trust-region framework in combination with the sampling strategy of ORBIT, given that this strategy does not assure that a better-than-linear model is available every iteration.
6.1. Numerical implementation. All described procedures were implemented in the MATLAB computational environment. Some of the subproblems were solved using available implemantations of the routine in MATLAB. These are documented next.
As mentioned in section 4, once an affine subset was determined using Algorithm 3, the available set of interpolation points was reconsidered so to enhance the modelling capabilities of the surrogate model. Additional points were added as long as such did not corrupt the wellpoisedness of (13) and if the total amount of interpolation points did not exceed the limit q M . A similar procedure is performed by ORBIT and was adopted from that source code.
The interpolation framework used here is that of Universal Kriging. We employed the DACE toolbox to obtain the modelling parameters as described in section 3. This framework requires the user to define lower and upper limits on te hyperparameter β, respectively β 1 and β 2 . More specifically we used Gaussian RBF's. Subproblem (2) was solved using the MATLAB function fminsearchcon, an extension of function fminsearch made available on Mathworks File exchange [30]. Since absolute CPU time was not considered, the solution procedure was executed until fully converged.
6.2. Discussion of results.
6.2.1. Benchmark problems. By assessing the results displayed in Table 5 one can observe that overall the proposed algorithm outperforms both NMSMAX and ORBIT, disregarding some coincidential draws.
Considering that V1 outperformed ORBIT in all but one occasion, we can conclude that the regular simplex based sampling strategy offers a significant advantage. One might however argue that the increased implementational and computational complexity of Algorithm 3 does not justify the advantageous optimization setting in case that less challenging optimization problems are considered. In none of the benchmark problems V1 outperformed V2, V3 or V4, indicating that engaging an ellipsoidal trust-region framework is clearly a beneficial endeavour regardlessly, given the low implementational cost. The downside is however that to obtain a curved model, a minimum of n + 2 point is required making the advanced regular simplex sampling strategy indispensable.
Whether V2, V3 or V4 outperformed one another depends on the objective function at hand. However overall we could state that V2 is the better candidate for lower dimensional problems whilst V3 is the better candidate for higher dimensional problems as will be confirmed in the next section. Also considering the convergence history of the different algorithms it was noted that the ORBIT algorithm occosionally stagnates on a local plateau of the objective function. Since this behaviour was also exhibited by V1 (e.g. GULF, TRID; although V1 recovered each time) this demonstrates that compensating for the output behaviour of the objective function through the input space oriented sampling strategy and by altering the shape of the trust-regions is quintessential to recover from such apparent plateaus.

Extended Powell Singular and Extended
Rosenbrock. In order to demonstrate and comment on the mechanics of the elaborated methods in greater detail, convergence re-sults on the 8-dimensional Extended Powell Singular and Extended Rosenbrock are presented in Figure 4 and discussed below. Consider the convergence history for the Extended Powell Singular function in Figure 4a. We will initially discuss the history of the algorithms V1, ORBIT and NMSMAX. All exhibit about the same convergence behaviour during the initial stage of the optimization (up to 100 function evaluations). After a 100 function evaluations the NMSMAX algorithm clearly starts to diverge from this trend. This is due to the fact that the NMSMAX algorithm is bounded to operate on its vertices. At the common point of about a 100 function evaluations, it reaches a region where the simplex size needs to shrink sufficiently so to attain a size that is small enough to accurately grasp the local curvature of the objective function. A similar phenomenon is observed for the ORBIT algorithm which starts to diverge from the global trend after about 200 function evaluations. The phenomenon occurs later for the ORBIT algorithm since it is not bounded to operate on the vertices of its standard simplex and the solution of (2) is only limited to the convex set, B(x k , ∆ k ). Therefore it can maintain a larger trust-region throughout the iterations than NMSMAX. That algorithm V1 can maintain a larger trust-region longer than both ORBIT and NMSMAX is a result of the improved geometry of the interpolation points. Apart from that, also algorithm V1 fails to reach the benchmark condition within the admissible budget. Now consider the behaviour of algorithm V2. The sampling strategy is equivalent to that of V1 but the modelling space is now normalized with respect to the estimated curvature of the objective function. This corresponds with an ellipsoidal trust-region geometry in the original optimization space. The effect is hardly observable during the initial stage but clearly demonstrates its usefulness after about 500 function evaluations where V2 diverges from V1 and successfully reaches the benchmark condition. It is argued that due to the transformation the modelling and optimization steps of the algorithm are now executed in a space where the objective function behaves more as if it was a quadratic function. The effect of the parameter µ 1 can be assessed by comparing V2, V3 and V4. The parameter determines how strictly the volume of an exact regular simplex has to be respected. This results in a larger amount of interpolation points being added in every iteration. Therefore algorithm V3 and V4 diverge from the common trend at about the same moment that V1 and ORBIT do. In this region of the optimization space, the size of the trust-region (simplex) starts to shrink for all algorithms. This means that less function evaluations can be recycled from one trust-region to another. Algorithms V3 and V4 set a stricter limit on the volume of the simplex and are therefore forced to add more points to complement the recycled set. As a results they consume more function evaluations than V1 and V2 to cross the same region. Nonetheless this increase in function evaluations benefits the overall modelling procedure and V2 is surpassed by V3 eventually.
These observations are confirmed by the convergence history for the Extended Rosenbrock function, see Figure 4b. Here the results are directly affected by the local behaviour of the objective function and all algorithms start to diverge at the starting point. In this case V1 does reach the benchmark condition, again illustrating the improved scattering of points with respect to state-of-the-art. That as a result of the regular simplex based sampling strategy. The beneficial effect of the ellipsoidal trust-region framework is again clearly present as V2 to V4 converge faster than V1. In conclusion one can observe that the same effect is present for V3 as was the case for the Extended Powell Singular function. Function evaluations are consumed at a higher rate in the initial stage. This phenomenon ultimately benefits the modelling procedure in the final optimization stage resulting in a steeper convergence history and even the surpassing of V2 with respect to convergence speed.

7.
Conclusion. This paper is on reducing the number of objective function evaluations in derivative-free numerical optimization with surrogate models. Radial basis functions are here considered as surrogate models with the interpolation framework provided by Universal Kriging. In each iteration of an interpolation based optimization algorithm, a trust region can be used wherein an optimization is performed using the surrogate model. The interpolation framework consists of using an interpolation set of which a subset of interpolation points needs to be well-poised so that the quality of the surrogate model is maximal. Interpolation set management aims at maximizing this range of model validity.
Currently, at least q = n + 1 interpolation points are considered to guarantee a valid surrogate model. We advanced on interpolation based optimization by relying on a subset of at least q = n + 2 interpolation points within the interpolation set management to improve the scattering of the interpolation points but more importantly to have surrogate models that exhibit curvature. For that purpose we defined fully linear surrogate models in Definition 1 that can be incorporated in the trust-region algorithm (Algorithm 2). We implemented RBF models using a flexible multivariate interpolation framework for which Universal Kriging is used (section 3). In Lemma 2 we provide for a given RBF model with linear tail, information on the boundedness of the tolerance of the RBF model and objective function as well on their gradients, i.e. constants κ f and κ g in Definition 1. This way we are able to predict whether an interpolation model classifies as being fully linear. Since we consider q = n + 2 interpolation points that represent a regular simplex subsection 4.3, we propose in Algorithm 3 and Algorithm 4 how the interpolation set management needs to be adapted, i.e. how the input points that need to be evaluated in the objective function need to be chosen.
By leveraging on the curvature of the surrogate model that is valid in the presented interpolation set management we introduced an ellipsoidal trust region framework that allows morphed trust-regions that adapt to the objective function geometry. This adaptation is implemented by means of curvature normalization with coordinate transformation (23) and filtered Hessian approximation (28). This way we allow to incorporate the output behavior of the objective function in the shape of the trust-region.
The proposed interpolation set management with proper input of interpolation points that follow a well-poised geometry together with the output behavior of the objective function that is incorporated in the ellipsoidal trust region, have shown their benefits with respect to the reduction of the number of objective function evaluations when performing elaborate numerical optimizations on various test functions with n ≤ 8. In order to show the impact of internal parameters of the algorithms, i.e. µ 1 in Algorithm 3 that is related to the input set and µ 2 in (28) that is related to the ellipsoidal trust region shape, results of 4 different versions of the presented algorithm are shown in Figure 4 and Table 5. Results clearly illustrate the impact of the used input interpolation points and the ellipsoidal trust region. Moreover, the convergence histories show faster convergence compared to algorithms incorporating at least n + 1 interpolation points and that use a spherical trust region.
Appendix A. Universal Kriging. The Universal Kriging (UK) framework obtains an interpolating approximation model for the function f : X ⊂ R n → f (X ) ⊂ R as the most likely unbiased linear predictor, m(x), based on a set of q interpolation points Z = {z 1 , . . . , z q }, z i ∈ X , and, a set of q function evaluations, F = {f (z 1 ), . . . , f (z q )}. The framework presumes that the modelled function f can be decomposed as (30) f The function p(x) is assumed to be an element of a linear function space, A. In the UK framework one is interested in the specific case where A is the n-variate polynomial space, P n d , of at most degree d spanned by the arbitrary basis,P n d = {π 1 (x), . . . , πd(x)}. We further introduce the vector, π(x), which entries are determined by π i (x), so that we can write, p(x) = π(x) β, where β ∈ Rd is a unique coefficient vector. It is further assumed that w(x) is an unbiased random process and that the output values w(x) and w(z) are correlated. Their correlation is modelled by a function from the class of radial basis functions, i.e. corr(w(x), w(z)) = φ( x − z ). One looks for an unbiased linear predictor, m(x) = η(x) f , where he elements of F determine the entries of f .
The model decomposition can be applied to this vector f so that one obtains f = Πβ + w. Here the entries of w are defined as w(x i ) = f (x i ) − p(x i ) and the matrix Π ∈ R q×d as Π ij = π j (x i ). Substitution in the linear predictor yields that m(x) = η(x) Πβ +η(x) w. Hence, we can define an error function between the linear predictor and the true function as We can now impose two additional properties on the linear predictor that affect the error function ∆ and that will render the predictor unique. Firstly, the linear predictor should be unbiased, which can be expressed as E[∆] ≡ 0. It follows that Π η = π and consequently that ∆ = η w − w. Secondly, we wish to determine η so that the expected squared error is minimized. Given that E[∆] = 0, the expected squared error can be defined as E[∆ 2 ] and after substitution of the error function we obtain E[(w − η w) 2 ]. Evaluation of the expectation operator determines that the error is proportional to the functional [η] = η Φη −2η φ+1, where matrix Φ ∈ R q×q is defined as Φ ij = φ( z i − z j ) and the entries of φ(x) are determined by φ( x − z i ). The most likely unbiased linear predictor η is thus the solution of the equality constrained variational problem (31) min Expression of the first order optimality condition yields the system of linear equations given in (15).
According to the probabilistic interpretation of the random process w(x), the function values in f can be associated to the outcome of a stochastic experiment. Consequently it is possible to associate a probability value to the outcome of this experiment. If we make the additional assumption that the random process w(x) itself behaves as a Gaussian with expectation µ = Πβ and covariance σ, the probability of the experiment is given by A maximum log likelihood estimation for the parameter σ 2 results in the following esti- . Now, assuming that the correlation model corr(x, z) still depends on the additional hyperparameter, say θ (e.g. φ( x − z ; θ)). The optimal parameter θ * can then also be determined by maximizing the likelihood Combination of the equations in (36) yields a bound on the derivative error function in function of the interpolation points contained in Z Since we already had established a relation between both error functions in (35), the bound above allows one to determine a bound on e f (x) as well, that is given by Now introducing the scaled interpolation matrix ∆ Ẑ = Z , we have established taylor-like bounds on both the error value functions that are related to the inverse value of Ẑ . These bounds are in fact solely determined by the interpolation set geometry.
Appendix C. Some simplex related defenitions. Consider a set of n + 1 points Z = {z 0 , ..., z n } associated to a set of coordinates in an n dimensional space. Assume that these points are affinely independent so that also the column vectors {z 1 − z 0 , ..., z n − z 0 } are linearly independent. A simplex is therefore the n-dimensional generalization of a triangle. In simplex terminology one has that each point from Z determines a vertice, each couple of vertices determines an edge and each subset of n vertices determines a face. A n-simplex thus has n + 1 vertices, n(n+1) 2 edges and n + 1 faces. Conveniently it follows that each face of an n + 1-simplex constitutes an n-simplex itself.
According to the general convention, when speaking of the n-simplex that is determined by Z one reffers to the convex set determined by S Now as it may be inconsistent with this general convention, in this work we refer to the set of vertices Z when speaking of the simplex. However, when we speak of the hypervolume of that simplex, we do in fact refer to the hypervolume of the convex set, S.
The volume V of the simplex determined by Z is defined as the determinant of the associated column vector matrix [Z] normalized by the factor n! here the operator [·] is defined as [Z] = z 1 − z 0 · · · z n − z 0 . In the following subsection we mention some relevant simplex geometries and properties.
C.1. Orthocentric simplex. A simplex for which the following holds are called orthocentric simplices: each edge is perpendicular to all the edges it does not meet [32]. Orthocentric simplices possess a unique orthocentre. In the special cases where n ≤ 2, we have that every set of points that determine a simplex are orthocentric as well.
In [22], it is shown that an orthocentric simplex maximizes the volume satisfying a predetermined orthocentre, o, and, distances from the simplex vertices to the orthocentre d i = z i − o . These conditions determine the simplex, Z, within an isometry. It is this very property that we exploit to maximize the volume of the union of an existing set and an expansion set generated by the procedure Simexpand, Algorithm 4. The expansion set is constructed exactly so that all of its edges are perpendicular to the edges determined by the existing set. Given the freedom of the edge lengths, beit that the union of exesting and expansion set is restricted to an n-sphere, these are determined so to maximize the volume.
C.2. Standard simplex. A standard n-simplex is obtained when n of its edges are orthogonal and have the same length, l. A convenient coordinate representation of a standard n-simplex is given by the origin, 0, and the scaled coordinates of the n basis vectors, le i . It is easily verified that the n(n−1) 2 non-orthogonal edges have length √ 2l. We also note that the standard simplex is not an orthocentric simplex.
C.3. Regular simplex. A regular n-simplex with side length l is obtained when all edges have the same length. From this definition it follows immediately that any subset of m + 1 points from that simplex determines a regular m-simplex.
If we consider all simplices Z which vertices are contained within an n-hypersphere with radius ∆, it can be shown that if a simplex's volume is maximized it is an element from the class of regular simplices [32]. Moreover it can be shown that the regular simplex is an orthocentric simplex.
It is worth mentioning that a regular n-simplex is embedded in a standard n + 1-simplex as the face determined by the n(n+1) 2 non-orthogonal edges of the standard n + 1-simplex. The edge length of a regular n-simplex that is embedded in a standard n + 1-simplex with orthogonal edge length, l, is √ 2l in accordance with the non-orthogonal edge length of a standard simplex with edge lenght l as determined above.
C.3.1. Volume of the regular n-simplex. Let V s n (l) be the hypervolume occupied by the standard n-simplex with orthogonal edge length l. Following the definition given in (41), it is easily verified that V s n+1 (l) = 1 (n+1)! l n+1 . Now consider that the face determined by all the non-orthogonal edges of the standard n + 1-simplex constitutes a regular n-simplex. The distance from the intersection of the orthogonal edges, i.e. origin 0 according the coordinate representation in C.2, to the face that determines the regular n-simplex is given by √ n+1 n+1 l. Since the hypervolume of a standard n-simplex can also be determined using the definition of the hyperpyramid, i.e. by multiplying the hypervolume of one of its faces with the corresponding altitude divided by n, the hypervolume of the regular n-simplex V r n (l) can be determined. Recall that the edge length of the regular n-simplex determined by the non-orthogonal edges is √ 2l and that therefore the result must be scaled by a factor √ 2 n .
(42) 1 n + 1 · √ n + 1 n + 1 l · V r n ( √ 2l) = V s n+1 (l) ⇒ V r n (l) = √ n + 1 n! l n √ 2 n C.3.2. Radius of the circumscribed n-sphere. The radius, r n (l), of the n-sphere circumscribing a regular n-simplex with side length l can be determined reconsidering the geometric relations between a standard n + 1-simplex and a regular n-simplex described to determine the hypervolume. In a regular n-simplex the distance between the unique orthocentre and a vertice is equal to the radius of the circumscribed n-sphere. It holds that the altitude of the face determined by the non-orthogonal edges and one of the orthogonal edges of the standard n + 1-simplex constitute a right triangle with the orthocentre of the associated regular n-simplex and a common vertice. Note that again the edge length of this n-simplex is √ 2l and that therefore the result must be scaled with by a factor √ 2.
(43) l 2 = r n ( √ 2l) 2 + 1 n + 1 l 2 ⇒ r n (l) = √ n √ n + 1 l √ 2 The side length, l n (∆), of the regular n-simplex inscribed in a n-sphere with radius ∆ is hence given by l n (∆) = √ 2 √ n+1 √ n ∆ C.3.3. Other usefull relations. From these results we can derive some other equalities that are of interest for the procedure Simexpand described by Algorithm 4.
The radius of the circumscribed m-sphere of a regular m-simplex that is part of a regular n-simplex circumscribed by an n-sphere with radius ∆ is given by (44) r n m (∆) = r m (l n (∆)) = m(n+1) (m+1)n ∆ The distance, d n m (∆), from the orthocentre of the regular n-simplex to a subset of m + 1 vertices determining such a regular m-simplex, can hence be determined considering that (45) ∆ 2 = d n m (∆) 2 + r n m (∆) 2 ⇒ d n m (∆) = n−m (m+1)n ∆ This relation can be generalised. Consider therefore a regular m-simplex embedded in a n-dimensional space and inscribed in an n-sphere with radius ∆. Its edge length must then be l m (∆). Now if one would translates this m-simplex along any direction perpundicalar to the m-dimensional subspace spanned by its edges over a distance σ∆, this m-simplex remains on the surface of the n-sphere as long as it is scaled by a factor √ 1 − σ 2 ∆. 838 100 2,51e-07 9,56e-08 9,40e-07 1,69e-07 1,80e-03 3,71e-07