An Analysis of the RBF Hyperparameter Impact on Surrogate-Assisted Evolutionary Optimization

Computationally expensive optimization problems are often solved using surrogates, and a common variant is the radial basis functions (RBF) model. It aggregates several basis functions which all depend on a hyperparameter affecting their individual outputs and consequently the overall surrogate prediction. However, the optimal value of the hyperparameter is typically unknown and should therefore be calibrated. This raises the question of how the hyperparameter affects the overall optimization search effectiveness (and not just the stand-alone surrogate accuracy) and to what extent such a calibration is beneficial, which is an important consideration for end-users and algorithm researchers alike. To rigorously address this issue, this paper presents an analysis based on an extensive set of numerical experiments with an RBF surrogate-assisted evolutionary algorithm. It follows that the hyperparameter strongly affected performance and that the extent of its impact varied depending on the basis function, objective function modality, and problem dimension. Overall, calibration of the hyperparameter was typically highly beneficial to the search performance, while dynamically optimizing the hyperparameter during the search yielded additional performance gains.


Introduction
Industrial optimization problems often involve functions whose evaluation is computationally expensive; for example, in engineering design optimization, candidate products could be evaluated by a computer simulation (instead of real-world experiments) to improve the design process efficiency. This simulation-driven setup introduces several optimization challenges: (i) the simulation acts as a "black-box function", namely, it assigns an output value to each input vector (candidate product) through involved numerical procedures or proprietary codes, but there is no analytic expression for this functional relation, thereby necessitating derivative-free optimization methods, (ii) each simulation run is typically computationally intensive so only a small number of vectors can be evaluated, and (iii) such "black-box functions" often have complicated features (such as multimodality) which exacerbate the search difficulty.
A framework which has been shown to be effective in such problems combines an evolutionary algorithm (EA) with a surrogate model (also termed in the literature a "metamodel"), yielding the surrogate-assisted EA (SAEA). The evolutionary algorithm is a heuristic search algorithm which seeks an optimum by applying operators inspired by the process of evolution to a population of candidate solutions, while the surrogate model approximates the true expensive black-box function and provides estimated function values at a much lower computational cost than the true expensive function. A common surrogate variant is the radial basis functions (RBF) model, which uses a set of functions whose values are linearly aggregated. RBFs rely on a hyperparameter (HP) which affects their individual outputs and consequently the surrogate prediction and the overall SAEA performance. The optimal HP value is typically unknown and therefore needs to be calibrated, which raises the following question: how significant is the impact of the HP on the search, and does the numerical effort spent on such calibration yield significant performance gains? The impact of the HP has been examined in the literature with respect to the stand-alone surrogate prediction accuracy, but hardly so in the context of the overall surrogate-assisted search, which is a significantly more involved relation [1]. To address these open questions, this study presents an extensive set of numerical experiments and an analysis which examine how the HP affects the overall SAEA performance, both when a fixed HP is selected a priori and when it is dynamically calibrated during the search. Based on the large volume of results, the analysis clarifies in which instances the HP strongly affects the SAEA performance.
The remainder of this paper is organized as follows: Section 2 surveys the relevant literature, Section 3 explains the methods used, Section 4 presents the numerical experiments and their analysis, and lastly Section 5 concludes the paper.

Literature Survey
SAEAs have shown their effectiveness across a variety of problems ranging from drug formulation to aircraft design [2,3]. The success of such algorithms stems from the combination of the EA, which is both derivative-free and capable of handling functions with complicated features, with the surrogate, which provides approximate function values efficiently. In this setup the EA searches for an optimum of the surrogate while the latter is updated during the search. A basic SAEA is presented in Algorithm 1.
Various SAEA implementations have been explored in the literature. For example, in [4] a multifidelity setup was proposed where several simulations of different accuracies were combined for design optimization of hydraulic turbines. In [5] the authors used an RBF-based SAEA for design optimization of a mechanical controller, while in [6] a multiobjective SAEA based on a Kriging surrogate was proposed. In [7] the authors used a SAEA to optimize the layout of a wind turbine farm with the goal of maximizing energy production while accounting for wind regimes and turbine wakes. More intricate frameworks include [8], which combined an EA with machine-learning techniques to handle vectors for which the simulation fails, and [9,10], in which ensembles of multiple surrogates were used concurrently to improve the prediction accuracy. Elaborate surrogate updating procedures were studied in [11,12], while a dynamic in-search selection of the surrogate type was studied in [9,13]. Recently, in [14] the authors proposed a dual surrogate-assisted algorithm which uses a global surrogate and a group of local surrogates for approximation, together with a dual-population cooperative particle swarm algorithm to explore the surrogates, to better handle highly multimodal functions. Other studies have proposed ensembles of surrogates [15], and more recently in [16] the authors proposed using an ensemble of local surrogates and selecting the subset that provides the most accurate approximation. Also related is [17], in which the authors proposed a surrogate-assisted particle swarm algorithm in which the sampled points are split into multiple subsets and a different surrogate is trained on each subset. The surrogates are then managed to improve the global and local search capabilities of the algorithm.
Also using multiple surrogates, in [18] the authors proposed a surrogate-assisted algorithm which used both a Kriging surrogate for localized modelling and a global quadratic model to approximate the overall function trend. The algorithm also uses two optimization algorithms ("grey wolf" and a multistart search) to identify good solutions from each surrogate.
Various studies have also examined the impact of the HP on the stand-alone surrogate. In [19] a Kriging surrogate was used along with several HP calibration methods to achieve different balances between global and local prediction accuracies. In [20] the authors proposed estimating the optimal Kriging HP by using a multistart gradient search on the cross-validation error function, while in [21] calibrating the HP was found to affect the accuracy of various surrogates. In [22] the authors found that the size of the initial sample affected the calibration of the HP. However, the mentioned studies examined the impact of the HP only on the prediction accuracy of the stand-alone surrogate, while this study analyzes the impact on the overall SAEA performance.

The RBF Surrogate.
The RBF surrogate used in this study is a linear combination (superposition) of basis functions (BFs), namely,

$$m(\vec{x}) = \sum_{i=1}^{n} \alpha_i \phi_i(\vec{x}),$$

where $m(\vec{x})$ is the surrogate prediction, while $\vec{x}_i$, $i = 1, \ldots, n$, and $\phi_i(\vec{x})$ are the sampled vectors and basis functions, respectively. The scalar weights $\alpha_i$ are determined from the Lagrange interpolation conditions

$$m(\vec{x}_i) = f(\vec{x}_i), \quad i = 1, \ldots, n,$$

where $f$ is the true (expensive) function. Two widely used BFs are the Gaussian and multiquadric (MQ) [23], which are defined as

$$\phi_i(\vec{x}) = \exp\!\left(-\frac{\lVert \vec{x} - \vec{x}_i \rVert^2}{\sigma^2}\right) \quad \text{(Gaussian)}, \qquad \phi_i(\vec{x}) = \sqrt{\lVert \vec{x} - \vec{x}_i \rVert^2 + \sigma^2} \quad \text{(MQ)},$$

where $\sigma$ is the HP which determines the BF shape and consequently the overall surrogate prediction $m(\vec{x})$. This relation is visualized in Figure 1.
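To make the interpolation concrete, the following sketch fits and evaluates such an RBF surrogate. The function names and the exact Gaussian/MQ scalings are illustrative assumptions; the paper does not specify its implementation details.

```python
import numpy as np

def fit_rbf(X, y, sigma, kind="gaussian"):
    """Solve the Lagrange interpolation conditions Phi @ alpha = y for the
    weights alpha, where Phi[i, j] = phi_j(x_i)."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    Phi = np.exp(-r**2 / sigma**2) if kind == "gaussian" else np.sqrt(r**2 + sigma**2)
    return np.linalg.solve(Phi, y)

def predict_rbf(X, alpha, sigma, x_new, kind="gaussian"):
    """Surrogate prediction m(x_new) = sum_i alpha_i * phi_i(x_new)."""
    r = np.linalg.norm(X - x_new, axis=-1)  # distances to the sampled vectors
    phi = np.exp(-r**2 / sigma**2) if kind == "gaussian" else np.sqrt(r**2 + sigma**2)
    return float(phi @ alpha)
```

By construction the surrogate interpolates the sample: evaluating it at any sampled vector returns the observed value up to round-off.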
Given its impact, various studies have examined how to calibrate the HP based on the stand-alone surrogate accuracy. In [24] the authors used an empirical fixed value derived from the mean distance between the sampled vectors, but without accounting for the observed function values. In [25] the HP was calibrated by minimizing the root mean square error (RMSE) of the RBF prediction by using gradient descent. A related leave-one-out cross-validation procedure was proposed in [26-29], with various modifications explored in [30-33]. In [34] the HP was calibrated based on a least-squares fit procedure, while in [35] MARS interpolating polynomials were used. More recently, [36] proposed using an EA for the HP calibration. In the numerical experiments performed in this study, both a fixed and a dynamically optimized HP were used, as explained in Section 4.
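As an illustration of the leave-one-out approach, the sketch below scores candidate HP values by the leave-one-out error of an interpolating RBF surrogate and picks the best value from a small grid. The grid values and helper names are illustrative assumptions, not the cited procedures verbatim.

```python
import numpy as np

def _phi(r, sigma, kind):
    # Gaussian or multiquadric basis value at distance r
    return np.exp(-r**2 / sigma**2) if kind == "gaussian" else np.sqrt(r**2 + sigma**2)

def loocv_rmse(X, y, sigma, kind="gaussian"):
    """Leave-one-out RMSE of the interpolating RBF surrogate for a given HP."""
    n = len(y)
    sq = []
    for i in range(n):
        m = np.arange(n) != i                       # hold out sample i
        R = np.linalg.norm(X[m][:, None] - X[m][None, :], axis=-1)
        alpha = np.linalg.solve(_phi(R, sigma, kind), y[m])
        r = np.linalg.norm(X[m] - X[i], axis=-1)
        sq.append((float(_phi(r, sigma, kind) @ alpha) - y[i]) ** 2)
    return float(np.sqrt(np.mean(sq)))

def calibrate_hp(X, y, grid=(0.1, 0.3, 1.0), kind="gaussian"):
    # pick the grid value that minimizes the leave-one-out error
    return min(grid, key=lambda s: loocv_rmse(X, y, s, kind))
```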

The SAEA Algorithm.
A representative SAEA was implemented for the numerical experiments. It begins by sampling an initial random population and training the RBF surrogate. The main loop then begins, in which a real-coded EA seeks an optimum of the surrogate. Every g generations the best q percent of the population members are evaluated with the true objective function and added to the cache of sampled vectors. The surrogate is retrained based on the updated cache and the EA population is reevaluated with the new surrogate. Following [37], the settings g = 4 and q = 20% were used. The loop repeats until either the true (expensive) function evaluations reach 50d (where d is the function dimension) or 1000 EA generations have occurred. Algorithm 2 provides the complete SAEA workflow.
For the real-coded EA a representative variant with a population size of 20 was used, which operates as follows: (i) vectors are ranked by fitness, and (ii) parents are selected by stochastic universal selection (SUS) and recombined with a crossover operator.

Input: initial sample size, max evaluations, EA parameters;
generate an initial set of vectors and evaluate with the true function;
repeat
    train a surrogate model by using the vectors evaluated so far;
    search for an optimum of the surrogate by using an EA;
    evaluate the predicted solution (and possibly additional vectors) with the true function;
until max evaluations;
Output: the best solution found;

ALGORITHM 1: A basic SAEA.
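The overall workflow can be sketched as follows. This is a minimal illustration assuming a Gaussian-BF surrogate and simplified EA operators (rank-based truncation selection, blend crossover, Gaussian mutation) as stand-ins for the paper's exact SUS-based variant, with the stated g = 4, q = 20%, and 50d-evaluation settings.

```python
import numpy as np

def saea_minimize(f, bounds, d, g=4, q=0.2, pop_size=20, sigma=1.0, seed=0):
    """Minimal SAEA sketch: a Gaussian-BF RBF surrogate guides a real-coded EA;
    every g generations the best q-fraction of the population is evaluated with
    the true function f and the surrogate is retrained; the budget is 50*d true
    evaluations or 1000 EA generations."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(2 * d, d))              # initial sample
    y = np.array([f(x) for x in X], dtype=float)

    def train(Xs, ys):
        R = np.linalg.norm(Xs[:, None] - Xs[None, :], axis=-1)
        # small diagonal jitter keeps the Gaussian kernel matrix invertible
        alpha = np.linalg.solve(np.exp(-R**2 / sigma**2) + 1e-8 * np.eye(len(ys)), ys)
        return lambda x: float(np.exp(-np.linalg.norm(Xs - x, axis=-1)**2 / sigma**2) @ alpha)

    m = train(X, y)
    pop = rng.uniform(lo, hi, size=(pop_size, d))
    for gen in range(1, 1001):
        order = np.argsort([m(p) for p in pop])
        parents = pop[order[: pop_size // 2]]             # truncation selection
        i, j = rng.integers(0, len(parents), size=(2, pop_size))
        w = rng.uniform(0, 1, size=(pop_size, 1))
        pop = w * parents[i] + (1 - w) * parents[j]       # blend crossover
        pop += rng.normal(0, 0.05 * (hi - lo), size=pop.shape)  # mutation
        pop = np.clip(pop, lo, hi)
        if gen % g == 0:                                  # surrogate update step
            best = pop[np.argsort([m(p) for p in pop])[: max(1, int(q * pop_size))]]
            X = np.vstack([X, best])
            y = np.concatenate([y, [f(x) for x in best]])
            m = train(X, y)
            if len(y) >= 50 * d:
                break
    k = int(np.argmin(y))
    return X[k], float(y[k])
```

The key design point mirrors Algorithm 2: the EA only ever sees surrogate values, while the expensive function is reserved for the periodic update of the sample cache.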

Fixed HP.
A first set of numerical experiments was performed to analyze whether, and to what extent, different fixed HP values, namely values chosen a priori, affect the overall SAEA performance. Following the Design-of-Experiments (DOE) framework [38], the numerical experiments were formulated by using a single variable, namely the HP, which was assigned different fixed values, and 3 factors (function type, function dimension, and BF type). Accordingly, in each combination of factors the numerical experiments were repeated with the different HP values so that the HP impact could be analyzed across a wide range of problem settings. The DOE setup is summarized in Table 1.
The established set of test functions of [39] was used, as it includes functions with diverse features, as shown in Figure 2.
In each test case (namely, a combination of factors), the SAEA was run with four different HP settings and with 30 runs per setting to obtain an adequate sample, thereby yielding a total of 2400 runs (5 functions × 2 dimensions × 2 BFs × 4 HP settings × 30 repeats).
Tables 2-5 present the resultant statistics (mean, median, standard deviation, min, and max) of the final function values for each batch of 30 runs across the different HP values and test cases. Also provided is the p-value of the Mann-Whitney statistical significance test, obtained by comparing the results corresponding to the HP associated with the best mean to the results associated with the 3 remaining HP settings. Statistical significance was measured at the 0.05 level, namely, for p ≤ 0.05. As such, in each of the 20 combinations of factors (function-dimension-BF), the number of significant performance differences observed (denoted s) varied between 0 (no significant differences observed), which implies that the HP had no significant impact, and 3 (all comparisons had significant differences), which implies that the HP had a strong impact.
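The significance test above can be sketched as follows: a two-sided Mann-Whitney U test via the normal approximation, which is adequate for batches of 30 runs. This is an illustrative implementation (midranks for ties, no tie correction or continuity correction), not necessarily the exact routine used in the paper.

```python
import math

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U test p-value via the normal approximation."""
    n1, n2 = len(a), len(b)
    allv = list(a) + list(b)
    order = sorted(range(n1 + n2), key=lambda k: allv[k])
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(order):                 # assign midranks to tied values
        j = i
        while j + 1 < len(order) and allv[order[j + 1]] == allv[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # U statistic of sample a
    mu = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sd
    # two-sided p-value from the standard normal distribution
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```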
The corresponding results are summarized in Table 6, from which it follows that in 1 combination of factors the HP had no significant impact, in 3 cases it had only a minor impact, and in the remaining 16 cases it had a significant impact. The cases showing a lessened HP impact are low-dimensional and involve the Gaussian basis function. These results are attributed to the relative simplicity of the lower-dimensional cases, which allowed for a good approximation across a range of HP values. Also, the output of a Gaussian BF surrogate approaches the sample mean faster than an MQ surrogate due to the Gaussian BF behaviour, and therefore the RBF output was similar for different HP values, which results in fewer performance differences, as observed. In contrast, with the MQ basis function and in the high-dimensional cases the HP consistently had a significant impact. This is attributed to the MQ basis function, whose output varies over a wider range of HP values. Also, in high-dimensional cases, generating an accurate surrogate is more challenging; hence, different HP values resulted in significantly different prediction accuracies. These aspects are further addressed at length in the analysis which follows.
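The differing HP sensitivity of the two BFs can be illustrated numerically: at a representative inter-point distance, the Gaussian BF value stays near zero across a range of HP values, while the MQ value shifts appreciably. The distance and HP values below are illustrative choices.

```python
import math

# At a representative distance r between a query point and a sampled vector,
# compare how much each basis-function value moves as the HP sigma varies.
r = 3.0
sigmas = [0.5, 1.0, 2.0]
gauss = [math.exp(-r**2 / s**2) for s in sigmas]
mq = [math.sqrt(r**2 + s**2) for s in sigmas]
gauss_spread = max(gauss) - min(gauss)  # Gaussian values all near 0: small spread
mq_spread = max(mq) - min(mq)           # MQ values shift appreciably with sigma
```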
Expanding on the preceding analysis, an additional fine-grained analysis was performed to highlight how each factor (BF, modality, dimension) affected performance; the results are summarized in Table 7. The per-factor influence is as follows: for a Gaussian BF, in 4 cases varying the HP had a negligible impact, while the MQ BF variations were always significant, namely, the sensitivity to the HP was lower for a Gaussian BF than for an MQ one. This is attributed to the mathematical formulation of each BF: for a Gaussian BF the expression decays to 0 rapidly for a wide range of HP values, which results in similar surrogate predictions for different HP values and consequently a lower HP impact.
In contrast, with an MQ BF the output varies over a wider range of HP values, which in turn affects the surrogate prediction and results in a more significant HP impact.

Input: initial sample size, max evaluations, EA parameters, g and q parameters;
generate and sample an initial random population;
repeat
    train an RBF surrogate based on the evaluated vectors;
    run one generation of the EA;
    for each g-th generation do
        evaluate the best q% of the vectors with the true function;
        retrain the surrogate and re-evaluate the population;
    end
until max function evaluations or max EA generations;
Output: the best solution found;

ALGORITHM 2: The implemented RBF-based SAEA.

Modality. As an example, Figure 3(a) shows a test function in which k is a prescribed parameter that determines the function modality (higher values increase modality). The surrogate prediction accuracy was examined for a 5D variant of this function,

Scientifc Programming
and the results are presented in Figure 3(b). It follows that the surrogate prediction RMSE increased with modality for all HP values. Consequently, the impact of the HP was more pronounced for the lower modality settings and diminished as the modality was increased, as observed in the numerical experiments.

Dimension.
In the lower dimensional cases (d = 5) the HP had a significant impact in 7 cases, while in the higher dimensional cases (d = 10) it did in 9 cases. These results are attributed to the "curse of dimensionality": in the lower dimensional problems a more accurate surrogate could be trained across a range of HP values, since the problems are simpler, and hence the HP had a lesser impact. Figure 3(c) shows the variation of the prediction RMSE for different dimensions with the Rosenbrock function. It follows that for lower dimensions changing the HP affected the RMSE much less than for higher dimensions, as observed in the experiments.
Beyond the above per-factor analysis, an additional evaluation was performed to examine whether, and to what extent, the impact of the HP varied depending on the main algorithm parameter, namely the limit of true (expensive) function evaluations. Accordingly, beyond the baseline limit of 50d evaluations, the full set of numerical experiments was repeated for limits of 100d and 200d. Table 8 summarizes the obtained results for the number of cases in which the HP had a significant impact. It follows that the impact was consistent across all sets: the number of cases with a statistically significant performance difference, and the test cases in which they occurred, remained largely unchanged across the three settings. Accordingly, the remainder of the analysis is presented for the 50d setting.
Overall, the analysis shows that except for specific cases (Gaussian BF, high-modality functions, low dimensions), the HP had a significant impact on the overall SAEA performance.

Dynamically Optimized HP.
The preceding analysis examined cases where the HP value was unchanged. This raises the following question: would the overall SAEA performance vary if the HP were dynamically calibrated during the search? For a rigorous evaluation, an additional set of experiments was performed with a modified algorithm in which the HP was dynamically calibrated based on a cross-validation procedure with an 80-20 training-testing split ratio. The HP optimization was achieved by a golden-section search in the interval [10^-4, 1]. Tests were conducted as in the preceding section and the results are presented in Table 9, compared to those of the best-performing variant (mean-wise) from the preceding section. It follows that optimizing the HP consistently improved performance, as evident from both the means and the p-values: the variant with the optimized HP had a statistically significant performance advantage at the 0.05 level (p-value < 0.05) in 8 out of 10 cases (all except the Ackley-5D and Rosenbrock-5D functions).
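The calibration step can be sketched as a golden-section search over the HP interval. In the algorithm the objective f would be the 80-20 cross-validation error of the surrogate as a function of the HP; any unimodal objective serves for illustration, and the tolerance below is an assumed setting.

```python
import math

def golden_section_min(f, a=1e-4, b=1.0, tol=1e-4):
    """Golden-section search for the minimizer of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

Each iteration shrinks the bracketing interval by the golden ratio while reusing one of the two interior evaluations, so only one new objective evaluation (one cross-validation pass) is needed per iteration.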
Furthermore, the prediction accuracies of the optimized and fixed HP variants were compared; the results are presented in Figure 4. It follows that the optimized HP variant had significantly smaller prediction errors and consequently achieved better final results. This trend was observed across the different test cases, as shown. Overall, the analysis shows that dynamically optimizing the HP consistently improved the SAEA performance.

Conclusion
Surrogate-assisted evolutionary algorithms (SAEAs) are often used to solve simulation-driven optimization problems, and a common surrogate is the radial basis functions (RBF) model. The latter relies on a hyperparameter (HP) which affects both the individual basis functions and the overall surrogate prediction. The HP needs to be set by some numerical procedure, and this raises the question of how significant its impact is on the overall SAEA performance, and not only on the stand-alone surrogate, namely, whether the computational effort spent on the HP calibration is justified in terms of the overall SAEA performance. This is an important consideration for end-users and algorithm researchers alike. To address this issue, this paper has presented an extensive set of experiments involving a representative RBF-based SAEA. Analysis of the large volume of results shows that a fixed HP typically had a significant impact on performance, except in specific cases (Gaussian basis function/low dimension/high modality). Analysis also shows that dynamically optimizing the HP during the search significantly improved performance. These results indicate that a careful a-priori selection of the HP can significantly improve the SAEA effectiveness, while additional gains could be achieved with dynamic HP calibration.

Data Availability
Log files of the numerical experiments are available from the corresponding author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest.