Validation and Calibration of an Agent-Based Model: A Surrogate Approach

Agent-based modelling has proved extremely useful for learning about real-world societies through the analysis of simulations. Recent agent-based models usually contain a large number of parameters that capture the interactions among microheterogeneous subjects and the multistructure of the complex system. However, this can lead to the “curse of dimensionality” and decrease the robustness of the model’s output. Hence, efficiently calibrating agent-based models to actual data remains a great challenge. In this paper, we present a surrogate analysis method for calibration that combines supervised machine learning with intelligent iterative sampling. Without any prior assumptions regarding the distribution of the parameter space, the proposed method learns a surrogate model that approximates the original system from a relatively small number of training points, and the surrogate can then serve further sensitivity analysis and parameter calibration. We take the heterogeneous asset pricing model as an example and evaluate the method’s performance using actual Chinese stock market data. The results demonstrate that the surrogate model reproduces the observed reality well and remarkably reduces the computational time needed to validate the agent-based model.


Introduction
Agent-based models (ABMs) are favoured by researchers when explaining the emergence of complex systems [1,2]. The explanatory power of existing ABMs mainly comes from exploring the market mechanism by describing heterogeneous agents' behavioural activities and their interactions, and ABMs are widely used in economics, demography, and ecology [3][4][5]. Since an ABM can reveal the dynamics of complex systems in highly flexible, natural, and descriptive ways, many scholars regard it as "one of the most important methods of complex scientific methodology" [6], and some even deem it to be "a revolutionary development for social science" [7].
However, ABMs are criticized for their lack of objective verification criteria, which limits the number and persistence of related studies [8][9][10]. Some researchers suspect that an ABM can be made to produce any desired result under subjective settings, claim that its practical applicability is exaggerated, and believe that modellers bias the modelling process in order to obtain specific results [11,12].
Due to the complexity of real systems, ABMs usually contain a large number of parameters that need to be calibrated. Because the parameter space expands geometrically as the number of parameters increases, this creates another challenge in the use of ABMs, which is referred to as the "dimensional disaster" [13]. Since the parameter space of an ABM cannot be searched exhaustively, finding meaningful parameter combinations places fairly high demands on hardware and computation, which are usually prohibitive for researchers.
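As a rough illustration of this geometric growth, consider an exhaustive grid search with $k$ candidate values for each of $d$ parameters. The values $k = 10$ and $d = 12$ and the one-second run time below are assumptions chosen only to make the arithmetic concrete.

$$N_{\text{runs}} = k^{d} = 10^{12}, \qquad \frac{10^{12}\ \text{s}}{3.15 \times 10^{7}\ \text{s/year}} \approx 3.2 \times 10^{4}\ \text{years}.$$

Even at a hypothetical one second per simulation, such a grid could not be evaluated, which is why the parameter space has to be sampled rather than enumerated.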
Whether an ABM is a good approximation of the original system depends on the verification of the results, which is accomplished by testing the consistency of the statistical characteristics of the ABM's output with real data. In a high-dimensional parameter space, any estimator converges slowly to the true value of a smooth function, so a local critical point may be mistaken for the global maximum or minimum [14]. Therefore, how to effectively find the region of the parameter space spanned by the sensitive parameters and calibrate it has become one of the key problems of agent-based modelling. The existing ways of dealing with this issue can be mainly divided into three categories: the indirect calibration method, the Werker-Brenner method, and the historical data method. Of these, the historical data method is the most prevalent due to its excellent fit and easy verification [15]. The historical data method is implemented by dividing the collected data into a modelling set and a verification set, which are used to evaluate the model and verify the results, respectively.
Gilli and Winker [16] present a continuous global optimization heuristic for estimating the ABM of the foreign exchange market. Khashanah and Alsulaiman [17] develop a multisubject meta-model to capture the complexity of stock markets and calibrate the model using a scatter search heuristic approach. Franke and Westerhoff [18] present an improved structural stochastic volatility model for parameter calibration, but it is considered to be a relatively simple model that contains only a few parameters. Recchioni et al. [19] propose a calibration method that uses a simple gradient-based algorithm and evaluates the performance based on the out-of-sample prediction errors. Similar research can be found in Fievet and Sornette [20] and Amilon [21].
Recently, the surrogate analysis approach has been used increasingly in the analysis of ABMs [22,23]. The main idea of this approach is to generate a surrogate model, using a certain learning algorithm, as an approximation of the original agent-based model. The surrogate model can reduce the dimensionality of the original model's parameter vector and greatly simplify its form while maintaining the dynamic characteristics of the original system.
The key to surrogate model analysis is the chosen learning algorithm. In previous research, the main approach was the Kriging linear interpolation method. This approach estimates the ABM output over the parameter space using ABM evaluations of a limited number of samples, and then generates the best linear unbiased predictor by investigating the true variogram or the true spatial correlation of the data. Under the condition that the data obey a uniform distribution, Kriging interpolation needs only 30 data points to approximate the spatial structure, which makes it a very efficient technique. However, for most complex systems, the distribution of the data is unknown. In this case, the Kriging method relies on expert knowledge of the variogram to estimate the spatial dependence of the points, which demands a fairly large simulation data set and significantly higher computational costs.
In this paper, we present a new approach for ABM validation and calibration based on the surrogate model. By combining machine learning with an intelligent sampling technique, the method can learn an approximate surrogate model of the original ABM at relatively low time costs. The main advantage of this method is that it can search the ABM parameter space using fewer computing resources and efficiently find the response surface of the model under fewer constraints. In particular, it does not need to make any prior assumptions about the distribution of the parameter space.

Surrogate Model
It is crucial to choose an appropriate learning algorithm for surrogate analysis. In this section, we first define the relevant concepts of ABM calibration. Then, we discuss the details of the CatBoost machine learning algorithm that we use in our work. Finally, the complete procedure for generating a surrogate model based on CatBoost is presented. We should point out that our work builds on the research of Lamperti et al. [22], who combined the XGBoost algorithm with an intelligent sampling method to generate a fast-learning surrogate model for ABM validation. In our research, we use a more recently developed machine learning technique to generate the surrogate model, with the aim of obtaining new findings.

Related Concepts.
Whether the ABM outputs are consistent with the real data depends on the "calibration measure". The ABM outputs can be divided into two types: binary outputs and real-value outputs. In the binary case, the calibration measure can take only two values: 1 and 0. A value of 1 means that the statistical characteristics of the output are consistent with the real data, and a value of 0 means that they are not. For instance, if we test whether the output data generated by the ABM have the same fat-tailed characteristic as the real data, the calibration measure takes the value of 1 if they do and 0 otherwise. In the real-value case, the statistical characteristics of the ABM output data are calculated quantitatively, and the result is used as the calibration measure. For example, we can assess the kurtosis or the tail index of the output data. We expect to find the parameter vectors whose calibration measures meet certain specific conditions, and these conditions are called "calibration criteria". For example, if the modeller wants to test whether the output data have a non-normal distribution with negative skewness and leptokurtosis, they can use both the negative skewness and the leptokurtosis as calibration criteria. For the real-value case, the minimization of the loss function can be used as the calibration criterion.
A parameter vector in the parameter space whose response accords with the calibration criteria is characterized as a "positive calibration" and labelled as positive; otherwise, it receives a negative label. We aim to find as many positively labelled points as possible in the parameter space and use them to train the surrogate model.
It should be noted that positively calibrated parameter vectors may be located in multiple disconnected regions of the parameter space rather than forming a smooth topology. The approach we propose avoids making any prior assumptions about the response surface of the output, which makes it applicable to most real applications.
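The binary calibration measure described above can be sketched in a few lines. This is a minimal illustration, assuming the calibration criterion "negative skewness and leptokurtosis" from the example; the function names and the two return series are hypothetical and are not taken from the study.

```python
def sample_moments(returns):
    """Return (skewness, excess kurtosis) of a sequence of returns."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


def binary_calibration_label(returns):
    """Calibration criterion (assumed): negative skewness AND leptokurtosis.

    Returns 1 (positive label) if both conditions hold, 0 otherwise.
    """
    skewness, excess_kurtosis = sample_moments(returns)
    return 1 if (skewness < 0 and excess_kurtosis > 0) else 0


# Hypothetical simulated return series (illustrative numbers only).
crash_prone = [0.01, 0.02, 0.01, 0.0, 0.01, -0.15, 0.02, 0.01, 0.0, 0.01]
symmetric = [-0.02, -0.01, 0.0, 0.01, 0.02]

label_crash_prone = binary_calibration_label(crash_prone)
label_symmetric = binary_calibration_label(symmetric)
```

In the real-value case, the same moments (or any other statistic) would be returned directly as the calibration measure instead of being thresholded.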

CatBoost.
CatBoost is a supervised machine learning algorithm based on gradient boosting over decision trees, with built-in handling of categorical features. Boosting refers to the ensemble learning method that sequentially builds a large number of models. In classical gradient boosting decision trees (GBDT), each iteration uses the same data set to obtain the gradient of the loss function for the current model. However, this causes the model to suffer from overfitting due to pointwise gradient estimation bias. CatBoost uses the ordered boosting method to modify the gradient estimation of the classical algorithm and obtain an unbiased estimate of the gradient, which reduces the influence of the gradient estimation bias. In this way, the generalization ability of the model is improved. The algorithm flows of GBDT and ordered boosting are illustrated in Algorithms 1 and 2.
Here, $F^i$ is the model that is built using the first $i$ trees, and $g^i(X_k, Y_k)$ is the gradient value of the $k$-th training sample. To obtain an unbiased estimate of the gradient $g^i(X_k, Y_k)$ with respect to the model $F^i$, the training of $F^i$ should not use observation $X_k$. If we extend this requirement to all training samples, no points could be used to train $F^i$. To deal with this seemingly unsolvable problem, we consider the following trick. For each observation $X_k$, we train a separate model $M_k$ without using any samples that contain $X_k$, and all $M_k$ have identical tree structures. Then, we use $M_k$ to calculate the gradient for $X_k$ and score the resulting tree. We use the $F_1$ score and the mean square error (MSE) as the loss functions for the binary and real-value cases, respectively. More details will be discussed later.
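The idea behind the trick can be illustrated with a deliberately simplified toy. The sketch below is not CatBoost's implementation: it assumes a constant (mean) model instead of trees, a squared loss (so the negative gradient is the residual), and a single random permutation. It only shows why a residual computed by a model that has never seen the sample's own target is not shrunk towards that target.

```python
import random


def plain_residuals(targets):
    """Residuals of a mean model fitted on ALL samples.

    Every residual is computed by a model that has already seen the
    sample's own target, which biases the estimate.
    """
    mean_all = sum(targets) / len(targets)
    return [y - mean_all for y in targets]


def ordered_residuals(targets, seed=0):
    """Residuals of mean models fitted only on preceding samples.

    Samples are visited in a random permutation; the prediction for a
    sample uses only the targets visited before it (0.0 is an assumed
    prior for the first sample), so its own target is never used.
    """
    order = list(range(len(targets)))
    random.Random(seed).shuffle(order)
    residuals = [0.0] * len(targets)
    running_sum, count = 0.0, 0
    for k in order:
        prediction = running_sum / count if count else 0.0
        residuals[k] = targets[k] - prediction
        running_sum += targets[k]
        count += 1
    return residuals


# Illustrative targets with one outlier in the last position.
targets = [0.0, 0.0, 0.0, 0.0, 10.0]
plain = plain_residuals(targets)
ordered = ordered_residuals(targets)
```

With these targets, the plain residual of the outlier is computed against a mean that the outlier itself pulled upwards, whereas the ordered residual is computed against the other samples only.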

Step-by-Step Implementation.
The procedure that we designed for generating the surrogate model is illustrated in Figure 1. Three initial settings should be determined before running the program.
(1) Select the Surrogate Algorithm. The modeller must select a learning algorithm to act as the surrogate for the original agent-based model. We choose CatBoost as our surrogate algorithm for two reasons: it has remarkably high computational efficiency, and it does not require many assumptions about the parameter space.
(2) Select the Fast Sampling Procedure. The modeller must determine the sampling method used to draw samples from the parameter space for the training of the surrogate model.
(3) Select the Performance Measure of the Surrogate Model. The modeller must give a quantitative measure for assessing the performance of the surrogate.
The surrogate program is implemented step by step as follows.
Step 1. Construct a relatively large pool of parameter combinations as a substitute for the parameter space using a certain sampling routine. To ensure that the parameter pool covers all possible regions of the parameter space without knowing the topology, we use quasi-random Sobol sampling, which is designed to fill the sample space evenly, even with a small number of points, and is efficient to implement.
Step 2. Randomly draw a small subset from the parameter pool and run the ABM. Each parameter vector is identified as positive or negative according to the calibration measure and the calibration criterion.
Step 3. Generate the surrogate model by training the CatBoost algorithm on the labelled points. This model is an ensemble of simple decision trees, which can provide better prediction performance than other learning algorithms.
Step 4. Predict and label all of the parameter combinations in the pool according to the results of the surrogate model.
Step 5. Draw a small subset of the still-unlabelled points from the pool and run the ABM on them, as in Step 2. The points are labelled and added to the training set to construct a new set of training samples. The newly added parameter vectors are randomly selected from the parameter combinations that the surrogate model predicts to be positive. In this way, the algorithm gradually increases the number of "true" positive labels included and excludes the "false" positive labels. If there is no positive point in the present round, an uncertainty sampling method is used to add new data points to the parameter pool. Based on the entropy of the existing label distribution, the uncertainty sampling method increases the sampling frequency in the regions of the parameter space that the surrogate model has difficulty predicting correctly.
Repeat Steps 3-5 until the budget is reached.
The proposed method can intelligently pick meaningful parameter combinations over multiple rounds of sampling, which continuously improves the sampling performance and the calibration accuracy at relatively low computational costs. Compared to other iterative Monte Carlo sampling methods, the advantages of our approach are mainly as follows. First, it does not need to make any assumptions about the parameter distribution. Second, it does not require a prior assumption regarding the approximate distribution of the model's response. Third, the approach does not require the sampled points to satisfy a Markov chain distribution.
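The iterative procedure in Steps 1 to 5 can be sketched as follows. This is a self-contained toy under loudly stated assumptions, not the implementation used in this paper: a two-dimensional Halton sequence stands in for the Sobol pool, a k-nearest-neighbour vote stands in for the CatBoost surrogate, and `toy_abm_label` is a hypothetical replacement for running the ABM and applying the calibration criterion.

```python
import math
import random


def halton(index, base):
    """Radical-inverse (van der Corput) value of `index` in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def toy_abm_label(point):
    """Hypothetical stand-in for 'run the ABM and apply the criterion':
    positive (1) inside a small disc of the unit square, else negative (0)."""
    return 1 if math.dist(point, (0.7, 0.3)) < 0.2 else 0


def knn_positive_share(point, training, k=5):
    """Surrogate stand-in: share of positives among the k nearest labelled points."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def calibrate(pool_size=400, initial=20, batch=10, budget=100, seed=0):
    rng = random.Random(seed)
    # Step 1: quasi-random pool (Halton bases 2 and 3; Sobol in the paper).
    pool = [(halton(i, 2), halton(i, 3)) for i in range(1, pool_size + 1)]
    # Step 2: label a small random subset by running the "ABM".
    unlabelled = set(range(pool_size))
    labelled = {}
    for i in rng.sample(sorted(unlabelled), initial):
        labelled[i] = toy_abm_label(pool[i])
        unlabelled.discard(i)
    while len(labelled) < budget:
        # Steps 3-4: fit the surrogate and predict the unlabelled pool points.
        training = [(pool[i], y) for i, y in labelled.items()]
        share = {i: knn_positive_share(pool[i], training) for i in sorted(unlabelled)}
        predicted_positive = [i for i in share if share[i] > 0.5]
        # Step 5: evaluate predicted positives; if there are none, fall back to
        # the most uncertain points (share closest to 0.5 = highest entropy).
        n_new = min(batch, budget - len(labelled))
        if predicted_positive:
            chosen = rng.sample(predicted_positive, min(n_new, len(predicted_positive)))
        else:
            chosen = sorted(share, key=lambda i: abs(share[i] - 0.5))[:n_new]
        for i in chosen:
            labelled[i] = toy_abm_label(pool[i])
            unlabelled.discard(i)
    return pool, labelled


pool, labelled = calibrate()
found = sum(labelled.values())                   # positives located within the budget
total = sum(toy_abm_label(p) for p in pool)      # all positives (known only in a toy)
```

Comparing `found` with `total` shows how many of the pool's positive points were located within the evaluation budget; with a real ABM, `total` is unknown, which is why an out-of-sample measure is needed.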

Model Evaluation.
We have to evaluate the surrogate model's performance once it is generated. In the case of a binary output, we use the $F_1$ score:
$$F_1 = \frac{2\,TP}{2\,TP + FP + FN},$$
where $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, respectively, and $F_1 \in (0, 1)$. The larger $F_1$ is, the better the surrogate model works. In the real-value setting, we use the mean square error as the loss function:
$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2,$$
where $y_i$ is the value returned by the agent-based model, $\hat{y}_i$ is the predicted value of the surrogate, and $N$ is the number of data points in the learning set, which is the number of parameter combinations that are evaluated by the agent-based model. Thus, the pursuit of an optimal surrogate model is equal to minimizing the MSE.
Finally, we use the "True Positive Rate" (TPR), which is calculated using an out-of-sample data set, to measure the surrogate's capacity to find positive labels in both settings (Fawcett, 2006). It is calculated as follows:
$$TPR = \frac{\text{Number of correctly predicted positives}}{\text{Number of positives in the pool}}.$$
The proposed method also provides an intuitive way to assess the importance of the parameters for the output by counting the number of splits on each parameter in the CART tree generation process. Since each tree is built based on the optimal segmentation of the probable values of the parameter vectors and pays increasingly more attention to the difficult-to-forecast samples, we can rank the model's parameters in terms of their importance and sensitivity to the output by counting the number of splits.
Figure 1: The surrogate model generation procedure. Flowchart: choose the initialization settings; draw a pool of points from the parameter space using a certain sampling routine; select a small subset from the pool, run the ABM, and classify the results; generate the surrogate model using the learning algorithm; predict the labels of the points in the pool using the surrogate model; select a sample of unlabelled points from the pool, run the ABM, and label the points; repeat until the budget is reached.

Application
The heterogeneous agent models (HAMs) that were employed by Brock and Hommes [25] model the asset pricing mechanism by assessing the interaction among agents with heterogeneous beliefs and strategies. HAMs are powerful at reproducing the stylized facts of financial data series, such as volatility clustering, fat tails, long memory, and the leverage effect. The model is also useful for explaining financial market abnormalities such as bubbles and crashes. Recent evidence suggests that HAMs provide empirical results that outperform conventional capital asset pricing models or arbitrage models, which makes this theory one of the representative theories of behavioural finance. We choose heterogeneous agent models as an ideal test case for two reasons: they have been widely studied by financial researchers, and they offer a suitable number of parameters [26][27][28]. This section first briefly describes the heterogeneous trader pricing model, then uses the model to test the performance of our method, and finally reports the evaluation and comparison results.

The Heterogeneous Agent Models.
Consider agents who are engaged in trading activities in a market consisting of a risky asset and a risk-free asset. We denote $p_t$ as the price of the risky asset and $y_t$ as the uncertain dividend. The wealth of the agent at time $t + 1$ is expressed as
$$W_{t+1} = R W_t + (p_{t+1} + y_{t+1} - R p_t) z_t,$$
where $z_t$ is the amount of the risky asset that was bought by the trader at time $t$, and $R = (1 + r)$ is the gross return of the risk-free asset. Suppose that all the traders are rational traders that seek to maximize a mean-variance objective using heterogeneous expectations and trading strategies. The conditional expectation and conditional variance at time $t$ are denoted as $E_t$ and $V_t$, respectively, and $E_{h,t}$ and $V_{h,t}$ denote the corresponding beliefs of type $h$ traders.
Then, the optimal demand $z_{h,t}$ for type $h$ traders is equivalent to solving the following problem:
$$\max_{z_t} \left\{ E_{h,t}[W_{t+1}] - \frac{a}{2} V_{h,t}[W_{t+1}] \right\}.$$
This implies that
$$z_{h,t} = \frac{E_{h,t}[p_{t+1} + y_{t+1} - R p_t]}{a \sigma^2},$$
where $a$ measures the risk aversion of traders, and $\sigma^2$ indicates the conditional volatility, which is the same for all types of traders and remains constant over time. Under the condition that the quantity supplied of the risky asset and the types of traders remain unchanged, the market equilibrium state is calculated as follows:
$$R p_t = \sum_{h} n_{h,t} E_{h,t}[p_{t+1} + y_{t+1}],$$
where $n_{h,t}$ denotes the fraction of the risky asset position that is held by type $h$ agents at time $t$. If all of the agents are homogeneous traders with rational expectations and the market contains complete information, the no-arbitrage market equilibrium condition can be written as follows:
$$R p_t^{*} = E_t[p_{t+1}^{*} + y_{t+1}], \tag{12}$$
where the expectation $E_t$ is determined by all historical prices and dividends up to time $t$. We call $p_t^{*}$ the fundamental price of the asset. Equation (12) has a unique solution when dividend payments are independently distributed with a constant mean $\bar{y}$. In this case, the fundamental price is equal to $p^{*} = \bar{y}/(R - 1)$. The deviation of the real price from the fundamental price can be expressed as
$$x_t = p_t - p^{*}.$$
Since different types of traders have heterogeneous expectations regarding stock prices and dividend payments, we can express the expectation of the type $h$ traders for time $t + 1$ as follows:
$$E_{h,t}[p_{t+1} + y_{t+1}] = E_t[p_{t+1}^{*} + y_{t+1}] + f_h(x_{t-1}, \ldots, x_{t-L}),$$
where $L$ is the number of lags, and $f_h(\cdot)$ is a function that represents traders' predictions of future prices. The simple linear expression of the function $f_h(\cdot)$ that was proposed by Brock and Hommes (1998) is the following:
$$f_{h,t} = g_h x_{t-1} + b_h,$$
where $g_h$ and $b_h$ are the trend coefficient and intercept term, respectively. The agent is defined as a positive feedback trader if $g_h > 0$, and otherwise as a negative feedback trader. When $g_h = b_h = 0$, the trader adopts a fundamental trading strategy that believes that the price will converge on the fundamental value.
Following existing studies, we consider that a typical market consists of a fundamentalist and a chartist, and $f_{1,t}$ and $f_{2,t}$ are their respective trend functions. The market price can be written as follows:
$$R x_t = n_{1,t} f_{1,t} + n_{2,t} f_{2,t}.$$
To maximize profits, the traders choose and shift between the two strategies, which is equal to maximizing the following objective function:
$$U_{h,t} = (x_t - R x_{t-1}) \frac{f_{h,t-1} - R x_{t-1}}{a \sigma^2} + \omega U_{h,t-1} - C_h,$$
where $C_h$ is the transaction cost, and $\omega \in [0, 1]$ is the impact weight of past profits. The probability of choosing strategy $h$ for a trader is given by the following:
$$n_{h,t} = \frac{\exp(\beta U_{h,t-1})}{\sum_{k} \exp(\beta U_{k,t-1})}. \tag{17}$$
Equation (17) is also known as the market fraction model, where $\beta \in [0, +\infty)$ is the intensity of choice. A larger $\beta$ suggests more frequent shifting between the two strategies. In this way, the model captures the traders' bounded rationality and the effect of their behaviour on the price.

Model Setting.
The model has 12 parameters in total that need to be estimated, as shown in Table 1. We set the parameter space within the ranges shown in Table 1 according to the existing related work. It can be further expanded or reduced based on the modeller's needs.
We choose the daily data of the Chinese Shanghai and Shenzhen 300 Index (CSI 300) as the real sample data for the calibration. The sample interval is from Jan. 4, 2017 to Dec. 31, 2018, and it contains 412 observations in total, as shown in Figure 2. The statistical characteristics of the sample are reported in Table 2.
It can be seen from Figure 2 that the sample data have a high peak, are fat tailed, and are right-skewed. Table 2 confirms these observations, and the data series exhibits a significant ARCH effect.
In the binary case, we use the two-sample Kolmogorov-Smirnov test to check whether the distribution of the model's output is consistent with the real data as follows:
$$D_{R,S} = \sup_{x} \left| F_R(x) - F_S(x) \right|,$$
where $x$ denotes the log return, and $F_R$ and $F_S$ are the distribution functions of the real sample data and the simulation data, respectively.
To provide a direct comparison, we use the $p$-value of the Kolmogorov-Smirnov test statistic, $P(D > D_{R,S})$, as the calibration measure when we analyse the real-value case. The higher the $p$-value is, the better the fit.
The surrogate model is trained 500 times using different numbers of parameter combinations ranging from 250 to 2500, with 250 samples added in each round. A well-distributed out-of-sample data series is necessary and crucial for evaluating the performance of the model. Following the recent literature, we set a relatively large number of 100000 unlabelled parameter combinations as the evaluation set of the model.

Results
The importance of the parameters is evaluated and ranked according to the number of splits on each parameter in the decision tree construction process, as shown in Figure 3. The results indicate that the trend coefficients $g_1$ and $g_2$ have the most significant impacts on the output, followed by the intensity of choice $\beta$. The intercept terms $b_1$ and $b_2$ also have a certain impact on the fit of the model. The risk aversion coefficient, the conditional volatility, and the wealth regression coefficient are relatively less important for the output. The surrogate model is generated using the procedure that is described in the Surrogate Model section, and the simulation results are shown in Figure 4.
In the case of a binary output, the $F_1$ score increases as the amount of training sample data increases. The $F_1$ score reaches its maximum of approximately 0.8 when 2500 training samples are used, and the TPR is approximately 0.75. Since the TPR cannot be greater than 1, we consider the results satisfactory.
The surrogate model provides superior results in the real-value setting. Even when the number of training samples is low (500), a relatively high TPR (approximately 70%) can be obtained. When 2,500 training points are employed, it can reach 95%. This can be explained by the fact that learning over a continuous variable retains more information about the original system, which leads to better performance compared to the binary case.
Finally, we compare the time costs by running the procedure 100 times and taking the average (in seconds) for each subroutine. The subroutines include training the surrogate model, predicting the labels of the parameter combinations using the surrogate model, and labelling the parameter combinations using the ABM. The results show that the time costs of the surrogate method are about one five-hundredth of those of the original ABM, which is a remarkable efficiency improvement for parameter calibration.
It should be pointed out that the training sample size in the surrogate model is crucial to the performance of our model. However, the determination of the sample size still lacks an objective basis and mainly relies on rules of thumb. As a practical guide, we should ensure that at least one parameter combination that satisfies the positive calibration criterion is contained in the pool, and then a small number of parameter points can be continuously added to the pool during each computing round. When the TPR curve tends to be flat or even tends to decline, the relative size of the training sample can be regarded as a reference for the settings.

Conclusion
The agent-based model has been extensively utilized for complex systems such as those in economics, demography, and management science due to its high degree of flexibility and freedom. However, there is still a lack of effective parameter calibration methods due to computational restrictions. This paper proposes a surrogate model approach for exploring and calibrating ABM parameters by combining supervised machine learning with intelligent sampling. By using the CatBoost machine learning algorithm, a surrogate model of the original ABM is learned, which allows the modeller to explore the parameter space and locate the regions that have significant impacts on the output. Generating the surrogate model only requires a small training sample, which can significantly reduce the computational costs compared to other similar approaches.
The results that are obtained from the application to the heterogeneous asset pricing model suggest that our approach performs well with respect to both accuracy and costs. Another advantage of our approach is that it does not require any prior assumptions about the distribution of the parameters or the topology of the output space, which makes it applicable to a wider range of problems.
The approach that we propose is a powerful tool for addressing the "dimensional disaster" problem that is caused by the parametric explosion in agent-based models. In future research, we plan to use it in more complex systems with more numerous parameter combinations. We also plan to establish an ABM toolbox that contains surrogate modelling, calibration measures, and calibration criteria for general use.