Option Pricing Model Combining Ensemble Learning Methods and Network Learning Structure

Data-driven option pricing is a challenging task that has attracted much attention recently. Two types of methods are widely used: neural networks and ensemble learning. Option pricing models based on neural networks are highly complex and introduce a large number of hyperparameters during training, which makes the model difficult to tune; they also need a lot of training data. Option pricing models based on ensemble learning are not ideal for data feature extraction, because each step of an ensemble learning method mainly aims to reduce the final residual. Therefore, this paper adopts a learning framework that embeds modular ensemble learning methods into a network learning structure and proposes an option pricing model based on deep ensemble learning. The model is mainly composed of two parts: (1) features reorganization based on random forest, which calculates the importance of features and combines it with the original data as training input; and (2) a multilayer ensemble training structure, which is based on the network learning structure, embeds two ensemble learning methods as network modules, and uses a stop algorithm to determine the number of layers automatically. This enables the model to retain the effect of data feature extraction and adapt to small and medium data sets without generating many hyperparameters. Moreover, so that the model fully absorbs the advantages of the two ensemble learning methods, we adopt cross-training for data. The experimental results show that, compared with the current optimal method, the proposed model improves prediction performance by 36% in root mean square error (RMSE), which demonstrates its superiority quantitatively.


Introduction
Since options officially appeared in the market as a stock derivative, they have been a hot topic in financial markets. As stock derivatives, options carry risks similar to those of stocks, and trading them may also lead to huge losses.
Thus, how to reasonably avoid this risk was a problem that plagued the early option market. To solve this problem, option pricing models were proposed.
In 1973, Black and Scholes [1] first proposed an option pricing model, the Black-Scholes model. The Black-Scholes model is the first parametric model proposed to predict option prices, and it relies on strict economic assumptions. Therefore, the model cannot perfectly fit the changing financial market, and there is a certain error between the predicted value and the real market value. Scholars have therefore carried out further research by relaxing the strict economic assumptions of the model [2][3][4]. Parametric models attempt to find specific laws in the option data obtained from the market and to convert these laws into economic and mathematical formulas for predicting future option prices. Information that can be obtained directly from the market, such as stock price, execution price, maturity time, and volatility, is taken as the input of the formula, and the forecast option price is its output [5]. However, in a real market that changes all the time, the sensitivity of parametric models to market changes has not met expectations; that is, their prediction error relative to actual market data is not as small as expected [6]. To bring the predictions closer to real market data, data-driven models were introduced [7]. In contrast to parametric models, data-driven models are generally built with machine learning methods: they find relationships in a large amount of historical data and use these relationships to predict the option price.

Motivation
The advantage of the data-driven model is that, as long as there is enough training data, the regression model trained by a machine learning method can generalize well to situations outside the training samples [5]. Therefore, the data-driven model can predict option prices well and can even surpass formulas derived from economic principles. The earliest data-driven model introduced into option pricing is the neural network. Starting from the basic neural network, as research deepened, more complex and more effective methods such as deep neural networks (DNN), artificial neural networks (ANN), and hybrid neural networks were also applied to option pricing. The advantage of these neural network methods is that they extract features from the original training data well, yet neural networks usually require a lot of data to train. At the same time, a neural network is generally a complex model with many hidden layers and a large number of hyperparameters, which makes parameter tuning difficult [8].
Therefore, in recent years, research on option pricing models has no longer been limited to neural networks and their various optimizations and improvements. Traditional ensemble learning methods have been introduced into the field of option pricing. In essence, ensemble learning combines the results of multiple base learners to make decisions [9]. Compared with neural networks, ensemble learning can be trained on small data sets. Ensemble learning methods also do not have a large number of hyperparameters, so parameter tuning is more convenient. Unfortunately, ensemble learning is not as good as neural networks at extracting features from the original training data. If the advantages of the two methods can be combined so that each compensates for the other's shortcomings, the prediction performance of the model can be significantly improved. The authors of [8] argue that the learning advantage of neural networks lies in their layer-by-layer processing of the original data features, and on this basis the deep forest method is proposed.
Deep forest adopts a cascade hierarchical structure like that of a neural network, and it uses random forests and completely random forests as the processing units of each layer. This not only retains the layer-by-layer processing of neural networks but also exploits the advantages of random forests, which have fewer hyperparameters and can be trained on small data sets. Deep forest has been applied in many fields, such as medical imaging, image classification, and multilabel learning, and has achieved good results [10][11][12]. Inspired by deep forest, we introduce this idea to the regression problem of option pricing and propose a new option pricing model based on deep ensemble learning. The model is mainly composed of two parts: (1) features reorganization based on random forest, which calculates the importance of features and combines it with the original data as training input; and (2) a multilayer ensemble training structure, which is based on the network learning structure, embeds two ensemble learning methods as network modules, and uses a stop algorithm to determine the number of layers automatically. This structure produces only a small number of hyperparameters, adapts to small and medium data sets, and extracts data features well. For the input and output of data, the output of each layer is spliced with the original input data to form the input of the next layer. In addition, so that the model fully absorbs the advantages of the two ensemble learning methods, and influenced by the idea of mutation in genetic algorithms, we adopt cross-training for data, so that the data are trained by different methods in two adjacent layers. Our contributions are summarized as follows:

Contribution.
(1) A new option pricing model based on deep ensemble learning is proposed. This model introduces the idea of deep ensemble learning into the field of option pricing, adopts a learning framework that embeds modular ensemble learning into the network learning structure, and encompasses two subprocesses, namely, importance extraction and multilayer ensemble.
(2) The multilayer ensemble structure is based on the network learning structure, embeds two ensemble learning methods as network modules, and uses a stop algorithm to determine the number of layers automatically. It produces only a small number of hyperparameters, adapts to small and medium data sets, and extracts data features well. The structure also adopts cross-training for data so that the model fully absorbs the advantages of the two ensemble learning methods.
(3) A novel features reorganization module is designed, which calculates the importance of features from the results of processing the original data with random forests. The feature importance matrix is taken as the weight matrix and multiplied by the original data to form the training data.

Organization.
The rest of the paper is organized as follows: Section 2 reviews the background of parametric and nonparametric models in the option pricing field. Section 3 introduces the principles of the proposed model. Section 4 discusses the experimental results of the proposed model and of classical parametric and nonparametric models. Finally, Section 5 concludes the paper.

Parametric Model.
The earliest option pricing model is the Black-Scholes (B-S) model, proposed by Black and Scholes [1] in 1973. The B-S model contains some unrealistic assumptions; for example, its assumptions about stock price returns and about market-implied volatility are not in line with the actual market [13]. Therefore, the B-S model cannot perfectly fit the actual market, and there is a certain error between the predicted value and the real value. Scholars have therefore done further research by relaxing the assumptions of the B-S model.
Heston [14] assumes that volatility follows a random diffusion process and proposes the Heston model. Merton [15] argues that the option price should be the sum of a continuous process and a traditional discrete jump process; a Poisson-distributed jump process is therefore added to the model, and the jump-diffusion model is proposed. Kou [16] found that the prices of some options differ from the traditional discrete jump process: the probabilities of upward and downward jumps differ, or the jumps show multilevel changes. Therefore, a double-exponential jump-diffusion model is proposed. Bollerslev [17] treats the volatility of options as a discrete stochastic process, unlike the B-S assumption of constant volatility, and proposes the GARCH model. Other models also break the assumption of constant volatility, such as the stochastic volatility model and the stochastic volatility jump model [18]. Some scholars have broken the assumption that the risk-free interest rate is constant: they either replace the risk-free interest rate with a variable short-term interest rate and propose a stochastic interest rate model [19,20], or replace the constant risk-free interest rate with a weighted average of multiple interest rates and propose an "interest rate affine" model (Duffee [21]). These parametric models improve the prediction of option prices to some extent, but they still look for specific laws in the market and try to convert them into economic and mathematical formulas, so they remain subject to economic and statistical assumptions. Unlike traditional parametric models, our proposed model is data-driven, which avoids some assumptions that affect the effectiveness of the model.

Data-Driven Model.
Computer scientists have also used neural networks to solve the problem of option pricing for a long time [7,22,23]. Option pricing can be seen as a standard regression problem, and many methods in machine learning can be applied to the field of option pricing.
Over the years, many researchers have studied how to better apply neural networks (NN) to option pricing [24][25][26][27][28]. Bennell [29] used an artificial neural network (ANN) to predict the option price; Andreou [30], in addition to using an ANN alone, tried to build a new model by combining an ANN with a parametric model; Lajbcygier and Connor [31] proposed a hybrid neural network for trading by applying a guided method; Culkin [32] used deep neural networks to construct an option pricing model. These network models have drawn more attention to the development of big data, which is closely related to many other aspects of people's lives; for example, big data analysis in health care can better protect health [33]. A standard neural network consists of many simple connected processors called neurons, each of which produces a sequence of real-valued activations. Input neurons are activated by sensors perceiving the environment, and other neurons are activated through weighted connections from previously active neurons.
These neural networks are designed to mathematically simulate how the human brain works: they receive a wide range of stimuli and then parse them by learning layers of neurons that associate input with output [34]. Deep learning has a very wide range of applications, and auxiliary diagnosis is a recent research hotspot; for example, some scholars have constructed a detection model for sentiment analysis of mental disorders based on attention-based deep learning and fuzzy classification [35]. Neural network methods extract data features well because they process the original data layer by layer. However, because neural networks are black boxes, the processing of each layer is invisible; they involve a large number of hyperparameters, and tuning the model is very difficult. In addition, because neural networks require a large amount of training data, their performance on small data sets may not be ideal. Therefore, traditional ensemble learning methods have been introduced into the field of option pricing. Ensemble learning methods likewise have a wide range of applications, such as a diabetic retinopathy classification model based on ensemble learning [36] and a software cost analysis model based on multiobjective optimization [37]. Some ensemble learning methods have also been applied successfully in finance-related fields [38]. Codru [39] constructed multiple option pricing models using ensemble learning methods such as random forests, XGBoost, and LightGBM and conducted prediction experiments on actual market data. Ensemble learning is a general term for methods that combine multiple base learners to make decisions, and it is usually used for supervised machine learning tasks. A base learner is an algorithm that takes a set of labelled examples as input and generates a model (such as a classifier or a regressor) that generalizes these examples.
The main premise of ensemble learning is that, by combining multiple models, the error of a single base learner is likely to be compensated by the other base learners, so the overall prediction performance of the ensemble will be better than that of a single base learner [40]. Each base learner can be trained on small data sets and does not need a large number of hyperparameters. Therefore, ensemble learning has the advantages of fewer hyperparameters and adaptability to small data sets. Despite these advantages, ensemble learning is inferior to neural networks in data feature extraction.
Our proposed model processes data layer by layer to preserve the effect of data feature extraction, and each layer is a set of ensemble learning algorithms, that is, an ensemble of ensembles. To maintain the diversity of the ensemble, we use two ensemble learning methods at each level; as is well known, diversity is the key to ensemble construction [8]. For the specific methods, based on the actual test results of various ensemble learning algorithms in the field of option pricing [39], we choose the two with the best performance, namely, XGBoost [41] and LightGBM [42]. This choice retains the advantages of ensemble learning (fewer hyperparameters and adaptability to small data sets) and obtains better results than a single neural network or a single ensemble learning method.
Mathematical Problems in Engineering

Proposed Method.
As presented in Section 2.2, recent studies show promising results in option pricing using neural network methods and ensemble learning methods. Thus, this paper aims to retain the advantage of neural networks in feature extraction while also keeping the advantages of ensemble learning, namely fewer hyperparameters and adaptability to small data sets. Our methodology is presented in Figure 1. In this section, we first introduce the overall architecture of our proposed framework and then discuss the details of its two main modules: (1) the features reorganization, which obtains the feature weights, and (2) the multilayer ensemble structure, which is used to train on the data.

Overall Architecture.
Input data are first prepared; they consist of stock prices, execution prices, maturity times, and implied volatility. The original input data are then processed by random forests, which yields the output vector and the trained forest model. Based on these, the importance of features is calculated and the weight matrix of features is obtained. In the multilayer ensemble structure, each layer receives the feature information processed by the previous layer, and the processing results of the layer are spliced with the original input vector and passed to the next layer. The last layer is the output layer: its input is no longer spliced with the original data; instead, the average of the prediction results obtained by the processing modules of the preceding layer is output as the final prediction. The calculation process is as follows:

$$y = \frac{1}{n}\sum_{i=1}^{n} y_i, \tag{1}$$

where $y$ represents the final prediction, $y_1, y_2, \ldots, y_n$ represent the output values obtained by the processing modules in each layer, and $n$ represents the number of processing modules in each layer.

Features Reorganization.
We obtain the weight matrix of features through features reorganization and combine it with the original data as the training input of the model. First, the original input data are processed by a random forest, which yields the output vector and the trained forest. Then, based on these, the importance of each feature is calculated and the weight matrix of features is obtained. The weight matrix is multiplied by the original input features to form the input of the multilayer ensemble training. The importance of features is calculated as follows:

$$n_k = w_k G_k - w_{\mathrm{left}} G_{\mathrm{left}} - w_{\mathrm{right}} G_{\mathrm{right}}, \tag{2}$$

where $n_k$ is the importance of node $k$; $w_k$, $w_{\mathrm{left}}$, and $w_{\mathrm{right}}$ are the ratios of the number of training samples in the node and in its left and right child nodes, respectively, to the total number of training samples; and $G_k$, $G_{\mathrm{left}}$, and $G_{\mathrm{right}}$ are the Gini impurities of the node and its left and right child nodes, respectively.
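The node-importance calculation and the reweighting of the input can be illustrated with a minimal sketch. It assumes that the node importance is the weighted impurity decrease $w_k G_k - w_{\mathrm{left}} G_{\mathrm{left}} - w_{\mathrm{right}} G_{\mathrm{right}}$, that node importances are summed per split feature, and that the result is normalized to sum to one; the normalization, the flat `nodes` list, and all function names are illustrative assumptions and are not taken from the paper.

```python
def node_importance(w_k, g_k, w_left, g_left, w_right, g_right):
    """Weighted impurity decrease of one node (assumed form of n_k)."""
    return w_k * g_k - w_left * g_left - w_right * g_right


def feature_importances(nodes, n_features):
    """Sum node importances per split feature, then normalize to sum to 1.

    `nodes` is a hypothetical flat list of dicts, one per internal tree node.
    The normalization step is an assumption, not stated in the text.
    """
    totals = [0.0] * n_features
    for nd in nodes:
        totals[nd["feature"]] += node_importance(
            nd["w"], nd["g"], nd["w_left"], nd["g_left"], nd["w_right"], nd["g_right"]
        )
    s = sum(totals)
    return [t / s for t in totals] if s > 0 else totals


def reweight(rows, weights):
    """Multiply every original feature column by its importance weight."""
    return [[x * w for x, w in zip(row, weights)] for row in rows]


# Toy tree with two internal nodes, one split per feature.
nodes = [
    {"feature": 0, "w": 1.0, "g": 0.5, "w_left": 0.5, "g_left": 0.2,
     "w_right": 0.5, "g_right": 0.4},
    {"feature": 1, "w": 0.5, "g": 0.4, "w_left": 0.25, "g_left": 0.0,
     "w_right": 0.25, "g_right": 0.0},
]
weights = feature_importances(nodes, n_features=2)
weighted = reweight([[100.0, 0.2]], weights)
```

In this toy case both nodes contribute the same impurity decrease, so the two features receive equal weight before the data enter the multilayer structure.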

Multilayer Ensemble Structure.
In this section, we introduce the multilayer ensemble structure in three parts, namely, the layer-by-layer training strategy, two ensemble learning algorithms, and the cross-training and stop algorithm.

The Layer-by-Layer Training Strategy.
The multilayer structure is based on the hierarchical structure of deep neural networks. According to their position, the layers of a deep neural network can be divided into the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the middle layers are hidden layers, and the last layer is the output layer. The layers are fully connected; that is, every neuron in layer i is connected to every neuron in layer i + 1.
Although a DNN looks complex, each small local model is the same as a perceptron, i.e., a linear relationship $z = \sum_{i=1}^{m} w_i x_i + b$ followed by an activation function $\sigma(z)$.
Since a DNN has many layers and parameters, the bias b and the linear coefficient w need a clear naming convention. Definition of bias b: the bias of the third neuron in the second layer is denoted $b^2_3$, where the superscript 2 is the layer number and the subscript 3 is the index of the neuron to which the bias belongs. Definition of linear coefficient w: the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is denoted $w^3_{24}$, where the superscript 3 is the layer of the receiving neuron and the subscripts 2 and 4 are the indices of the receiving neuron in the third layer and of the sending neuron in the second layer. Note that the input layer has neither w nor b parameters.
Assume that the chosen activation function is $\sigma(z)$ and that the outputs of the hidden layers and the output layer are denoted $a$. For the outputs $a^2_1$, $a^2_2$, $a^2_3$ of the second layer, we have

$$a^2_j = \sigma(z^2_j) = \sigma\Big(\sum_{k} w^2_{jk} x_k + b^2_j\Big), \quad j = 1, 2, 3, \tag{3}$$

where the sum runs over the inputs $x_k$. For the output $a^3_1$ of the third layer, we have

$$a^3_1 = \sigma(z^3_1) = \sigma\Big(\sum_{k=1}^{3} w^3_{1k} a^2_k + b^3_1\Big). \tag{4}$$

Generalizing the above example, assuming that there are $m$ neurons in layer $l - 1$, for the output $a^l_j$ of the $j$-th neuron in layer $l$, we have

$$a^l_j = \sigma(z^l_j) = \sigma\Big(\sum_{k=1}^{m} w^l_{jk} a^{l-1}_k + b^l_j\Big). \tag{5}$$

If $l = 2$, then $a^1_k$ is the input $x_k$. Expressing the outputs one by one in this algebraic form is cumbersome, whereas the matrix form is relatively simple. Assume that there are $m$ neurons in layer $l - 1$ and $n$ neurons in layer $l$. The linear coefficients $w$ of layer $l$ then form an $n \times m$ matrix $W^l$, the biases $b$ of layer $l$ form an $n \times 1$ vector $b^l$, the outputs $a$ of layer $l - 1$ form an $m \times 1$ vector $a^{l-1}$, the linear outputs $z$ of layer $l$ before activation form an $n \times 1$ vector $z^l$, and the outputs $a$ of layer $l$ form an $n \times 1$ vector $a^l$. Then, in matrix form, the output of layer $l$ is

$$a^l = \sigma(z^l) = \sigma(W^l a^{l-1} + b^l). \tag{6}$$
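The matrix form of the layer output can be written as a short sketch. It assumes a sigmoid activation for σ and uses plain Python lists for the n × m weight matrix; the function names `layer_forward` and `forward` and the toy weights are illustrative only.

```python
import math


def sigmoid(z):
    """Assumed activation function sigma(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(W, b, a_prev, activation=sigmoid):
    """Compute a^l = sigma(W^l a^{l-1} + b^l).

    W is an n x m list of rows, b has length n, a_prev has length m.
    """
    z = [sum(w_jk * a_k for w_jk, a_k in zip(row, a_prev)) + b_j
         for row, b_j in zip(W, b)]
    return [activation(z_j) for z_j in z]


def forward(x, layers):
    """Propagate the input x through a list of (W, b) pairs, layer by layer."""
    a = x
    for W, b in layers:
        a = layer_forward(W, b, a)
    return a


layers = [
    # Layer 2: three neurons, two inputs (all-zero weights give a = 0.5 each).
    ([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0]),
    # Layer 3: one neuron, z = 0.5 + 0.5 + 0.5 - 1.5 = 0.
    ([[1.0, 1.0, 1.0]], [-1.5]),
]
out = forward([0.3, -0.7], layers)
```

Each layer only needs the output vector of the previous layer, which is the layer-by-layer property that the proposed structure borrows.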

Two Ensemble Learning Algorithms.
Each level is an ensemble of ensemble learning methods, i.e., an ensemble of ensembles. Here, we include two different types of ensemble learning methods to encourage diversity. The core idea of the first method, f(x), can be expressed in three steps. The first step is to keep adding trees, that is, to keep splitting on features to grow new trees; each added tree learns a new function that fits the residual of the previous prediction. The second step, once training is complete and we have K trees, is to score a sample to be predicted: according to its features, the sample falls into one leaf node in each tree, and each leaf node corresponds to a score. The third step is to add up the scores from all trees, which gives the predicted value for the sample. Using $\hat{y}$ to denote the predicted value, we have

$$\hat{y} = \sum_{k=1}^{K} f_k(x), \quad f_k \in F, \quad F = \{ f(x) = \omega_{q(x)} \}, \tag{7}$$

where $\omega_{q(x)}$ is the score of leaf node $q(x)$, that is, the leaf into which sample $x$ falls, $F$ is the set of all $K$ regression trees, and $f(x)$ represents one of these regression trees.
Our goal is to make the predicted value $\hat{y}_i$ as close as possible to the real value $y_i$ while making the algorithm generalize as well as possible to data outside the training sample. From a mathematical point of view, this is an optimization problem. We take the objective function to be the sum of the loss function and a regularization term:

$$\mathrm{Obj} = \sum_{i} l(y_i - \hat{y}_i) + \sum_{k} \Omega(f_k). \tag{8}$$

The objective function has two parts. The left part, $\sum_i l(y_i - \hat{y}_i)$, is the loss function, which measures the training error, that is, the gap between the predicted score and the real score. The right part, $\sum_k \Omega(f_k)$, is a regularization term that measures the complexity of the model. In formula (8), $\hat{y}_i$ is the output of the entire additive model, and $\Omega(f_k)$ represents the complexity of the $k$-th tree. The smaller the regularization term, the lower the complexity of the trees and the stronger the generalization ability of the model. Specifically,

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^2, \tag{9}$$

where $T$ is the number of leaf nodes, $\gamma$ is the parameter that penalizes the number of leaf nodes so that $T$ stays as small as possible, $\omega$ is the vector of leaf scores, and $\lambda$ controls the magnitude of the leaf scores. Together, $\gamma$ and $\lambda$ keep the prediction error small while preventing overfitting. In the first part of the objective function, $i$ indexes the samples, and $l(y_i - \hat{y}_i)$ represents the prediction error of the $i$-th sample; the smaller this error, the better. As noted above, the first method accumulates the scores of multiple trees to obtain the final prediction score.
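In the iterative implementation, each iteration builds on the existing trees and adds one tree to fit the residual between the prediction of the previous trees and the true value. In each iteration, we need to select an $f$ that minimizes the objective function. The objective function at iteration $t$ of the first method can now be expressed as

$$\mathrm{Obj}^{(t)} = \sum_{i} l\big(y_i - (\hat{y}_i^{(t-1)} + f_t(x_i))\big) + \Omega(f_t), \tag{10}$$

where $\hat{y}_i^{(t-1)}$ is the prediction for sample $i$ after $t - 1$ trees and $f_t$ is the tree added at iteration $t$. Similarly, $l$ in formula (10) represents the loss function, and $\Omega(f_t)$ represents the regularization term.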
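As a small numerical illustration of the regularized objective, the sketch below evaluates a loss plus a tree-complexity penalty for a toy model. It assumes a squared-error loss (the text does not fix the form of l), a penalty of the form γT + ½λ‖ω‖², and it represents each tree only by its list of leaf scores; all names and values are hypothetical.

```python
def omega(leaf_scores, gamma, lam):
    """Complexity of one tree: gamma * T + 0.5 * lam * sum of squared leaf scores."""
    T = len(leaf_scores)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_scores)


def objective(y_true, y_pred, trees, gamma=1.0, lam=1.0):
    """Training loss plus the summed complexity of all trees.

    Squared error is an assumed choice of the loss l.
    """
    loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    reg = sum(omega(leaves, gamma, lam) for leaves in trees)
    return loss + reg


# Two toy trees, described only by their leaf scores.
trees = [[1.0, -1.0], [0.5, 0.5, 0.0]]
obj = objective([1.0, 2.0], [1.5, 2.0], trees, gamma=0.1, lam=2.0)
# loss = 0.25, Omega(tree 1) = 2.2, Omega(tree 2) = 0.8
```

Raising `gamma` makes every extra leaf more expensive, while raising `lam` shrinks the leaf scores; both push the model toward simpler trees.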
The second method, g(x), was proposed to remedy some defects of traditional boosting. A traditional boosting algorithm needs to scan all samples for each feature to select the best split point, which is time-consuming, so traditional boosting can no longer meet current needs in efficiency and scalability. To handle today's high-dimensional, massive sample data, g(x) uses two techniques. The first is GOSS (gradient-based one-side sampling), which computes the gradient statistics on a sample of the data points instead of on all of them. The second is EFB (exclusive feature bundling), which does not scan all features when searching for the best split point but bundles some features together to reduce the feature dimension before searching, which greatly reduces the cost of the search. These two techniques reduce the time complexity of processing samples without sacrificing accuracy.
GOSS method: when information gain is computed, sample points with large gradients play the major role; that is, they contribute more to the information gain. The main idea of GOSS is therefore to keep the sample points with large gradients when downsampling and to randomly sample, in proportion, from the remaining points with small gradients while ignoring the rest. This saves processing time while preserving the accuracy of the information gain estimate as much as possible.
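The sampling step of GOSS can be sketched as follows. The function name `goss_sample` and the rates `top_rate` and `other_rate` are illustrative; the amplification factor (1 − a)/b applied to the sampled small-gradient points follows the original LightGBM paper and is not stated in the text above.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) of a gradient-based one-side subsample.

    Keeps the top `top_rate` fraction of points by |gradient| and draws a
    random `other_rate` fraction (of all points) from the remaining ones.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top, rest = order[:n_top], order[n_top:]

    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))

    # Small-gradient points are up-weighted so that the gradient statistics
    # of the subsample stay close to those of the full data.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return top + sampled, weights


grads = [0.1, -5.0, 0.2, 3.0, -0.05, 0.3, 0.01, -0.4, 0.15, 4.0]
idx, w = goss_sample(grads, top_rate=0.3, other_rate=0.2)
```

With ten points, the three largest-gradient points are always kept and two of the remaining seven are drawn at random, so only half of the data is used to evaluate candidate splits.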
EFB method: traditional boosting performs not only data sampling but also feature sampling, mainly to further reduce the training time of the model. The second method also performs feature sampling, but differently from the traditional approach: it bundles mutually exclusive features to reduce the feature dimension, so that feature sampling costs less. The high-dimensional data in practical applications are usually sparse, which makes it possible to reduce the number of effective features with an almost lossless method. In a sparse feature space in particular, many features are mutually exclusive, so they can be safely bundled into a new feature to reduce the feature dimension. The bundling of mutually exclusive features in g(x) relies on the histogram algorithm. The basic idea of the histogram algorithm is to discretize continuous feature values into k integers and construct a histogram with k bins. While traversing the data once, the required statistics are accumulated in the histogram, indexed by the discretized values; the optimal split point is then found by traversing the discrete values of the histogram.
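The histogram idea can be illustrated with a minimal sketch: feature values are discretized into k equal-width bins, gradient sums and counts are accumulated in one pass, and the split is chosen by scanning bin boundaries only. Equal-width binning and the split score G_L²/n_L + G_R²/n_R are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def build_histogram(values, gradients, k):
    """Discretize one feature into k equal-width bins and accumulate statistics."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant feature
    grad_sum = [0.0] * k
    count = [0] * k
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), k - 1)  # the maximum falls in the last bin
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count


def best_split_bin(grad_sum, count):
    """Scan the k - 1 bin boundaries and return the bin after which to split."""
    G, N = sum(grad_sum), sum(count)
    best_bin, best_score = None, float("-inf")
    g_left, n_left = 0.0, 0
    for b in range(len(count) - 1):
        g_left += grad_sum[b]
        n_left += count[b]
        n_right = N - n_left
        if n_left == 0 or n_right == 0:
            continue
        # Assumed score: larger means the two sides are better separated.
        score = g_left ** 2 / n_left + (G - g_left) ** 2 / n_right
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin


values = [0.0, 0.5, 1.0, 1.5, 2.5, 3.0, 3.5, 4.0]
grads = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
gs, cnt = build_histogram(values, grads, k=4)
split = best_split_bin(gs, cnt)
```

After binning, the search cost depends on the number of bins k rather than on the number of samples, which is where the time saving comes from.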

The Cross-Training and Stop Algorithm.
Both f(x) and g(x) are very mature ensemble learning algorithms that are widely used in various scenarios. In comparison, f(x) uses a greedy procedure that evaluates each feature to find the optimal split point, which gives better accuracy, whereas g(x) uses a tree growth strategy that, among the current leaf nodes, splits the one with the largest splitting gain, which gives higher accuracy. To fully absorb the different advantages of the two methods during training, and inspired by the idea of mutation in genetic algorithms, we exchange the output data of each layer: the results of the two methods are spliced with the original data and fed into the next layer, so that two adjacent layers are trained by different methods.
At the same time, the number of layers in our model is adaptive: after a new layer is added, the performance of the whole model is estimated on the validation set. If there is no obvious improvement, the number of layers no longer increases and the training process terminates. As the performance criterion, we choose the root mean square error of prediction, which is more suitable for option pricing than the accuracy commonly used in classification. The multilayer structure adaptively determines the number of layers through z(n), which also makes the model applicable to training data of different scales. z(n) is calculated as follows:

$$z(n) = r(n) - r(n-1), \tag{11}$$

where $r(n)$ represents the root mean square error of prediction with $n$ layers, and $r(n - 1)$ represents the root mean square error of prediction with $n - 1$ layers. Layers stop being added when $z(n) \ge 0$.
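A minimal sketch of the stop rule and of the data exchange between layers is given below. It assumes that z(n) is the difference r(n) − r(n − 1) of validation errors, that a layer which does not reduce the error is discarded, and that cross-training means each module receives the original features spliced with the other module's previous output; these readings, the callback `layer_rmse`, and all function names are assumptions for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error of a list of predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def splice(original_row, layer_outputs):
    """Concatenate the original features with the outputs of a layer."""
    return list(original_row) + list(layer_outputs)


def next_layer_inputs(x_row, out_f, out_g):
    """One possible reading of cross-training: swap the two module outputs.

    Returns (input for module f, input for module g) in the next layer.
    """
    return splice(x_row, [out_g]), splice(x_row, [out_f])


def grow_layers(layer_rmse, max_layers=20):
    """Add layers while the validation RMSE keeps decreasing.

    `layer_rmse(n)` is a hypothetical callback that trains layer n and
    returns r(n), the validation RMSE of the n-layer model.
    """
    history = [layer_rmse(1)]
    for n in range(2, max_layers + 1):
        r_n = layer_rmse(n)
        z_n = r_n - history[-1]  # z(n) = r(n) - r(n - 1)
        if z_n >= 0:
            break  # layer n did not help; keep the first n - 1 layers
        history.append(r_n)
    return len(history), history


# Pretend validation errors for models with 1 to 5 layers.
errors = {1: 0.90, 2: 0.60, 3: 0.55, 4: 0.58, 5: 0.40}
n_layers, history = grow_layers(lambda n: errors[n], max_layers=5)
in_f, in_g = next_layer_inputs([1.0, 2.0], out_f=0.3, out_g=0.7)
```

In the toy run the fourth layer raises the error, so growth stops at three layers even though a fifth layer would have done better; the rule is greedy by design, which keeps the depth from becoming another hyperparameter to tune.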

Data.
The data used in our experiment are daily data from the KOSPI200 option market for the period from 2 June 2009 to 7 November 2019. The KOSPI200 option is based on the KOSPI200 index, a weighted total market price index of 200 blue-chip stocks listed on the Korean stock market. To avoid the synchronization caused by the trading effect, we use the closing price of the KOSPI call option as price data. We use 2064 samples (80%) as the training data set and the remaining 516 samples (20%) as the test data set. The input vector X consists of stock price S, strike price K, maturity time T − t, and volatility σ. Here, we use the σ available in the market, which is given by the general implied volatility formula. We exclude the interest rate c from the input vector because it changes very little over the period of the KOSPI200 data. Pricing results are evaluated by root mean square error (RMSE), and the errors are calculated by moneyness and by contract duration. By contract duration, options are divided into short term (<1 month), medium term (1-3 months), and long term (>3 months). By moneyness, they are divided into deep in the money (S/K < 0.8), in the money (S/K ∈ (0.8, 0.96)), at the money (S/K ∈ (0.96, 1.04)), out of the money (S/K ∈ (1.04, 1.2)), and deep out of the money (S/K > 1.2).
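The grouping of errors by moneyness and contract duration can be sketched as follows. The bucket labels and thresholds follow the text as given; the treatment of values that fall exactly on a boundary, the record layout, and the function names are assumptions for illustration, and the sample records are made up.

```python
import math
from collections import defaultdict


def moneyness_bucket(s, k):
    """Classify by S/K with the thresholds from the text (labels as given there)."""
    m = s / k
    if m < 0.8:
        return "DITM"
    if m < 0.96:
        return "ITM"
    if m < 1.04:
        return "ATM"
    if m <= 1.2:
        return "OTM"
    return "DOTM"


def maturity_bucket(months):
    """Short (<1 month), medium (1-3 months), or long (>3 months)."""
    if months < 1:
        return "short"
    if months <= 3:
        return "medium"
    return "long"


def grouped_rmse(records):
    """RMSE per (moneyness, maturity) cell.

    `records` is an iterable of (S, K, months, true_price, predicted_price).
    """
    squared = defaultdict(list)
    for s, k, months, y, p in records:
        key = (moneyness_bucket(s, k), maturity_bucket(months))
        squared[key].append((y - p) ** 2)
    return {key: math.sqrt(sum(v) / len(v)) for key, v in squared.items()}


# Made-up records, only to show the shape of the result.
records = [
    (100.0, 100.0, 0.5, 5.0, 5.5),
    (100.0, 100.0, 0.5, 4.0, 3.5),
    (110.0, 100.0, 2.0, 1.0, 1.0),
]
table = grouped_rmse(records)
```

Reporting RMSE per cell rather than one pooled value shows whether a model's errors concentrate in particular regions, for example in short-dated or far-from-the-money contracts.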

Results
This section discusses the experimental results of the model proposed in Section 3. We selected several models for comparison, including two parametric models, the Black-Scholes model (1973) [1] and the Heston model (1993) [14], and three machine learning models, namely DNN [43], XGBoost [41], and LightGBM [42]. To emphasize the predictive power of our model, its error is compared with that of these parametric and machine learning models.
In order to find the most suitable feature space, three models are compared, and each model has different input and output. The experimental results are shown in Table 1.
Here, S represents the KOSPI200 stock price, K is the strike price, t is the expiration time of the option contract, and σ is the implied volatility available in the market. The bold number represents the minimum pricing error of each model. The input and output of Model 1 are the same as those proposed by Hutchinson (1994) [7]. In order to reduce dimensionality, he assumes that the evaluation function is homogeneous of degree one in S and K. This approach has been used intensively in the literature [44,45] with good reported results. In this case, nonparametric models do outperform parametric models.
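The dimensionality reduction behind the homogeneity assumption can be written out explicitly. The symbols C (option price as a function of its inputs), λ, τ, and f are notation introduced here for illustration and do not appear in the original text.

```latex
C(\lambda S, \lambda K, \tau) = \lambda\, C(S, K, \tau) \quad \text{for all } \lambda > 0
\;\;\Longrightarrow\;\;
\frac{C(S, K, \tau)}{K} = C\!\left(\frac{S}{K},\, 1,\, \tau\right) \equiv f\!\left(\frac{S}{K},\, \tau\right),
\qquad \tau = T - t .
```

Setting λ = 1/K shows that, under this assumption, a model only needs to learn the map from (S/K, τ) to C/K, which has two inputs instead of three.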
Model 2 uses the same input and output as Model 1, but it does not make the homogeneity assumption. Model 3 additionally takes implied volatility into account. As with Model 1, the parametric models perform worse than the nonparametric models under Model 2 and Model 3. Among the nonparametric models, the proposed model is better than the separate DNN, XGBoost, or LightGBM models. It can also be seen that the prediction error is smallest when using the input and output of Model 3.
After determining the input and output, we performed ablation experiments on the proposed model. Ma is a model without cross-training, Mb is a model with only the f(x) ensemble learning method, Mc is a model with only the g(x) ensemble learning method, and Md is our model. Table 2 shows the results of the ablation experiments, for which we used root mean square error and mean absolute error as evaluation indexes. It can be seen from Table 2 that the prediction accuracy of the model trained with only the f(x) method or only the g(x) method is lower than that of the proposed model. Similarly, the prediction accuracy of the model without cross-training is also lower than that of the proposed model. This shows experimentally that both the cross-training and the combination of the two training methods used in the proposed model are effective.
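For reference, the two evaluation indexes have the following standard definitions. The notation is assumed for illustration: N is the number of test samples, C_i the observed option price, and \hat{C}_i the model's prediction.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl(\hat{C}_i - C_i\bigr)^{2}},
\qquad
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \bigl|\hat{C}_i - C_i\bigr| .
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of the ablation variants.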
Based on the input and output of Model 3, we conduct more detailed experiments in terms of moneyness and time until maturity. The pricing errors are shown in Table 3, which details the prediction results of the various methods under different moneyness and time until maturity. Even across different moneyness and time until maturity, all nonparametric models predict better than the parametric models. Among the nonparametric models, the two ensemble learning methods are more accurate than DNN, and our method is more accurate than the existing methods in most cases. Figure 2 presents the boxplot of pricing errors. The body of the box represents the error distribution between the first and third quartiles, and the whiskers represent the maximum and minimum recorded errors. The error distribution centers of all nonparametric models are close to 0. The error distribution center of the proposed model is closer to 0, while that of the DNN model is farther away from 0. The error distribution center of the parametric models lies between 2 and 3, farther still from 0. This also shows that the overall error of all nonparametric models is relatively small and that the proposed model predicts more accurately.

Figure 2: Boxplot of pricing error for the out-of-sample period. The body of the candle represents the distribution of errors between 25% and 75% of the data, and the wick represents the maximum and minimum recorded error. The results are shown for Model 3. Blue dots represent outliers.
To test the robustness of the model, a second evaluation scheme is used. One year of data serves as the training set, and the following month's data serve as the test set. Taking one month as a sliding window, seven nonoverlapping periods were tested. The results can be seen in Table 4. Again, the nonparametric models show excellent prediction ability. In most cases, the proposed model has higher prediction accuracy than the other nonparametric models. However, the differences between nonparametric and parametric models are smaller. A possible explanation is the smaller training set compared with the analysis in Table 3, because most nonparametric models predict more accurately the more training data they have. Figure 3 visualizes the results on the seven-month test data. It can be seen that the prediction error of all nonparametric models is much smaller than that of the parametric models. Figure 4 shows the error comparison of the nonparametric models in more detail. This also indicates that the proposed model has better prediction accuracy.
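The sliding-window scheme can be sketched as below. This is an illustration under stated assumptions: windows are aligned to calendar months, each interval is half-open, and the start date in the usage note is hypothetical, since the text does not state which months were used.

```python
from datetime import date
from typing import List, Tuple


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that lies `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def rolling_windows(
    first_train_month: date, n_periods: int = 7, train_months: int = 12
) -> List[Tuple[date, date, date]]:
    """Build (train_start, test_start, test_end) triples.

    Training covers [train_start, test_start) and testing covers
    [test_start, test_end); each window is shifted by one month.
    """
    windows = []
    for i in range(n_periods):
        train_start = add_months(first_train_month, i)
        test_start = add_months(train_start, train_months)
        test_end = add_months(test_start, 1)
        windows.append((train_start, test_start, test_end))
    return windows
```

For example, `rolling_windows(date(2018, 4, 1))` yields seven windows whose test months run consecutively without overlap, each preceded by its own twelve months of training data.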

Conclusion
In this study, a new model based on deep ensemble learning was developed for option pricing. The model applies the idea of the deep ensemble to the regression problem of option pricing and encompasses two subprocesses, namely, importance extraction and multilayer ensemble. The performance of the model was experimentally verified, and the results were evaluated from many aspects. A comprehensive comparative study shows that the model is superior to other models on different measures. Therefore, the model based on deep ensemble learning can serve as an effective tool for option pricing.
The limitation of the current work is that, although the model achieves good results on option data with a certain exercise time, the pricing of option data with an uncertain exercise time remains a challenging problem. In future work, we will consider how to improve the pricing power of the model for different types of option data.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.