An Improved Stacking Model for Equipment Spare Parts Demand Forecasting Based on Scenario Analysis

The purpose of spare parts management is to maximize system availability while minimizing economic cost. This cost-availability trade-off leads to the problem of spare parts demand prediction: accurate and reasonable demand forecasting can balance cost against availability. This paper therefore focuses on spare parts management during the normal operation phase of equipment and forecasts the demand for spare parts in a specific inspection and replacement cycle. First, the equipment operation and support scenarios are analyzed to obtain the supportability data related to spare parts requirements. Then, drawing on the idea of ensemble learning, a new feature selection method is designed to overcome the limitations of a single feature selection method. In addition, an improved stacking model is proposed to predict the demand for spare parts. The traditional stacking model has two levels of learning, base learning and meta learning, in which the outputs of the base learners are taken as the input of the meta learner. The proposed model instead combines the initial features with the outputs of the base learner layer as the input of the meta learner layer. Experiments show that the improved stacking model performs better than both the base learners and the traditional stacking model on the same data set.


Introduction
Accurate spare parts demand forecasting can ensure a reasonable spare parts inventory, which reduces downtime losses and unnecessary inventory costs [1]. Different spare parts have different demand patterns, and the factors influencing spare parts demand differ considerably between phases. Besides, the choice of prediction method is limited by data availability, so different prediction methods are required in different situations [2]. Generally, prediction methods can be divided into qualitative and quantitative methods. Qualitative methods are mainly based on the subjective judgment of decision-makers or experts; they often lack a theoretical basis and are difficult to explain. Quantitative methods, in contrast, can be divided into time series methods and causal methods.
The former are based only on historical demand data, while the latter consider other explanatory variables that may affect the target value. Causal methods can be further subdivided into simple linear regression and machine learning methods. The main difference between the two is that linear regression determines the model structure in advance, while machine learning learns the relationship between input and output from data [3]. Compared with statistical methods, machine learning is more suitable for fitting complex nonlinear relationships [4]. Machine learning has been widely used in the prediction field, for example in hourly demand prediction of rental bikes [5], pharmacy drug demand estimation [6], and patient demand prediction in hospitals [7]. In [8], clustering methods and XGBoost were used to assess vehicle driving and predict risk levels. In [9], grey neural networks, BP neural networks, SVM, and other models were trained to establish demand prediction models.
There is a complex nonlinear relationship between spare parts demand and many explanatory variables, so machine learning methods are considered for predicting spare parts demand.
In spare parts demand forecasting based on machine learning, we first need to analyze the factors that may affect spare parts demand and then perform feature selection. Generally, the demand for spare parts arises from the process of operation and maintenance [10]. Demand for spares may be related to failure occurrence, maintenance activities, equipment age, and operating conditions. Environmental conditions such as weather, past usage patterns such as fleet size, the annual budget from departments, flying hours, sorties flown, management, and the reliability characteristics of spare parts should also be considered [11][12][13]. However, the analysis of influencing factors in previous studies is neither comprehensive nor detailed. Many studies mention that operating conditions and environment, maintenance activities, flight hours, number of takeoffs and landings, reliability, fleet size, management factors, and other factors may affect the demand for spare parts, but factors such as environmental variables and management are often omitted. Besides, the specific operation process of the equipment and the maintenance support process of its spare parts, including the attributes of the spare parts themselves, may also affect demand. Therefore, by analyzing the three-level maintenance support process of equipment spare parts, this paper tries to obtain a more comprehensive set of potential factors affecting the demand for spare parts.
In the field of machine learning, feature selection is a data preprocessing technique. Selecting a more compact feature subset from the existing features for modeling can improve the accuracy and generalization ability of the model and reduce the running time [14,15]. As demonstrated in Chandrashekar's study, the performance of a forecast model depends not only on the model structure but also on the feature space [16]. Feature selection methods are generally divided into four categories: filters [17][18][19][20], wrappers [21][22][23][24], embedded methods [25], and hybrid methods [26,27]. New feature selection methods based on ensemble learning have also emerged in recent years [28]. In ensemble feature selection, many different feature selectors are generated first, and the outputs of these individual selectors are then aggregated and returned as the final integrated result [29,30]. Ensemble feature selection has been used in several fields, such as biomarker recognition in cancer diagnosis, where experiments have shown that it can significantly improve the stability and classification performance of biomarkers [31]. Taheri and Nezamabadi-pour [32] proposed an ensemble-based feature selection method for high-dimensional data, which showed advantages over other methods in classification accuracy and feature reduction rate. The purpose of ensemble learning is similar to that of people weighing various opinions before making decisions in daily life; for example, in a democratic society, citizens vote to elect officials or make laws. Ensemble learning completes a learning task by constructing and combining multiple learners: first, a group of individual learners is generated, and then they are combined by some strategy. In 1979, Dasarathy and Sheela [33] first proposed the idea of ensemble learning.
Since then, ensemble learning has become an important research field of machine learning. In 1992, Wolpert [34] proposed the stacked generalization model. Breiman [35] proposed the bagging algorithm in 1996, and Freund and Schapire [36] proposed the AdaBoost algorithm in 1997. Dietterich [37] explained three fundamental reasons for the success of ensemble learning from a mathematical point of view: statistical, computational, and representational.
Later, more ensemble learning algorithms were proposed, and ensemble methods have been successfully applied to pattern classification, regression, and forecasting problems. For example, in [38], ensemble techniques such as bagging and boosting were used to forecast the demand for antiaircraft missile spare parts. Matrenin et al. [39] addressed the medium-term load forecasting problem of power systems with ensemble machine learning methods, among which the AdaBoost model had the best prediction performance. Gutierrez-Espinoza et al. [40] used ensemble learning methods to identify false information.
The experimental results show that both the bagging ensemble and the AdaBoost ensemble detect deceptive information better than traditional machine learning methods.
As a kind of ensemble learning technique, model fusion solves problems by combining multiple models in some way. In the era of big data, massive multimodal data exist widely, and data fusion and model fusion are essential means of multimodal data analysis and mining [41][42][43]. The learning performance of model fusion is often better than that of ensemble learning in the narrow sense, and many papers have explained the reasons from the perspectives of training basis, hypothesis space, and starting point of computation [44,45]. Theoretically, model fusion can reduce variance and bias simultaneously and find a good balance between the two. Widely used fusion methods include averaging, voting, and stacking. The first two fuse the results of the base learners by averaging or voting, which can improve the fused result only to a limited extent. The stacking model, however, uses a meta learner to fuse the hypotheses made by the base learners, which is equivalent to searching for the best fusion rule. Because the meta learner can be trained in the direction of minimizing the loss function, stacking is more reliable and efficient than averaging and voting and has been widely used in various fields. For example, stacking ensemble learning models were adopted to predict short-term wireless network load and car-hailing demand [46,47]. In [48], a stacking-based ensemble model was deployed to forecast solar radiation strength. Liang et al. [49] and Al-Sarem et al. [50] adopted stacking learning frameworks for genomic prediction and phishing website detection, respectively. In the stacking model, selecting appropriate base models, optimizing the hyper-parameters, and improving the generalization ability are the main challenges, and many scholars have attempted to address them. Al-Sarem et al. [50] adopted a genetic algorithm (GA) to tune the parameters of machine learning learners, and Cui et al. [51] developed an improved swarm intelligence algorithm to optimize the critical parameters of the prediction model. In [52], Pearson correlation analysis was used to measure the correlation between base learners in order to select base learners with low correlation. In [53], the proposed deep ensemble learning framework adopted two sparse auto-encoders to reduce the correlation of attributes and diversify the base classifiers. Pari et al. [54] proposed a multi-tier stacked ensemble algorithm in which the cross-validated predictions are combined by different combination schemes to generate new features.
Spare parts demand forecasting is an important issue in spare parts management, but ensemble learning and model fusion are rarely used in this field; the stacking model in particular has considerable room for application. Therefore, this paper introduces the stacking model from ensemble learning into the field of spare parts management to predict the demand for spare parts. The stacking model constructed in this paper has only three base models, which leaves the meta model with few features and may affect the prediction accuracy. To solve this problem, this paper proposes an improved stacking model that adds the initial features to the feature matrix of the meta model. The main contributions of this paper are as follows: (1) Potential factors affecting spare parts demand that may not have been considered before are obtained and quantified based on a scenario analysis of the operation and support process of aviation equipment; (2) An integrated feature selection framework based on three feature selection methods is proposed, which reduces the one-sidedness and limitations of a single feature selection method; (3) The stacking model is introduced into the field of spare parts demand prediction, and the improved stacking model addresses the insufficient number of meta learner features in the traditional stacking model, which improves the generalization ability and prediction accuracy of the model.

Equipment Support Scenario Analysis
The so-called scenario was initially a concept in product design. From a macro perspective, it refers to the needs of users in using the product, such as what problems a product solves for which target users; from a micro perspective, it refers to the specific details of use, such as the user's specific search and browsing scenes, which reflect the interactive experience in the details of a product. Scenario modeling is a method for describing and analyzing user requirements. In the form of stories, it tells the whole process by which target users use products to complete target tasks in a specific environment. The 5W1H method [55] is usually used to mine the six elements of a scenario, as shown in Table 1.
Therefore, we apply the concept of scenario modeling to the context of integrated equipment support, digitize the support scenario of equipment spare parts, and analyze the potential factors considered by staff when planning spare parts requirements.
Analyzing the spare parts supply and support process allows us to summarize the potential factors affecting spare parts consumption, but this is not enough. A comprehensive and detailed analysis and digital description of the equipment operation and support scenarios yields more detailed features.
Take annual flight hours as an example. Many studies use the total flight hours as an influencing factor. However, for the same total flight hours, different patterns of equipment utilization may result in different spare parts requirements. The flight time can therefore be subdivided into flight time under different natural environments, operating environments, training subjects, personnel operations, and support and maintenance conditions. These differences may be potential factors affecting the demand for spare parts for a given total flight time.

The Three-Level Support System for Military Spare Parts.
A flight regiment usually performs various training tasks under various working conditions. Figure 1 shows the three-level maintenance support process for the repairable parts of a flight regiment. The outfield spare parts warehouse and the outfield maintenance depot are responsible for the support and maintenance of LRUs (Line Replaceable Units), respectively. Spare parts are demanded in the course of equipment operation and maintenance activities. Therefore, we analyze the operation and support scenarios of the equipment.

Quantify the Influencing Factors.
The consumption of spare parts is related to the task quantity and the consumption law. The former includes flight hours, number of takeoffs and landings, operating conditions, natural environment, and other factors, while the latter is related to the attributes of the spare parts, such as loss sensitivity and reliability. Besides, the replenishment of spare parts mainly comes from parts returned to the warehouse after repair, so factors related to maintenance should also be considered. The 5W1H method is therefore applied to the field of equipment support, and the equipment operation and support scenarios are analyzed in Table 2:
(1) According to the scenario analysis, the potential factors that may affect the demand for equipment spare parts are obtained, as shown in Table 2.
(2) Based on the scenario analysis, we obtained the relevant data for five kinds of spare parts from a department of the Chinese Air Force, generated from 2011 to 2020. The data set has 34 attributes, corresponding to the attributes in Table 2, and the label is the monthly demand quantity of spare parts.
(3) Due to military confidentiality requirements, the data used in this paper cannot be made public. To support the analysis, the data have been standardized and normalized.

Model Framework
Based on the analysis of the use and support scenarios of equipment spare parts, the potential factors affecting spare parts demand are obtained. The initial feature set is established after data cleaning and processing. To eliminate irrelevant or redundant features, this paper then proposes a feature selection method based on ensemble learning, which obtains the final feature set by aggregating the outputs of several individual feature selectors. Next, a stacking framework is established. First, grid search with cross-validation (GridSearchCV) is used to tune the parameters of the base learners, and then cross-validation is used to determine the best combination of base learners. The stacking model combines multiple prediction models to obtain a more accurate model for predicting the demand for spare parts. In addition, an improved stacking model is proposed to solve the problem of the meta learner having too few features, and the effectiveness of the method is verified by comparing the performance of different models. The model framework is shown in Figure 2.
Figure 1: Three-level maintenance support process of repairable spare parts.
Feature Selection.
Inspired by ensemble learning methods, an ensemble feature selection method is proposed, as shown in Figure 3. Here, ensemble specifically means synthesizing the results of multiple feature selection methods to obtain the final feature set. The three base feature selection methods are PCA, RFECV, and RF, three commonly used feature reduction methods that each provide a parameter measuring the importance of features. In the PCA method, the explained variance ratio represents the proportion of the variance of each principal component in the total variance after dimension reduction; it is therefore used to rank feature importance. In the RFECV method, the importance of all features is evaluated by recursive feature elimination.
The selected feature set is cross-validated to determine the number of features with the highest average score. In the RF method, the features are sorted in descending order of importance by a random forest, and a certain proportion of features is deleted to obtain a new feature set. The feature set with the lowest out-of-bag error rate is selected as the final feature set.
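The three base rankings described above can be sketched with Scikit-Learn as follows. This is only an illustration on synthetic data, not the authors' implementation. In particular, how PCA component variances are mapped back to the original features is not stated in the text, so the loading-weighted score used here is an assumption, as are the data set, the estimator settings, and the use of `ranking_` from RFECV.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the confidential spare parts data (assumption).
X, y = make_regression(n_samples=120, n_features=8, n_informative=4, random_state=0)
X = StandardScaler().fit_transform(X)

# PCA: weight each feature's absolute loadings by the explained variance
# ratio of the corresponding component (assumed mapping to original features).
pca = PCA().fit(X)
pca_score = pca.explained_variance_ratio_ @ np.abs(pca.components_)
indice_1 = np.argsort(-pca_score).tolist()

# RFECV: recursive feature elimination with cross-validation;
# ranking_ is 1 for selected features and grows for earlier eliminations.
rfecv = RFECV(RandomForestRegressor(n_estimators=20, random_state=0), step=1, cv=3)
rfecv.fit(X, y)
indice_2 = np.argsort(rfecv.ranking_, kind="stable").tolist()

# RF: impurity-based feature importances from a random forest.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
indice_3 = np.argsort(-rf.feature_importances_).tolist()

print(indice_1, indice_2, indice_3)  # feature indices, most important first
```

Each list is a permutation of the feature indices, ordered from most to least important under the respective method.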
Then, according to the feature importance parameters, namely, the explained variance ratio, the grid scores, and the feature importance, the features are sorted by importance into three sequences, namely, indice_1, indice_2, and indice_3. Let i be a feature; the weight of feature i in a sequence is given by its position in that sequence, denoted index(i). The position of feature i may differ among the three sequences and is denoted indice_1.index(i), indice_2.index(i), and indice_3.index(i), respectively. The comprehensive weight of feature i, weight(i), is then obtained by combining its positions in the three sequences, where α1, α2, and α3 are the weights assigned to the base feature selection methods according to their feature selection effect, and α1 + α2 + α3 = 1. The features are reordered by weight(i) and, in order of feature importance, removed one at a time, with the remaining features put into the model for training and testing. The optimal feature subset is determined by evaluating the model performance. The model used in this paper is RandomForest, which is available in Scikit-Learn.
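The exact expression for weight(i) is not reproduced in the text. One plausible reading, assumed here, is a weighted sum of the positions of feature i in the three sequences. The sketch below follows that assumption; the feature names, the α values, and the helper `comprehensive_weights` are hypothetical.

```python
def comprehensive_weights(sequences, alphas):
    """Weighted sum of each feature's position across ranked sequences.

    Assumes every sequence lists the same features, most important first,
    so a smaller weight means a more important feature.
    """
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("alphas must sum to 1")
    return {
        feature: sum(a * seq.index(feature) for a, seq in zip(alphas, sequences))
        for feature in sequences[0]
    }


# Hypothetical rankings from the three base feature selection methods.
indice_1 = ["hours", "sorties", "age", "humidity"]
indice_2 = ["sorties", "hours", "humidity", "age"]
indice_3 = ["hours", "age", "sorties", "humidity"]
alphas = (0.3, 0.3, 0.4)  # hypothetical method weights, summing to 1

weight = comprehensive_weights([indice_1, indice_2, indice_3], alphas)
ranked = sorted(weight, key=weight.get)
print(ranked)
```

With these inputs, "hours" receives the smallest combined weight and "humidity" the largest, so "humidity" would be the first candidate for removal.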

Stacking Model.
The core idea of stacking is to take the outputs of the base learners as the input of the meta learner. Base learners are also called level 0 learners, and meta learners are called level 1 learners. Meta learners are stacked on top of base learners, which is why the model is called a stacking model. The structure of a simple stacking model composed of three base models and one meta model is shown in Figure 4.

Cross-Validation.
In the stacking model, the individual learners are trained and make predictions on the original data; the prediction results are then arranged into a new feature matrix and fed to the meta learner for learning. This inevitably leads to a small sample size in the feature matrix of the meta learner. When training the stacking model, the original data set is divided into three parts: a training set, a validation set, and a test set. The test set is used to assess the whole fusion model, so it cannot be used in the training process. The training set is used to train the individual learners and has been fully disclosed to them, so predictions on it cannot represent their generalization ability. Therefore, only the validation set can be used for predictions that reflect the real learning level of the individual learners, which leads to an insufficient sample size in the feature matrix of the meta learner.
To solve this problem, cross-validation is used to expand the feature matrix of the meta learner. Figure 5 shows the stacking model composed of three base learners and one meta learner. First, threefold cross-validation is performed for each of the three base learners: the training data are divided into three parts, two of which are used as the training set and the remaining one as the validation set. Each base learner is trained on the training set and makes predictions on the validation set and on the whole test set. This process is repeated three times to obtain the predictions for the three validation sets and three sets of predictions for the test set. Then, the predictions for the three validation sets are stacked vertically, and the three sets of test set predictions are averaged. The validation set predictions of the three base learners are concatenated horizontally to form the training set of the meta learner, and the test set predictions of the three base learners are concatenated horizontally to form the test set of the meta learner. In this way, the problem of the small sample size of the meta learner is solved.
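The threefold procedure described above can be sketched as follows. The data are synthetic, and the three base learners are stand-ins chosen for illustration only; the final base learners of the paper are those listed in Table 4.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=150, n_features=6, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Stand-in base learners (assumption).
base_learners = [Ridge(), KNeighborsRegressor(), DecisionTreeRegressor(random_state=0)]
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

# One column per base learner in the meta learner's feature matrices.
meta_train = np.zeros((len(X_train), len(base_learners)))
meta_test = np.zeros((len(X_test), len(base_learners)))

for j, learner in enumerate(base_learners):
    test_preds = []
    for fit_idx, val_idx in kfold.split(X_train):
        model = clone(learner).fit(X_train[fit_idx], y_train[fit_idx])
        # Validation predictions fill the rows of the held-out fold
        # ("stacked vertically" across the three folds).
        meta_train[val_idx, j] = model.predict(X_train[val_idx])
        test_preds.append(model.predict(X_test))
    # The three sets of test predictions are averaged.
    meta_test[:, j] = np.mean(test_preds, axis=0)

print(meta_train.shape, meta_test.shape)
```

Every training sample appears in exactly one validation fold, so `meta_train` has as many rows as the original training set, which is the point of the procedure.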
Generally speaking, before model fusion, all possible algorithms need to be run once, and the algorithm with better performance is selected as the basis of fusion.
Cross-validation can also be used to select the base learners of the stacking model. Specifically, a better model can be chosen by comparing the CV results of different models on the same data set. The steps of model selection by cross-validation are as follows: (1) The total training set is divided into k parts, namely,
In order to determine the best combination of base learners, eight candidate models are selected, as shown in Table 3, all of which are regression models. The cross-validation results are shown in Figure 6. The hyper-parameters of the models are determined through grid search with cross-validation, and the final base learners are shown in Table 4. Generally speaking, the meta learner needs to be a low-complexity algorithm, so a simple linear model is selected as the meta learner.
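As a rough illustration of selecting base learners by cross-validation, the sketch below tunes a few candidate regressors with `GridSearchCV` and then ranks them by their mean cross-validated R^2. The candidates, parameter grids, and data are assumptions made for the example; they are not the eight models of Table 3.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVR

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=120, n_features=6, noise=5.0, random_state=0)

# Hypothetical candidates with small hyper-parameter grids.
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "svr": (SVR(), {"C": [1.0, 10.0]}),
    "rf": (RandomForestRegressor(n_estimators=30, random_state=0), {"max_depth": [3, None]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # Tune hyper-parameters, then score the tuned model by threefold CV.
    search = GridSearchCV(model, grid, cv=3, scoring="r2").fit(X, y)
    scores = cross_val_score(search.best_estimator_, X, y, cv=3, scoring="r2")
    results[name] = (scores.mean(), search.best_params_)

# Candidates ordered from best to worst mean CV score.
ranking = sorted(results, key=lambda n: results[n][0], reverse=True)
for name in ranking:
    print(name, round(results[name][0], 3), results[name][1])
```

Scoring the tuned model on the same folds used for tuning gives a slightly optimistic estimate; a nested cross-validation would be stricter, at higher cost.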

Stacking Model and Improved Model.
Based on the above analysis, this paper builds a stacking model framework, as shown in Figure 7. However, in the typical stacking model the feature matrix of the meta learner inevitably has too few features, because each individual learner outputs only one set of predictions, so the number of features in the new feature matrix equals the number of individual learners. In the stacking model built above, there are only three base learners, so only three features are finally input to the meta learner. To solve this problem, we add the original feature matrix to the predicted values of the individual learners to form a new, higher-dimensional feature matrix, which is used as the input of the meta learner. The improved model structure is shown in Figure 8.
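Scikit-Learn's `StackingRegressor` exposes a `passthrough` option that feeds the original features to the final estimator alongside the base learners' predictions, which corresponds closely to the improvement described here. The sketch below contrasts the two variants on synthetic data; the base learners and data are assumptions, and this is not claimed to be the authors' implementation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data with 6 original features (assumption).
X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Three stand-in base learners (assumption).
estimators = [
    ("ridge", Ridge()),
    ("knn", KNeighborsRegressor()),
    ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
]


def build(passthrough):
    # A simple linear model serves as the meta learner.
    return StackingRegressor(
        estimators=estimators,
        final_estimator=LinearRegression(),
        cv=3,
        passthrough=passthrough,
    )


traditional = build(passthrough=False).fit(X_train, y_train)
improved = build(passthrough=True).fit(X_train, y_train)

# Number of inputs seen by the meta learner in each variant.
n_meta_traditional = traditional.final_estimator_.coef_.shape[0]
n_meta_improved = improved.final_estimator_.coef_.shape[0]
print(n_meta_traditional, n_meta_improved)
print(traditional.score(X_test, y_test), improved.score(X_test, y_test))
```

Note that `StackingRegressor` refits each base learner on the full training set for test-time prediction rather than averaging the per-fold test predictions, so it differs in that detail from the cross-validation procedure described in the text.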

Result Discussion
This section discusses the results of feature selection and of spare parts demand prediction based on the stacking model.

Feature Selection and Experimental Data.
By analyzing the operation and support scenarios of the equipment, the factors that may affect the demand for spare parts are obtained. After digital description, numerical features are obtained, which serve as the input of the stacking model after feature selection. Figure 9 compares the feature selection method proposed in this paper, which is based on the idea of ensemble learning, with the other three methods. We eliminate features one at a time in order of increasing feature importance, input the remaining features into the random forest model for training and testing, evaluate each feature selection method according to the model performance and prediction accuracy, and determine the most appropriate number of features. The experimental results in Figure 9 show the following: (1) The ensemble feature selection method proposed in this paper yields the best model performance and the lowest prediction error. (2) After eliminating 17 features in order of increasing feature importance, the optimal feature set, with which the model performs best, is obtained. Taking the features before and after feature selection as the input of the three base learners and comparing model performance, Figure 10 shows that the selected features effectively improve the performance of the models.
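The elimination procedure can be sketched as below: features are removed one at a time, least important first, and a random forest is scored on each remaining subset. The data are synthetic, and the importance ordering is taken from a single random forest fit purely for illustration; in the paper the ordering comes from the ensemble feature selection method.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=120, n_features=8, n_informative=3, random_state=0)

# Stand-in ordering from least to most important (assumption).
forest = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)
order = np.argsort(forest.feature_importances_)

scores = {}
for n_removed in range(len(order)):  # always keep at least one feature
    kept = order[n_removed:]
    model = RandomForestRegressor(n_estimators=30, random_state=0)
    scores[n_removed] = cross_val_score(model, X[:, kept], y, cv=3, scoring="r2").mean()

# Number of eliminated features giving the best mean CV score.
best_removed = max(scores, key=scores.get)
selected = sorted(order[best_removed:].tolist())
print(best_removed, selected)
```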

Performance and Result Discussion.
Based on the selected features, spare parts demand is predicted by the stacking model and the proposed improved stacking model, and their prediction performance and error are compared with those of the base learners. The results are shown in Figure 11.
For each individual learner in the model fusion and for the fusion model itself, we perform cross-validation on the training set and a final test on the test set. The goal is for the proposed model to score as well as possible both in cross-validation and on the test set, which would mean that the generalization ability of the model is improved. From the experimental results, we can draw the following conclusions. Figure 11(a) shows that the stacking model achieves a better cross-validation score than the other models, and the proposed improved stacking model significantly improves this score further, indicating that the generalization ability of the stacking model is much stronger. Figure 11(b) reflects the difference between the scores of the models on the training set and the test set; it shows that the base models overfit to a certain degree, whereas the stacking model effectively reduces overfitting and the proposed improved model has the smallest gap. As in Figure 11(a), the stacking model and the improved stacking model in Figure 11(c) effectively improve the score on the test set, that is, the generalization ability of the model. Figure 11(d) shows the prediction errors of the models on the test set, which are effectively reduced by the stacking model and the improved stacking model.
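The quantities compared in this discussion can be computed along the following lines. This is a sketch under assumptions: the data are synthetic, `Ridge` is a stand-in model, the helper `evaluate` is hypothetical, and the training and validation scores are taken from `cross_validate` with `return_train_score=True`, which may differ from how the authors obtained them.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_validate, train_test_split

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)


def evaluate(model):
    """Return cross-validation, training, and test metrics for one model."""
    cv = cross_validate(model, X_train, y_train, cv=3, scoring="r2",
                        return_train_score=True)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "cv_mean": cv["test_score"].mean(),       # R^2 on the validation folds
        "train_score": cv["train_score"].mean(),  # R^2 on the training folds
        "test_score": model.score(X_test, y_test),
        "mean_absolute_error": mean_absolute_error(y_test, y_pred),
        "mean_squared_error": mean_squared_error(y_test, y_pred),
    }


metrics = evaluate(Ridge())
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A large gap between `train_score` and `cv_mean` or `test_score` is the overfitting signal discussed for the base models.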

Conclusions
This paper attempts to predict the demand for equipment spare parts with a stacking model. By analyzing equipment operation and support scenarios, some factors that have rarely been considered in previous work but may affect the demand for spare parts are obtained. In addition, drawing on the idea of ensemble learning, this paper proposes a new ensemble feature selection method, which mitigates the limitations of a single feature selection method. The experimental results show that the proposed feature selection method outperforms the other three methods and effectively improves the performance of the base models. In the past, only a few studies used ensemble learning for spare parts prediction, and even fewer used a stacking model. In this paper, the stacking model from ensemble learning is used for spare parts demand prediction, and the problem of the small number of features in the feature matrix of the meta learner in the traditional stacking model is alleviated: the improved model adds the original features to the input of the meta learner. The experimental results show that the proposed model improves the generalization ability and prediction accuracy and performs well on the obtained data set. In the future, if unstructured data generated during the operation and maintenance of equipment, such as text in maintenance history records, can be obtained and taken into account, the data set will be expanded and more influencing factors that have never been considered may be found.
Data Availability
The data supporting this study are available from the corresponding author through e-mail.
(cv_mean is the cross_val_score of the models on the validation set, measured as R2_Score; train_score is the cross_val_score of the models on the training set, measured as R2_Score; test_score is the coefficient of determination R^2 of the prediction; mean_absolute_error is the expected value of the absolute error loss; mean_squared_error is the expected value of the squared (quadratic) error or loss.)