Selecting a Multicriteria Inventory Classification Model to Improve Customer Order Fill Rate

Multicriteria models have been proposed for inventory classification in previous studies. However, a decision-maker needs to know when a particular multicriteria inventory classification model should be preferred over other models, and whether the highest-performing model remains the highest performing at all times. Companies continually look for ways to improve the customer order fulfillment process. This paper shows how better inventory classification can improve customer order fill rate under varying conditions. A method to compare inventory classification models with regard to improving customer order fill rate is proposed. A cut-off point is calculated, which indicates when a model currently in use should be dropped in favor of another model to increase revenue by filling more orders. Sensitivity analysis is also performed to determine how holding cost and demand uncertainty affect the performance metric. Finally, regression analysis and hypothesis testing inform the decision-maker of how a model's performance differs from that of other models at various values of holding cost and standard deviation of demand.


Introduction
In order to manage inventory efficiently, companies use ABC classification, assigning items to one of three classes so that specific inventory control policies can be applied [1]. Class A is considered very important, class B moderately important, and class C the least important [2]. The traditional ABC classification is based on Pareto analysis, and the criterion typically used is annual dollar usage [3]. This criterion is the most frequently used in practice because managers pay more attention to inventory that has a high dollar value [4].
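As an illustration of the traditional single-criterion approach, the sketch below ranks items by annual dollar usage (annual demand × unit cost) and cuts the cumulative usage share at 80% and 95%. These cut-offs are a common rule of thumb and are assumptions here, not values from this study; `abc_by_dollar_usage` and the sample items are hypothetical.

```python
def abc_by_dollar_usage(items, a_cut=0.80, b_cut=0.95):
    """Classify items by annual dollar usage (Pareto-style ABC).

    items: dict mapping item name -> (annual_demand, unit_cost).
    An item goes to class A if the cumulative usage share of the items
    ranked above it is below a_cut, to class B if that share is below
    b_cut, and to class C otherwise (assumed convention).
    """
    usage = {name: demand * cost for name, (demand, cost) in items.items()}
    total = sum(usage.values())
    classes = {}
    cumulative = 0.0
    for name in sorted(usage, key=usage.get, reverse=True):
        share_before = cumulative / total  # share held by higher-ranked items
        if share_before < a_cut:
            classes[name] = "A"
        elif share_before < b_cut:
            classes[name] = "B"
        else:
            classes[name] = "C"
        cumulative += usage[name]
    return classes


# Hypothetical items: (annual demand, unit cost)
sample = {"P1": (60, 10.0), "P2": (25, 10.0), "P3": (8, 10.0),
          "P4": (4, 10.0), "P5": (2, 10.0), "P6": (1, 10.0)}
print(abc_by_dollar_usage(sample))
```

Only one criterion drives the ranking here, which is exactly the limitation the multicriteria models discussed below try to address.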
The traditional ABC method is easy to use because it considers only dollar usage. Subsequent studies have suggested that using other criteria in combination makes inventory classification more effective. The criteria often employed in multicriteria methods include lead time, commonality, obsolescence, and criticality [2, 5–8]. The analytic hierarchy process (AHP), distance-based modeling, and neural network techniques have also been used in different studies [6, 9–11]. Later studies use optimization models to classify inventory; these are discussed in the literature review. This study focuses on optimization models.
Since the single-criteria method is easy to use, it is the most widely employed method for classifying inventory. The multicriteria literature claims that more than one criterion should be used to improve inventory classification. This argument cannot be validated unless the results of single-criteria and multicriteria models are compared quantitatively using a common metric.
Inventory classification directly affects the ability of inventory to satisfy customer orders. Poor inventory classification may, on the one hand, build up inventory of items that are not needed to meet customer demand and, on the other hand, cause shortages of items when they are needed the most. The metric usually used in industry is customer order fill rate, defined as the probability of filling an entire customer order within a specified period [12]. Comparing the order fill rates obtained from the inventory classifications of different optimization models highlights the shortcomings of using one model over others. This helps a decision-maker select the right model for the data, an issue that has not been addressed in previous studies.
This study fills these gaps. The contributions of this paper are as follows. (1) The performances of the models are compared using order fill rate. (2) Since inventory holding cost and the standard deviation of demand may vary over time, they can affect the choice of model; therefore, the models' behavior at varying levels of holding cost and standard deviation of demand is assessed using sensitivity analysis. (3) Hypothesis testing is used to determine whether differences in the models' behavior are significant at varying levels of inventory holding cost and standard deviation of demand. (4) Finally, a cut-off point method is introduced to understand the revenue impact when the highest performance shifts from one model to another.

Literature Review
Both single-criteria and multicriteria methods are used to classify inventory, and several different techniques are used in multicriteria classification. In general, inventory classification techniques can be divided into nonoptimization techniques and optimization techniques. Studies that use these methods are listed in Table 1. Optimization methods use linear or nonlinear optimization models, in which each item receives an aggregate optimal inventory score based on the objective function [3]. The scores of inventory items are maximized, and the items are then classified into A, B, or C categories. Table 1 summarizes these models.
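The final step shared by these optimization models, turning aggregate scores into classes, can be sketched as below. This is a minimal illustration: `classify_by_score` is hypothetical, and the class sizes `n_a` and `n_b` are assumed inputs, since individual studies choose their own class proportions.

```python
def classify_by_score(scores, n_a, n_b):
    """Assign ABC classes from aggregate scores.

    scores: dict item -> optimal inventory score (higher = more important).
    n_a, n_b: number of items to place in classes A and B (assumed inputs);
    all remaining items go to class C.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    classes = {}
    for rank, item in enumerate(ranked):
        if rank < n_a:
            classes[item] = "A"
        elif rank < n_a + n_b:
            classes[item] = "B"
        else:
            classes[item] = "C"
    return classes


print(classify_by_score({"x": 0.9, "y": 0.5, "z": 0.7, "w": 0.1}, n_a=1, n_b=2))
```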
Techniques that do not use optimization models include the analytic hierarchy process, cluster analysis, decision trees, distance modeling, graphical matrices, and neural networks. AHP was widely used in earlier studies. It is based on a pairwise comparison of all criteria, in which the user must define the direction and degree of preference. The method has been criticized because these pairwise comparisons involve subjective judgment [3].
Since this research focuses on using optimization models in classifying inventory, the literature review discusses only these models.

Multicriteria Analysis Using Weighted Linear Optimization.
Ramanathan [3] proposed an approach called the weighted linear optimization model, which uses multiple criteria. The Ramanathan model (R model) employs a weighted additive function to aggregate the performance of each inventory item into a single score, referred to as the optimal inventory score. The model has a maximization objective function. Weights are assigned automatically when the model is solved, and the output also gives the optimal inventory score. Several modifications to this model have been suggested.
Ng [1] presented a modified version of the R model, referred to as the Ng model. He ranked the criteria in descending order and used normalized weights. He also proposed a transformation technique to solve the model without the need for linear optimization software.
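Our reading of Ng's transformation can be sketched as follows. Assuming min-max normalization of each criterion and weights constrained to w1 ≥ w2 ≥ ... ≥ wJ ≥ 0 with Σw = 1, the maximum weighted score is attained at a vertex (1/j, ..., 1/j, 0, ..., 0) of that weight set, so each item's score is the largest running mean of its normalized criterion values. The function names and data are hypothetical; this is an illustration, not the author's code.

```python
def min_max_normalize(values):
    """Rescale a criterion column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ng_scores(rows):
    """Score items with the descending-weight transformation.

    rows: one list of criterion values per item, with the criteria already
    ordered from most to least important (e.g. lead time first).
    """
    n_criteria = len(rows[0])
    columns = [min_max_normalize([row[j] for row in rows]) for j in range(n_criteria)]
    scores = []
    for i in range(len(rows)):
        y = [columns[j][i] for j in range(n_criteria)]
        # Largest mean of the first j normalized values, over j = 1..J.
        scores.append(max(sum(y[:j]) / j for j in range(1, n_criteria + 1)))
    return scores


print(ng_scores([[10, 5], [0, 0], [5, 10]]))
```

Because no solver is needed, the scores can be computed in a spreadsheet or a few lines of code, which is the practical appeal the text attributes to the Ng model.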
Zhou and Fan [31] pointed out a shortcoming of the R model: an item can be classified as A even when it scores high only in less favorable criteria, which leads to an inappropriate ABC classification. Their proposed model (ZF model) uses both the most favorable and the least favorable weights for each item.
Chen [34] further improved the ZF model by introducing a peer-estimation approach. This method selects two common sets of criteria weights, namely, the most favorable and the least favorable weights, and the resulting scores are then aggregated for each item without any subjectivity.
Hadi-Vencheh [32] observed that, in the Ng model, the final score of each item does not depend on the criteria weights obtained from the model; hence, items may be inappropriately classified. He modified the Ng model and proposed a nonlinear optimization model (HV model).
Torabi et al. [35] proposed a model that incorporates both quantitative and qualitative criteria. This is the first model of its kind to use both quantitative and qualitative criteria to classify inventory into A, B, or C categories. The authors refer to it as a modified linear programming model rather than an optimization model. Jeddou [36] used the Ng model to classify vehicle spare parts.
Park et al. [37] developed a cross-evaluation-based weighted linear optimization (CE-WLO) model, which uses cross-efficiency evaluations in weighted linear optimization to classify inventory items. Their model (PBB model) operates in a peer-evaluation mode.
Iqbal and Malzahn [38] extended the ZF model by adding the descending ranking criteria constraint and named the result the modified ZF model. They also proposed a model fitness test, which evaluates a model's ability to classify items without resulting in infeasibility. Before selecting a model for inventory classification, the user can compare models by the classification infeasibility each one produces. The test results suggest that models using the descending ranking criteria constraint result in little or no classification infeasibility.
It would be useful to know whether the classifications produced by models using descending ranking criteria also perform better than those of other models. In this study, we evaluated the performance of the models in fulfilling customer orders, using order fill rate as the measure of comparison.

Method
The models use three criteria: lead time, average annual demand, and average unit cost. These criteria have consistently been used in other studies [1, 3, 31]. We considered lead time as the first criterion because the longer an item's lead time, the longer it takes to replenish the item when a stockout occurs, which can affect a company's ability to satisfy customer demand. We included the Ng, ZF, HV, PBB, and modified ZF models in our analysis. However, we excluded the models that result in classification infeasibility in Iqbal and Malzahn [38], namely, the single-criteria model and the R model. The modified ZF model is included to determine whether the descending ranking criteria constraint improves the performance of the model.

Order Fill Rate.
To determine how suitable an inventory classification model is, we calculated the order fill rate for each model and then compared the order fill rates to see which model results in the highest value. The method of calculating the order fill rate is explained in Babai et al. [39], and the formulae used in the calculations follow that work with this notation: $N$ is the number of inventory items in the inventory system, $h_i$ is the unit inventory holding cost of item $i$, $k_i$ is the safety factor for item $i$ corresponding to the customer service level (CSL), $D_i$ is the mean annual demand of item $i$, $\sigma_i$ is the standard deviation of annual demand for item $i$, $L_i$ is the lead time of item $i$, $Q_i$ is the order quantity of item $i$, $\mathrm{FR}_i$ is the fill rate of item $i$, $\mathrm{FR}$ is the overall fill rate of the inventory system, and $G(\cdot)$ is the loss function of the standard normal distribution.
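The notation above suggests a continuous-review setting with normally distributed demand. As a hedged sketch, assuming the standard textbook forms (which may differ in detail from the formulae in Babai et al. [39]), item $i$'s fill rate can be written $\mathrm{FR}_i = 1 - \sigma_i\sqrt{L_i}\,G(k_i)/Q_i$ with $G(k) = \varphi(k) - k(1 - \Phi(k))$, and the overall fill rate as the demand-weighted mean of the item fill rates. The function names are hypothetical, and lead time is assumed to be in years so that $\sigma_i\sqrt{L_i}$ is the standard deviation of lead-time demand.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def safety_factor(csl):
    """Safety factor k for a target cycle service level (e.g. 0.95)."""
    return STD_NORMAL.inv_cdf(csl)


def loss_function(k):
    """Standard normal loss function G(k) = phi(k) - k * (1 - Phi(k))."""
    return STD_NORMAL.pdf(k) - k * (1.0 - STD_NORMAL.cdf(k))


def item_fill_rate(k, sigma_annual, lead_time_years, order_qty):
    """Assumed item fill rate: 1 - expected shortage per cycle / Q."""
    sigma_lead_time = sigma_annual * sqrt(lead_time_years)
    expected_shortage = sigma_lead_time * loss_function(k)
    return max(0.0, 1.0 - expected_shortage / order_qty)


def overall_fill_rate(demands, fill_rates):
    """Demand-weighted average of item fill rates (assumed aggregation)."""
    return sum(d * fr for d, fr in zip(demands, fill_rates)) / sum(demands)


k_c = safety_factor(0.90)  # e.g. a class C item at 90% service
print(round(item_fill_rate(k_c, sigma_annual=25.0, lead_time_years=0.5, order_qty=100.0), 4))
```

The class of an item enters only through its service level, and hence $k_i$; this is how a different ABC classification changes the overall fill rate.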

Order Fill Rate for Sample Dataset 1.
The order fill rate has been calculated for several multicriteria models. First, sample dataset 1, which contains ten items and is taken from Park et al. [37], is used. The holding cost and standard deviation of demand are not known for this dataset. To calculate the order fill rate, we assumed an annual holding cost of 20% of the average unit cost, as used in Park et al. [37], and a standard deviation of demand of 2.5% of demand. In the sensitivity analysis section, we use multiple values of the holding cost and standard deviation and evaluate how they affect the final result. A summary of the calculations of the overall order fill rate for the modified ZF model is shown in Table 2.
A summary of the overall order fill rates of all other multicriteria models is shown in Table 3. We excluded the single-criteria method, the R model, and the ZF model because they all resulted in classification infeasibility [38]. Once the descending ranking criteria constraint is added to the ZF model (i.e., the modified ZF model), the classification infeasibility disappears. Among the feasible models, however, the PBB model resulted in the highest order fill rate.
We also note that the order fill rates from the modified ZF model and the Ng model are identical. This is because both models assign items to the same classes in dataset 1, which may not be the case for other datasets with a larger sample size. In the next section, we test the multicriteria models using a 47-item dataset.

Order Fill Rate for Sample Dataset 2.
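The second dataset contains 47 inventory items and is used in Ramanathan [3]. We calculated the order fill rate for the multicriteria models using a holding cost of 20% of the average unit cost and a standard deviation of 2.5% of demand. The ABC classification of items used in these calculations is shown in the Appendix. We set a 99% service level for class A items, a 95% service level for class B items, and a 90% service level for class C items. Results are shown in Table 4.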
We observe from Table 4 that the results are more distinct than those for dataset 1. The order fill rate of the modified ZF model is better than that of the ZF model, which shows that adding a descending-order constraint improves the performance of the ZF model. These results are consistent with those of sample dataset 1. In this example, the modified ZF model results in the highest order fill rate.
We note that, in dataset 1, the modified ZF model improved classification feasibility, and, in dataset 2, it also improved the order fill rate. This suggests that using criteria in descending order of importance (some criteria more important, some less) provides an equal or better order fill rate compared with the case in which this constraint is not included in the model.

Sensitivity Analysis.
In calculating the order fill rate for dataset 1, for example, a standard deviation of 2.5% of demand and a holding cost of 20% of the average unit cost were used. These values can differ in practice, which may change the findings. Sensitivity analysis was performed to investigate how the results change at different levels of demand uncertainty and holding cost.

Sensitivity Analysis for Sample Dataset 1.
We used four models for comparison: HV, modified ZF, PBB, and Ng. We did not include the R model and the ZF model because they result in infeasible inventory classification [38]. We calculated the order fill rate at five values of the standard deviation of demand (2.5%, 1%, 0.5%, 0.25%, and 0.1% of demand), and, for each value, we used holding costs ranging from 0.1% to 50% of the average unit cost. We used a 99% service level for class A items, 95% for class B items, and 90% for class C items. Some of these plots are shown in Figure 1. Graphs at standard deviations of 0.5%, 0.25%, and 0.1% of demand are not shown because they follow the same trend as those at 2.5% and 1%. It is also of interest to see how the models behave when the standard deviation of demand changes at a constant holding cost. We used an annual holding cost of 20% of the average unit cost and ran the models at different values of the standard deviation. Results are shown in Figure 2.
There is little difference among the four models at lower values of holding cost. The PBB model achieves the highest overall fill rate at all values of standard deviation and holding cost. The Ng model and the modified ZF model give the same values, which is why their lines overlap. Two further results can be drawn from Figures 1 and 2. First, at any given level of demand uncertainty, the difference in fill rate among the models decreases as the holding cost decreases. At a standard deviation of 0.25% of demand or below, when the holding cost falls to 0.1% (0.001) of the average unit cost, the difference between the models' order fill rates becomes nearly zero. Therefore, it makes little difference which model is used for inventory classification when the standard deviation is 0.25% of demand or below and the holding cost is 0.1% or below. In short, the choice of model matters little at low holding costs, but care should be taken when the inventory holding cost is high. Second, the fill rate increases as the standard deviation decreases, so the effect of the standard deviation of demand cannot be ignored when comparing model results. It is therefore advisable to perform a sensitivity analysis to determine which model results in the highest fill rate at given values of standard deviation and holding cost. A model that results in the highest order fill rate at one combination of standard deviation and holding cost should not be assumed to do so at every combination.
We performed a similar analysis at other service levels. In the second comparison, we used service levels of 95%, 90%, and 85% for class A, B, and C items, respectively. In the third comparison, we used service levels of 90%, 85%, and 80% for class A, B, and C items, respectively. The PBB model again results in the highest fill rate; these results are shown in the Appendix. We observe the same trend as for the first set of service levels: the difference in fill rate among the models decreases as the holding cost decreases, and the fill rate of all models improves as the standard deviation of demand drops.
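The sensitivity grid used for dataset 1 (five standard deviations of demand crossed with holding costs from 0.1% to 50% of unit cost) can be sketched on top of assumed fill-rate formulae. In the assumed formula below, holding cost enters the fill rate only through the order quantity, so the sketch sets each order quantity with an EOQ using a hypothetical ordering cost (`order_cost`). The items, the two classifications, and the ordering cost are invented for illustration; the printed numbers are not the study's results.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()
# First service-level pairing used in the text: A 99%, B 95%, C 90%.
CSL_BY_CLASS = {"A": 0.99, "B": 0.95, "C": 0.90}


def loss(k):
    """Standard normal loss function."""
    return N01.pdf(k) - k * (1.0 - N01.cdf(k))


def system_fill_rate(items, classes, hold_frac, sd_frac, order_cost=50.0):
    """Demand-weighted fill rate for one classification (assumed model).

    items: item -> (annual demand, unit cost, lead time in years).
    hold_frac: annual holding cost as a fraction of unit cost.
    sd_frac: standard deviation of annual demand as a fraction of demand.
    order_cost: hypothetical fixed cost per order, used only to set an EOQ.
    """
    weighted, total = 0.0, 0.0
    for name, (demand, unit_cost, lead_time) in items.items():
        h = hold_frac * unit_cost
        q = sqrt(2.0 * order_cost * demand / h)  # EOQ: cheaper holding -> larger Q
        sigma_lt = sd_frac * demand * sqrt(lead_time)
        k = N01.inv_cdf(CSL_BY_CLASS[classes[name]])
        fr = max(0.0, 1.0 - sigma_lt * loss(k) / q)
        weighted += demand * fr
        total += demand
    return weighted / total


# Hypothetical data and two hypothetical classifications of the same items.
items = {"i1": (1000, 20.0, 0.5), "i2": (400, 60.0, 0.25), "i3": (2500, 2.0, 0.1)}
model_x = {"i1": "A", "i2": "B", "i3": "C"}
model_y = {"i1": "C", "i2": "A", "i3": "B"}

for sd_frac in (0.025, 0.01, 0.005, 0.0025, 0.001):
    for hold_frac in (0.001, 0.05, 0.2, 0.5):
        fx = system_fill_rate(items, model_x, hold_frac, sd_frac)
        fy = system_fill_rate(items, model_y, hold_frac, sd_frac)
        print(f"sd={sd_frac:.4f} h={hold_frac:.3f} X={fx:.5f} Y={fy:.5f} gap={fx - fy:+.5f}")
```

Under these assumptions, a lower holding cost raises every model's order quantity and therefore its fill rate, which is one way to see why the gap between models narrows at low holding costs.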

Sensitivity Analysis for Sample Dataset 2.
Sensitivity analysis of sample dataset 2 was performed by varying the holding cost at different values of the standard deviation of demand while calculating the order fill rate. To be consistent, we used the same values of the standard deviation of demand (2.5%, 1%, 0.5%, 0.25%, and 0.1%) and the same holding costs, ranging from 0.1% to 50% of the average unit cost, as in the sensitivity analysis of dataset 1. In the first comparison, service levels of 99%, 95%, and 90% are assigned to class A, B, and C items, respectively. The fill rate is calculated for four models: HV, modified ZF, PBB, and Ng. Results are shown in Figures 3 and 4. Graphs at 1%, 0.5%, 0.25%, and 0.1% standard deviation of demand are not shown because they show a similar trend.
Here we see a trend similar to that for dataset 1: the overall fill rate improves as the standard deviation decreases, and it also increases as the holding cost decreases. The difference between the results of the four models is reduced when the holding cost is reduced. Overall, the modified ZF model results in the highest overall fill rate in this case at all values of standard deviation and holding cost used in the calculations, except when the holding cost is 0.1% of the average unit cost or lower and the standard deviation of demand is 0.25% or lower. This means that the standard deviation and holding cost should be considered when selecting a model, and sensitivity analysis will highlight which model results in the highest overall fill rate. We also calculated the overall fill rate at other customer service levels. The second pairing of service levels was 95%, 90%, and 85% for class A, class B, and class C items, and the third pairing was 90%, 85%, and 80%. The results for these pairings are the same as for the first pairing.

Regression Analysis of Difference in Slope.
In Figures 2 and 4, we see an increasing trend in fill rate at lower values of standard deviation. However, the slopes of the models appear to differ, which indicates that the relative performance of the models varies with the standard deviation of demand. Analyzing the difference in slopes is therefore useful in decision-making. We selected case study 1 (Figure 2) to compare the slopes of the different models. The regression lines for the individual models shown in Figure 2 were fitted using Minitab. Some of these statistics are shown in Table 5.
We used a $t$-test (since $n < 25$), with the rejection region set as $t > t_{(0.95,\,8)}$, where $t_{(0.95,\,8)} = 1.8595$. The $t$ statistic was computed for each pair of models, and the results of the hypothesis test are shown in Table 6. The comparison is made at the 95% confidence level. We already know from Figure 2 that the PBB model results in the highest fill rate and that the difference between the PBB model and the other models shrinks as the standard deviation of demand decreases. Table 6 shows that this difference is significant. Users should therefore be careful in selecting a model when the variation in demand is high: a model can perform better at one value of the standard deviation, while another model may outperform it at other values.
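The slope comparison can be sketched as follows, assuming a common form for two independently fitted lines, $t = (b_1 - b_2)/\sqrt{s_{b_1}^2 + s_{b_2}^2}$. The exact statistic and degrees of freedom used with the Minitab output are not recoverable from this text, so this is only an illustration; the data are hypothetical, and `T_CRIT` is the value the text reports for $t_{(0.95,\,8)}$.

```python
from math import sqrt

T_CRIT = 1.8595  # t(0.95, 8) as reported in the text


def ols_slope_and_se(x, y):
    """Least-squares slope of y on x and its standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, sqrt(sse / (n - 2) / sxx)


def slope_difference_t(x1, y1, x2, y2):
    """t statistic for H0: the two slopes are equal (assumed form)."""
    b1, se1 = ols_slope_and_se(x1, y1)
    b2, se2 = ols_slope_and_se(x2, y2)
    return (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical data: two fitted lines over five x values (not study data).
x = [1, 2, 3, 4, 5]
model_1 = [1.0, 2.1, 2.9, 4.2, 5.0]
model_2 = [1.0, 1.5, 2.1, 2.4, 3.0]
t_stat = slope_difference_t(x, model_1, x, model_2)
print(f"t = {t_stat:.2f}, reject H0: {t_stat > T_CRIT}")
```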

Discussion
In this study, we present a method for comparing the performances of the models. Sensitivity analysis, regression analysis, and hypothesis testing are used to understand how variability in inventory holding cost and the standard deviation of demand affects model selection. The analysis also determines whether descending ranking criteria improve the results and performance of a model. We used two datasets and evaluated the performance of the models in fulfilling customer orders, using the metric common in industry, the overall fill rate. The inventory classification obtained from each model is used to calculate the order fill rate, and the results are then compared to determine where each model stands in terms of satisfying customer orders. In the first dataset, the PBB model leads the other models; in the second dataset, the modified ZF model shows the best performance. Therefore, the selection of a model for inventory classification depends on the dataset.
The comparative analysis also revealed that adding the descending-order criteria constraint to the ZF model improved the resulting order fill rate. This indicates that constraining the criteria weights to follow the order of importance of the criteria enhances the performance of the model.
Sensitivity analysis showed that holding cost and standard deviation of demand play very important roles in determining which model results in the highest order fill rate. By varying these values, we compared the responses of the different models. We found that the difference in the order fill rates of the models decreases as the holding cost and the standard deviation of demand decrease. At some point, the difference becomes negligible, and below this point it makes little difference which model is used for inventory classification. A given model cannot be assumed to outperform the other models at all values of holding cost and standard deviation of demand. Sensitivity analysis helps in making an informed decision about model selection at given values of holding cost and standard deviation of demand.
Results from regression analysis and hypothesis testing showed that the differences between the slopes of the models are significant, indicating that the best-performing model may lose its advantage over other models as the standard deviation of demand is reduced. Selecting the best-performing model depends on the standard deviation of demand and the holding cost, and users should take this into account when doing a comparative analysis for model selection.
Extending this analysis further, it is useful to find the cut-off point at which the performance of the originally superior model becomes equal to that of another model. Knowing these cut-off points makes it easier to select the model that gives the highest order fill rate.
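A cut-off point of this kind can be located numerically. The sketch below assumes that the fill rates of two models have been computed on a common, increasing grid (for example, of holding cost) and linearly interpolates the gap between them; `cutoff_point` and the sample values are hypothetical, not the study's method or data.

```python
def cutoff_point(x, fr_leader, fr_challenger):
    """First x at which the challenger's fill rate catches the leader's.

    x must be increasing (e.g. holding cost as a fraction of unit cost).
    Between grid points the gap is linearly interpolated. Returns None if
    the leader stays ahead on the whole grid.
    """
    gaps = [a - b for a, b in zip(fr_leader, fr_challenger)]
    for i in range(1, len(x)):
        g0, g1 = gaps[i - 1], gaps[i]
        if g0 > 0 and g1 <= 0:
            return x[i - 1] + (x[i] - x[i - 1]) * g0 / (g0 - g1)
    return None


print(cutoff_point([0.1, 0.2, 0.3], [0.95, 0.94, 0.93], [0.93, 0.935, 0.94]))
```

With these sample values the challenger catches up at a holding cost of about 0.233 (23.3% of unit cost); beyond that point, switching models would raise the fill rate.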

Conclusion
This paper compares the performances of various models using the order fill rate. The study shows that the best-performing model does not remain the best at all times. A method is also shown to calculate a model cut-off point, at which the highest-performing model loses to another model. Two different datasets are used in this study. Sensitivity analysis, regression analysis, and hypothesis testing are included to assess the usefulness and significance of the results.
The study shows that including the descending criteria constraint improves the performance of the model: when criteria are set in order of importance, the resulting classification of inventory items produces a better order fill rate. In the case of the ZF model, the order fill rate improved when the modified ZF model, which includes the descending ranking criteria, was used. We also found an important relationship between the results of the models as the holding cost and standard deviation of demand vary. Regression analysis shows that the slopes of the models are significantly different. Sensitivity analysis confirms that the outcome of the model comparison does not change at other customer service levels. However, the lead of the highest-scoring model shrinks as the holding cost and standard deviation of demand decrease. The study also shows that the selection of a model depends on the dataset and on the values of the holding cost and standard deviation of demand.
The method explained in this study can improve the order fill rate when the results of a comparative evaluation of models are used to select the model. For example, once a model's cut-off point is determined and reached, the user can switch to the new highest-performing model. Doing so improves the order fill rate, which increases revenue over a specified period.
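The revenue effect of such a switch can be approximated with simple arithmetic. This sketch assumes that an order not filled completely is lost and that each filled order earns a fixed revenue; both assumptions, the function name, and the numbers are hypothetical.

```python
def added_revenue(fr_current, fr_new, orders_per_period, revenue_per_order):
    """Extra revenue from switching models, assuming every order that is not
    filled completely is lost (a simplifying, hypothetical assumption)."""
    extra_filled_orders = (fr_new - fr_current) * orders_per_period
    return extra_filled_orders * revenue_per_order


print(added_revenue(0.95, 0.97, orders_per_period=10_000, revenue_per_order=50.0))
```

With these inputs, a two-point gain in fill rate on 10,000 orders at 50 per order corresponds to 200 additional filled orders, or about 10,000 in revenue for the period.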
For future research, it would be interesting to compare the results when a product has a variable lead time, especially when it is purchased from more than one supplier, since each supplier may have a different lead time for the same product.

Figure 1: Overall order fill rate from multiple models for dataset 1.

Figure 4: Order fill rates at different values of standard deviation of demand with constant holding cost.

Table 2: Summary of calculations of overall order fill rate for modified ZF model (first dataset).

Table 3: Summary of overall order fill rate for all other multicriteria models from dataset 1.

Table 4: Summary of overall order fill rate for multicriteria models from dataset 2.
Service levels: 99% for class A, 95% for class B, and 90% for class C items; the ABC classification of items is shown in the Appendix.

Table 5: Selected output of the regression lines.

Table 6: Test results of slope comparison (dataset 1).