Combinatorial Efficiency Evaluation: The Knapsack Problem in Data Envelopment Analysis

The traditional data envelopment analysis (DEA) literature has generally concentrated on the efficiency evaluation of a single decision making unit (DMU). However, in many practical problems, decision makers are required to choose a number of DMUs, rather than a single one, from the DMU set. It is therefore necessary to study the combinatorial efficiency evaluation problem, which can be formulated naturally as a knapsack problem. We indicate that the basic model proposed by Cook and Green may have some drawbacks, and we propose a modified model that is combined with the super efficiency model. Moreover, the proposed model is more persuasive to decision makers because it is able to provide a unique best combination of DMUs. An adapted local search algorithm is developed to solve this problem. Finally, numerical examples are provided to examine the validity of the proposed model and the adapted local search algorithm.


Introduction
Data envelopment analysis (DEA) was first introduced by Charnes et al. [1] in 1978. DEA is an effective method for evaluating the relative efficiency of decision making units (DMUs) that consume multiple inputs to produce multiple outputs. After more than thirty years of development, DEA has become a significant and active research area; see [2][3][4] for reviews of DEA. The traditional DEA literature, which we call individual efficiency evaluation here, has generally concentrated on the efficiency evaluation of a single DMU. However, in many practical problems, such as project selection or technology evaluation, decision makers are required to select a number of DMUs instead of a single one. According to Cook and Green [5], traditional individual efficiency evaluation is not adequate to support decision making in these problems, mainly because a combination of individually efficient DMUs is not necessarily efficient among all the possible combinations. It is therefore necessary to do further research on the combinatorial efficiency evaluation problem, that is, the efficiency evaluation of multiple DMUs combined together under the DEA framework, which can be formulated naturally as a 0-1 knapsack problem [5].
Combinatorial efficiency evaluation is a relatively new problem, and related research can be found in the project selection and technology evaluation literature. Oral et al. [6] first proposed a DEA-based multistage methodology for the collective evaluation and selection of R&D projects. Cook and Green's subsequent research [5] may be the first article that combined the DEA model with the knapsack problem to solve the R&D project selection problem. Loch et al. [7] and Beaujon et al. [8] studied the project selection problem with different mathematical programming models. Eilat et al. [9] developed a DEA-based methodology that considers the interactions between R&D projects. Vitner et al. [10] used DEA to compare project efficiency in a multiproject environment. Based on Cook and Green's research, Chang and Lee [11] extended the problem to the fuzzy case. DEA was also used in a methodology proposed by Khalili-Damghani et al. [12]. Tavana et al. [13] introduced a fuzzy DEA model for a high-technology project selection problem at NASA.
Cook and Green's paper [5] is regarded as a basic study of the combinatorial efficiency evaluation problem, and some other papers [11,13] are based on it. However, during our research, we found that there may be some mistakes in Cook and Green's model. As mentioned in [10], each project in the project selection problem is necessarily a one-time nonrepeated event. In Cook and Green's model, this basic fact was neglected, and the same project could be selected more than once. Considering the drawbacks of Cook and Green's model, we propose a new DEA-based methodology that combines the knapsack problem model with the super efficiency model.
Compared with previous research, the efficiency evaluation based on our proposed methodology is more practical because each DMU is allowed to appear only once in a combination. Our proposed methodology is also more persuasive to decision makers because it is able to provide a unique best combination of DMUs. Moreover, because it is combined with the super efficiency model, it is more powerful in discriminating among the efficient combinations of DMUs. An adapted local search algorithm is provided as a solver for the knapsack problem.
The rest of this paper is organized as follows. Section 2 formulates the problem and gives some comments on Cook and Green's model. Section 3 presents our proposed model in detail. Section 4 briefly describes the adapted local search algorithm. Section 5 applies the model to numerical examples. Finally, Section 6 gives the conclusions.

Problem Formulation
The vectors $X_j = [x_{1j}, x_{2j}, \ldots, x_{mj}]^T$ and $Y_j = [y_{1j}, y_{2j}, \ldots, y_{sj}]^T$ are usually used to denote the inputs and outputs of DMU$_j$, in which $j = 1, 2, \ldots, n$. The basic efficiency evaluation model of DMU$_{j_0}$ ($j_0 = 1, 2, \ldots, n$) is as follows:
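For reference, the CCR model of Charnes et al. [1] is commonly written in the ratio form below. The notation is an assumption made here for illustration: $X_j$ and $Y_j$ are the input and output vectors of DMU$_j$, $V_{j_0}$ and $U_{j_0}$ are the nonnegative weight vectors chosen for the evaluated unit DMU$_{j_0}$, and $h_{j_0}$ is its efficiency score. The linearized form, or a small positive lower bound on the weights, may be used instead.

```latex
\begin{aligned}
\max\quad & h_{j_0} = \frac{U_{j_0}^{T} Y_{j_0}}{V_{j_0}^{T} X_{j_0}} \\
\text{s.t.}\quad & \frac{U_{j_0}^{T} Y_j}{V_{j_0}^{T} X_j} \le 1, \quad j = 1, 2, \ldots, n, \\
& U_{j_0} \ge 0, \quad V_{j_0} \ge 0 .
\end{aligned}
```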
In model (1), the vectors $V_{j_0} = [v_{1j_0}, v_{2j_0}, \ldots, v_{mj_0}]^T$ and $U_{j_0} = [u_{1j_0}, u_{2j_0}, \ldots, u_{sj_0}]^T$ are the optimal weights assigned to the inputs and outputs, respectively, and $h_{j_0}$ is the efficiency score of DMU$_{j_0}$ ($j_0 = 1, 2, \ldots, n$). Although many different forms of DEA models have been developed [2][3][4], we use the basic CCR model (model (1)) here to illustrate our thoughts on combinatorial efficiency evaluation. Further studies could extend the combinatorial efficiency evaluation to other DEA models.
Like the basic CCR model, most traditional research is concerned with the individual efficiency evaluation of a single DMU and pays no attention to the combinatorial efficiency evaluation of multiple DMUs. As mentioned before, decision makers are sometimes required to choose a number of DMUs from the alternatives instead of a single one, and the combinatorial efficiency evaluation (CEE) is therefore needed. Before extending our research to the CEE problem, we first give some definitions for convenience. In what follows, a combination of DMUs is denoted by CDMU, and the set of all possible CDMUs is the possible-combination set $\Omega$.
The number of CDMUs in $\Omega$ is denoted by $|\Omega|$, and, for CDMU$_k$ ($k = 1, 2, \ldots, |\Omega|$), a binary 0-1 vector $\lambda_k = [\lambda_{1k}, \lambda_{2k}, \ldots, \lambda_{nk}]^T$ is used to denote the combination as follows: $\lambda_{jk} = 1$ if DMU$_j$ belongs to CDMU$_k$, and $\lambda_{jk} = 0$ otherwise.
It is supposed that the DMUs in the DEA problem are neither synergistic nor interfering [5]; therefore, the inputs and outputs of a CDMU are simply the sums of the inputs and outputs of its elements. For example, the inputs of CDMU$_k$ are $\bar{X}_k = \sum_{j=1}^{n} \lambda_{jk} X_j$, and the outputs $\bar{Y}_k = \sum_{j=1}^{n} \lambda_{jk} Y_j$ are obtained in the same way.
There are usually some constraints on the inputs and outputs of CDMUs, and therefore not all the CDMUs in the possible-combination set $\Omega$ are rational in practice. In a CEE problem, we are concerned with the CDMUs that satisfy the constraints rather than with the entire possible-combination set. For this reason, we give some further definitions about the possible-combination set $\Omega$.
Definition 4. The rational-combination set $\Omega_R$ is the subset of $\Omega$ in which the CDMUs satisfy the constraints on inputs and outputs, where $X^0$ and $Y^0$ are the constraints on the inputs and outputs of CDMUs, respectively. In some simplified situations, there may be no constraints on the outputs, and the rational-combination set can be described as $\Omega_R = \{\mathrm{CDMU}_k \in \Omega : \bar{X}_k \le X^0\}$.
Definition 5. The desired-combination set $\Omega_D$ is the subset of $\Omega_R$ in which the CDMUs cannot be combined with any other DMU under the restrictions of $X^0$; that is, adding any DMU that is not yet in the combination would violate $X^0$.
What decision makers are most concerned with is the efficiency evaluation of the CDMUs in the desired-combination set $\Omega_D$, which make full use of the budget. It should be noted that the DMUs in a CDMU are naturally one-time nonrepeated events, and a DMU is not allowed to appear repeatedly in a combination. Therefore, the basic combinatorial efficiency evaluation model of CDMU$_{k_0}$ ($k_0 = 1, 2, \ldots, |\Omega_D|$) in the desired-combination set $\Omega_D$ (model (8)) is as follows:
As a CDMU is a combination of DMUs, the CEE model of CDMU$_{k_0}$ ($k_0 = 1, 2, \ldots, |\Omega_D|$) can be written as an equivalent 0-1 knapsack problem model (model (9)) as follows, in which the binary 0-1 vector $\lambda_{k_0} = [\lambda_{1k_0}, \lambda_{2k_0}, \ldots, \lambda_{nk_0}]^T$ is the solution we need:
There are mainly two drawbacks in model (8). Firstly, as mentioned in [5], the number of restrictions (8.1) grows too fast as the number of DMUs becomes larger, which seriously affects the practicability of the model. Secondly, more than one CDMU may be evaluated as efficient by the basic model (8), which makes it less persuasive for decision makers who must choose between the CDMUs.
These two problems also exist in model (9), and some further discussion is needed to improve the usability of models (8) and (9).
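The three combination sets can be enumerated by brute force for a small instance, which also shows why the number of restrictions grows quickly. The sketch below assumes a single input (a cost) per DMU, no output constraints, and a possible-combination set that excludes the empty combination; the function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def possible_combinations(n):
    """All non-empty subsets of DMU indices 0..n-1 (each DMU used at most once)."""
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(range(n), r)]

def rational_set(costs, budget):
    """Combinations whose total cost does not exceed the budget."""
    return [c for c in possible_combinations(len(costs))
            if sum(costs[j] for j in c) <= budget]

def desired_set(costs, budget):
    """Rational combinations to which no further DMU can be added within budget."""
    result = []
    for c in rational_set(costs, budget):
        spent = sum(costs[j] for j in c)
        others = [j for j in range(len(costs)) if j not in c]
        if all(spent + costs[j] > budget for j in others):
            result.append(c)
    return result

if __name__ == "__main__":
    costs = [1, 1, 1, 1]   # hypothetical unit costs
    print(len(possible_combinations(4)),
          len(rational_set(costs, 2)),
          len(desired_set(costs, 2)))
```

With equal costs, the desired set consists of all subsets of the largest affordable size, so its size, and with it the number of restrictions of the kind labelled (8.1), grows combinatorially with the number of DMUs.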

Some Comments on Cook and Green's Model
Cook and Green introduced a resource-constrained DEA approach combined with the knapsack problem for the project prioritization problem, in order to select a subset of projects from a larger set of proposals [5]. Cook and Green's paper may be the first research on the CEE problem, and some other papers build on their work [11,13]. However, during our research, we found that there are some drawbacks in Cook and Green's model, and some comments are provided in this section.
In a project selection problem, decision makers are required to select a number of projects subject to constraints on the project cost. The alternative projects are considered as the DMUs in the DEA problem, and the individual efficiency of each project can be calculated by model (1). Model (8) was also used as the basic model in Cook and Green's paper, and, in order to overcome the first drawback of model (8), Cook and Green introduced the following model (model (10)):
Comparing Cook and Green's model (10) with the basic model (8), we can see that restrictions (8.1) were replaced by restrictions (10.1), which are the same as those of the basic CCR model (1). This effectively reduces the number of restrictions in model (8), but it also introduces some problems.
The inequality restrictions in a DEA model shape the efficient frontier of the production possibility set (PPS), which is used as the benchmark for evaluating all DMUs. A fundamental fact, which was neglected in Cook and Green's transformation, is that the efficient frontier changes when DMUs are combined. In Cook and Green's model, the combinatorial efficiency of CDMUs is evaluated against the original efficient frontier of the DMUs, which is inappropriate for some CDMUs. Cook and Green's model also neglected the fact that a DMU is not allowed to appear repeatedly in a CDMU.
In short, the major drawback of Cook and Green's model is that it does not recognize the difference between the PPS of the original DMU set and the PPS of the CDMU set. The purpose of Cook and Green is to evaluate the efficiency of the CDMUs in the CDMU set, but what they used was the PPS of the original DMU set. This is inappropriate and results in some efficient CDMUs being evaluated as inefficient by error.
In the following, a simple numerical example is provided to illustrate the drawbacks of Cook and Green's model. Suppose that there are 4 DMUs, shown in Table 1, with one input and two outputs. The cost of the CDMUs is restricted to be no more than 2 units; all the possible combinations are provided in Table 2. The purpose of this CEE problem is to evaluate the combinatorial efficiency of the CDMUs in the desired-combination set. The evaluation results of models (8) and (10) are compared in Table 3.
Comparing the evaluation results in Table 3, we can see that the CDMU {3, 4}, which should be efficient in the desired-combination set, is evaluated incorrectly by model (10). This is mainly because some impractical combinations, such as {1, 1}, {2, 2}, {3, 3}, and {4, 4}, are used to shape an impractical efficient frontier in Cook and Green's model (10); see Figure 1. A combinatorial efficiency evaluated against this impractical efficient frontier is necessarily incorrect.
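The effect can be reproduced with a small sketch. The data below are hypothetical and are not the values of Tables 1 to 3: four DMUs, each using one unit of input and producing two outputs, with a budget of two units so that every desired CDMU is a pair. The helper `radial_score` is an illustrative name; it solves the two-weight multiplier problem (maximize $u \cdot z_0$ subject to $u \cdot z \le 1$ for every reference point $z$ and $u \ge 0$, with outputs expressed per unit of input) by enumerating vertices, which is only meant for two outputs and a single input.

```python
from itertools import combinations, combinations_with_replacement

# Hypothetical data: every DMU uses 1 unit of input; tuples are (output 1, output 2).
OUTPUTS = {1: (8.0, 2.0), 2: (6.0, 6.0), 3: (2.0, 8.0), 4: (1.0, 9.0)}

def per_unit(combo):
    """Outputs per unit of input for a combination of unit-input DMUs."""
    size = len(combo)
    return (sum(OUTPUTS[j][0] for j in combo) / size,
            sum(OUTPUTS[j][1] for j in combo) / size)

def radial_score(target, reference):
    """Max of u.target subject to u.z <= 1 for z in reference and u >= 0."""
    # Boundary lines a*u1 + b*u2 = c: the two axes plus one line per reference point.
    lines = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    lines += [(p, q, 1.0) for p, q in reference]
    best = 0.0
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel lines have no unique intersection
        u1 = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        u2 = (a1 * c2 - a2 * c1) / det
        feasible = (u1 >= -1e-9 and u2 >= -1e-9 and
                    all(p * u1 + q * u2 <= 1 + 1e-9 for p, q in reference))
        if feasible:
            best = max(best, u1 * target[0] + u2 * target[1])
    return best

target = per_unit((3, 4))

# Benchmark of pairs without repetition (the desired-combination set here).
pairs = list(combinations(OUTPUTS, 2))
score_model_8 = radial_score(target, [per_unit(c) for c in pairs])

# Benchmark of the original DMUs, as in the evaluation attributed to model (10).
score_cook_green = radial_score(target, [per_unit((j,)) for j in OUTPUTS])

# Benchmark of pairs that may repeat a DMU, e.g. {4, 4}.
repeated = list(combinations_with_replacement(OUTPUTS, 2))
score_with_repeats = radial_score(target, [per_unit(c) for c in repeated])

print(score_model_8, score_cook_green, score_with_repeats)
```

With these hypothetical numbers, the pair {3, 4} scores 1 against the other pairs but 47/48 against the original DMUs, and allowing repeated pairs gives the same score as using the original DMUs. Under constant returns to scale, a repeated pair such as {4, 4} has the same per-unit outputs as DMU 4 itself, which is why the two benchmarks coincide.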

Proposed Model for the CEE Problem
In order to overcome the drawbacks of model (8) and model (10), a new combinatorial efficiency evaluation model is proposed in this paper. Our proposed model has three main advantages: (a) it is combined with the super efficiency model, and therefore the best CDMU can be provided to decision makers; (b) at the same time, the number of restrictions is reduced to an acceptable level; (c) finally, the evaluation is based on the super efficient frontier, so no efficient CDMU is incorrectly evaluated.
Super efficiency models are a family of DEA models designed to achieve a full ranking of both efficient and inefficient DMUs [14][15][16][17][18]. A basic super efficiency model is provided by Andersen and Petersen in [14] as follows:
The main idea of the AP model is to evaluate the efficiency of a target DMU by excluding it from the DMU set. In practical terms, AP efficiency measures how much a DMU can extend the PPS, which is persuasive enough for decision makers in reality. Although infeasibility can occur in some cases, and many papers have studied the infeasibility problem [15][16][17][18], our interest is not to improve the super efficiency model. It should be noted that any improved super efficiency model can be used in the CEE problem; the emphasis of our paper is to provide a methodology for solving the CEE problem combined with the super efficiency model. For this reason, we give our proposed model for the CEE problem as follows:
Moreover, the equivalent knapsack problem model can be formulated as follows:
Model (13) is used as the fitness measure of solutions in the knapsack problem, and the optimal solution $\lambda_{k_0} = [\lambda_{1k_0}, \lambda_{2k_0}, \ldots, \lambda_{nk_0}]^T$ is the result we need. It is assumed that all the CDMUs considered here belong to the desired-combination set.
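For reference, the Andersen and Petersen model [14] is commonly written in the ratio form below, which differs from the CCR model only in that the evaluated unit is excluded from the restrictions, so its score may exceed 1. The notation is an assumption made here for illustration: $X_j$ and $Y_j$ are the input and output vectors of DMU$_j$, and $V_{j_0}$ and $U_{j_0}$ are the nonnegative weight vectors chosen for the evaluated unit DMU$_{j_0}$.

```latex
\begin{aligned}
\max\quad & h_{j_0}^{\mathrm{AP}} = \frac{U_{j_0}^{T} Y_{j_0}}{V_{j_0}^{T} X_{j_0}} \\
\text{s.t.}\quad & \frac{U_{j_0}^{T} Y_j}{V_{j_0}^{T} X_j} \le 1, \quad j = 1, 2, \ldots, n, \; j \ne j_0, \\
& U_{j_0} \ge 0, \quad V_{j_0} \ge 0 .
\end{aligned}
```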

An Adapted Local Search Algorithm
An adapted local search algorithm is developed in this section to solve the CEE problem with the knapsack problem formulation. The local search algorithm is simple but effective in solving combinatorial optimization problems [19][20][21]. Considering the fact that the emphasis of our research is proposing a methodology for the CEE problem, the basic local search algorithm is used here with some adaptations. The pseudocode of the basic local search algorithm is provided in Pseudocode 1, and some adaptations are illustrated afterwards.
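A basic local search of the kind described can be sketched as follows. This is not the authors' Pseudocode 1: the swap neighbourhood, the refill step, the best-improvement rule, and all function names are assumptions made for illustration, and the `fitness` callable stands in for whatever efficiency model scores a combination.

```python
def fill_up(selected, costs, budget):
    """Add items in index order until no further item fits the budget."""
    selected = set(selected)
    spent = sum(costs[j] for j in selected)
    for j in range(len(costs)):
        if j not in selected and spent + costs[j] <= budget:
            selected.add(j)
            spent += costs[j]
    return frozenset(selected)

def neighbours(selected, costs, budget):
    """Swap one selected item for one unselected item, then refill."""
    for out in selected:
        for inn in range(len(costs)):
            if inn in selected:
                continue
            candidate = (set(selected) - {out}) | {inn}
            if sum(costs[j] for j in candidate) <= budget:
                yield fill_up(candidate, costs, budget)

def local_search(initial, costs, budget, fitness):
    """Move to the best strictly improving neighbour until none exists."""
    current = fill_up(initial, costs, budget)
    current_fit = fitness(current)
    while True:
        best, best_fit = current, current_fit
        for cand in neighbours(current, costs, budget):
            cand_fit = fitness(cand)
            if cand_fit > best_fit:
                best, best_fit = cand, cand_fit
        if best == current:
            return current, current_fit  # local optimum reached
        current, current_fit = best, best_fit

if __name__ == "__main__":
    values = [5, 4, 3, 1]  # hypothetical additive scores, not DEA efficiencies
    result = local_search({3}, [1, 1, 1, 1], 2,
                          lambda s: sum(values[j] for j in s))
    print(result)
```

Because every accepted move strictly increases the fitness and the number of combinations is finite, the loop terminates, although only at a local optimum. In the CEE setting the fitness would be a combinatorial efficiency score rather than the additive stand-in used in the example.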

Generate an Initial Solution.
As noted in the literature [19][20][21], the initial solution has a great effect on the performance of the local search algorithm. There are mainly two problems in generating initial solutions: (a) how to generate a feasible solution; (b) how to generate a relatively good initial solution. For the first problem, we use the constraints in Definition 5 to make sure that all the initial solutions are feasible and belong to the desired-combination set. For the second problem, we introduce the assumption that a combination of efficient DMUs is more efficient than a combination of inefficient DMUs. Therefore, when generating an initial solution, we choose an efficient DMU first and then add other DMUs to the combination randomly until the solution satisfies the constraints in Definition 5.
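The construction of an initial solution can be sketched as below, assuming a single cost per DMU and a list of individually efficient DMUs computed beforehand. The function name, its parameters, and the use of a seeded `random.Random` are illustrative assumptions, not the authors' implementation.

```python
import random

def initial_solution(costs, budget, efficient, rng):
    """Start from an efficient DMU, then add random DMUs until none fits."""
    # Assumes at least one efficient DMU is affordable on its own.
    affordable = [j for j in efficient if costs[j] <= budget]
    first = rng.choice(affordable)
    selected = {first}
    spent = costs[first]
    others = [j for j in range(len(costs)) if j != first]
    rng.shuffle(others)
    # One pass is enough: the spent amount only grows, so a DMU that did
    # not fit when it was visited cannot fit later.
    for j in others:
        if spent + costs[j] <= budget:
            selected.add(j)
            spent += costs[j]
    return frozenset(selected)

if __name__ == "__main__":
    costs = [3, 5, 2, 4, 6]   # hypothetical costs
    print(sorted(initial_solution(costs, 10, [0, 3], random.Random(0))))
```

The returned combination respects the budget and cannot be extended by any remaining DMU, which corresponds to membership in the desired-combination set of Definition 5 when only an input budget is imposed.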

Numerical Examples
In this section, two numerical examples are provided to demonstrate the validity and effectiveness of our proposed model. The first example demonstrates that our proposed model is able to provide a unique best combination of DMUs and is therefore more persuasive to decision makers. The second example demonstrates the superiority of our proposed model over Cook and Green's model in different scenarios. Moreover, the CEE problem, considered here as a 0-1 knapsack problem, is solved by the adapted local search algorithm of Section 4.
Example 1. The first numerical example was introduced in Section 2.2 to illustrate the drawbacks of Cook and Green's model. It is used here again to demonstrate that, compared with Cook and Green's model, our proposed model is able to provide a unique best solution to decision makers and is therefore more persuasive in practice. The data of the four DMUs are provided once more in Table 4. It is supposed that the input of CDMUs should be no more than 2 units, and a comparison between Cook and Green's model and our proposed model is provided in Table 5.
The inefficient CDMUs are not shown in Table 5. Applying our proposed model, the best combination found is {1, 2}, which achieves a super efficiency of 1.2857. The comparison in Table 5 demonstrates the validity and effectiveness of our proposed model in three respects: (a) some impractical CDMUs are eliminated by our proposed model, because each DMU is allowed to appear only once in a CDMU; (b) our proposed model is more powerful in discriminating among the CDMUs and is able to provide a unique best combination, which is more persuasive to decision makers; (c) efficient CDMUs, such as the CDMU {3, 4}, are no longer evaluated incorrectly, as they are by Cook and Green's model.
Example 2. The second numerical example is taken from Oral et al. [6], in which 37 research and development projects in the Turkish iron and steel industry were evaluated and selected collectively. This example was also used in Cook and Green's paper [5]. In this example, the 37 projects are considered as the DMUs, with one input and five outputs as follows:
The input and output data are provided in Table 6. It is supposed that the budget restriction is 1000 units and, as mentioned in [5], the average cost of these 37 projects is 67.99, so approximately 15 projects would be selected in a combination.
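The estimate of about 15 projects follows from dividing the budget by the average cost; this is only a rough check, which assumes that the selected projects cost about the average:

```latex
\frac{1000}{67.99} \approx 14.7 \approx 15 \ \text{projects}
```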
The common projects selected by all three methodologies are defined as core projects according to Cook and Green's research [5], and in this numerical example the core projects are {1, 16, 17, 18, 23, 26, 27, 31, 35, 36}. Meanwhile, the six distinct projects in our proposed combination are {10, 11, 12, 21, 29, 30}. Comparing the costs of these three solutions, we find that our proposed solution is better at making full use of the budget. Finally, and most importantly, our proposed solution achieves the best combinatorial efficiency of 1.3186, and the combinatorial efficiency of Cook and Green's solution evaluated by model (13)
in which different budget restrictions are introduced into Example 2. The comparison with different budget restrictions is also shown in Figure 2. Comparing Cook and Green's solutions with our proposed solutions under different budget restrictions, it can be found that our proposed model generally achieves a better combinatorial efficiency than Cook and Green's model; only when the budget restriction is 500 do the two models achieve the same result. It should also be noted that the combinatorial efficiency scores calculated by a given CEE model are not comparable across different budget restrictions. For example, the combinatorial efficiency under budget restriction 200 cannot be compared with the combinatorial efficiency under budget restriction 500, even for the same CEE model.

Conclusions
Data envelopment analysis (DEA) is generally an effective methodology for evaluating the relative efficiency of a single decision making unit (DMU). However, in some practical problems, decision makers are required to choose a group of DMUs instead of a single one. It is therefore necessary to study the efficiency evaluation of multiple DMUs within a larger DMU set, and this relatively new problem is named combinatorial efficiency evaluation (CEE) in our paper. By correcting some drawbacks in Cook and Green's model, a new combinatorial efficiency evaluation model is proposed based on the knapsack problem and the super efficiency model. Our proposed model is more logical in practice and, at the same time, is able to provide a unique best combination of DMUs, which is more persuasive to decision makers. Numerical examples are provided to demonstrate the validity of our proposed model compared with some other methods. It should be noted that our research in this paper is based on the CCR model and the super efficiency model in the constant returns to scale (CRS) case; further studies could extend the CEE problem to the variable returns to scale (VRS) case.