SAMPLING SIZE AND EFFICIENCY BIAS IN DATA ENVELOPMENT ANALYSIS

In Data Envelopment Analysis, when the number of decision making units is small, the number of units of the dominant or efficient set is relatively large and the average efficiency is generally high. The high average efficiency is the result of assuming that the units in the efficient set are efficient. If this assumption is not valid, the efficiencies are overestimated, and the overestimation will be larger for a smaller number of units. Samples of various sizes are used to find the related bias in the efficiency estimation. The samples are drawn from a large scale application of DEA to bank branch efficiency. The effects of different assumptions as to returns to scale and of the number of inputs and outputs are investigated.


Introduction.
Data Envelopment Analysis (DEA) has become an important tool for the comparison of units in terms of efficiency and has been applied to many fields, see Charnes, Cooper, and Rhodes (1978), Banker, Charnes, and Cooper (1984) and Charnes et al. (1995). Its advantages are well known. Any number of inputs and outputs can be included in the comparison and no specific functional form of their relationship is assumed. Constant, variable, increasing and decreasing returns to scale can be accommodated. However, some difficulties related to the method have not been addressed so far.
In DEA the units of the dominant set, for which no combination of other units exists with lower inputs for the same outputs, are assigned efficiencies of 100%, and other units are expressed in terms of this dominant set. But these units are not necessarily efficient; they are merely dominant, which means that no other units were found that were more efficient. If the units of the dominant set are in reality less than 100% efficient, DEA overestimates their efficiency. The same is then true for the other, non-dominated units. This means that DEA efficiency scores overestimate efficiency and are biased. This has been recognized in theoretical work from Farrell (1957) to Banker (1993), but in applied work it is seldom mentioned.
The bias will depend on the relative size of the dominant set, because the smaller the relative size of this set, the larger the likelihood that its units will be 100% efficient. The size of the dominant set depends on many factors. Apart from the distribution of efficiencies of the units, the most important seem to be the total number of units in the analysis, the number of inputs and outputs, and the assumption as to returns to scale. This study is an attempt to shed light on these relations by using sampling from the units of a large scale DEA application. These units were 1282 branches of a major Canadian bank. The average efficiency found in this study was 50%, which differs from the results found in comparable bank branch studies, such as Sherman and Gold (1985), Parkan (1987), Oral and Yolalan (1990), Vassiloglou and Giokas (1990), Giokas (1991), Tulkens (1993), Sherman and Ladino (1995) and Schaffnit et al. (1995). Table 1 gives an overview of the characteristics of these studies as well as the average efficiencies found. Though these studies differ in many respects, there is a general tendency for the average efficiency to go down as the number of Decision Making Units (DMU's) increases.
In order to study the impact of the number of units on DEA efficiency measurement, this paper uses sampling from the 1282 branches for two different configurations of inputs and outputs and for two different assumptions as to returns to scale. The order of discussion is as follows. First some general theoretical background is given, followed by an explanation of the sampling. Then the data and the models of the bank branch study are described. The sampling experiments and their results are given in the next section, while the last section contains conclusions that are drawn from these results.

Data Envelopment Analysis and Efficiency.
Data Envelopment Analysis provides a measure of the efficiency of a decision making unit (DMU) relative to other such units producing the same outputs with the same inputs. DEA, which was developed by Charnes, Cooper, and Rhodes (1978), is related to the concept of technical efficiency and can be considered as a generalization of the Farrell (1957) efficiency measure.
Consider a number of comparable units, represented by the index k, which have a number of inputs with index i and a number of outputs with index j. The quantity of input i for unit k is then given by x_ki and its quantity of output j by y_kj. The efficiency of the unit k = 0 relative to all units is then determined by the following linear programming problem:

Minimize, with respect to θ and all λ_k,  g = θ
subject to
    Σ_k x_ki λ_k ≤ θ x_0i,  for all i,
    Σ_k y_kj λ_k ≥ y_0j,   for all j,
    λ_k ≥ 0,               for all k.
This problem can be interpreted as that of finding a linear combination of all DMU's producing at least the same outputs as DMU 0 but using at most a fraction θ of its inputs, with θ to be minimized. For λ_0 = 1, θ = 1 is feasible, so that the efficiency has an upper bound of 1.
The formulation given above is input oriented. A similar output oriented formulation is possible, as well as equivalent formulations corresponding to dual linear programming problems.
Consider the following example of five DMU's, each with the same output of 1:

    k   DMU   Input 1   Input 2
    1   A     5         0.5
    2   B     2.5       1
    3   C     1         2.5
    4   D     0.5       5
    5   E     3         3

Figure 1 gives the graphical representation. Points A, B, C, and D are situated on the efficiency frontier, while point E is not efficient, as it uses more inputs than B and C.
The optimal solution is λ_2 = λ_3 = 0.5, λ_1 = λ_4 = λ_5 = 0, and θ = 7/12 = 0.5833. This corresponds with point F, which is a linear combination of B and C, producing one unit of output but using only 0.5833 of the inputs of point E. Hence the efficiency of E is 0.5833, or 58.33%.
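As a check, this small example can be set up directly as a linear program. The following sketch (our own illustration, not the paper's GAMS code; it assumes scipy is available) evaluates DMU E under constant returns to scale:

```python
# Input-oriented CRS (CCR) DEA for the five-DMU example above:
# minimize theta subject to sum_k lambda_k x_ki <= theta * x_0i for each input i
# and sum_k lambda_k >= 1 (every DMU produces one unit of output).
from scipy.optimize import linprog

inputs = {"A": (5, 0.5), "B": (2.5, 1), "C": (1, 2.5), "D": (0.5, 5), "E": (3, 3)}
names = list(inputs)
x0 = inputs["E"]                     # the unit under evaluation

# Decision vector: [theta, lambda_A, ..., lambda_E]
c = [1] + [0] * len(names)           # minimize theta
A_ub, b_ub = [], []
for i in range(2):                   # sum_k lambda_k x_ki - theta * x_0i <= 0
    A_ub.append([-x0[i]] + [inputs[k][i] for k in names])
    b_ub.append(0)
A_ub.append([0] + [-1] * len(names)) # output: -sum_k lambda_k <= -1
b_ub.append(-1)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (1 + len(names)))
print(round(res.x[0], 4))            # theta = 7/12, i.e. 0.5833
```

The optimal lambdas put weight 0.5 each on B and C, reproducing point F of the text.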
The formulation given above is the one given in Charnes, Cooper, and Rhodes (1978), which assumes that production functions have constant returns to scale. This is frequently not realistic, because a small unit may not be made comparable to a large one by simply reducing inputs and outputs by some factor. This can be avoided in a formulation given by Banker, Charnes, and Cooper (1984), which accommodates increasing and decreasing returns to scale. This is achieved by adding to the linear programming problem the convexity constraint

    Σ_k λ_k = 1.
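In code, the convexity constraint is a single equality row added to the CRS program. The sketch below (again our own illustration, assuming scipy) repeats the evaluation of DMU E under variable returns to scale; in this particular example the CRS optimum already satisfies Σ_k λ_k = 1, so the score is unchanged:

```python
# VRS (BCC) variant of the example: same program as before plus the
# convexity constraint sum_k lambda_k = 1.
from scipy.optimize import linprog

inputs = {"A": (5, 0.5), "B": (2.5, 1), "C": (1, 2.5), "D": (0.5, 5), "E": (3, 3)}
names = list(inputs)
x0 = inputs["E"]

c = [1] + [0] * len(names)
A_ub = [[-x0[i]] + [inputs[k][i] for k in names] for i in range(2)]
A_ub.append([0] + [-1] * len(names))
b_ub = [0, 0, -1]
A_eq = [[0] + [1] * len(names)]      # convexity: sum_k lambda_k = 1
b_eq = [1]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (1 + len(names)))
print(round(res.x[0], 4))            # also 7/12 here: lambda_B + lambda_C = 1
```

In general the VRS score can only be greater than or equal to the CRS score, since the VRS feasible set is a subset of the CRS one.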

Data Envelopment Analysis and the Production Frontier.
Data Envelopment Analysis provides estimates of the production frontier. Banker (1993) has shown for the multiple input, one output case and variable returns to scale that under fairly general assumptions the DEA estimator of the production frontier can be interpreted as a Maximum Likelihood Estimator which is biased, but consistent. He indicated that similar results can be obtained for the multiple output case and for constant returns to scale. If the number of DMU's is small, the dominant set resulting from any application of DEA does not exhaust all possible configurations of inputs and outputs and may contain units that are dominated by units that were not included. An overestimation of efficiency may result, which may be high for cases with few DMU's, but which will tend to zero in probability as the number of DMU's increases.
We may also wish to compare a model with another one in which inputs or outputs are aggregated. The linear programming nature of DEA implies that the latter model cannot have higher efficiencies than the former, and will in general have lower efficiencies. This leads to the general expectation that models with a higher number of inputs and outputs will have higher DEA efficiencies. In the following, models with aggregated and disaggregated inputs and outputs will be compared.

For sampling from the units of a DEA application, a framework must be indicated. Consider an infinitely large set of decision making units for which data are available, and a DEA model with a given number of inputs and outputs and a returns to scale assumption. It is further assumed that the efficiencies of the units are given by an application of DEA to this infinite set, and that the units in the dominant set are 100% efficient. The DEA efficiencies of other units will vary from 100% downwards and will be considered as the real efficiencies.
If samples of various sizes are taken from this infinite set, and DEA is applied to these samples, efficiencies will be found that are generally different from the real efficiencies determined from the infinite set. Of particular importance are the efficiencies of the dominant set of the sample. If their real efficiencies are not 100%, the sample efficiencies of these units will be biased and overestimated. Since the efficiencies of the other units in the sample are based on those of the dominant set in the sample, they will be biased in the same direction and to a similar extent.
This can be proved as follows. For the input oriented formulation of DEA, the efficiency score compares the required inputs of the sample efficient set with those of the evaluated unit. If the sample efficient set contains units that are inefficient in the infinite set, a lower input combination can be found using the efficient units of the infinite set.
We may consider taking a number of samples of a certain size, apply DEA to each of the samples, analyze the results in terms of the number of units in the dominant set, their efficiencies, and the efficiencies of the other units, and compare these with the real efficiencies. By varying the size of the samples, information is obtained about the overestimation of efficiencies related to the sample size.
Data for an infinite number of decision making units are not available, unless they are simulated, which has its own difficulties. Instead, a large finite number of units for which data exist may be used, from which samples can be taken. In this case, data for 1282 bank branches were available. This number seemed large enough for practical purposes, but the sampling experiments may indicate to what extent this is true. The results of an application of DEA to these data are used as an approximation of the real efficiencies, with which sample efficiencies can be compared.
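The direction of the bias can be illustrated with a toy population (our own synthetic data, not the bank data): with one input, one output and constant returns to scale, the efficiency of a unit relative to a comparison set is simply its output/input ratio divided by the best ratio in that set, so no linear programming is needed. Sample efficiencies can then never fall below the full-set efficiencies:

```python
# Toy illustration of sampling bias in DEA: one input, one output, CRS.
# Efficiency of unit k relative to set S is (y_k/x_k) / max_{j in S} (y_j/x_j).
import random

random.seed(1)
population = [(random.uniform(1, 10), random.uniform(1, 10)) for _ in range(500)]

def efficiencies(units):
    best = max(y / x for x, y in units)
    return [(y / x) / best for x, y in units]

full_eff = efficiencies(population)          # "real" efficiencies (full set)

# Draw a small sample without replacement and re-evaluate within the sample.
idx = random.sample(range(len(population)), 25)
sample_eff = efficiencies([population[i] for i in idx])

# The sample frontier is never better than the population frontier, so
# within-sample scores overestimate (or equal) the full-set scores.
gaps = [s - full_eff[i] for s, i in zip(sample_eff, idx)]
print(min(gaps) >= 0)                        # always True
print(sum(gaps) / len(gaps))                 # average overestimation
```

The average gap shrinks as the sample size grows, which is the effect the sampling experiments below quantify for the bank data.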

Data and Model
The data originate from a major Canadian bank with a large branch network of 1282 branches. Branch size varies, with the largest branches having a size of 100 times that of the smallest. Figure 2 presents these data in terms of one input (salaries) and one output (revenues), showing the range of the data.
Different assumptions as to returns to scale can be made (see Charnes et al. (1995)). Here the two most important possibilities will be chosen, namely Constant Returns to Scale (CRS) and Variable Returns to Scale (VRS). The model is further determined by the choice of the inputs and outputs for DEA purposes. Here we shall use the so-called production approach (see Ferrier and Lovell (1990)) or value added approach (see Berg, Førsund, and Jansen (1990)), where the volumes of the different kinds of deposits and loans are considered outputs. Inputs are defined in terms of various kinds of costs. Two models were considered, one with 3 inputs and 3 outputs, and one with 6 inputs and 12 outputs.
For the first model, the following inputs and outputs were used:

The Sampling Experiments.
Samples without replacement are taken from the 1282 units representing bank branches of a major Canadian bank. As the number 1282 is close to 1280 = 10 × 2^7, sample sizes of 640, 320, 160, 80, 40, and 20 are used. The number of samples taken for each sample size is 10. Two choices for inputs and outputs were made, with the first having 3 inputs and 3 outputs (case 3I,3O) and the second 6 inputs and 12 outputs (case 6I,12O). Both constant returns to scale (CRS) and variable returns to scale (VRS) were considered. Altogether, DEA was applied 4 + 10 × 6 × 2 × 2 = 244 times and 55,528 linear programming problems were solved. Calculations were performed with the General Algebraic Modeling System (GAMS) and spreadsheets.
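The counts above can be verified directly: there are 4 full-set runs (two input-output choices times two scale assumptions) plus 10 samples for each of 6 sizes under each of the 4 model configurations, with one linear program solved per evaluated unit:

```python
# Verify the run and LP counts of the sampling experiment.
sizes = [640, 320, 160, 80, 40, 20]
configs = 2 * 2                      # (3I,3O or 6I,12O) x (CRS or VRS)
samples_per_size = 10

dea_runs = configs + samples_per_size * len(sizes) * configs   # full set + samples
lps = configs * 1282 + configs * samples_per_size * sum(sizes) # one LP per unit
print(dea_runs, lps)                 # 244 55528
```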
For any particular sample, the efficiencies obtained when DEA is applied to these units only may be compared with the efficiencies of the same units when evaluated by DEA using all units. In Figure 3 these efficiencies are compared for a sample of 40 for the 3I,3O,CRS case. All points are on or above the 45-degree line, since the sample efficiencies cannot be lower than the full set efficiencies. In the sample evaluation there are 8 units that are 100% efficient, of which only one unit was 100% efficient in the full set evaluation. In the following, the average results for 10 samples of different sample sizes and model specifications are discussed. Table 2 gives an overview of the results for this case in terms of the dominant set. The average full set efficiency E of the units in the sample dominant set can be estimated as a function of the sample size SS, giving E = 0.503 SS^0.089. This implies that if a sample of SS units is taken, the units that constitute the dominant set, and that are therefore declared 100% efficient, can be expected to have in reality an efficiency of 0.503 SS^0.089. The other, less efficient units of the sample may be given a similar efficiency correction, though this will result in some underestimation, as they are compared with the dominant units of the sample instead of the dominant units of the population.
A maximum percentage could be given for the number of units in the dominant set. If this percentage were set, somewhat arbitrarily, at 5%, then, according to Table 2, the sample size should be at least 320. This average efficiency of the dominant set at 5% depends, of course, on the real efficiencies of the units. Table 2 indicates that in this case the average efficiency of the maximum dominant set is 84%, which implies a bias of 19%. Consider now the number of units in the dominant set. For the 10 samples of size 640, the average was 17.2, the lowest number found was 9, and the highest 24, with the average minus and plus twice the standard deviation at 14.3 and 20.0. Even though the distribution cannot be normal, since the number of units is discrete, this gives a good idea of the variability.
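The arithmetic above can be checked against the fitted relation E = 0.503 SS^0.089 (the relation is the paper's; the helper name below is our own):

```python
# Expected real efficiency of a sample dominant set as a function of sample
# size SS, using the paper's fitted relation E = 0.503 * SS**0.089.
def dominant_set_efficiency(ss):
    return 0.503 * ss ** 0.089

e_320 = dominant_set_efficiency(320)
bias = 1 / e_320 - 1      # relative overestimation of a nominally 100% unit
print(round(e_320, 2), round(100 * bias))  # 0.84 19
```

A sample of 320 units thus yields a dominant set whose real efficiency is about 84%, i.e. its units are overestimated by about 19%, as stated in the text.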

For the full set, the number of units in the dominant set was 15, which can be considered as the result for one sample of 1282 units. For increasing sample sizes, the average number in the dominant set increases, but if upper and lower limits of twice the standard deviation are taken into account, it seems that for sample sizes above 100 the average does not change significantly. An asymptotic value between 16 and 18 may be assumed from the results given in Table 2.

Results for the 3I,3O,VRS Case.
Let us now consider the results for variable returns to scale. For the full set of 1282 units, the number of units in the dominant set was 33, and the average efficiency score was 54%, which is significantly higher than the 50% found for constant returns to scale. This is probably related to the increased size of the dominant set, which is now 3% of the total number of units. Note that in the CRS case, a 3% dominant set for 640 units has an average full set efficiency of 91%, which implies an overestimation of 11% if a sample of 640 is used. Table 3 gives the results for the dominant sets. From the second and the last two columns of this table, it may be concluded that the average number of units of the dominant set has stabilized for sample sizes above 320 at about 31, with 33 as just the sample value for 1282 units.
Also here, an increased sample size leads to smaller errors in the estimation of the production frontier. The efficiency of the dominant set as a function of sample size can be estimated as in the CRS case. The following result is found:

    E = 0.54 SS^0.075    (R² = 0.91, t = 16.3).

If the 5% rule is used, a sample of 640 units is needed to reduce the dominant set to 5%. The average full set efficiency for such samples is 93%, which implies that the sample efficiency overestimates the full set efficiency by 7.5%. To this could be added the bias resulting from a larger dominant set for 1282 units. The assumption of variable returns to scale thus approximately doubles the size of the sample required to obtain the same accuracy as under constant returns to scale.
The size of the dominant set, which is 33 for the full set, is about double that for CRS. It does not appreciably change for samples of 320 and higher. A larger dominant set must lead to a higher average efficiency than in the CRS case. In accordance with this, we find an average efficiency of 54%, versus 50.3% in the CRS case.
If inputs and outputs are disaggregated into separate parts, the number of inputs and outputs increases. This has an impact on the results of DEA. Here we consider the case in which the three inputs are increased to six by splitting up salaries into various groups, and instead of three outputs, twelve are taken by dividing up deposits, retail loans, and commercial loans according to type. Table 4 gives the results for the constant returns case.
The average number in the dominant set now increases over the entire range of sample sizes. If it has a finite asymptotic value, it is probably much larger than 1282. In practical cases such large samples can almost never be obtained. The average efficiency, which is 79%, is much higher than in the 3I,3O case, which can be explained by the substantial bias induced by the large relative size of the dominant set. The efficiency of the dominant set as a function of sample size can again be estimated, which gives a fit less satisfactory than before. The main reason for this is that the average efficiency of 79% for the 1282 set is somewhat out of line with the other observations. This is probably related to the fact that the dominant set for 1282 units is far from 100% efficiency. Even for 1282 units, the percentage in the dominant set equals 20%. Only for samples larger than a few hundred units is the dominant set smaller than 50%. It is obvious that the results for this case are not realistic, in the sense that the efficiency scores do not reflect real efficiency.
For the corresponding variable returns to scale case, the results are given in Table 5. The percentage in the dominant set seems to go below 50% at about 500 units. It could be that here also a doubling of the number of units is required to obtain the same accuracy as in the constant returns case. There seems to be no realistic number of units that will reduce the dominant set to 5% or less.

Conclusions.
In applications of Data Envelopment Analysis it is implicitly assumed that the units of the dominant set are 100% efficient. This assumption leads to a biased efficiency evaluation, and the bias tends to be larger when the number of units is smaller.
Sampling from a large number of units from the data of a bank branch study made it possible to analyze the impact of the number of units on the efficiency scores. This was done by studying the absolute and relative size of the dominant set for varying numbers of units, for different returns to scale assumptions, and for varying numbers of inputs and outputs.
It was found that for 3 inputs and 3 outputs and constant returns to scale, a reasonably accurate estimation of efficiency was possible if the number of units was at least a few hundred. For the corresponding variable returns to scale model, this number should be roughly doubled. For the 6 inputs, 12 outputs cases, the relative size of the dominant set is too large for any realistic number of units to yield reasonable accuracy in efficiency measurement.
The results obtained are valid for the bank branch data and model. This does not preclude that in other cases different results will be found. However, it must be expected that in most efficiency studies the data will have a similar dispersion over the input and output spaces, which will give similar results.
As most DEA studies have at least three inputs and three outputs and fewer than 100 units, their efficiency scores must be severely biased in an upward direction.
However, this does not make these DEA results meaningless, as the scores of inefficient units may be interpreted as relative to the dominant set. Furthermore, these results may be used to propose improvements by noting the input or output combinations corresponding to the optimal DEA solution. But the caveat should be given that, because of the small number of units, many better solutions may go undetected.