A New Measure for Detecting Influential DMUs in DEA

Introduction
Data envelopment analysis (DEA) is a nonparametric method for evaluating the relative efficiency of decision-making units (DMUs) on the basis of multiple inputs and outputs. In recent years DEA has played an important role in applications across many fields, such as energy [1, 2], banking [3], and sport [4, 5].
DEA is a useful technique for evaluating the performance of DMUs; however, if a data set contains one or more influential DMUs, the performance results calculated by DEA change. Influential DMUs are atypical observations. Some of them result from recording or measurement errors and should be corrected (if possible) or deleted from the data. Detecting influential observations therefore plays an important role in DEA [6].
Influential observations were first introduced by Cook [7] in regression analysis: an influential observation has a noticeable effect on the estimated parameters and fitted values of the regression. He also proposed a practical statistic, called Cook's distance, which is based on the Mahalanobis distance. Several methods and statistics were subsequently proposed to detect influential observations in regression, most of them based on the case deletion approach. Some of these methods and statistics are given by Belsley et al. [8], Cook and Weisberg [9], and Chatterjee and Hadi [10].
The general approach for detecting influential observations is the case deletion technique, which is applied through single and multiple case deletions [8]. In single case deletion, the $p$th observation is eliminated from the data, and the result of the computation is then compared with the result computed using all the data. Multiple case deletion generalizes single case deletion: $k$ observations are eliminated, where $1 < k < n/2$ and $n$ is the number of observations.
The main idea behind influential observations in DEA is similar to that in regression analysis. Indeed, an influential DMU is an efficient DMU that extends the production possibility set toward its own coordinates, and it may therefore cause several problems:
(1) The influential DMU may cause another DMU to be inefficient, although that DMU would be efficient if the influential DMU were omitted.
(2) The influential DMU may decrease the superefficiency scores of some efficient DMUs.
(3) The influential DMU may decrease the efficiency scores of some inefficient DMUs.
The last item is particularly significant, because one of the main objectives of DEA is to identify the efficient DMUs and then to offer suggestions for improving the efficiency of the inefficient DMUs. Clearly, influential DMUs may lead to wrong suggestions for improving the efficiency of inefficient DMUs.
One of the first proposals for detecting influential DMUs in DEA was given by Wilson [6]. He proposed a method based on superefficiency scores from a modified DEA model that incorporates the case deletion technique. This method allows the researcher to prioritize observations in the efficient subset for further scrutiny. The prioritization depends on the number of efficiency scores that are influenced by a given observation.
Pastor et al. [11] proposed a method for detecting influential DMUs that is based on a uniformly most powerful test. They consider the BCC model and define, for each DMU, the ratio of its efficiency score obtained from all DMUs to its efficiency score obtained after eliminating a given efficient DMU. They then define a binary variable according to whether this ratio is smaller or larger than 0.95, and obtain a binomial variable as the sum of these binary variables. Ruiz and Sirvent [12] use a similar approach and propose an alternative method for identifying influential DMUs in radial and nonradial DEA.
The method proposed by Jahanshahloo et al. [13] aims to detect influential DMUs by way of the deterioration in the efficiency scores of inefficient DMUs in radial DEA. They focused on the BCC model but pointed out that the method can also be used with the CCR model. This method is based on a specific ratio similar to the one proposed by Pastor et al. [11].
In this study we propose a new method for detecting influential DMUs. The method is based on the Euclidean distance and on excluding the efficient DMUs one at a time by single case deletion.
The structure of this study is as follows. The next section presents some basic concepts of DEA. In Section 3, we discuss influential observations in DEA and propose a new approach for detecting influential DMUs. Section 4 illustrates the new method with an example. Finally, conclusions are given in Section 5.

Data Envelopment Analysis
DEA was first introduced by Charnes et al. [14]. They proposed the CCR model, which is also called the Constant Returns to Scale (CRS) model. The CCR model evaluates both technical and scale efficiencies via the optimal value of the ratio form. A modified version of the CCR model is the BCC model, also called the Variable Returns to Scale (VRS) model, proposed by Banker et al. [15]. The BCC model is used to estimate the pure technical efficiency of DMUs by reference to the efficiency frontier.
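DEA can be applied in two forms, called input-oriented and output-oriented models. The primal form of the input-oriented BCC (VRS) model is considered in this paper. For $n$ DMUs with $m$ inputs and $s$ outputs, it is given as follows:
$$
\begin{aligned}
\min\ & \theta_o \\
\text{s.t.}\ & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta_o x_{io}, \quad i = 1, \ldots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, \quad r = 1, \ldots, s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \\
& \lambda_j \ge 0, \quad j = 1, \ldots, n,
\end{aligned}
\tag{1}
$$
where $\theta_o$ is the efficiency score of DMU$_o$, $x_{ij}$ and $y_{rj}$ (all nonnegative) are the $i$th input and $r$th output of DMU$_j$, respectively, and $\lambda_j$ is the intensity of DMU$_j$. If $\theta_o$ is equal to one, then DMU$_o$ is called an efficient DMU.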
DEA analyses usually yield a large number of efficient DMUs. Because their scores are all equal to one, efficient DMUs cannot be compared with one another. To overcome this problem, the analyst can either add new DMUs to the data set or rank the efficient DMUs by some criterion. Andersen and Petersen [16] proposed a model for obtaining superefficiency scores, which are useful both for ranking the efficient DMUs and for comparing them with one another.
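The primal form of the input-oriented BCC (VRS) superefficiency model is considered in this paper. It differs from the BCC model only in that the DMU under evaluation is excluded from the reference set:
$$
\begin{aligned}
\min\ & \theta^*_o \\
\text{s.t.}\ & \sum_{j \ne o} \lambda_j x_{ij} \le \theta^*_o x_{io}, \quad i = 1, \ldots, m, \\
& \sum_{j \ne o} \lambda_j y_{rj} \ge y_{ro}, \quad r = 1, \ldots, s, \\
& \sum_{j \ne o} \lambda_j = 1, \\
& \lambda_j \ge 0, \quad j \ne o,
\end{aligned}
\tag{2}
$$
where $\theta^*_o$ is the superefficiency score of DMU$_o$. In the input-oriented BCC model the superefficiency scores of efficient DMUs are greater than or equal to one, whereas the superefficiency scores of inefficient DMUs are the same as their efficiency scores obtained by the BCC model in (1).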
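As a rough illustration of the two models, the following Python sketch computes input-oriented BCC efficiency and superefficiency scores in the special case of one input and one output, where the linear program can be solved by inspecting single DMUs and pairs of DMUs. The function names, the `toy` data, and the restriction to a single input and output are assumptions made for this example; a general implementation would use a linear programming solver instead.

```python
def min_input_on_frontier(points, y_target):
    """Smallest input attainable by a convex combination of `points`
    whose output is at least `y_target`; None if no combination qualifies."""
    feasible = [x for x, y in points if y >= y_target]
    if not feasible:
        return None
    best = min(feasible)
    # With one input and one output, an optimal combination uses at most
    # two DMUs, so it suffices to check segments crossing the level y_target.
    for xj, yj in points:
        for xk, yk in points:
            if yj > y_target > yk:
                t = (y_target - yk) / (yj - yk)
                best = min(best, xk + t * (xj - xk))
    return best


def bcc_input_score(data, o, superefficiency=False):
    """Input-oriented BCC score of DMU `o` (inputs assumed positive).

    With superefficiency=True the evaluated DMU is dropped from the
    reference set; the result is None when that problem is infeasible.
    """
    x_o, y_o = data[o]
    reference = [xy for name, xy in data.items()
                 if not (superefficiency and name == o)]
    x_min = min_input_on_frontier(reference, y_o)
    return None if x_min is None else x_min / x_o


# Hypothetical data: name -> (input, output). Not the data of Table 1.
toy = {"A": (1.0, 1.0), "B": (2.0, 3.0), "C": (4.0, 4.0), "D": (3.0, 2.0)}
for name in toy:
    print(name, bcc_input_score(toy, name), bcc_input_score(toy, name, True))
```

In this toy set the DMU with the largest output has no feasible superefficiency problem under variable returns to scale, which the sketch reports as `None` rather than as a number.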

A New Measure for Detecting Influential DMUs
DMUs fall into two groups: influential DMUs and noninfluential DMUs. An influential DMU is defined as a DMU that affects the efficiency scores of some inefficient DMUs [6]. Such a DMU also changes the production possibility set, extending it toward its own coordinates. In this study we classify noninfluential DMUs into three groups as follows.
(1) The first group consists of efficient DMUs whose efficiency scores are not affected by including or excluding the influential DMU.
(2) The second group consists of inefficient DMUs that remain inefficient whether the influential DMU is included or excluded.
(3) The third group consists of inefficient DMUs that become efficient when the influential DMU is excluded.
Clearly, DMUs of the first type lie on the efficiency frontier of the BCC model, so their efficiency scores are equal to one. Table 1 lists 12 artificial DMUs together with their efficiency and superefficiency scores under the input-oriented BCC model. The data consist of one input variable ($x$) and one output variable ($y$). Let $E_1$ and $E_2$ be the sets of efficient and inefficient DMUs, respectively. If the data contain an influential DMU, it belongs to $E_1$, since an influential DMU is also an efficient DMU. In Table 1, $E_1$ contains five DMUs and $E_2$ contains seven.
To identify influential DMUs in these data, we examine the scatter plot of $x$ and $y$. Figure 1 shows the pattern of the DMUs in Table 1 together with the efficiency frontier of the BCC model. One DMU in $E_1$ is clearly influential, since it extends the production possibility set toward its own coordinates (see Figure 1). It also affects the superefficiency scores of some efficient DMUs and the efficiency scores of some inefficient DMUs. When this DMU is excluded, the other four DMUs in $E_1$ keep their efficiency scores, although their superefficiency scores are affected by the influential DMU; these are DMUs of the first type. Six of the DMUs in $E_2$ remain inefficient, although the efficiency scores of some of them may change; these are DMUs of the second type. Finally, the remaining DMU in $E_2$ becomes efficient when the influential DMU is omitted: it lies on the frontier after the omission, and the production possibility set becomes smaller. This DMU is of the third type. Of course, this discussion is based only on visual inspection of the data, and a reliable method is needed for detecting influential DMUs.
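The three-way classification can be expressed as a small rule on the efficiency scores before and after one DMU is omitted. The Python sketch below is only an illustration: the function name, the tolerance `tol`, and the scores in the example are hypothetical and are not taken from Table 1.

```python
def classify_noninfluential(theta_all, theta_deleted, tol=1e-9):
    """Assign each remaining DMU to group 1, 2, or 3.

    theta_all:     efficiency scores computed from all DMUs
    theta_deleted: scores recomputed after omitting one efficient DMU
                   (the omitted DMU is absent from this mapping)
    """
    groups = {}
    for j, after in theta_deleted.items():
        before = theta_all[j]
        if abs(before - 1.0) <= tol:
            groups[j] = 1   # efficient before and after the omission
        elif abs(after - 1.0) <= tol:
            groups[j] = 3   # inefficient before, efficient after
        else:
            groups[j] = 2   # inefficient before and after
    return groups


# Hypothetical scores: DMU "P" is the omitted efficient DMU.
theta_all = {"P": 1.0, "Q": 1.0, "R": 0.8, "S": 0.6}
theta_without_p = {"Q": 1.0, "R": 0.9, "S": 1.0}
print(classify_noninfluential(theta_all, theta_without_p))
```

The rule relies on the fact that removing a DMU can only shrink the production possibility set, so a DMU that was efficient before the omission stays efficient afterwards.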

Detecting Influential DMUs.
We now present a new method for identifying influential DMUs. Suppose the data consist of $n$ observations, $\Phi = \{1, 2, \ldots, n\}$, where each $j = 1, 2, \ldots, n$ refers to the $j$th DMU, and let $E_1$ and $E_2$ be the sets of efficient and inefficient DMUs, respectively. Let $\theta_{\text{all}} = [\theta_1, \theta_2, \ldots, \theta_n]^T$ be the $n \times 1$ vector of efficiency scores obtained by the input-oriented BCC model from all the data, where $\theta_j$ is the efficiency score of the $j$th DMU. Let $\theta_{j(p)}$ be the efficiency score of the $j$th DMU obtained by the input-oriented BCC model after omitting the $p$th DMU from the data ($\theta_{p(p)}$ cannot be calculated, because the $p$th DMU is omitted), and let $\theta_{(p)}$ be the $(n - 1) \times 1$ vector of these efficiency scores:
$$
\theta_{(p)} = [\theta_{1(p)}, \ldots, \theta_{p-1(p)}, \theta_{p+1(p)}, \ldots, \theta_{n(p)}]^T .
$$
To compare $\theta_{\text{all}}$ and $\theta_{(p)}$, the two vectors must have the same dimension, so we set $\theta_{p(p)} = 1$. Then $\theta_{(p)}$ can be rewritten as
$$
\theta^*_{(p)} = [\theta_{1(p)}, \ldots, \theta_{p-1(p)}, 1, \theta_{p+1(p)}, \ldots, \theta_{n(p)}]^T .
$$
For any $p \in E_1$, we propose to measure the influence of the $p$th efficient DMU by the Euclidean distance. The squared Euclidean distance between $\theta_{\text{all}}$ and $\theta^*_{(p)}$ is
$$
D_p = \lVert \theta_{\text{all}} - \theta^*_{(p)} \rVert^2 = \sum_{j \ne p} \left( \theta_j - \theta_{j(p)} \right)^2 ,
$$
where $\lVert \cdot \rVert$ denotes the $L_2$ norm; the $p$th term vanishes because $\theta_p = 1$ for $p \in E_1$. For any $p \in E_1$, DMU$_p$ is an influential DMU if $D_p > c$, where $c$ is a cut-off point (upper bound) defined in terms of the mean and variance of the distances $D_p$, $p \in E_1$. The main problem with this cut-off point, however, is that both the mean and the variance are nonrobust: extreme values inflate them, yielding a high cut-off point. This problem can be avoided by replacing the mean and variance with more robust estimators, namely the median and the median absolute deviation, respectively. This criterion was first proposed by Hadi [17] to detect influential observations in linear regression; for details see also [10, 18].
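The distance and a robust cut-off can be sketched in Python as follows. The exact cut-off formula is not reproduced in this text, so `robust_cutoff` uses an assumed form, median plus a multiple of the median absolute deviation rescaled by 0.6745. The `multiplier` and `scale` arguments, the function names, and the example scores are all hypothetical.

```python
from statistics import median


def squared_distance(theta_all, theta_deleted, p):
    """Squared Euclidean distance between the full-sample scores and the
    scores recomputed after omitting the efficient DMU `p`."""
    total = 0.0
    for j, theta_j in theta_all.items():
        # The omitted DMU has no recomputed score; the text sets it to 1.
        theta_jp = 1.0 if j == p else theta_deleted[j]
        total += (theta_j - theta_jp) ** 2
    return total


def robust_cutoff(distances, multiplier, scale=0.6745):
    """Assumed robust cut-off: median + multiplier * (MAD / scale)."""
    values = list(distances)
    med = median(values)
    mad = median(abs(v - med) for v in values) / scale
    return med + multiplier * mad


# Hypothetical scores: "A" and "B" are efficient, "C" and "D" are not.
theta_all = {"A": 1.0, "B": 1.0, "C": 0.6, "D": 0.8}
theta_without_a = {"B": 1.0, "C": 0.9, "D": 0.8}
print(squared_distance(theta_all, theta_without_a, "A"))

# Hypothetical distances for five efficient DMUs, multiplier chosen as 3.
distances = {"A": 0.0, "B": 0.1, "C": 0.2, "D": 0.3, "E": 5.0}
c = robust_cutoff(distances.values(), multiplier=3)
print([p for p, d in distances.items() if d > c])
```

Keeping the multiplier as an explicit argument avoids committing to a constant that cannot be confirmed from the text.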
To illustrate the method, we use the artificial data in Table 1; the results are shown in Table 2. For these data the cut-off point is $c = 0.0211$. Two DMUs exceed this cut-off point and are therefore influential DMUs. Let us look at these influential DMUs more closely. When an influential DMU is omitted, the efficiency scores of four efficient DMUs do not change and remain equal to one. However, the efficiency scores of some inefficient DMUs increase, one of them to 0.7143, which is a noticeable rise. As Figure 1 shows, omitting the influential DMU produces a new frontier (the dotted frontier) that lies closer to this point, which raises the efficiency scores of these inefficient DMUs. In addition, omitting an influential DMU raises the efficiency score of one inefficient DMU to 1. This rise is also noticeable, since the omission produces a new frontier (the dotted frontier) that passes through the point of that DMU.

Empirical Example
In this section an empirical example is presented to examine the new method proposed in Section 3.1. We use meteorological data for 50 regions in January 2010 (from the Turkish State Meteorological Service), which are shown in Table 3. The data consist of one output variable $y$ and two input variables $x_1$ and $x_2$, where $y$ is the average solar radiation (W/m$^2$), $x_1$ is the average duration of sunlight exposure (hours), and $x_2$ is the average wind velocity (m/s). Table 4 provides the efficiency scores ($\theta$) and superefficiency scores ($\theta^*$) for these data. It also provides diagnostic results, namely the efficiency scores ($\theta_{(p)}$) obtained by omitting the $p$th efficient DMU and the squared Euclidean distance $D$ discussed in Section 3.1.
Table 4 shows that DMU$_2$, DMU$_9$, DMU$_{21}$, and DMU$_{41}$ are the efficient DMUs. When DMU$_9$ is omitted, the inefficient DMU$_6$ becomes efficient, and the efficiency scores of the inefficient DMU$_1$, DMU$_5$, and DMU$_{12}$ increase markedly. DMU$_{12}$ is a remarkable example: its efficiency score rises from 0.6008 to 0.9016. When DMU$_{21}$ is excluded, the scores of DMU$_{43}$ and DMU$_{49}$ increase so much that these two DMUs become efficient. Furthermore, the efficiency scores of all the other inefficient DMUs (except DMU$_1$) increase, although these DMUs remain inefficient. Evidently, omitting an efficient DMU influences the efficiency scores of the inefficient DMUs that use it as a reference.
The distances $D_p$ for the efficient DMUs are presented in the last column of Table 4: $D_2 = 0.0147$, $D_9 = 0.4528$, $D_{21} = 0.8841$, and $D_{41} = 0.0268$. The cut-off point is $c = 0.8572$. By the measure discussed in Section 3.1, only $D_{21} > c$; therefore DMU$_{21}$ is the only influential DMU in these data.
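The decision step can be checked directly from the reported values. The short Python snippet below uses only the four distances and the cut-off point quoted above; the variable names are chosen for this illustration.

```python
# Squared distances D_p reported for the efficient DMUs, keyed by DMU index.
reported_distances = {2: 0.0147, 9: 0.4528, 21: 0.8841, 41: 0.0268}
cutoff = 0.8572  # reported cut-off point c

# A DMU is flagged as influential when its distance exceeds the cut-off.
influential = [p for p, d in reported_distances.items() if d > cutoff]
print(influential)  # [21]
```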

Conclusion
In this study we classify noninfluential DMUs into three groups, which are as follows.
(1) The first group consists of efficient DMUs whose efficiency scores are not affected by including or excluding the influential DMU.
(2) The second group consists of inefficient DMUs that remain inefficient whether the influential DMU is included or excluded.
(3) The third group consists of inefficient DMUs that become efficient when the influential DMU is excluded.
Clearly, DMUs of the first type lie on the efficiency frontier of the BCC model, so their efficiency scores are equal to one. We then propose a new method for detecting influential DMUs, which is based on the Euclidean distance and on omitting the efficient DMUs by single case deletion. We apply this method to meteorological data for 50 regions in January 2010 (from the Turkish State Meteorological Service). The method is also more practical than some other similar methods, since the measure is based on single case deletion.

Figure 1: Efficiency frontier and production possibility set of artificial data.

Table 1: Artificial data with efficiency ($\theta$) and superefficiency ($\theta^*$) scores by the input-oriented BCC model.
Here $\lVert \cdot \rVert$ denotes the $L_2$ norm. The distance $D_p$ is calculated only to investigate the influence of efficient DMUs, that is, for $p$ in the set $E_1$ of efficient DMUs. If the $p$th DMU has little influence on the inefficient DMUs, then $D_p$ is smaller than the distances of the other efficient DMUs; if it has more influence, then $D_p$ increases. Hence, if the largest distance is $D_p$, the $p$th DMU has the most influence on the inefficient DMUs. We have to determine an upper bound (cut-off point) $c$ such that, for any $p \in E_1$, $D_p > c$ means that the $p$th DMU is an influential one. Let $D = \{D_p \mid p \in E_1\}$; the upper bound used to detect influential DMUs is defined in terms of this set.
$y$: average solar radiation, $x_1$: average duration of sunlight exposure, and $x_2$: average wind velocity.

Table 4: Efficiency scores, superefficiency scores, and diagnostic results in the meteorological data.