
Journal of Applied Mathematics (JAM)

ISSN 1687-0042, 1110-757X

Hindawi Publishing Corporation

2013, Article ID 102163

DOI: 10.1155/2013/102163

Research Article

A Distribution-Free Approach to Stochastic Efficiency Measurement with Inclusion of Expert Knowledge

Kerry Khoo-Fazari,¹ Zijiang Yang,² and Joseph C. Paradi³

ORCID: http://orcid.org/0000-0001-6469-5217

¹ TD Canada Trust, Toronto, ON, Canada

² School of Information Technology, York University, 4700 Keele Street, Toronto, Canada M3J 1P3

³ Centre for Management of Technology and Entrepreneurship, University of Toronto, 200 College Street, Toronto, ON, Canada M5S 3E5

Academic Editor: Suh-Yuh Yang

Received 24 October 2012; Accepted 11 May 2013; Published 24 June 2013

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper proposes a new efficiency benchmarking methodology that can incorporate probability while preserving the advantages of a distribution-free, nonparametric modeling technique. The new technique, referred to as the DEA-Chebyshev model, builds on Data Envelopment Analysis (DEA), the model pioneered by Charnes, Cooper, and Rhodes in 1978. Combining standard DEA with the DEA-Chebyshev frontier (DCF) provides a framework for evaluation that draws on both quantitative data and qualitative management knowledge. The DEA-Chebyshev model was tested on a simulated dataset, and the results show statistically that it is effective in predicting a new frontier against which DEA-efficient units can be further differentiated and ranked. The method improves on other approaches in that it is practical, easy to implement, and not computationally intensive.

1. Introduction

There has been a substantial amount of research on the stochastic evaluation of efficiency, including the stochastic frontier approach (SFA) [1, 2], stochastic data envelopment analysis (DEA) [3, 4], chance-constrained programming (CCP) efficiency evaluation [5–8], and statistical inference to deal with variations in data. The problems associated with these methodologies range from the need to specify a functional form or parameterization to the need for a substantial amount of (time series) data. Relying on past and present data alone to estimate the efficient frontier may also be unsuitable today because of the rapid evolution of difficult-to-observe "nuisance" parameters. Hence, management's expert opinion should not be excluded from efficiency analyses.

This paper proposes a new efficiency benchmarking methodology that is capable of incorporating probability while still preserving the advantages of a function-free and nonparametric modeling technique. This new technique will be known as the DEA-Chebyshev model. Its objectives are, first, to distinguish among top performers and, second, to define a probable feasible target for the empirically efficient units (as identified by the usual DEA models) with respect to the DEA-Chebyshev frontier (DCF). This is achieved by combining management's expertise (the qualitative component) with the available data (the quantitative component) to infer the new frontier. The DEA-Chebyshev model is founded on DEA, the model pioneered by Charnes et al. in 1978 [10]. DEA is a deterministic approach that requires no distributional assumptions or functional forms with predefined parameters; its main drawback, shared by deterministic approaches in general, is that it makes no allowance for random variations in the data. The DEA methodology has been chosen as the foundation for this research because of the following advantages:

(i) It is nonparametric and does not require a priori assumptions regarding the distribution of the data.

(ii) It can simultaneously handle multiple inputs and outputs without prior judgments of their relative importance (i.e., it is function-free).

(iii) It provides a single measure of performance based on multiple inputs and outputs.

DEA ensures that the production units being evaluated will only be compared with others from the same “cultural” environment, provided, of course, that they operate under the same environmental conditions.

The rest of the paper is organized as follows. Section 2 gives a brief literature review. Section 3 describes some possible causes of data discrepancies that may or may not be observable and their effects on the variables. Section 4 discusses the assumptions and mathematical formulation of DEA-Chebyshev model. Section 5 provides the simulation and comparison with other efficiency evaluation techniques. Finally, our conclusions are presented in Section 6.

2. Literature Review

This section reviews past and present research on stochastic models and weight-restricted models designed for performance measurement. These works show the relevance of well-known methodologies for estimating efficiency scores and for constructing an approximate frontier that accounts, as far as possible, for noise, which can affect the efficiency evaluation of human performance-dependent entities in diverse ways.

2.1. Stochastic Frontier Approach

Aigner et al. [1] and Meeusen and van den Broeck [2] independently and simultaneously proposed a stochastic frontier model, known as the Stochastic Frontier Approach (SFA), for performance evaluation. SFA uses econometric methods to estimate the efficient frontier. It has four main drawbacks. First, weights (or parameters) have to be predefined to determine its functional form, which requires parameterization. Second, a distributional form must be specified to estimate the random errors. Third, multiple outputs are not easy to incorporate into the model. Finally, samples have to be large enough to infer the distributional form of the random errors.

2.2. Stochastic DEA

Stochastic DEA is a DEA method that attempts to account for and filter out noise by incorporating stochastic variations of inputs and outputs while still maintaining the advantages of DEA [4]. The method relies on the premise that an optimal solution for industrial efficiency always exists. The variability in outputs is dealt with using the risk-averse efficiency model of Land et al. [11] with a risk preference function. Kneip and Simar [3] proposed a nonparametric estimation of each decision-making unit's (DMU's) production function using panel data over T time periods, which filters the noise from the outputs. The fitted output values, together with the inputs, are then evaluated using DEA; in this instance, efficiency is determined by the distance of the observed DMUs from the estimated frontier. The drawback of this method is that a reasonable estimate of efficiency can be obtained only when T and q (the number of DMUs) are sufficiently large.

2.3. Chance-Constrained DEA

Chance-constrained programming was first developed by Charnes and Cooper [5] and Kall [7] as an operational research approach for optimizing under uncertainty when some coefficients are random variables distributed according to some laws of probability. Past CCP DEA models generally assumed that variations observed in the outputs follow a normal distribution. Variations in inputs are assumed to be the cause of inefficiency [12], while random noise occurs in outputs. Since the distribution of inefficiency is uncertain (although it is theoretically assumed to be half-normal or gamma), the chance-constraint formulation is not applied to the input constraints: inputs are held deterministic, while outputs are stochastic. Olesen and Petersen [9] state that the hypothesis concerning the amount of noise in the data cannot be tested; using panel data, however, variations in the data can be dichotomized into noise and inefficiency. Another variation of CCP DEA was introduced by Cooper et al. [6] using "satisficing" concepts. These concepts are used to interpret managerial policies and rules in order to determine the optimizing and satisficing actions, which are distinguished from inefficiencies. Optimizing and satisficing can be regarded as mutually exclusive events: the former represents physical possibilities or endurance limits, and the latter represents aspiration levels.

All these CCP formulations have assumed normal distributions for the probability of staying within the constraints. This approach is effective when qualitative data are not available. However, management's expert opinion regarding the dispersion of the data from their expected or correct values cannot be discounted. Unfortunately, current CCP is strictly a quantitative analysis based on empirical data whose variations are assumed to follow a predefined distributional form.

2.4. Assurance Region and Cone-Ratio Models

In an "unrestricted" DEA model, the weights are assigned to each DMU so that it appears as favourable as possible, which is an inherent characteristic of DEA. Hence, a concern arises when widely different weights are assigned to the same inputs and outputs in the LP solutions for different DMUs. This motivated the development of weight-restricted models such as the "assurance region" (AR) [13, 14], the "cone-ratio" (CR) [15], and other variations of these models.

The motivation behind weight-restricted models is to redefine the DEA frontier so as to make it as practical as possible, that is, to curb DEA's inherent tendency to assign unrealistically small or large weights to certain inputs or outputs. In contrast, stochastic frontier models redefine the frontier in the presence of noise or data disparity. Stochastic approaches are designed to evaluate DMUs on the understanding that constraints may, realistically, not always hold because of noise. Weight restrictions are also applicable in stochastic approaches.

Weight restriction models deal directly with the model's inconsistencies in a practical sense using qualitative information, whereas stochastic models deal with data discrepancies and inconsistencies using quantitative approaches to infer the degree of data disparity. Although the motivations of these two methods are similar, the underlying objectives of their development are not the same. Both are valid extensions of the standard DEA model that attempt to correct the frontier.

The Assurance Region (AR) model was developed by Thompson et al. [13] to analyze six sites for the location of a physics lab. This approach imposes additional constraints on the magnitude of the weights in the DEA model. The AR is defined as the subset of W, the weight space of multiplier vectors v and u, such that no region outside the AR contains reasonable input and output multipliers. An additional constraint on the ratio of input weights [14] can be defined as

$$ l_{1,i} \le \frac{v_i}{v_1} \le u_{1,i} \quad \Longleftrightarrow \quad v_1 l_{1,i} \le v_i \le v_1 u_{1,i}, \qquad i = 1, \ldots, m, \tag{1} $$

where m denotes the number of inputs, $v_1$ and $v_i$ are the weights for input 1 and input i, respectively, and $l_{1,i}$ and $u_{1,i}$ are the lower and upper bounds on the ratio of the multipliers.
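The ratio bounds in (1) become ordinary linear inequalities once they are multiplied through by $v_1$, so they can be appended to a DEA multiplier LP. The following is a minimal sketch under that reading, using plain Python lists; the helper names `ar_constraint_rows` and `in_assurance_region` are hypothetical, and indices are 0-based, so input 1 of the text is index 0.

```python
def ar_constraint_rows(m, bounds):
    """Rows g such that g . v <= 0 encodes l_1i * v_1 <= v_i <= u_1i * v_1.

    m      -- number of inputs (length of the weight vector v)
    bounds -- dict {i: (l_1i, u_1i)} with 0-based input index i >= 1 (index 0 is input 1)
    """
    rows = []
    for i, (lo, hi) in sorted(bounds.items()):
        if not (1 <= i < m and 0.0 <= lo <= hi):
            raise ValueError(f"invalid bound specification for input index {i}")
        lower = [0.0] * m  # l_1i * v_1 - v_i <= 0
        lower[0], lower[i] = lo, -1.0
        upper = [0.0] * m  # v_i - u_1i * v_1 <= 0
        upper[0], upper[i] = -hi, 1.0
        rows.extend([lower, upper])
    return rows


def in_assurance_region(v, bounds, tol=1e-12):
    """True if the weight vector v satisfies every AR row (up to a small tolerance)."""
    rows = ar_constraint_rows(len(v), bounds)
    return all(sum(g * w for g, w in zip(row, v)) <= tol for row in rows)


# Three inputs: v_2/v_1 in [0.5, 2] and v_3/v_1 in [1, 3] (hypothetical bounds).
bounds = {1: (0.5, 2.0), 2: (1.0, 3.0)}
print(in_assurance_region([1.0, 1.5, 2.0], bounds))  # True
print(in_assurance_region([1.0, 2.5, 2.0], bounds))  # False, since v_2/v_1 = 2.5 > 2
```

Each bounded ratio contributes two rows, so restricting every input relative to input 1 adds at most 2(m − 1) constraints.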

The cone-ratio (CR) method, developed by Charnes et al. [15], allows for closed convex cones of virtual multipliers. It is a more general approach than the AR. In the AR model, there can be only two admissible nonnegative vectors, one for the lower bound and the other for the upper bound of the ratio of virtual weights. In the CR case, however, there can be k admissible nonnegative vectors for the input weights and l admissible nonnegative vectors for the output weights; that is, the feasible region for the weights is a polyhedral convex cone spanned by k and l admissible nonnegative direction vectors for inputs and outputs, respectively:

$$ v = \sum_{h=1}^{k} \alpha_h \vec{a}_h, \qquad u = \sum_{s=1}^{l} \beta_s \vec{b}_s, \tag{2} $$

where $\vec{a}_h$ and $\vec{b}_s$ are the admissible direction vectors, and $\alpha_h \ge 0$ ($\forall h$) and $\beta_s \ge 0$ ($\forall s$) are the weights applied to select the best nonnegative vector. Thus, the AR method is equivalent to selecting only two admissible vectors under the CR method. For the ratio between inputs 1 and 2 in (1), the lower and upper bounds are represented by the vectors

$$ \vec{a}_1 = (1, l_{1,2}, 0, \ldots, 0), \qquad \vec{a}_2 = (1, u_{1,2}, 0, \ldots, 0), \tag{3} $$

respectively.
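To see why the AR is a special case of the CR, one can generate weight vectors from the two bound vectors $\vec{a}_1 = (1, l_{1,2}, 0, \ldots, 0)$ and $\vec{a}_2 = (1, u_{1,2}, 0, \ldots, 0)$ with arbitrary nonnegative coefficients and inspect the resulting ratio $v_2/v_1$. A minimal sketch; the helper `cone_weights` and the numerical bounds are hypothetical.

```python
def cone_weights(directions, coeffs):
    """Return v = sum_h alpha_h * a_h for direction vectors a_h and coefficients alpha_h >= 0."""
    if len(directions) != len(coeffs):
        raise ValueError("need one coefficient per direction vector")
    if any(alpha_h < 0 for alpha_h in coeffs):
        raise ValueError("cone coefficients must be nonnegative")
    v = [0.0] * len(directions[0])
    for a_h, alpha_h in zip(directions, coeffs):
        for idx, component in enumerate(a_h):
            v[idx] += alpha_h * component
    return v


l12, u12 = 0.5, 2.0   # hypothetical ratio bounds for v_2 / v_1
a1 = [1.0, l12, 0.0]  # lower-bound direction
a2 = [1.0, u12, 0.0]  # upper-bound direction
for alphas in [(1.0, 0.0), (0.3, 0.7), (2.0, 5.0)]:
    v = cone_weights([a1, a2], alphas)
    # v_2 / v_1 = (alpha_1 * l12 + alpha_2 * u12) / (alpha_1 + alpha_2): a convex mix of the bounds
    print(alphas, v, v[1] / v[0])
```

Because $v_2/v_1$ is always a weighted average of $l_{1,2}$ and $u_{1,2}$, every point of this two-generator cone satisfies the AR bounds, and every AR-feasible ratio can be reached by some choice of nonnegative coefficients.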

3. Data Variations

3.1. Two Error Sources of Data Disparity Affecting Productivity Analysis

Before modifying the basic DEA model to incorporate probability, it is crucial to identify the types of errors that are sources of data disparity. These can be divided into two categories: systematic and nonsystematic errors. Nonsystematic errors are typically defined as statistical noise, which is random, normally distributed $N(0, \sigma^2)$, and independent and identically distributed (i.i.d.); such errors eventually average to zero. Systematic errors are defined as "the degree to which the measured variable reflects the underlying phenomenon depend on its bias and variance relative to the true or more appropriate measure" [16]. Systematic (measurement) errors are deemed to have the most damaging effects because they introduce bias into the model. They may be caused by a lack of information.

The new DEA model is designed to take into account the possibility of data disparity affecting productivity analysis while preserving the advantages that DEA offers, in order to estimate the true level of efficiency. Because of data disparity, standard DEA results may contain two error components. The first is statistical noise, which follows a normal distribution; the second is technical inefficiency, which is said to follow a truncated normal or half-normal distribution. Relaxing the LP constraints to allow for these variations may provide a better approximation of the level of efficiency.

The following general linear programming model illustrates the mathematical form of the systematic and nonsystematic errors defined previously. Variation in the variable X of the objective function will result in different values for the optimized coefficient β:

$$ \min_{\beta}\; g = X'\beta \quad \text{subject to} \quad X'\beta \ge y, \quad \beta \ge 0. \tag{4} $$

If the variation in X is stochastic, then $X = \bar{x} + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$ by the Central Limit Theorem, and one can characterize how closely the vector X is scattered around its mean $\bar{x}$ by the distance function

$$ D^2 = D^2(X; \bar{x}, V_\varepsilon) = (X - \bar{x})' V_\varepsilon^{-1} (X - \bar{x}), $$

where $V_\varepsilon$ denotes the variance-covariance matrix of ε [4].
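The distance function $D^2$ is the squared Mahalanobis distance of X from its mean. A minimal NumPy sketch, assuming the covariance matrix $V_\varepsilon$ is known; the function name and the example values are hypothetical.

```python
import numpy as np


def mahalanobis_sq(X, x_bar, V):
    """Squared distance D^2 = (X - x_bar)' V^{-1} (X - x_bar)."""
    d = np.asarray(X, dtype=float) - np.asarray(x_bar, dtype=float)
    # Solve V z = d rather than forming V^{-1} explicitly (numerically safer).
    z = np.linalg.solve(np.asarray(V, dtype=float), d)
    return float(d @ z)


x_bar = [10.0, 5.0]
V = [[4.0, 0.0], [0.0, 1.0]]  # hypothetical variance-covariance matrix of epsilon
print(mahalanobis_sq([12.0, 5.0], x_bar, V))  # 1.0: a deviation of 2 where the variance is 4
print(mahalanobis_sq([10.0, 7.0], x_bar, V))  # 4.0: a deviation of 2 where the variance is 1
```

The same absolute deviation counts for less along a high-variance component, which is what makes $D^2$ a natural measure of how unusual an observation is.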

Four scenarios describing sources of data disparity are illustrated below, using the following notation:

$x_{ir}$: observed input i, for i = 1, …, m, of DMU r,

$y_{jr}$: observed output j, for j = 1, …, n, of DMU r,

$\mu_r^x$: expected value of the input of DMU r,

$\mu_r^y$: expected value of the output of DMU r,

$b_r^x$: bias of $x_r$,

$\hat{b}_r^x$: estimate of $b_r^x$,

$b_r^y$: bias of $y_r$,

$\hat{b}_r^y$: estimate of $b_r^y$.

The following equations define the relationship between the observed and the true (expected) values of inputs and outputs in a productivity analysis, such as SFA, where measurement errors, random noise, and inefficiencies are a concern in parametric estimation:

$$ x_{ir} = \mu_{ir}^x + b_{ir}^x \quad \text{for some input } i \text{ of unit } r, \tag{5} $$

$$ y_{jr} = \mu_{jr}^y + b_{jr}^y \quad \text{for some output } j \text{ of unit } r, \tag{6} $$

where (6) covers cases in which output levels may be biased, and

$$ \mu_{ir}^x, \mu_{jr}^y \ge 0, \qquad b_{ir}^x, b_{jr}^y \text{ unrestricted in sign}. \tag{7} $$

The following four scenarios illustrate the impact of different errors and were constructed using the notations given previously. These scenarios follow the definition by Tomlinson [16].

Scenario I. Consider

$$ E(b_{ir}^x) = 0, \quad \operatorname{Var}(b_{ir}^x) = 0, \quad E(x_{ir}) = \mu_{ir}^x, \quad \operatorname{Var}(x_{ir}) = 0. \tag{8} $$

With zero bias and zero variance, the observed input value is the true value: $E(x_{ir}) = \mu_{ir}^x = x_{ir}$. This implies that the data are 100% accurate; the expected value is exactly the same as the observed value. In reality, data are rarely this accurate.

Scenario II. Consider

$$ E(b_{ir}^x) = \hat{b}_{ir}^x \ne 0, \quad \operatorname{Var}(b_{ir}^x) = 0, \quad E(x_{ir}) = \mu_{ir}^x + \hat{b}_{ir}^x, \quad \operatorname{Var}(x_{ir}) = 0. \tag{9} $$

The bias is nonzero while the variance is zero; hence, the errors are systematic, and the observed value $x_{ir}$ is a biased estimator of the true value $\mu_{ir}^x$. In this case, systematic errors are a problem where inputs are concerned: when measurement errors exist, the observed values are biased relative to their true values, which in turn biases the DEA results. Empirical methods such as DEA make no allowance for this error and evaluate DMUs strictly on the observed values. However, expectations of the observed values can be determined qualitatively and incorporated into the LP.

Scenario III. Consider

$$ E(b_{ir}^x) = 0, \quad \operatorname{Var}(b_{ir}^x) = \sigma_{b_{ir}}^2 > 0, \quad E(x_{ir}) = \mu_{ir}^x, \quad \operatorname{Var}(x_{ir}) = \sigma_{b_{ir}}^2. \tag{10} $$

Since $\mu_{ir}^x$ is a constant, its expected value is itself and its variance is zero; hence $\operatorname{Var}(x_{ir}) = \operatorname{Var}(\mu_{ir}^x + b_{ir}^x) = 0 + \operatorname{Var}(b_{ir}^x) = \sigma_{b_{ir}}^2$. The bias is zero but the variance is nonzero, so the variations are due to statistical noise. A DMU that appears efficient may in fact be using an input-output mix that is less than optimal; its apparent efficiency is caused by a variation in its favour. Results obtained using empirical models are prone to inaccuracies of this nature. However, in the absence of bias, the average of the observed values will converge over time to the true value.

Scenario IV. Consider

$$ E(b_{ir}^x) = \hat{b}_{ir}^x \ne 0, \quad \operatorname{Var}(b_{ir}^x) = \sigma_{b_{ir}}^2 > 0, \quad E(x_{ir}) = \mu_{ir}^x + \hat{b}_{ir}^x, \quad \operatorname{Var}(x_{ir}) = \sigma_{b_{ir}}^2. \tag{11} $$

Both the bias and the variance are nonzero, which implies that both systematic and nonsystematic errors exist in the data. The variable $x_{ir}$ for input i is affected by a random amount and by the bias $b_{ir}^x$; hence, $x_{ir}$ is again a biased estimator of $\mu_{ir}^x$. This scenario corresponds to the main drawback of empirical frontiers.
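The four scenarios can be reproduced numerically by drawing the bias term $b_{ir}^x$ from a normal distribution with mean $\hat{b}_{ir}^x$ and standard deviation $\sigma_{b_{ir}}$ (a standard deviation of zero gives a constant bias). A minimal simulation sketch using only the Python standard library; the true value, bias, and noise level are hypothetical, and the normal form of the error is an assumption made only for illustration.

```python
import random
import statistics


def simulate_scenario(mu, bias, sd, n=20000, seed=1):
    """Draw n observations x = mu + b with b ~ N(bias, sd^2); return their mean and variance."""
    rng = random.Random(seed)
    xs = [mu + rng.gauss(bias, sd) for _ in range(n)]
    return statistics.fmean(xs), statistics.pvariance(xs)


mu = 100.0  # hypothetical true input level
scenarios = {
    "I   (no bias, no noise)": (0.0, 0.0),
    "II  (bias only)": (5.0, 0.0),
    "III (noise only)": (0.0, 3.0),
    "IV  (bias and noise)": (5.0, 3.0),
}
for name, (bias, sd) in scenarios.items():
    mean, var = simulate_scenario(mu, bias, sd)
    print(f"{name}: mean = {mean:.2f}, variance = {var:.2f}")
```

Noise alone (Scenario III) inflates the variance but leaves the mean at the true value, whereas bias (Scenarios II and IV) shifts the mean by $\hat{b}_{ir}^x$ no matter how many observations are drawn.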

The term "measurement error" does not simply imply that data have been misread or collected erroneously. According to Tomlinson [16], such errors may also not be constant over time. Inaccuracy in the collected data may be due to a lack of implicit information, which may or may not be quantifiable but is deemed to have the most damaging effects because it introduces bias into the model.

3.2. Chance-Constraint Programming and DEA

Deterministic methods such as DEA are not designed to handle cases in which, because of uncertainty, constraints may occasionally be violated. Various methods have been employed to extend the basic DEA approach with stochastic components; two of the more popular are chance-constrained programming (CCP) and stochastic DEA. An extensive literature survey has revealed that CCP DEA has always assumed a normal distribution. The objective of this research is to redefine the probabilities employed in CCP productivity analysis so as to accommodate scenarios in which errors are independent but convoluted, without assuming any distributional form. Because the independent error terms are convoluted, they are difficult to distinguish from one another; hence, a distribution-free approach will be employed.

The advantage of using CCP is that it maintains the nonparametric form of DEA and allows multiple inputs and outputs to be modeled with ease. There is no ambiguity in defining a distribution or in interpreting the results, unlike what has been demonstrated for the Normal-Gamma parametric SFA model [17]. CCP states that constraints need not hold "almost surely" but instead hold with some probability level. Uncertainty is represented in terms of outcomes, denoted by ω, which describe scenarios; all random variables jointly depend on these outcomes. Outcomes may be combined into subsets of Ω called events: A denotes an event and $\mathcal{A}$ denotes the collection of events. Examples of events include political situations and trade conditions, which allow us to describe random variables such as costs and interest rates. Each event is associated with a probability P(A), and the triplet $(\Omega, \mathcal{A}, P)$ is known as a probability space. This situation is often found in strategic models, where knowledge of all possible future outcomes is acquired through expert opinion. Hence, in general form, a chance constraint can be written as

$$ P\{A_i x(\omega) \ge h_i(\omega)\} \ge \alpha_i, \tag{12} $$

where $0 < \alpha_i < 1$ and $i = 1, \ldots, I$ indexes the constraints that must hold jointly. This probabilistic constraint can be written in its expectational form (or deterministic equivalent), where $f_i$ is the indicator of the set $\{\omega \mid A_i x(\omega) \ge h_i(\omega)\}$:

$$ E_\omega\big(f_i(\omega, x(\omega))\big) \ge \alpha_i. \tag{13} $$
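Because (13) expresses the chance constraint as the expectation of an indicator, the probability can be estimated by Monte Carlo as the fraction of sampled outcomes ω for which the constraint holds. A minimal sketch, assuming a deterministic left-hand side $A_i x$ and a random right-hand side $h_i(\omega)$; the function name and the uniform distribution are hypothetical choices for illustration.

```python
import random


def chance_constraint_prob(ax, sample_h, n=100_000, seed=0):
    """Monte Carlo estimate of P{A_i x >= h_i(omega)} = E[f_i(omega, x)].

    ax       -- value of A_i x for a fixed decision x (assumed deterministic here)
    sample_h -- callable drawing one realization of h_i(omega) from a random.Random
    """
    rng = random.Random(seed)
    # f_i(omega, x) is 1 when the constraint holds for outcome omega and 0 otherwise.
    hits = sum(1 for _ in range(n) if ax >= sample_h(rng))
    return hits / n


# Hypothetical right-hand side h ~ Uniform(0, 1), so the exact probability is 0.9.
p_hat = chance_constraint_prob(0.9, lambda rng: rng.uniform(0.0, 1.0))
alpha_i = 0.85
print(f"estimated probability {p_hat:.3f}; constraint satisfied: {p_hat >= alpha_i}")
```

Sampling is only one way to evaluate (13); when the distribution of $h_i$ is known in closed form, the probability can of course be computed exactly.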

The focus of this paper is the further development of DEA coupled with CCP. The benefit of applying CCP to DEA is that the multidimensional and nonparametric form of DEA is maintained. To drop the a priori distributional assumption used in [9, 11, 18] to account for possible data disparity, a distribution-free method is introduced. In [11, 18], the input-oriented CCP DEA model is formulated on the basis that discrepancies in outputs are due to statistical noise, while those in inputs are caused by inefficiency:

$$ \begin{aligned} \min\; & z_0 = \theta \\ \text{subject to}\; & P(Y\lambda - y_0 \ge 0) \ge 1 - \alpha, \\ & X\lambda - \theta x_0 \le 0, \\ & \vec{1}\lambda = 1, \\ & \theta, \lambda \ge 0. \end{aligned} \tag{14} $$

The CCP formulation (14) minimizes the radial input contraction factor θ subject to the specified constraints. Past CCP DEA models generally assume that the normal distribution suffices. For example, if the variation in (14) is assumed to be normal, (14) can be written in the deterministic vector form

$$ \begin{aligned} \min\; & z_0 = \theta \\ \text{subject to}\; & E(Y\lambda - y_0) - 1.645\,\sigma \ge 0, \\ & X\lambda - \theta x_0 \le 0, \\ & \vec{1}\lambda = 1, \\ & \theta, \lambda \ge 0, \end{aligned} \tag{15} $$

where X and Y denote the vectors of inputs and outputs, respectively, and 1.645 is the standard normal quantile corresponding to α = 0.05. Assuming that each DMU is independent of the others, the covariances are zero, and σ, the standard deviation of $Y\lambda - y_0$, follows from

$$ \operatorname{Var}(Y\lambda - y_0) = \operatorname{Var}(y_1\lambda_1 + y_2\lambda_2 + \cdots + y_q\lambda_q - y_0), \tag{16} $$

where q denotes the number of DMUs. If the DMU under evaluation is DMU 1, then $y_0 \equiv y_1$, and (16) gives

$$ \sigma = \sqrt{(\lambda_1 - 1)^2 \operatorname{Var}(y_1) + \lambda_2^2 \operatorname{Var}(y_2) + \cdots + \lambda_q^2 \operatorname{Var}(y_q)}. \tag{17} $$

If $\lambda_1 = 1$ and $\lambda_r = 0$ for all $r \ne 1$, the standard deviation vanishes, and the CCP efficiency score of DMU 1 is the same as its DEA score. This does not imply that the CCP scores of the other DMUs will coincide with their DEA scores.
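The standard deviation in (17) and the first constraint of (15) can be evaluated directly for a candidate λ. A minimal sketch for a single output; the function names and data are hypothetical, and DMUs are indexed from 0, so the unit under evaluation (index k) carries the $(\lambda_k - 1)$ term.

```python
import math


def ccp_output_sd(lam, var_y, k):
    """sigma in (17): standard deviation of Y lam - y_k for independent DMUs (single output).

    lam   -- intensity weights lambda_1..lambda_q
    var_y -- output variances Var(y_1)..Var(y_q)
    k     -- 0-based index of the DMU under evaluation (its term uses lambda_k - 1)
    """
    total = 0.0
    for r, (lam_r, var_r) in enumerate(zip(lam, var_y)):
        coef = lam_r - 1.0 if r == k else lam_r
        total += coef ** 2 * var_r
    return math.sqrt(total)


def normal_ccp_output_ok(lam, y, var_y, k, z=1.645):
    """Deterministic equivalent of the chance constraint in (15): E(Y lam - y_k) - z * sigma >= 0."""
    expected_surplus = sum(lam_r * y_r for lam_r, y_r in zip(lam, y)) - y[k]
    return expected_surplus - z * ccp_output_sd(lam, var_y, k) >= 0.0


y = [10.0, 12.0, 8.0]    # observed outputs of three DMUs (hypothetical)
var_y = [1.0, 4.0, 1.0]  # their variances (hypothetical)
print(normal_ccp_output_ok([1.0, 0.0, 0.0], y, var_y, k=0))  # True: sigma = 0 and surplus = 0
print(normal_ccp_output_ok([0.0, 1.0, 0.0], y, var_y, k=0))  # False: 2 - 1.645 * sqrt(5) < 0
```

In the second case DMU 2 produces more than DMU 1, yet its expected surplus of 2 is smaller than $1.645\sqrt{5} \approx 3.68$, so the constraint fails at the 95% level because of the noise in the reference unit.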

The first constraint in (15) states that the outputs of the observed unit may exceed those of the best-practice units only with a small probability (here α = 0.05). $E(Y\lambda - y_0)$ is determined on the assumption that the observed values are representative of their mathematical expectations. The second constraint is strictly deterministic: it states that the best performers cannot employ more than $\theta x_0$ of inputs; if they do, they cannot be efficient and will not be included in the reference set of best performers.

The DCF is established by using the same mathematical formulation shown in (14) and (15) while incorporating a distribution-free approach.

4. DEA-Chebyshev Model

The main advantage of the DEA-Chebyshev model as an efficiency evaluation tool is that it provides an approximation of performance that acknowledges the existence of random errors and inefficiencies, with these deviations taken into account either through expert opinion or through data inference. Nevertheless, the results should always be subject to management scrutiny. The method also allows efficient DMUs to be ranked.

4.1. Chebyshev’s Theorem

Put simply, Chebyshev's theorem states that the fraction of a dataset lying within τ standard deviations of the mean is at least $1 - 1/\tau^2$, where τ > 1.

The DEA-Chebyshev model developed in this paper is not restricted to any one distribution but instead assumes an unknown distribution. A distribution-free approach, the Chebyshev inequality, is used to represent the stochastic nature of the data and is applied to the basic DEA model through chance-constrained programming. The inequality states that

$$ P(|\bar{x} - \mu| \ge \tau\sigma) \le \frac{1}{\tau^2}, \tag{18a} $$

or, equivalently, with the deviation τ expressed in the original units rather than in standard deviations,

$$ P(|\bar{x} - \mu| \ge \tau) \le \frac{\sigma^2}{\tau^2}. \tag{18b} $$

Let a random variable x have some probability distribution of which we know only the mean (μ) and the variance (σ²) [19]. Inequality (18a) implies that the probability of the sample mean $\bar{x}$ falling outside the interval $[\mu - \tau\sigma, \mu + \tau\sigma]$ is at most $1/\tau^2$, where τ is the number of standard deviations away from the mean, using the notation in [19]. The one-sided Chebyshev inequality can be written as

$$ P(\bar{x} - \mu \ge \tau) \le \frac{\sigma^2}{\sigma^2 + \tau^2}, \tag{19} $$

as shown in [20].
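The two-sided bound (18a) and the one-sided bound (19) can be checked empirically on a clearly non-normal distribution. A minimal sketch using an exponential distribution with mean 1 and standard deviation 1 (a hypothetical choice); with σ = 1, the one-sided bound for a deviation of τ standard deviations reduces to $1/(1 + \tau^2)$.

```python
import random


def empirical_tails(samples, mu, sigma, tau):
    """Frequencies of |x - mu| >= tau*sigma (two-sided) and x - mu >= tau*sigma (one-sided)."""
    n = len(samples)
    two_sided = sum(1 for x in samples if abs(x - mu) >= tau * sigma) / n
    one_sided = sum(1 for x in samples if x - mu >= tau * sigma) / n
    return two_sided, one_sided


rng = random.Random(42)
xs = [rng.expovariate(1.0) for _ in range(100_000)]  # skewed data with mean 1 and sd 1
for tau in (1.5, 2.0, 3.0):
    two, one = empirical_tails(xs, mu=1.0, sigma=1.0, tau=tau)
    chebyshev = 1.0 / tau ** 2          # two-sided bound (18a)
    cantelli = 1.0 / (1.0 + tau ** 2)   # one-sided bound (19) with deviation tau * sigma
    print(f"tau={tau}: {two:.4f} <= {chebyshev:.4f}, {one:.4f} <= {cantelli:.4f}")
```

Both bounds hold with a wide margin here, which illustrates why a distribution-free bound is conservative for any particular distribution.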

Other methods considered for defining the probabilities in the DEA-Chebyshev model were the distribution-free linear constraint set (or linear approximation), the unit sphere method, and the quantile method. These methods were tested to determine which of them provides the best estimate of the true boundary described in [21]. The true boundary (called set S) is a two-dimensional boundary generated from the chance-constrained set

$$ S = \{X = (x_1, \ldots, x_m) \mid \Pr[AX - b \le 0] \ge \alpha;\; X \ge 0\}, \tag{20} $$

where b and the vector $A = (a_1, a_2, \ldots, a_m)$ are random variables. Let $L(X) = AX - b$, and let $E[L(X)]$ and $\sigma[L(X)]$ denote the expected value and the standard deviation of L(X), respectively. In this example, m = 2, and twenty-nine samples were generated.

The distribution-free approaches tested were the Chebyshev extended lemma (24), the quantile method (21), the linear approximation (23), and the unit sphere (22). Their deterministic equivalents can be written as follows, using the notation of [21].

Quantile method:

$$ S_Q(\alpha) = \{X \mid E[L(X)] + K_\alpha\, \sigma[L(X)] \le 0;\; X \ge 0\}, \tag{21} $$

where $K_\alpha$ is the quantile of order α of the standardized variate of L(X). If the random variable X belongs to the class of stable distributions, the quantile method can be applied successfully. All stable distributions share the property of being specified by location and scale parameters U and V of a general functional form: convolving $F[(x - U_1)/V_1], \ldots, F[(x - U_l)/V_l]$ again gives $F[(x - U)/V]$. Examples of stable distributions are the Normal, Cauchy, and Lévy distributions [NOLA99]; the Binomial, Poisson, and Chi-squared families are closed under convolution but are not stable in this location-scale sense.

Unit sphere:

$$ S_S(\alpha) = \Big\{X \;\Big|\; \|X\|^2 \le \frac{1}{\max_h (a_{1,h})^2 + \max_h (a_{2,h})^2}\Big\}. \tag{22} $$

Linear approximation:

$$ S_L(\alpha) = \{X \mid A^* X \le 1\}, \tag{23} $$

where $a_{g,h}$ is an element of the 29 simulated samples $a_g = (a_{g,1}, \ldots, a_{g,H})$, g = 1, …, m (m = 2 in this example), H = 29 is the sample size, and $A^* = (\max_h a_{1,h}, \max_h a_{2,h})$.

Chebyshev extended lemma:

$$ S_T(\alpha) = \Big\{X \;\Big|\; E[L(X)] + \sqrt{\frac{\alpha}{1-\alpha}}\,\sigma[L(X)] \le 0;\; X \ge 0\Big\}. \tag{24} $$
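The four deterministic equivalents (21)–(24) can be compared on simulated data. A minimal sketch, assuming b = 1 so that $L(X) = a_1x_1 + a_2x_2 - 1$ (consistent with the right-hand side of the linear approximation), 29 hypothetical uniform samples of $(a_1, a_2)$, sample moments in place of $E[L(X)]$ and $\sigma[L(X)]$, and a standard normal $K_\alpha$ for the quantile method; all function names are hypothetical, and only nonnegative points X are tested.

```python
import math
import random
import statistics
from statistics import NormalDist


def l_values(samples, x):
    """Realizations of L(X) = a1*x1 + a2*x2 - 1 (b = 1 assumed) for each sampled (a1, a2)."""
    return [a1 * x[0] + a2 * x[1] - 1.0 for a1, a2 in samples]


def in_quantile_set(samples, x, alpha):
    """Quantile set: K_alpha taken from the standard normal (an assumption)."""
    vals = l_values(samples, x)
    k_alpha = NormalDist().inv_cdf(alpha)
    return statistics.fmean(vals) + k_alpha * statistics.stdev(vals) <= 0.0


def in_chebyshev_set(samples, x, alpha):
    """Chebyshev set: distribution-free multiplier sqrt(alpha / (1 - alpha))."""
    vals = l_values(samples, x)
    tau_alpha = math.sqrt(alpha / (1.0 - alpha))
    return statistics.fmean(vals) + tau_alpha * statistics.stdev(vals) <= 0.0


def a_star(samples):
    """A* = (max_h a_{1,h}, max_h a_{2,h})."""
    return max(a1 for a1, _ in samples), max(a2 for _, a2 in samples)


def in_linear_set(samples, x):
    """Linear approximation: A* X <= 1."""
    a1_max, a2_max = a_star(samples)
    return a1_max * x[0] + a2_max * x[1] <= 1.0


def in_unit_sphere_set(samples, x):
    """Unit sphere: ||X||^2 <= 1 / (max a1^2 + max a2^2)."""
    a1_max, a2_max = a_star(samples)
    return x[0] ** 2 + x[1] ** 2 <= 1.0 / (a1_max ** 2 + a2_max ** 2)


rng = random.Random(7)
samples = [(rng.uniform(0.8, 1.2), rng.uniform(1.6, 2.4)) for _ in range(29)]  # H = 29
alpha = 0.9
for t in (0.1, 0.2, 0.3):
    x = (t, t)
    print(t, in_quantile_set(samples, x, alpha), in_chebyshev_set(samples, x, alpha),
          in_linear_set(samples, x), in_unit_sphere_set(samples, x))
```

Because $\sqrt{\alpha/(1-\alpha)} \ge K_\alpha$ for the normal quantile, every point accepted by the Chebyshev set is also accepted by the quantile set, and by the Cauchy-Schwarz inequality the unit sphere lies inside the linear approximation.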

Allen et al. [21] showed that the quantile method was the least conservative and the Chebyshev method the most conservative. A method of estimation that provides relatively wide confidence limits is said to be "conservative." The advantage of these two methods is that both tend to follow the shape of the true (real) boundary more closely than the other two methods, namely, the unit sphere and the linear approximation [21]. Given that the Chebyshev method provides the most conservative point of view and tends to follow the shape of the true boundary without regard to distributional form, it was chosen as the estimation method for CCP DEA. The error-free frontier (EFF) is unknown; at best, we can estimate its location or its shape with respect to the DEA frontier. The EFF represents the frontier on which measurement errors and random errors are absent, but it does not imply absolute efficiency, so there can be room for improvement even for DMUs on the EFF. The theoretical frontier represents the absolute attainable production possibility set, where no further improvement is possible in the absence of statistical noise and measurement errors. It remains undefined because human performance limits are themselves still undefined at present.

Since we do not wish to make an a priori assumption about which stable distribution best describes the random variables in DEA, Chebyshev's theorem will be used. The deterministic equivalent of (20) under Chebyshev's extended lemma is given in (24).

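Derivation of $\sqrt{\alpha/(1-\alpha)}\,\sigma[L(X)]$ in (24). We use the one-sided Chebyshev inequality in the notation of [21]:

$$ P\big(L(X) - E[L(X)] \ge \tau\big) \le \frac{\sigma^2}{\sigma^2 + \tau^2}, \tag{25} $$

which states that the probability that L(X) exceeds its mean E[L(X)] by at least τ is at most $\sigma^2/(\sigma^2 + \tau^2)$, that is, at most $1/(1 + t^2)$ when τ = tσ is a deviation of t standard deviations. The chance constraint requires $P(L(X) \le 0) \ge \alpha$. If $E[L(X)] + \tau \le 0$, then $L(X) > 0$ implies $L(X) - E[L(X)] > \tau$, so $P(L(X) > 0) \le \sigma^2/(\sigma^2 + \tau^2)$, and the constraint is guaranteed once this bound equals $1 - \alpha$. Hence,

$$ 1 - \alpha = \frac{\sigma^2}{\sigma^2 + \tau^2} \;\Longrightarrow\; \tau = \sigma\sqrt{\frac{\alpha}{1-\alpha}}. \tag{26} $$

Note that from here onwards, in discussing the DCF model, we denote $\tau_\alpha = \tau/\sigma$ for simplicity and clarity.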
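The derivation can be checked numerically: the multiplier $\tau_\alpha = \sqrt{\alpha/(1-\alpha)}$ should make the one-sided bound in (25) equal to $1 - \alpha$ exactly, and it should exceed the normal quantile $K_\alpha$, which is the sense in which the Chebyshev approach is the more conservative one. A minimal sketch; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chebyshev_multiplier(alpha):
    """tau_alpha = tau / sigma = sqrt(alpha / (1 - alpha)), from (26)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return math.sqrt(alpha / (1.0 - alpha))


def one_sided_bound(tau_alpha):
    """Right-hand side of (25) with tau = tau_alpha * sigma: sigma^2 / (sigma^2 + tau^2)."""
    return 1.0 / (1.0 + tau_alpha ** 2)


for alpha in (0.80, 0.90, 0.95, 0.99):
    t = chebyshev_multiplier(alpha)
    k = NormalDist().inv_cdf(alpha)  # K_alpha if L(X) were normal, as in the quantile method
    print(f"alpha={alpha:.2f}: tau_alpha={t:.3f}, K_alpha={k:.3f}, 1 - bound={1 - one_sided_bound(t):.3f}")
```

For α = 0.95, for instance, $\tau_\alpha = \sqrt{19} \approx 4.36$, compared with $K_\alpha \approx 1.645$ under normality.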

The term "k-flexibility function" is coined because α is a value that may be defined by the user (where k denotes the user's certainty of the estimate) or inferred from industry data. The distinctive property of α is that it can define $\tau_\alpha$ either to mimic the normal distribution when random noise is present or to include management's concerns and expectations regarding perceived or expected performance levels. This can overcome the problem of what economists call "nuisance parameters," which arise when controlling for difficult-to-observe or unquantifiable factors such as worker effort or worker quality. When firms can identify and exploit opportunities in their environment, organizational constraints may be violated [22]. Because the DCF allows for management input, the flexibility function can approximate these constraint violations. The mathematical formulation, the implications for management, and the practical definition of α are explained later.
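Inverting (26) gives $\alpha = \tau_\alpha^2/(1 + \tau_\alpha^2)$, which shows how a user could pick α so that the Chebyshev multiplier mimics a chosen normal quantile, such as the 1.645 used in (15). A minimal sketch; the function name is hypothetical.

```python
import math


def alpha_for_multiplier(tau_alpha):
    """Invert tau_alpha = sqrt(alpha / (1 - alpha)) to alpha = tau_alpha^2 / (1 + tau_alpha^2)."""
    if tau_alpha <= 0:
        raise ValueError("tau_alpha must be positive")
    return tau_alpha ** 2 / (1.0 + tau_alpha ** 2)


# alpha for which the Chebyshev multiplier equals the one-sided normal 95% quantile used in (15)
alpha_like_normal = alpha_for_multiplier(1.645)
print(round(alpha_like_normal, 4))  # 0.7302
print(math.isclose(math.sqrt(alpha_like_normal / (1 - alpha_like_normal)), 1.645))  # True
```

A larger α yields a larger $\tau_\alpha$ and therefore a more cautious frontier, so α is where management's confidence in the data can enter the model.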

4.2. Assumptions in DEA-Chebyshev Model

Two general assumptions have been made in constructing the model. First, nuisance parameters (including confounding variables) affect efficiency scores, causing them to differ from the true performance level if they are not accounted for in the productivity analysis. Second, variations in the observed variables can arise from both statistical noise and measurement errors, and these are convoluted.

In the simulation that follows, as an extension of the general assumptions above, we assume that variations in outputs are negligible and average out to zero [11, 18]. Variations in inputs are assumed to arise from statistical noise and inefficiency (inefficient use of inputs). Both errors contribute to possible technical inefficiencies in DEA-efficient units, which are not observed in DEA because it is an empirical extreme-point method. Following the characterization used in SFA, statistical noise and measurement errors are taken to be normally distributed, $v \sim N(\mu_v, \sigma_v^2)$, and inefficiency is taken to be half-normally distributed, $u \sim N^+(\mu_u, \sigma_u^2)$. Thus, the relationship between the expected input $\mu_{ir}$ and the observed input $x_{ir}^{obs}$ can be written as

$$ x_{ir}^{obs} = \mu_{ir} + (v + u)_{ir}, \tag{27} $$

where $(v + u)_{ir}$ denotes the convoluted error terms of input i for DMU r.
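Equation (27) can be simulated directly. A minimal sketch, assuming zero means for both error components (the usual SFA convention) and drawing the half-normal term as the absolute value of a normal draw; the function name and parameter values are hypothetical.

```python
import math
import random
import statistics


def observed_inputs(mu, sigma_v, sigma_u, n, seed=3):
    """Draw x_obs = mu + v + u as in (27).

    v ~ N(0, sigma_v^2) is symmetric noise; u = |N(0, sigma_u^2)| is half-normal inefficiency.
    """
    rng = random.Random(seed)
    return [mu + rng.gauss(0.0, sigma_v) + abs(rng.gauss(0.0, sigma_u)) for _ in range(n)]


mu, sigma_v, sigma_u = 50.0, 2.0, 4.0  # hypothetical expected input and error scales
xs = observed_inputs(mu, sigma_v, sigma_u, n=50_000)
excess = statistics.fmean(xs) - mu
expected_u = sigma_u * math.sqrt(2.0 / math.pi)  # mean of the half-normal term
print(f"mean excess {excess:.2f} vs E[u] = {expected_u:.2f}")
```

The mean excess of the observed input over its expected value approaches $E[u] = \sigma_u\sqrt{2/\pi}$, because the noise term averages out while the one-sided inefficiency term does not.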

The assumption regarding the disparity between observed and expected inputs serves to illustrate the input-oriented DEA-Chebyshev model. In input-oriented models, the outputs are not adjusted for efficiency; instead, the inputs are adjusted on the basis of the weights applied to the efficient DMUs. The roles of inputs and outputs in this error assumption can be reversed depending on expert opinion and the objective of the analysis (i.e., input- versus output-oriented models).

As an extension of Land et al. [11] and Forrester and Anderson [18], the DEA-Chebyshev model relaxes the distributional assumption. In doing so, the convolution of errors can be accommodated without specifying a distributional form for either component. This method of approximating the radial contraction of inputs or expansion of outputs is generally less computationally intensive than the bootstrap method, because the chance constraints can be incorporated directly into the LP and solved in the same way as in the standard DEA technique. The bootstrap method introduced by Simar and Wilson [23] is more complex in that it requires certain assumptions regarding the data generating process (DGP), on which the properties of the frontier and of the estimators depend. Nevertheless, this bootstrap method is nonparametric, since it requires no parametric assumptions other than those needed to establish the consistency and rate of convergence of the estimators.

Theoretically, the DEA algorithm allows the evaluation of models containing only outputs with no inputs, and vice versa. In doing so, it neglects the fact that inputs are crucial for the production of outputs: a production process must contain inputs in order to produce outputs. Let the theoretically attainable production possibility set be denoted by $\Psi = \{(X, Y) \in \Re^{m+n} \mid X \text{ can produce } Y\}$; this set characterizes the absolute efficient frontier, which is unknown. Given that $\Psi$ is not presently bounded, the inclusion $\Psi_{EFF}, \Psi_{DEA}, \Psi_{DCF} \subset \Psi$ always holds. Here $\Psi_{EFF}$, $\Psi_{DEA}$, and $\Psi_{DCF}$ denote the attainable sets of the Error-Free Frontier (EFF), DEA, and the DEA-Chebyshev frontier (DCF), respectively. A DMU certainly cannot produce outputs without inputs, although the relationship between them may not be clear. The relationships among the three frontiers are expressed in the following postulates.

Postulate 1. The DEA frontier converges to the EFF, $\Psi_{DEA} \xrightarrow{q \to \infty} \Psi_{EFF}$, according to the central limit theorem [24]; Appendices A, B, and C provide the details. However, both DEA and the DCF exhibit a very slow rate of convergence to the theoretical frontier as the number of dimensions increases or when the sample size is small. This is known as the curse of dimensionality [25].

Postulate 2. The production possibility set of DEA is contained in that of the DCF ($\Psi_{DEA} \subset \Psi_{DCF}$). The DEA frontier and the corrected frontier may overlap the EFF, depending on the degree of data variation observed and estimated.

4.3. Mathematical Formulation

An input-oriented BCC model is used to illustrate this work. Here, $\theta$ is the radial input contraction factor. $\lambda$ is the column vector corresponding to the “best practice” units, which form the projection onto the frontier for an inefficient unit:
$$\theta = \min\left\{\theta \;\middle|\; y_{j0} \le \sum_{r=1}^{q} y_{jr}\lambda_r,\ \theta x_{i0} \ge \sum_{r=1}^{q} x_{ir}\lambda_r,\ \sum_{r=1}^{q}\lambda_r = 1,\ \lambda_r \ge 0\right\}. \tag{28}$$
Consider the following chance-constraint set, as defined by Allen et al. [21]:
$$S = \left\{\breve{X} = (x_1, x_2, \ldots, x_m) \;\middle|\; P\left(\sum_{r=1}^{q}\lambda_r x_{ir} - \theta x_{i0} \le 0\right) \ge \alpha;\ \theta \ge 0,\ x_{ir} \ge 0,\ \forall r = 1, \ldots, q\right\}, \tag{29}$$
where $\alpha$ is a real number such that $0 \le \alpha \le 1$, for all $j = 1, \ldots, n$ and all $i = 1, \ldots, m$.

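The convolution of different types of errors makes it difficult to establish a specific distributional form for empirical data, so a distribution-free approach is taken. The one-sided Chebyshev inequality [21] is applied to convert (29). For the $i$th input, a deterministic equivalent of (29) can be approximated as
$$S_C(\alpha) = \left\{\breve{X} \;\middle|\; E\left(\sum_r x_{ir}\lambda_r - \theta x_{i0}\right) \pm \sigma_i \tau_\alpha \le 0,\ \theta \ge 0,\ x_{ir} \ge 0\ \forall r\right\}, \tag{30}$$
where
$$\sigma_i = \sqrt{\operatorname{var}\left(\sum_r \lambda_r x_{ir} - \theta x_{i0}\right)} = \sqrt{\lambda_1^2 \operatorname{var}(x_{i1}) + \cdots + \lambda_q^2 \operatorname{var}(x_{iq}) + \theta^2 \operatorname{var}(x_{i0})}$$
and $0 < \alpha \le 1$. The DMU under evaluation also belongs to the reference set, so its own two terms must be combined. For example, if DMU 1 is being evaluated, then $x_{i0} = x_{i1}$, and hence $\sigma_i = \sqrt{(\lambda_1 - \theta)^2 \operatorname{var}(x_{i1}) + \cdots + \lambda_q^2 \operatorname{var}(x_{iq})}$. Assuming the DMUs are independent of each other, $\operatorname{cov}(x_{ir}, x_{il}) = 0$ for all $r \ne l$. In addition, $\operatorname{var}(x_{ir}) = c$ for all $r = 1, \ldots, q$, where $c$ denotes some constant. The value of $\tau_\alpha$ is defined as
$$\tau_\alpha = \sqrt{\frac{\alpha}{1 - \alpha}}, \tag{31}$$
where $\alpha$ denotes the probability of staying within the tolerance region defined by the one-sided Chebyshev inequality. As $\alpha$ increases, $\tau_\alpha$ increases, and so does the half-width $\tau_\alpha \sigma_i$ of the tolerance region. Hence, it becomes more likely that the EFF lies within the confidence limits.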
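The two ingredients of (30) can be sketched in Python. This assumes a common variance per input and zero covariances, as stated above. The DMU under evaluation (index `r0`, a name introduced here) contributes the combined coefficient $(\lambda_{r_0} - \theta)$. The multiplier uses $\tau_\alpha = \sqrt{\alpha/(1-\alpha)}$, the form consistent with the $\hat{\tau}_\alpha$ values reported in Table 7. That form is what Cantelli's one-sided Chebyshev bound $P(Z - EZ \ge k\sigma) \le 1/(1 + k^2)$ gives when $1/(1 + k^2) = 1 - \alpha$.

```python
import math


def sigma_i(lambdas, theta, r0, var_c):
    """Std. dev. of sum_r lambda_r * x_ir - theta * x_i0 for one input i.

    Assumes independent DMUs sharing a common input variance var_c, so all
    covariances vanish. DMU r0 (the one under evaluation) appears both in the
    reference sum and as x_i0, so its coefficient is (lambda_r0 - theta).
    """
    total = 0.0
    for r, lam in enumerate(lambdas):
        coef = lam - theta if r == r0 else lam
        total += coef * coef * var_c
    return math.sqrt(total)


def tau(alpha):
    """One-sided Chebyshev multiplier sqrt(alpha / (1 - alpha)).

    tau diverges as alpha -> 1, so this sketch requires 0 < alpha < 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return math.sqrt(alpha / (1.0 - alpha))


print(sigma_i([0.5, 0.5, 0.0], theta=0.8, r0=2, var_c=1.0))
print(tau(0.75))
```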

The value of $\alpha$ can be chosen such that $\tau_\alpha$ is at most 1.645. The DCF then provides a less conservative estimate of the upper and lower limits of the frontier than $z_{0.05} = 1.645$, the standard normal value used in the previous CCP efficiency evaluation methodology [11, 26]. A less conservative estimate is preferred because the data collected are more likely to be accurate than inaccurate. As $\alpha$ approaches 1, $\tau_\alpha$ grows without bound; already $\tau_{0.99} = \sqrt{99} \approx 9.95$. For $0.7 < \alpha < 0.75$, note that $\tau_{0.7} < z_{0.05} < \tau_{0.75}$, so $\alpha$ can be chosen such that the DEA-Chebyshev model provides less conservative estimates. For the CCP DEA model developed by Land et al. [11], the results obtained under a normality assumption can be shown to differ drastically from the expected frontier, depending on the level of data disparity.
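The comparison with the normal quantile can be checked directly. Setting $\tau_\alpha = z_{0.05}$ in $\tau_\alpha = \sqrt{\alpha/(1-\alpha)}$ gives the crossover $\alpha^{*} = z^2/(1 + z^2)$. Below $\alpha^{*}$ the Chebyshev multiplier is the less conservative of the two. The short Python sketch below computes $\alpha^{*}$ and tabulates a few values; the list of $\alpha$ values is chosen only for illustration.

```python
import math


def tau(alpha):
    """One-sided Chebyshev multiplier sqrt(alpha / (1 - alpha)), 0 < alpha < 1."""
    return math.sqrt(alpha / (1.0 - alpha))


Z_005 = 1.645  # one-sided normal value used by the earlier CCP models

# Solve tau(alpha) = z for alpha: alpha / (1 - alpha) = z^2.
alpha_star = Z_005 ** 2 / (1.0 + Z_005 ** 2)
print(f"tau_alpha equals z_0.05 at alpha = {alpha_star:.4f}")
for alpha in (0.5, 0.6, 0.7, 0.75, 0.9, 0.99):
    label = "less" if tau(alpha) < Z_005 else "more"
    print(f"alpha = {alpha:<5} tau = {tau(alpha):6.3f} ({label} conservative than z_0.05)")
```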

The deterministic LP formulation of the DEA-Chebyshev model can be written in the following form:
$$\begin{aligned} \min_{\lambda}\ & \hat{\theta} \\ \text{subject to}\ & E\left(\sum_{r=1}^{q} x_{ir}\lambda_r - \hat{\theta} x_{i0}\right) \pm \sigma_i \hat{\tau}_\alpha \le 0, \\ & \sum_{r=1}^{q} y_{jr}\lambda_r - y_{j0} \ge 0, \\ & \sum_{r=1}^{q}\lambda_r = 1, \\ & \lambda_r \ge 0\ \forall r, \\ & \hat{\theta} \ge 0. \end{aligned} \tag{32}$$
Let $\hat{\tau}_\alpha$ be an estimate of $\tau_\alpha$, defined as
$$\hat{\tau}_\alpha = \sqrt{\frac{\alpha}{1 - \alpha}}, \tag{33}$$
where $\alpha$ is a value based on management's expectations or inferred from a time series of data that has been transformed into a single value. The model in (32) can also be modified so that only discretionary inputs receive stochastic treatment [27].
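The Python sketch below only checks whether a candidate $(\hat{\theta}, \lambda)$ satisfies the constraints of (32). It does not solve the program: $\sigma_i$ depends on $\lambda$ and $\hat{\theta}$, so the full problem is nonlinear and needs an optimizer. Assumptions (ours):

- observed values stand in for the expectations;
- each input has a common variance (as stated above);
- `sign` selects the "+" or "−" branch of the "±" term. The function name and arguments are hypothetical.

With `tau_hat = 0` the check reduces to the BCC constraints of (28).

```python
import math


def dcf_feasible(X, Y, r0, lambdas, theta, var_c, tau_hat, sign=+1, tol=1e-9):
    """Check (theta, lambdas) against the constraints of (32).

    X[i][r]: observed input i of DMU r (used in place of its expectation).
    Y[j][r]: output j of DMU r. r0: index of the DMU under evaluation.
    var_c[i]: common variance of input i. sign: +1 or -1 for the '+-' term.
    """
    q = len(lambdas)
    if theta < -tol or any(lam < -tol for lam in lambdas):
        return False
    if abs(sum(lambdas) - 1.0) > tol:                  # convexity (VRS)
        return False
    for row in Y:                                      # sum_r y_jr lam_r >= y_j0
        if sum(row[r] * lambdas[r] for r in range(q)) - row[r0] < -tol:
            return False
    for i, row in enumerate(X):                        # stochastic input rows
        mean_gap = sum(row[r] * lambdas[r] for r in range(q)) - theta * row[r0]
        coefs = [lambdas[r] - (theta if r == r0 else 0.0) for r in range(q)]
        sigma = math.sqrt(var_c[i] * sum(c * c for c in coefs))
        if mean_gap + sign * tau_hat * sigma > tol:
            return False
    return True


# DMUs 1, 3 and 5 of Table 1 (two inputs, one output); evaluate DMU 3 (index 1).
X = [[2.0, 4.0, 6.0], [12.0, 6.0, 4.0]]
Y = [[12.55, 9.68, 9.68]]
print(dcf_feasible(X, Y, 1, [0.0, 1.0, 0.0], 0.95, [100.0, 100.0], 1.732, sign=-1))
```

With a large input variance, the "−" branch admits $\hat{\theta} < 1$ for a unit that is efficient under (28).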

Suppose no qualitative information about expectations is available, but we are almost certain that the data obtained are accurate. Then $\alpha$ can be restricted to values between 0.5 (where $\hat{\tau}_\alpha = 1$) and 0.6, and $\tau_\alpha$ is approximated as $1 \le \hat{\tau}_\alpha \le 1.2247$. In this case, the results will be less conservative than those of the normal distribution at a significance level of 0.05 (i.e., $z_{0.05} = 1.645$). For $\alpha < 0.5$, a deterministic model will suffice, since the DEA-Chebyshev model will give the same results as DEA.

4.4. The “$k$-Flexibility Function” $\tau_\alpha$: Unique Management Interpretation

It may not be sufficient to develop a model that is technically sound and supported by appropriate theoretical proofs. Management expertise can play an important role in defining the corrected frontier, and a model that ignores it risks being rejected by its users. Hence, the DEA-Chebyshev model is designed to incorporate management input, which can become a crucial factor in the modeling process. One of the major advantages of this model is its flexibility compared with models that require a distributional form. It can provide crucial information to management, based on their expertise and experience in their own field of specialization, thereby redefining the efficient frontier.

In the DEA-Chebyshev model, $\alpha$ has a unique management interpretation and implication. It can be defined as management's opinion of the expected degree of competence with regard to either input or output usage. In other words, it is the estimated degree of deviation from the observed level of performance. The smaller the value of $\alpha$, the more certain it is that the data are accurate and that little improvement can be made, ceteris paribus, or that expectations have been approximately met. When $\alpha = 0$, DCF = DEA. This implies that management is certain the data obtained are accurate (so there is no need to account for deviation, random effects, or inefficiency) or that present expectations have been met. If $\alpha \approx 1$, the data obtained are extremely erroneous or expectations have not been met.

The value of $\alpha$ is an aggregate of two factors (or events): the certainty of inaccuracy, denoted by $P(E)$, and the translated percentage of inaccuracy, denoted by $P(D)$. Let $P(E)$ denote the true/false existence of errors; $P(E) = 1$ implies that the data are inaccurate. If $P(E) = 1$, then $0.5 < P(D) < 1$; otherwise, $P(D) = 0$. In other words, deviation can occur only when errors exist ($D \subseteq E$): when the data are 100% accurate, there is no deviation. Therefore, $\alpha$ can be defined as
$$\alpha = P(D \mid E) = \frac{P(D \cap E)}{P(E)} = \frac{P(D)}{P(E)}. \tag{34a}$$

Proof.

$P(D) = P(D \cap E) + P(D \cap E')$; since $P(D \cap E') = 0$, it follows that $P(D) = P(D \cap E)$.

Hence, for $P(E) = 1$, $\alpha$ can be approximated as
$$\alpha \approx \frac{P(D)}{P(E)} + k = P(D) + k. \tag{34b}$$
The constant $k \ge 0$ represents the degree of the expert's uncertainty.

When the deviation due to errors is negligible, the percentage deviation from the observed values is approximately 0, and hence $\alpha$ will be at most 0.5. If $P(\text{error}) = 0$, the data are error-free and the percentage deviation from the observed values is exactly 0; in this case, $\alpha = 0$ and DCF = DEA. Based on (31), $\hat{\tau}_\alpha$ should be restricted to be no less than 1, and therefore $\alpha \ge 0.5$. Otherwise, the confidence limits become too narrow, which implies that DCF ≅ DEA. This is undesirable, because the DCF should equal DEA only when there is absolute certainty that the data are error-free. Hence, $P(D)$ must be defined such that $0.5 \le \alpha < 1$ in (34b) when $P(E) = 1$, and $P(D) = 0$ (so that $\alpha = 0$) otherwise.
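The decision rule of (34a) and (34b) can be sketched in Python as follows. It treats $P(E)$ as a 0/1 indicator and enforces $0.5 \le \alpha < 1$ when errors are believed to exist. The function name `expert_alpha` is hypothetical, and raising an error for out-of-range inputs is our design choice, not the paper's.

```python
def expert_alpha(p_error: float, p_deviation: float, k: float = 0.0) -> float:
    """Sketch of (34a)-(34b): alpha ~ P(D) / P(E) + k when errors exist.

    p_error: P(E), treated as the 0/1 existence of errors.
    p_deviation: P(D), ignored when P(E) = 0 (then P(D) = 0).
    k: non-negative correction for the expert's uncertainty.
    """
    if p_error not in (0.0, 1.0):
        raise ValueError("P(E) is treated as a 0/1 indicator")
    if k < 0.0:
        raise ValueError("k must be non-negative")
    if p_error == 0.0:
        return 0.0  # error-free data: no deviation, so DCF coincides with DEA
    alpha = p_deviation / p_error + k
    if not 0.5 <= alpha < 1.0:
        raise ValueError("for P(E) = 1 the model requires 0.5 <= alpha < 1")
    return alpha


print(expert_alpha(1.0, 0.7, k=0.05))
```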

4.5. Approximating the Error-Free Frontier: Development of the DCF

Unlike DEA scores, which are calculated in a straightforward manner, DEA-Chebyshev model efficiency scores are slightly more complicated to obtain. There are five stages in determining the best efficiency rating for a DMU.

Stage I. Determining the DEA efficient units.

Stage II. Establishing the upper and lower limits for efficiency scores using DEA-Chebyshev model where the value of α is defined to reflect management concerns.

Stage III. Establishing the corrected frontier from the upper and lower limits calculated in Stage II for DEA-efficient units. For each DEA-efficient unit, the upper and lower limits of the efficiency scores from the DEA-Chebyshev model form the confidence bounds for the error-free efficiency scores. These limits determine the most likely location of the EFF. DEA-Chebyshev model efficiency scores have the following characteristics.

(1) An efficient DMU with a smaller standard deviation has a smaller confidence region in which the EFF resides. Hence, this DMU is considered more robustly efficient, since it is closer to the EFF.

(2) It can be conjectured that for DEA-efficient DMUs, $\theta^U \le 1$ and $\theta^L \ge 1$ will always hold (this is not so for inefficient units).

(3) When $\theta^L \ge c$, where $c$ is a very large constant, the DMU is likely an outlier.

(4) In general, the mean DEA-Chebyshev model efficiency score satisfies $\bar{\theta} = (\theta^U + \theta^L)/2 \approx \theta_{DEA}$, unless characteristic (3) is observed.
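Characteristics (2) to (4) can be checked mechanically once the bounds are available. The Python sketch below applies them to three DEA-efficient units from Table 10 (simulation 3). The outlier threshold $c = 10$ is a hypothetical choice, since the text only says "a very large constant". Characteristic (1) is not computed because the text does not specify how the standard deviation of $\hat{\theta}$ is obtained.

```python
def summarize_dcf(bounds, outlier_threshold=10.0):
    """bounds: dict mapping DMU name -> (theta_U, theta_L) for DEA-efficient units."""
    summary = {}
    for dmu, (t_u, t_l) in bounds.items():
        summary[dmu] = {
            "mean": (t_u + t_l) / 2.0,                      # characteristic (4)
            "brackets_one": t_u <= 1.0 <= t_l,              # characteristic (2)
            "possible_outlier": t_l >= outlier_threshold,   # characteristic (3)
        }
    return summary


# Upper and lower bounds for three DEA-efficient units in Table 10.
table10 = {"DMU1": (0.794, 1.566), "DMU14": (0.936, 29.92), "DMU15": (0.926, 2.77)}
for dmu, info in summarize_dcf(table10).items():
    print(dmu, info)
```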

5. Simulation

Five data sets, each containing 15 DMUs in a two-input, one-output scenario, were generated to illustrate how the DEA-Chebyshev model approximates the EFF and to demonstrate how close the DCF lies to the EFF. The results of the DCF, DEA, and input-oriented VRS CCP models are each compared against the EFF.

5.1. Step I: Simulation: The Data Generating Process

The first data set, shown in Table 1, is the control group. It contains two inputs and one output generated using a logarithmic production function of the following form:
$$y = \beta_0 + \beta_1 (\ln x_1)^2 + \beta_2 (\ln x_2)^2, \tag{35}$$
where $\beta_0$ is a constant and $\beta_1$ and $\beta_2$ are arbitrary weights (coefficients) assigned to the inputs. Input 1 ($x_1$) has been chosen arbitrarily, and input 2 ($x_2$) is a function of $x_1$: $x_2 = c/x_1$, where $c$ is an arbitrary constant, in this case $c = 24$. This ensures that the frontier generated by the control group contains only efficient units and is convex. The linear convex combination in the EFF consists of discrete production possibility sets defined for every individual DMU. Output ($y$) is then calculated from a discrete set of inputs using (35). The coefficients $\beta_0$, $\beta_1$, and $\beta_2$ have been arbitrarily defined and are fixed for all groups (control and experimental). The control group contains no measurement errors, no statistical errors, and no inefficient DMUs; it is used to construct the EFF.
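The control-group construction in (35) can be sketched in Python. The paper fixes $\beta_0$, $\beta_1$, and $\beta_2$ but does not report them, so the values below are hypothetical and the printed outputs are not expected to match Table 1. The sketch also sets $\beta_1 = \beta_2$ (our assumption). Under that assumption, mirrored input pairs such as (2, 12) and (12, 2) receive identical outputs, as DMU1 and DMU10 do in Table 1.

```python
import math

C = 24.0  # x2 = C / x1, as stated in the text
# Hypothetical coefficients: the paper fixes them but does not report them.
BETA0, BETA1, BETA2 = 0.0, 1.0, 1.0


def control_group(x1_values, c=C, b0=BETA0, b1=BETA1, b2=BETA2):
    """Return (y, x1, x2) triples on the error-free frontier of (35)."""
    rows = []
    for x1 in x1_values:
        x2 = c / x1
        y = b0 + b1 * math.log(x1) ** 2 + b2 * math.log(x2) ** 2
        rows.append((y, x1, x2))
    return rows


X1 = [2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 17, 19, 20]  # Input 1 column of Table 1
for y, x1, x2 in control_group(X1):
    print(f"x1 = {x1:>2}  x2 = {x2:5.2f}  y = {y:6.3f}")
```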

Table 1

Control group: the error-free production units.

DMU	Output	Input 1	Input 2
1	12.55	2	12
2	10.43	3	8
3	9.68	4	6
4	9.53	5	4.8
5	9.68	6	4
6	10.01	7	3.43
7	10.43	8	3
8	11.45	10	2.4
9	11.99	11	2.18
10	12.55	12	2
11	13.12	13	1.85
12	14.25	15	1.6
13	15.36	17	1.41
14	16.46	19	1.26
15	16.99	20	1.2

The experimental groups are generated from the control group by adding the error components. Their outputs are the same as those of the control group and are held deterministic. Their inputs are stochastic, containing confounded errors: half-normal, nonzero inefficiency $N^{+}(\mu, \sigma^2)$ and statistical noise $N(0, 1)$:
$$y \approx \beta_0 + \beta_1 (\ln \hat{x}_1)^2 + \beta_2 (\ln \hat{x}_2)^2. \tag{36a}$$
In (36a), the inputs are confounded with random errors and inefficiency:
$$\hat{x}_i = x_i + \varepsilon_i, \quad \text{where } \varepsilon = v + u. \tag{36b}$$
Variability in the inputs across simulations comes from the inefficiency component, $u \sim N^{+}(\mu, \sigma^2)$, which is distributed half-normally. Each simulation uses different, arbitrarily chosen values of $\mu$ and $\sigma$ for this component. Table 2 shows the details.
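Generating an experimental group from the control inputs, as in (36b), can be sketched in Python. Noise is $N(0, 1)$ as stated in the text. Inefficiency is drawn from a normal distribution truncated below at zero, which is our reading of $N^{+}(\mu, \sigma^2)$. The values `mu_u = 0.5` and `sigma_u = 1.0` are hypothetical, and the small positive `floor` is our own safeguard so that $\ln \hat{x}_i$ in (36a) stays defined. Outputs are left unchanged.

```python
import random

X1 = [2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 17, 19, 20]
CONTROL = [(x1, 24.0 / x1) for x1 in X1]  # (input 1, input 2) of the control group


def perturb_inputs(control_inputs, mu_u, sigma_u, seed=0, floor=1e-3):
    """Return x_hat = x + v + u for every input, v ~ N(0, 1), u ~ N+(mu_u, sigma_u^2)."""
    rng = random.Random(seed)

    def draw_u():
        while True:  # N(mu_u, sigma_u^2) truncated below at zero
            u = rng.gauss(mu_u, sigma_u)
            if u >= 0.0:
                return u

    return [tuple(max(x + rng.gauss(0.0, 1.0) + draw_u(), floor) for x in inputs)
            for inputs in control_inputs]


experimental = perturb_inputs(CONTROL, mu_u=0.5, sigma_u=1.0, seed=7)
for dmu, (x1_hat, x2_hat) in enumerate(experimental, start=1):
    print(dmu, round(x1_hat, 2), round(x2_hat, 2))
```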

Table 2

Four experimental groups with variations and inefficiencies introduced to both inputs while keeping outputs constant.

		Experimental Grp 1		Experimental Grp 2		Experimental Grp 3		Experimental Grp 4
DMU	Output	Input 1	Input 2	Input 1	Input 2	Input 1	Input 2	Input 1	Input 2
1	12.55	3.16	12.5	2.34	12.85	2.91	12.6	2.68	13.92
2	10.43	3.69	9.08	1.6	10.07	2.34	8.23	3.32	8.34
3	9.68	4.88	8.41	3.58	5.97	6.1	6.43	4.25	6.53
4	9.53	5.27	5.31	7.28	9.43	7.84	3.96	6.44	4.25
5	9.68	8.39	7.43	6.98	5.9	7.64	2.96	9.93	3.55
6	10.01	9.17	3.8	7.04	5.57	9.6	4.01	10.46	4.98
7	10.43	10.92	3.11	9.6	3.26	7.71	2.9	6.29	2.95
8	11.45	13.14	3.95	11.41	1.88	10.38	3.14	11.71	3.05
9	11.99	9.33	2.85	11.53	4.75	13.88	0.59	13.25	2.47
10	12.55	10.38	7.43	13.94	2.46	12.55	4.44	12.19	3.73
11	13.12	12.67	1.69	12.46	4.79	13.53	1.1	13.24	1.1
12	14.25	17.59	4.8	15.71	2.09	16.57	2.27	14.14	2.08
13	15.36	17.35	4.23	17.33	4.44	15.35	1.38	15.47	2.25
14	16.46	19.13	1.4	20.33	3.49	19.11	0.06	18.67	0.57
15	16.99	19.98	2.51	19.31	4.85	20.57	1.21	19.32	2.59

5.2. Step II: Establishing Efficiency Scores: DEA, DEA-Chebyshev Model, and CCP Efficiency Evaluation

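The DEA results were calculated using ProDEA, while the CCP and DEA-Chebyshev model results were calculated using MathCad. The CCP LP formulation follows that of [11, 18]; the upper and lower bounds for the CCP frontier are
$$\theta_{CCP}^{U} = \min\left\{\theta^{U} \;\middle|\; y_{j0} \le \sum_{r=1}^{q} y_{jr}\lambda_r,\ E\left(\theta^{U} x_{i0} - \sum_{r=1}^{q} x_{ir}\lambda_r\right) - 1.645\sigma \ge 0,\ \sum_{r=1}^{q}\lambda_r = 1,\ \lambda_r \ge 0\right\}, \tag{37a}$$
$$\theta_{CCP}^{L} = \min\left\{\theta^{L} \;\middle|\; y_{j0} \le \sum_{r=1}^{q} y_{jr}\lambda_r,\ E\left(\theta^{L} x_{i0} - \sum_{r=1}^{q} x_{ir}\lambda_r\right) + 1.645\sigma \ge 0,\ \sum_{r=1}^{q}\lambda_r = 1,\ \lambda_r \ge 0\right\}. \tag{37b}$$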

Table 3 shows the results of the efficiency analysis for the DEA and CCP models. The λ-conditions that CCP must satisfy are the same for the DCF, and the value of $\sum_{R=1}^{q}\lambda_{r,R}$ for CCP is approximately the same as that for the DCF. Although DMU11 is DEA efficient, it is not CCP efficient, since it violates one of the two λ-conditions. Note that $\sum_{R=1}^{q}\bar{\lambda}_{r,R} = \left(\sum_{R=1}^{q}\lambda_{r,R}^{U} + \sum_{R=1}^{q}\lambda_{r,R}^{L}\right)/2$ in Tables 3, 4, 5, and 6.

Table 3

DEA and CCP efficiency evaluation for simulation 1.

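	DEA $\theta$	$\sum_{R=1}^{q}\lambda_{r,R}$	CCP (U) $\theta_{CCP}^{U}$	CCP (L) $\theta_{CCP}^{L}$	Average $\bar{\theta}_{CCP}$	CCP $\hat{\theta}_{CCP}$	$\sum_{R=1}^{q}\lambda_{r,R}^{U}$	$\sum_{R=1}^{q}\lambda_{r,R}^{L}$	$\sum_{R=1}^{q}\bar{\lambda}_{r,R}$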
DMU1	1	1.674	0.795	1.52	1.158	1	1.834 (8)	2.124 (6)	1.979
DMU2	1	1.58	0.762	1.259	1.011	1	1.31 (3)	1.18 (5)	1.245
DMU3	0.892	0	0.694	1.074	0.884	0.884	0	0.323	0.162
DMU4	1	2.56	0.69	1.277	0.984	0.984	3.458 (7)	1.785 (8)	2.621
DMU5	0.679	0	0.481	0.852	0.666	0.666	0	0	0
DMU6	0.909	0	0.706	1.089	0.898	0.898	0	0.5463	0.273
DMU7	0.882	0	0.653	1.094	0.873	0.873	0	0.5199	0.26
DMU8	0.715	0	0.538	0.885	0.711	0.711	0	0	0
DMU9	1	4.778	0.777	1.238	1.008	1	4.876 (10)	2.415 (9)	3.645
DMU10	0.787	0	0.665	0.894	0.779	0.779	0	0	0
DMU11	1	1.105	0.82	1.593	1.206	0.91	0.0996 (3)	2.37 (9)	1.235
DMU12	0.749	0	0.666	0.819	0.743	0.743	0	0	0
DMU13	0.879	0	0.772	0.962	0.867	0.867	0	0	0
DMU14	1	2.302	0.912	2.154	1.533	1	1.532 (4)	2.134 (6)	1.833
DMU15	1	1	0.924	2.906	1.915	1	1.892 (2)	1.601 (5)	1.747

Table 4

DEA and CCP efficiency evaluation for simulation 2.

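	DEA $\theta$	$\sum_{R=1}^{q}\lambda_{r,R}$	CCP (U) $\theta_{CCP}^{U}$	CCP (L) $\theta_{CCP}^{L}$	Average $\bar{\theta}_{CCP}$	CCP $\hat{\theta}_{CCP}$	$\sum_{R=1}^{q}\lambda_{r,R}^{U}$	$\sum_{R=1}^{q}\lambda_{r,R}^{L}$	$\sum_{R=1}^{q}\bar{\lambda}_{r,R}$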
DMU1	1	1.222	0.803	1.702	1.252	1	1.61 (6)	1.449 (5)	1.53
DMU2	1	1	0.759	1.924	1.341	0.879	0.875 (2)	1.117 (6)	0.996
DMU3	1	4.377	0.699	1.329	1.014	1	4.205 (8)	2.998 (7)	3.602
DMU4	0.593	0	0.425	0.764	0.595	0.595	0	0	0
DMU5	0.822	0	0.615	1.012	0.814	0.814	0	0.0678	0.034
DMU6	0.848	0	0.639	1.038	0.839	0.839	0	0.3006	0.15
DMU7	0.948	0	0.73	1.164	0.947	0.947	0	0.7558	0.378
DMU8	1	2.872	0.78	1.629	1.204	1	3.263 (10)	2.305 (10)	2.784
DMU9	0.843	0	0.727	0.963	0.845	0.845	0	0	0
DMU10	0.915	0	0.779	1.243	1.011	0.889	0	0.7534	0.377
DMU11	0.917	0	0.789	1.026	0.907	0.907	0	0.0958	0.048
DMU12	1	3.074	0.847	1.64	1.243	1	2.603 (6)	2.132 (8)	2.367
DMU13	0.941	0	0.856	1.033	0.944	0.944	0	0.1427	0.071
DMU14	1	1	0.888	1.439	1.163	0.944	0.259 (2)	1.264 (4)	0.761
DMU15	1	1.455	0.922	1.514	1.218	1	2.186 (5)	1.62 (5)	1.903

Table 5

DEA and CCP efficiency evaluation for simulation 3: if the data contains small nonsystematic errors, the DEA model outperforms the CCP. CCP works well under conditions where inefficiency has not been partially offset by noise.

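	DEA $\theta$	$\sum_{R=1}^{q}\lambda_{r,R}$	CCP (U) $\theta_{CCP}^{U}$	CCP (L) $\theta_{CCP}^{L}$	Average $\bar{\theta}_{CCP}$	CCP $\hat{\theta}_{CCP}$	$\sum_{R=1}^{q}\lambda_{r,R}^{U}$	$\sum_{R=1}^{q}\lambda_{r,R}^{L}$	$\sum_{R=1}^{q}\bar{\lambda}_{r,R}$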
DMU1	1	1	0.794	1.566	1.18	1	1.136 (4)	1.283 (2)	1.20945
DMU2	1	1.901	0.731	1.603	1.167	1	3.148 (11)	1.305 (4)	2.22655
DMU3	0.845	0	0.659	1.003	0.831	0.831	0	0	0
DMU4	0.898	0	0.67	1.079	0.874	0.874	0	0	0
DMU5	1	2.986	0.728	1.235	0.982	0.982	0 (0)	2.137 (7)	1.0685
DMU6	0.779	0	0.571	0.954	0.762	0.762	0	0	0
DMU7	1	2.704	0.725	1.24	0.982	1	5.681 (10)	2.598 (7)	4.13975
DMU8	0.877	0	0.705	1.028	0.867	0.867	0	0	0
DMU9	1	1	0.791	2.408	1.599	0.896	0 (0)	1.963 (10)	0.98141
DMU10	0.779	0	0.664	0.88	0.772	0.772	0	0	0
DMU11	1	1	0.799	1.298	1.048	0.899	0	0.6928	0.3464
DMU12	0.814	0	0.674	0.926	0.8	0.8	0	0	0
DMU13	1	2.409	0.893	1.161	1.027	0.947	2.634 (8)	0.451 (3)	1.54245
DMU14	1	1	0.936	29.92	15.43	0.968	0.585 (2)	3.528 (6)	2.05655
DMU15	1	1	0.926	2.77	1.848	1	1.816 (2)	1.041 (2)	1.42865

Table 6

DEA and CCP efficiency evaluation for simulation 4.

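	DEA $\theta$	$\sum_{R=1}^{q}\lambda_{r,R}$	CCP (U) $\theta_{CCP}^{U}$	CCP (L) $\theta_{CCP}^{L}$	Average $\bar{\theta}_{CCP}$	CCP $\hat{\theta}_{CCP}$	$\sum_{R=1}^{q}\lambda_{r,R}^{U}$	$\sum_{R=1}^{q}\lambda_{r,R}^{L}$	$\sum_{R=1}^{q}\bar{\lambda}_{r,R}$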
DMU1	1	1.036	0.797	1.613	1.205	1	1.182 (7)	1.383 (3)	1.283
DMU2	1	1	0.726	1.294	1.01	0.863	1.954 (6)	0.911 (3)	1.432
DMU3	1	1.255	0.773	1.207	0.99	0.99	0 (0)	0.715 (4)	0.358
DMU4	0.899	0	0.667	1.129	0.898	0.898	0	0.939	0.469
DMU5	0.747	0	0.462	0.99	0.726	0.726	0	0	0
DMU6	0.6	0	0.428	0.815	0.622	0.622	0	0	0
DMU7	1	5.52	0.712	1.367	1.039	1	7.079 (13)	3.819 (10)	5.449
DMU8	0.754	0	0.57	0.981	0.775	0.775	0	0	0
DMU9	0.774	0	0.601	1.013	0.807	0.807	0	0.018	0.009
DMU10	0.818	0	0.696	0.929	0.812	0.812	0	0	0
DMU11	1	2.009	0.797	1.781	1.289	0.899	0 (0)	2.338 (9)	1.169
DMU12	0.969	0	0.829	1.079	0.954	0.954	0	0.518	0.259
DMU13	1	1.87	0.935	1.098	1.017	0.968	1.455 (3)	0.506 (3)	0.981
DMU14	1	1.31	0.912	3.899	2.406	1	1.303 (5)	2.734 (7)	2.018
DMU15	1	1	0.922	2.743	1.832	1	2.028 (3)	1.119 (2)	1.573

In this simulation, we expect the data collected to be reasonably reliable, so a less conservative model is the better choice. Conservative models tend to produce results with a greater standard deviation and therefore less accurate estimates. The four simulations were designed to test the CCP, DEA, and DEA-Chebyshev models and to determine the accuracy of their results in comparison with the EFF. The results for DEA, CCP, and the DCF for all four simulations, using the values of $\alpha$ in Table 7, can be found in Tables 3, 4, 5, 6, 8, 9, 10, and 11. The upper (38a) and lower (38b) bounds for the constraints in the DCF formulation are given as
$$E\left(\theta^{U} x_{i0} - \sum_{r=1}^{q} x_{ir}\lambda_r\right) - \hat{\tau}_\alpha \sigma \ge 0, \tag{38a}$$
$$E\left(\theta^{L} x_{i0} - \sum_{r=1}^{q} x_{ir}\lambda_r\right) + \hat{\tau}_\alpha \sigma \ge 0. \tag{38b}$$
As $\alpha$ increases, $\hat{\tau}_\alpha \sigma$ also increases, and so does the spread between the upper and lower bounds of $\theta$.
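The widening of the band between (38a) and (38b) can be illustrated in Python for a fixed, hypothetical $\sigma$. In the actual model $\sigma$ depends on $\lambda$ and $\theta$, so this isolates only the effect of $\alpha$. The sketch uses the four $\alpha$ values of Table 7 and the form $\hat{\tau}_\alpha = \sqrt{\alpha/(1-\alpha)}$.

```python
import math


def bound_halfwidth(alpha, sigma):
    """tau_hat_alpha * sigma: subtracted in (38a) and added in (38b)."""
    return math.sqrt(alpha / (1.0 - alpha)) * sigma


SIGMA = 0.3  # hypothetical standard deviation, held fixed for illustration
for alpha in (0.675, 0.72, 0.74, 0.75):  # alpha values from Table 7
    print(f"alpha = {alpha:<5}  tau_hat * sigma = {bound_halfwidth(alpha, SIGMA):.4f}")
```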

When the degree of deviation from observed performance levels is available, the DEA-Chebyshev model generally approximates the EFF more precisely than CCP, which assumes the normal distribution. The simulations show that $\alpha$ values based on the deviation from the observed level of performance consistently produce the best approximations. The estimated degree of deviation due to inefficiency from the observed level of performance is formulated as follows:
$$\alpha \approx \frac{P(D)}{P(E)} + k = P(D) + k = \frac{1 + P(\text{deviation})}{2} + k, \tag{39}$$
where $\alpha$ denotes management- or expert-defined values of data deviation (if available) and $k$ denotes a constant correction factor. In other words, $k$ reflects the users' confidence in their own expectations and is always greater than or equal to 0. $P(\text{deviation})$ is defined as the ratio of the perceived excess of inputs to the observed inputs. The numerical calculations using (39) are shown in Table 7.

Table 7

Qualitative information: determining the value for α.

Simulation 1 (largest % deviation from the expected level of performance of the four simulations)	$\alpha \approx \frac{1 + (0.112 + 0.282)}{2} + k \approx 0.75$, $\therefore \hat{\tau}_\alpha = 1.732$

Simulation 2	$\alpha \approx \frac{1 + (0.067 + 0.312)}{2} + k \approx 0.74$, $\therefore \hat{\tau}_\alpha = 1.687$

Simulation 3 (smallest % deviation from the expected level of performance of the four simulations)	$\alpha \approx \frac{1 + (0.118 + 0.132)}{2} + k \approx 0.675$, $\therefore \hat{\tau}_\alpha = 1.441$

Simulation 4	$\alpha \approx \frac{1 + (0.092 + 0.23)}{2} + k \approx 0.72$, $\therefore \hat{\tau}_\alpha = 1.604$

Note that in the simulations, the correction factor is set to $k \approx 0.05$, which implies that the user may have underestimated the deviation by 5%. The value of $k$ can also be zero. The deviation values are calculated as the perceived inefficiency divided by the observed values.
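The arithmetic behind Table 7 can be sketched in Python using (39) with $k = 0.05$ and (33). We read the two numbers in each row of Table 7 as per-input deviations that are summed to form $P(\text{deviation})$; this is our interpretation of the table. The second loop recomputes $\hat{\tau}_\alpha$ from the rounded $\alpha$ values that Table 7 reports.

```python
import math


def alpha_from_deviation(deviations, k=0.05):
    """(39): alpha ~ (1 + P(deviation)) / 2 + k, P(deviation) summed over inputs."""
    return (1.0 + sum(deviations)) / 2.0 + k


def tau_hat(alpha):
    """(33): tau_hat_alpha = sqrt(alpha / (1 - alpha))."""
    return math.sqrt(alpha / (1.0 - alpha))


# Per-input perceived deviations as listed in Table 7.
TABLE7 = {1: (0.112, 0.282), 2: (0.067, 0.312), 3: (0.118, 0.132), 4: (0.092, 0.23)}
for sim, devs in TABLE7.items():
    print(f"simulation {sim}: alpha from (39) = {alpha_from_deviation(devs):.4f}")

# Table 7 reports tau_hat for the rounded alpha values below.
for a in (0.75, 0.74, 0.675, 0.72):
    print(f"alpha = {a:<5} tau_hat = {tau_hat(a):.3f}")
```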

Table 8

DEA-Chebyshev model efficiency analysis from simulation 1 at α=0.75.

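	$\hat{\theta}_{\alpha=0.75}^{U}$ (upper bound)	$\hat{\theta}_{\alpha=0.75}^{L}$ (lower bound)	$\sum_{R=1}^{q}\lambda_{r,R}^{U}$	$\sum_{R=1}^{q}\lambda_{r,R}^{L}$	$\sum_{R=1}^{q}\bar{\lambda}_{r,R}$	St. dev. ($\hat{\theta}$)	$\hat{\theta}_{\alpha=0.75}$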
DMU1	0.786	1.548	1.85 (8)	2.127 (6)	1.988 (0.63)	0.539	1
DMU2	0.751	1.272	1.297 (3)	1.184 (5)	1.24 (0.83)	0.368	1
DMU3	0.683	1.082	0	0.357	0.179	0.282	0.883
DMU4	0.673	1.287	3.491 (7)	1.72 (8)	2.605 (0.02)	0.434	0.98
DMU5	0.47	0.858	0	0	0	0.275	0.664
DMU6	0.696	1.096	0	0.591	0.295	0.283	0.896
DMU7	0.643	1.104	0	0.547	0.274	0.326	0.874
DMU8	0.531	0.892	0	0	0	0.255	0.712
DMU9	0.767	1.249	4.839 (10)	2.363 (9)	3.601 (0.03)	0.341	1
DMU10	0.659	0.898	0	0	0	0.169	0.779
DMU11	0.813	1.628	0.101 (3)	2.328 (9)	1.214 (0.006)	0.577	0.906
DMU12	0.662	0.822	0	0	0	0.113	0.742
DMU13	0.768	0.965	0	0	0	0.14	0.867
DMU14	0.906	2.225	1.53 (4)	2.193 (8)	1.862 (0.55)	0.932	1
DMU15	0.92	3.232	1.892 (2)	1.592 (5)	1.742 (0.75)	1.635	1

Table 9

DEA-Chebyshev model efficiency analysis from simulation 2 at α=0.74.

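	$\hat{\theta}_{\alpha=0.74}^{U}$ (upper bound)	$\hat{\theta}_{\alpha=0.74}^{L}$ (lower bound)	$\sum_{R=1}^{q}\lambda_{r,R}^{U}$	$\sum_{R=1}^{q}\lambda_{r,R}^{L}$	$\sum_{R=1}^{q}\bar{\lambda}_{r,R}$	St. dev. ($\hat{\theta}$)	$\hat{\theta}_{\alpha=0.74}$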
DMU1	0.793	1.739	1.787 (8)	1.45 (5)	1.619 (0.5)	0.669	1
DMU2	0.748	1.964	0.809 (1)	1.178 (7)	0.994 (0.25)	0.86	0.874
DMU3	0.684	1.342	4.027 (8)	2.832 (7)	3.429 (0.05)	0.465	1
DMU4	0.417	0.771	0	0	0	0.25	0.594
DMU5	0.604	1.02	0	0.115	0.058	0.294	0.812
DMU6	0.628	1.047	0	0.337	0.169	0.296	0.837
DMU7	0.719	1.174	0	0.764	0.382	0.322	0.947
DMU8	0.769	1.657	3.568 (10)	2.269 (10)	2.918 (0.08)	0.627	1
DMU9	0.719	0.967	0	0	0	0.176	0.843
DMU10	0.78	1.264	0	0.794	0.397	0.342	0.89
DMU11	0.782	1.03	0	0.115	0.057	0.175	0.906
DMU12	0.84	1.664	2.26 (5)	2.103 (8)	2.182 (0.83)	0.582	1
DMU13	0.852	1.037	0	0.152	0.076	0.131	0.944
DMU14	0.887	1.46	0.646 (1)	1.273 (4)	0.959 (0.08)	0.405	0.943
DMU15	0.918	1.556	1.904 (5)	1.617 (5)	1.761 (0.29)	0.452	1

Table 10

DEA-Chebyshev model efficiency analysis from simulation 3 at α=0.675.

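	$\hat{\theta}_{\alpha=0.675}^{U}$ (upper bound)	$\hat{\theta}_{\alpha=0.675}^{L}$ (lower bound)	$\sum_{R=1}^{q}\lambda_{r,R}^{U}$	$\sum_{R=1}^{q}\lambda_{r,R}^{L}$	$\sum_{R=1}^{q}\bar{\lambda}_{r,R}$	St. dev. ($\hat{\theta}$)	$\hat{\theta}_{\alpha=0.675}$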
DMU1	0.794	1.566	1.073 (3)	1.356 (2)	1.214 (0.48)	0.4796	1
DMU2	0.731	1.603	2.528 (9)	1.47 (7)	1.999 (0.006)	0.5503	1
DMU3	0.659	1.003	0	0	0	0.213	0.833
DMU4	0.67	1.079	0	0.461	0.23	0.255	0.877
DMU5	0.728	1.235	0.377 (1)	2.111 (8)	1.244 (0.06)	0.3195	0.985
DMU6	0.571	0.954	0	0	0	0.24	0.765
DMU7	0.725	1.24	5.206 (10)	2.573 (8)	3.889 (0.008)	0.326	1
DMU8	0.705	1.028	0	0.027	0.014	0.204	0.87
DMU9	0.791	2.408	0.805 (1)	1.061 (8)	0.933 (0.005)	0.9715	0.905
DMU10	0.664	0.88	0	0	0	0.1347	0.774
DMU11	0.799	1.298	0	1.077	0.538	0.2745	0.921
DMU12	0.674	0.926	0	0	0	0.157	0.803
DMU13	0.893	1.161	2.855 (7)	1.113 (5)	1.984 (0.03)	0.1655	1
DMU14	0.936	29.92	1 (1)	2.718 (6)	1.859 (0.04)	17.971	1
DMU15	0.926	2.77	1.156 (2)	1.034 (4)	1.095 (0.37)	1.2342	1

Table 11

DEA-Chebyshev model efficiency analysis from simulation 4 at α=0.725.

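	$\hat{\theta}_{\alpha=0.725}^{U}$ (upper bound)	$\hat{\theta}_{\alpha=0.725}^{L}$ (lower bound)	$\sum_{R=1}^{q}\lambda_{r,R}^{U}$	$\sum_{R=1}^{q}\lambda_{r,R}^{L}$	$\sum_{R=1}^{q}\bar{\lambda}_{r,R}$	St. dev. ($\hat{\theta}$)	$\hat{\theta}_{\alpha=0.725}$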
DMU1	0.8	1.605	1.207 (7)	1.377 (3)	1.292 (0.68)	0.57	1
DMU2	0.729	1.291	1.951 (6)	0.918 (3)	1.4347 (0.05)	0.398	1
DMU3	0.776	1.204	0 (0)	0.719 (4)	0.359 (0.1)	0.303	0.99
DMU4	0.67	1.126	0	0.92	0.46	0.322	0.898
DMU5	0.464	0.987	0	0	0	0.37	0.726
DMU6	0.43	0.813	0	0	0	0.271	0.622
DMU7	0.716	1.363	6.874 (12)	3.849 (10)	5.361 (0.00)	0.458	1
DMU8	0.572	0.979	0	0	0	0.288	0.775
DMU9	0.603	1.01	0	0.015	0.007	0.288	0.807
DMU10	0.697	0.928	0	0	0	0.163	0.812
DMU11	0.799	1.767	0 (0)	2.327 (9)	1.164 (0.002)	0.685	0.9
DMU12	0.831	1.077	0	0.512	0.256	0.174	0.954
DMU13	0.884	1.097	2.217 (4)	0.514 (3)	1.366 (0.06)	0.15	0.991
DMU14	0.913	3.862	1.316 (5)	2.731 (7)	2.023 (0.03)	2.085	1
DMU15	0.923	2.682	1.435 (3)	1.119 (2)	1.277 (0.29)	1.244	1

Note: In Tables 8–11, the values shown in columns 4 and 5 in brackets represent the frequency with which a DEA-efficient DMU is used as a reference unit in DCF. Those in column 6 represent the P values for the upper and lower limits for the lambdas for the DEA-efficient units.

Tables 8–11 show the efficiency scores determined under the DEA-Chebyshev model, based on the $\alpha$ values shown in Table 7.

5.3. Step III: Hypothesis Testing: Frontiers Compared

All the efficiency evaluation tools are measured against the control group to determine which provides the best approximation. CCP and DEA-Chebyshev model efficiency scores are defined in the same manner. The upper and lower bounds of the frontier delimit the region in which the EFF is likely to lie, and the EFF is approximated by the DCF efficiency score $\hat{\theta}$.

Using the results obtained in Step II, the four simulated experimental groups are adjusted using their respective efficiency scores. The virtual DMUs are the DMUs of the four experimental groups, with their inputs reduced by their Step II efficiency scores. The contraction factor is $\theta$ for DEA, $\hat{\theta}_{CCP}$ for CCP, and $\hat{\theta}$ for the DCF.
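Constructing the virtual DMUs amounts to a radial contraction of each DMU's observed inputs by its score, with outputs left unchanged. The Python sketch below applies this to the first three DMUs of experimental group 1 (Table 2), using their DEA scores from Table 3. The same function would apply to $\hat{\theta}_{CCP}$ or $\hat{\theta}$.

```python
def virtual_dmus(observed_inputs, scores):
    """Contract each DMU's observed inputs radially by its efficiency score."""
    if len(observed_inputs) != len(scores):
        raise ValueError("one score per DMU is required")
    return [tuple(theta * x for x in inputs)
            for inputs, theta in zip(observed_inputs, scores)]


# DMUs 1-3 of experimental group 1 (Table 2) and their DEA scores (Table 3).
inputs = [(3.16, 12.5), (3.69, 9.08), (4.88, 8.41)]
dea_scores = [1.0, 1.0, 0.892]
print(virtual_dmus(inputs, dea_scores))
```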

In this step, in order to test the hypothesis, the 12 data sets of virtual DMUs are each aggregated with the control group, forming a sample size of 30 DMUs per simulation. “DMU#” denotes the control group (or “sample one”) and “V.DMU#” denotes the efficient virtual units derived from the experimental group (or “sample two”) using the efficiency scores generated by DEA, CCP, and DEA-Chebyshev model, respectively. There are 12 data sets in total: three for each of the simulations (three input contraction factors per DMU, from DEA, CCP (normal), and DEA-Chebyshev model). The inputs for the virtual DMUs calculated from each of these three methodologies for the same experimental group will be different. The sample size of 30 DMUs in each of the 12 sets is a result of combining the 15 error-free DMUs with the 15 virtual DMUs. These 30 DMUs are then evaluated using ProDEA (software). It is logical to use DEA for our final analysis to scrutinize the different methods since this is a deterministic method, which would work perfectly in an error-free situation. The DEA results for the 4 simulations are given in Table 12.

Table 12

Deterministic efficiency results for all four simulations with an aggregate of 30 DMUs: 15 from the control group and 15 virtual units calculated according to DEA, CCP, and the DCF, respectively.

	Simulation 1			Simulation 2			Simulation 3			Simulation 4
	DEA	CCP	DCF	DEA	CCP	DCF	DEA	CCP	DCF	DEA	CCP	DCF
DMU1	1	1	1	1	1	1	1	1	1	1	1	1
DMU2	1	1	1	0.986	0.946	0.942	0.962	0.962	0.962	1	0.937	1
DMU3	1	1	1	0.96	0.96	0.96	1	1	1	1	0.981	1
DMU4	1	1	1	1	1	1	1	1	1	0.989	0.977	0.989
DMU5	1	1	1	1	1	1	1	1	1	0.945	0.94	0.945
DMU6	1	1	1	1	1	1	0.991	0.987	0.988	0.888	0.888	0.888
DMU7	1	1	1	1	1	1	0.965	0.965	0.965	0.901	0.885	0.885
DMU8	1	0.965	0.963	1	1	1	0.971	0.933	0.943	0.914	0.872	0.872
DMU9	0.991	0.937	0.935	1	1	1	0.968	0.917	0.931	0.918	0.863	0.863
DMU10	0.978	0.906	0.903	1	1	1	0.962	0.901	0.919	0.926	0.87	0.871
DMU11	0.966	0.882	0.878	1	1	1	0.954	0.893	0.913	0.932	0.876	0.877
DMU12	0.985	0.931	0.929	1	1	1	0.934	0.903	0.911	0.939	0.906	0.914
DMU13	0.996	0.966	0.965	1	1	1	0.914	0.909	0.912	0.949	0.932	0.939
DMU14	1	0.991	0.991	1	1	1	0.973	0.957	0.973	0.967	0.967	0.967
DMU15	1	1	1	1	1	1	1	1	1	1	1	1

V.DMU1	0.889	0.885	0.884	0.921	0.921	0.921	0.898	0.999	0.898	0.841	0.84	0.84
V.DMU2	0.86	0.86	0.86	1	1	1	1	0.987	0.993	0.938	1	0.938
V.DMU3	0.864	0.872	0.873	1	1	1	1	1	1	0.931	0.92	0.941
V.DMU4	0.929	0.944	0.948	0.976	0.972	0.974	1	0.971	0.986	0.982	0.979	0.984
V.DMU5	0.926	0.943	0.946	0.934	0.943	0.945	1	1	1	1	1	1
V.DMU6	0.915	0.927	0.928	0.955	0.966	0.968	1	0.998	1	0.999	0.963	0.964
V.DMU7	0.959	0.947	0.946	0.926	0.926	0.927	1	1	1	1	1	1
V.DMU8	0.977	0.954	0.951	1	1	1	1	1	1	1	0.929	0.93
V.DMU9	1	0.99	0.987	0.956	0.946	0.946	0.989	0.989	0.989	1	0.898	0.899
V.DMU10	0.959	0.938	0.936	0.933	0.959	0.958	1	1	1	0.989	0.958	0.959
V.DMU11	1	1	1	0.939	0.94	0.94	0.938	0.954	0.952	1	1	1
V.DMU12	0.977	0.953	0.952	0.933	0.932	0.932	0.972	0.989	0.988	0.996	0.976	0.987
V.DMU13	0.971	0.975	0.975	0.903	0.899	0.899	0.998	1	1	0.992	1	0.995
V.DMU14	0.986	0.98	0.979	0.872	0.924	0.924	0.99	1	1	1	1	1
V.DMU15	1	1	1	1	1	1	1	1	1	1	1	1

To determine whether the frontiers created by these models differ substantially from that of the control group (the error-free units), the rank-sum test and a statistical hypothesis test for mean differences were used.
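The rank-sum (Wilcoxon-Mann-Whitney) statistic can be sketched in pure Python using the normal approximation. Tied scores, which are common here because many units score exactly 1, receive average ranks. This sketch applies no tie or continuity correction. The paper does not state which variant it used, so values from this sketch need not match Table 13 exactly.

```python
import math


def rank_sum_z(sample1, sample2):
    """Rank-sum z statistic for sample1 (normal approximation, average ranks for ties)."""
    pooled = sorted([(v, 0) for v in sample1] + [(v, 1) for v in sample2])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = avg
        i = j + 1
    n1, n2 = len(sample1), len(sample2)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (w - mean_w) / sd_w


print(rank_sum_z([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A $|z|$ above about 1.96 would indicate, at the 5% level, that the two groups do not come from the same population.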

The DEA-Chebyshev model is scrutinized using several statistical methods, which show a strong relationship between the DCF and the EFF. All the statistical tools used to test the DCF against the EFF consistently conclude that the corrected frontier is a good approximation of the EFF. The methods used are the Wilcoxon-Mann-Whitney test (the rank-sum test) and the t-test for differences in the mean values of $\theta$, shown in Table 13. The rank-sum test determines whether the virtual DMUs established by the DCF come from the same population as the DMUs in the control group. If they do, the difference in efficiency scores between the two groups will not be statistically significant. This does not imply that the EFF and the corrected frontier are exactly the same, but rather that the latter is a good approximation of the former. These results are better than those of the CCP performance evaluation method developed by Land et al. [11] and Forrester and Anderson [18].
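Table 13 reports a Pearson correlation and Df = 14 for 15 observations per group. This suggests a paired two-sample t-test, which is our reading rather than something the text states. A minimal Python sketch of that statistic follows. The two-tailed critical value 2.145 for 14 degrees of freedom is the one listed in Table 13. The sketch assumes the paired differences are not all identical, since a zero standard deviation would make the statistic undefined.

```python
import math
import statistics


def paired_t(sample1, sample2):
    """Paired two-sample t statistic and its degrees of freedom (n - 1)."""
    if len(sample1) != len(sample2) or len(sample1) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


T_CRITICAL_DF14 = 2.145  # two-tailed 5% critical value for df = 14 (Table 13)

t_stat, df = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0])
print(f"t = {t_stat:.4f}, df = {df}")
```

For the 15-unit groups in Table 13, the mean difference would be judged significant when $|t|$ exceeds `T_CRITICAL_DF14`.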

Table 13

Hypothesis tests for mean differences of efficiency scores. Sample 1 is denoted as the “Control group” and sample 2 is denoted as the “Virtual group”.

	Simulation 1		Simulation 2		Simulation 3		Simulation 4
	Control group	Virtual group	Control group	Virtual group	Control group	Virtual group	Control group	Virtual group
DEA
Mean	0.999	0.943	0.996	0.95	0.973	0.986	0.951	0.978
Variance	0.00001	0.00187	0.00011	0.00153	0.0007	0.0009	0.0015	0.0019
Observations	15	15	15	15	15	15	15	15
Pearson correlation	0.7117		0.1166		- 0.5253		- 0.1409
Hypothesized mean difference	0		0		0		0
Df	14		14		14		14
Rank-sum test	−3.09		−3.2146		1.3688		1.7213
t stat	5.2614		4.5917		- 1.0167		- 1.6501
P(T ≤ t) two tail	0.00012		0.00042		0.3266		0.1212
t critical two tail	2.145		2.145		2.145		2.145

CCP efficiency evaluation
Mean	0.972	0.944	0.994	0.955	0.955	0.992	0.926	0.964
Variance	0.0016	0.0019	0.00028	0.0011	0.00176	0.00018	0.0025	0.00231
Observations	15	15	15	15	15	15	15	15
Pearson correlation	0.35661		- 0.5373		- 0.14		- 0.5035
Hypothesized mean difference	0		0		0		0
Df	14		14		14		14
Rank-sum test	−1.8873		−3.0072		2.136		2.0117
t stat	2.2373		3.334		- 3.1453		- 1.7383
P(T ≤ t) two tail	0.042		0.005		0.0072		0.1041
t critical two tail	2.145		2.145		2.145		2.145

DCF
Mean	0.971	0.944	0.993	0.956	0.961	0.987	0.934	0.962
Variance	0.00168	0.0019	0.0003	0.0011	0.0013	0.0008	0.00304	0.00217
Observations	15	15	15	15	15	15	15	15
Pearson correlation	0.3296		−0.5296		−0.4235		−0.0966
Hypothesized mean difference	0		0		0		0
Df	14		14		14		14
Rank-sum test	−1.8873		−2.9657		1.8665		1.2236
t stat	2.1038		3.2401		−1.8448		−1.4533
P(T ≤ t) two tail	0.05396		0.0059		0.08633		0.1682
t critical two tail	2.145		2.145		2.145		2.145

The rank-sum test reported in Table 13 is used to determine whether the two samples being tested come from the same population. If the test does not reject this hypothesis, the two frontiers can be regarded as one and the same, or as consistently overlapping one another. They can therefore be treated as the same surface.
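The rank-sum entries in Table 13 look like normal-approximation z scores. However, the exact variant is not stated (continuity or tie correction, and which sample's rank sum is used). The sketch below is therefore an assumption: a plain Wilcoxon-Mann-Whitney z score with average ranks for ties and no variance correction. At the 5% level, |z| > 1.96 would indicate that the two samples are unlikely to come from the same population.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share the average rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(sample1, sample2):
    """Normal-approximation z score of the Wilcoxon rank sum of sample1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    w = sum(ranks[:n1])                     # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2         # E[W] under the null hypothesis
    var_w = n1 * n2 * (n1 + n2 + 1) / 12    # Var[W] under H0, no tie correction
    return (w - mean_w) / math.sqrt(var_w)


z = rank_sum_z([1, 2, 3], [4, 5, 6])  # hypothetical, fully separated samples
print(f"z = {z:.4f}, reject at 5%: {abs(z) > 1.96}")
```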

5.4. Step IV: Efficiency Scores: DEA versus DEA-Chebyshev Model and Ranking of DEA Efficient Units

There is more than one way to rank efficient units. In the simplest (naïve) case, empirically efficient DMUs can be ranked according to the score θ̄, calculated as the average of the upper and lower limits from the DEA-Chebyshev model.
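A minimal sketch of the naïve ranking: θ̄ is the midpoint of each DMU's lower and upper DEA-Chebyshev limits, and DMUs are sorted in descending order of θ̄. The bounds in the usage example are hypothetical, because the limits themselves are not listed in this section.

```python
def naive_rank(bounds):
    """Rank DMUs by the midpoint of their (lower, upper) efficiency limits.

    bounds: dict mapping DMU name -> (lower, upper).
    Returns (name, theta_bar) pairs sorted from most to least efficient.
    """
    theta_bar = {name: (lower + upper) / 2 for name, (lower, upper) in bounds.items()}
    return sorted(theta_bar.items(), key=lambda item: item[1], reverse=True)


# Hypothetical limits, not taken from the paper
example = {"DMU_A": (0.8, 1.2), "DMU_B": (0.9, 1.5), "DMU_C": (0.6, 0.8)}
for name, score in naive_rank(example):
    print(name, round(score, 3))
```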

5.4.1. Naïve Ranking

Table 14 illustrates the ranking of all DMUs. The figures in bold denote the DEA-Chebyshev model efficiency scores for the DEA efficient units. All production units are ranked in descending order of efficiency according to the average of the upper and lower limits, θ̄. An anomaly in DMU14 of simulation 3 (θ̄ = 13.632) is caused by an extremely small value for Input 2. The LP formulations for DEA, the DEA-Chebyshev model, and CCP (normal) assign the greatest weight to whichever input or output makes a DMU appear as favourable as possible, so in this case Input 2 is weighted heavily. In DEA, the efficiency score cannot exceed 1.00, so the problem goes undetected. In the DEA-Chebyshev model and CCP, efficiency scores are not restricted to 1.00, so the problem surfaces and indicates a possible outlier. It would be advisable to remove this DMU from the analysis. In this simulation, the errors are generated randomly, and the error value for this DMU lies in the tail of the distribution, which creates the outlier.

Table 14

“Naïve” ranking of empirically efficient DMUs in order of declining levels of efficiency. Values in bold correspond to DEA efficient units with a score of “1”.

	DMU (in rank order)	θ̄
	DMU15	2.076
	DMU14	1.566
	DMU11	1.22
	DMU1	1.167
	DMU2	1.012
	DMU9	1.008
Simulation 1	DMU4	0.98
	DMU6	0.896
	DMU3	0.883
	DMU7	0.874
	DMU13	0.867
	DMU10	0.779
	DMU12	0.742
	DMU8	0.712
	DMU5	0.664

	DMU2	1.356
	DMU1	1.266
	DMU12	1.252
	DMU15	1.237
	DMU8	1.213
	DMU14	1.173
Simulation 2	DMU10	1.022
	DMU3	1.013
	DMU7	0.947
	DMU13	0.944
	DMU11	0.906
	DMU9	0.843
	DMU6	0.837
	DMU5	0.812
	DMU4	0.594

	DMU14	13.632
	DMU15	1.807
	DMU9	1.498
	DMU1	1.157
	DMU2	1.15
	DMU11	1.036
Simulation 3	DMU13	1.024
	DMU7	0.986
	DMU5	0.985
	DMU4	0.877
	DMU8	0.87
	DMU3	0.833
	DMU12	0.803
	DMU10	0.774
	DMU6	0.765

	DMU14	2.388
Simulation 4	DMU15	1.802
	DMU11	1.283
	DMU1	1.202
	DMU7	1.039
	DMU2	1.01
	DMU13	0.991
	DMU3	0.99
	DMU12	0.954
	DMU4	0.898
	DMU10	0.812
	DMU9	0.807
	DMU8	0.775
	DMU5	0.726
	DMU6	0.622

This method of ranking is naïve because it ignores the standard deviation, which indicates how robust a DMU's efficiency score is to possible errors and unobserved inefficiency. It also does not distinguish between possible outliers and legitimate units.

5.4.2. Ranking by Robustness of DEA-Chebyshev Model Efficiency Scores

The ranking in order of robustness begins with the efficiency score θ̂. DMUs with θ̂ = 1 are ranked from the most robust to the least robust (from the smallest standard deviation to the largest). The standard deviation is determined using the upper and lower bounds of the efficiency scores. The remaining empirically efficient units are then ranked by their respective θ̂ (ranking them by their standard deviations yields the same order). Once all empirically efficient units have been ranked, the remaining units are ordered by their stochastic efficiency scores, from the most efficient to the least efficient. The ranking of these inefficient units is very similar to that of the empirical frontier.

Ranking from the most efficient down, DMUs with a DEA-Chebyshev model score of θ̂ = 1 (input-oriented case) fall into one of two categories depending on how robust they are (based on their standard deviation): hyper-efficient, or efficient/mildly efficient. In Table 15, DMUs not printed in bold are DEA-inefficient and are therefore ranked below those deemed empirically efficient. DEA efficient DMUs that fail to satisfy the conditions for θ̂ = 1 are given efficiency scores below 1.00.
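The ordering rule can be sketched as a sort on a tiered key. This sketch takes θ̂ and the standard deviation as given, because how the standard deviation is derived from the upper and lower bounds is not restated here. The `dea_efficient` flags are an assumption: the bold marks are lost in this copy of Table 15, so only the units with θ̂ = 1 are flagged. With the simulation 1 values from Table 15, the sort reproduces the published order.

```python
import math

# (DMU, theta_hat, std_dev, dea_efficient) for simulation 1, values from Table 15.
# dea_efficient is assumed True only where theta_hat = 1 (bold marks not recoverable).
SIM1 = [
    ("DMU1", 1.0, 0.53889, True), ("DMU2", 1.0, 0.36819, True),
    ("DMU3", 0.883, 0.28164, False), ("DMU4", 0.98, 0.43388, False),
    ("DMU5", 0.664, 0.2745, False), ("DMU6", 0.896, 0.28298, False),
    ("DMU7", 0.874, 0.32605, False), ("DMU8", 0.712, 0.25534, False),
    ("DMU9", 1.0, 0.34111, True), ("DMU10", 0.779, 0.16935, False),
    ("DMU11", 0.906, 0.57657, False), ("DMU12", 0.742, 0.11335, False),
    ("DMU13", 0.867, 0.13958, False), ("DMU14", 1.0, 0.93225, True),
    ("DMU15", 1.0, 1.63455, True),
]


def robustness_rank(units):
    """Return DMU names ordered by the robustness-based ranking rule."""
    def key(unit):
        _, theta_hat, std_dev, dea_efficient = unit
        if math.isclose(theta_hat, 1.0):
            return (0, std_dev)        # theta_hat = 1: smallest std (most robust) first
        if dea_efficient:
            return (1, -theta_hat)     # other empirically efficient units by theta_hat
        return (2, -theta_hat)         # remaining units by theta_hat, descending
    return [unit[0] for unit in sorted(units, key=key)]


print(robustness_rank(SIM1))
```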

Table 15

Ranking of efficient DMUs according to robustness based on their standard deviations. The DMUs in bold denote the empirically efficient DMUs.

	Simulation 1			Simulation 2			Simulation 3			Simulation 4
	θ̂ (α = 0.75)	Std. dev.		θ̂ (α = 0.75)	Std. dev.		θ̂ (α = 0.675)	Std. dev.		θ̂ (α = 0.725)	Std. dev.
DMU9	1	0.34111	DMU15	1	0.45149	DMU13	1	0.16546	DMU13	1	0.15033
DMU2	1	0.36819	DMU3	1	0.46499	DMU7	1	0.32591	DMU3	1	0.30285
DMU1	1	0.53889	DMU12	1	0.5823	DMU1	1	0.47956	DMU2	1	0.39775
DMU14	1	0.93225	DMU8	1	0.62735	DMU2	1	0.55027	DMU7	1	0.45771
DMU15	1	1.63455	DMU1	1	0.66871	DMU15	1	1.23418	DMU1	1	0.56972
DMU4	0.98	0.43388	DMU7	0.947	0.3218	DMU14	1	17.971	DMU11	1	0.68462
DMU11	0.906	0.57657	DMU13	0.944	0.13124	DMU5	0.985	0.31947	DMU15	1	1.24437
DMU6	0.896	0.28298	DMU14	0.943	0.4051	DMU11	0.921	0.2745	DMU14	1	2.0849
DMU3	0.883	0.28164	DMU11	0.906	0.17515	DMU9	0.905	0.97149	DMU12	0.969	0.17444
DMU7	0.874	0.32605	DMU10	0.89	0.34217	DMU4	0.877	0.25512	DMU4	0.899	0.32244
DMU13	0.867	0.13958	DMU2	0.874	0.85998	DMU8	0.87	0.20386	DMU10	0.818	0.16313
DMU10	0.779	0.16935	DMU9	0.843	0.17572	DMU3	0.833	0.21305	DMU9	0.774	0.28765
DMU12	0.742	0.11335	DMU6	0.837	0.29614	DMU12	0.803	0.15726	DMU8	0.754	0.28786
DMU8	0.712	0.25534	DMU5	0.812	0.2938	DMU10	0.774	0.1347	DMU5	0.747	0.3701
DMU5	0.664	0.2745	DMU4	0.594	0.25039	DMU6	0.765	0.24013	DMU6	0.6	0.27103

5.5. Further Analysis

Additional analyses were conducted by taking the observed DMUs in each simulation and evaluating them against the EFF, DEA, CCP, and DEA-Chebyshev model frontiers. If the DCF is a good approximation of the EFF, the efficiency scores of the observed DMUs should not differ substantially from those generated by the EFF. The same holds for CCP.

5.5.1. Observed DMUs Evaluated against the EFF, CCP, and DCF

The efficiency scores of the observed DMUs from the experimental groups determined by the EFF (to be denoted as “exp.grp+EFF”) will provide a benchmark for evaluating the DEA frontier (“exp.grp+DEA”), CCP (normal) frontier (“exp.grp+CCP”), and the corrected frontier (“exp.grp+DCF”). A comparison is drawn between the efficiency scores of the experimental groups generated by the four frontiers.

The hypothesis is that the mean efficiency scores of the 15 observed units in the “exp.grp+EFF” group and the “exp.grp+DCF” group should be approximately the same (i.e., the difference is not statistically significant). Table 16 shows that, at α = 0.05, neither the rank-sum test nor the t-test rejects the null hypothesis in simulations 3 and 4. In those simulations the difference is not statistically significant, and hence the corrected frontier is a good approximation of the EFF. The t-tests for simulations 1 and 2 indicate some level of significance, but the results generated by the DCF model are still superior to those of CCP and DEA.
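The decision rule in this paragraph can be checked directly against the EFF-versus-DCF columns of Table 16. The sketch compares |t| with the printed critical value 2.145. As an assumption, it also treats the rank-sum entries as z scores and compares them with the two-tail 5% normal value 1.96.

```python
T_CRIT = 2.145  # two-tail t critical value for df = 14, as printed in Table 16
Z_CRIT = 1.96   # two-tail 5% normal critical value (assumed scale of the rank-sum entries)

# exp.grp+EFF versus exp.grp+DCF from Table 16: simulation -> (t stat, rank-sum)
EFF_VS_DCF = {
    1: (-2.2335, 0.9125),
    2: (-2.3984, 0.9747),
    3: (0.29423, -0.0622),
    4: (-0.312, 0.1452),
}


def decisions(rows, t_crit=T_CRIT, z_crit=Z_CRIT):
    """Map each simulation to (t-test rejects H0, rank-sum test rejects H0)."""
    return {sim: (abs(t) > t_crit, abs(z) > z_crit) for sim, (t, z) in rows.items()}


for sim, (t_rejects, z_rejects) in decisions(EFF_VS_DCF).items():
    print(f"simulation {sim}: t-test rejects={t_rejects}, rank-sum rejects={z_rejects}")
```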

Table 16

Statistical analysis for frontier comparisons. Observed DMUs are evaluated against the EFF and the three candidate frontiers (DEA, DCF, and CCP), with efficiency scores calculated using the normal DEA model. The tests determine whether the scores differ substantially when comparing EFF to DEA, EFF to DCF, and EFF to CCP.

	Exp.grp+EFF	Exp.grp+DEA	Exp.grp+EFF	Exp.grp+DCF	Exp.grp+EFF	Exp.grp+CCP
Simulation 1
Mean	0.852	0.899	0.852	0.885	0.852	0.887
Variance	0.01396	0.01355	0.01396	0.01389	0.01396	0.01376
Observations	15	15	15	15	15	15
Pearson correlation	0.922		0.87986		0.881
Hypothesized mean difference	0		0		0
Df	14		14		14
Rank-sum test	1.2858		0.9125		0.9125
t stat	−3.9644		−2.2335		−2.3537
P(T ≤ t) two tail	0.0014		0.04235		0.03372
t critical two tail	2.145		2.145		2.145

Simulation 2
Mean	0.875	0.922	0.875	0.908	0.875	0.908
Variance	0.01272	0.01242	0.0127	0.0115	0.0127	0.0115
Observations	15	15	15	15	15	15
Pearson correlation	0.94071		0.8875		0.8918
Hypothesized mean difference	0		0		0
Df	14		14		14
Rank-sum test	1.3066		0.9747		1.0162
t stat	−4.6604		−2.3984		−2.487
P(T ≤ t) two tail	0.00037		0.031		0.02611
t critical two tail	2.145		2.145		2.145

Simulation 3
Mean	0.92	0.933	0.92	0.916	0.92	0.902
Variance	0.00879	0.00815	0.00879	0.00806	0.00879	0.0077
Observations	15	15	15	15	15	15
Pearson correlation	0.95301		0.8804		0.9082
Hypothesized mean difference	0		0		0
Df	14		14		14
Rank-sum test	0.7259		−0.0622		−0.6014
t stat	−1.8125		0.29423		1.68719
P(T ≤ t) two tail	0.0914		0.7729		0.1137
t critical two tail	2.145		2.145		2.145

Simulation 4
Mean	0.882	0.904	0.882	0.887	0.882	0.868
Variance	0.0153	0.0173	0.0153	0.0184	0.0153	0.0162
Observations	15	15	15	15	15	15
Pearson correlation	0.9425		0.905		0.8996
Hypothesized mean difference	0		0		0
Df	14		14		14
Rank-sum test	0.8503		0.1452		−0.394
t stat	−1.9248		−0.312		1.0043
P(T ≤ t) two tail	0.0748		0.7599		−0.312
t critical two tail	2.145		2.145		2.145

Table 16 shows the statistical tests used to compare the DEA, CCP, and DCF against the EFF. The Pearson correlation coefficient ranges from −1 to 1 inclusive and reflects the extent of the linear relationship between two sets of data. The P values, the rank-sum test, and the Pearson correlation observed for all four simulations indicate that, in general, the DCF outperforms DEA and CCP (which assumes a normal distribution).
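The Pearson coefficient reported in Tables 13 and 16 can be computed from its definition, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). The standard-library sketch below uses hypothetical inputs. On Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # hypothetical data
```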

Outliers tend to exhibit large standard deviations, which translate into wide confidence limits. The reason for establishing DCF and CCP scores is therefore to reduce the likelihood of a virtual unit becoming an outlier. At the same time, the results of stochastic models such as the DCF and CCP (as opposed to deterministic ones) can be greatly affected by outliers, because their efficiency scores are generally not restricted to 1.00. In reality, outliers are not always easily detected, and if the data set contains outliers, the stochastic models may not perform well. DMU14 in simulation 3 is an example of this problem. It can be solved either by removing the outliers or by imposing weight restrictions, although weight restrictions are beyond the scope of this paper.

6. Conclusions

Traditional methods of performance analysis are no longer sufficient in a fast-paced, constantly evolving environment, and observing past data alone is not adequate for future projections. The DEA-Chebyshev model is designed to bridge the gap between conventional performance measurement and new techniques that incorporate relevance into such measures. The algorithm provides a multidimensional evaluation technique. It also incorporates a new element into an existing deterministic technique (DEA): the k-flexibility function, which was originally derived from the one-sided Chebyshev inequality. This in turn allows management to include expert opinion as a single value, such as a 20% net growth by next year end relative to the current year. The single value is dichotomized into unmet (or over-target) present levels of growth (or decline). Because management expertise is included, the expected growth (or decline) is not unreasonable. It also inherently includes factors that need not be expressed explicitly in the model, such as environmental, economic, and social changes. Since these changes are becoming increasingly rapid, performance measures can no longer ignore qualitative inputs. In a highly competitive environment, future projections and attainable targets are key performance indicators. Intellectual capital and knowledge are today's two most important assets.

The combination of normal DEA with the DCF provides a good framework for evaluation based on both quantitative data and the qualitative intellectual knowledge of management. When no errors are expected, standard DEA models will suffice. The DCF is designed so that, in the absence of errors, it reverts to a DEA model; this occurs when the k-flexibility function equals zero. DEA provides a deterministic frontier on which the DEA-Chebyshev model builds its estimate of the EFF.
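The k-flexibility function itself is not restated in this section, so the sketch below illustrates only the inequality it is derived from. The one-sided Chebyshev (Cantelli) inequality states that, for any distribution with mean μ and finite standard deviation σ, P(X − μ ≥ kσ) ≤ 1/(1 + k²) for k > 0. Requiring μ + kσ ≤ b with k = sqrt(α/(1 − α)) therefore guarantees P(X ≤ b) ≥ α whatever the distribution, which is what makes the approach distribution-free. The function names are illustrative, and the α values are those listed in Table 15.

```python
import math


def cantelli_k(alpha):
    """Smallest k >= 0 for which the Cantelli bound guarantees P(X <= mu + k*sigma) >= alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return math.sqrt(alpha / (1.0 - alpha))


def cantelli_tail_bound(k):
    """Distribution-free upper bound on P(X - mu >= k*sigma) for k >= 0."""
    return 1.0 / (1.0 + k * k)


for alpha in (0.675, 0.725, 0.75):  # probability levels used in Table 15
    k = cantelli_k(alpha)
    print(f"alpha={alpha}: k={k:.4f}, tail bound={cantelli_tail_bound(k):.4f}")
```

By construction, the tail bound printed for each α equals 1 − α.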

The simulated dataset was tested on the DEA-Chebyshev model. It has been statistically shown that this model is an effective tool for detecting or predicting the EFF, making it a new efficiency benchmarking technique. It is an improvement over other methods: it is easily applied, practical, and not computationally intensive. The results have been promising thus far. Future work includes applying the DEA-Chebyshev model to real data to illustrate its usefulness.

Appendix A

Note that semi-positive describes data that are nonnegative, with at least one positive component in every input and output vector; mathematically, $X_i \ge 0,\ X_i \ne 0$ and $Y_j \ge 0,\ Y_j \ne 0$. That is to say, every DMU must have at least one positive value among its inputs and at least one among its outputs. The following properties were noted from Cooper et al. [26, 28].

(P.1)

Let $\Psi$ be the production possibility set (PPS) of physically attainable points $(x, y)$:

$$\Psi = \{(x, y) \in \mathbb{R}_{+}^{m+n} \mid x \text{ can produce } y\}. \tag{A.1}$$

Each pair of input $x \in \mathbb{R}^{m}$ and output $y \in \mathbb{R}^{n}$ is regarded as a semi-positive orthant point in $(m+n)$-dimensional space in $\Psi$.

(P.2)

Inefficiency:

For any semi-positive $(x, y) \in \Psi$, if $\tilde{x} \ge x$ and/or $\tilde{y} \le y$, then $(\tilde{x}, \tilde{y})$ also belongs to the set of attainable points, that is, $(\tilde{x}, \tilde{y}) \in \Psi$.

(P.3)

Convexity:

If $(x_r, y_r) \in \Psi$, $r = 1, \dots, q$, and $\lambda_r \ge 0$ with $\sum_{r} \lambda_r = 1$, then $\left(\sum_{r} \lambda_r x_r, \sum_{r} \lambda_r y_r\right) \in \Psi$.

(P.4)

Ray unboundedness:

If $(x, y) \in \Psi$, then $(t x, t y) \in \Psi$ for any scalar $t > 0$ (constant returns to scale, CRS).

(P.5)

Any semi-positive linear combination of points in $\Psi$ also belongs to $\Psi$.

Therefore, satisfying (P.1)–(P.5),

$$\Psi = \{(x, y) \mid x \ge X\lambda,\ y \le Y\lambda,\ \lambda \ge 0\}, \tag{A.2}$$

where $x$ and $y$ are the input and output vectors of one DMU, and $X$ and $Y$ are the input and output matrices, respectively, for all DMUs.

The definition of $\Psi_{DEA}$ is as follows:

$$\Psi_{DEA} = \left\{(y, x) : y \le \sum_{r=1}^{q} \lambda_r y_r,\ x \ge \sum_{r=1}^{q} \lambda_r x_r,\ \sum_{r=1}^{q} \lambda_r = 1,\ \lambda_r \ge 0\ \forall r\right\}. \tag{A.3}$$
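Testing membership in (A.2) generally requires a linear program. With a single input and a single output under CRS, however, the set has a closed form. Because λ ≥ 0 is unrestricted in scale, a point (x, y) with x > 0 lies in Ψ exactly when y/x does not exceed the largest observed output/input ratio. The input-oriented radial score is then (y/x) divided by that ratio. The sketch covers only this special case and assumes strictly positive inputs. It does not apply to the VRS set (A.3), which adds the constraint Σλ_r = 1. The function names are illustrative.

```python
def best_ratio(observed):
    """Largest output/input ratio among observed (input, output) pairs; inputs must be > 0."""
    if any(x <= 0 for x, _ in observed):
        raise ValueError("inputs must be strictly positive")
    return max(y / x for x, y in observed)


def in_crs_pps(x, y, observed):
    """True if (x, y) belongs to the CRS set (A.2) spanned by the observed DMUs."""
    return y <= best_ratio(observed) * x


def crs_input_efficiency(x, y, observed):
    """Smallest theta such that (theta * x, y) is still in the CRS set."""
    return (y / x) / best_ratio(observed)


observed = [(2.0, 4.0), (4.0, 4.0), (5.0, 10.0)]  # hypothetical (input, output) pairs
print(crs_input_efficiency(4.0, 4.0, observed))    # 0.5: the input could shrink by half
print(in_crs_pps(1.0, 3.0, observed))              # False: ratio 3 exceeds the best ratio 2
```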

The simulations incorporate the characteristics of errors in efficiency analysis that theoreticians have described and used in the past. These characteristics are:

(1) statistical noise: $v \sim N(0, \sigma^{2})$, i.i.d., with $v$ unrestricted in sign;

(2) inefficiency: $u \sim N^{+}(\mu, \delta_v^{2})$, i.i.d., a half-normal distribution where $\mu \ge 0$.

Data variability is caused by statistical noise, measurement errors, and inefficiency. These errors can arise from either exogenous or endogenous variables, such as poor management, economic growth, and environmental and sociological contributions.
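The sketch below shows one way to combine the two error components when simulating DMUs. The data-generating process is not restated in this section, so everything here is an assumption for illustration. It uses a single input, a hypothetical frontier y = √x, and the multiplicative composed error exp(v − u) of the stochastic frontier literature [1, 2], with μ = 0 so that u is half-normal. Parameter names and default values are illustrative.

```python
import math
import random


def simulate_dmus(n, sigma_v=0.05, sigma_u=0.1, seed=1):
    """Return (x, y_frontier, y_observed, v, u) tuples for n hypothetical DMUs."""
    rng = random.Random(seed)               # seeded for reproducibility
    rows = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)          # single input (hypothetical range)
        y_frontier = math.sqrt(x)           # hypothetical efficient frontier
        v = rng.gauss(0.0, sigma_v)         # statistical noise, unrestricted sign
        u = abs(rng.gauss(0.0, sigma_u))    # half-normal inefficiency, u >= 0
        rows.append((x, y_frontier, y_frontier * math.exp(v - u), v, u))
    return rows


for row in simulate_dmus(3):
    print(tuple(round(value, 4) for value in row))
```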

The corrected frontier is defined such that its production possibility space is always at least as large as that of DEA (see (P.2)). That is to say, there will always be room for improvement in efficiency, which companies always aspire to. The following are the properties of the corrected frontier.

(P.1)

If $v \ne 0$ and/or $u > 0$, then the DCF will differ from DEA. If $v = 0$ and $u = 0$, then the corrected frontier is also the DEA frontier.

(P.2)

DMUs on the DCF frontier are a subset of those on the DEA frontier: some DEA efficient units will appear inefficient under the DCF. In that case, the frontier is shifted away from the PPS (i.e., the PPS expands).

(P.3)

$\Psi_{DEA} \subseteq \Psi_{DCF}$, where the radial contraction of inputs (or the radial expansion of outputs) can be improved for DEA efficient units.

Although we do not have a formal proof that the corrected frontier converges to the EFF, the DEA estimator has been shown to converge to the EFF, $\Psi_{DEA} \to \Psi_{EFF}$ as $q \to \infty$ [24]. We therefore conjecture that the DCF also converges to the EFF as the sample size increases.

References

[1] Aigner, C. A. K. Lovell, and Schmidt, "Formulation and estimation of stochastic frontier production function models," Journal of Econometrics, vol. 6, no. 1, pp. 21–37, 1977. doi:10.1016/0304-4076(77)90052-5.

[2] Meeusen and Van Den Broeck, "Efficiency estimation from Cobb-Douglas production functions with composed error," International Economic Review, vol. 18, pp. 435–444, 1977.

[3] Kneip and Simar, "A general framework for frontier estimation with panel data," Journal of Productivity Analysis, vol. 7, no. 2-3, pp. 187–212, 1996.

[4] J. K. Sengupta, "Data envelopment analysis for efficiency measurement in the stochastic case," Computers and Operations Research, vol. 14, no. 2, pp. 117–129, 1987.

[5] Charnes and W. W. Cooper, "Deterministic equivalents for optimizing and satisficing under chance constraints," Operations Research, vol. 11, pp. 18–39, 1963. doi:10.1287/opre.11.1.18.

[6] W. W. Cooper, Huang, and S. X. Li, "Satisficing DEA models under chance constraints," Annals of Operations Research, vol. 66, pp. 279–295, 1996. doi:10.1007/BF02187302.

[7] Kall, Stochastic Linear Programming, Springer, 1976.

[8] O. B. Olesen and N. C. Petersen, "Chance constrained efficiency evaluation," Management Science, vol. 41, no. 3, pp. 442–457, 1995.

[9] O. B. Olesen and N. C. Petersen, "Chance constrained efficiency evaluation," Management Science, vol. 41, no. 3, pp. 442–457, 1995.

[10] Charnes, W. W. Cooper, and Rhodes, "Measuring the efficiency of decision making units," European Journal of Operational Research, vol. 2, no. 6, pp. 429–444, 1978. doi:10.1016/0377-2217(78)90138-8.

[11] K. Land, C. A. K. Lovell, and Thore, "Chance-constrained data envelopment analysis," Managerial and Decision Economics, vol. 14, pp. 541–554, 1993.

[12] W. W. Cooper, L. M. Seiford, and Tone, Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software, 2nd edition, Springer, New York, NY, USA, 2006.

[13] R. G. Thompson, F. D. Singleton Jr., R. M. Thrall, and B. A. Smith, "Comparative site evaluations for locating a high-energy physics lab in Texas," Interfaces, vol. 16, pp. 35–49, 1986.

[14] R. G. Thompson, L. N. Langemeier, C. T. Lee, Lee, and R. M. Thrall, "The role of multiplier bounds in efficiency analysis with application to Kansas farming," Journal of Econometrics, vol. 46, no. 1-2, pp. 93–108, 1990.

[15] Charnes, W. W. Cooper, Q. L. Wei, and Z. M. Huang, "Cone ratio data envelopment analysis and multi-objective programming," International Journal of Systems Science, vol. 20, no. 7, pp. 1099–1118, 1989. doi:10.1080/00207728908910197.

[16] G. A. Tomlinson, Confounder Measurement Error and Exposure Odds Ratio Estimation, M.S. thesis, Department of Community Health, University of Toronto, 1993.

[17] Ritter and Simar, "Pitfalls of normal-gamma stochastic frontier models," Journal of Productivity Analysis, vol. 8, no. 2, pp. 167–182, 1997.

[18] Forrester and T. R. Anderson, "A new technique for estimating confidence intervals on DEA efficiency estimates," in Proceedings of the International Conference on Technology and Innovation Management (PICMET '99), Seattle, Wash, USA, 1999.

[19] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, 3rd edition, Macmillan, New York, NY, USA, 1970.

[20] J. R. Birge and Louveaux, Introduction to Stochastic Programming, Springer, New York, NY, USA, 1997.

[21] F. M. Allen, R. N. Braswell, and P. V. Rao, "Distribution-free approximations for chance constraints," Operations Research, vol. 22, no. 3, pp. 610–621, 1974. doi:10.1287/opre.22.3.610.

[22] I. M. Cockburn, R. M. Henderson, and Stern, "Untangling the origins of competitive advantage," Strategic Management Journal, vol. 21, no. 10-11, pp. 1123–1145, 2000.

[23] Simar and P. W. Wilson, "Statistical inference in nonparametric frontier models: the state of the art," Journal of Productivity Analysis, vol. 13, no. 1, pp. 49–78, 2000.

[24] Kneip, B. U. Park, and Simar, "A note on the convergence of nonparametric DEA estimators for production efficiency scores," Econometric Theory, vol. 14, no. 6, pp. 783–793, 1998. doi:10.1017/S0266466698146042.

[25] Simar and P. W. Wilson, "Sensitivity analysis of efficiency scores: how to bootstrap in nonparametric frontier models," Management Science, vol. 44, no. 1, pp. 49–61, 1998.

[26] W. W. Cooper, Huang, Lelas, S. X. Li, and O. B. Olesen, "Chance constrained programming formulations for stochastic characterizations of efficiency and dominance in DEA," Journal of Productivity Analysis, vol. 9, no. 1, pp. 53–79, 1998.

[27] Charnes, W. W. Cooper, A. Y. Lewin, and L. M. Seiford, Data Envelopment Analysis: Theory, Methodology and Applications, Kluwer Academic Publishers, 1994.

[28] R. D. Banker, Charnes, and W. W. Cooper, "Some models for estimating technical and scale efficiency in data envelopment analysis," Management Science, vol. 30, no. 9, pp. 1078–1092, 1984.