A Data Envelopment-Based Clustering Approach for Public Sugar Factories in Privatizing Process

Turkish Sugar Inc., a public enterprise comprising 25 factories, is the first corporation of Turkish industry. According to government policy, the public sugar factories (PSFs) will be privatized as six geography-based portfolio groups within two years. Because the performance measures of the PSFs affect the government, sugar producers, and several unions during the privatization process, a systematic approach is needed to measure efficiencies and to group the factories. This paper uses a new data envelopment analysis (DEA)-based clustering approach to measure the efficiency scores of the PSFs and to group them, as an alternative to the geography-based portfolio groups. This new approach can help decision makers in the privatization process. In addition, the target values obtained from the dual model can be used to eliminate the inefficiencies of some PSFs.


Introduction
Sugar factories are the first corporations of Turkish industry. The first sugar factory was established at the direction of Kemal Ataturk in Alpullu in 1926. Turkey's annual sugar demand of 2.3 million tons is supplied by three different sugar producers: Turkish Sugar Inc., a public enterprise comprising 25 factories; Pankobirlik, a beet producers' union with 6 factories; and starch-based sugar producers with 5 factories. The market shares of these producers are 70%, 20%, and 10%, respectively. Turkish Sugar Inc. and Pankobirlik produce sugar from beet rather than starch.

DEA-Based Clustering Approach
Conventionally, most clustering algorithms are procedures that minimize total dissimilarity; examples of such algorithms are given in the paper of Po et al. [1].
A general clustering method is to find $c$ cluster centers $z_1, z_2, \ldots, z_c$ so that the total dissimilarity measure $J_s(z) = \sum_{i=1}^{c} \sum_{j=1}^{n} a_{ij} f(d(x_j, z_i))$ is minimized. $J_s(z)$ is usually defined as a distance-based function, and the problem here is to select a useful and reasonable distance measure $d(x_j, z_i)$.
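As a rough illustration of the total dissimilarity measure, the sketch below evaluates $J_s(z)$ for a toy data set. The choices made here are assumptions for illustration only and are not taken from the paper: $a_{ij}$ is treated as a hard 0/1 membership indicator, $d$ is the Euclidean distance, $f$ is the square function, and the points and centers are hypothetical.

```python
import math

def total_dissimilarity(points, centers, membership, f=lambda d: d * d):
    """Return J = sum_i sum_j a_ij * f(d(x_j, z_i)) with Euclidean d.

    membership[i][j] plays the role of a_ij (assumed 0/1 here).
    """
    total = 0.0
    for i, z in enumerate(centers):
        for j, x in enumerate(points):
            total += membership[i][j] * f(math.dist(x, z))
    return total

# Hypothetical data: four 2-D points, two centers, hard memberships.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
membership = [[1, 1, 0, 0],
              [0, 0, 1, 1]]

J = total_dissimilarity(points, centers, membership)

# Assigning every point to the wrong center gives a much larger J.
swapped = [[0, 0, 1, 1],
           [1, 1, 0, 0]]
J_swapped = total_dissimilarity(points, centers, swapped)
```

A conventional clustering algorithm would search over centers and memberships to make `J` as small as possible; the sketch only evaluates the objective for two fixed assignments.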
On the other hand, the stated clustering approaches can be seen as a feature analysis technique. An assumption of the underlying feature analysis is to regard the feature items $A_1, A_2, \ldots, A_s$ as multiple features, so that minimizing $J_s(z)$ brings the data closer together in terms of their features and makes it more likely that these decision-making units (DMUs) are classified into the same cluster. However, the clustering results derived from minimizing the total feature dissimilarity $J_s(z)$ may not be helpful in some cases of clustering DMUs, especially production units. In these cases, we use their production data to cluster them. Suppose that the production data have feature items $A_1, A_2, \ldots, A_k, A_{k+1}, \ldots, A_s$, with $A_1$ to $A_k$ being input items and $A_{k+1}$ to $A_s$ being output items. The clustering information obtained from the conventional clustering approaches can only reveal which DMUs are more similar to one another. However, the more important information we want to know is the production feature functions implied by the production data of all DMUs, that is, $f_j(A_1, A_2, \ldots, A_k, A_{k+1}, A_{k+2}, \ldots, A_s) = 0$. From these derived production functions $f_1, f_2, \ldots$, all DMUs are classified into different clusters (production functions). Therefore, each DMU knows not only the cluster that it belongs to but also the production function type that it confronts. Each DMU can compare its production feature with the other production functions so that the combination of its input resources, or the combination of inputs and outputs, can be readjusted. That is, for data features with input and output items, the cluster derived from production functions is more valuable than that derived from feature dissimilarity measures.
The idea of Po et al.'s study [1] is to employ the production functions to cluster production data. The method supporting this idea is DEA, as initiated and developed by Charnes et al. [2]. DEA is a data-oriented method for evaluating the relative efficiency of DMUs, where each DMU is an entity responsible for converting multiple inputs into multiple outputs. Since DEA fundamentally uses a nonparametric mathematical programming approach to estimate piecewise frontiers that envelop the DMU data sets, in this study each piecewise frontier is regarded as one cluster of production functions. Therefore, we use all piecewise frontiers as a base to cluster production data. That is, Po et al. give up traditional clustering approaches based on feature dissimilarity and propose a new approach that adopts the production functions revealed by the observed data to cluster all DMUs.
DEA is a nonparametric method for the estimation of production frontiers. It is a useful tool for evaluating the relative efficiency of a group of DMUs. DEA has been widely studied and applied in various areas for 30 years, since Charnes et al. [2] first proposed the DEA method with the CCR model. The main forms of DEA models and their extensions include the BCC model [3], the additive model [4], and the imprecise DEA models [5, 6]. Modifications and extensions include the assurance region models [7, 8], superefficiency models [9, 10], and cone ratio models [11, 12]. Stochastic and chance-constrained extensions are considered by some authors [13-17]. Taxonomy and general model frameworks for DEA can be found in [18, 19]. The CCR model is the original model of DEA (see the M1 model) and is used in this study to explain the DEA-based clustering approach. The DEA model generalizes the usual input/output ratio measure of efficiency for a given unit in terms of a fractional linear program formulation. According to the economic notion of Pareto optimality, the DEA method states that a DMU is considered inefficient if some other DMU, or some combination of other DMUs, produces at least the same amount of output with less of the same resource input and not more of any other resource. Conversely, a DMU is considered Pareto efficient if this is not possible. Suppose that there are $n$ DMUs to be evaluated, $x_{ij}$ is the observed amount of the $i$th input for the $j$th DMU, and $y_{rj}$ is the observed amount of the $r$th output for the $j$th DMU. The output multipliers are $u_1, u_2, \ldots, u_s$ (one for each output item), and the input multipliers are $v_1, v_2, \ldots, v_m$ (one for each input item). The mathematical formulation of the method is summarized next, where the relative efficiency of DMU $k$ is determined [20]. See the M1 model.

2.1
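For orientation, the CCR ratio model is commonly written as follows in the notation defined above ($n$ DMUs, $m$ inputs, $s$ outputs, DMU $k$ under evaluation). This is a sketch of the standard textbook form and may differ in detail from the M1 display of the paper; in particular, the plain non-negativity bounds are an assumption, since some formulations require the multipliers to exceed a small positive constant.

```latex
\begin{align*}
\text{(M1)}\quad \max\ & \mathrm{Eff}_k = \frac{\sum_{r=1}^{s} u_r y_{rk}}{\sum_{i=1}^{m} v_i x_{ik}} \\
\text{s.t.}\ & \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}} \le 1, \quad j = 1, \ldots, n, \\
& u_r \ge 0,\ r = 1, \ldots, s, \qquad v_i \ge 0,\ i = 1, \ldots, m.
\end{align*}
```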
The DEA model is essentially a fractional programming problem with a ratio of a weighted sum of outputs to a weighted sum of inputs, where the weights for both inputs and outputs are to be selected in a manner that calculates the efficiency of the evaluated unit. Therefore, the original form of the DEA model is a nonlinear and nonconvex problem. Charnes et al. [21] proved that the fractional programming problem can be transformed into linear programming formulations. The first formulation is "input based": it constrains the weighted sum of outputs to be unity and minimizes the inputs that can then be obtained. The second formulation is "output based": it constrains the weighted sum of inputs to be unity and maximizes the outputs that can then be obtained (see the M2 model). Given the constant returns to scale assumption, the result from the input-based model is the reciprocal of that from the output-based model. If variable returns to scale are assumed, no direct relation can be found between these two models.
For the clustering approach used in this study, the results can differ for those PSFs that are not on the production frontier, depending on whether the input-based or the output-based model is applied. The choice between an input-based and an output-based model depends on the production process characterizing the firm (i.e., minimize the use of inputs to produce a given output, or maximize the output with given levels of inputs). The objective of this study is to find the set of coefficients associated with each output and input that will give the PSF being evaluated the highest possible efficiency by using the M2 model. Then, target values are calculated by using this model to eliminate the inefficiencies of some PSFs.

2.2
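Based on the description above (weighted sum of inputs fixed at unity, weighted outputs maximized) and on the constraint quoted later in the text, the linearized model can be sketched as follows. This is a reconstruction from the surrounding prose under the assumption of simple non-negativity bounds on the multipliers, not a verbatim copy of the M2 display.

```latex
\begin{align*}
\text{(M2)}\quad \max\ & \mathrm{Eff}_k = \sum_{r=1}^{s} u_r y_{rk} \\
\text{s.t.}\ & \sum_{i=1}^{m} v_i x_{ik} = 1, \\
& \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0, \quad j = 1, \ldots, n, \\
& u_r \ge 0, \quad v_i \ge 0.
\end{align*}
```

Fixing the denominator of the ratio at one is what removes the fraction: the objective of the ratio model becomes linear, and each ratio constraint becomes a linear inequality after multiplying through by its positive denominator.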
DEA differs from the production theory of economics in that it is nonparametric. In economics, the production function is a function that summarizes the process of converting multiple inputs into a single output. Thus, a general mathematical form for the production function in economics can be expressed as $y = f(x_1, x_2, x_3, \ldots, x_n)$, where $y$ is a quantity of output and $x_1, x_2, x_3, \ldots, x_n$ are quantities of inputs. However, DEA is a nonlinear programming model for evaluating a process that converts multiple inputs into multiple outputs, that is, $g(y_1, y_2, y_3, \ldots, y_n) = f(x_1, x_2, x_3, \ldots, x_n)$. Most previous studies have mentioned and discussed the properties of the production function that are hidden in DEA methods [8-10, 14, 15, 17, 22].
Since the number of DMUs is usually much larger than the number of inputs, we prefer to express the linear program in its dual form. Further, the dual form can interpret the geometric meaning of DEA and provide information about the conservation of resources or expansion of outputs needed to move DMUs from inefficiency to efficiency.
If $\mathrm{Eff}_k^*$ is the optimal value of $\mathrm{Eff}_k$, DMU $k$ is said to be efficient if and only if $\mathrm{Eff}_k^* = 1$. According to the efficiency ratio, DMUs may be grouped as good ($\mathrm{Eff}_k^* = 1$) and poor ($\mathrm{Eff}_k^* < 1$) performers, or clustered by assigning different efficiency ratio grades [23-27]. Although clustering by efficiency ratio gives some information about the rationality of output/input, it does not reveal the intrinsic relationship between the input and output production features. Therefore, this study adopts piecewise production functions derived from the DEA method to cluster data.
In the M2 model, the constraint $\sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0$ is an inequality form of the production functions. Solving the M2 model yields the virtual multipliers $u_r^*$ and $v_i^*$. Thus, $\sum_{r=1}^{s} u_r^* y_{rj} - \sum_{i=1}^{m} v_i^* x_{ij} = 0$ is derived. Running the M2 model for $k = 1$ to $n$ gives all production functions. Then, all DMUs are classified into different clusters by these piecewise production functions. Thus, a clustering method using production functions via the DEA method is implemented. Po et al. [1] find that little consideration has been given to using these production functions as a reference to classify evaluated DMUs, and they propose a clustering approach, based on the properties of DEA and its production possibility set, that uses these production functions as such a reference. The details of the algorithm in which the DEA-based clustering method is applied are given in their paper.
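To make the role of the equality concrete, the sketch below checks which units satisfy $\sum_r u_r^* y_{rj} - \sum_i v_i^* x_{ij} = 0$ for one given set of multipliers. Everything in it is hypothetical: the three DMUs, the multiplier values (picked by hand so that two units lie on the hyperplane), and the tolerance. It does not solve the linear program; it only evaluates the constraint for multipliers assumed to be already known.

```python
def frontier_slack(u, v, x, y):
    """Return sum_r u_r*y_r - sum_i v_i*x_i for one DMU.

    A value of zero means the DMU lies on the hyperplane defined by (u, v);
    a negative value means it lies below that frontier piece.
    """
    weighted_out = sum(ur * yr for ur, yr in zip(u, y))
    weighted_in = sum(vi * xi for vi, xi in zip(v, x))
    return weighted_out - weighted_in

# Hypothetical DMUs: two inputs and one output (scaled to one unit of output).
dmus = {
    "A": ([2.0, 4.0], [1.0]),
    "B": ([4.0, 2.0], [1.0]),
    "C": ([4.0, 4.0], [1.0]),
}

# Hypothetical multipliers, chosen so that A and B satisfy the equality.
u_star = [1.0]
v_star = [1.0 / 6.0, 1.0 / 6.0]

TOL = 1e-9  # numerical tolerance for "equal to zero"
on_frontier = sorted(
    name for name, (x, y) in dmus.items()
    if abs(frontier_slack(u_star, v_star, x, y)) < TOL
)
slack_c = frontier_slack(u_star, v_star, *dmus["C"])
```

Here units A and B sit on the same frontier piece and would share a cluster, while C has a negative value and lies below it.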

DEA-Based Clustering of PSF
In this study, we have an efficiency evaluation problem with 25 PSFs (DMUs), each with three inputs and one output obtained from the 2009-2010 annual activity reports. Actual processed beet quantity (PBQ), fuel consumption (FC), number of total personnel (TP), sugar production (SP), and molasses production (MP) data are given in the annual activity reports of the PSFs, and all of them are real and correct. PBQ, FC, and TP are considered as inputs. Only SP is selected as the output because it is correlated with MP.
The simplified production data of the PSFs are shown in Table 1. This table shows the quantity of each input required to produce one unit (one metric ton) of sugar. For example, according to Table 1, PSF 22 uses 9.96 tons of beet, 0.419 tons of fuel, and 0.0213 personnel to produce one ton of sugar.
The objective is to find the set of coefficients (the $u$'s associated with each output and the $v$'s associated with each input) that will give the PSF being evaluated the highest possible efficiency. Using the M2 model for each PSF, its efficiency ratio $\mathrm{Eff}_k$ and the solution of virtual multipliers, $v_1^*$, ... By taking the maximal value, the efficiency ratio for PSF 7 is re-evaluated as 0.6307. In addition, PSF 7 is classified into the cluster determined by the corresponding envelope $y = 0.142x_1 + 0.09x_2 + 5.36x_3$ (PF1).
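One plausible reading of the re-evaluation step is that a unit is scored against each effective envelope as output divided by weighted inputs, and the largest score decides both the efficiency ratio and the cluster. The sketch below applies that reading to the PSF 22 figures quoted from Table 1, purely as a numeric illustration. Several things are assumptions: the text does not say that PSF 22 is one of the re-evaluated units, the labels PF2 and PF4 are inferred from the multiplier values quoted for clusters 2 and 4, PF3 is omitted because its coefficients are not given, and PF1 uses the 0.094 fuel coefficient from the Table 2 note. The resulting label is not checked against Table 3.

```python
# Envelope multipliers (v1, v2, v3) for y = v1*PBQ + v2*FC + v3*TP,
# as quoted in the text; the PF2/PF4 labels are inferred, not stated.
envelopes = {
    "PF1": (0.142, 0.094, 5.36),
    "PF2": (0.101, 1.181, 1.415),
    "PF4": (0.109, 1.018, 0.978),
}

def ratio(inputs, multipliers, output=1.0):
    """Efficiency ratio against one envelope: output / weighted inputs."""
    return output / sum(v * x for v, x in zip(multipliers, inputs))

# PSF 22 inputs per ton of sugar (beet, fuel, personnel), from the text.
psf22 = (9.96, 0.419, 0.0213)

# Score the unit against every envelope and keep the largest ratio.
scores = {name: ratio(psf22, v) for name, v in envelopes.items()}
best = max(scores, key=scores.get)
```

With these rounded coefficients the three scores are close to one another (all between roughly 0.63 and 0.66), so small rounding differences in the multipliers could change which envelope comes out on top.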
In this study, some PSFs achieve 100 percent efficiency and are referred to as the relatively efficient units, whereas other units with efficiency ratings of less than 100 percent are referred to as inefficient units. According to the results in Table 2, there are six efficient PSFs (PSF1, PSF8, PSF11, PSF14, PSF21, and PSF25) and 19 inefficient PSFs. Five of the 19 inefficient PSFs have an efficiency ratio of at least 0.95. Additionally, the net revenues of the PSFs are supported by the DEA results. According to four different production functions, the 25 PSFs are classified into four clusters. Clustering results are shown in Table 3.
Considering (PF1), (PF2), and (PF3), TP is the most critical input for the PSFs in clusters 1, 2, and 3, because the multiplier of TP has the largest value for these clusters. The relative increase in efficiency is 5.36 with each unit reduction of TP for the inefficient PSFs placed in clusters 1 and 3. Similarly, the relative increase in efficiency is 1.415 for the inefficient PSFs placed in cluster 2 (see Table 3). The order of the multipliers for the other inputs changes. For example, for the PSFs in cluster 2, the multipliers of FC and TP are similar and higher than that of PBQ. On the other hand, FC is the most critical input for the PSFs in cluster 4. The relative increase in efficiency is 1.018 with each ton reduction of FC for the inefficient PSFs placed in cluster 4 (see Table 3). DEA-based and geography-based clustering results are compared in Table 4.
As shown in Table 4, the DEA-based clusters contain different geography-based portfolio groups (1: E, C, D; 2: A, B, D; 3: C; and 4: A, B, E). Moreover, the clustering results derived from the geography-based portfolio may not be helpful for clustering the PSFs. From the derived production functions (PF1), (PF2), (PF3), and (PF4), all PSFs are classified into four different clusters (production functions). Therefore, each PSF knows the PF type that it confronts. Additionally, each PSF can compare its production feature with the other production functions so that the combination of its input resources, or the combination of inputs and outputs, can be readjusted. That is, for the case of data feature with input ... Table 5 for the inefficient PSF. Target values can help decision makers to eliminate the inefficiencies. For example, 9.96 tons of beet, 0.419 tons of fuel, and 0.0213 personnel are used to produce one ton of sugar in PSF 22. When 6.477 tons of beet, 0.273 tons of fuel, and 0.0014 personnel are used to produce one ton of sugar, PSF 22 becomes efficient.
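The size of the adjustment implied by the PSF 22 example can be read off by dividing each target by the current value. The short sketch below does only that arithmetic on the figures as printed in the text; the dictionary keys are hypothetical names, and no claim is made about how the targets themselves were computed.

```python
# Figures for PSF 22 as printed in the text (per ton of sugar produced).
current = {"beet_t": 9.96, "fuel_t": 0.419, "personnel": 0.0213}
target = {"beet_t": 6.477, "fuel_t": 0.273, "personnel": 0.0014}

# Fraction of each current input that the target retains.
retained = {k: target[k] / current[k] for k in current}

# Implied percentage reduction for each input.
reduction_pct = {k: 100.0 * (1.0 - r) for k, r in retained.items()}
```

On these figures, beet and fuel are each cut by roughly 35 percent, while the personnel value as printed implies a much larger cut of over 90 percent.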

Conclusions
This study develops a DEA-based clustering approach for the evaluation of PSFs. The proposed approach employs the piecewise production functions derived from the DEA method to cluster data with input and output items. Compared with geography-based clustering, which only considers the geographical location of the PSFs, the proposed approach reveals the input-output relationships hidden in the input and output data items. Thus, for each evaluated PSF, we know not only the cluster that it belongs to but also the production function type that it confronts. This is very important for managerial decision making, where decision makers are interested in knowing the changes required in the combination of input resources so that a PSF can be reclassified into a different and desired cluster during the privatization process.
The focus of this paper is to examine the CCR model of DEA and then establish the DEA-based clustering. Although this approach has been carried out for the CCR model, it can be easily extended to other DEA models without loss of generality. The clustering results drawn from the DEA-based clustering are unit invariant, meaning that they are not affected by the scale of the data.
The DEA-based clustering approach is suitable for most clustering problems in which there are input-output or cause-and-effect relationships between the features. For example, the proposed approach can be used in the analysis of industry classification and, as here, in sorting PSFs by input-output data.
In summary, in view of its advantages, the DEA-based clustering approach is well suited to clustering problems and has considerable potential to be used for various clustering problems. We believe that future research is necessary to realize the full potential of this approach. The DEA-based clustering algorithm developed by Po et al. [1] is robust to slight changes in the input and output data sets, but not to outliers. Future research will consider developing a robust-type DEA-based clustering algorithm.

Table 1: Production data of PSF.

PSFs marked *, **, and *** in Table 2 confront the degenerative frontier. Po et al. [1] suggest that they should be reclassified into the nearest effective frontier (the frontier with nonzero virtual multipliers). In this application, it is observed that PSFs with * confront the nearest effective frontier $y = 0.142x_1 + 0.09x_2 + 5.36x_3$, thus their efficiency ...

Table 2: Analytical results derived from the M2 model for PSF.
* They are reevaluated by $y = 0.142x_1 + 0.094x_2 + 5.36x_3$ because of zero virtual multipliers.
** They are reevaluated by $y = 0.101x_1 + 1.181x_2 + 1.415x_3$ because of zero virtual multipliers.
*** They are reevaluated by $y = 0.109x_1 + 1.018x_2 + 0.978x_3$ because of zero virtual multipliers.

Table 4: DEA-/Geography-based PSF clusters. These PSFs are out of geography-based portfolio according to government policy.

Table 5: Input target values of PSF.