An Endogenous Project Performance Evaluation Approach Based on Random Forests and IN-PROMETHEE II Methods

1 School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100081, China 2 School of Civil Engineering, Tsinghua University, Beijing 100084, China 3 China Economics and Management Academy, Central University of Finance and Economics, Beijing 100081, China 4 School of Statistics and Mathematics, Central University of Finance and Economics, Beijing 100081, China


Introduction
Performance evaluation of an infrastructure project is a comprehensive and systematic process that aims to identify the fundamental causes of recurring issues and to clarify the impact of the project on regional development. When multiple infrastructure projects are assessed and the best or poorest alternative must be identified, multicriteria decision aid methods need to be incorporated into the performance evaluation so as to give an overall ranking and to support project selection. Generally, the key steps of multiple-project performance evaluation with a multicriteria decision aid method are scoring each project under each criterion, attributing weights to the criteria, and comprehensively ranking the alternative projects using the scores and weights from the previous steps. Because subjective bias occurs in most previous approaches, the procedures of performance scoring and criterion weight determination deserve further improvement.
Abundant work has attempted to accomplish project performance scoring by the fuzzy multicriteria decision method (fuzzy MCDM) [1], the neural network model [2], the Delphi scoring method, and so forth. In the fuzzy MCDM approach, evaluators' judgments on projects are expressed by linguistic values such as "very satisfied," "not satisfied," and "fair." These are then translated into triangular fuzzy numbers on a scale of 0-100, which serve as the basis for the comprehensive performance evaluation. However, the subjectivity of evaluators' experience involved in the performance scores may lead to bias. The data envelopment analysis model can yield pairwise efficiency scores for projects from entirely objective information [3], which makes it possible to rank the alternative projects by comparing their pairwise efficiencies. Nevertheless, this scoring model only gives the relative superiority among projects as a whole. Moreover, it provides no information about the key factors influencing the performance of the projects. Besides, an inappropriate selection of evaluation criteria by the decision maker also has an unfavorable impact on the scoring process. Furthermore, other scoring models such as neural networks require a large amount of sample data for training to obtain good scoring ability, whereas the sample data collected from infrastructure projects is usually rather limited in practice. Therefore, the scores given by these models may deviate considerably from reality. In addition, quantifying the performance with an exact numerical value may be inappropriate for handling the vagueness and uncertainty inherent in the decision-making process for real projects.
In the process of assigning weights to evaluation criteria, the frequently used methods can be divided into two categories: subjective and objective approaches [4]. Traditional weighting methods such as the Delphi method [5] rely heavily on experts' definitions of quantified attribute weights. To avoid this strong reliance on experts' experience, integrated approaches combining subjective information with mathematical calculation have been proposed, such as the eigenvector method [6], the analytic hierarchy process [7], and the weighted least squares method [8]. These methods objectively assign weights to criteria in accordance with a pairwise comparison matrix of the relative importance of each criterion. However, the uncertainty brought by the decision makers' choice of criteria set and their preference for certain evaluation criteria still has a fundamental influence on weight determination. The objective approaches, based entirely on objective information, include principal component analysis [9], discriminant analysis [10], and the mathematical programming method [11]. Nevertheless, the observation data for infrastructure projects is usually high dimensional, with nonlinear and complex interactions among variables. Thus, classical statistical methods can hardly provide meaningful analysis of such data.
To handle the problems mentioned above efficiently, Random Forests and PROMETHEE II incorporated with interval numbers (IN-PROMETHEE II) are employed in this paper to develop an endogenous approach for objective project performance evaluation. Section 2 presents the two major procedures of the proposed methodology: Random Forests for single-valued performance scoring and the IN-PROMETHEE II method for alternative ranking. The primary role of Random Forests is to produce more reliable prediction values of each criterion as the single-valued performance scores. The IN-PROMETHEE II method, equipped with weight determination, then measures the preference index from interval-valued performance scores and determines the weight of each criterion from its importance, which is inferred from the deviation values of alternatives. Section 3 presents a case study using the above method. The obtained results correspond well with the actual situation of the project, which validates the proposed approach. Section 4 concludes with some comments on the method and its further application. Compared with former studies, the approach proposed herein, which integrates Random Forests and IN-PROMETHEE II, is more realistic, reasonable, and free from subjectivity. Additionally, it serves well in tracing the factors that influence project performance.

Methodology for Performance Scoring and Alternative Ranking
2.1. Random Forests for Single-Valued Performance Scoring.
As a powerful statistical model [12], the Random Forests method is widely used in data mining for classification and regression in ecology, biochemistry, geology, and even facial recognition technology [13][14][15][16]. It is not sensitive to data with "small $n$, large $p$" problems (the number of observations is much smaller than the number of evaluation criteria). Owing to this advantage, this statistical model is more effective than neural network models in objective performance scoring. Besides, Random Forests has been reported to predict more accurately than other widely used methods, including partial least squares regression, support vector machines, and artificial neural networks [14]. Therefore, the Random Forests method is well suited to scoring initial project performance in cases that involve multiple infrastructure projects with multiple evaluation criteria.
The single-valued performance scoring process is designed to evaluate criteria, that is, a number of factors that quantify the project's performance from various aspects. The set of investment projects is denoted as $\{a_i\}$ ($i \in \mathbb{N}^{+}$), and the set of $M$ criteria is denoted as $\{c_j\}$ ($j = 1, 2, \ldots, M$). Since the future performance of a project is the result of its development throughout the previous years, the predicted value for the coming year under criterion $c_j$ serves as a good indicator of project performance. Random Forests regression is employed herein to predict the future performance, and the predicted values are treated as the initial performance scores for the criteria.
The Random Forests method, widely used for classification and regression, usually goes through the following procedures. First, $K$ bootstrap samples, each of which forms a tree, are obtained by bagging from an original set with $T$ samples. Second, to train the trees to predict the value of the targeted criterion, $m_{\mathrm{try}} = M/3$ criteria are randomly chosen each time from the original sample set to form a new criteria set. Then, the best criterion in this set is selected to split the node of a tree, which serves as the training process. After training, the original data is input into the trained model for regression, and the mean value of the outputs of all trees is taken as the regression result. In this section, the Random Forests method is used to predict the output value for all criteria as the single-valued performance scores.
Without loss of generality, it is assumed that there is a set of infrastructure projects $\{a_i\}$. Within a time span of $T$ years, annual field-survey data for a certain project form the original performance data set, denoted as $\{(\mathbf{x}^{t}, y^{t+1})\}$ ($t = 1, 2, \ldots, T$), where $\mathbf{x}^{t} = \{x_1^{t}, x_2^{t}, \ldots, x_M^{t}\}$ is the evaluation vector consisting of the numerical values of each performance evaluation criterion in the $t$th year, and $y^{t+1}$ is the output value of the targeted criterion in the next year, which serves as a point estimate of the performance score for a certain criterion $c_j$ in the $t$th year. The single-valued performance scoring process for project $a_i$ includes two steps, which are detailed as follows.
Step 1: Bagging to Obtain Evaluation Sets. Bagging is a method designed to improve the scoring process by drawing, with replacement, several training sets of the same size from the original performance data set. Each training set forms the basis for one tree to grow. As more training sets are drawn from the original data set, the forest is formed.
To obtain a training set for a single tree in the forest, $T$ samples are randomly chosen with replacement from the original performance data set $\{(\mathbf{x}^{t}, y^{t+1})\}$. The resulting evaluation set is denoted as $\{(\mathbf{x}^{\tau}, y^{\tau+1})\}$ ($\tau = 1, 2, \ldots, T$). It contains the performance scores corresponding to the values of the evaluation criteria as well as the information needed to guide the splitting of nodes of the tree grown on this evaluation set. Therefore, such an evaluation set serves well as a training set for a tree to grow.
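As a minimal sketch of the bagging step, assuming the original performance data set is held as a list of (evaluation vector, next-year value) pairs, one evaluation set per tree can be drawn with replacement as follows. The function name `draw_evaluation_set`, the seed, and the toy records are illustrative assumptions and are not taken from the study.

```python
import random

def draw_evaluation_set(data, rng):
    """Draw a bootstrap sample (with replacement) of the same size as `data`."""
    return [rng.choice(data) for _ in range(len(data))]

# Hypothetical yearly records: (evaluation vector x_t, next-year target y_{t+1}).
original = [([0.9, 1.1], 1.0), ([1.0, 1.2], 1.1),
            ([1.1, 1.0], 1.3), ([1.2, 0.9], 1.2)]

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
# One evaluation set per tree; five trees are used here purely for illustration.
evaluation_sets = [draw_evaluation_set(original, rng) for _ in range(5)]
```

Because sampling is done with replacement, a record may appear several times in one evaluation set while another record is left out, which is what makes the trees differ from one another.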
Step 2: Random Forests for Performance Scoring. Once the evaluation sets are chosen in Step 1, the bases for the trees in the forest are formed. The forest then needs to be trained before the scoring process is carried out. During training, the most important step is to find the best way to split the evaluation set $\{(\mathbf{x}^{\tau}, y^{\tau+1})\}$ at each node of a tree. As the range of the performance score $y^{\tau+1}$ is discretized into several intervals, how the evaluation set is split at a certain node determines the interval into which the performance score $y^{\tau+1}$ will fall, which is why this step is so important.
To find the best criterion to split each node, additional randomness is introduced into the model during node splitting. That is, an evaluation criteria set, which is a random sample of $m_{\mathrm{try}}$ criteria, is drawn for each tree to split. Typically, $m_{\mathrm{try}}$ is taken as $M/3$. After all the allowable criteria in the evaluation criteria set are explored at each node, all the ways to split the node are evaluated with the Gini impurity [17], which measures how often a randomly chosen sample from the evaluation set would be covered by the wrong interval of the performance score. The Gini impurity is calculated by summing, over the intervals, the probability $p_l$ that a sample in the evaluation set is correctly covered by the corresponding interval $I_l$ times the probability $1 - p_l$ of incorrectly scoring that sample. At node $v$, the Gini impurity is given as
$$G(v) = \sum_{l} p_l \,(1 - p_l) = 1 - \sum_{l} p_l^{2}.$$
Finally, the criterion with the minimum Gini impurity is chosen to split node $v$.
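The Gini impurity described above can be sketched in a few lines, assuming each sample at a node carries a label naming the score interval it falls into. The helper names and the size-weighted average over the two child nodes (a common CART convention) are assumptions made for illustration.

```python
from collections import Counter

def gini_impurity(interval_labels):
    """Gini impurity of the samples at one node: sum over l of p_l * (1 - p_l)."""
    total = len(interval_labels)
    if total == 0:
        return 0.0
    counts = Counter(interval_labels)
    return sum((c / total) * (1 - c / total) for c in counts.values())

def split_impurity(left_labels, right_labels):
    """Size-weighted impurity of the two child nodes produced by a candidate split."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) * gini_impurity(left_labels)
            + len(right_labels) * gini_impurity(right_labels)) / n

# Hypothetical node with two score intervals, evenly mixed before the split.
before = gini_impurity(["low", "low", "high", "high"])
after = split_impurity(["low", "low"], ["high", "high"])
```

In this toy case the impurity drops from 0.5 to 0, so the candidate split separates the two score intervals perfectly and would be preferred over any split that leaves the children mixed.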
The training of Random Forests is accomplished by generating an ensemble of regression trees and splitting each tree, without pruning, in the way described in the previous paragraph. The unpruned trees then serve as the bases for regression and ultimately give numerical estimates of the performance score for the targeted criterion. For a given evaluation vector $\mathbf{x}^{t+1}$, different trees produce different performance scores $y_k$. A forest of $K$ trees then makes the prediction $\hat{y} = (1/K) \sum_{k=1}^{K} y_k$. The regression is thus accomplished, with the value of $\hat{y}$ taken as the single-valued performance score for project $a_i$ under criterion $c_j$. This completes the bagging and scoring process.
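The averaging of tree outputs can be illustrated with a deliberately small stand-in for a forest. The sketch below bags one-split regression trees (stumps) and averages their predictions; it swaps in a squared-error split criterion for simplicity instead of the Gini impurity described in the text, and the function names, toy data, and parameter values are all assumptions.

```python
import random
from statistics import mean

def fit_stump(samples, feature_ids):
    """Fit a one-split regression tree using only the candidate features."""
    best = None
    for f in feature_ids:
        values = sorted({x[f] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for x, y in samples if x[f] <= thr]
            right = [y for x, y in samples if x[f] > thr]
            # Sum of squared errors around each child's mean.
            sse = (sum((y - mean(left)) ** 2 for y in left)
                   + sum((y - mean(right)) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr, mean(left), mean(right))
    if best is None:  # no valid split: predict the sample mean
        m = mean(y for _, y in samples)
        return lambda x: m
    _, f, thr, left_mean, right_mean = best
    return lambda x: left_mean if x[f] <= thr else right_mean

def fit_forest(data, n_trees, n_candidates, rng):
    """Bagging plus random criterion subsets; returns a list of tree predictors."""
    n_features = len(data[0][0])
    trees = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in range(len(data))]
        feats = rng.sample(range(n_features), n_candidates)
        trees.append(fit_stump(boot, feats))
    return trees

def predict(trees, x):
    """Forest prediction: the mean of the individual tree outputs."""
    return mean(tree(x) for tree in trees)

# Hypothetical records: (evaluation vector, next-year value of the target criterion).
data = [([1.0, 0.2], 1.1), ([1.1, 0.4], 1.2), ([1.3, 0.1], 1.4),
        ([1.4, 0.5], 1.5), ([1.6, 0.3], 1.7)]
forest = fit_forest(data, n_trees=50, n_candidates=1, rng=random.Random(1))
score = predict(forest, [1.5, 0.3])
```

Each stump predicts the mean target of one side of its split, so the forest output always lies within the range of the observed targets; a full implementation would grow deep, unpruned trees instead of stumps.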
Usually, there is a further step of testing the generalization error (GE), which quantifies the ability of the Random Forests to score the project correctly. The upper bound of the generalization error in Random Forests is given as
$$\mathrm{GE} \le \frac{\bar{\rho}\,(1 - \mathrm{str}^{2})}{\mathrm{str}^{2}},$$
in which $\bar{\rho}$ and $\mathrm{str}$ denote the correlation between trees and the strength of the forest, respectively. The strength of the forest is calculated as $\mathrm{str} = E(p_s - \max p_s^{*})$, where $p_s$ and $p_s^{*}$ represent the probability that sample $s$ in the evaluation set is scored by a tree in the forest correctly and incorrectly, respectively. For the detailed proof and calculation, one can refer to [12]. However, the generalization test is redundant in this approach, because the next step offsets the generalization error of Random Forests in single-valued performance scoring. In the following procedure of determining criteria weights and ranking the projects, the single-valued performance scores are converted into interval-valued performance scores to reduce the uncertainty in performance scoring, which is brought by the relatively short observation period and inadequate statistics for project assessment.

2.2. IN-PROMETHEE II Method for Criteria Weights Determination and Multicriteria Ranking.
Based on the single-valued performance scores obtained by the Random Forests method, IN-PROMETHEE II with endogenous weight determination is developed from the typical PROMETHEE II to accomplish the weight assignment and performance ranking procedures. PROMETHEE II is known to be well adapted to problems of selecting the most desirable project from a finite number of alternatives [18], even when the projects are evaluated under considerably conflicting criteria [19][20][21]. The fundamental procedure of PROMETHEE II is to determine the evaluative difference between alternatives from pairwise comparisons under each criterion, which is then translated into a single-criterion preference index by a preference function. The single-criterion outranking flow of an alternative $a$ is calculated by aggregating the preference indices of $a$ over all the other alternatives. Symmetrically, the sum of the preference indices of all the other alternatives over $a$ gives the single-criterion outranked flow. The single-criterion net flow, which is the difference between the single-criterion outranking and outranked flows, can then be obtained. Furthermore, criteria weights are incorporated to derive the multicriteria net flow, which is the weighted average of all single-criterion net flows. Based on a comparison of the multicriteria net flows of the alternatives, the procedure ends with a complete ranking.
Herein the regular PROMETHEE II algorithm is modified in two ways. First, the input data, namely, the original performance scores, are converted to interval numbers to handle the inherent uncertainty in performance assessment, and a distance measure of interval numbers is introduced to provide the evaluative differences in pairwise comparisons. Second, the weights of criteria are determined endogenously from their own deviation values among alternatives, which are calculated from the single-criterion net flows. The approach adopts the idea that a criterion with a large deviation value among alternatives should be assigned a large weight.
The detailed procedures of IN-PROMETHEE II are presented step by step as follows. Before weight determination, the single-valued performance scores are first transformed into intervals to make the evaluation results more realistic. After that, the single-criterion preference index is derived from the distance measure of the interval-valued performance scores and is then used to calculate the single-criterion outranking, outranked, and net flows. The weights of the criteria are obtained from the deviation values of the alternatives. Finally, the ranking of the alternative projects is obtained by comparing the weighted averages of their single-criterion net flows.
Step 1: Interval Numbers and the Distance Measure. In the previous section, the performance scores of the project alternatives are derived for each criterion. Considering the relatively short observation period and the inadequate statistics for project assessment, the obtained single-valued performance scores often contain relatively high uncertainty. To properly address the imprecision inherent in the performance evaluation of projects, interval numbers are introduced to represent the vagueness for a more realistic ranking. For details on the definitions of interval number sets and their basic operations, one can refer to [22].
In this study, the single-valued performance scores $u_{ij}$ derived by the Random Forests method for alternative $a_i$ under criterion $c_j$ are converted into interval numbers within a certain fuzzy boundary $\delta$. After the boundary $\delta$ of the fuzzy interval is chosen, each single-valued performance score $u_{ij}$ is transformed into an interval number $\tilde{u}_{ij} = [u_{ij}^{L}, u_{ij}^{U}]$, whose lower bound $u_{ij}^{L}$ and upper bound $u_{ij}^{U}$ are determined by $u_{ij}$ and $\delta$. The interval number $\tilde{u}_{ij}$ is also called the interval-valued performance score.
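For illustration only, the conversion of a single-valued score into an interval can be sketched under an assumed rule: a symmetric band whose half-width is the fuzzy boundary times the magnitude of the score. This rule, the function name `to_interval`, and the sample scores are assumptions; the value 0.15 is the fuzzy boundary used later in the empirical study.

```python
def to_interval(score, delta):
    """Assumed conversion: symmetric band of relative half-width `delta` around the score."""
    half_width = delta * abs(score)  # abs() keeps lower <= upper for negative scores
    return (score - half_width, score + half_width)

# Hypothetical standardized scores; a negative score stands for a negated minimum criterion.
benefit_interval = to_interval(1.2, 0.15)    # roughly (1.02, 1.38)
negated_interval = to_interval(-0.8, 0.15)   # roughly (-0.92, -0.68)
```

Using the magnitude of the score keeps the lower bound below the upper bound even when the scores of minimum criteria have been replaced by their opposite numbers.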
One fundamental principle of the IN-PROMETHEE II method is that the decision maker's preference for one alternative over another is determined by the difference between their quantitative assessments. Therefore, for the aforementioned interval-valued performance scores, quantifying the evaluative difference requires a distance measure of interval numbers, which is presented as follows.
Let $\tilde{u}_{ij} = [u_{ij}^{L}, u_{ij}^{U}]$ and $\tilde{u}_{kj} = [u_{kj}^{L}, u_{kj}^{U}]$ be two interval-valued performance scores. By integrating the difference between points over all possible values in each interval, the evaluative difference $d(\tilde{u}_{ij}, \tilde{u}_{kj})$ between $\tilde{u}_{ij}$ and $\tilde{u}_{kj}$ is defined as the double integral in (3) [23], which can be further arranged as (4); the two integration variables in (3) and (4) run over the points of the two intervals. The evaluative difference $d(\tilde{u}_{ij}, \tilde{u}_{kj})$ is justified as a proper distance measure if it satisfies three axioms: (1) strict positivity, (2) symmetry, and (3) the triangle inequality. Condition (3) has been proved by Minkowski's integral inequality [24], and the proofs of conditions (1) and (2) follow immediately.
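As an illustration only, the sketch below assumes that the evaluative difference is the root-mean-square difference between two points drawn uniformly from the two intervals, which is one way to read "integrating the difference between points over all possible values in each interval." The closed-form expression and the function names are assumptions rather than the exact equations of [23]; the numerical routine is included only as a cross-check of the closed form.

```python
import math

def interval_distance(a, b):
    """Closed form of the assumed distance between intervals a=(aL, aU) and b=(bL, bU)."""
    mid_gap = (a[0] + a[1]) / 2 - (b[0] + b[1]) / 2
    spread = ((a[1] - a[0]) ** 2 + (b[1] - b[0]) ** 2) / 12
    return math.sqrt(mid_gap ** 2 + spread)

def interval_distance_numeric(a, b, steps=200):
    """Midpoint-rule evaluation of the same double integral, as a cross-check."""
    total = 0.0
    for i in range(steps):
        x = a[0] + (i + 0.5) * (a[1] - a[0]) / steps
        for j in range(steps):
            y = b[0] + (j + 0.5) * (b[1] - b[0]) / steps
            total += (x - y) ** 2
    return math.sqrt(total / steps ** 2)

# Hypothetical interval-valued scores of two alternatives under one criterion.
u_i = (1.02, 1.38)
u_k = (0.85, 1.15)
closed = interval_distance(u_i, u_k)
numeric = interval_distance_numeric(u_i, u_k)
```

Under this assumption the distance grows with the gap between the interval midpoints and with the widths of the two intervals, and it is symmetric in its arguments.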
Step 2: Single-Criterion Preference Index. In order to specify the decision maker's preference and facilitate the ranking of alternatives, the evaluative difference $d(\tilde{u}_{ij}, \tilde{u}_{kj})$ in a pairwise comparison of alternatives needs to be transformed into a preference degree within a uniform range. Quantifying the preference requires two additional kinds of information. First, the value of an indifference threshold $q$ has to be fixed. The indifference threshold is justified by the fact that a minute difference usually has no effect on preference. Decision makers usually choose their upper bound for indifference as the threshold $q$. The determination of the preference index of $a_i$ over $a_k$ in relation to criterion $c_j$ then starts by checking whether the inferior possibility $P(\tilde{u}_{ij} \le \tilde{u}_{kj})$ of $a_i$ relative to $a_k$ is greater than $q$; this possibility is calculated from the bounds of the two intervals by (5).
Second, the form of the preference function has to be defined. To facilitate the selection of a specific preference function, six basic types have been proposed [25]: (1) usual criterion, (2) U-shape criterion, (3) V-shape criterion, (4) level criterion, (5) V-shape with indifference criterion, and (6) Gaussian criterion. Herein, the Gaussian criterion is chosen as the form of the preference function $F_j(\cdot)$, which is given in terms of the parameter $\sigma$ as
$$F_j(d) = 1 - \exp\left(-\frac{d^{2}}{2\sigma^{2}}\right). \tag{6}$$
From the characteristics of the normal distribution, $\sigma$ equals the value of the input corresponding to the inflection point of the Gaussian preference function. From this consideration, the value of $\sigma$ can be easily determined by the decision maker [25]. In the following empirical study, $\sigma$ is taken as 0.035. After the indifference threshold $q$ and the form of the preference function $F_j(\cdot)$ are determined, the single-criterion preference index $\pi_j(a_i, a_k) \in [0, 1]$ of $a_i$ over $a_k$ under criterion $c_j$ is defined by (7), which combines the threshold check on the inferior possibility with the preference function applied to the evaluative difference.
Step 3: Single-Criterion Net Flows. The preference index $\pi_j(a_i, a_k)$ provides only an intensity measurement of the preference for alternative $a_i$ over another alternative $a_k$ but not the overall preference intensity of $a_i$. To quantify the total outranking character of alternative $a_i$ under criterion $c_j$, the single-criterion outranking flow $\phi_{ij}^{+}$ with respect to $c_j$ is introduced as the summation of the preference indices of $a_i$ over all the other alternatives:
$$\phi_{ij}^{+} = \sum_{k \ne i} \pi_j(a_i, a_k). \tag{8}$$
Symmetrically, the single-criterion outranked flow $\phi_{ij}^{-}$ is defined as the aggregation of the preference indices of all the other alternatives over $a_i$:
$$\phi_{ij}^{-} = \sum_{k \ne i} \pi_j(a_k, a_i). \tag{9}$$
The single-criterion outranked flow $\phi_{ij}^{-}$ indicates how much alternative $a_i$ is dominated by all the other alternatives and thus gives the weakness of $a_i$ in relation to criterion $c_j$. The difference between $\phi_{ij}^{+}$ and $\phi_{ij}^{-}$, which indicates the net dominance power of $a_i$ under criterion $c_j$, is called the single-criterion net flow $\phi_{ij}$:
$$\phi_{ij} = \phi_{ij}^{+} - \phi_{ij}^{-}. \tag{10}$$
Step 4: Weight Determination Based on Standard and Mean Deviation. In multicriteria decision making, one important way to avoid subjective bias is objective weight determination. Considering that the importance of a criterion can be inferred from the deviation of the outranking characters of performance over all alternatives, the standard and mean deviations method has been proposed to deal with this objective weight-determination problem [26]. If the single-criterion net flows of the alternatives under a certain criterion differ remarkably from one another, that criterion plays an important role in the priority procedure. On the contrary, if a criterion has similar single-criterion net flows across alternatives, it deserves little importance in the priority procedure. The weights can be reasonably assigned according to the criteria importance determined in this way.
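The deviation of the single-criterion net flows among alternatives can be quantified by the standard deviation or the mean deviation. Herein, a combination of the two is adopted so that decision makers can balance both aspects at the same time. For criterion $c_j$, the mean deviation $\mathrm{MD}_j$ and the standard deviation $\mathrm{SD}_j$ of all corresponding single-criterion net flows are derived as
$$\mathrm{MD}_j = \frac{1}{N} \sum_{i=1}^{N} \left| \phi_{ij} - \bar{\phi}_j \right|, \qquad \mathrm{SD}_j = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \phi_{ij} - \bar{\phi}_j \right)^{2}}, \tag{11}$$
where $N$ is the number of alternatives and $\bar{\phi}_j$ is the mean of the single-criterion net flows under $c_j$. Considering that the mean value of all single-criterion net flows under $c_j$ is zero, (11) can be further arranged as
$$\mathrm{MD}_j = \frac{1}{N} \sum_{i=1}^{N} \left| \phi_{ij} \right|, \qquad \mathrm{SD}_j = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \phi_{ij}^{2}}.$$
Then, the aggregated deviation $\mathrm{AD}_j$ is introduced as the linear combination of the mean deviation $\mathrm{MD}_j$ and the standard deviation $\mathrm{SD}_j$:
$$\mathrm{AD}_j = \mu\,\mathrm{MD}_j + \nu\,\mathrm{SD}_j,$$
in which $\mu$ and $\nu$ are the weights given by decision makers from their own consideration.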
To assign larger weights to criteria with larger aggregated deviation, which deserve more importance, the optimal weight vector $\mathbf{w}^{*} = (w_1^{*}, w_2^{*}, \ldots, w_M^{*})$ is established by maximizing the weighted sum of the aggregated deviations over all criteria, $\sum_{j=1}^{M} w_j\,\mathrm{AD}_j$, subject to the normalization constraint on the weights (15). Solving (15) with the Lagrange function gives the optimal weight vector $\mathbf{w}^{*}$ in (16), in which each weight $w_j^{*}$ is proportional to the aggregated deviation $\mathrm{AD}_j$ of its criterion. It is noted that the derived attribute weights are based entirely on the measure of the outranking characters of alternatives. Therefore, objectivity is achieved in criterion weight determination for project performance evaluation.
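Steps 2 to 4 can be sketched together as follows. The Gaussian preference function and the flow definitions follow the text; the equal default values of the combination weights, the normalization of the criterion weights so that they sum to one, the function names, and the preference matrices are assumptions made for illustration.

```python
import math

def gaussian_preference(d, sigma=0.035):
    """Gaussian criterion: maps an evaluative difference d >= 0 into [0, 1)."""
    return 1.0 - math.exp(-d * d / (2.0 * sigma * sigma))

def net_flows(pref):
    """pref[i][k] is the preference index of alternative i over k for one criterion."""
    n = len(pref)
    flows = []
    for i in range(n):
        out_flow = sum(pref[i][k] for k in range(n) if k != i)  # outranking flow
        in_flow = sum(pref[k][i] for k in range(n) if k != i)   # outranked flow
        flows.append(out_flow - in_flow)
    return flows

def aggregated_deviation(flows, mu=0.5, nu=0.5):
    """mu * mean deviation + nu * standard deviation of net flows (their mean is zero)."""
    n = len(flows)
    md = sum(abs(f) for f in flows) / n
    sd = math.sqrt(sum(f * f for f in flows) / n)
    return mu * md + nu * sd

def criterion_weights(flows_by_criterion, mu=0.5, nu=0.5):
    """Assumed normalization: weights proportional to AD_j and summing to one."""
    ads = [aggregated_deviation(f, mu, nu) for f in flows_by_criterion]
    total = sum(ads)
    if total == 0:
        raise ValueError("no criterion discriminates between the alternatives")
    return [ad / total for ad in ads]

# Hypothetical preference indices for three alternatives under two criteria.
pref_c1 = [[0.0, 0.9, 0.6], [0.0, 0.0, 0.0], [0.0, 0.4, 0.0]]
pref_c2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
flows_c1 = net_flows(pref_c1)   # approximately [1.5, -1.3, -0.2]
flows_c2 = net_flows(pref_c2)   # all zero: this criterion does not discriminate
weights = criterion_weights([flows_c1, flows_c2])
```

In this toy case the second criterion has identical (zero) net flows for every alternative, so it receives zero weight, which mirrors the way non-discriminating criteria drop out of the ranking.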
Step 5: Complete Ranking. The complete ranking of alternatives can be further achieved by ranking the multicriteria net flows.
Aggregating the single-criterion net flows of alternative $a_i$ over all criteria with the derived optimal criteria weights $\mathbf{w}^{*} = (w_1^{*}, w_2^{*}, \ldots, w_M^{*})$, the multicriteria net flow $\phi(a_i)$ of each alternative is obtained as
$$\phi(a_i) = \sum_{j=1}^{M} w_j^{*}\,\phi_{ij}. \tag{17}$$
Via IN-PROMETHEE II, the complete preorder $\{\succ, \sim\}$ is obtained by ranking the net flows:
$$a_i \succ a_k \iff \phi(a_i) > \phi(a_k), \qquad a_i \sim a_k \iff \phi(a_i) = \phi(a_k). \tag{18}$$
Alternatives with a higher net flow are ranked higher, and identical net flows indicate no difference between the two alternatives.
In summary, an endogenous approach for performance scoring, criteria weight assignment, and project ranking through the Random Forests and IN-PROMETHEE II methods is proposed. The detailed process of the proposed project performance evaluation approach is given in Figure 1. Bagging is first carried out to obtain the bootstrap sample sets on which the trees in the forest grow. After the trees are trained and the original data is input for regression, the mean value of the outputs of all trees is treated as the single-valued performance score. The scores are then converted into interval numbers, from which the evaluative difference between alternatives is calculated. By applying the Gaussian preference function, the obtained difference is translated into a preference index. After that, the outranking and outranked flows of each alternative under each criterion are calculated, and the single-criterion net flows are derived for each criterion. The weight of each criterion is then assigned after measuring the deviation of the single-criterion net flows. Finally, the ultimate ranking of alternatives is determined by the multicriteria net flow, which is obtained by aggregating all single-criterion net flows with the criteria weights for each alternative. To validate the applicability of the proposed approach, an empirical study on the Ningxia Roads Development Project is carried out in the next section.
The criteria described in Table 1, which can be divided into three categories, capture the main aspects of public wellbeing closely related to the project. GDP growth rate ($c_1$), GDP per capita ($c_2$), and growth of per capita income of farmers ($c_3$) are economic welfare metrics. Social wellbeing is mainly quantified in terms of food consumption ($c_4$), school attendance ($c_5$-$c_6$), and health care ($c_7$-$c_{10}$). In addition, the percentage of villages with all-weather roads ($c_{11}$) generally indicates the mobility of villagers after the transportation development. The average time to reach a clinic/hospital, market, and primary/secondary schools ($c_{12}$-$c_{14}$) in the observation period reflects how much time cost is saved for villagers to accept public [...] also capture the project's impact on road safety. It is worth noting that the selection of criteria should take the difference in size of the three subproject areas into consideration. Therefore, per capita statistics are adopted in this empirical study.

Initial Performance Scoring.
In order to obtain the initial performance scores for a project, the Random Forests regressors need to be trained before regression. The Random Forests package in the R system, initially developed by Breiman and Cutler, is used for this purpose. The number of trees is set as $K = 500$, and the number of variables randomly sampled as candidates at each split is set as $m_{\mathrm{try}} = 7$. As explained in the previous section, the performance of a road project $a_i$ under a certain criterion $c_j$ in year $t$ depends greatly on the development of all the factors in the previous year. So, the performance data of the three roads $a_1$, $a_2$, and $a_3$ under each criterion in year $t - 1$ are chosen as the input vectors of the training set. After sufficient training, the performance estimates for road projects $a_1$, $a_2$, and $a_3$ under criteria $c_1$-$c_{19}$ in 2012 are obtained by inputting the performance data of all criteria in 2011. The derived results are listed in Tables 5 and 6.

Comparative Analysis through the IN-PROMETHEE II Ranking Method.
To obtain a nondimensional form of the original data, every performance score under each criterion is divided by the mean value of the performance scores for that criterion. In this way, the performance score data for each of the 19 criteria is standardized. For the criteria $c_6$, $c_{10}$, $c_{12}$-$c_{14}$, $c_{18}$, and $c_{19}$, a smaller performance value is better. Therefore, the performance values for these criteria, called minimum criteria, are replaced by their opposite numbers. Moreover, the fuzzy boundary $\delta$ is taken as 0.15 for the construction of the interval representations. After the distance measure and the possibility degree are calculated through (4) and (5), the preference index for each pair of subprojects under each criterion is obtained by (7). The aggregation in (8)-(10) then gives the single-criterion net outranking flows, as shown in Tables 7 and 8.
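The standardization described above (division by the criterion mean, followed by a sign flip for minimum criteria) can be sketched as follows. The function name and the raw values are hypothetical; only the criterion labels follow the numbering used in the text.

```python
def standardize(scores_by_criterion, minimum_criteria):
    """Divide each criterion's scores by their mean; negate 'smaller is better' criteria."""
    result = {}
    for name, scores in scores_by_criterion.items():
        avg = sum(scores) / len(scores)
        scaled = [s / avg for s in scores]
        if name in minimum_criteria:
            scaled = [-s for s in scaled]  # opposite number, so that larger is better
        result[name] = scaled
    return result

# Hypothetical raw scores of three subprojects under two criteria.
raw = {"c2": [8.0, 12.0, 10.0], "c18": [30.0, 50.0, 40.0]}
standardized = standardize(raw, minimum_criteria={"c18"})
# standardized["c2"] is about [0.8, 1.2, 1.0]; standardized["c18"] is about [-0.75, -1.25, -1.0]
```

After this step every criterion is dimensionless and oriented so that a larger value indicates better performance.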
In each column of Tables 7 and 8, the single-criterion net outranking flows characterize the discriminating ability of each criterion. Through the proposed deviation-maximizing method formulated in (13)-(16), weights are then assigned to all criteria. A criterion with zero single-criterion outranking flows for every subproject is assigned zero weight and is thus ineffective in ranking the subprojects, which indicates that the IN-PROMETHEE II method can also help to identify the criteria that are useful for evaluating subprojects. The criteria finally counted in comparing the performance of the subprojects are listed in Table 9.
The selected criteria with nonzero weights cover all three main aspects: economic benefit, social wellbeing, and transport development. According to (17) and (18), the multicriteria net flows are derived by aggregating the single-criterion net flows with the optimal weights, as given in Table 10. Table 10 clearly shows the ranking of the three subprojects. Subproject $a_2$ shows a clear superiority over both $a_1$ and $a_3$, and subproject $a_1$ performs slightly better than $a_3$.
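The aggregation and ranking step can be sketched as a weighted sum followed by a sort. The net flows and weights below are hypothetical placeholders and are not the values of Tables 7 to 10; the function names are likewise assumptions.

```python
def multicriteria_net_flows(net_flows_by_criterion, weights):
    """Weighted sum of single-criterion net flows for each alternative."""
    n_alt = len(net_flows_by_criterion[0])
    return [sum(w * flows[i] for w, flows in zip(weights, net_flows_by_criterion))
            for i in range(n_alt)]

def rank_alternatives(names, flows):
    """Order alternatives by decreasing multicriteria net flow."""
    return [name for _, name in sorted(zip(flows, names), key=lambda p: -p[0])]

# Hypothetical single-criterion net flows (one list per criterion) and weights.
flows_by_criterion = [[0.2, 1.1, -1.3], [-0.5, 0.9, -0.4]]
weights = [0.6, 0.4]
totals = multicriteria_net_flows(flows_by_criterion, weights)  # about [-0.08, 1.02, -0.94]
ranking = rank_alternatives(["a1", "a2", "a3"], totals)
```

Because the sort is stable, alternatives with identical net flows keep their input order, which corresponds to the indifference relation of the complete preorder.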

Ranking Results Analysis.
The performance differences among the three subprojects are critical in the ultimate ranking. To further analyze the ranking result, the dominating criterion and the dominated criterion are introduced. The dominating criterion of a subproject is defined as the one giving the highest single-criterion outranking flow. Similarly, the dominated criterion is defined as the one giving the lowest single-criterion outranking flow for that subproject. The dominating and dominated criteria identified for the three subprojects are shown in Table 11.
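Identifying the dominating and dominated criteria amounts to taking the criteria with the largest and smallest flow for a given subproject. In the sketch below, the values for `c18` and `c19` are the two flows quoted in the text for the Tongxin-Guyuan subproject, while the other entries and the function name are hypothetical.

```python
def dominating_and_dominated(flows_by_criterion):
    """Return (criterion with highest flow, criterion with lowest flow) for one subproject."""
    best = max(flows_by_criterion, key=flows_by_criterion.get)
    worst = min(flows_by_criterion, key=flows_by_criterion.get)
    return best, worst

# Flows of one subproject keyed by criterion; c2 and c8 are made-up placeholders.
flows_a1 = {"c2": 0.10, "c8": -0.90, "c18": 1.32, "c19": 1.50}
dominating, dominated = dominating_and_dominated(flows_a1)  # ("c19", "c8")
```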
By comparing the dominating and dominated criteria, some important information about each subproject can be detailed. For the Tongxin-Guyuan subproject ($a_1$), $\phi_{1,18} = 1.32$ and $\phi_{1,19} = 1.50$, which indicates that the traffic safety conditions are more important for $a_1$. Traffic regulations and management in the Tongxin-Guyuan subproject ($a_1$) possibly work better. The Guyuan-Shizi subproject ($a_2$) has the largest number of dominating criteria and the smallest number of dominated criteria. The project area of the Guyuan-Shizi subproject ($a_2$) stands out in the criteria of GDP per capita, average time to school, number of bus routes, and passenger cars per capita, indicating a strong economic [...] much to the mediocrity of its performance. In addition, although the number of rural clinics and hospitals per capita for the Shizi-Yanchuanzi subproject ($a_3$) is the largest, infant mortality for $a_3$ is also the highest among the three subprojects, which reveals a contradiction in the health care system of that project area.
After excluding the indicators with no discrimination ability, the performance ranking results can be further explained by tracing the key factors affecting the performance of each road. For the Tongxin-Guyuan subproject ($a_1$), the low traffic accident frequency and traffic accident death rate indicate an improved traffic safety condition, which may be attributed to the construction of the road subproject. However, as suggested by the scanty number of rural clinics and hospitals per capita in that area, the project has little positive effect on the relatively poor public health system. Besides, its overall development is mediocre in terms of per capita GDP. The ranking of subproject $a_1$ is slightly better than that of the Shizi-Yanchuanzi subproject ($a_3$). For the Guyuan-Shizi subproject ($a_2$), which wins the highest rank among the three roads, the high single-criterion outranking flows for the criteria including per capita GDP, average time to school, and the number of bus routes and passenger cars per capita indicate a great improvement in the local economy and transportation mobility. For the Shizi-Yanchuanzi subproject ($a_3$), the poorest performance ranking is largely attributed to its high traffic accident frequency and infant mortality as well as its insufficient public transport service.
By comparing outranking flows and analyzing the reasons behind the ultimate ranking results, comparative analysis greatly facilitates further investigation of related government policy and transportation industry research. In addition, it helps infrastructure investments attain their designed objectives and contribute to the development of the project area on a larger scale and over a longer time span.

Concluding Remarks
In project performance evaluation and attribute weight assignment, traditional approaches are prone to subjective bias. Besides, it is usually ignored that the observation data available for road project assessment is often insufficient relative to the large number of performance criteria selected by the decision maker. To handle these problems efficiently, a quantitative approach for scoring the performance of each project under given criteria is proposed, based on the Random Forests and IN-PROMETHEE II methods. The alternative projects are ranked objectively by calculating net outranking flows. By comparing the single-criterion outranking flows, the key factors influencing the performance of the infrastructure project are investigated. To validate its applicability, the proposed approach is applied to the Ningxia Road Project funded by ADB. It is found that the Guyuan-Shizi subproject ( 2 ) area enjoys the best development, while the performance of the Shizi-Yanchuanzi subproject ( 3 ) is the poorest.
The proposed method has three prominent advantages. Firstly, it is well adapted to performance assessment with inadequate observation data, which is very common in project evaluation practice. Since the Random Forests technique is suitable for data mining on few samples, the initial scoring phase can offer a reliable scoring result. Secondly, it helps avoid the subjective bias imposed by decision makers in traditional performance assessment. Only field-survey data and historical statistics are employed in deriving the initial performance scores and criteria weights, which therefore depend on performance-related statistical information rather than on subjective judgment. Thirdly, it handles the inherent vagueness and uncertainty of practical project evaluation by introducing fuzzy interval numbers into the ranking process. The extended ranking method, IN-PROMETHEE II, features a distance measure for fuzzy intervals and provides realistic ranking results in the context of fuzzy logic. Compared with traditional performance evaluation approaches that depend heavily on human experience, the method proposed in this paper greatly enhances the efficiency and objectivity of the appraisal procedure. In this respect, it holds strong potential for future application to the performance evaluation of complicated infrastructure projects.

Figure 1: The detailed calculation process for the endogenous project performance evaluation approach.

3.1. Data Sources and Criteria Selection. Ningxia Roads Development Project was the first road project supported by the Asian Development Bank (ADB) in Ningxia Province, China. The four-lane expressway of 182 kilometers from Tongxin to Yanchuanzi comprises three highway subprojects. (i) Tongxin-Guyuan subproject  1 with a length of 117.5 km, linking Tongxin and Guyuan, was commenced in May 2004 and opened to traffic in November 2005. (ii) Guyuan-Shizi subproject  2 with a length of 38.5 km, connecting Guyuan and Shizi, was commenced in July 2005 and opened to traffic in December 2007. (iii) Shizi-Yanchuanzi subproject  3 with a length of 24.5 km, connecting Shizi and Yanchuanzi, was commenced in August 2009 and opened to traffic in December 2011. Following the guidance in the Handbook for Selecting Performance Indicators for ADB-funded Projects in the PRC, a total of 19 criteria are chosen to evaluate the project performance in terms of the overall development of local society, social service improvement, and local transportation promotion, as listed in Table 1. All the original data used in this empirical study is collected from the complete report and annual monitoring report of Ningxia Roads Development Project in China (2005-2012) provided by the ADB, as listed in Tables 2, 3, and 4.

Table 1: Development goals and performance evaluation criteria.
[...] service by the constructed roads. The per capita number of bus routes and vehicles, especially passenger cars ( 15 - 17 ), determines the available transport service and thus serves as a predictor of transport business in the area. Besides, the frequency and death rate of traffic accidents ( 18 , 19 [...]

Table 2: The original performance data for the Tongxin-Guyuan subproject  1 .

Table 3: The original performance data for the Guyuan-Shizi subproject  2 .

[...] Guyuan subproject ( 1 ), the former subproject  3 performs worse than  1 with respect to several important attributes like  18 . So, the overall performance evaluation for Shizi-Yanchuanzi subproject ( 3 ) is not as good as that for Tongxin-Guyuan subproject ( 1 ). The inferior performance of Shizi-Yanchuanzi subproject ( 3 ) is largely attributed to its high traffic accident frequency ( 3,19 = −1.79). Besides, since both  3,15 and  3,10 are less than or equal to −1, insufficient food expenditure and public transport service also contribute [...]

Table 4: The original performance data for the Shizi-Yanchuanzi subproject  3 .

Table 5: Initial performance scores of three subprojects under criteria  1 - 10 .

Table 6: Initial performance scores of three subprojects under criteria  11 - 19 .

Table 9: Weights of performance evaluation criteria.