A Statistical Comparison of Three Goodness-of-Fit Criteria Used in Modelling Distances

Distance predicting functions may be used in a variety of applications for estimating travel distances between points. To evaluate the accuracy of a distance predicting function and to determine its parameters, a goodness-of-fit criterion is employed. AD (Absolute Deviations), SD (Squared Deviations) and NAD (Normalized Absolute Deviations) are the three criteria most commonly employed in practice. In the literature, some assumptions have been made about the properties of each criterion. In this paper, we present statistical analyses performed to compare the three criteria from different perspectives. For this purpose, we employ the ℓ_kpθ-norm as the distance predicting function, and statistically compare the three criteria by using normalized absolute prediction error distributions in seventeen geographical regions. We find that there exist no significant differences between the criteria. However, since the criterion SD has desirable properties in terms of distance modelling procedures, we suggest its use in practice.


Introduction
When objects in space, such as different cities in a geographic region, activity centers in a plant, or computer terminals of a LAN, can be represented by points, a distance predicting function may be used to transform point coordinate differences of two points into an estimate of the distance between the points. Thus, distance predicting functions have a number of uses. Some of these uses are discussed below.
For validating the accuracy of actual road network distance data, distance predicting functions can be used as suggested by Ginsburgh and Hansen [8].
To determine the optimal mix of trunking and tramping of a truck transportation network for the movement of finished goods and raw materials among national distribution centers, regional depots, and producers, a distance predicting function was utilized by Westwood [25] to obtain estimates of the travel distances between possible links in the network. In some distribution problems for which only the demands and the general location of customers are known [7], a distance predicting function may be employed to calculate a predicted travel distance between the depot and the general area.
Distance predicting functions can also be used in models that determine the response time of emergency vehicles to calls such as the model proposed by Kolesar [10] for calculating the response time of fire engines to fires.
Klein [9] suggests that distance predicting functions which reflect the nature of a geographic region's road network should be used for constructing Voronoi diagrams of the region. A Voronoi diagram subdivides a region into a number of subregions with each subregion being formed around a point belonging to a set of points. For example, the set of points may be the region's police stations, fire halls, or hospitals. Once the location of a query point is determined, the appropriate point of the set is notified to respond to the call by looking at the Voronoi diagram.
Distance predicting functions appear within the context of larger models such as facilities location problems [6], [13]. Distance predicting functions in these models obviate the need for determining actual distances between the new facilities and the existing facilities. In addition, by using distance predicting functions which have empirical parameters that reflect the nature of a region's road network, more accurate cost structures should be obtained than if an assumed distance function is used by an analyst.
Presently, a distance predicting function is being utilized in the software packages TruckStops2 [20] and Roadnet [21]. When an analyst provides data regarding the customer demands, customer locations, and truck types for a transportation network, TruckStops2 assigns customers to different trucks and determines the routes for the trucks.
Distance predicting functions may be used for calculating distances in a Geographic Information System (GIS). As Star and Estes [22] state, distance measurements are of value in many geographic circumstances. Some of these circumstances are planning an irrigation channel between a pond and a field, locating a site for a fire tower in a forest, and calculating the distances among different geographic regions. To calculate distance measurements, a distance predicting function may be incorporated into a GIS.
In order to evaluate the accuracy of a distance predicting function, a criterion is required. The criterion not only provides a numerical value so that different distance predicting functions can be compared but also provides the means for determining any empirical parameters of a distance predicting function. Researchers are presently using three goodness-of-fit criteria [1], [2], [3], [5], [11], [12], [18], [23], [24]:
1. Sum of Absolute Deviations (AD),
2. Sum of Squared Deviations (SD),
3. Sum of Normalized Absolute Deviations (NAD).
In addition, AD and SD have been used by Love and Morris [11], [12] to develop tests for statistically comparing the accuracy of different distance predicting functions.
There are several motivations for conducting the study presented in this paper. Love, Walker and Tiku [18] describe a procedure to find the confidence intervals for a fitted distance. The procedure utilizes the statistical properties of the errors produced when a distance predicting function is fitted to a particular geographic region. Since different criteria could lead to different statistical properties of the fitting errors, we carry out statistical analyses of these errors for the three fitting criteria.
Secondly, in the literature the three criteria were assumed to have different properties in terms of predicting distances. For example, it has been assumed that if the AD criterion is used, the weighted ℓ_p-norm will predict long distances more accurately than short distances. The SD criterion has been characterized as having prediction errors with better statistical properties but still being similar to the AD criterion in terms of its accuracy in predicting long distances [11], [12]. The NAD criterion, on the other hand, has been assumed to predict short distances as accurately as long distances [3], [5], [15], [18].
In this paper, we present statistical properties of the fitting errors and a comparison of the above-mentioned criteria. Statistical analyses are applied to seventeen different geographic regions using the ℓ_kpθ-norm as the distance predicting function. In Section 2, the three criteria and the distance predicting function are described. In Section 3, the statistical test procedures and comparison results are presented. Finally, in Section 4, conclusions based on our analyses of these results are discussed.

The Distance Function and the Criteria
The weighted ℓ_p-norm (ℓ_kpθ) was employed as the distance predicting function. With the ℓ_kpθ-norm the travel distance between points x = (x_1, x_2) and y = (y_1, y_2) is given by

$$\ell_{kp\theta}(x, y) = k\left[\,|x'_1 - y'_1|^{p} + |x'_2 - y'_2|^{p}\,\right]^{1/p},$$

where x' = (x'_1, x'_2) and y' = (y'_1, y'_2) are the coordinates of x and y in the rotated axes, θ ∈ [0°, 90°] and

$$x'_1 = x_1\cos\theta + x_2\sin\theta, \qquad x'_2 = -x_1\sin\theta + x_2\cos\theta,$$

with y' obtained from y in the same way. This norm was selected because the empirical parameters k, p, and θ provide insights into the peculiarities of a road network when they are determined for a sample of road distances from a geographic region. The parameter p measures the rectangular bias of the road network. The angle θ is a rotation parameter which ensures that the coordinate axes are rotated counterclockwise from the analyst's defined coordinate axes until the road network is in phase with the rotated coordinate axes [5]. The parameter k is an inflation factor which accounts for the hills, valleys and other types of noise in the road networks.
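As an illustration of how the three parameters act, the following minimal Python sketch evaluates a rotated, weighted p-norm. It assumes the usual form of the norm (rotate the coordinate differences by θ, take the p-norm, scale by k); the function name `kp_theta_distance` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def kp_theta_distance(x, y, k, p, theta_deg):
    """Predicted travel distance between planar points x and y.

    Assumed form: k * (|r1|^p + |r2|^p)^(1/p), where (r1, r2) are the
    coordinate differences expressed in axes rotated counterclockwise
    by theta_deg degrees.
    """
    theta = math.radians(theta_deg)
    d1 = x[0] - y[0]
    d2 = x[1] - y[1]
    # Coordinate differences in the rotated axes.
    r1 = d1 * math.cos(theta) + d2 * math.sin(theta)
    r2 = -d1 * math.sin(theta) + d2 * math.cos(theta)
    return k * (abs(r1) ** p + abs(r2) ** p) ** (1.0 / p)


# p = 2 gives the Euclidean distance regardless of the rotation angle.
euclid = kp_theta_distance((0, 0), (3, 4), k=1.0, p=2.0, theta_deg=30.0)
# p = 1 with no rotation gives the rectangular (city-block) distance.
rect = kp_theta_distance((0, 0), (3, 4), k=1.0, p=1.0, theta_deg=0.0)
print(euclid, rect)
```

With p = 2 the rotation has no effect, which is why θ only matters for road networks with a rectangular bias (p closer to 1).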
A criterion is used to measure the accuracy of a distance predicting function and also to determine its optimal parameters. We next describe the general methodology for fitting the distance predicting function to a given geographic region. A random sample of points within the geographic region is chosen. Based on an arbitrary coordinate system, Cartesian coordinates for each point are assigned and the actual distances between each pair of points are measured or read from distance charts. Then the parameters (k, p and θ) of the distance predicting function are computed to minimize the value of the selected criterion.
Let ℓ_kpθ(a_i, a_j) be the predicted distance between points a_i and a_j, A(a_i, a_j) be the actual distance between a_i and a_j, and n be the number of points in the data set. Then the mathematical expressions for the three goodness-of-fit criteria that will be analysed in this paper are the minimizations of the following sums:

$$\mathrm{AD} = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \left|\ell_{kp\theta}(a_i, a_j) - A(a_i, a_j)\right|,$$

$$\mathrm{SD} = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\left[\ell_{kp\theta}(a_i, a_j) - A(a_i, a_j)\right]^{2}}{A(a_i, a_j)},$$

$$\mathrm{NAD} = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{\left|\ell_{kp\theta}(a_i, a_j) - A(a_i, a_j)\right|}{A(a_i, a_j)}.$$

The first criterion, AD, is the minimization of the sum of absolute deviations. Since the terms in AD are not weighted but are simply the absolute errors for each pair, it has been described as a criterion which should estimate long distances more accurately than short distances. The second criterion, SD, is the minimization of the sum of squared deviations where each squared error term is weighted by 1/A(a_i, a_j). Squared deviations and the division by actual distance provide the criterion with certain desirable statistical properties [11], [12]. However, the assumption has still been made that the difference in the accuracy of predictions involving long and short distances in a region will favour the long distances [1], [2], [11], [12], [23], [24].
The last criterion, NAD, is relatively new in the literature [3], [5], [15], [18]. With the NAD criterion, a sum of normalized absolute deviations is minimized and the basic premise is that equal accuracy in predicting long and short distances in a region will result. Normalization is realized by dividing the absolute deviation by the actual distance between each pair. In this way both long and short distances are treated on the same relative basis.
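The different weighting of the three criteria can be made concrete with a short Python sketch. It follows the verbal definitions above (absolute error, squared error divided by actual distance, absolute error divided by actual distance); the function name `criteria` and the two-pair data set are hypothetical.

```python
def criteria(predicted, actual):
    """Return (AD, SD, NAD) for paired lists of predicted and actual distances."""
    pairs = list(zip(predicted, actual))
    ad = sum(abs(d - a) for d, a in pairs)          # absolute deviations
    sd = sum((d - a) ** 2 / a for d, a in pairs)    # squared deviations weighted by 1/A
    nad = sum(abs(d - a) / a for d, a in pairs)     # absolute deviations normalized by A
    return ad, sd, nad


# A 10-unit error on a short trip (20) and on a long trip (100).
ad, sd, nad = criteria(predicted=[10.0, 110.0], actual=[20.0, 100.0])
print(ad, sd, nad)
```

In this toy case both pairs contribute equally to AD, whereas the short pair contributes five times as much as the long pair to both SD and NAD, which is the sense in which the latter two criteria weight errors relative to the actual distance.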
Besides their above-mentioned structures, the three criteria also differ from each other in the computational procedures performed to determine the optimal parameter values of the distance predicting function. The computational procedures for fitting the AD and the SD criteria are given by Brimberg and Love [4]. For the NAD criterion it is shown that the computational procedure is identical to that of AD [15]. In general, the best θ and p values are determined by using a two-stage incremental search procedure and a four-stage incremental search procedure, respectively. To find the best k value some properties of the criteria are used. It is known that AD is a convex function of k, and SD is a strictly convex function of k [4]. NAD was shown to be a convex function of k by Love and Walker [15]. When using the AD and NAD criteria it is necessary to employ an algorithm to find the optimal k for a given (θ, p) pair. The optimal k for the SD criterion is calculated with a simple closed-form formula derived by Brimberg and Love [4]. Having a closed-form formula for the best value of k makes the application of the SD criterion computationally more efficient than using either the AD or the NAD criterion.
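The computational advantage of SD can be sketched as follows. Writing the predicted distance as k·u, where u is the norm evaluated with k = 1, and setting the derivative of Σ (k·u − A)²/A with respect to k to zero gives k* = Σu / Σ(u²/A). This derivation is ours and is assumed, not verified, to coincide with the formula in [4]. The grid search below is a plain single-stage search, not the multi-stage incremental procedure described above, and all function names and data are hypothetical.

```python
import math
from itertools import combinations


def unit_distance(x, y, p, theta_deg):
    """Rotated p-norm distance with k = 1 (assumed form of the norm)."""
    theta = math.radians(theta_deg)
    d1, d2 = x[0] - y[0], x[1] - y[1]
    r1 = d1 * math.cos(theta) + d2 * math.sin(theta)
    r2 = -d1 * math.sin(theta) + d2 * math.cos(theta)
    return (abs(r1) ** p + abs(r2) ** p) ** (1.0 / p)


def sd_optimal_k(unit, actual):
    """Closed-form minimizer of sum((k*u - A)^2 / A) over k."""
    return sum(unit) / sum(u * u / a for u, a in zip(unit, actual))


def fit_sd(points, actual, p_grid, theta_grid):
    """Single-stage grid search over (p, theta); k is solved in closed form.

    `actual` lists the actual distances in the order produced by
    combinations(range(n), 2). Returns (sd, k, p, theta) for the best grid point.
    """
    pairs = list(combinations(range(len(points)), 2))
    best = None
    for p in p_grid:
        for theta in theta_grid:
            unit = [unit_distance(points[i], points[j], p, theta) for i, j in pairs]
            k = sd_optimal_k(unit, actual)
            sd = sum((k * u - a) ** 2 / a for u, a in zip(unit, actual))
            if best is None or sd < best[0]:
                best = (sd, k, p, theta)
    return best


# Toy data: actual distances are exactly 1.2 times the Euclidean distances.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
actual = [3.6, 4.8, 6.0]
print(fit_sd(points, actual, p_grid=[1.0, 1.5, 2.0], theta_grid=[0.0, 30.0, 60.0]))
```

For AD and NAD the inner step `sd_optimal_k` would have to be replaced by a one-dimensional search over k, which is the extra cost the text refers to.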
To model the parameters of the ℓ_kpθ-norm, Love and Walker [16] collected sample data from seventeen geographic regions. The sample data for each geographic region included 15 points (locations) based on random selection of point coordinates on the map. These 15 points provided 105 actual distances to be modelled by the distance predicting function ℓ_kpθ using each criterion.
The empirical parameter values for the ℓ_kpθ-norm and the corresponding minimum criterion values for the seventeen geographic regions were computed by Love and Walker [17] for the AD and SD criteria, and by Love and Walker [15] for the NAD criterion. In Table 1 we present the best parameter values of the ℓ_kpθ function used in this study.

Statistical Comparison of the Criteria
The purpose of this section is to conduct statistical comparisons of the three criteria by adopting the normalized absolute prediction error as the random variable. For our work on road distances, the errors are the differences between actual distance and fitted distance pairs. Using the previously defined notation, the model that we used to determine the relationship between the fitted distance and the actual distance is given by

$$A(x_i, x_j) = \ell_{kp\theta}(x_i, x_j) + e(x_i, x_j),$$

where e(x_i, x_j) is the error term for the pair x_i, x_j. From the random sample of points for a geographic region, the point estimates of the empirical distance predicting function parameters are calculated. Substituting these point estimates into the empirical distance predicting function, an estimate of the actual distance, ℓ_kpθ(x_i, x_j), is obtained. The error term for any pair of points embodies errors that may arise in determining the fitted distance for that pair of points. For empirical distance functions which utilize point coordinate differences, these errors may arise from point coordinate measurements, inaccurate instrument calibrations, and road network peculiarities that are not captured by the distance model. In order to compare the three criteria, we use a transformed random variable given as |e(x_i, x_j)|/A(x_i, x_j). There are three reasons for using this transformation. First, the new random variable frees the error terms from their directions so that absolute errors are compared. Second, since each criterion produces errors in different units, dividing each error term by its actual distance allows the criteria to be compared on the same basis. Finally, the accuracy in predicting long and short distances in a given region can be compared on the same basis by this new random variable.
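A minimal sketch of the transformed variable, assuming the error is defined as actual minus fitted distance as stated above (the function name and the numbers are hypothetical). The second call illustrates that the variable is unit-free: converting both distances from miles to kilometres leaves it unchanged.

```python
def normalized_absolute_errors(predicted, actual):
    """Return |e| / A for each pair, where e = actual - predicted."""
    return [abs(a - d) / a for d, a in zip(predicted, actual)]


predicted_miles = [95.0, 210.0]
actual_miles = [100.0, 200.0]
in_miles = normalized_absolute_errors(predicted_miles, actual_miles)

# Same data expressed in kilometres: the normalized errors do not change.
factor = 1.609344
in_km = normalized_absolute_errors(
    [d * factor for d in predicted_miles],
    [a * factor for a in actual_miles],
)
print(in_miles, in_km)
```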

Comparison of the |e(x_i, x_j)|/A(x_i, x_j) distributions
In order to compare the normalized absolute errors, |e(x_i, x_j)|/A(x_i, x_j), their distribution for each criterion was first checked for normality. For that purpose, and also to present the descriptive statistics for each distribution, Table 2 was constructed; it includes the means (x̄), variances (s²), skewness (μ_3) and kurtosis (in the form μ_4 − 3). In Table 2, we observe that the skewness and kurtosis values are sufficiently different from zero that we cannot conclude that the |e(x_i, x_j)|/A(x_i, x_j) values come from normal distributions for each criterion in the regions. The normal probability plots and histograms for the United States and Toronto can be found in Love and Üster [14]. These plots also support the non-normality of the |e(x_i, x_j)|/A(x_i, x_j) distributions. Therefore, a nonparametric test was applied to determine if the |e(x_i, x_j)|/A(x_i, x_j) distributions for each criterion were significantly different from each other in a given region. The Friedman Test [19], which is used for multiple matched samples, was employed as the main effect test to compare the three |e(x_i, x_j)|/A(x_i, x_j) distributions at the 5% significance level. The p-values for the seventeen geographic regions are listed in Table 3. Since the p-values are well above 0.05, no pair of criteria is significantly different at the 5% significance level.
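The two steps described above (moment-based normality check, then a Friedman test on matched triples) can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the software used in the study: skewness and excess kurtosis are computed from population moments, the Friedman statistic uses the chi-square approximation without a tie correction, and the p-value shortcut exp(−Q/2) is valid only because three criteria give 2 degrees of freedom. All names and data are hypothetical.

```python
import math


def sample_moments(values):
    """Return (mean, sample variance, skewness, excess kurtosis)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2 * n / (n - 1), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def average_ranks(row):
    """Ranks 1..len(row) within a row, with tied values sharing their average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def friedman_three(ad, sd, nad):
    """Friedman statistic and approximate p-value for three matched samples."""
    n, k = len(ad), 3
    rank_sums = [0.0, 0.0, 0.0]
    for row in zip(ad, sd, nad):
        for col, r in enumerate(average_ranks(row)):
            rank_sums[col] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    # Chi-square survival function with k - 1 = 2 degrees of freedom.
    return q, math.exp(-q / 2.0)


# Hypothetical normalized errors for the same four pairs under each criterion.
q, p_value = friedman_three([0.01, 0.02, 0.03, 0.04],
                            [0.02, 0.03, 0.04, 0.05],
                            [0.03, 0.04, 0.05, 0.06])
print(q, p_value)
```

Each row is one pair of points, so the test compares the criteria on matched observations, as the text requires; with real data each list would hold the 105 values for a region.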
In Table 4 the mean absolute percent errors are reported for each criterion and region. Based on the figures in Table 4, the average percent absolute errors for a given region are very close to each other across the criteria and, in general, they are small enough to conclude that the predicted distances are close approximations of actual distances. For example, in Brussels the percent absolute errors in predicting distances are 4.46%, 4.47% and 4.46% for the AD, SD, and NAD criteria, respectively.
We next test the long and short distance distributions of the random variable |e(x_i, x_j)|/A(x_i, x_j) for normality. Non-normality of this variable's distribution formed by 105 pairs in a given region does not guarantee that a non-randomly formed subset of 35 of these pairs also comes from a non-normal distribution. There are six different distributions (two for each criterion) used in the comparisons to identify the differences between the three criteria for predicting long actual distances and short actual distances. If we find that among these six distributions there is at least one non-normal distribution for each criterion (either the long or the short distance distribution, but not the same one for all three criteria), then we need to use nonparametric tests for the following parts of this section. Therefore, the skewness and kurtosis values of the |e(x_i, x_j)|/A(x_i, x_j) distributions for long distances using the AD and NAD criteria, and for short distances using the SD criterion, are reported for each region in Table 5. The skewness and kurtosis values in Table 5 are sufficiently different from zero (the distributions are always skewed right, and are generally more peaked than the normal distribution) to provide evidence that the distributions come from non-normal populations. This is also supported by the normal probability plots and histograms which can be found in Love and Üster [14]. Therefore, nonparametric tests should be used for unbiased comparisons of the criteria involving the six distributions.

Accuracy of the criteria in predicting long and short distances
It has already been shown that we must employ a nonparametric test to compare the accuracy of the three criteria in predicting long and short distances. For that purpose, two Friedman tests (one for the long and one for the short distance distributions) for the matched triples are performed for each region. A p-value less than 0.05 indicates the existence of a significantly different pair among the criteria for the given region. The p-values of these main effect tests are provided in Table 6. The significance levels listed in Table 6 can be interpreted as follows: In general, the accuracy in predicting long or short distances is not significantly different for the criteria. However, for long actual distances, in five of the eight large geographic regions (Australia, British Columbia, Great Britain, New York State, and the United States), and for short actual distances, in four regions (Australia, New York State, Los Angeles and Paris), there is at least one pair of criteria with significantly different distance prediction accuracy. In order to identify which criterion is more accurate in predicting long or short distances in the above exceptional regions, multiple comparisons are performed by using nonparametric Wilcoxon matched pairs tests. However, instead of reporting the results of this test, average percent absolute errors (100 * E[|e(x_i, x_j)|/A(x_i, x_j)]) for predicting long and short distances for the criteria are presented in Table 7. Inspecting the average percent absolute errors of the first five exceptional regions listed above, it is observed that the AD and SD criteria generate less average percent absolute error than the NAD criterion in predicting long distances. For example, in Australia the average percent absolute errors for the AD and SD criteria are 5.02% and 5.03%, respectively, whereas for the NAD criterion, it is 6.42%. Inspecting Table 7 for short actual distances, we see that the NAD criterion provides better prediction accuracy than either the AD or the SD criterion for the four exceptional regions. For example, in Australia the NAD criterion generates a 6.23% absolute error in predicting short distances. However, the AD and SD criteria provide 8.45% and 8.37% absolute errors for the same region, respectively.
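The quantity reported for long and short distances, 100 * E[|e|/A], can be sketched as below. The text does not spell out how the long and short subsets are formed beyond their size of 35 pairs, so this sketch assumes they are the 35 longest and the 35 shortest of the 105 actual distances; that rule, the function names, and the synthetic data are all assumptions made for illustration.

```python
def split_long_short(actual, errors, size=35):
    """Return (long_errors, short_errors) using the `size` longest and shortest actual distances.

    Assumption: subsets are chosen by sorting on actual distance; the paper
    only states that each subset contains 35 non-randomly chosen pairs.
    """
    order = sorted(range(len(actual)), key=lambda i: actual[i])
    short_errors = [errors[i] for i in order[:size]]
    long_errors = [errors[i] for i in order[-size:]]
    return long_errors, short_errors


def average_percent_absolute_error(errors):
    """100 times the mean of the normalized absolute errors."""
    return 100.0 * sum(errors) / len(errors)


# Synthetic region: 105 actual distances and a fixed 2-unit absolute error on each,
# so the relative error shrinks as the actual distance grows.
actual = [float(d) for d in range(10, 115)]
errors = [2.0 / a for a in actual]
long_e, short_e = split_long_short(actual, errors)
print(average_percent_absolute_error(long_e), average_percent_absolute_error(short_e))
```

In this synthetic case the long-distance figure is smaller than the short-distance figure purely because a constant absolute error is a smaller fraction of a longer trip.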

Accuracy of each criterion in predicting the long versus short distances
The purpose of this section is to compare the accuracy of predicting long distances versus short distances in a region. First, in order to determine whether the variance of the |e(x_i, x_j)|/A(x_i, x_j) distribution is constant for a given criterion in a region, Levene tests are conducted for each criterion in the seventeen regions. If the p-value of the Levene test for a criterion is significant, then we conclude that the variance of the |e(x_i, x_j)|/A(x_i, x_j) distribution for that criterion in that region is not constant; otherwise, constant variance cannot be rejected. The 2-tailed p-values for the Levene tests are presented in Table 8. Based on this table, the variances of the |e(x_i, x_j)|/A(x_i, x_j) distributions are not homoscedastic in 10 regions for AD, again in 10 regions for SD, and in 11 regions for NAD. However, note that the significance levels for the urban centers are not as strong as the significance levels for the large geographical regions. In order to see the general pattern of differences in variances for the three criteria we inspect Table 9, where the variances of the |e(x_i, x_j)|/A(x_i, x_j) distributions resulting from long and short distance predictions are reported. In general, the variance of the distribution of |e(x_i, x_j)|/A(x_i, x_j) for long distances is less than the variance for short distances for each criterion in each region. But this conclusion does not always hold at the 5% significance level, as the Levene tests in Table 8 suggest. To see the converging funnels formed by the difference in variances, the scatter plots of |e(x_i, x_j)|/A(x_i, x_j) are inspected. The plots for the United States and Toronto are shown in Love and Üster [14]. The accuracy of each criterion in predicting the long versus short distances in a given region is examined by using the nonparametric 2-tailed Mann-Whitney test; the p-values are given in Table 10.

Conclusions
• The |e(x_i, x_j)|/A(x_i, x_j) populations are non-normal for each criterion. The histograms are highly peaked, with more occurrences close to zero, and skewed right.
• There are generally no pairs of the criteria for which the |e(x_i, x_j)|/A(x_i, x_j) distributions are significantly different.
• In terms of |e(x_i, x_j)|/A(x_i, x_j), the three criteria are generally not significantly different in predicting either long distances or short distances. However, each criterion has a higher accuracy in predicting relatively long distances than in predicting relatively short distances.
• The variance of the |e(x_i, x_j)|/A(x_i, x_j) distribution for long distances is, in most regions, significantly different from the variance of the |e(x_i, x_j)|/A(x_i, x_j) distribution for short distances. The former is smaller than the latter, and hence the scatter plots of the |e(x_i, x_j)|/A(x_i, x_j) distributions form a converging funnel as the actual distance between the sample points increases.
Finally, because of the computational efficiency provided by the closed-form formula for the best value of the parameter k when fitting the ℓ_kpθ-norm, and since the e(x_i, x_j) have an expected value of zero without exception in all the regions, it would seem to be advantageous to use the SD criterion in practice.
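For readers who wish to repeat the long-versus-short comparison of Section 3, a two-sided Mann-Whitney test can be sketched in plain Python. This is a simplified illustration, not the procedure used to produce Table 10: it relies on the normal approximation to the U statistic and applies neither a continuity correction nor a tie correction to the variance. The function name and the data are hypothetical.

```python
import math


def mann_whitney_two_sided(sample_a, sample_b):
    """Return (U for sample_a, approximate two-sided p-value)."""
    combined = [(v, 0) for v in sample_a] + [(v, 1) for v in sample_b]
    combined.sort(key=lambda t: t[0])
    # Assign ranks 1..N, giving tied values their average rank.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2.0 + 1.0
        i = j + 1
    n1, n2 = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    # Normal approximation to the null distribution of U.
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    return u_a, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical normalized errors for long and short distances in one region.
long_errors = [0.01, 0.02, 0.03]
short_errors = [0.04, 0.05, 0.06]
print(mann_whitney_two_sided(long_errors, short_errors))
```

With the 35-pair subsets used in the study the normal approximation is reasonable; for samples as small as the toy data above an exact test would normally be preferred.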

Table 3. The p-values for the Friedman Test of |e(x_i, x_j)|/A(x_i, x_j) distributions

Table 4. Mean absolute percent errors for criteria

Table 5. Skewness and Kurtosis Values for the Normality of Three Distributions.
British Columbia, Great Britain, New York State, and the United States, and for the short actual distances, in four regions; Australia, New York State, Los Angeles and Paris, there is at least one pair of criteria with significantly different distance prediction accuracy.

Table 6. The p-values of Friedman test comparing long and short distance distributions

Table 7. Average percent absolute errors in predicting long and short distances

Table 8. p-values of the Levene test for |e(x_i, x_j)|/A(x_i, x_j)

Table 10. p-values of the Mann-Whitney Test for long and short distance |e(x_i, x_j)|/A(x_i, x_j)