A Linear Approximate Algorithm for Earth Mover ’ s Distance with Thresholded Ground Distance

Effective and efficient image comparison plays a vital role in content-based image retrieval (CBIR). The earth mover’s distance (EMD) is an enticing measure for image comparison, offering intuitive geometric interpretation and modelling the human perceptions of similarity. Unfortunately, computing EMD, using the simplex method, has cubic complexity. FastEMD, based on min-cost flow, reduces the complexity to (O(N logN)). Although bothmethods can obtain the optimal result, the high complexity prevents the application of EMD on large-scale image datasets. Thresholding the ground distance can make EMD faster and more robust, since it can decrease the impact of noise and reduce the range of transportation. In this paper, we present a new image distance metric, EMD, which applies a threshold to the ground distance. To compute EMD, the FastEMD approach can be employed. We also propose a novel linear approximation algorithm. Our algorithm achieves O(N) complexity with the benefit of qualified bins. Experimental results show that (1) our method is 2 to 3 orders of magnitude faster than EMD (computed by FastEMD) and 2 orders of magnitude faster than FastEMD and (2) the precision of our approximation algorithm is no less than the precision of FastEMD.


Introduction
The development of advanced multimedia technology increases the importance of image retrieval, since large-scale image datasets are proliferating.Effective and efficient methods for computing the similarity between images is vitally important to content-based image retrieval (CBIR) systems.
The EMD defines the comparison of two histograms as the transportation problem.(In [1], EMD is used to compute the distance between two signatures.For the convenience of statement, we use histogram in this paper.In fact, our algorithm can be used to both histogram and signature.)The minimum cost to transport one histogram to another one is taken as the distance of the two histograms.Rubner et al. [1] employ the transportation simplex algorithm [18] to compute the EMD.Thus the computation of EMD is very expensive, with cubic time complexity (( 3 log )).Pele and Werman [19] present the FastEMD algorithm which is faster than the simplex algorithm.FastEMD is based on the min-cost flow algorithm, thus it still suffers from quadratic complexity (( 2 log )).Ling and Okada [10] propose EMD- 1 which employs  1 -norm as the ground distance.EMD- 1 can reduce the number of variables in the transportation simplex algorithm from ( 2 ) to (), so the time complexity can be reduced to ( 2 ).But only using  1 -norm as ground distance limits the application of EMD- 1 .Rahimi and Kiram [20] propose a greedy algorithm with complexity of ( 2 ).Shirdhonkar and Jacobs [21] present a linear time approximate EMD algorithm, Wavelet EMD, based on a weighted wavelet transform.To approximate EMD, wavelet EMD uses the weighted wavelet coefficients of the difference histogram.Another linear time approximate EMD algorithm proposed by Jang et al. [22] is MHC which is used for dominant color descriptor (DCD).This algorithm employs Hilbert curves to fill the multidimensional color space and computes cost according to the Hilbert order of dominant color.To improve accuracy, MHC uses (2  ⋅ !)/2 Hilbert curves to fill the -dimensional space, then computes (2  ⋅ !)/2 cost values, and takes the minimum cost as the distance between two images.Furthermore, EMD is not a metric if the two histograms have different total weight.To solve this drawback, Ljosa et al. [4] introduce a special bin, called the bank, for each image.This extra bin allows the two histograms to exchange mass.The bank has the same ground distance to all other bins.Pele and Werman [3] propose a variant of EMD, named ÊMD, which takes the difference of two weights as an extra mass.
The moving distance of the extra mass is  times maximum ground distance.If the ground distance is a metric and  ≥ 0.5, ÊMD is a metric for all histograms.
The EMD with thresholded ground distance has proven to be more robust, since it can reduce the interference of noise and correspond to the way humans perceive distance [2,3,19].With the help of thresholded ground distance, FastEMD can reduce the number of edges of flow network from ( 2 ) to () [19] and thus makes an improvement in both accuracy and speed.
In this paper, we present a new image distance metric, EMD + , based on thresholded ground distance.We prove that EMD + is a metric by deducing that EMD + equals ÊMD with the same ground distance threshold.Therefore, we can employ FastEMD [19] to compute the value of EMD + .However, its quadratic complexity makes FastEMD a nonideal choice.According to the definition of EMD + , we derive a novel linear approximation algorithm which can compute EMD + very fast.In experiments we find that our algorithm is 2 to 3 orders of magnitude faster than EMD (computed by FastEMD) and 2 orders of magnitude faster than FastEMD.Moreover, the results of our approximation algorithm are no worse than the exact results.
In the remainder of the paper, we first review the earth mover's distance and ÊMD in Section 2. We then formalize the EMD + in Section 3. Section 4 proposes the details of our novel linear approximation algorithm.In Section 5, we experimentally evaluate the practical performance of our algorithm.We conclude this paper in Section 6.

Earth Mover's Distance
The earth mover's distance is a perceptual and intuitive measure between two histograms.EMD which defines the distance between two histograms as the minimum cost that must be paid to transform one histogram to another one is based on the famous transportation problem.We consider one histogram as a supplier with a given amount of products distributed at some places and another histogram as a demander that has some warehouses with a given limited capacity.To transport products from the supplier to the demander, the optimal solution is required to minimize the transportation cost.The following is the formal definition of EMD.
Definition 1 (earth mover's distance).Given two histograms  = {( 1 ,   1 ), . . ., (  ,    )} and  = {( 1 ,   1 ), . .., (  ,    )} with sizes  and  respectively, where    is the weight of bin   and    is the weight of bin   .Let   be the distance between   and   and let   represent the amount of weight transferred from   to   .The total cost of matching  and  is defined as with the following constraints: The earth mover's distance is defined as the minimum total cost for solving this problem; that is, EMD (, ) = min The EMD is a many-to-many matching method which can handle variable-size structures.If the ground distance is a metric and the total weights of two histograms are equal, that is, ∑  =1    = ∑  =1    , EMD is a metric [1].However, in many applications, the total weights of two histograms cannot be equal.To solve this drawback, Pele and Werman [3] present a new EMD variant, ÊMD, which is defined as subject to the same constraints as the EMD.
If the two histograms are probability distributions (i.e., total weight is one) EMD and ÊMD are equivalent.Pele and Werman proved that if  ≥ 0.5 and the ground distance is a metric, ÊMD is a metric.The proof can be found in [3].
ÊMD is more appropriate for two cases.One case is when the total weight of histograms is important.And the other one is when difference of total weight between histograms is distinguished [3].

New Distance Metric: EMD +
Thresholded distances are distances that are limited by a threshold.Let   be the distance between   and   and let  > 0 be a threshold; the thresholded distance between   and   is defined as    = min(  , ).As mentioned in [2,3,19], thresholding the ground distance can improve the performance of EMD family.Motivated by this thought, we design a new variant of EMD with thresholded ground distance.We call this new measure EMD + .If the ground distance is a metric, the EMD + is a metric.Firstly, we formally define EMD + in Definition 2.
Definition 2 (distance measure: EMD + ).Given two histograms  = {( 1 ,   1 ), . . ., (  ,   m )} and  = {( 1 ,   1 ), . .., (  ,    )} with sizes  and , respectively, where    is the weight of bin   and    is the weight of bin   .Let   be the ground distance between   and   , let   be the amount of weight transferred from  i to   , and let  > 0 be a threshold of ground distance.If   < , EMD + moves as much mass as possible from  to ; otherwise, EMD + does not move any mass from  to .Support  is the maximum amount that can be transported from The distance between  and  is defined as s.t We denote the total weight of  and  as () and () respectively.Without loss of generality, we assume () ≥ ().The condition   <  in Definition 2 limits the range of bins that can be matched by   .Only the bins whose distances to   are smaller than the threshold have the chance to receive mass from   .The optimal solution is the minimum total cost of moving  amount of mass from  to .After that, we ignore the limitation of capacity of  and transport all the remainder weight of  to  with cost of .
Figure 1 shows two examples of EMD + .From Figure 1, we obtain () = 20, () = 18, and () = 9.Set  2 -norm to be the ground distance and  = 2 to be the threshold.We get EMD + (, ) = (10 + 6 × √ 2) + (20 − 16) × 2 ≈ 26.5, If we calculate the EMD values, we get That is to say, EMD(, ) > EMD(, ), while EMD + (, ) < EMD + (, ).Actually,  is more similar to  than .Thus, EMD + has more discernibility.In fact, the EMD + is equivalent to the ÊMD with a thresholded ground distance and  = 1 [19].Because the ÊMD is a metric, we can deduce that EMD + is a metric.Theorem 3. If the ground distance  is a metric, then EMD + is a metric.Proof.A thresholded ground distance was introduced in [19].  is the ground distance between   and   , and  > 0 is a threshold.The thresholded ground distance is defined as    = min(  , ).If  is a metric,   is a metric; a proof can be found in [19].Without loss of generality, we assume () ≥ ().Set  = 1; we rewrite ÊMD as Due to the fact that ÊMD 1 is a metric with   (⋅, ⋅), thus EMD + is a metric.

Fast Computation of EMD +
Intrinsically, EMD + has the same optimization problem as EMD and ÊMD.Hence, EMD + can be computed by either transportation simplex algorithm [18] (e.g., EMD [1]) or mincost flow algorithm (e.g., FastEMD [19]).However, the high complexity of simplex and min-cost flow algorithms drives us to develope an optimal method for efficient image retrieval.In this section, we proceed to develop a more efficient algorithm to obtain the value of EMD + .
In the definition of EMD + (Definition 2), there is a very important condition,   < , which restricts that each bin in one histogram can only match a small part of bins in another histogram (Figure 1).Taking advantage of this characteristic, we derive a linear approximate algorithm to compute EMD + .We will explain our basic idea by an example (Figure 2).In Figure 2, it is clear that only the bins that fall into the solid or dash circle have the qualification of receiving mass from  (5,3) or  (7,4) , respectively.We call these bins as qualified bins.If   is one qualified bin of  (5,3) , it is easy to get the qualified bin of  (7,4) at the corresponding position by paralleling   .Inspired by this heuristic, we present to precompute all the possible qualified bins of a reference bin (e.g., (0, 0)) and then utilize the possible qualified bins to improve the computation of EMD + .
To precompute the qualified bins, we propose a novel method which is composed of function  (Algorithm 1) and  (Algorithm 2) (in Algorithm 2, function getBins() uses   -norm as the ground distance, in other cases it is distance dependent). first calls the function  to enumerate all possible qualified bins of the reference bin (e.g., (0, 0)) (line 3) and then sorts the bins in ascending order according the distances to the reference bin (line 4).Let   be the th qualified bin to reference bin.For each bin   , we can get its th possible qualified bin by paralleling   .Notice that the reference bin itself does not appear in qualified bin set; the reason will be shown later.
Let  be the size of qualified bin set ,  ≪ 2  ⋅  2 ( is the dimensionality of histogram).Thus the getBins() (Algorithm 2) can be done in () time and Sort() (line 4) can be done in ( log ) time.In fact,  is very small corresponding to the size of histogram and can be treated as a constant.Thus the time of Algorithm 1 can be ignored (in our experiments (Section 5), the time of precomputing qualified bins is no more than 0.001 second).
Next thing is to compute the value of EMD + , that is, the core of our idea.The details are depicted in the following.
(1) The first step is to saturate all the zero-cost flows, since matching corresponding bins does not cause cost.After this step, many bins will be empty (Figure 2).( 2) The second step focuses on cross-bin matching.We plan to transport as much mass as possible when the cost is as low as possible.We employ the qualified bin set  generated by Algorithm 1 to accelerate the speed of our algorithm.We traverse  from front to end.In the th iteration, we match each nonempty bin   to its corresponding th qualified bin.If the th qualified bin of   exists, we transfer the highest possible mass; otherwise, we do nothing.Since corresponding bins are already handled in the first step, we have no necessity to store the reference bin in the qualified bin set .
Algorithm 3 gives the pseudocode of our algorithm.The proposed algorithm is named as LinearEMD, because its complexity is linear.The very important parameter  stores the qualified bins of original point (e.g., (0, 0) for 2-dimensional histogram).Notice that  only needs to be computed once.
The time complexity of part 1 is () where  = min(||, ||).Part 2 adopts greedy idea to match cross bins (lines 13∼23).The outer loop (line 13) guarantees that nearest bins get preferential treatment and each bin in  is shifted to the same position.Ideally, many bins in  or  will be empty after several iterations.The speed will be faster and faster.Assume the average of bin numbers in each iteration is  ( ≪ ) and the size of  is . is a constant.The complexity of part 2 is () = ().Part 3 which gets the result of EMD + by a simple calculation has an (1) expensive (line 31).Therefore, the total cost is ( + ) = () ( ≪ ).

Experimental Evaluation
This section aims to experimentally test and verify the following things.
(1) LinearEMD is efficient enough for massive image retrieval.
(2) Thresholding the ground distance can get a more effective result.
(3) The result set obtained by LinearEMD is very close the exact value.Size of RGB color histogram Average searching time of one query   To do that, we use image retrieval as an example to perform the experiments.Section 5.1 gives the setup of our experiments and Section 5.2 describes some evaluation criteria.The experimental results are shown in Section 5.3.

Experimental Setup.
Wang's dataset [23] is a benchmark for image retrieval.This dataset contains 1,000 color images in 10 classes.Each class has 100 images.Since there are some ambiguous images in this dataset, we use a subset of Wang's dataset filtered by [19] (we downloaded the filtered image list from http://www.seas.upenn.edu/ofirpele/FastEMD/).This subset is composed of 773 images.We exact the first 5 images from each classe as the queries.We perform our experiments on 5 different size RGB color histograms, that is, 8 * 8 * 8, 10 * 10 * 10, 12 * 12 * 12, 14 * 14 * 14, and 16 * 16 * 16, respectively.We use RGB color histogram because it is simple and has the ability to estimate the experimental goals.We do not claim that RGB color histogram is the best descriptor for image retrieval.
We compare EMD + with ÊMD (in this section, when we say ÊMD, we mean ÊMD with thresholded ground distance), EMD- 1 , and EMD.To compute EMD + , we use LinearEMD algorithm (we implement this algorithm in C++).For EMD- 1 , we use the authors implementation [10] (we downloaded the code from http://www.dabi.temple.edu/hbling/code data.htm/).For ÊMD, we use FastEMD [19] (we downloaded the code from http://www.seas.upenn.edu/ofirpele/FastEMD/code/).For EMD, we also use FastEMD [19], since simplex is very expensive.The ground distance of EMD- 1 is, of course,  1 -norm.For others, we choose both  1 -and  2norm.For EMD + and ÊMD, the ground distance thresholds are 3.0, 3.5, 4.0, 4.5, and 5.0 corresponding to the size of RGB color histogram, respectively.We set the coefficient  of ÊMD to be equal to 1. Since EMD + equals ÊMD, the value of EMD + (computed by LinearEMD) is approximate value and the value of ÊMD (computed by FastEMD) is the exact one.
The experiments are run on a computer with one Intel Xeon E5645 2.40 GHz CPU, 8 GB memory.The OS is CentOS release 6.3 and gcc version is 4.4.6.

Evaluation
Criteria.This section introduces some criteria to fairly evaluate the performance of these distance measures and algorithms.
To evaluate the speed, we compare the average runtime of each query.The effectiveness is measured by the average number of correct retrieved images.We define the mean overlap rate (MOR) to calculate the common images retrieved by LinearEMD and FastEMD.MOR is the average Jaccard coefficient: where  is the number of query,    is the result set of the th query retrieved by algorithm , and    is the result set of the th query retrieved by algorithm .To measure how close the values computed by two methods are, we employ the symmetric mean absolute percentage error (SMAPE) [24].SMAPE is defined as follows: where  is the number of common retrieved images and   and   are the values of the th image computed by algorithms  and .

Results
. Firstly, we compare the speed of each method.Figure 3 shows the average running time of each query.Obviously, our algorithm exceeds other methods in both ground distances.Our algorithm is faster by two to three orders of magnitude faster than EMD, two orders of magnitude faster  than ÊMD and an order of magnitude faster than EMD- 1 .
The larger the size of RGB color histogram, the better our algorithm.When the size of RGB color histogram is 16 * 16 * 16, the LinearEMD is to 500 times faster than the FastEMD and nearly 3000 to 4500 times faster than the EMD.Therefore, the LinearEMD is competent enough to handle massive image retrieval task.
The second thing of this section is to compare the retrieved images of each method.Figure 4 shows the average number of correct retrieved images of each query.It is obvious that thresholding the ground distance of EMD (i.e., EMD + and ÊMD) can obtain better retrieval result.That is, EMD + and ÊMD are more robust and have more discernibility.EMD + and ÊMD with  1 norm as ground distance can achieve better results than  2 -norm as ground distance.However, for EMD,  2 as ground distance retrieves more images.Generally, EMD + ( 1 ) (EMD + (  ) denotes EMD + with   -norm as the ground distance; ÊMD(  ) and EMD(  ) have similar meanings) gets the best results and EMD + ( 2 ) is better than ÊMD( 2 ).In addition, Figure 4 illustrates the precision of EMD + and ÊMD will be improved while the size of histogram is increased.Although our method is the approximate algorithm, the precision of our method is no less, even higher, than the exact algorithm (FastEMD).Figure 5 shows an example of retrieved images.The images retrieved by EMD + ( 1 ), EMD + ( 2 ), and ÊMD( 1 ) are all correct, yet the last image retrieved by ÊMD( 2 ) is incorrect.However, there are only 2 correct images in the results retrieved by EMD- 1 , EMD( 1 ), and EMD( 2 ), respectively.This result shows the advantage of thresholding the ground distance once again.
The third thing is to count the common images retrieved by our algorithm and the FastEMD.To this end, we employ the mean overlap rate (MOR).Figure 6(a) shows the MOR of the two 50 nearest neighbors obtained by our algorithm and the FastEMD (we only count the correct retrieved images).The MOR( 1 )s (MOR(  ) denotes the MOR value of LinearEMD and FastEMD with   -norm as the ground distance; SAMPE(  ) has the same meaning.)are between 95% and 98% and the MOR( 2 )s are between 93% and 95%.Although the MOR( 2 )s are a little bit lower than the MOR( 1 )s, they are still very high.Figure 6(a) indicates that the results between our algorithm and the FastEMD share many common images.
Finally, we compute the SMAPE of our algorithm and the FastEMD.Like the MOR, we use the two 50 nearest neighbors and only count the correct retrieved images.The SMAPE results are shown in Figure 6(b).Obviously, the SMAPE( 1 )s are lower than the SMAPE( 2 )s.In spite of this, the maximum value is lower than 4%. Figure 6(b) illustrates that the values computed by our approximate algorithm highly approach the values computed by the FastEMD.
In one word, our approximate algorithm is an optimal choice for comparing images.

Conclusion
In this paper, we have presented a new variant of EMD with thresholded ground distance, named as EMD + .We proved that EMD + is a metric for any kind of histogram.To compute the value of EMD + , we have proposed a linear approximate algorithm.This algorithm is an effective and efficient method for image retrieval.Experimental results show that our method break through the comparing methods.And thus, our method is competent enough for massive image retrieval task.Our method may also be able to compare other histograms (e.g., HSV color histogram) and descriptors (e.g., SIFT). and

Figure 1 :
Figure 1: Three histograms , , and .The light cyan circles indicate histogram , the light yellow squares indicate histogram , and the light yellow hexagons indicate histogram .In both figures,  is the same histogram.One circle, square, and hexagon denote one bin of , , and , respectively.The number inside the circle or square or hexagon is the weight of that bin.The blue arrows describe the flow of mass; the number beside the arrow denotes the weight of flow.

Figure 2 :
Figure 2: Red crosses indicate the distribution of histogram  and blue circles indicate the distribution of histogram .The radii of the green solid and cyan dash circles equal the threshold .

Figure 5 :
Figure 5: An example of retrieved images.The image on the top is the query.(b)-(h) are the 5 nearest neighbors retrieved by different methods.The last two images in (c) look very similar, but they are not the same image.The first 4 images in (b)-(d) are the same.

Figure 6 :
Figure 6: The MOR and SMAPE between our algorithm and FastEMD.