Dynamic Vehicle Routing Problems with Enhanced Ant Colony Optimization



Introduction
In the past few decades, global developments in transportation and logistics have significantly changed our lives. For any local product that is sold in other cities or countries, the cost of transportation and logistics is unavoidable. Recent research data show that this cost usually accounts for 20% or more of a product's value [1], and the logistics system plays an ever-growing and indispensable role in daily economic life. Nevertheless, it also brings many negative effects, for example, air pollution, noise, and traffic accidents [2].
Although transportation and logistics have inevitable consequences for daily life, efficient vehicle routing based on optimization algorithms can keep these negative impacts, as well as enterprise logistics costs, as small as possible. This is because shortening the distances that vehicles travel improves the efficiency of vehicles and drivers. In addition, such algorithms can improve customer service quality, reduce exhaust emissions, and improve vehicle dispatch efficiency [3]. Consequently, the vehicle routing problem (VRP) is a significant research topic that has attracted scholars' attention during the past few decades [4].
In the history of VRP, the simplest and most famous routing problem is the Travelling Salesman Problem (TSP): given a set of cities, a salesman must visit every city once and return to the starting city, and the goal is to find the shortest tour [5]. Researchers have designed many kinds of VRP as variants of TSP. The basic VRP involves a set of customers who need to be serviced by a fleet of vehicles, each customer being serviced exactly once by one vehicle, and all vehicles start from and return to the same depot. In addition, because of limits on vehicle route length and/or travelling time, the service process may need multiple routes [6,7]. The VRP has been classified into several variants, such as Capacitated VRP (CVRP), Multidepot VRP (MDVRP), and VRP with time windows (VRPTW) [8][9][10][11][12].
In most studies of VRP, researchers assume that basic information such as customers' locations and demands and the available vehicles is entirely known before service is carried out. However, in actual service processes, VRP is dynamic; that is to say, customers' demands and arrangements change over time, although a part of customers' demands may be known before service starts. In addition, the dynamic VRP (DVRP) is an NP-hard problem, so traditional algorithms (linear programming, dynamic programming, greedy algorithms, etc.) have difficulty solving it under time limitations. Modern optimization techniques such as Ant Colony Optimization (ACO) [13,14], genetic algorithm (GA) [15][16][17], and particle swarm optimization (PSO) [18], which can generate high-quality (although not exact) solutions, are better suited to DVRP. Among these approaches, ACO is a classical and efficient heuristic algorithm.
ACO is a classical bionic algorithm inspired by the observed foraging behavior of ant colonies. Individual ants communicate and exchange information by secreting pheromone (a special chemical substance) in the environment. By sensing the concentration of pheromone, ants can choose an appropriate path to reach food sources. This behavior attracted researchers' attention and led to artificial ant systems for solving combinatorial optimization problems [19].
The initial ACO, called ant system, was proposed by Dorigo in 1991. However, it suffered from nonconvergence and local optima problems. A large number of variants of ant system were introduced to make up for these disadvantages, such as elitist ant system, max-min ant system, and ant colony system [20]. Moreover, several novel mechanisms have been proposed to improve the performance of the algorithm, such as changing rules to enlarge the space of random search [14], N-Opt local random searches, and applying social insects to design distributed control [21]. For the DVRP, the goal of the algorithm is not only to search for the optimum solution, but also to track the optimal solution over time by using information from the previous search space. The algorithm needs to be sufficiently quick and flexible to adapt to changed information. Based on this consideration, the adaptability of the algorithm should be enhanced.
ACO is a typical adaptive algorithm since it can transfer information from a past environment to a new environment and quickly adapt to dynamic changes. In addition, ACO is robust and handles extreme conditions reasonably. In order to better cope with dynamic environments, a great number of strategies have been introduced to enhance ACO for solving the DVRP. These can be summarized as (a) maintaining diversity by immigrant schemes [22], (b) memory-based methods [23], (c) multiple population approaches [24], and (d) clustering-based algorithms [25].
In this paper, we design an enhanced ACO to solve DVRP instances of different scales. A large number of practical instances show that ACO algorithms can efficiently solve optimization problems in different fields, including Feature Subset Selection [26], the Set Covering Problem [27], and Wireless Sensor Networks [28].
There are two main contributions in this paper. The first is that this paper solves DVRP with an enhanced ACO that increases the degree of randomization and avoids falling into local optima prematurely. In order to enhance the ACO, this paper proposes the following modifications: (1) dividing the region by an improved K-means; (2) optimizing the initial solutions with crossover; (3) improving the solutions with 2-Opt.
In a large number of comparative experiments on data sets of different scales, the enhanced ACO shows its advantages.
The second contribution is a more equitable evaluation system for DVRP. To date, time-based and cost-based assessment strategies are widespread in published VRP papers. However, those evaluation approaches are biased: they only show the separate costs of several methods. Therefore, in addition to the customary evaluations, the concepts of degree of dynamism, vehicle utilization rate, and the t-test are added to the evaluation system.
The remainder of this paper is organized as follows. In Section 2, we describe the DVRP model and define the problem. In Section 3, the details of the enhanced ACO are presented. The experimental details and results are discussed in Section 4. Conclusions and future work are provided in Section 5.

Problem Description and Definition
In this section, the DVRP is described in detail and the problem model is defined.

Static Vehicle Routing Problem.
Generally, the static VRP can be defined as follows: search for one or several routes that link the depot with a set of customers such that the total cost is as small as possible.
In the past decades, most papers have used an undirected graph $G = (V, E)$ to establish a mathematical model. In the model, $V = \{v_0, v_1, \ldots, v_n\}$ is the vertex set and $E = \{(v_i, v_j) \mid v_i, v_j \in V, i < j\}$ is the edge set. A set of $m$ homogeneous vehicles (having the same and invariable capacity $Q$) depart from a single depot, which is represented by the vertex $v_0$, and must visit all customers, which are represented by the $n$ vertices $\{v_1, \ldots, v_n\}$. On $G$, we calculate the distance between customers $v_i$ and $v_j$ and obtain the distance matrix $D = (d_{ij})$. Every customer $v_i$ has a demand $q_i$ and needs to be visited once by only one vehicle. $V$ is divided into $m$ routes $\{R_1, \ldots, R_m\}$ that include all customers. The cost of a route $R_j$ is its travelled distance, and the total cost of a solution $S$ accumulates $\mathrm{Cost}(R_j)$ over all routes:
$$\mathrm{Cost}(S) = \sum_{j=1}^{m} \mathrm{Cost}(R_j). \tag{1}$$
On the whole, the static VRP needs to observe the following constraints [15]:
(1) Each vehicle starts from and returns to the same depot.
(2) Each customer is visited exactly once by only one vehicle.
(3) Each vehicle (assuming all vehicles are of the same model) has a capacity limitation.
(4) The vehicle arrival and service times must satisfy the stipulated time.
Generally, dynamism mainly reflects the uncertainty of customer requests during service. More concretely, the requests can vary in the number of goods [30][31][32] and services [33]. The travel time [34] and service time, two dynamic factors of most real-world environments, have also been taken into account. In this paper, dynamism focuses on changes of service time. According to the customer request time, the algorithm handles the orders dynamically.
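The static cost model and constraints (2) and (3) above can be illustrated with a minimal Python sketch. It assumes Euclidean distances between planar coordinates; the function names and the toy instance are hypothetical and are not taken from the paper's MATLAB implementation.

```python
import math

def route_cost(route, coords, depot=0):
    """Length of one route: depot -> customers in visiting order -> depot."""
    stops = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def solution_cost(routes, coords, depot=0):
    """Total cost of a solution: the sum of its route costs."""
    return sum(route_cost(r, coords, depot) for r in routes)

def is_feasible(routes, demand, capacity):
    """Constraints (2) and (3): every customer exactly once, no overloaded vehicle."""
    visited = sorted(c for r in routes for c in r)
    each_once = visited == sorted(demand)
    within_capacity = all(sum(demand[c] for c in r) <= capacity for r in routes)
    return each_once and within_capacity

# Toy instance (made-up data): depot 0 and three customers.
coords = {0: (0, 0), 1: (3, 0), 2: (3, 4), 3: (0, 4)}
demand = {1: 2, 2: 3, 3: 4}
routes = [[1, 2], [3]]
total = solution_cost(routes, coords)
feasible = is_feasible(routes, demand, capacity=5)
```

On this toy instance the first route has length 3 + 4 + 5 and the second 4 + 4, so the two routes together cost 20 distance units.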

DVRP Model.
We study a classical DVRP model proposed by Montemanni et al. [35]. In this paper, DVRP is regarded as a variant of the ordinary static VRP: the whole DVRP is divided into a set of standard VRPs, which are then solved in sequence with ACO. A number of vehicles are arranged to serve customers that are known in advance; meanwhile, new customers' demands emerge constantly over time. These new demands are either sent to vehicles that are already working or handled by additional vehicles, according to the new customers' required times. Thus, at any moment of the working day there are customers who have been serviced and new customers who wait to be serviced. If a day is divided into many short time periods, the DVRP can be regarded as a standard static VRP in every time period. Because VRP is an NP-hard problem, DVRP is also NP-hard [36], so the DVRP must be handled within each set time period. A DVRP example is shown in Figure 1: the orders of some customers (black dots) are known in advance, and the red and black lines represent the initial routes designed to service them. As time goes on, orders of new customers (blue triangles) are added to the system; the new customers are inserted into existing routes and may generate new routes [15].
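The idea of dividing the working day into time periods can be sketched as follows. This is only an illustration under stated assumptions: orders are (customer, request_time) pairs, all periods have equal length, and the function name is hypothetical.

```python
def split_into_time_slices(orders, day_length, n_slices):
    """Group customers by the time slice in which their order arrives."""
    slice_len = day_length / n_slices
    slices = [[] for _ in range(n_slices)]
    for customer, request_time in orders:
        # An order arriving exactly at the end of the day goes to the last slice.
        idx = min(int(request_time // slice_len), n_slices - 1)
        slices[idx].append(customer)
    return slices

# Made-up orders over a 480-minute working day split into 4 slices.
orders = [(1, 10), (2, 130), (3, 250), (4, 480)]
slices = split_into_time_slices(orders, day_length=480, n_slices=4)
```

Each inner list would then be combined with the customers still unserved and handed to the static solver as one standard VRP.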

Measuring Dynamism.
In different problems, the levels of dynamism are not the same. Dynamism is usually characterized by the frequency of changes and the urgency of requests [37]. Three metrics have been applied to describe dynamism concretely: the degree of dynamism [17], the effective degree of dynamism, and the effective degree of dynamism with time windows (TW) [38].
This paper adopts the DVRP model in [35] and regards DVRP as a set of static VRPs. Therefore, this paper selects the degree of dynamism (dod) metric, which is computed from the numbers of customers that are known and unknown before the system starts to serve, with dod ∈ [0, 1]. If dod is 1, all customers are known in advance and the problem is completely static, while if it is 0, no customers are known in advance and the problem is completely dynamic [17].
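As a hedged illustration, one expression that is consistent with the stated range and with both endpoints (dod = 1 when every customer is known in advance, dod = 0 when none is) is given below. The symbols $n_{\mathrm{known}}$ and $n_{\mathrm{unknown}}$ are assumed names, and the exact expression used in [17] may differ.

```latex
\mathrm{dod} = \frac{n_{\mathrm{known}}}{n_{\mathrm{known}} + n_{\mathrm{unknown}}},
\qquad 0 \le \mathrm{dod} \le 1 .
```

Under this assumption, an instance with 50 customers known in advance and 100 arriving during the day would have dod = 50/150 = 1/3.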

Converting DVRP to Static VRP with Event Scheduler System.
The event scheduler system manages customers' orders, including accepting orders, distributing orders, and creating static problems. It serves as a dispatching center that connects new real-time orders with the optimization procedure. Firstly, the system submits the known customers' orders to the optimization procedure, lets it create static problems, and serves the known customers; meanwhile, the system accepts new real-time customers' orders. Secondly, while a static problem is being handled, new orders that need to be dealt with in time are added to the list of unserved orders immediately. Lastly, the orders remaining at the end of the day are assigned to the next working day. An event scheduler system is shown in Figure 2.
As shown, the system accepts the customers' orders, creates static problems, and sends the problems to the optimization procedure (our algorithm). After the enhanced ACO (E-ACO) produces static solutions and returns them to the system, the system commits the orders.
In the K-means algorithm, the metric is the Euclidean distance. The sum of squared errors $E$ is calculated by (4), and it is applied to classify the initial clustering centers [25]:
$$E = \sum_{i=1}^{K} \sum_{p \in C_i} \lVert p - m_i \rVert^2. \tag{4}$$
In (4), $C_i$ is the $i$th cluster, $m_i$ denotes the mean of cluster $i$, and $p$ are the data points included in cluster $i$.
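The sum of squared errors with the Euclidean metric can be computed as in the short sketch below. The helper names and the sample points are hypothetical; the code only assumes that each cluster is a non-empty list of coordinate tuples.

```python
import math

def cluster_mean(points):
    """Component-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def sum_of_squared_errors(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for points in clusters:
        center = cluster_mean(points)
        total += sum(math.dist(p, center) ** 2 for p in points)
    return total

# Made-up data: two clusters in the plane.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
sse = sum_of_squared_errors(clusters)
```

Here the first cluster has mean (1, 0) and contributes 1 + 1, while the single-point cluster contributes 0.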

Enhanced ACO for DVRP
Although the traditional K-means algorithm can deal with clustering problems, it relies on the $K$ value. In this paper, (5) is applied to determine the $K$ value. Equation (5) also uses the average value of all data points; the other parameters are the same as in (4). Once the minimum of (5) is obtained, the $K$ value is determined [40]. According to the modified K-means, the data points are divided into $K$ different regions reasonably; then E-ACO handles each region separately.

Generating Initial Solutions.
In ACO, the colony has a predefined scale, an individual ant represents a vehicle, and a route is generated by gradually visiting customers until all customers have been visited. The customers who have been visited or who would violate the vehicle capacity constraint are stored in the tabu list, which ensures that the ant does not select those customers again.
The choice of the next customer to visit follows a probabilistic rule that takes into consideration the visibility of the ants and the pheromone information. The $k$th ant at the $i$th node selects the next customer $j$ according to
$$p_{ij}(k) = \begin{cases} \dfrac{\tau_{(i,j)}^{\alpha}\,\eta_{(i,j)}^{\beta}}{\sum_{l \notin \mathrm{tabu}_k} \tau_{(i,l)}^{\alpha}\,\eta_{(i,l)}^{\beta}}, & j \notin \mathrm{tabu}_k, \\ 0, & \text{otherwise,} \end{cases} \tag{6}$$
where $p_{ij}(k)$ is the probability of selecting $j$ as the next customer of $i$ on the route, $\tau_{(i,j)}$ is the pheromone density of edge $(i, j)$, $\eta_{(i,j)}$ is the visibility of edge $(i, j)$, $\alpha$ and $\beta$ are the relative influence of the pheromone trails and the visibility values, respectively, and $\mathrm{tabu}_k$ is the set of the unfeasible nodes for the $k$th ant [13].
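The probabilistic rule can be sketched as a roulette-wheel choice over the customers that are not in the tabu list. The function names, the nested-dictionary layout of the pheromone and visibility values, and the default exponents are assumptions made for illustration only.

```python
import random

def selection_probabilities(i, feasible, tau, eta, alpha=1.0, beta=2.0):
    """Probability of moving from node i to each feasible (non-tabu) node."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in feasible}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next(i, feasible, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """Draw the next customer at random according to the probabilities."""
    probs = selection_probabilities(i, feasible, tau, eta, alpha, beta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]

# Made-up values: equal pheromone, customer 1 twice as visible as customer 2.
tau = {0: {1: 1.0, 2: 1.0}}
eta = {0: {1: 1.0, 2: 0.5}}
probs = selection_probabilities(0, [1, 2], tau, eta)
next_customer = choose_next(0, [1, 2], tau, eta)
```

With these values the weights are 1 and 0.25, so customer 1 is chosen with probability 0.8 and customer 2 with probability 0.2.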

Optimization Operation
3.4.1. Crossover Operation. In general, the crossover operation comes from the genetic algorithm (GA) [41], but it can be applied to other algorithms. For example, the operator can help ACO reach further solutions in the search space. The main idea of the crossover operation is to randomly select two tours and exchange customers between them according to the crossover rate. Consequently, the operation may generate new solutions and increase the possibility of finding better solutions.
This paper modifies the Best Cost Route Crossover (BCRC) in [42]. The detailed steps of the conventional BCRC are introduced in [39,42]; our modified version is shown in Figure 4, where a randomly generated decimal is compared with the crossover threshold. The best insertion location is the point of a selected route at which a customer exchanged from another selected route makes the cost of the new routes minimal; the cost can be calculated by (1). In this approach, the customer ($k$) is inserted between the original customers ($i$, $j$), and the new cost is calculated as follows:
$$\mathrm{Cost}_{i,j} = \mathrm{Dist}(i, k) + \mathrm{Dist}(k, j) - \mathrm{Dist}(i, j). \tag{7}$$
In order to implement this idea, the E-ACO system initially sets the threshold to 1 (normal BCRC). If the system obtains the best route in ten consecutive tests, the route is added to the best list. When the system finds that the route is already in the best list, the threshold is decreased by 10% so that the search can move away from the listed solutions and toward a global optimum, and the system runs ten consecutive tests again. Once the threshold falls below 0.1, it is reset to 1.0 [17,43].
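The insertion cost in (7) and the search for the best insertion location can be sketched as follows. This is a minimal illustration, not the modified BCRC itself: it only evaluates where one customer is cheapest to insert into one route, and the function names and the distance table are hypothetical.

```python
def insertion_cost(i, j, k, dist):
    """Extra distance caused by inserting customer k between i and j, as in (7)."""
    return dist[i][k] + dist[k][j] - dist[i][j]

def best_insertion(route, k, dist, depot=0):
    """Return the route with k inserted at its cheapest position, and the extra cost."""
    stops = [depot] + list(route) + [depot]
    best_pos, best_delta = 0, float("inf")
    for pos in range(len(stops) - 1):
        delta = insertion_cost(stops[pos], stops[pos + 1], k, dist)
        if delta < best_delta:
            best_pos, best_delta = pos, delta
    new_route = list(route[:best_pos]) + [k] + list(route[best_pos:])
    return new_route, best_delta

# Made-up instance: nodes on a line at x = 0 (depot), 1, 3, and 2.
xs = [0, 1, 3, 2]
dist = [[abs(a - b) for b in xs] for a in xs]
new_route, delta = best_insertion([1, 2], 3, dist)
```

Customer 3 lies between customers 1 and 2 on the line, so inserting it there adds no distance.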
Finally, a best strategy based on different crossover thresholds is selected, which ensures that the best solution is detected over time. Figure 5 is an example of the crossover operator process. The black line route (1-3-2-4-1) and the red line route (1-6-5-1) are two different routes. By exchanging 4 and 6, two new routes (1-3-2-6-1 and 1-4-5-1) are formed.

Local Search.
2-Opt is a classical local search heuristic, which was proposed by Croes [44] in 1958. The main idea is to select a route and exchange two neighboring locations. This operation provides a way to obtain a better solution and keeps the algorithm from falling into a local optimum.
In this paper, the 2-Opt operation is applied to optimize routes. Firstly, exchanging all possible pairs of neighboring customers' locations generates some new routes. Then each new route is tested to see whether the pair exchange improves the solution quality [45]. Finally, the best solution is adopted. The operation has been applied in several ACO applications for the VRP (Chen and Ting, 2006 [14] and Gao et al., 2016 [25]). Figure 6 is an example of the implementation. In this figure, A1, A2, and A3 denote three routes. In A1, customers 5 and 6 first exchange locations to obtain a new route. Then exchanging 4 and 8 forms another route, and so on, up to exchanging 3 and 6. Finally, the value of each new route is calculated by (1), and the minimal value of all new routes is found. Among all new routes, the one obtained by exchanging 2 and 7 (A2) is the best, so exchanging 2 and 7 forms route A3. Figure 7 is the graphic of this operation; the elements are the same as those in Figure 1. Figure 7(a) is the route before exchanging 2 and 7, and Figure 7(b) is the route after implementing 2-Opt.
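The exchange step described above can be sketched as a best-improvement pass over all pairwise exchanges of customers within one route. Note the assumption: this follows the exchange operation as the text describes it, whereas the textbook 2-Opt move reverses the segment between two positions. Function names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def route_length(route, coords, depot=0):
    """Length of depot -> route -> depot with Euclidean distances."""
    stops = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def best_exchange(route, coords, depot=0):
    """Try every pairwise exchange of customers and keep the shortest route."""
    best_route, best_len = list(route), route_length(route, coords, depot)
    for a, b in combinations(range(len(route)), 2):
        candidate = list(route)
        candidate[a], candidate[b] = candidate[b], candidate[a]
        length = route_length(candidate, coords, depot)
        if length < best_len - 1e-12:
            best_route, best_len = candidate, length
    return best_route, best_len

# Made-up instance: depot and three customers on the corners of a unit square.
coords = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
improved, length = best_exchange([1, 3, 2], coords)
```

The starting route crosses the square twice along its diagonals; exchanging customers 3 and 2 yields the perimeter tour of length 4.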

Update of Pheromone Information.
One of the most important steps of ACO is updating the pheromone. It is the key to the self-learning ability of ACO and to obtaining high-quality solutions. In order to simulate pheromone evaporation, the pheromone concentration is reduced on all links, which ensures that no link has a unique advantage. The approach is implemented with the following updating formula:
$$\tau_{ij}^{\mathrm{new}} = \rho\,\tau_{ij}^{\mathrm{old}} + \sum_{k=1}^{K} \Delta\tau_{ij}^{k}, \tag{8}$$
where $\tau_{ij}^{\mathrm{new}}$ is the final pheromone of link $(i, j)$, $\tau_{ij}^{\mathrm{old}}$ is the initial pheromone of link $(i, j)$, $\rho$ is a constant that adjusts the speed of evaporation, $k$ indexes the routes, $K$ is the number of routes in the solution, and $\Delta\tau_{ij}^{k}$ is the increased pheromone of link $(i, j)$ in route $k$.
In this paper, the rule for updating pheromone follows the ant-weight strategy put forward by Yang et al. [46]. The strategy is
$$\Delta\tau_{ij}^{k} = \begin{cases} \dfrac{Q}{K \times L} \times \dfrac{D_k - d_{ij}}{m_k \times D_k}, & \text{if link } (i, j) \text{ is on route } k, \\ 0, & \text{otherwise,} \end{cases} \tag{9}$$
where $Q$ is a constant, $L$ is the sum of all routes' lengths, that is, $L = \sum_k D_k$, $D_k$ is the $k$th route length, $d_{ij}$ is the length of link $(i, j)$, and $m_k$ is the number of customers on route $k$.
The ant-weight strategy updates the increased quantity of pheromone based on actual solution quality and includes two parts: the global pheromone increment and the local pheromone increment. In this strategy, the global increment $Q/(K \times L)$ depends on the total path length, while the local increment $(D_k - d_{ij})/(m_k \times D_k)$ reflects the contribution of link $(i, j)$ to the solution.
In addition, in order to avoid the risk of getting trapped in a local optimum, upper and lower limits $[\tau_{\min}, \tau_{\max}]$ are set in terms of $d_{0i}$, the distance between the depot and the $i$th customer [13].
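The pheromone update can be sketched as evaporation followed by an ant-weight deposit and an optional clamp to the limits. Several points are assumptions rather than facts from the paper: the old pheromone is multiplied by the evaporation constant, the global and local increments are multiplied together, the graph is symmetric, every route is non-empty, and the limits are passed in as plain numbers because their formula is not reproduced here. All names are hypothetical.

```python
def update_pheromone(tau, routes, dist, rho=0.8, q=100.0,
                     tau_min=None, tau_max=None, depot=0):
    """Return a new pheromone matrix after evaporation and ant-weight deposit."""
    n = len(tau)
    # Evaporation on every link.
    new_tau = [[rho * tau[i][j] for j in range(n)] for i in range(n)]
    paths = [[depot] + list(r) + [depot] for r in routes]
    lengths = [sum(dist[a][b] for a, b in zip(p, p[1:])) for p in paths]
    k_routes, total_len = len(routes), sum(lengths)
    for route, path, d_k in zip(routes, paths, lengths):
        m_k = len(route)  # customers on this route (assumed non-empty)
        for a, b in zip(path, path[1:]):
            # Global part q/(K*L) times local part (D_k - d_ab)/(m_k*D_k).
            delta = q / (k_routes * total_len) * (d_k - dist[a][b]) / (m_k * d_k)
            new_tau[a][b] += delta
            new_tau[b][a] += delta
    if tau_min is not None and tau_max is not None:
        new_tau = [[min(max(v, tau_min), tau_max) for v in row] for row in new_tau]
    return new_tau

# Made-up instance: three nodes at mutual distance 1, one route 0-1-2-0.
dist = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
tau = [[1.0] * 3 for _ in range(3)]
updated = update_pheromone(tau, [[1, 2]], dist, rho=0.5, q=9.0)
clamped = update_pheromone(tau, [[1, 2]], dist, rho=0.5, q=9.0,
                           tau_min=0.6, tau_max=1.2)
```

In the toy instance each link of the route receives an increment of 1, so its pheromone goes from 1.0 to 0.5 + 1.0 before clamping.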

Experimental Results and Discussions
In this section, the performance of the enhanced Ant Colony Optimization algorithm on DVRP is assessed thoroughly. Through a number of experiments on data sets of different scales, we analyze the proposed algorithm in terms of solution quality, utilization rate of vehicles, and degrees of dynamism. The data sets are open and available at http://neo.lcc.uma.es/vrp/. The E-ACO parameters used for the instances are shown in Table 1. The algorithm is implemented in MATLAB (version R2010b), and the experimental computer has an Intel Core i5-6500 3.19 GHz CPU and 8 GB RAM and runs Windows 10 (x64). All the results are averaged over 25 runs.

Comparison Based on Small and Medium Scale of Data Sets.
In this part, three ACO algorithms are compared in terms of the best value and the average value of solution quality: ACO, K-ACO, and E-ACO. ACO is the algorithm of Montemanni et al. [35], K-ACO is the basic ACO combined with K-means and 2-Opt, and E-ACO is K-ACO extended with the crossover operation. The scale of the data sets is between 50 and 199.
Table 2 gives the best and average values of the three approaches. The numbers in bold are the best results among the three algorithms. E-ACO attains 17 out of 20 best values and 12 out of 20 best average values compared with ACO and K-ACO. This may be attributed to the fact that the K-means algorithm divides the search space reasonably and lets the ACO with crossover obtain better solutions in each clustering region.
Meanwhile, we perform a statistical analysis using a paired t-test to investigate whether there are statistically significant differences in solution quality between E-ACO and the other algorithms. Since it is expected that the best solution of E-ACO is better than that of the other approaches, a null hypothesis $H_0$ is stated in terms of $\mu_{\text{E-ACO}}$ and $\mu_{\text{CA}}$, the population means for E-ACO and CA, respectively. CA refers to the compared algorithm; if E-ACO is compared with K-ACO, CA is K-ACO. Table 3 shows the pairs, the mean differences over the instances, and the $p$ value at a significance level of $\alpha = 0.05$. In Table 3, group 1 and group 2 together compare the two improved ACO variants (K-ACO and E-ACO) with the original ACO. For the best and average values, the mean differences of ACO versus K-ACO are 69.79 and 178.55 with $p$ values of 0.834 and 0.699, respectively; this indicates that the basic ACO using K-means and 2-Opt is on average worse than the ACO in [35]. The mean differences of ACO versus E-ACO are −136.51 and −38.88 with $p$ values of 0.890 and 0.972, respectively; the negative mean differences favor E-ACO, although $p$ values above $\alpha = 0.05$ do not establish a statistically significant difference. This suggests that K-ACO with crossover can explore more possibilities and search for better solutions. Group 2 and group 3 together show the mean differences and $p$ values in the same way, and this analysis tests the efficiency of E-ACO again.
Although K-means does not appear to contribute to E-ACO here, its efficiency is shown in the comparison based on high scale data sets.
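The paired comparison described above can be illustrated with a small sketch that computes the mean difference and the paired t statistic from two lists of per-instance results. The function name and the numbers are made up for illustration and are not taken from the paper's tables.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Mean difference, t statistic, and degrees of freedom for paired samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    d_bar = mean(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample standard deviation
    return d_bar, d_bar / standard_error, n - 1

# Made-up costs of two algorithms on the same four instances.
algo_a = [10.0, 12.0, 14.0, 16.0]
algo_b = [9.0, 10.0, 13.0, 13.0]
mean_diff, t_stat, dof = paired_t_statistic(algo_a, algo_b)
```

The $p$ value is then read from a t distribution with n − 1 degrees of freedom; in Python, `scipy.stats.ttest_rel` returns the statistic and the $p$ value directly if SciPy is available. A difference is called significant at $\alpha = 0.05$ only when the $p$ value is below 0.05.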
In addition, this paper compares the utilization rates of vehicles among ACO, K-ACO, and E-ACO. Because Montemanni et al. did not discuss the utilization rate of vehicles in [35], the corresponding entries are blank. The vehicles' utilization rate is a significant value for assessing the performance of an approach; it shows whether each vehicle is fully utilized.
Table 4 shows the utilization rates of vehicles. E-ACO gives the better result in 9 out of 20 instances and K-ACO in 7 out of 20, and the other 5 instances have the same value. Owing to the application of K-means, the utilization rate of vehicles for both K-ACO and E-ACO is more than 80%. This indicates that K-means is an appropriate approach to improve the utilization rate of vehicles. If a VRP has a limited number of vehicles, K-means should be preferred.

Comparison Based on High Scale of Data Sets.
Although most studies of the vehicle routing problem adopt small and medium scale data sets, this paper adds high scale data sets to the experiments. The scale of these data sets is from 225 to 480. Because we were unable to find previous papers that use these Kelly data sets, ACO is replaced here by the basic ACO, which is not optimized with K-means and crossover. K-ACO and E-ACO are the same as those in Section 4.1. Table 5 gives the best and average values of the three approaches. The numbers in bold are the best results among the three algorithms. E-ACO attains 6 out of 8 best values and 5 out of 8 best average values compared with basic ACO and K-ACO. This may be attributed to the fact that the K-means algorithm divides the search space reasonably and lets the ACO with crossover obtain better solutions in each clustering region.
In order to show the efficiency of each optimization operation, we also applied the paired t-test, defined as in Section 4.1. In Table 6, the mean differences of basic ACO versus K-ACO are −64.99 and −119.12 with $p$ values of 0.682 and 0.654, respectively. The negative mean differences indicate that the basic ACO with K-means obtains lower costs than the basic ACO on average, although $p$ values above $\alpha = 0.05$ mean that the difference is not statistically significant. The mean differences of K-ACO versus E-ACO are −47.01 and −43.20 with $p$ values of 0.835 and 0.866, respectively, which suggests in the same way that the crossover operation contributes to E-ACO. The comparison on high scale data sets thus supports the contributions of K-means and crossover to E-ACO in terms of mean differences.

Comparison Based on Different Degrees of Dynamism (dod).
In this part, in order to evaluate the effect of different degrees of dynamism on ACO, we perform ACO tests with different dod values (1/3, 1/2, and 2/3). Table 7 gives the best and average results of this comparison. The table shows that the same data set yields different results under different degrees of dynamism. To present the results clearly, we count the number and proportion of best values and average values obtained under each dod. Across the different dod values, the distribution of the best and average values is relatively stable and balanced, which illustrates that E-ACO is robust and stable.

Comparison with Previously Published DVRP Systems.
In order to evaluate the performance of the proposed enhanced ACO in terms of minimizing the travelling distances, it is compared with previously published DVRP systems. Table 8 gives the best and average values of this comparison and counts the number and proportion of the best values of each algorithm. The best solutions found are in bold. The table shows that the proposed E-ACO finds 10 new best solutions in the 20 problems, accounting for 50% of the best, while the GA-based DVRP reaches 6 best solutions, accounting for 30%. DVRP-GA, DAPSO, and VNS produce 2, 1, and 1 best solutions, respectively.
Meanwhile, we perform a statistical analysis using a paired t-test to investigate whether there are statistically significant differences in solution quality between E-ACO and the other algorithms. Table 9 shows the pairs, the mean differences over the instances, and the $p$ value at a significance level of $\alpha = 0.05$. As shown in Table 9, the mean differences between E-ACO and tabu, DVRP-GA, DAPSO, and VNS are −62.99, −33.17, −65.26, and −67.13, with $p$ values of 0.841, 0.899, 0.807, and 0.800, respectively. In particular, for GA-based DVRP, which gives relatively newer and better results, the mean difference is −5.45 with a $p$ value of 0.983. All mean differences favor E-ACO, but since every $p$ value exceeds $\alpha = 0.05$, none of the differences is statistically significant. This analysis indicates that E-ACO performs as well as the other algorithms, which are among the most effective approaches recently proposed.

Convergence Characteristics of the E-ACO.
In this section, we investigate the convergence characteristics of the proposed approach while solving DVRP, using the problems "Kelly09" and "Kelly10" as examples. In Figure 8, (a) and (b) show the convergence characteristics. The red and blue lines are the trends of the "best value" and "average value" over iterations. In (a), E-ACO begins to converge after about 30 iterations; from 30 to 180 iterations, the convergence rate is very slow. For DVRP, the goal of the algorithm is not only to search for the best solution, but also to attain it in a short time. In a large number of convergence experiments, we found that E-ACO can nearly obtain the best value compared with tabu, DAPSO, VNS, and GA-based DVRP within 40 iterations.

Conclusions and Future Work
Most published methods are proposed to solve the static VRP. However, in the real world, vehicle routing problems are dynamic. We therefore proposed an enhanced Ant Colony Optimization algorithm (E-ACO) to solve DVRP. To test the performance of the approach, the experimental results are compared with previously published results. The experimental results show that E-ACO is able to find high-quality solutions.
Although our approach has a minor advantage in finding the best solution in DVRP, many issues remain to be addressed. For example, the convergence rate of our approach is slower than that of others in the later period. Therefore, further work is needed. On the one hand, we will continue to improve our algorithm to make it more efficient and robust. On the other hand, more complex dynamic situations, such as real-world applications, can be addressed with E-ACO.

Figure 2: The flow chart of event scheduler.
The Improved K-Means.
K-means is a classical clustering algorithm whose objective function is the sum of distances between the clustering centers and the data points. The main idea of K-means is to divide a set of data points into $K$ regions by minimizing the objective function, making each region as compact as possible.
... problem with time windows (VRPTW) by adopting ACO [39]. In this paper, in order to gain better solutions, K-means, crossover, and 2-Opt are applied to enhance ACO. The flowchart of E-ACO for the DVRP is shown in Figure 3. The following sections introduce the details of E-ACO.

Table 3: t-test significance results between each ACO.

Table 4: Comparison of vehicles' utilization rates.
The compared systems are Tabu Search (TS) and GA, Khouadjia et al.'s [41] Variable Neighborhood Search (VNS) with a local search and particle swarm optimization (PSO) with a local search, and Abdallah's [17] GA-based DVRP.
In Figure 8(b), the status of convergence is similar to (a). Quick convergence of an algorithm may harm an optimization problem, but it is necessary for DVRP to search the solution.

Table 8: Comparison between the published systems.

Table 9: Paired t-test based on the published systems.
E-ACO is based on the fusion of ACO and K-means; meanwhile, it uses 2-Opt and crossover to optimize the routes further. The algorithm is tested on a number of data sets derived from public VRP benchmark data. In the experiments, we compared ACO, K-ACO, and E-ACO. In order to demonstrate the efficiency of the proposed algorithm, the t-test is applied to perform statistical analysis. In addition, the relationship between ACO and the degree of dynamism (dod) is studied through a number of tests based on different dod values.