Many planning applications must address conflicting plan objectives, such as cost, duration, and resource consumption, and decision makers want to know the possible tradeoffs. Traditionally, such problems are solved by invoking a single-objective algorithm (such as A*) on multiple, alternative preferences over the objectives to identify nondominated plans. The less-popular alternative is to delay such reasoning and directly optimize multiple plan objectives with a search algorithm like multiobjective A* (MOA*). The relative performance of these two approaches hinges upon the number of efficient (nondominated) solutions that must be found.

Most realistic planning problems have multiple competing objectives. It is common in practice to select a preference over the objectives and solve the problem with respect to this preference.

There are two ways to find a set of solutions that trade off the objectives differently: iterate a single-objective algorithm over multiple preferences, or use a multiobjective algorithm with no assumptions about the preferences. However, the poor scalability of multiobjective heuristic search algorithms, such as multiobjective A* (MOA*), has made the first approach far more common in practice.

MOA* generalizes A* to find multiple nondominated solutions in a straightforward manner. A* searches for a single “best” solution in terms of one optimization metric captured by each node’s f value (the sum of its accrued cost g and heuristic estimate h); MOA* instead associates a set of nondominated f vectors with each node, one per candidate tradeoff.

Noticing that having a single nondominated cost vector per node brings MOA*’s per-node overhead close to A*’s, we study a class of problems where this property holds.

We explore several approaches to computing heuristics for MOA*, ranging from a single heuristic computation at the root to repeated heuristic computations, over multiple preferences, at each node.

We compare A* and MOA* in probabilistic planning, where the plan length and probability of goal satisfaction are the competing objectives. Probabilistic planning greatly simplifies our analysis because in partial solutions, one objective (plan length) is free to change, but the other objective (probability of goal satisfaction) is fixed (i.e., no partial plan collects the probability of goal satisfaction until it is a complete plan). The effect is that there is a single best cost vector per search node.

In the following, we present the MOA* algorithm, provide the intuition for investigating which heuristics to compute, formulate probabilistic planning as multiobjective search, and then present our empirical evaluation, related work, and conclusions.

MOA* generalizes best-first heuristic search to graphs whose edge costs are vectors.

Let u and v be cost vectors of equal length. We say u dominates v if u is no worse than v in every component and strictly better in at least one; a solution is efficient (Pareto optimal) if no other solution’s cost vector dominates its own.
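As a concrete illustration of Pareto dominance between cost vectors (a minimal sketch; the function names and example vectors are ours, not the paper’s):

```python
def dominates(u, v):
    """True if cost vector u dominates v: u is no worse in every
    component and strictly better in at least one (minimization)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def nondominated(vectors):
    """Filter a list of cost vectors down to its Pareto-optimal subset."""
    return [u for u in vectors
            if not any(dominates(v, u) for v in vectors if v != u)]

# (plan length, risk) pairs: (4, 0.2) and (5, 0.3) are dominated.
costs = [(3, 0.2), (5, 0.1), (4, 0.2), (5, 0.3)]
print(nondominated(costs))  # → [(3, 0.2), (5, 0.1)]
```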

Consider the two-objective example in the figure below.

Two-objective example (hypervolume depicted).

MOA* proceeds as follows.

MOA*

(1) Initialize the open set to contain the start node, with a zero cost vector.

(2) Find a subset of open nodes whose cost vectors are nondominated by those of any other open node or any recorded solution.

(3) If this subset is empty, terminate and return the recorded solutions.

(4) Otherwise, select a node n from the subset and remove it from the open set.

(5) If n is a terminal node, record its nondominated cost vectors as solutions, discard any previously recorded solutions they dominate, and return to step (2).

(6) Otherwise, expand n: for each successor, extend n’s nondominated cost vectors by the corresponding edge cost vector, place the successor on the open set, and return to step (2).

MOA* is guaranteed to terminate with an optimal efficient set of solutions (a Pareto optimal set) when (i) edge cost vector components are all greater than zero, (ii) the graph is locally finite, and (iii) the heuristic is admissible, never overestimating any component of the cost to reach a terminal node.
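The loop above can be sketched in code. The following is a minimal MOA*-style search over a small hand-built graph with two minimized objectives; the graph, function names, and zero heuristic (trivially admissible) are our own illustration, not the paper’s implementation.

```python
def dominates(u, v):
    """u dominates v: no worse in every component, better in at least one."""
    return all(a <= b for a, b in zip(u, v)) and u != v

def moa_star(graph, start, goal):
    """Return the Pareto-optimal cost vectors from start to goal.
    graph: dict mapping node -> list of (successor, edge_cost_vector)."""
    open_labels = [(start, (0, 0.0))]    # (node, accrued cost vector g)
    solutions = []                        # nondominated goal-node vectors
    while open_labels:
        # Steps (2)-(3): keep only labels not dominated by a solution.
        open_labels = [(n, g) for n, g in open_labels
                       if not any(dominates(s, g) for s in solutions)]
        if not open_labels:
            break
        node, g = open_labels.pop(0)      # Step (4): select a label
        if node == goal:                  # Step (5): record the solution
            solutions = [s for s in solutions if not dominates(g, s)]
            if g not in solutions:
                solutions.append(g)
        else:                             # Step (6): expand successors
            for succ, cost in graph.get(node, []):
                open_labels.append(
                    (succ, tuple(a + b for a, b in zip(g, cost))))
    return sorted(solutions)

# Two paths trade off plan length against risk.
graph = {
    's': [('a', (1, 0.0)), ('b', (1, 0.0))],
    'a': [('t', (0, 0.4))],               # shorter path, higher risk
    'b': [('c', (1, 0.0))],
    'c': [('t', (0, 0.1))],               # longer path, lower risk
}
print(moa_star(graph, 's', 't'))          # → [(1, 0.4), (2, 0.1)]
```

Both returned vectors are efficient: neither the shorter, riskier plan nor the longer, safer plan dominates the other.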

The intuition for judiciously selecting which heuristics to compute comes from comparing the per-node work of the two algorithms.

Since it is possible for A* and MOA* to expand the same nodes and find the same solutions, the algorithm with the lower combined per node and per iteration cost will perform better. A*’s per node cost depends upon how many times the node is reexpanded (i.e., how many times its g value is improved); MOA*’s per node cost additionally grows with the number of cost vectors maintained at the node.

With MOA*, it is possible to bootstrap a node’s heuristic vectors: an oracle supplying, for each efficient solution, the exact remaining cost would let MOA* expand only the necessary nodes.

Without an oracle, there are several options for computing the heuristic: compute it once at the root, compute it for a small fixed set of preferences, or recompute it at each node.

While there are many ways to compute such heuristics, we evaluate a representative set of strategies in our experiments.

Probabilistic planning is a naturally multiobjective problem, where at its simplest, the plan objectives are the plan length and probability of goal satisfaction. More precisely, we use the risk (one minus the probability of goal satisfaction) so that we can minimize each objective. We choose to study probabilistic planning because it has one property that greatly simplifies our analysis and otherwise boosts empirical performance when comparing MOA* and A*. As previously noted, MOA* associates cost vectors with search graph edges; in probabilistic planning, all edges leading to nonterminal nodes incur unit cost with respect to plan length and zero cost with respect to risk; however, edges leading to terminal nodes incur zero cost for the plan length and a possibly nonzero risk cost. The effect of this property is that there is a single efficient cost vector per search node.

The following subsections define the conformant probabilistic planning problem, formulate it as a graph search problem for both A* and MOA*, and discuss an existing reachability heuristic used by both search algorithms.

A conformant probabilistic planning (CPP) problem is given by a tuple comprising a set of propositions, a set of probabilistic actions, an initial belief state, and a goal description.

A belief state is a probability distribution over all states, where each state is a set of propositions. The probability of a state is the probability that it is the true current state of the world.

An action maps each state to a probability distribution over successor states; applying an action to a belief state yields a new belief state by weighting each outcome’s successor by the outcome’s probability.

A sequence of actions is a solution if executing it from the initial belief state results in a belief state that satisfies the goal with probability no less than a given threshold.
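Belief-state progression through probabilistic actions can be sketched as follows (a minimal illustration using our own encoding, not the paper’s: states as frozensets of propositions, actions as lists of weighted outcomes, and a hypothetical ‘paint’ action):

```python
def apply_action(belief, outcomes):
    """Progress a belief state (dict: state -> probability) through a
    probabilistic action given as a list of (probability, effect) pairs."""
    new_belief = {}
    for state, p in belief.items():
        for q, effect in outcomes:
            succ = effect(state)
            new_belief[succ] = new_belief.get(succ, 0.0) + p * q
    return new_belief

def goal_probability(belief, goal):
    """Probability mass of states that contain all goal propositions."""
    return sum(p for state, p in belief.items() if goal <= state)

# A noisy action that achieves 'painted' with probability 0.8.
paint = [(0.8, lambda s: s | frozenset({'painted'})),
         (0.2, lambda s: s)]

belief = {frozenset(): 1.0}
belief = apply_action(belief, paint)
belief = apply_action(belief, paint)      # repeating the action lowers risk
print(goal_probability(belief, frozenset({'painted'})))  # about 0.96
```

Note how a longer plan (painting twice) trades plan length for lower risk, which is exactly the tradeoff the two objectives capture.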

We formulate CPP as both A* and MOA* search over the belief state space. Each search node is a belief state, and each edge is an action.

A* search associates a unit cost with each edge, and defines terminal nodes as those nodes where the belief state’s probability of goal satisfaction is no less than a given threshold.

MOA* search associates a cost vector with each edge. The first component of the vector is the action execution cost (as in A*), and the second component is the risk. Risk is only incurred when transitioning to terminal nodes, and only action execution cost is incurred when transitioning to nonterminal nodes. MOA* treats terminal nodes differently than A* because it does not exit upon finding a solution (i.e., MOA* does not use a goal satisfaction threshold).
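The edge-cost convention just described can be made concrete (a small sketch with our own function name and signature):

```python
def edge_cost(successor_is_terminal, goal_prob):
    """Cost vector (plan length, risk) for one search-graph edge:
    terminal edges pay only the residual risk, nonterminal edges pay
    only one step of plan length."""
    if successor_is_terminal:
        return (0, 1.0 - goal_prob)   # risk = 1 - P(goal satisfaction)
    return (1, 0.0)

print(edge_cost(False, 0.0))   # → (1, 0.0)
print(edge_cost(True, 0.75))   # → (0, 0.25)
```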

The most effective heuristics for CPP involve estimating the cost to achieve the goal with probability no less than the threshold.

In MOA*, the heuristic is computed once for each probability bound under consideration, since there is no single threshold to target.

We compare A* to multiple versions of MOA* (using different heuristic computation strategies) on several CPP problems across four domains. The questions that we attempt to answer are as follows.

Will multiple invocations of A* with different preferences or one invocation of MOA* with no preferences find a better set of solutions, find the set faster, or both?

Which method for computing the heuristic offers the best tradeoff between computation cost and search guidance?

The following describes the evaluation metrics, the domains, the test environment, and the results.

As previously mentioned, we measure the quality of a set of solutions by its hypervolume. We can measure the hypervolume over time with MOA*, but because A* finds a single solution at a time, measuring its hypervolume over time is less informative. Thus, we compute a set of solutions using A* and compare it with the total time taken by MOA* to find a plan set with the same or better hypervolume. We also compare the maximum hypervolume found by each technique, within twenty minutes for MOA*, and for the fixed number of solutions found by A* (where each invocation to find a solution is given a twenty-minute limit).
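For two minimized objectives, the hypervolume indicator reduces to a dominated area and can be computed with a simple sweep (a sketch; the reference point here is an assumption of ours, not a value from the paper):

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by the reference point
    `ref`, for two minimized objectives (larger is better)."""
    # Sweep left to right; only points inside the reference box count.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                    # skip dominated points
            volume += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return volume

front = [(1, 0.4), (2, 0.1)]
print(hypervolume_2d(front, ref=(4, 1.0)))  # (4-1)*(1-0.4) + (4-2)*(0.4-0.1)
```

Adding a solution to the front can only grow this area, which is why MOA*’s hypervolume is monotone over time.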

The evaluation domains include an artificial domain, called Grid and Ladder (GL), and several domains from the CPP literature, including Logistics, Gripper, and Sand Castle.

The GL domain is an adaptation of the Grid domain presented by Hyafil and Bacchus.

The other domains are unmodified from their original versions, to help gauge the MOA* approaches on problems without such clear structure; they are Logistics, Gripper, and Sand Castle.

All experiments were conducted on a 3 GHz Xeon processor running Linux with 8 GB of RAM. All code was written in C.

The table and plots below report results for the GL domain.

A* versus MOA* in GL domain.

G/L | A* T(s) | A* HV | M+ T(s) | M+ HV | M2 T(s) | M2 HV | M10 T(s) | M10 HV | MN2 T(s) | MN2 HV | MN10 T(s) | MN10 HV
3/2 | 2.55 | 0.60 | 2.14 | 0.64 | 4.66 | 0.63 | 45.36 | 0.63 | 1.95 | 0.63 | 10.94 | 0.63
3/6 | 9.12 | 0.40 | 1211.68 | 0.41 | 10.48 | 0.43 | 109.56 | 0.43 | 27.24 | 0.42 | 37.64 | 0.42
3/10 | 42.08 | 0.27 | — | — | 24.58 | 0.29 | 208.48 | 0.28 | 61.90 | 0.27 | 82.44 | 0.27
6/2 | 34.67 | 0.47 | 47.33 | 0.50 | 322.80 | 0.47 | — | 0.39 | 21.07 | 0.48 | 45.23 | 0.47
6/6 | 69.47 | 0.29 | 470.21 | 0.29 | 107.82 | 0.30 | — | 0.26 | 44.75 | 0.28 | 58.99 | 0.30
6/10 | 238.14 | 0.23 | — | — | 207.63 | 0.24 | — | 0.17 | 137.30 | 0.23 | 90.50 | 0.24
10/2 | 481.47 | 0.38 | 139.32 | 0.39 | 349.38 | 0.38 | — | — | 122.49 | 0.38 | 319.03 | 0.38
10/6 | 562.31 | 0.26 | — | 0.18 | 753.27 | 0.26 | — | — | 361.63 | 0.26 | 588.06 | 0.26
10/10 | 867.51 | 0.18 | — | — | 845.73 | 0.18 | — | — | 205.28 | 0.18 | 719.37 | 0.20

Hypervolume over time for GL domain.

The plots of hypervolume over time corroborate these results, showing how quickly each technique accumulates solution quality as the problems scale.

The table and plots below report results for the Logistics, Gripper, and Sand Castle domains.

A* versus MOA* in Logistics, Gripper, and Sand Castle domains.

Domain | A* T(s) | A* HV | M+ T(s) | M+ HV | M2 T(s) | M2 HV | M10 T(s) | M10 HV | MN2 T(s) | MN2 HV | MN10 T(s) | MN10 HV
Logistics p2-2-2 | 32.01 | 0.15 | 129.00 | 0.41 | 98.03 | 0.16 | — | — | 54.30 | 0.28 | 132.64 | 0.28
Gripper | 1.74 | 0.87 | 2.47 | 0.94 | 0.73 | 0.95 | 3.07 | 0.95 | 0.46 | 0.95 | 1.00 | 0.95
Sand Castle | 2.33 | 0.88 | 0.91 | 0.89 | 1.34 | 0.96 | 4.79 | 0.96 | 1.34 | 0.96 | 2.59 | 0.95

Hypervolume over time for Logistics, Gripper, and Sand Castle domains.

From our analysis, we have seen the following trends in comparing MOA* and A*.

MOA* improves the quality of solution sets over A* because it is not limited to finding a predetermined number of solutions, and it continues searching to improve upon them.

The MN2 strategy is usually the fastest to match or exceed A*’s hypervolume, completing every instance in the tables within the time limit.

The M10 strategy pays the most per-node overhead and frequently exceeds the time limit on larger instances.

The M+ strategy is competitive on small instances but fails to finish on several of the larger ones.

A* is fast to find a first solution, but much slower than MOA* at finding a set with large hypervolume.

Multiobjective problem solving has been previously studied in planning.

The work of Van Den Briel et al. addresses partial satisfaction planning, where goal utility and action cost are combined into a single objective rather than traded off explicitly.

Finding a set of diverse solutions to planning problems is an important, recently studied problem in planning.

MOA* was originally studied by Stewart and White, who established the conditions under which it finds the complete set of nondominated solutions.

We have shown that MOA* can find a better set of solutions, as measured by hypervolume, than repeated invocations of A*, and often in less time.

In future work, we intend to explore additional techniques for computing heuristics for MOA*.