A Decentralized Partially Observable Markov Decision Model with Action Duration for Goal Recognition in Real Time Strategy Games

Multiagent goal recognition is a hard yet important problem in many real time strategy games and simulation systems. Traditional modeling methods either require detailed domain knowledge about the agents and a training dataset for policy estimation or lack a clear definition of action duration. To solve these problems, we propose a novel Dec-POMDM-T model, which combines the classic Dec-POMDP, an observation model for the recognizer, a joint goal with its termination indicator, and time duration variables for actions with action termination variables. A model-free algorithm named cooperative colearning based on Sarsa is used for policy learning. Because multiagent goal recognition usually has to cope with different sorts of noise, partially missing data, and unknown action durations, the paper exploits the SIS PF with resampling for inference under the dynamic Bayesian network structure of Dec-POMDM-T. In experiments, a modified predator-prey scenario is adopted to study the multiagent joint goal recognition problem, that is, the recognition of the joint target shared among cooperative predators. Experiment results show that (a) Dec-POMDM-T works effectively in multiagent goal recognition and adapts well to dynamically changing goals within the agent group; (b) Dec-POMDM-T outperforms traditional Dec-MDP-based methods in terms of precision, recall, and F-measure.


Introduction
Recently, more and more commercial real time strategy (RTS) games have received attention from AI researchers, behavior scientists, policy evaluators, and staff training groups [1].
A key aspect in developing these RTS games is to create human-like players or agents who can act or react intelligently to a changing virtual environment and to incoming interactions from real players [2]. Though many AI planning and decision-making algorithms have been applied to agents in RTS games, their behavior patterns are still easy to predict, which makes games less entertaining or intuitive. This is partially because of agents' limited ability to process and understand information, for example, to recognize the goals or intentions of opponents or friends. In other words, understanding goals or intentions in time helps agents cooperate better or make counter decisions more efficiently.
A typical scenario in RTS games is a group of AI players cooperating to achieve a certain mission. In StarCraft, for example, the AI players have to cooperate so as to besiege enemy bases or intercept certain logistic forces [3]. Therefore, if AI players can recognize the real moving or attacking target, they will be better prepared, whether through early defense deployment or counter decision-making. Considering these benefits, goal recognition has attracted much attention from researchers in many different fields. Many related models and algorithms have been proposed and applied, such as hidden Markov models (HMMs) [4], conditional random fields (CRFs) [5], Markov decision processes (MDPs) [6], and particle filtering (PF) [7].
Hidden Markov models [8] are especially known for their applications in temporal pattern recognition such as speech, handwriting, and gesture recognition. Though convenient for representing system states, HMMs have limited ability to describe agent actions in a dynamic environment. Compared to HMMs, MDPs have a better representation of actions and their future effects. The MDP is the framework for solving sequential decision problems: agents select actions sequentially based on states, and each action has an impact on future states. MDPs have been successfully applied in goal and intention recognition [6]. Several modifications based on the MDP framework provide a finer formalization for more complex scenarios. Among these models, the Dec-POMDM (decentralized partially observable Markov decision model) [9] is an MDP-based method focusing on the multiagent goal recognition problem. Though it has all details of cooperation embedded in the team's joint policy, Dec-POMDM only considers actions that start and terminate within one time step. This is usually not applicable in RTS games.
Based on ideas from Dec-POMDM and SMDPs [10], we propose a novel decentralized partially observable Markov decision model with time duration (Dec-POMDM-T) to formalize multiagent cooperative behaviors with durative actions. The Dec-POMDM-T models the joint goal, the actions, and the world states hierarchically. Compared to the works in [9,11], Dec-POMDM-T explicitly models the time duration of primitive actions, indicating whether actions are terminated or not. In Dec-POMDM-T, multiagent joint goal recognition consists of three components: (a) formalization of behaviors, the environment, and the observation for the recognizer; (b) model parameter estimation through learning or other methods; and (c) goal inference from observations:
(a) For the problem formalization, agents' cooperative behaviors are modeled by joint policies, ensuring the model's effectiveness without considering domain-related cooperation mechanisms. Besides, explicit time duration modeling of primitive actions is also implemented.
(b) For the parameter estimation, under the assumption of agents' rationality, many algorithms for Dec-POMDP can be exploited for accurate or approximate policy estimation, making a training dataset unnecessary. This paper uses a model-free algorithm named cooperative colearning based on Sarsa [12] for policy learning.
(c) For the goal inference, a modified particle filtering method is exploited because of its advantages in solving goal recognition problems with different sorts of noise, partially missing data, and unknown action durations.
Like the modified predator-prey problem presented in [9], the scenario in this paper also has more than one prey and more than one predator. The predators first establish a joint pursuit target, or goal, which may be changed halfway before it is captured. The model and its inference method are used to recognize the real goal behind agents' cooperative behaviors, which are observed as partial traces with additional noise. Based on this scenario, we retrieve agents' optimal policies using a model-free multiagent reinforcement learning (MARL) algorithm. After that, we run a simulation model in which agents select actions according to the policies and generate a dataset consisting of 100 labeled traces. With this dataset, statistical metrics including precision, recall, and F-measure are computed for Dec-POMDM-T and for other Dec-MDP-based methods. Experiments show that Dec-POMDM-T outperforms the others in all three metrics. Besides, recognition results of two traces are analyzed, showing that Dec-POMDM-T is also quite robust when joint goals change dynamically during the recognition process. The paper also analyzes the estimation variance and time efficiency of our modified particle filter algorithm and thus shows its effectiveness in practice.
The rest of the paper is organized as follows. Section 2 introduces related works. Section 3 analyzes the moving process in RTS games and presents the formal definition of Dec-POMDM-T as well as its DBN structure. Based on that, Section 4 introduces the way to use the modified particle filter algorithm in multiagent joint goal inference. After that, experiment scenarios, parameter settings, and results are shown in Section 5. Finally, the paper draws conclusions and discusses future works in Section 6.

Related Works
As an interdisciplinary research hotspot covering psychology and artificial intelligence, the problem of goal recognition or intention recognition has been approached in many different ways. In early days, the formalization of the goal recognition problem was usually related to the construction of a plan library, with the recognition process based on logical consistency matching between observations and the plan library. After that, the well-known Probabilistic Graphical Models (PGMs) [13] family, including MDPs [6], HMMs [3], and CRFs [5], was proposed as a more compact graph-based representation approach. Additionally, PGMs have an advantage in modeling the uncertainty and dynamics both of environments and of the agent itself, which is not possible in the above consistency-based methods. Among PGMs, several modifications including hierarchical graph model structures [14][15][16] and explicit modeling of action duration [17,18] have also been proposed. Although probabilistic methods have their advantage in uncertainty modeling, they still cannot represent and process structural or relational data. Statistical relational learning (SRL) [19] is a relatively new theory applied in intention recognition, including logical HMMs (LHMMs) [20], Markov logic networks (MLNs) [21], and Bayesian logic programs (BLPs) [22]. It combines relational representation, first-order logic, probabilistic inference, and machine learning. Besides, several other methods based on probabilistic grammars have been proposed following the discovery of the similarity between natural language processing (NLP) and intention recognition [23]. Most recently, deep learning and other intelligent algorithms for retrieving an agent's decision model have also been applied in intention recognition [24]. Other lines of work like goal recognition design (GRD) [25,26] try to solve the same problem from different aspects.

Goal Recognition with Action Duration Modeling.
There is a group of models in PGMs, like HMM-/MDP-based models, that are closely related to the Markov property. The property assumes that future states depend only on the current state. Generally speaking, the Markov property enables reasoning and computation with the model that would otherwise be intractable. Though it is desirable for models to exhibit the Markov property, it does not always hold in real goal recognition scenarios, and its violation causes serious performance degradation such as lower precision, longer convergence time, and even wrong predictions. One main cause of Markov property violation is agents having durative primitive actions. Typically there are two approaches to solving this problem. One is forming hierarchical structures. Fine et al. [14] proposed the Hierarchical HMM (HHMM) in 1998. Bui et al. [3] used abstract hidden Markov models (AHMM) for hierarchical goal recognition based on abstract Markov policies (AMPs). A problem of the AHMM is that it does not allow the top-level policy to be interrupted when the subplan is not completed. Saria and Mahadevan [27] extended the work by Bui to multiagent goal recognition. Similar modifications include works like the Layered HMM (LHMM) [15], Dynamic CRF (DCRF) [28], and Hierarchical CRF (HCRF) [16].
The other kind of approach to tackling the non-Markov property is explicit modeling of action duration. Hladky and Bulitko [17] applied the hidden semi-Markov model (HSMM) to opponent position estimation in the first-person shooter (FPS) game Counter-Strike. Duong et al. [18] proposed a Coxian hidden semi-Markov model (CxHSMM) for recognizing human activities of daily living (ADL). The CxHSMM modifies the HMM in two aspects: on one hand, it is a special DBN representation of a two-layer HMM with termination variables; on the other hand, it uses a Coxian distribution to model the duration of primitive actions explicitly. Besides, Yue et al. [9] proposed an SMDM (semi-Markov decision model) based on the AHMM, which not only has a hierarchical structure but also models the time duration. Similar methods also include the Semi-Markov CRF (SMCRF) [29] and Hierarchical Semi-Markov CRF (HSCRF) [30].

Multiagent Goal Recognition Based on MDP Framework.
As noted above, the MDP is the framework for solving sequential decision problems. Baker et al. [6] proposed a computational framework based on Bayesian inverse planning for recognizing mental states such as goals. They assumed that the agent is rational: actions are selected based on an optimal or approximately optimal value function, given the beliefs about the world, and the posterior distribution of goals is computed by Bayesian inference. Ullman et al. [31] also successfully applied this theory to more complex social goals, such as helping and hindering, where an agent's goals depend on the goals of other agents. In the military domain, Riordan et al. [32] borrowed Baker's idea and applied Bayesian inverse planning to infer intents in multi-Unmanned Aerial Systems (UASs). Ramírez and Geffner [11] extended Baker's work by applying the goal-POMDP to formalize the problem. Compared to the MDP, the POMDP explicitly models the relation between the real world state and the observation of the agent. Compared to the POMDP, the I-POMDP defines an interactive state space, which combines the traditional physical state space with explicit models of other agents sharing the environment in order to predict their behavior. Ramírez and Geffner also solved the inference problem even when observations are incomplete. Besides, Yue et al. [9] proposed the Dec-POMDM model based on Dec-POMDP for multiagent goal recognition. That model, however, does not consider situations in which agents have durative actions, as in RTS games. The above modifications based on the MDP framework, like SMDPs, POMDPs, and Dec-POMDPs, all provide a finer formalization for more complex scenarios.

The Model
We propose the Dec-POMDM-T for formalizing the world states, behaviors, goals, and action durations in the goal recognition problem. In this section, we first introduce how agents do path planning and move between adjacent grids in RTS games. Then, the formal definition of the Dec-POMDM-T and the relations among variables in the model are explained by a DBN representation. Based on that, the planning algorithm for finding the optimal policies is given.

Agent Maneuvering in RTS Games
Agents' maneuvering in RTS games usually consists of two processes: one is path planning, given the starting point and destination beforehand; the other is moving from the current position to an adjacent grid.

Path Planning.
Like many classical planning problems, path planning generates several courses of action given starting points and destinations; here each course of action is specifically a sequence of positions. In dynamic environments, however, the effects of actions are uncertain. Besides, agent maneuvering is essentially a sequential decision problem, in which agents select actions according to current states and destinations. Further, in multiagent cooperative behaviors, path planning also needs to follow the joint policy shared among the agent group. Thus a probabilistic Markov decision model is needed.

Moving between Adjacent Grids.
After obtaining the next position or grid from the path planning algorithm, agents need to move to it from their original position. In real situations, one moving action usually lasts for several steps before the agent arrives at the target position. This breaks the Markov property and thus turns the agent decision process into a semi-Markov one.
As in Figure 1, which is originally shown in [33], assume that an agent is at point $A$ in grid C2 and wants to go to point $B$ in grid A3. At the path planning level, the agent needs to choose a grid among the five adjacent grids (B1, B2, B3, C1, and C3). In this example, the agent decides to go to grid B2. At the moving level, the agent moves along the line from point $A$ to point $C$, which is the center of grid B2. Because a simulation step is a short time, the agent computes how long it will take to reach point $C$ at the current speed. Because the position of the agent is a continuous variable, it is very unlikely that the agent reaches the grid center exactly when a simulation step ends. Thus, the duration of moving is usually computed by
$$\text{duration} = \left\lfloor \frac{\lVert \text{position}_C - \text{position}_A \rVert}{\text{speed} \times t_{\text{step}}} \right\rfloor,$$
where speed is a constant in the moving process, $t_{\text{step}}$ is the time of a simulation step, and $\lVert \text{position}_C - \text{position}_A \rVert$ is the distance between point $A$ and point $C$. The duration is computed by a floor operator. In this case, duration = 3. After moving for 3 steps from position $A$, the agent reaches position $C$ and chooses the next grid. This moving process is not interrupted unless the intention changes.
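The floor-based duration rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `move_duration` and the tuple representation of positions are hypothetical and not taken from the paper.

```python
import math


def move_duration(start, target, speed, t_step):
    """Number of simulation steps a move from start to target lasts.

    start, target: (x, y) positions (hypothetical representation).
    speed: constant moving speed, in distance units per unit time.
    t_step: length of one simulation step, in the same time unit.
    """
    distance = math.dist(start, target)  # Euclidean distance
    # Floor operator, as described in the text.
    return math.floor(distance / (speed * t_step))
```

With a distance of 3.2 units, unit speed, and unit step length, the function returns 3, which has the same form as the duration = 3 case described above.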

Formalization.
In the standard definition of Dec-POMDP, there is no concept of intention or joint intention. The Dec-POMDP defines states that consist of all information needed for making decisions. When formalizing a model for goal recognition, the original definition of states should be further decomposed into inner and external states, corresponding to agents' intentions and the outside environment, respectively. Thus the action selection is determined by all inner and external states. Besides, in multiagent goal recognition for cooperative behaviors, inner states can be further extended to joint intentions or goals. Our Dec-POMDM-T should also cover situations in which the joint goal is terminated because of goal achievement or halfway interruption. Thus, the Dec-POMDM-T is a combination of four parts: (a) the classic Dec-POMDP; (b) an observation model for the recognizer; (c) the joint goal with its termination indicator; and (d) time duration variables for actions with action termination variables.
Figure 2 shows the subnetwork for the joint goal in cooperative missions. As shown in Figure 2(a), the full dependency of the joint goal $g_{t+1}$ includes no more than the original goal $g_t$, the goal termination variable $e_t$, and the current state $s_t$ at time $t$. When $e_t$ takes the value 0 at time $t$, showing that the joint intention is not terminated, $g_{t+1}$ remains the same as $g_t$. If $e_t$ takes the value 1, agents select another joint goal according to the goal selection function, which gives $p(\text{goal} \mid s)$. In our modified predator-prey scenario, this means that the predator team changes its joint target in consideration of its inner and outer situations.
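The joint goal transition described for Figure 2 can be illustrated with a short Python sketch. The names `next_joint_goal` and `goal_selection` are hypothetical, and representing the goal selection function as a callable that returns a dictionary of goal probabilities is an assumption made only for this example.

```python
import random


def next_joint_goal(goal, terminated, state, goal_selection, rng=random):
    """Sample the joint goal at time t+1.

    goal: joint goal at time t.
    terminated: goal termination variable at time t (0 or 1).
    state: world state at time t.
    goal_selection: hypothetical callable mapping a state to a dict
        {goal: probability}, standing in for p(goal | s).
    """
    if not terminated:
        # Termination variable is 0: the joint goal is kept.
        return goal
    # Termination variable is 1: select a new joint goal given the state.
    distribution = goal_selection(state)
    goals = list(distribution)
    weights = [distribution[g] for g in goals]
    return rng.choices(goals, weights=weights, k=1)[0]
```

For instance, `next_joint_goal("Prey A", 0, state, f)` returns `"Prey A"` for any `f`, while a termination value of 1 draws a goal from `f(state)`.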
Similarly, we depict the subnetwork for action taking by different agents in Figure 3. As shown in Figure 3(a), action selection for agent $i$ at time $t+1$ is always determined by the previously executed action $a^i_t$, the action termination indicator at time $t$, the observation $o^i_{t+1}$, and the joint goal $g_{t+1}$ at time $t+1$. Different situations are described in Figure 3. The full DBN structure of Dec-POMDM-T in two time slices is presented in Figure 5.
For simplicity and clarity, a snapshot of only agent $i$ in two time slices is presented in Figure 5, with its activities depicted using a dashed frame in both slices. Detailed relationships among variables have already been explained in Figure 2 to Figure 5. Agents have no knowledge about each other and make their decisions based on individual observations. The DBN structure of the Dec-POMDM-T is clearly much more complex than those of previous works in [3,9,33]. Compared to goal or plan recognition models with hierarchical structures like AHMM [3] and SMDM [33], the Dec-POMDM-T implicitly represents task decomposition and mission allocation in joint policies. Compared to models [9] based on Dec-POMDP, the Dec-POMDM-T explicitly models the time duration of primitive actions.

Inference
Recognizing the multiagent joint goal is an inference problem that tries to find the real joint goal behind agent actions based on observations online. Essentially, this process computes the distribution of the joint goal $g_t$ given the observations $y_{0:t}$, which is $p(g_t \mid y_{0:t})$. It can be achieved either by exact inference methods or by approximate ones. As the previous section has shown the complexity of the Dec-POMDM-T's DBN structure, exact inference of $p(g_t \mid y_{0:t})$ would be quite time consuming and thus impractical in many RTS games. Besides, exact inference requires nearly perfect observations, which is also impossible in RTS games that permit only partially observable data through mechanisms such as fog of war.
Traditional methods like the Kalman filter and the HMM filter usually rely on various assumptions to ensure mathematical tractability. However, data in multiagent goal recognition involve elements of non-Gaussianity, high dimensionality, and nonlinearity and thus preclude analytic solutions. As a widely applied method in sequential state estimation, the particle filter (PF) is a kind of sequential Bayesian filter based on Monte Carlo simulations [35]. Unlike methods such as the extended Kalman filter and grid-based filters, the PF is very flexible, easy to implement, and applicable in very general settings. Besides, the PF has no restriction on the types of system noise.
The working mechanism of the classic particle filter is as follows. The state space is partitioned into many parts, which are filled with particles according to the prior distribution of states. The higher the probability or weight is, the more densely the particles are concentrated. All particles evolve over time according to the state transitions, reflecting the evolution of the state estimate. The weights of the particles are then updated and normalized. Further, particles are resampled after a certain period as a countermeasure against particle degeneracy. The above description is a standard SIS (Sequential Importance Sampling) particle filter with resampling, consisting of four steps: initialization, importance sampling, weight update, and particle resampling. The essence of the PF is to empirically represent a posterior distribution or density using a weighted sum of $N_p$ samples:
$$\hat{p}(x_t \mid y_{0:t}) = \sum_{i=1}^{N_p} w_t^{(i)} \, \delta\big(x_t - x_t^{(i)}\big),$$
where the $x_t^{(i)}$ are assumed to be i.i.d. draws from an importance distribution $q(x_t \mid y_{0:t})$. When $N_p$ is large enough, $\hat{p}(x_t \mid y_{0:t})$ approximates the true posterior distribution $p(x_t \mid y_{0:t})$. The importance weights $w_t^{(i)}$ can be updated recursively:
$$w_t^{(i)} \propto w_{t-1}^{(i)} \, \frac{p\big(y_t \mid x_t^{(i)}\big)\, p\big(x_t^{(i)} \mid x_{t-1}^{(i)}\big)}{q\big(x_t^{(i)} \mid x_{0:t-1}^{(i)}, y_{0:t}\big)}.$$
When the PF is applied in multiagent goal recognition under the framework of Dec-POMDM-T, the set of particles is defined as $\{x_t^{(i)}\}_{i=1:N_p}$, where each particle $x_t^{(i)}$ is a tuple of the hidden variables of the model, $N_p$ is the number of particles, and the weight of the $i$th particle is $w_t^{(i)}$. As we use the simplest sampling, $q(x_t^{(i)} \mid x_{0:t-1}^{(i)}, y_{0:t})$ is set to be $p(x_t^{(i)} \mid x_{t-1}^{(i)})$. And as the observation $y_t$ only depends on $x_t$, the importance weight $w_t^{(i)}$ can be updated by
$$w_t^{(i)} \propto w_{t-1}^{(i)} \, p\big(y_t \mid x_t^{(i)}\big).$$
The detailed procedure of multiagent goal recognition under the framework of the Dec-POMDM-T is given in Algorithm 1.
All four classic components of the SIS PF with resampling are present in Algorithm 1: particle initialization from line (2) to line (4), sequential importance sampling from line (6) to line (25), weight updating and normalizing from line (26) to line (29), and particle resampling in line (30). In the importance sampling stage, the joint goal $g_t^{(i)}$ of particle $i$ is sampled in line (10) from its conditional distribution given the particle's previous variables, the agents' joint observation $\vec{o}_t^{(i)}$ is sampled in line (12) given the previous state $s_{t-1}^{(i)}$, and the joint goal termination variable $e_t^{(i)}$ is sampled in line (13). The time duration of each agent's action is updated in line (17). Action changes are sampled in line (19), the time duration of a newly selected action is computed in line (20), and the action termination is sampled in line (22). Each agent then performs its action and changes the state accordingly. In the resampling process, the algorithm first calculates the effective sample size
$$N_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{N_p} \big(w_t^{(i)}\big)^2}.$$
The resampling process returns if $N_{\mathrm{eff}} > N_{th}$, where $N_{th}$ is a predefined threshold which could be $N_p/3$ or $N_p/2$; otherwise it generates a new particle set by resampling with replacement $N_p$ times from the previous set with probabilities $w_t^{(i)}$ and then resets the weights to $1/N_p$.
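The weight update and the threshold-triggered resampling step can be sketched as follows. This is a generic illustration of an SIS particle filter with the transition prior as proposal, not a reproduction of Algorithm 1: the function names are hypothetical, particles are treated as opaque objects, and the observation likelihoods are assumed to be supplied by the caller.

```python
import random


def update_weights(weights, likelihoods):
    """Multiply each weight by the observation likelihood and normalize."""
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # No particle explains the observation: the filter has failed.
        raise ValueError("all particles have zero weight")
    return [u / total for u in unnormalized]


def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def resample_if_needed(particles, weights, threshold, rng=random):
    """Resample with replacement only when N_eff is not above threshold."""
    total = sum(weights)
    weights = [w / total for w in weights]
    if effective_sample_size(weights) > threshold:
        return particles, weights
    n = len(particles)
    # Mutable particles would need copying here to stay independent.
    new_particles = rng.choices(particles, weights=weights, k=n)
    return new_particles, [1.0 / n] * n
```

A typical call passes `threshold=len(particles) / 3`, matching the one-third setting mentioned in the text. The explicit error on an all-zero weight vector makes the failure case visible instead of silently dividing by zero.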

The Modified Predator-Prey Problem.
In this paper, a modified predator-prey problem [9] is used. Compared to the classic one, the modified one has more than one prey for more than one predator to catch. This gives the test bed for evaluating our multiagent goal recognition algorithm based on Dec-POMDM-T. Our aim is to recognize the real target of the predators based on noisy observations. Figure 6 shows the 5 m × 5 m map and the predator's observation model in the modified predator-prey problem. There are two predators and two preys on the map, denoted by red triangles and blue diamonds, respectively. Predators establish a joint goal by choosing one of the preys and work cooperatively to capture it. The predator's observation model is also illustrated in Figure 6. Agents using tactical sensors in RTS games usually have noisy and partial observations. They know exactly what is happening nearby, but the information quality drops as the distance gets larger. This degradation is simply modeled by the red circle with its radius set to 2 m in Figure 6. Further, we use several vertical and horizontal lines to separate the directions into N, NE, E, SE, S, SW, W, and NW. The directions inside the circle are denoted by "direction 1," while those outside are denoted by "direction 2." The example in our 5 m × 5 m map is thus as follows. According to Predator A's observation, Prey B is close to it and located in SE 1, while Prey A and Predator B are located in NW 2 and S 2, respectively. Predator B, however, has a clear sight of Prey B in the near northeast, NE 1, while Prey A and Predator A are both relatively far away, in NW 2 and N 2.
All agents can move in four directions (north, east, south, and west) or stay at the current position. Rules are set to prevent agents from moving out of the map. The joint goal is achieved when both predators are within 0.5 m of their target. The predators' target, or joint goal, can be changed halfway. In the observation model for the recognizer, it obtains the exact positions of preys but noisy observations of predators. Our purpose is to compute the posterior distribution of the predators' joint goal using observation traces. Some important definitions in Dec-POMDM-T under this scenario are as follows.
(i) $D$: the two predators; (ii) $S$: the positions of predators and preys; (iii) $A$: five actions for predators, moving in 4 directions and staying still; (iv) $G$: Prey A or Prey B; (v) $\Omega$: the directions of agents far away and exact positions of agents nearby; (vi) $Y$: the real positions of preys and noisy positions of predators; (vii) $h$: planning horizon.
Algorithm 1 (input): particle number $N_p$, agent team size, and resampling threshold $N_{th}$.

Experiment Settings.
In this section, we provide parameter settings in scenarios, policy learning, and goal inference algorithm.

Scenario.
Preys have no decision-making ability. They sense nothing and select among the five actions at random. The initial positions of agents are randomly generated. The initial goal distribution is set to $p(g_0 = \text{Prey A}) = 0.6$ and $p(g_0 = \text{Prey B}) = 0.4$. As the map is 5 m × 5 m, we set the moving speed to 0.5 m/step.
The goal termination function is simplified in the following way. If predators capture their target, then the goal is achieved; otherwise the predator team changes its joint goal with a probability of 0.05 at every time step.
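The simplified goal termination rule, combined with the 0.5 m capture condition of the scenario, can be sketched as below. The function and constant names are hypothetical, and positions are assumed to be (x, y) tuples in meters.

```python
import math
import random

CAPTURE_RADIUS = 0.5       # meters, capture condition from the scenario
CHANGE_PROBABILITY = 0.05  # per-step probability of changing the goal


def goal_achieved(predators, target):
    """True when every predator is within the capture radius of the target."""
    return all(math.dist(p, target) < CAPTURE_RADIUS for p in predators)


def sample_goal_termination(predators, target, rng=random):
    """Return 1 if the joint goal terminates at this step, else 0."""
    if goal_achieved(predators, target):
        return 1  # terminated by achievement
    # Otherwise the team abandons the goal with a small probability.
    return 1 if rng.random() < CHANGE_PROBABILITY else 0
```

Passing an explicit `rng` (for example `random.Random(seed)`) keeps simulated traces reproducible.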
At each time step, the recognizer observes each predator's true position with probability 0.5 and a noisy position otherwise. The noisy position is determined by the vibration strength of the observation noise and by Directions, the set of its 8 possible directions (ending with $[1, 1]$).
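One way to simulate such a recognizer observation is sketched below. Several details are assumptions made only for illustration and are not stated in the text: the noise is taken to be additive, the direction is drawn uniformly, the 8 directions are taken to be the unit offsets from $[-1, -1]$ to $[1, 1]$ excluding $[0, 0]$, and the strength parameter `sigma` has no value given in the source.

```python
import random

# Assumed set of 8 directions (every unit offset except no movement).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def observe_predator(position, sigma, rng=random):
    """Recognizer's observation of one predator position.

    position: true (x, y) position.
    sigma: vibration strength of the noise (hypothetical parameter).
    """
    if rng.random() < 0.5:
        return position  # true position with probability 0.5
    dx, dy = rng.choice(DIRECTIONS)  # assumed uniform over directions
    return (position[0] + sigma * dx, position[1] + sigma * dy)
```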

Policy Learning.
Under the assumption of agents' rationality, the paper applies a model-free MARL algorithm, named cooperative colearning based on Sarsa, to learn the agents' optimal policy. The core idea of the algorithm is to choose at each step a subgroup of agents and update their policies to optimize the task, given that the rest of the agents have fixed plans; then, after a number of iterations, the joint policies can converge to a Nash equilibrium.
The discount factor $\gamma$ is set to 0.8. The predator selects an action $a^i$ given the observation $o^i$ with probability
$$p(a^i \mid o^i) = \frac{\exp\big(Q(o^i, a^i)/\tau\big)}{\sum_{a'} \exp\big(Q(o^i, a')/\tau\big)},$$
where $\tau = 0.1$ is the Boltzmann temperature. We set $\tau > 0$ as a constant, which means that predators always select approximately optimal actions. In our scenarios, the Q-value converges after 750 iterations. In the learning process, if predators cannot achieve their goal in 5000 steps, the process is reset.
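Boltzmann action selection and a tabular Sarsa update can be sketched as follows. This is the textbook form of both techniques rather than the paper's cooperative colearning procedure; the function names, the dictionary-based Q-table, and the learning rate `alpha` are assumptions, while the temperature 0.1 and discount factor 0.8 are the values quoted in the text.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature=0.1):
    """Softmax over Q-values; q_values maps action -> Q(o, a)."""
    max_q = max(q_values.values())  # subtract the max for numerical stability
    exps = {a: math.exp((q - max_q) / temperature) for a, q in q_values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def select_action(q_values, temperature=0.1, rng=random):
    """Sample an action from the Boltzmann distribution."""
    probs = boltzmann_probabilities(q_values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


def sarsa_update(q, obs, action, reward, next_obs, next_action,
                 alpha=0.1, gamma=0.8):
    """Standard Sarsa update on a dict keyed by (observation, action).

    alpha is a hypothetical learning rate; gamma is the discount factor.
    """
    current = q.get((obs, action), 0.0)
    target = reward + gamma * q.get((next_obs, next_action), 0.0)
    q[(obs, action)] = current + alpha * (target - current)
    return q[(obs, action)]
```

With a temperature of 0.1, a Q-value gap of 1.0 between two actions already gives the better action a probability above 0.99, which illustrates why the learned predators behave almost greedily.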

Goal Inference.
In our multiagent joint goal inference algorithm based on SIS PF with resampling, we set the particle number $N_p$ according to experiment needs. We also set the resampling threshold $N_{th}$ to one-third of the particle number $N_p$.

Experiment Results and Discussion
The paper first retrieves the agents' optimal policies using the MARL algorithm.
Based on that, we run the agent decision model repeatedly and collect a test dataset consisting of 100 labeled traces. After analyzing the dataset, we find that there are on average 28.05 steps in one trace and that the number of steps in one trace varies from 16 to 48, with a standard deviation of 9.24. We also find that among the 100 traces, there are approximately 60% in which predators changed their joint goal at least once halfway, 27% in which goals were changed at least twice, and 15% in which goals were changed at least three times. These statistics cover almost all situations we need for validating our method.
Based on the test dataset, we conducted experiments on three aspects: (a) to discuss details of the multiagent goal recognition, present and analyze results of two specific traces, and demonstrate the ability of our method to recognize dynamically changing goals; (b) to compare the performance of joint goal recognition under the Dec-POMDM-T framework with that of Dec-POMDM [9] in terms of precision, recall, and F-measure; (c) to show the effectiveness of our multiagent goal inference method based on SIS PF with resampling.

Goal Recognition of Specific Traces.
To show the details of the recognition results, we select two specific traces from the dataset (Trace Number 1 and Number 13). These two traces are selected because Trace Number 1 is the first trace in which the goal is changed before it is achieved, while Number 13 is the first trace in which the goal is kept until it is finally achieved. The detailed information is shown in Table 1.
Given the optimal policies and other parameters of the Dec-POMDM-T including $D$, $S$, $A$, $G$, $\Omega$, $Y$, and $h$, we used the […] In Trace Number 13, predators selected Prey B as their initial goal. The goal was kept until it was achieved at $t = 22$. From Figure 7(b) we can see that our method reacted very fast to observation information, and the probability of Prey B as the joint goal rose directly from no more than 0.4 towards 0.9 at $t = 3$. This high confidence continued and stayed at almost 1 throughout the rest of the recognition process. Besides, Figure 7(b) shows the algorithm's ability to reach an early convergence point in multiagent joint goal recognition.

Comparison of the Dec-POMDM-T and Dec-POMDM.
As stated above, the performance comparisons are made in terms of three classic metrics in the goal recognition domain: precision, recall, and F-measure [36]. They are computed as
$$\text{precision} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TT_i}, \qquad \text{recall} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TI_i}, \qquad F\text{-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \tag{8}$$
where $N$ is the number of possible goals. $TP_i$, $TI_i$, and $TT_i$ are the true positives, total of true labels, and total of inferred labels for class $i$, respectively. Formulas (8) show that precision measures the reliability of the recognized results, recall measures the efficiency of the algorithm applied to the test dataset, and F-measure is an integration of precision and recall. The values of all these metrics lie between 0 and 1, and a higher value means better performance. In order to handle traces of different lengths, the paper defines a positive integer $k$ ($k = 1, 2, \ldots, 5$) indexing the portion of each trace that has been observed. It is obvious in Figure 8 that the performance of Dec-POMDM-T was much better than that of Dec-POMDM when more observations were received. Specifically, all three metrics of the Dec-POMDM-T exceeded 0.75 once more than half of each trace had been observed, at $k \geq 3$. The Dec-POMDM, however, did not perform that well in any of the three. This is mainly because Dec-POMDM has no definition of action durations. As predators do not select actions in every time step, the filtering process of Dec-POMDM usually fails.
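The three metrics can be computed from paired lists of true and inferred goal labels, as in the following sketch. The function name `macro_metrics` is hypothetical, and two details are assumptions: a class with no inferred (or no true) labels contributes 0 to the average, and the F-measure is computed from the class-averaged precision and recall.

```python
def macro_metrics(true_labels, inferred_labels, goals):
    """Class-averaged precision, recall, and F-measure.

    true_labels, inferred_labels: equal-length sequences of goal labels.
    goals: the list of possible goals (its length is N in the text).
    """
    precisions, recalls = [], []
    pairs = list(zip(true_labels, inferred_labels))
    for g in goals:
        tp = sum(1 for t, p in pairs if t == g and p == g)  # true positives
        total_true = sum(1 for t, _ in pairs if t == g)
        total_inferred = sum(1 for _, p in pairs if p == g)
        precisions.append(tp / total_inferred if total_inferred else 0.0)
        recalls.append(tp / total_true if total_true else 0.0)
    precision = sum(precisions) / len(goals)
    recall = sum(recalls) / len(goals)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

For example, with true labels `["A", "A", "B", "B"]` and inferred labels `["A", "B", "B", "B"]`, class A has precision 1 and recall 1/2, class B has precision 2/3 and recall 1, so the averaged precision is 5/6 and the averaged recall is 3/4.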

Effectiveness of Multiagent Goal Inference Based on SIS PF with Resampling.
In this section, we test the effectiveness of our multiagent goal inference based on SIS PF with resampling. In Figure 9, we first give the changing patterns of the variances for the two specific traces mentioned above. The weighted variances at time $t$ are computed from the particle set, where $w_t^i$ is the weight of particle $x_t^i$ and $\hat{g}$ is the estimated goal distribution. From Figure 9, it is obvious that the variances of both traces had large values at the beginning and that they were affected by noisy observations or observations containing vague information. They then dropped as more information came in. The variance for Trace Number 13 in Figure 9(b) dropped continually along the recognition process, with several small ups and downs for the reasons above. A similar situation occurred for Trace Number 1 in Figure 9(a). However, its variance rose dramatically when agents changed their joint goal halfway. This happened at $t = 23$, as shown in Table 1, and pushed the variance up to more than 0.4. Finally, the curve dropped quickly to less than 0.05 within 3 time steps, and the estimated goal changed from Prey B to Prey A.
We also conduct experiments on the variance using the goal inference algorithm with different particle numbers. The red and blue lines differ in that the former uses 4000 particles while the latter uses 8000. The results show that the variance is not sensitive to the particle number of the PF algorithm, which can perform well with relatively few particles.
As is common in PF algorithms, particles may not survive until the end of the goal recognition process if there are too few of them. In this scenario, when the number of particles is at most 1000, the goal inference algorithm may suffer from serious failure. To examine this effect, we ran the test dataset 10 times with different numbers of particles. The average failure rates are shown in Figure 10(a) and summarized in Table 2. Two more rates, for 4000 and 6000 particles, are also given in Table 2. The average failure rate drops significantly as the number of particles grows.
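To make the failure mode above concrete, here is a minimal Python sketch of one SIS update with resampling. It treats a step as failed when no particle is consistent with the new observation, so that all weights become zero. This reading of "failure", the effective-sample-size trigger, the multinomial resampling scheme, and the callback names `transition` and `likelihood` are assumptions for illustration; the paper's own resampling rule is not shown in this section.

```python
import random
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")  # particle type (hypothetical; any state representation)


def sis_step(particles: List[P],
             weights: List[float],
             transition: Callable[[P, random.Random], P],
             likelihood: Callable[[object, P], float],
             observation: object,
             rng: random.Random,
             ess_threshold: float = 0.5) -> Tuple[List[P], List[float], bool]:
    """One SIS update with resampling; returns (particles, weights, failed)."""
    # 1. Propagate every particle through the (assumed) transition model.
    particles = [transition(p, rng) for p in particles]
    # 2. Reweight by how well each particle explains the observation.
    weights = [w * likelihood(observation, p)
               for w, p in zip(weights, particles)]
    total = sum(weights)
    if total == 0.0:
        # No particle survives: the filter has failed on this trace.
        return particles, weights, True
    weights = [w / total for w in weights]
    # 3. Resample when the effective sample size drops too low.
    ess = 1.0 / sum(w * w for w in weights)
    if ess < ess_threshold * len(particles):
        particles = rng.choices(particles, weights=weights, k=len(particles))
        weights = [1.0 / len(particles)] * len(particles)
    return particles, weights, False


# Toy usage: particles are goal indices and the observation reveals the goal.
rng = random.Random(0)
stay = lambda p, _rng: p
match = lambda obs, p: 1.0 if p == obs else 0.0
ps, ws, failed = sis_step([0, 1, 1, 1], [0.25] * 4, stay, match, 0, rng)
_, _, failed_empty = sis_step([1, 1], [0.5, 0.5], stay, match, 0, rng)
```

With a small particle population, the zero-weight branch is reached more often, which is consistent with the higher failure rates reported for small particle numbers.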
The time cost with different particle numbers is shown in Figure 10(b). The program was written as a Matlab script and run on a computer with an Intel Core i7-4770 CPU (3.40 GHz). The time cost increases as the particle population grows. Considering that an agent's intention persists for a considerably long time, this approximate inference method would still be applicable under certain combinations of parameter settings. Further, we also compare the precision, recall, and F-measure under different numbers of particles in Figure 11.
In Figure 11, the red, blue, cyan, green, and magenta dashed curves indicate the metrics of SIS PF with resampling with 1000, 2000, 4000, 6000, and 16000 particles, respectively. The PF with the largest particle number, with all metrics reaching almost 0.9 at the end, performed better than the others. Filters with 4000 and 6000 particles had similar trends along the process and came even closer at the end, while filters with 1000 and 2000 particles performed the worst because they were short of particles.

Conclusions
In this paper, we propose a novel model for solving multiagent goal recognition problems, the Dec-POMDM-T, and present its corresponding learning and inference algorithms. First, we use the Dec-POMDM-T to model the general multiagent goal recognition problem. The Dec-POMDM-T represents the agents' cooperative behaviors in a compact way, so the cooperation details are unnecessary in the modeling process. It can also make use of existing algorithms for solving the Dec-POMDP problem. Then we use the SIS particle filter with resampling to infer goals under the framework of the Dec-POMDM-T. Last, we design a modified predator-prey problem to test our method. In this modified problem, there are multiple possible joint goals, and agents may change their goals before they are achieved. Experiment results show that (a) the Dec-POMDM-T works effectively in multiagent goal recognition and adapts well to dynamically changing goals within the agent group; (b) the Dec-POMDM-T outperforms traditional Dec-MDP-based methods in terms of precision, recall, and F-measure. In the future, we plan to apply the Dec-POMDM-T in more complex scenarios.

Figure 1: An example of maneuvering on a grid map.

Figure 3: Subnetwork for action taking by different agents.

Figure 4: Subnetwork for action time duration.

Figure 5: The DBN structure of the Dec-POMDM-T.

Figure 6: The 5 m × 5 m map and predators' observation model in predator-prey problem.

Figure 7: Recognition results of two specific traces under the Dec-POMDM-T. Panel title: Recognition results of Trace Number 13.

Figure 9, panel (b): The variance of Trace Number 13.

Figure 10: The average failure rate and time cost with different numbers of particles.

Figure 11: Three metrics of the recognition results with different particle numbers.

Table 1: The details of two traces.
Here the observation sequence for a given $k$ runs from time 1 to time $\lceil k \cdot \mathrm{length}_j / 5 \rceil$ of the $j$th trace, where $\mathrm{length}_j$ is the length of the $j$th trace. The metrics under different $k$ show the models' performance in different simulation phases.
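The truncation rule above can be sketched in a few lines of Python. The function name `prefix_lengths` and the `phases` parameter are hypothetical; only the ceiling of a fifth-based fraction of the trace length comes from the text.

```python
from typing import List


def prefix_lengths(trace_length: int, phases: int = 5) -> List[int]:
    """Return ceil(k * trace_length / phases) for k = 1, ..., phases."""
    # Integer ceiling division avoids floating-point rounding issues.
    return [-(-k * trace_length // phases) for k in range(1, phases + 1)]


# A hypothetical trace of 23 time steps is evaluated after
# 5, 10, 14, 19, and 23 observations.
cuts = prefix_lengths(23)
```

Each returned value is the last time step included in the observation sequence for that phase, so the final entry always equals the full trace length.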