A Logical Hierarchical Hidden Semi-Markov Model for Team Intention Recognition



Introduction
Intention recognition (IR) aims to identify the specific goals that one or more agents are attempting to achieve [1]. Since the goals are hidden in the agents' minds, they can only be inferred by analyzing the agents' observed actions and/or the changes in the environment resulting from those actions.
Recognizing intentions is important in both real and virtual worlds. For example, in real-time strategy games, AI players can choose more efficient policies if their enemies' actions are known [2]; in the field of public security, we want to identify persons who may behave abnormally and pay more attention to them through monitoring devices [3]. Since the IR problem is common, it has attracted much attention in recent decades.
As a well-known branch of probabilistic graphical models (PGMs), the hidden Markov model (HMM) is widely used for analyzing sequential data in many applications [4]. In the IR domain, actions are discrete and observations are obtained sequentially. Thus, the HMM can also be used to solve IR problems. The main idea is as follows: the hidden states and the observation sequence correspond to the actions and the changes of the world state, respectively, and the intention is represented by the transitions between actions. The intention is then inferred by maximum likelihood estimation: find the intention that reproduces the given observation sequence with the largest probability.
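The maximum likelihood idea above can be sketched as follows. This is a minimal illustration, not the method of any cited work: each candidate intention is assumed to be a small HMM given as `(init, trans, emit)` lists, and the intention names, probabilities, and function names are hypothetical.

```python
def forward_likelihood(init, trans, emit, observations):
    """Return P(observations | HMM) using the forward algorithm."""
    n_states = len(init)
    # alpha[s] = P(o_1..o_t, state_t = s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize_intention(models, observations):
    """Pick the intention whose HMM gives the observations the highest likelihood."""
    scores = {name: forward_likelihood(*params, observations)
              for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Two hypothetical intentions that differ only in how actions follow each other.
models = {
    "patrol": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]]),
    "attack": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], [[0.8, 0.2], [0.2, 0.8]]),
}
best, scores = recognize_intention(models, [0, 1, 0, 1])
```

Because the two models share the same emission table, the alternating observation sequence is explained better by the intention whose actions alternate, which is the sense in which the intention is represented by the transitions between actions.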
Even though the HMM looks suitable for modeling intentional behaviors and observations, several problems arise when it is applied in the IR domain. (a) The HMM makes a strong Markov assumption. In the HMM, the duration of a state implicitly follows a geometric distribution whose parameter is the state self-transition probability. However, in many applications, the duration of hidden states does not follow a geometric distribution. (b) The HMM does not have a hierarchical structure. However, when a mission is complex, it has to be decomposed into subtasks repeatedly until it consists only of primitive actions, and a hierarchical structure is necessary to represent the task decomposition and allocation. (c) The HMM is propositional. There is no concept of class or object in the HMM, which limits its representational power. Additionally, a propositional model is not suitable for handling relations, which are important in IR. (d) Maximum likelihood inference assumes that the intention is static. However, the team intention may be interrupted and changed because of new situations or other reasons.
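To make point (a) concrete, the short derivation below shows why the state duration in an HMM is geometric. The symbols $a_{ii}$ (self-transition probability of state $i$) and $D_i$ (number of consecutive steps spent in state $i$) are notation assumed here for illustration only.

```latex
% Staying in state i for exactly d steps means d-1 self-transitions
% followed by one transition out of state i.
P(D_i = d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \ldots
% The expected duration is the mean of this geometric distribution.
\mathbb{E}[D_i] = \sum_{d=1}^{\infty} d\, a_{ii}^{\,d-1}\,(1 - a_{ii}) = \frac{1}{1 - a_{ii}}
```

Since this probability never increases with $d$, the most likely duration is always a single step, so an HMM cannot represent durations that concentrate around a typical length.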
To solve these problems, researchers have modified the HMM in different ways. For example, the Hidden Semi-Markov Model (HSMM) uses an arbitrary distribution such as the Poisson or Gamma distribution to model the state duration explicitly and sets the self-transition probability to zero [5]. In this way, the HSMM models the state duration more precisely, and it typically outperforms the HMM when the type of the state duration distribution is known. The Hierarchical Hidden Markov Model (HHMM) and the Abstract Hidden Markov Model (AHMM) are two further extensions of the HMM [6,7]. They represent task hierarchies based on the theories of the Hierarchical Finite State Machine and the Abstract Markov Decision Process, respectively. The Coxian Hidden Semi-Markov Model (CxHSMM) combines the ideas of the HSMM and the HHMM [8]. Two important contributions of the CxHSMM are (a) a novel Dynamic Bayesian Network (DBN) structure for modeling human daily activities and (b) the introduction of the discrete Coxian distribution into the behavior modeling domain.
Another type of extension concerns relational data. These models belong to Statistical Relational Learning (SRL), which integrates relational or logical representations and probabilistic reasoning mechanisms with machine learning [9]. For example, Kersting et al. proposed a Logical Hidden Markov Model (LHMM), which combines first-order logic with the HMM [10]. Compared with the HMM, the LHMM can infer complex relations and has fewer parameters. However, it does not relax the Markov assumption, which leads to a performance decline when long-term dependencies between hidden states exist. To solve this problem, Zha et al. proposed a Logical Hidden Semi-Markov Model (LHSMM), which modifies the LHMM in the same way that the HSMM modifies the HMM [11]. Natarajan et al. proposed a Logical Hierarchical Hidden Markov Model (LHHMM) and applied it to recognizing the intentions of an agent in a virtual game world [12]. The LHHMM has a hierarchical architecture and divides the state space into two types: (a) the unobserved user state and (b) the completely observed world state. In addition, the abstract state transitions are conditioned on the world state, which reflects how the world state affects the state of the user.
The extensions of the HMM above focus on recognizing the intentions of a single agent. However, teamwork is common in many scenarios. In this paper, we focus on the problem of team intention recognition. Recognizing the intention of a team is more complex than recognizing that of a single agent because (a) we need to recognize the goal of each agent as well as the team working mode, (b) task decomposition and allocation, which depend on the team working mode, need to be represented hierarchically, (c) the team intention may be interrupted for unknown reasons, and (d) observations are noisy and partially missing.
Research on the LHSMM has shown that (a) logical predicates and the instantiation process can represent the working mode and the composition of the team well and (b) modeling the duration of abstract states yields higher precision and a smoother recognition curve [13]. However, the LHSMM does not provide a hierarchical architecture. The LHHMM is also promising, but Natarajan et al. only consider single-agent scenarios, and the LHHMM may suffer from several problems when it is used in team intention recognition: (a) tasks modeled by the LHHMM are executed from the bottom up, which means that the top goal cannot be interrupted while its subtasks are not completed; (b) the duration of the working mode is not modeled; (c) observations cannot be noisy or partially missing; (d) the chain structure is not appropriate for representing primitive actions [12]. Unlike higher-level tasks, primitive actions usually depend on the world state and the goal, as defined in a Markov decision process (MDP).
To solve the problems of applying existing models to team intention recognition, we propose a framework named the Logical Hierarchical Hidden Semi-Markov Model (LHHSMM). The LHHSMM borrows the ideas of the LHHMM and combines them with the LHSMM as well as the MDP. The LHHSMM has the following advantages: (a) Compared with PGMs such as the HMM and its extensions, the LHHSMM has the advantages of SRL methods. By introducing first-order logic, the LHHSMM can infer complex relations and use logical inference to replace some probability computations. Additionally, the predicates and the instantiation process are well suited to representing the team working mode and changes of team members.
(b) A novel structure named the logical hierarchical Markov chain (LHMC) is proposed to represent the logical transitions and the decomposition of the intention and its subtasks (we call the intention and its subtasks policies). With this structure and its transition conditions, our model inherits the compactness of representing policies hierarchically from the LHHMM and adds a mechanism that forcibly terminates the executing subtasks when the intention changes.
(c) Considering that the team intention may be interrupted for unknown reasons, we use a lognormal distribution to model the duration for which the team working mode is not interrupted, that is, the time remaining before an interruption event happens. With this explicit duration modeling, the current team intention depends not only on the team intention and world state at the previous step but also on the remaining duration. This is another difference between the LHHMM and the LHHSMM and the reason that our model is called semi-Markov.
(d) Primitive actions are selected based on the current policies and the previous world state. This MDP-like process lets the agents choose any primitive action without being constrained by the action executed at the previous time step.
(e) Observation functions are used to represent the probabilistic relations between the world state and the observation. This enables our model to handle noisy and partially missing observations.
To infer the team intentions modeled by the LHHSMM approximately, we provide a Logical Particle Filter (LPF) based on the logical definitions and dependencies of the LHHSMM. In the LPF, we use the simplest importance distribution and a forward sampling method to sample the particles, and the logical transitions and instantiation functions of the LHHSMM are introduced into this process.
We design a combat scenario to validate the LHHSMM and the LPF: two agents move around and attack targets on a grid map. Our methods are used to infer the team working mode and the targets of the agents online from the observed traces of the agents. Based on this scenario, we design a decision model for the agents and generate a dataset consisting of 100 traces. We use three traces from the dataset (one changes intentions; the others do not) to evaluate the LHHSMM and the LPF. Then, three metrics for recognizing the team working mode and the targets of the agents, namely precision, recall, and F-measure, are computed for the LHHSMM and a modified LHHMM, respectively. Last, we compare the performances of LHHSMMs with three different duration distributions to evaluate the effects of explicit duration modeling.
The rest of the paper is organized as follows. Section 2 introduces related work, including extensions of the HMM applied in IR and other research on team intention recognition using PGMs. Section 3 gives the formal definition of the LHHSMM, the dependencies among its variables, and a discussion of how to use an LHMC to represent policies. Section 4 introduces the standard PF briefly and gives the process of inferring intentions approximately with the LPF. Section 5 presents the background, settings, and results of our experiments. Finally, we draw conclusions and discuss future work in Section 6.

Related Works
At the intersection of psychology and artificial intelligence [14], a great number of IR methods have been proposed; these methods can be broadly divided into four categories: consistency matching, probabilistic methods, hybrid methods, and statistical relational learning methods [15].
Logical reasoning based on an event hierarchy [16] and fast and complete symbolic methods [17] are two representative consistency matching approaches. They solve the IR problem by determining which intention is consistent with the observed actions, that is, whether the observed actions match at least one plan achieving the intention. Their main drawback is that they may fail when two or more hypotheses give contradictory explanations of the same observations. In contrast, probabilistic methods provide a probability for every possible intention, and they can make full use of prior knowledge through Bayesian inference [18]. Because of these advantages, several kinds of probabilistic methods have been applied in the IR domain, such as probabilistic grammars [19], PGMs including the HMM [20], and the Conditional Random Field (CRF) [21]. To improve computational efficiency, some scholars proposed hybrid models, such as the hybrid symbolic-probabilistic plan recognizer [22] and the probabilistic hostile agent task tracker [23]. Their primary idea is to compute the likelihood of possible intentions using Bayesian reasoning after scaling down the hypothesis space with logical algorithms. However, these hybrid methods inherit the drawbacks of the underlying consistency matching and probabilistic methods. One key problem of PGMs and their corresponding hybrid methods is that they are propositional, which means that they handle only sequences of unstructured symbols. Thus, SRL was proposed and has developed rapidly in recent years. SRL represents and infers relations, which are important for recognizing intentions in the real world. Therefore, many researchers have made great efforts to apply SRL methods to the IR problem, such as the LHMM [10] and the Markov Logic Network (MLN) [24,25].
In this section, we first review research on applying PGMs to team intention recognition. Then, we analyze some extensions of the HMM that are related to our model.

Team Intention Recognition Using PGMs.
A probabilistic graphical model encodes a complex distribution over a high-dimensional space using a graph-based representation. The graph consists of nodes and edges, which correspond to the variables in the domain and the direct probabilistic interactions between them, respectively [26]. Two types of PGMs are widely applied in team intention recognition: directed ones such as the DBN and the HMM, and undirected ones such as the relational Markov network and the CRF.
Masato et al. [27] introduced the CRF to automatically recognize the composition of teams and team activities in relation to a plan. Mao et al. [28] viewed group plan recognition as inferring the decision-making strategy of the observed agents. They assumed that agents were rational and performed probabilistic reasoning based on the maximum expected utility principle. Their plan representation and inference were done on a Bayesian network. Pfeffer et al. [29] studied the problem of monitoring the goals, team structure, and state of agents in dynamic systems where teams and goals change over time. Using a DBN, they explicitly modeled the coordination and communication of attackers in an asymmetric urban warfare environment. Saria and Mahadevan [30] presented a theoretical framework for online probabilistic IR in cooperative multiagent systems. Their model extended the Abstract Hidden Markov Model (AHMM) and consisted of a hierarchical dynamic Bayesian network that allowed reasoning about the interaction among multiple cooperating agents. Gaitanis [31] modified Saria's model by relaxing the assumption that there is only one team grouping all the agents at a single level of coordination. Although these models solve multiagent IR problems successfully to some extent, they suffer from the drawback of traditional PGMs: they are propositional and cannot make use of relations.
Some SRL methods such as the MLN and the LHMM can also be regarded as special cases of PGMs. Sadilek and Kautz [32] modeled the "capture the flag" domain using an MLN and learned a theory that jointly denoised the data and inferred occurrences of high-level activities. Their research showed that the MLN is promising for solving multiagent IR problems. To compare the performance of the CRF, HMM, and MLN, Auslander et al. [33] used these methods to help commanders detect maritime threats. Their evaluation corpus came from the 2010 Trident Warrior exercise, and they showed that the MLN was better at representing domain knowledge and at learning weight settings from a few training instances. As far as we know, there have been few attempts to apply extensions of the LHMM to team intention recognition, except for our previous research in [13].

Extensions of the HMM Applied in the IR Domain.
Even though the HMM has some advantages in behavior modeling, its strong Markov assumption limits its application in many areas. Thus, some research has been done to extend the HMM in the IR domain.
A hidden semi-Markov model (HSMM) is the same as the HMM, except that it models the duration of hidden states explicitly. Because it models the state duration precisely, the HSMM has been used to solve IR problems in digital games. For example, Hladky and Bulitko applied the HSMM to predict the position of a player in a first-person shooter game [34]. The database consisted of 190 game logs collected in Counter-Strike competitions. With these data, they trained the HSMM and evaluated the predictor performance by both prediction accuracy error and human similarity error. Southey et al. also used the HSMM to recognize the destination and start point of an agent on a grid map of Warcraft 3 [35]. Their main contribution is that the observed trajectory can be partially missing.
van Kasteren et al. compared the performances of activity recognition using the HMM, HSMM, CRF, and semi-Markov CRF; their activity data were recorded by real sensors in a smart home [36]. The experiments showed that modeling the duration explicitly always improved the recognition performance.
The AHMM is a stochastic model for representing the execution of a hierarchy of contingent plans [7]. The core concepts of the AHMM are the abstract policy and the termination variable. The abstract policy is defined as the selection of a lower-level abstract policy or a primitive action, given the current state and the higher-level policy (the top-level policy depends only on the state). The termination variable indicates whether the corresponding abstract policy will terminate in each time slice. When a policy does not terminate, its higher-level policies cannot terminate either. When the AHMM is used to recognize policies, there is also an observation layer, which depends on the states. Bui et al. applied the AHMM to track an object and predict its future trajectory in a wide-area environment [37].
Another well-known extension of the HMM is the Coxian hidden semi-Markov model (CxHSMM) [8]. This model modifies the HMM in two aspects: on the one hand, it is a special DBN representation of a two-layer HMM, and it also has termination variables; on the other hand, it uses the Coxian distribution to model the duration of primitive actions explicitly. This model was applied to recognizing human activities of daily living (ADLs), and Duong et al. showed that the Coxian duration model had advantages over existing duration parameterizations using multinomial or exponential family distributions.

Logical Hierarchical Hidden Semi-Markov Model
The LHHSMM is a fusion of the logical hierarchical hidden Markov model, the logical hidden semi-Markov model, and the Markov decision process. It is used to model the team to be recognized as well as the world state and the observation. In this section, we give a formal definition of the LHHSMM and describe its dependencies using a DBN representation. Then, we explain how to use a logical hierarchical Markov chain to represent the logical transitions and decomposition of policies.
In the formal definition, $N_t$ is a belief network with a time label $t$, $\Delta_t$ is the set of transitions from $N_t$ to $N_{t+1}$, and $\mu$ defines the initial distribution in $N_t$. $N_t$ represents the set of the policies to be recognized from level-1 to level-$K$ ($K$ is the top level), the primitive action, the world state, and the observation at time $t$. The level-$K$ policy is called the intention, and the primitive action is in level-0. Each $N_t$ is associated with a logical alphabet $\Sigma = (\Sigma_0, \Sigma_1, \ldots, \Sigma_K)$ in first-order logic, where $\Sigma_k$, belonging to level-$k$ ($k = 0, 1, \ldots, K-1, K$), is a set of relation symbols $r$ with arity $m \ge 0$ and a set of function symbols $f$ with arity $n \ge 0$. When $m = 0$, $r$ becomes a proposition, and $f$ is a constant when $n = 0$. An atom $r(t_1, \ldots, t_m)$ is a relation symbol $r$ followed by a bracketed $m$-tuple of terms $t_i$. A term is a variable or a function symbol $f$ immediately followed by a bracketed $n$-tuple of terms $t_i$, written $f(t_1, \ldots, t_n)$. A term is called ground when it contains no variables. The Herbrand base of $\Sigma_k$, denoted as $hb_{\Sigma_k}$, is the set of all ground atoms constructed with the predicate and function symbols in $\Sigma_k$. The set $G_{\Sigma_k}(A)$ of an atom $A$ consists of all ground atoms of $A$ that belong to $hb_{\Sigma_k}$ [10].
Definition 1 (policy and primitive action). A level-$k$ abstract policy is an atom with items in $\Sigma_k$ ($k > 0$), and its instantiation is called a ground policy, which belongs to $G_{\Sigma_k}$ of the abstract policy. In this paper, the top-level ground policy is also regarded as an intention. An abstract primitive action is an atom in $\Sigma_0$, and its instantiation is the corresponding ground primitive action. Each $\Sigma_k$ is determined by a function of the level-$(k+1)$ abstract policy, except for $\Sigma_K$, which is predefined. Thus, an abstract policy in level-$k$ ($k > 0$) determines the logical alphabet of the level below it.
Definition 3 (the world state and observation). The world state and the observation are both sets of variables. The world state depicts the status of the agents and the environment; its variable symbols are not predefined but are generated by instantiation functions and logical transitions, which will be introduced later. The observation is a function of the world state; it provides inaccurate and partial information about the world state.

Definition 5 (instantiation function). An instantiation function in level-$k$ is defined by the policy in level-$(k+1)$, except that the level-$K$ instantiation function is predefined. It maps a world state, an atom relation symbol in $\Sigma_k$, and a ground atom in $hb_{\Sigma_k}$ to a probability in $[0, 1]$. Equivalently, it can be written as the conditional probability of an instantiated object, which belongs to $G_{\Sigma_k}$ of the atom, given the atom and the world state at the previous time. It should be noted that when a variable in the atom has already been instantiated in the higher-level policy, the variable is substituted by the former instance directly and cannot be selected by the instantiation function again.
Example 6. For the $\Sigma_1$ in Example 2, we can set the level-1 instantiation probabilities to 0.3 and 0.7 for the two ground instances of Attack and to 0.8 and 0.2 for the two ground instances of Avoid. If the variable of Attack is instantiated as the second constant in level-1, then we can set the level-0 instantiation probabilities to 0.8 for Move(right) and 0.2 for Move(left) (there are only two directions), but the probability of Fire applied to that constant can only be 1.
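The substitution rule in Definition 5 and Example 6 can be sketched in a few lines. This is only an illustration under simplifying assumptions: the dependence on the world state is omitted, and the table `INSTANTIATION`, the constants `t1` and `t2`, the variable name `X`, and the function `instantiate` are all hypothetical.

```python
import random

# Hypothetical instantiation table: relation symbol -> {constant: probability}.
INSTANTIATION = {
    "Attack": {"t1": 0.3, "t2": 0.7},
    "Move": {"right": 0.8, "left": 0.2},
    "Fire": {"t1": 0.5, "t2": 0.5},
}


def instantiate(relation, variable, bindings, rng):
    """Ground relation(variable), reusing a binding made at a higher level."""
    if variable in bindings:
        # Already instantiated above: substitute directly, never sample again.
        return relation, bindings[variable]
    choices = INSTANTIATION[relation]
    value = rng.choices(list(choices), weights=list(choices.values()))[0]
    bindings[variable] = value
    return relation, value


rng = random.Random(0)
bindings = {}
level1 = instantiate("Attack", "X", bindings, rng)  # samples an instance for X
level0 = instantiate("Fire", "X", bindings, rng)    # forced to reuse that instance
```

Whatever the table says about Fire, the level-0 atom reuses the target chosen in level-1, which corresponds to the statement that its probability can only be 1.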
Definition 7 (policy termination [7]). The policy termination variable of level-$k$ takes values in $\{0, 1\}$ and indicates whether the level-$k$ policy will be terminated (value 1) or not (value 0) at the current time. When it is 0, the policy continues at the next time; otherwise, the policy is changed according to $\Delta_t$ or the initial distribution.
Definition 8 (intention duration [8]). The intention duration variable $d$ is a counter that models the time remaining before an interruption event happens. Since the reason for the interruption is always unknown, we initialize $d$ from a lognormal distribution when an intention starts. The value of $d$ is reduced by 1 in each step, and the intention is interrupted when $d < 1$. Then, $d$ is initialized again.
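The duration counter of Definition 8 can be sketched as below. This covers only the interruption mechanism, not termination through the ordinary termination conditions; the parameters `mu` and `sigma` of the lognormal distribution and the function name are assumptions for illustration.

```python
import random


def steps_until_interruption(mu, sigma, rng, max_steps=1000):
    """Count the steps an intention survives before its counter interrupts it."""
    d = rng.lognormvariate(mu, sigma)  # initialised when the intention starts
    steps = 0
    while d >= 1 and steps < max_steps:
        d -= 1       # the counter is reduced by 1 in each step
        steps += 1
    return steps     # the intention is interrupted once d < 1


rng = random.Random(1)
lifetime = steps_until_interruption(mu=2.0, sigma=0.5, rng=rng)
```

With a continuous initial value, the number of surviving steps is the integer part of the sampled duration, so the lognormal shape carries over to the distribution of uninterrupted intention lengths.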
$\Delta_t$ is the set of horizontal transitions, which determines how $N_t$ transits into $N_{t+1}$. There are two types of transitions in $\Delta_t$: logical transitions, which can be further classified into conditional probabilistic logical transitions, specific logical transitions, and unified logical transitions, and standard probabilistic transitions (we simply call them probabilistic transitions), as in PGMs. The logical transitions only exist between abstract policies in the same level, and the transitions in each level are generated by the abstract policies in the higher level, except that the level-$K$ logical transitions are predefined. The probabilistic transitions will be analyzed when we introduce the DBN representation of our model.
Definition 9 (conditional probabilistic logical transition [12]). A conditional probabilistic logical transition has the form $p : B \rightarrow H$, which means that the abstract policy transits from $B$ to $H$ with probability $p$, where $B$ is the current abstract policy and $H$ is the next abstract policy; both are atoms in the alphabet of the current level. $p$ is a conditional probability given by $\Delta_t$ for $H$, conditioned on $B$ and on the conditions associated with $B$. These conditions are a set of logical sentences whose variables are included in the world state; thus, their truth values are known once the current world state is known. The set of conditions can also be empty, in which case the value of $p$ depends only on the higher-level policy and $B$. We emphasize that if the variables of $H$ have been instantiated in $B$, they are substituted by the constants directly.
Definition 10 (specific logical transition). A specific logical transition has the form $B \rightarrow B'$, which means that the current policy $B$ is replaced with $B'$, where $B'$ is a more specific case of $B$. As a kind of default reasoning, this kind of transition is often used to model exceptions, and it is followed when the instance in $B$ is consistent with that in $B'$.
Definition 11 (unified logical transition). A unified logical transition has the form $B \rightarrow B'$, which means that the current policy $B$ is replaced with $B'$, where the relation symbols of $B$ and $B'$ are the same but the orders of their variable symbols are different. When a unified logical transition is used, the ground instances of $B'$ and $B$ are unified. This kind of transition must be followed if the current policy is $B$.
These three kinds of logical transitions can be represented by solid, dashed, and dotted edges, respectively, in an FSM, as in the LHMM [10]. Note that since dotted and dashed transitions do not take real time, they can only be represented in an FSM and cannot be reflected in the DBN representation. Figure 1 shows three examples. $A_1$, $A_2$, and $A_3$ are relation symbols that represent abstract policies, $X_1$, $X_2$, and $X_3$ are variables, and $c_2$ is a constant. Suppose that we have computed the values of all conditional probabilistic logical transitions according to the world state and that they are marked on the edges. In Figure 1(a), the abstract policy can transit from $A_1(X_1, X_2)$ to $A_2(X_1, X_3)$ or $A_3(X_2, X_3)$ with probabilities 0.4 and 0.6, respectively. If the abstract policy reaches $A_2$, the instance of $X_1$ in $A_2$ must be the same as that of $X_1$ in $A_1$. In Figure 1(b), if the instance of $X_2$ in $A_1$ is not $c_2$ or is unknown, the abstract policy transits to $A_2(X_1, X_3)$ or $A_3(X_2, X_3)$ with probabilities 0.4 and 0.6, respectively. However, if we know that the instance of $X_2$ is $c_2$, these probabilities change to 0.6 and 0.4. In Figure 1(c), if the current abstract policy is $A_1(X_2, X_3)$, we follow the dotted edge automatically and replace the abstract policy with $A_1(X_1, X_2)$, and the instances of $X_1$ and $X_2$ are the same as the instances of $X_2$ and $X_3$ in $A_1(X_2, X_3)$. In this way, $A_1(X_1, X_2)$ changes its instances while consuming one time step.
$\mu$ defines the initial distributions of the abstract policies in $N_t$. For level-$k$ ($k = 1, 2, \ldots, K-1$), $\mu$ returns the initial distribution of the level-$k$ abstract policy, which is generated by the current level-$(k+1)$ abstract policy, whereas the initial distribution of the level-$K$ abstract policy is predefined. The level-$k$ initial distribution is only used to sample the level-$k$ policy when the level-$k$ policy at the previous time has been terminated. For the $\Sigma_1$ in Example 2, the initial distribution can be represented as {Attack($X_1$): 0.7; Avoid($X_1$): 0.3}.
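The transitions of Figures 1(a) and 1(b) described above can be sketched as follows. The rule tables, the constant name `c2`, and the function `next_policy` are hypothetical, and the world-state conditions are assumed to have been evaluated already, so that only the resulting probabilities remain.

```python
import random

# Hypothetical rule tables for transitions out of A1(X1, X2).
GENERAL = [(("A2", ("X1", "X3")), 0.4), (("A3", ("X2", "X3")), 0.6)]
SPECIFIC = [(("A2", ("X1", "X3")), 0.6), (("A3", ("X2", "X3")), 0.4)]


def next_policy(bindings, rng):
    """Sample the abstract policy that follows A1 under the current bindings."""
    # The more specific rule set applies only when X2 is known to be c2.
    rules = SPECIFIC if bindings.get("X2") == "c2" else GENERAL
    heads, probs = zip(*rules)
    relation, variables = rng.choices(heads, weights=probs)[0]
    # Shared variables keep their instances; new variables stay unbound (None).
    args = tuple(bindings.get(v) for v in variables)
    return relation, args


rng = random.Random(0)
relation, args = next_policy({"X1": "a", "X2": "c2"}, rng)
```

Variables that already carry an instance in the current policy are copied into the next one, while a variable such as `X3` is left for the instantiation function to fill in.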

Dependency in the LHHSMM.
In this section, we use a DBN representation to describe the dependencies among the variables in the LHHSMM. However, the standard DBN cannot represent the logical transitions and the instantiation process in our model, since it is propositional. Thus, we can only show the full DBN after substituting all variables. To explain the logical dependencies under standard probabilistic transitions, we first analyze the factors on which each policy, primitive action, termination, duration, and state depends and then discuss the details of the logical transitions and the instantiation process. Figure 2 shows the subnetwork for a level-$k$ policy.
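When $1 \le k < K$, the selection should also satisfy the logical transitions between the level-$k$ policies at time $t$ and time $t+1$. Figure 2(c) means that the level-$k$ policy at time $t+1$ fully inherits the policy at time $t$ when the latter is not terminated at time $t$. When $k = K$, the policy at time $t+1$ only depends on the level-$K$ policy, the world state, and the termination variable at time $t$, and the selection and instantiation of the intention are similar to those of the policies in other levels, except that there is no influence from higher-level policies. Figure 3 shows the subnetwork for the duration of the intention.
The duration at time $t+1$ depends on a relation symbol, namely the abstract intention at time $t+1$, and two variables, the intention termination variable and the duration at time $t$, as shown in Figure 3(a). Figures 3(b) and 3(c) explain the two cases, respectively: when the intention is terminated at time $t$, the duration at time $t+1$ is initialized with a new value according to the abstract intention at time $t+1$; when the intention is not terminated at time $t$, the duration at time $t+1$ equals the duration at time $t$ minus 1. Figure 4 shows the subnetwork for level-$k$ ($k < K$) policy termination.
Figure 4(a) shows that the level-$k$ termination variable depends on a ground relation symbol, namely the level-$k$ ground policy, and two variables, the level-$(k+1)$ termination variable and the world state. In Figure 4(b), when the level-$(k+1)$ termination variable equals 1, the level-$k$ termination variable is forced to 1, which means that a policy is forcibly terminated when its higher-level policy ends. Figure 5(a) shows that the intention termination variable depends on a ground relation symbol, namely the ground intention, and two variables, the duration and the world state. In Figure 5(b), when the duration is less than 1, the intention termination variable equals 1, which means that the intention has been interrupted for unknown reasons. Figure 5(c) shows the case in which the duration is at least 1: the value of the intention termination variable is then determined by a conditional probability given the ground intention and the world state, just as for the policies in other levels. The relationships among the world state, the observation, and the primitive action are similar to those in a Markov decision process. First, given the level-1 policy at time $t+1$ and the world state at time $t$, an abstract primitive action in $\Sigma_0$ is selected with a probability conditioned on both. Then, the abstract primitive action is instantiated into a ground primitive action by the level-0 instantiation function; the ground primitive action determines how the world state at time $t$ transits to the world state at time $t+1$, and the observation at each time is a function of the world state at that time. The primitive action does not have a termination variable; it needs to be selected according to the process above in each time slice. The variables and dependencies are presented in a full DBN representation, which is shown in Figure 6.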
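The termination rules of Figures 4 and 5 can be summarized in a short top-down sketch. It is an illustration only: `term_probs` stands for hypothetical termination probabilities that have already been conditioned on the ground policy and the world state, listed from the top level downward, and the function name is an assumption.

```python
import random


def sample_terminations(term_probs, duration, rng):
    """Sample termination flags from the top level down to level-1."""
    flags = []
    higher_terminated = False
    for i, p in enumerate(term_probs):
        if i == 0 and duration < 1:
            e = 1  # top level: the intention is interrupted (counter expired)
        elif higher_terminated:
            e = 1  # forced termination because the higher-level policy ended
        else:
            e = 1 if rng.random() < p else 0  # ordinary probabilistic termination
        flags.append(e)
        higher_terminated = (e == 1)
    return flags


rng = random.Random(0)
interrupted = sample_terminations([0.0, 0.0, 0.0], duration=0.5, rng=rng)
```

Once any level terminates, every level below it terminates in the same time slice, which is how an interrupted intention stops all executing subtasks.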
The DBN representation depicts the dependencies in $N_t$ and the transitions between $N_t$ and $N_{t+1}$. Note that the vertical dashed lines in Figure 6 are not logical transitions but abbreviations for the nodes and edges from level-3 to level-$(K-1)$. As shown in the DBN representation, our model consists of three parts: sensors, which are modeled by the world state and observation levels; a Markov decision process, depicted in the structure below level-1 if we regard the level-1 policy as a special case of the state; and a hierarchical chain structure above level-0, which depicts the team policies. However, the DBN cannot represent the logical transitions and the instantiation process in the third part. Thus, we use a hierarchical logical Markov chain to describe the decomposition and transition process of the team policies in our model.

Team Policies Presented by Hierarchical Logical Markov Chain.
Unlike the DBN structure, a logical hierarchical Markov chain (LHMC) ignores the duration variables, the conditions of logical transitions, primitive actions, world states, and observations and only depicts the decomposition of policies and the logical transitions among them. An LHMC is a tree structure in which each node is a logical Markov chain (LMC); an LMC can be regarded as an LHMM without any observation, just as the state transition process in a standard HMM is a Markov chain. In this paper, an abstract state in an LMC represents an abstract policy, and its child is an LMC in the next lower level, except for the policies in leaf nodes. Figure 7 shows an example of team policies represented by a hierarchical logical Markov chain.
The team policies in Figure 7 consist of two layers: level-1 and level-$K$ ($K = 2$). The LMC in level-2 presents the possible intentions and the transitions between them. In team intention recognition problems, the relation symbols in the highest level usually represent different team working modes. For example, Intention B indicates a team working mode, and its two arguments are specific constants. If the intention reaches the End node, the simulation ends. The out-edges of the Start node are associated with the initial distribution of the abstract intention. When agents choose a level-$k$ policy ($k > 1$), they need to execute the policies in its associated LMC, as shown in the ellipse (the figure omits the mappings of the other policies for simplicity). For example, to finish the ground Intention B, the execution in level-1 begins according to the initial distribution defined by Intention B, and the first executed level-1 abstract policy may be Subtask A($X_1$) or Subtask B($X_2$); when the level-1 policy is terminated but Intention B is not, the level-1 policy transits within the LMC. However, when the world state satisfies the terminating conditions of Intention B or the intention is interrupted, we return to the abstract policy in level-2 and transit to another abstract intention in the level-$K$ LMC immediately.

Approximate Inference
In this section, we discuss how to infer team intentions represented by the LHHSMM. Online intention recognition is essentially a filtering problem. The policies, primitive actions, durations, and termination variables can be regarded as the states of a dynamic system: they cannot be observed directly, but they produce observations sequentially. Our goal is to infer the real state at each time step from this observation series. Since the observations are noisy and partially missing, we use a particle filter (PF) to solve the approximate inference problem. However, the standard PF does not define any logical transition or instantiation process. Thus, we propose a new logical particle filter (LPF) by introducing the logical definitions and dependencies of the LHHSMM.

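Standard PF with Simplest Sampling
Suppose that we have a dynamic system whose real state at time $t$ is $x_t$, and $y_t$ is an observation of $x_t$. $p(x_0)$ is the prior initial probability. $\{x_t^i, w_t^i\}_{i=1}^{N_s}$ is a particle set which describes the posterior distribution $p(\tilde{x}_t \mid \tilde{y}_t)$, where $\tilde{x}_t = \{x_0, x_1, x_2, \ldots, x_t\}$ and $\tilde{y}_t = \{y_1, y_2, \ldots, y_t\}$. $N_s$ is the number of particles, $w_t^i$ is the normalized weight of $x_t^i$, and $\sum_i w_t^i = 1$. Then, the posterior distribution of the state can be computed by

$$p(x_t \mid \tilde{y}_t) \approx \sum_{i=1}^{N_s} w_t^i \, \delta(x_t - x_t^i), \quad (1)$$

where $\delta$ is the Dirac function and $w_t^i$ is obtained by

$$w_t^i \propto \frac{p(\tilde{x}_t^i \mid \tilde{y}_t)}{q(\tilde{x}_t^i \mid \tilde{y}_t)}. \quad (2)$$

In the PF, $q$ is known as the importance distribution, which is used to sample $x_t^i$, and it can be factorized as

$$q(\tilde{x}_t \mid \tilde{y}_t) = q(x_t \mid x_{t-1}, y_t)\, q(\tilde{x}_{t-1} \mid \tilde{y}_{t-1}). \quad (3)$$

The posterior distribution can be represented by

$$p(\tilde{x}_t \mid \tilde{y}_t) \propto p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, p(\tilde{x}_{t-1} \mid \tilde{y}_{t-1}). \quad (4)$$

After substituting formula (3) and formula (4) into formula (2), we can update the importance weights by

$$w_t^i \propto w_{t-1}^i \, \frac{p(y_t \mid x_t^i)\, p(x_t^i \mid x_{t-1}^i)}{q(x_t^i \mid x_{t-1}^i, y_t)}. \quad (5)$$

The optimal importance distribution is $q(x_t \mid x_{t-1}, y_t) = p(x_t \mid x_{t-1}, y_t)$; however, it is very hard to compute $p(x_t \mid x_{t-1}, y_t)$ in our models. Thus, we use the simplest importance distribution $q(x_t \mid x_{t-1}, y_t) = p(x_t \mid x_{t-1})$, and $w_t^i \propto w_{t-1}^i \, p(y_t \mid x_t^i)$ in this case. After normalizing $w_t^i$, we resample the particles to mitigate particle degeneracy. This process abandons particles with small weights and copies particles with large weights. The weights of the new particles after resampling are set equal, and we can compute the posterior probability of the system state by using formula (1). Other details about this process can be found in [38].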
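The filtering step with the simplest importance distribution can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the function names, the two-state toy system, and the use of multinomial resampling are assumptions made only for illustration.

```python
import random


def bootstrap_step(particles, weights, y, sample_transition, likelihood, rng):
    """One step of a particle filter that proposes from p(x_t | x_{t-1})."""
    # Forward sampling: draw x_t^i from the transition prior.
    proposed = [sample_transition(x, rng) for x in particles]
    # Weight update: w_t^i is proportional to w_{t-1}^i * p(y_t | x_t^i).
    raw = [w * likelihood(y, x) for w, x in zip(weights, proposed)]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("all particle weights are zero")
    normalized = [w / total for w in raw]
    # Multinomial resampling (an assumed scheme): copy heavy particles,
    # drop light ones, then set all weights equal.
    n = len(proposed)
    resampled = rng.choices(proposed, weights=normalized, k=n)
    return resampled, [1.0 / n] * n


# Hypothetical toy system with states 0 and 1.
def sticky_transition(x, rng):
    return x if rng.random() < 0.9 else 1 - x


def noisy_likelihood(y, x):
    return 0.8 if y == x else 0.2


rng = random.Random(0)
particles = [rng.randint(0, 1) for _ in range(500)]
weights = [1.0 / 500] * 500
for y in [1, 1, 1, 1, 1]:
    particles, weights = bootstrap_step(
        particles, weights, y, sticky_transition, noisy_likelihood, rng
    )
# Formula (1) for a discrete state: sum the weights of matching particles.
estimate = sum(w for x, w in zip(particles, weights) if x == 1)
```

After repeated observations of state 1, `estimate` approximates the posterior probability of state 1 under the toy model.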

Logical Particle Filtering in the LHHSMM
The standard PF cannot be applied to recognize intentions in our LHHSMM, since it does not allow logical transitions or an instantiation process. In this section, we present a logical particle filter (LPF) based on the logical definitions and dependencies of our model. The steps of the LPF are as follows.
Step 2 (forward sampling). Set $t = t + 1$ and sample the elements of each particle $x_t^i$ ($i = 1, 2, \ldots, N_s$). Based on the dependencies in the DBN structure, the forward sampling process can be further decomposed into five steps: sampling intentions and duration, policies, primitive actions, the world state, and termination variables. The pseudocode is shown in Pseudocode 1.
Two points should be noted. First, when an abstract policy is selected and executed for the first time, the alphabets, instantiation functions, and logical transitions in its corresponding lower-level LMC should be updated. However, while a policy has not been terminated, the elements of its LMC do not need to be changed, and those of the top-level LMC remain unchanged. Second, when sampling an abstract policy from logical transitions, we must follow the specific logical transition and the unified logical transition if their conditions are satisfied, and then obtain the new abstract policy by the conditional probabilistic logical transition immediately.
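Step 3 (updating weights and resampling). Update the weight of each particle by $w_t^i \propto w_{t-1}^i \, p(y_t \mid x_t^i)$, normalize the weights, and resample the particles.
Step 4 (computing the posterior distribution of intentions). In this paper, we only care about the abstract intention and the constants in the instantiated intention. Since they are discrete, their posterior probabilities can be computed, following formula (1), by summing the normalized weights of the particles that carry the corresponding value.
With the four steps above, we can compute the probabilities of the team working mode and the goals of the agents at each time step.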
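Step 4 amounts to a weighted histogram over the particle set. The following Python sketch illustrates this; the particle layout (dictionaries with `mode` and `targets` fields) and the constant names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def discrete_posterior(particles, weights, key):
    """Sum normalized particle weights per discrete value extracted by `key`."""
    total = sum(weights)
    posterior = defaultdict(float)
    for particle, weight in zip(particles, weights):
        posterior[key(particle)] += weight / total
    return dict(posterior)


# Hypothetical particles: a working mode plus the instantiated targets.
particles = [
    {"mode": "AttackC", "targets": ("target_1",)},
    {"mode": "AttackC", "targets": ("target_2",)},
    {"mode": "AttackI", "targets": ("target_3", "target_2")},
]
weights = [0.5, 0.2, 0.3]

mode_posterior = discrete_posterior(particles, weights, key=lambda p: p["mode"])
first_target_posterior = discrete_posterior(
    particles, weights, key=lambda p: p["targets"][0]
)
```

The same function serves for the working mode and for each agent's target; only the `key` function changes.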

Background.
To evaluate the performance of the LHHSMM and the LPF in team intention recognition, we design a battle scenario. In this scenario, two agents execute an attacking mission individually or cooperatively in a known environment. Their team intention consists of two parts: the team working mode and the specific target of each agent. The recognition task is to compute the probabilities of the team intention sequentially from continuous observations of the agents' traces.
This scenario has several characteristics: first, the agents can act individually or form a team; second, the attacking mission can be decomposed into subtasks and primitive actions; third, the team intentions can be interrupted because of new orders or other unknown reasons; last, the observed traces are noisy, and at some time steps no position record may be available at all. Furthermore, the observed data are obtained sequentially, and we need to update the recognition results when new evidence arrives. The initial situation of the battlefield is shown in Figure 8.
The battlefield map consists of 22 × 22 grids; the blue and red diamond points are the initial positions of agent A and agent B, respectively; the black points indicate buildings, which the agents cannot pass through; the two green grids are assembling positions; the three yellow grids are targets which may be attacked by the agents; the white grids make up the passable ways for motion. In each time step, an agent can move to an adjacent blank grid or stay in its current position. Before inferring intentions, we give a decomposed LHMC representation of the team policies in this scenario, as shown in Figure 9.
Figure 9(a) depicts the logical Markov chain in the top level (level-2), which has two relation symbols: (a) Attack C ( 1 ) means that the two agents attack cooperatively and their common target is  1 ; (b) Attack I ( 1 ,  2 ) means that they execute their missions individually; that is, agent A attacks  1 and agent B attacks  2 . Variables  1 and  2 represent the targets, and their possible values are  1 ,  2 , or  3 . We also depict the initial distribution of the top-level LMC in Figure 9(a). Figures 9(b) and 9(c) depict the LMCs under Attack C ( 1 ) and Attack I ( 1 ,  2 ), respectively. In level-1, there are two relation symbols: (a) Assemble (As), which means that agent A and agent B are trying to get together at the assembling position As, and (b) Destroy ( 1 ,  2 ), which means that agent A is going to destroy  1 and agent B is going to destroy  2 . As is a variable whose possible values are as 1 and as 2 ;  1 and  2 have been instantiated in the top level. According to the transitions in Figures 9(b) and 9(c), when the agents attack a target cooperatively, they first assemble at a point and then go to destroy the target together. But when Attack I ( 1 ,  2 ) is executed, the agents have different targets and do not affect each other.
The team has only one abstract primitive action Move (Direction A, Direction B), where Direction A and Direction B indicate the moving directions of agents A and B, respectively, and there are 5 possible directions: north, south, east, west, and null.If the direction is instantiated as null, the agent will stay at the current position.Selection and instantiation of abstract primitive action depend on the level-1 policy and the previous state.
In this scenario, the selection of policies and actions is only related to the positions of the agents. Thus, the world state is defined as the set of pos(agent A) and pos(agent B), where pos(·) returns the current grid of the given agent. The observations are functions of these positions, which will be introduced when we explain the observation model.

Transition Probabilities in LMCs.
There is only one possible logical transition in level-1, and its transition probability is always 1. For the conditional transition probabilities of the top-level LMC, B, C, D, E, and F represent the policies shown in Figure 9(a), and IsOccupied( 1 ) means that the target  1 is occupied; this proposition is true when any agent reaches the grid of  1 . The CB entry is the probability of transiting from abstract policy C to abstract policy B, the CD entry is the probability of terminating the simulation after executing abstract policy C, and the other transition symbols have similar meanings. The top-level LMC shows that the team working mode may alternate between cooperation and independence, and the simulation can only end when one of the targets is destroyed.

Policy
(1) When the level-1 policy is Destroy ( 1 ,  2 ) or Destroy ( 1 ,  1 ), agent A moves in a direction that keeps it on a shortest path to its instantiated target; if two directions satisfy this condition, agent A chooses each of them with probability 0.5, and so does agent B.
(2) When the level-1 policy is Assemble (As) and the previous world state shows that neither agent A nor agent B has reached the assembling position, agent A moves in a direction that keeps it on a shortest path to As; if two directions satisfy this condition, agent A chooses each of them with probability 0.5, and so does agent B.
(3) When the level-1 policy is Assemble (As) and the previous world state shows that one agent has reached the assembling position but the other has not, the agent that is still on its way moves along a shortest path to As; if two directions satisfy this condition, this agent chooses each of them with probability 0.5, and the other agent chooses the direction null.
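The three rules above share one mechanism: pick uniformly among the directions that lie on a shortest path to the goal. The following Python sketch illustrates it under stated assumptions: the grid is 4-connected, cells are (x, y) tuples with north as decreasing y, and the names `distances_to` and `choose_direction` are hypothetical.

```python
import random
from collections import deque

# Assumed coordinate convention: north decreases y, east increases x.
MOVES = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}


def distances_to(goal, passable):
    """Breadth-first distances from every reachable passable cell to `goal`."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cx, cy = queue.popleft()
        for dx, dy in MOVES.values():
            nxt = (cx + dx, cy + dy)
            if nxt in passable and nxt not in dist:
                dist[nxt] = dist[(cx, cy)] + 1
                queue.append(nxt)
    return dist


def choose_direction(pos, goal, passable, rng):
    """Return a direction on a shortest path, breaking ties uniformly."""
    dist = distances_to(goal, passable)
    if pos == goal or pos not in dist:
        return "null"  # already there, or the goal is unreachable
    best = [
        name
        for name, (dx, dy) in MOVES.items()
        if dist.get((pos[0] + dx, pos[1] + dy)) == dist[pos] - 1
    ]
    return rng.choice(best)


open_grid = {(x, y) for x in range(3) for y in range(3)}
rng = random.Random(0)
step = choose_direction((0, 0), (2, 2), open_grid, rng)
```

With exactly two tied directions, `rng.choice` selects each with probability 0.5, matching the rule in the text; an agent that has already arrived receives the null direction.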

The Duration Model
After the abstract intention Attack C ( 1 ) or Attack I ( 1 ,  2 ) has lasted for time $d$, an interruption event happens with probability $P(d) = \int_{d-1}^{d} f(x)\,dx$, $d = 1, 2, \ldots, D$, where $D$ is the time necessary to realize the intention, which depends on the state when the intention begins, and $f(x)$ is the probability density function of a three-parameter lognormal distribution. The three parameters are set to 25, 100, and 0, respectively.
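The discretization of a continuous duration density into per-step interruption probabilities can be sketched in Python as follows. The sketch assumes a standard two-parameter lognormal with hypothetical parameters `mu` and `sigma` (the mean and standard deviation of the logarithm); how these relate to the three parameters used in this paper is not specified here, so the numbers below are placeholders only.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal whose logarithm is N(mu, sigma^2)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def interruption_probs(mu, sigma, horizon):
    """P(d) = integral of the density over (d - 1, d], for d = 1..horizon."""
    return [
        lognormal_cdf(d, mu, sigma) - lognormal_cdf(d - 1, mu, sigma)
        for d in range(1, horizon + 1)
    ]


# Placeholder parameters, chosen only to make the example concrete.
probs = interruption_probs(mu=3.0, sigma=0.5, horizon=40)
```

The entries of `probs` sum to the lognormal CDF at the horizon; the remaining mass corresponds to intentions that are completed without interruption.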

Observation Model.
An observation at one time step is a set {obs(pos(agent A)), obs(pos(agent B))}, where obs(g) is a function which returns the true grid g with probability 0.6, returns null with probability 0.2, and returns a false grid g′ ≠ g with probability 0.2 multiplied by a weight for g′. This weight is computed from the horizontal and vertical coordinates of the grids, where Go is the set of grids containing buildings and Gs is the set of all grids except g and the grids in Go. When obs(g) returns null, we have no information about the observed agent. The observations of the two agents are independent.
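The observation model can be sampled as in the following Python sketch. The 0.6 / 0.2 / 0.2 split follows the text, but the weight over false grids is an assumption: the sketch uses a normalized inverse Manhattan distance, which is only one plausible choice and may differ from the formula used in this paper.

```python
import random


def false_grid_weights(true_grid, all_grids, buildings):
    """Normalized weights over candidate false grids (assumed form)."""
    candidates = [g for g in all_grids if g != true_grid and g not in buildings]
    # Assumption: closer grids are more likely to be reported by mistake.
    raw = [
        1.0 / (abs(g[0] - true_grid[0]) + abs(g[1] - true_grid[1]))
        for g in candidates
    ]
    total = sum(raw)
    return candidates, [r / total for r in raw]


def observe(true_grid, all_grids, buildings, rng):
    """Return the true grid (0.6), None for a missing record (0.2), or a false grid (0.2)."""
    u = rng.random()
    if u < 0.6:
        return true_grid
    if u < 0.8:
        return None
    candidates, weights = false_grid_weights(true_grid, all_grids, buildings)
    return rng.choices(candidates, weights=weights, k=1)[0]


grids = [(x, y) for x in range(5) for y in range(5)]
buildings = {(2, 2)}
rng = random.Random(1)
samples = [observe((1, 1), grids, buildings, rng) for _ in range(5000)]
```

Because the two agents are observed independently, a joint observation is obtained by calling `observe` once per agent.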

Other Settings.
To simplify the instantiation functions of intentions, the specific targets are selected according to their tactical values. We set the normalized values of  1 ,  2 , and  3 as 0.3, 0.4, and 0.3, respectively. The agents instantiate the target variables in an abstract intention one by one. The probability of choosing a target is proportional to its tactical value. For example, when we instantiate a variable  1 whose possible constants are  2 and  3 (because the value of  2 is already known to be  1 ), the probability of instantiating  1 as  2 is 0.4/(0.4 + 0.3) = 0.5714. For the instantiation function in level-1, there is only one variable, As; the probability of its value depends on the value of  1 , as shown in Table 1. In the approximate inference, we set the number of particles to 2000.
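The value-proportional instantiation in the example above can be checked with a short Python sketch. The target names are placeholders, and `instantiation_probs` is a hypothetical helper rather than a function from this paper.

```python
def instantiation_probs(values, excluded=()):
    """Renormalize tactical values over the targets that are still available."""
    remaining = {name: v for name, v in values.items() if name not in excluded}
    total = sum(remaining.values())
    return {name: v / total for name, v in remaining.items()}


# Normalized tactical values from the text; the names are placeholders.
tactical_values = {"target_1": 0.3, "target_2": 0.4, "target_3": 0.3}

# The other variable already holds target_1, so it is excluded here.
probs = instantiation_probs(tactical_values, excluded=("target_1",))
```

Here `probs["target_2"]` equals 0.4 / (0.4 + 0.3), which reproduces the value 0.5714 given in the text.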

Results and Discussion
We run the scenario repeatedly and produce a test dataset consisting of 100 traces. With this dataset, we compute the recognition results of specific traces to validate the LHHSMM and the LPF, and we compare the performances when the intention duration distributions are different.

The Recognition Results of Specific Traces
Three traces in the test dataset are selected to compute the probabilities of team intentions by the LHHSMM. The details of these three traces are shown in Table 2.
As shown in Table 2, the agents in trace number 5 execute the attacking mission individually: the target of agent A is  3 and the target of agent B is  2 . For trace number 5, there is no interruption and the mission is completed at t = 21. In trace number 17, the agents choose the cooperative working mode and do not change their intention during the mission either. They occupy the target  1 successfully at t = 36. Trace number 57 is a bit more complicated. The agents first decide to attack  3 and  2 individually, but they change the intention and attack  2 together at t = 14. They successfully assemble and go to  2 together. However, before they reach  2 , their intention is interrupted again and agent A attacks  3 alone. This intention is terminated at t = 44 because agent B occupies  2 . We use the LHHSMM to recognize both the team working modes and the intentions in these three cases, and the results are shown in Figures 10, 11, and 12.
Figure 10(a) shows the probabilities of the working modes in trace number 5. Even though the prior probability of the real working mode, that is, Attack I ( 1 ,  2 ), is lower in the initial phase, it increases very fast and exceeds 0.8 at t = 12. The fluctuation at t = 8 occurs because agent B is turning left at a crossing while the observation of agent B is noisy at this time. Figures 10(b) and 10(c) show the probabilities of the targets of the agents in trace number 5.
In Figure 10(b), our models recognize the target of agent A very well, and the probability of target  3 increases very fast. In Figure 10(c), the probability of  1 is larger than that of the real target  2 from t = 8 to t = 13, because agent B chooses a path which is indispensable for reaching  1 in this period. When agent B leaves this path at t = 13, the probability of the real target is higher again.
Figure 11(a) shows the probabilities of the working modes in trace number 17. Except for the fluctuation near t = 30, the probability of the real working mode is always high. The failure near t = 30 occurs because some observations are missing in that period. At the same time, the observed noisy data suggest that the working mode has been interrupted. Fortunately, the probability of Attack C ( 1 ) recovers fast when new evidence arrives. Figures 11(b) and 11(c) show the probabilities of the targets of the agents in trace number 17. Since agent A and agent B have the same target, the curves in Figures 11(b) and 11(c) are very similar. The confusion before t = 25 arises because the agents are assembling and their traces cannot provide new information about their targets; we can find that, even though the prior probability of  2 is considerably higher, the red curve increases fast.
Figure 12(a) shows the probabilities of the working modes in trace number 57. Generally, our models can recognize the working mode well even though it is interrupted at t = 14 and t = 41. The two fluctuations, one of which is near t = 10, occur because the observation is not accurate. Figures 12(b) and 12(c) show the probabilities of the targets of the agents in trace number 57. We can find that the results after t = 14 reflect the real targets well, except for some fluctuations, even though the target of agent A changes twice. However, the real targets do not have the highest probabilities in most cases before t = 14. In Figure 12(b), the probability of  2 is generally higher before t = 8, because agent A and agent B are on the indispensable paths to  2 and  1 , respectively. After they leave the indispensable paths, the probability of  3 , which is the real target of agent A, increases a lot, which makes the target of agent B more likely to be  2 , since their working mode is attacking individually. We can also find that, although the probabilities of the real targets are not the highest during this period, they are at least not the lowest.

Comparison of the LHHSMM and a Modified LHHMM.
To show the advantages of the LHHSMM over the LHHMM, we add our observation model to the standard LHHMM (we still call it the LHHMM below) and also use a particle filter to infer intentions. However, since the LHHMM does not model intention interruption, the weights of all particles may be 0 after an interruption happens. In this situation, we reset every particle weight to $1/N_s$, where $N_s$ is the number of particles. Then, inference can continue to update the probabilities of intentions until the end of the simulation.
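The weight reset described above can be sketched in a few lines of Python. The function name `normalize_or_reset` is hypothetical, and the sketch covers only the weight handling, not the rest of the filter.

```python
def normalize_or_reset(raw_weights):
    """Normalize particle weights; fall back to uniform weights if all are zero."""
    n = len(raw_weights)
    total = sum(raw_weights)
    if total <= 0.0:
        # No particle explains the new observation: restart from equal weights.
        return [1.0 / n] * n
    return [w / total for w in raw_weights]


after_interruption = normalize_or_reset([0.0, 0.0, 0.0, 0.0])
normal_case = normalize_or_reset([0.2, 0.6, 0.0, 0.2])
```

The reset keeps the filter running, but it discards the evidence accumulated in the weights, which is one way to see why a model without explicit interruption handling recovers more slowly.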
The performances of our model and the LHHMM are compared statistically using three metrics: precision, recall, and F-measure.
These metrics are defined in formulas (10) in terms of the number of possible classes and of $TP_i$, $TI_i$, and $TT_i$, which are the true positives, total of true labels, and total of inferred labels for class $i$, respectively. Precision measures the reliability of the recognized results; recall measures the efficiency of the algorithm on the test dataset; and F-measure is an integration of precision and recall.
The red solid curves and the blue dash-dotted curves are computed by the LHHSMM and the LHHMM, respectively. We can find that the LHHSMM outperforms the LHHMM in all three metrics, especially in the second half of the simulation. In the starting phase, the intention has lasted only a short time and the probability of changing intentions is not large. Thus, the LHHSMM and the LHHMM have similar performances. However, as more and more interruptions of intention occur, the LHHSMM shows its advantage: it generally improves the recognition performance when there are more observations.
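A common way to compute such class-based metrics is macro-averaging, sketched below in Python. This is an assumption about the form of the metrics, not a transcription of formulas (10): per-class precision and recall are averaged over the classes, and the F-measure is taken as their harmonic mean. The label names are hypothetical.

```python
def macro_metrics(true_labels, inferred_labels):
    """Macro-averaged precision and recall, plus their harmonic mean."""
    classes = sorted(set(true_labels) | set(inferred_labels))
    precisions, recalls = [], []
    for c in classes:
        pairs = list(zip(true_labels, inferred_labels))
        true_pos = sum(1 for t, p in pairs if t == c and p == c)
        n_inferred = sum(1 for p in inferred_labels if p == c)
        n_true = sum(1 for t in true_labels if t == c)
        precisions.append(true_pos / n_inferred if n_inferred else 0.0)
        recalls.append(true_pos / n_true if n_true else 0.0)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    denom = precision + recall
    f_measure = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical working-mode labels at four time steps.
truth = ["cooperative", "cooperative", "individual", "individual"]
inferred = ["cooperative", "individual", "individual", "individual"]
precision, recall, f_measure = macro_metrics(truth, inferred)
```

In this small example the cooperative class has precision 1 and recall 0.5, and the individual class has precision 2/3 and recall 1, so the macro averages are 5/6 and 0.75.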

The Effects of Duration Modeling.
Our LHHSMM is semi-Markov because it explicitly uses a specific distribution to model the duration for which the team intention is not interrupted. To evaluate the effects of duration modeling, we compare the recognition metrics computed by LHHSMMs with three different duration distributions: Distribution A is the real distribution of the working mode introduced in Section 5.2; Distribution B is a geometric distribution whose expectation is equal to that of Distribution A; Distribution C is the same as Distribution A except that one of its parameters is set to 45. The test dataset is a subset of the 100 traces, which consists of all 51 traces in which the working mode is changed at least once. Besides, we only evaluate the recognition results of the second half of these traces, because, in the beginning phases of the traces, the lack of evidence may affect the recognition performance. The recognition results are still evaluated by precision and recall. Because we do not need to show the metrics at different phases, the set of recognized objects does not depend on the phase, and we recognize the working modes (WM), targets of agent A (TA), and targets of agent B (TB) at all steps of the second half of the 51 traces. Recognition metrics with different duration distributions are shown in Figure 14.
The blue, green, and red bars indicate the performances of our LHHSMMs with Distribution A, Distribution B, and Distribution C, respectively. The comparison results are the same for all six terms: (a) Distribution A performs the best, which shows that the type of duration distribution is important for recognizing the working mode and the target of each agent; (b) Distribution B performs better than Distribution C, which shows that the expectation of the duration distribution has a large impact on the recognition results. These comparisons show that precise duration modeling of the working mode is necessary and that our semi-Markov framework is effective in team intention recognition.

Conclusions and Future Works
In this paper, we proposed an LHHSMM to recognize team intentions. As a fusion of the LHHMM, LHSMM, and MDP, the LHHSMM has several advantages for solving team intention recognition problems in a complex environment: first, it uses a logical predicate to represent the team working mode, which can be recognized together with the goal of each agent; second, it has a hierarchical structure and can make use of domain knowledge to represent complex tasks; third, due to the modeling of intention duration, the LHHSMM can update the probabilities of the results correctly even if the intention is interrupted and changed; last, the LHHSMM can deal with noisy and partially missing observations.
To solve the inference problems of the LHHSMM, an LPF is proposed based on the logical definitions and dependencies of the LHHSMM. We also design a combat scenario to evaluate the LHHSMM and the LPF; the results show the following: first, whether or not intentions are changed, our methods can effectively recognize the team working mode and the target of each agent; second, the LHHSMM outperforms the LHHMM in precision, recall, and F-measure when intentions are interrupted with a high probability; last, explicit duration modeling of the team working mode is effective in team intention recognition.
In the future, we would like to continue our research in two directions: (a) learning the parameters of the LHHSMM and (b) studying how to obtain the optimal importance distribution in the LPF. Moreover, applying our model to recognize intentions in a real scenario is also of interest.

Example 4. The world state is ⟨Variables: Position A; Constants:  1 ,  2 ,  3 ,  4 ⟩, which means that the world state consists of one variable, the position of agent A, and there are four possible positions in total. The observation is ⟨Functions: Obs (Position A); Constants:  1 ,  2 ,  3 ,  4 , null⟩; the observed value of Position A may or may not reflect the real position of agent A, and the relation between the world state and the observation is represented by the conditional probability Pr(observation | world state).

Figure 4 (= 1 .
Figure 4(c),  +1  = 0, and the value of    depends on a conditional distribution (   |   ,    ).Particularly, we set (   = 0 |   ,    ) = 1, for all the policies which are not any head of logical transition in Δ  .In other words, this kind of policies can only be terminated when  +1  = 1. Figure 5 shows subnetwork for intention termination.Figure 5(a) shows that    depends on a ground relation symbol    and two variables   and   .In Figure 5(b), when   < 1,

Figure 7: An example of team policies represented by a two-layer LHMC.

Figure 8: The initial situation of the battlefield.

Figure 9: A decomposed LHMC representation of the team policies.

Figure 10: The recognition results of trace number 5 computed by the LHHSMM.

Figure 11: The recognition results of trace number 17 computed by the LHHSMM.

Figure 12: The recognition results of trace number 57 computed by the LHHSMM.

Figure 14: Recognition metrics with different duration distributions.
Figure 6: The full DBN representation in two time slices.

Table 1: The instantiation probabilities of As given the value of  1 .

Table 2: The details of three traces.