
In high-stakes situations decision-makers are often risk-averse, and decisions are often made in group settings. This paper studies multiagent decision-theoretic planning under the Markov decision process (MDP) framework, taking into account how an agent’s risk attitude changes as the agent’s wealth level varies. Based on the one-switch utility function, which describes how an agent’s risk attitude changes with wealth level, we give additive and multiplicative aggregation models of group utility and adopt maximization of expected group utility as the planning objective. The characteristics of the optimal policy as the wealth level approaches negative infinity are analyzed for the additive and multiplicative aggregation models, respectively. A backward-induction method is then proposed that divides the wealth level interval from negative infinity to the initial wealth level into subintervals and determines the optimal policy for each state and subinterval. The proposed method is illustrated by numerical examples, and the influence of the agents’ risk aversion parameters and weights on group decision-making is also analyzed.

Decision-theoretic planning computes an optimal policy, that is, a course of action that maximizes expected reward when actions have uncertain outcomes [

For decision-theoretic planning problems, the Markov decision process (MDP) framework is widely adopted as the underlying model. Howard and Matheson, in their seminal paper, introduced risk-sensitive MDPs based on maximizing the expected exponential utility [

In reality, decision-making often takes place in group settings because a single decision-maker’s ability is limited. In group decision-making, group utility is usually obtained by aggregating personal utilities, and group decisions are then made on the basis of the group utility. Aggregation methods include the additive value model and the multiplicative value rule. Other methods such as multiobjective linear programming [

This paper focuses on the decision-theoretic planning problem in which sequential decisions are made by a group of risk-sensitive members. Taking into account each agent’s risk attitude and wealth level, it studies the risk-sensitive multiagent decision-theoretic planning problem based on the one-switch utility function and the MDP framework. Two group utility functions are given, based respectively on the additive value model and the multiplicative value model of one-switch utility functions. For both kinds of group utility function, backward-induction algorithms are proposed to compute the optimal policy of risk-sensitive group decision-making under the MDP framework.

The rest of this paper is organized as follows. One-switch utility function and risk-sensitive MDP model augmented with wealth level are introduced in Section

A one-switch utility function describes how an agent’s risk attitude changes as the agent’s wealth level varies. Specifically, there exists a wealth level

In this paper the goal-directed Markov decision problem (GDMDP) is adopted as the underlying model of the decision-theoretic planning problem [

Formally, a GDMDP consists of a finite set of states

The agent’s action set is

The agent’s execution of action

We also use

For the MDP model augmented with wealth level, the optimal policy maps every combination of a state

For agent

The optimal value

An optimal policy

Similarly, for linear utility function

It is worth noting that differently from Liu and Koenig [

Group utility is the aggregation of personal utilities. Common aggregation methods include the additive value model and the multiplicative value model. In the following sections we discuss the additive and multiplicative value models, respectively, for aggregating personal one-switch utility functions.
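The two aggregation schemes can be illustrated with a minimal Python sketch. It rests on assumptions that are not confirmed by the text above: the one-switch utility is taken in the linear-plus-exponential form U(w) = w - D * gamma^w with D > 0 and 0 < gamma < 1, and the multiplicative model is taken in the common two-agent form k1*U1 + k2*U2 + k*k1*k2*U1*U2. All function and parameter names (`one_switch_utility`, `interaction`, and so on) are hypothetical and chosen only for illustration.

```python
def one_switch_utility(w, D, gamma):
    """Assumed one-switch form: a linear term plus an exponential penalty.

    For 0 < gamma < 1 the exponential term dominates at low wealth
    (risk-averse behavior) and the linear term dominates at high wealth.
    """
    return w - D * gamma ** w


def additive_group_utility(w, agents, weights):
    """Weighted sum of personal utilities; agents is a list of (D, gamma)."""
    return sum(k * one_switch_utility(w, D, g)
               for k, (D, g) in zip(weights, agents))


def multiplicative_group_utility(w, agents, weights, interaction):
    """Assumed two-agent multiplicative form with a product term.

    interaction is the constant scaling the product term; setting it to 0
    recovers the additive model.
    """
    k1, k2 = weights
    u1 = one_switch_utility(w, *agents[0])
    u2 = one_switch_utility(w, *agents[1])
    return k1 * u1 + k2 * u2 + interaction * k1 * k2 * u1 * u2


# Illustrative parameters only (not taken from the paper).
agents = [(1.0, 0.5), (2.0, 0.5)]
weights = (0.5, 0.5)
add_u = additive_group_utility(0.0, agents, weights)
mul_u = multiplicative_group_utility(0.0, agents, weights, interaction=-0.1)
```

In this sketch the product term vanishes when `interaction` is zero, so the multiplicative value coincides with the additive one in that case; this is a property of the assumed form rather than a result quoted from the paper.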

In general, additive aggregation model of group utility is defined as follows:

Thus the additive aggregation model of one-switch utility functions is defined as follows:

For all policies

In the paper we adopt the following multiplicative aggregation model of group utility:

For simplicity, in the paper we only consider the case

For

To solve for the optimal policy under the additive and multiplicative aggregation models of one-switch utility functions, a backward-induction method is adopted. In this paper the value range of the wealth level is a continuous interval

For additive aggregation model of one-switch utility functions, if agent

For all optimal policies

Thus

On the other hand, for all optimal policies

Lemma

For multiplicative aggregation model of one-switch utility functions,

For all optimal policies

Then

Therefore,

Lemma

The above section gives the optimal policy as the wealth level approaches negative infinity for the additive and multiplicative aggregation models of one-switch utility functions, respectively. The next step is to divide the wealth level interval and to determine the wealth level thresholds and the optimal policies in the resulting subintervals by using the backward-induction method. In this section we discuss the backward-induction method for the additive and the multiplicative aggregation models in turn.
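The idea of a wealth level threshold can be illustrated on a deliberately simplified case. The sketch below is not the backward-induction algorithm of this paper: it considers a single decision between two one-step lotteries, assumes the one-switch form U(w) = w - D * gamma^w and additive aggregation, and locates by bisection the wealth level at which the group's preference switches. The lotteries, parameter values, and function names are hypothetical.

```python
import math


def one_switch_utility(w, D, gamma):
    # Assumed one-switch form (linear term plus exponential penalty).
    return w - D * gamma ** w


def expected_group_utility(w, lottery, agents, weights):
    """Expected additive group utility at wealth w.

    lottery is a list of (probability, reward) pairs; agents is a list of
    (D, gamma) pairs; weights are the aggregation weights.
    """
    return sum(
        p * sum(k * one_switch_utility(w + r, D, g)
                for k, (D, g) in zip(weights, agents))
        for p, r in lottery
    )


def switch_threshold(safe, risky, agents, weights,
                     lo=-50.0, hi=50.0, tol=1e-9):
    """Bisection for the wealth level where the group is indifferent.

    Assumes the safe lottery is preferred at lo, the risky one at hi,
    and that the preference changes sign exactly once in between.
    """
    def gap(w):
        return (expected_group_utility(w, risky, agents, weights)
                - expected_group_utility(w, safe, agents, weights))

    if not (gap(lo) < 0.0 < gap(hi)):
        raise ValueError("no sign change of the utility gap on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical lotteries: a certain cost of 3 versus a gamble with a
# higher expected reward (-2.5) but a worse worst case (-4).
safe = [(1.0, -3.0)]
risky = [(0.5, -1.0), (0.5, -4.0)]

single = switch_threshold(safe, risky, [(1.0, 0.5)], [1.0])
group = switch_threshold(safe, risky, [(1.0, 0.5), (4.0, 0.5)], [0.5, 0.5])
```

For the single agent with D = 1 and gamma = 0.5 the utility gap reduces to 0.5 - 0.5^w, so the switch occurs at w = 1. With equal weights and a second agent with D = 4, the gap becomes 0.5 - 2.5 * 0.5^w and the switch moves to w = log2(5), between the two individual thresholds of 1 and 3. Under these assumptions the more risk-averse agent pulls the group threshold upward.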

For the additive aggregation model of one-switch utility functions, we first give the following theorem to prove the existence of a wealth level threshold

For all optimal policies

Theorem

After getting

From the algorithm above, we can get the wealth level threshold

By maximizing the expected exponential utility, get the optimal policy

According to (

For all states

Calculate the wealth level threshold

For the wealth level interval

If, for all

For the multiplicative aggregation model of one-switch utility functions, we also have the following theorem that shows the existence of a wealth level threshold

For all optimal policies

Similarly to additive aggregation model, we determine the wealth level threshold

Then, we can get the wealth level threshold

After getting

By maximizing the expected exponential utility, get the optimal policy

According to (

For all states

Calculate the wealth level threshold

For the wealth level interval

If, for all

Consider a simple GDMDP model. There are two agents named Agent_{1} and Agent_{2} with risk aversion parameters

System state transitions.

Without loss of generality, the GDMDP model’s parameters are assumed as follows:

First, consider the situation that each agent makes decisions alone. The agent’s optimal policy and the wealth level threshold are solved by utilizing the method proposed by Liu and Koenig [_{1} makes decisions alone, the optimal policy is taking action _{2} makes decisions alone. If they make decisions together, then action

Now we consider how the wealth level threshold

The change of the wealth level threshold of group decision-making with different risk aversion parameters of agents.

Figure _{2} who is more risk-averse even if

Consider the situation that two agents have similar risk attitude; that is, their risk aversion parameters are similar; for example, their one-switch utility functions are _{1} when

The change of the wealth level threshold of group decision-making with similar risk aversion parameters of agents.

Finally, we consider group decision-making based on the multiplicative aggregation model and especially focus on the influence of product term of group utility, that is,

The change of the wealth level threshold of group decision-making based on the multiplicative aggregation model.

The two curved lines gradually approach each other to a point when the absolute value of _{1} and Agent_{2}; therefore the wealth level threshold of multiplicative aggregation model will approach the threshold of additive aggregation model with

This paper has studied how to extend a single agent’s risk-sensitive decision-theoretic planning under the MDP framework to the multiagent problem. Based on the one-switch utility function, which describes an agent’s risk-sensitive attitude, additive and multiplicative aggregation models of group utility have been proposed. According to the characteristics of the group utility, a backward-induction method has been presented to divide the wealth level interval and compute the optimal policy. The paper has also offered numerical examples and discussed how the weights and risk aversion parameters influence group decision-making. The numerical examples show that, for the additive aggregation model, the risk aversion parameters have an obvious influence on group decision-making when they differ between agents, while the weights of the agents play a critical role when the risk aversion parameters are similar. For the multiplicative aggregation model, group decision-making is not completely dominated by the weights of the individuals; the product term of the group utility also influences the group decision.

In the future we intend to further study multiattribute group decision-making under the MDP framework with one-switch utility function. Based on the work of Tsetlin and Winkler [

Let

Assume that there exists a state

Additionally, as

Let

Assume that there exists some state

Additionally, as

The authors declare that there is no conflict of interests regarding the publication of this paper.

This work was supported by the National Natural Science Foundation of China under Grant 70971048.