A novel quantum-inspired decision making model for intelligent agents is proposed, and a formal, generalized solution to the problem is given. Mathematically, the proposed model can represent higher-dimensional decision problems than previous research. Four experiments are conducted, and for each one both the empirical results and the results of the proposed model are reported. The experiments show that the results of the proposed model agree closely with the empirical results. The proposed model offers researchers a new direction for addressing cognitive biases when designing intelligent agents.
A decision making model is crucial for building a successful intelligent agent, so the study of decision making models plays a key role in improving agent performance. Traditionally, decision making models are represented and implemented with Bayesian or Markov processes [
Previous studies report two major violations of “rational decision”: violations of the “sure thing principle” and “order effects.” The sure thing principle claims that a human should prefer
Recently, a quantum-mechanics-inspired explanation of “rational violation” has been proposed and tested [
In this paper, a generalized quantum-inspired decision making model (QDM) is proposed. QDM extends previous research findings and can model more complicated decision spaces. Four experiments are conducted to verify QDM, and its results agree with the empirical results almost perfectly. The experiments also show that QDM accounts for cognitive biases in the decision making process. QDM is expected to help researchers model real-life decision making processes and improve the ability of current intelligent agents to generate “human-like” decisions.
This paper rests on three hypotheses. First, because QDM can explain violations of “rational decisions” in human behavior, the authors believe that QDM can produce “human-like” decisions. Second, all decisions in a scenario can be quantified. Third, some parameters are predefined, because the paper focuses mainly on the decision making model itself. The paper offers preliminary results for QDM and its applications. The representation introduced here has its own advantages and limitations. In the future, more theoretical work on QDM is needed, as is a more elegant representation of QDM.
The paper is organized as follows. Section
In this section, the paper defines two types of environment for further discussion. Only two players are considered, in order to simplify the scenario and establish a fundamental analysis of the topic.
The First Type Two Players Game (FTTP) involves two players: Player 1 and Player 2. In this context, at least one of the players is an intelligent agent that is capable of providing and executing the necessary functionality and of making decisions. Mathematically, let
This type of game is used as the main setting in the following sections to describe QDM.
The Second Type Two Players Game (STTP) involves two players: Player 1 and Player 2. In this context, at least one of the players is an intelligent agent that is capable of providing and executing the necessary functionality and of making decisions. Both players share the same decision space
Players receive a reward for every decision they make. A payoff matrix, which assigns rewards to each decision for both players, must be defined before the game starts in both FTTP and STTP. The payoff a player receives is determined by a utility function or utility vector. The elements of the payoff matrix are not necessarily real numbers. However, the payoff has to be a real number because of a limitation of the Hamiltonian operator (explained in Section
Let
Before the game starts, set the initial state
Assume Player 1 makes decision
According to the time-dependent Schrödinger equation, the time evolution is determined by (
The solution to (
A detailed description of the Hamiltonian operator can be found in Section
State
The previous section presented the decision making strategy based on Player 1's decision. In this section, a
This approach will certainly produce a fuzzier decision making result; however, it is extremely important when Player 2 is not able to collect enough evidence to perform two-stage QDM.
The concept of constructing
Equation (
According to quantum mechanics, the Hamiltonian operator in matrix form is required to be a Hermitian matrix, at least to ensure that
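As a minimal sketch of why Hermiticity matters, the example below assumes the standard solution of the time-dependent Schrödinger equation for a time-independent Hamiltonian, ψ(t) = exp(−iHt) ψ(0), in units where ħ = 1. The 2×2 matrix `H`, the initial state `psi0`, the evolution time, and the function name `evolve` are illustrative assumptions, not values or names taken from this paper.

```python
import numpy as np


def evolve(hamiltonian, state, t):
    """Return exp(-i * H * t) @ state for a Hermitian matrix H (hbar = 1 assumed)."""
    h = np.asarray(hamiltonian, dtype=complex)
    if not np.allclose(h, h.conj().T):
        raise ValueError("Hamiltonian must be Hermitian")
    # Spectral decomposition H = V diag(w) V^dagger, with real eigenvalues w.
    w, v = np.linalg.eigh(h)
    unitary = v @ np.diag(np.exp(-1j * w * t)) @ v.conj().T
    return unitary @ np.asarray(state, dtype=complex)


# Hypothetical Hamiltonian and uniform initial state (illustrative values only).
H = np.array([[1.0, 0.5], [0.5, -1.0]])
psi0 = np.array([1.0, 1.0]) / np.sqrt(2)

psi_t = evolve(H, psi0, t=np.pi / 2)
probabilities = np.abs(psi_t) ** 2  # squared magnitudes of the amplitudes
print(probabilities, probabilities.sum())
```

Because `H` is Hermitian, exp(−iHt) is unitary, so the squared magnitudes still sum to 1 after the evolution and can be read as probabilities over decisions.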
An adjust matrix
Therefore,
For each
The
Other than the
Let
Employing (
As discussed previously, the payoff matrix and the corresponding utility function/vector must be defined in advance, and they fundamentally affect the result. Some suggestions for these settings are presented in this section.
The concepts of payoff matrix and utility function/vector are borrowed from Game Theory, where they are used to represent a decision space in two dimensions. For certain purposes, the payoff matrix can be abstracted and estimated from the environment. The utility function is used to calculate the expected payoff of a player. There are many ways to define this function in Game Theory and reinforcement learning, and the utility function/vector may be learned during training. A reliable utility function/vector would increase the robustness of QDM.
Usually, the payoff matrix is easy to define or estimate. On the other hand, although the utility function is well defined in Game Theory, the payoff actually received can differ from the mathematical formalization. For example, a well-known assumption in Game Theory is that every participant in the game is purely self-interested (“evil”). Altruism, an important factor in human behavior, is by contrast rarely considered. Including an “altruistic” factor in the utility function may help the model produce more “human-like” decisions.
Prisoner’s Dilemma is a canonical Game Theory problem that has been used to discuss and analyze human behavior and decision making. The payoff matrix is described in Table
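One hedged way to picture such an adjustment is a linear blend of a player's own payoff and the other player's payoff. The weight `altruism`, the function names, and the 50/50 belief about the opponent are assumptions made for illustration only and are not the utility vectors used in the experiments; the payoffs are those of the Prisoner's Dilemma payoff matrix in this paper.

```python
def adjusted_utility(own_payoff, other_payoff, altruism=0.0):
    """Blend own and other payoff; altruism=0 is purely self-interested (hypothetical form)."""
    if not 0.0 <= altruism <= 1.0:
        raise ValueError("altruism must lie in [0, 1]")
    return (1.0 - altruism) * own_payoff + altruism * other_payoff


def expected_utility(outcomes, opponent_probs, altruism=0.0):
    """Expected adjusted utility over the opponent's possible actions.

    outcomes: one (own_payoff, other_payoff) pair per opponent action.
    opponent_probs: assumed probability of each opponent action.
    """
    return sum(
        p * adjusted_utility(own, other, altruism)
        for p, (own, other) in zip(opponent_probs, outcomes)
    )


# Prisoner's Dilemma outcomes as (You, Other), ordered by
# opponent action: [other defects, other cooperates].
IF_YOU_DEFECT = [(10, 10), (25, 5)]
IF_YOU_COOPERATE = [(5, 25), (20, 20)]
BELIEF = [0.5, 0.5]  # assumed belief about the opponent, for illustration

for altruism in (0.0, 0.5):
    defect = expected_utility(IF_YOU_DEFECT, BELIEF, altruism)
    cooperate = expected_utility(IF_YOU_COOPERATE, BELIEF, altruism)
    print(altruism, defect, cooperate)
```

With these assumptions, a purely self-interested player (`altruism=0.0`) prefers “defect,” while a weight of 0.5 reverses the preference toward “cooperate.”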
Payoff matrix of Prisoner's Dilemma.

| | You defect | You cooperate |
|---|---|---|
| Other defects | Other: 10, You: 10 | Other: 25, You: 5 |
| Other cooperates | Other: 5, You: 25 | Other: 20, You: 20 |
Table
Empirical studies and experiment results using QDM on Prisoner's Dilemma (each value is the probability that Player 2 chooses “defect” when Player 1's decision is known to be “defect,” known to be “cooperate,” or unknown).

| Study | Known defect | Known cooperate | Unknown |
|---|---|---|---|
| Shafir and Tversky [ | 97% | 84% | 63% |
| Li and Taplin [ | 83% | 66% | 60% |
| Croson [ | 67% | 32% | 30% |
| Busemeyer et al. [ | 91% | 84% | 66% |
| Average of above | | | |
| QDM | 81% | 65% | 57% |
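The “violation” visible in this table can be checked mechanically. Under classical probability, P(defect | unknown) is a weighted average of P(defect | known defect) and P(defect | known cooperate), so it must lie between the two. The sketch below tests that bound for each row; the function name and data layout are illustrative choices, and the percentages are copied from the table.

```python
# (known defect, known cooperate, unknown), in percent, copied from the table.
STUDIES = {
    "Shafir and Tversky": (97, 84, 63),
    "Li and Taplin": (83, 66, 60),
    "Croson": (67, 32, 30),
    "Busemeyer et al.": (91, 84, 66),
    "QDM": (81, 65, 57),
}


def violates_total_probability(known_defect, known_cooperate, unknown):
    """True if the 'unknown' rate falls outside the range spanned by the two known rates."""
    low, high = sorted((known_defect, known_cooperate))
    return not (low <= unknown <= high)


violations = {name: violates_total_probability(*row) for name, row in STUDIES.items()}
for name, violated in violations.items():
    print(f"{name}: {'violates' if violated else 'satisfies'} the classical bound")
```

Every row, including the QDM row, falls below the classical lower bound, which is the pattern a Bayesian or Markov model cannot reproduce without additional assumptions.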
The experiment is set as follows.
The state vector
Initial state
If Player 1 chooses “defect,” the state vector changes to
Rotation matrix
Set
With the settings above, the experiment yields the following results.
When Player 1 is known to choose “defect,” the probability vector is
When Player 1 is known to choose “cooperate,” the probability vector is
When Player 1's decision is unknown, the probability vector is
Therefore, the probability of Player 2's decision, in this case “defect,” is
Splitting Money Game is also a frequently used example in Game Theory. The game is described as follows. You and your friend are splitting 7 dollars. Your friend makes you an offer between 0 and 7 dollars, such as 3 dollars or 5 dollars. If you accept the offer, you receive that amount and your friend takes the rest. However, if you reject the offer, neither of you receives anything, and the money is donated. The payoff matrix is shown in Table
Payoff matrix of Splitting Money Game.

| Offer | 0$ | 1$ | 2$ | 3$ | 4$ | 5$ | 6$ | 7$ |
|---|---|---|---|---|---|---|---|---|
| Accept | You: 0$, Other: 7$ | You: 1$, Other: 6$ | You: 2$, Other: 5$ | You: 3$, Other: 4$ | You: 4$, Other: 3$ | You: 5$, Other: 2$ | You: 6$, Other: 1$ | You: 7$, Other: 0$ |
| Reject | You: 0$, Other: 0$ | You: 0$, Other: 0$ | You: 0$, Other: 0$ | You: 0$, Other: 0$ | You: 0$, Other: 0$ | You: 0$, Other: 0$ | You: 0$, Other: 0$ | You: 0$, Other: 0$ |
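The payoff matrix above follows directly from the rules of the game, so it can be generated rather than typed in. The sketch below is one way to do that; the names `TOTAL`, `payoff`, and `payoff_matrix` are illustrative and do not come from the paper.

```python
TOTAL = 7  # dollars being split


def payoff(offer, accept):
    """Return (you, other) payoffs in dollars for a given offer and decision."""
    if not 0 <= offer <= TOTAL:
        raise ValueError("offer must be between 0 and TOTAL")
    # Accepting gives you the offer and your friend the rest;
    # rejecting leaves both players with nothing.
    return (offer, TOTAL - offer) if accept else (0, 0)


payoff_matrix = {
    decision: [payoff(offer, decision == "accept") for offer in range(TOTAL + 1)]
    for decision in ("accept", "reject")
}

for decision, row in payoff_matrix.items():
    print(decision, row)
```

Since accepting never pays less than rejecting, a purely payoff-maximizing player accepts every offer, which is the Game Theory prediction that the survey results contradict.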
An anonymous online survey of this game was conducted with 302 respondents. The results are shown in Table
The Game Theory prediction, survey result, and experiment results using QDM on Splitting Money Game.

| Offer | 0$ | 1$ | 2$ | 3$ | 4$ | 5$ | 6$ | 7$ | Unknown |
|---|---|---|---|---|---|---|---|---|---|
| Game Theory accept | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Game Theory reject | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Survey accept | 15.19% | 16.19% | 21.28% | 68.42% | 82.58% | 61.07% | 54.96% | 54.42% | 46.90% |
| Survey reject | 84.81% | 83.81% | 78.72% | 31.58% | 17.42% | 38.93% | 45.04% | 45.58% | 53.10% |
| QDM accept | 14.90% | 16.15% | 20.97% | 68.34% | 83.21% | 60.87% | 54.99% | 53.99% | 46.68% |
| QDM reject | 85.10% | 83.85% | 79.03% | 31.66% | 16.79% | 39.13% | 45.01% | 46.01% | 53.32% |
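To make the agreement between the survey and QDM rows concrete, the sketch below computes the per-column absolute difference of the “accept” rates. The helper name `absolute_errors` is an illustrative choice; the numbers are copied from the table, and the error measures are a convenience for the reader rather than a metric reported by the authors.

```python
# "Accept" rates in percent for offers 0$..7$ and the unknown-offer case.
SURVEY_ACCEPT = [15.19, 16.19, 21.28, 68.42, 82.58, 61.07, 54.96, 54.42, 46.90]
QDM_ACCEPT = [14.90, 16.15, 20.97, 68.34, 83.21, 60.87, 54.99, 53.99, 46.68]


def absolute_errors(observed, predicted):
    """Element-wise absolute differences between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    return [abs(o - p) for o, p in zip(observed, predicted)]


errors = absolute_errors(SURVEY_ACCEPT, QDM_ACCEPT)
mean_abs_error = sum(errors) / len(errors)
max_abs_error = max(errors)
print(f"mean absolute error: {mean_abs_error:.2f} percentage points")
print(f"max absolute error:  {max_abs_error:.2f} percentage points")
```

Every column differs by less than one percentage point, whereas the Game Theory prediction of 100% acceptance is far from the survey for low offers.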
The experiment is set as follows.
The state vector
Initial state
Stage vector for offer
Rotation matrix
Set utility vector as
With the settings above, the experiment yields the following results.
For each known offer, the experiment produces a vector of probabilities that Player 2 chooses “accept”:
For each known offer, the experiment produces a vector of probabilities that Player 2 chooses “reject”:
For an unknown offer, the probability vector is
The Price is Right is a game in which participants must choose the same price as their opponent in order to win. The game is described as follows. A casino in Las Vegas proposes a new game. The dealer gives you four cards, each with a price on it; for example, card 1 is 1000$, card 2 is 2000$, and so on. Before the game starts, the dealer secretly writes down the price from one of the cards and seals it in an envelope; a witness makes sure that nobody can touch the envelope during the game. Once the game starts, you choose one of the cards. After you have made your choice, the witness opens the envelope and the dealer judges the result according to the following rules.
If the price on the card you chose is the same as the price that was written down, you win that amount of money. For example, if you choose the card with 1000$ and the dealer also wrote 1000$, you win 1000$.
If the price on the card you chose is different from the price that was written down, you lose and must pay half of the difference. For example, if you choose the card with 1000$ but the dealer wrote 4000$, then you need to pay (4000$ − 1000$)/2 = 1500$.
The payoff matrix of The Price is Right is presented in Table
Payoff matrix of The Price is Right.

| Offer | 1000$ | 2000$ | 3000$ | 4000$ |
|---|---|---|---|---|
| 1000$ | You: 1000$, Dealer: −1000$ | You: −500$, Dealer: 500$ | You: −1000$, Dealer: 1000$ | You: −1500$, Dealer: 1500$ |
| 2000$ | You: −500$, Dealer: 500$ | You: 2000$, Dealer: −2000$ | You: −500$, Dealer: 500$ | You: −1000$, Dealer: 1000$ |
| 3000$ | You: −1000$, Dealer: 1000$ | You: −500$, Dealer: 500$ | You: 3000$, Dealer: −3000$ | You: −500$, Dealer: 500$ |
| 4000$ | You: −1500$, Dealer: 1500$ | You: −1000$, Dealer: 1000$ | You: −500$, Dealer: 500$ | You: 4000$, Dealer: −4000$ |
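The two rules of the game are enough to regenerate this payoff matrix. The sketch below does so under the reading that the dealer's payoff is the negative of the player's, as the table shows; the names `PRICES`, `player_payoff`, and `build_matrix` are illustrative and not taken from the paper.

```python
PRICES = [1000, 2000, 3000, 4000]  # card prices in dollars


def player_payoff(chosen, written):
    """Win the price on a match; otherwise pay half of the difference."""
    if chosen == written:
        return chosen
    return -(abs(chosen - written) // 2)


def build_matrix(prices):
    """Rows and columns are indexed by price; each cell is a (you, dealer) pair."""
    return [
        [(player_payoff(c, w), -player_payoff(c, w)) for w in prices]
        for c in prices
    ]


matrix = build_matrix(PRICES)
for price, row in zip(PRICES, matrix):
    print(price, row)
```

The matrix is symmetric in the two prices, so the same table results whether its rows are read as the player's choice or as the dealer's written price.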
The Game Theory prediction, survey result, and experiment results using QDM on The Price is Right.

| Offer | 1000$ | 2000$ | 3000$ | 4000$ |
|---|---|---|---|---|
| Survey's choice | 13.89% | 20.83% | 44.44% | 20.83% |
| QDM choice | 15.12% | 16.64% | 45.72% | 22.52% |
The experiment is set as follows.
The state vector
Initial state
First stage vector for different
Rotation matrix
For
For
Set utility vector as
With the settings above, the experiment yields the following results.
For choosing 1000$: 0.1512.
For choosing 2000$: 0.1664.
For choosing 3000$: 0.4572.
For choosing 4000$: 0.2252.
A Sheriff’s Dilemma is a classic Bayesian Game in Game Theory. A Bayesian Game uses multiple payoff matrices, each with a corresponding probability, to describe the scenario. The game is described as follows. You, the sheriff, are facing a suspect. The suspect has a gun. You are pointing guns at each other, and you must now decide whether to shoot him (assume there is no way to talk). The suspect may be a criminal but may also be innocent. Here the two cases are assumed to be equally likely, which means that you cannot really tell whether the suspect is a criminal or innocent. The criminal would rather shoot even if the sheriff does not, because the criminal would be caught if he does not shoot. The innocent suspect would rather not shoot even if the sheriff shoots. The payoff matrix is presented in Table
Payoff matrix of A Sheriff's Dilemma.

If the suspect is innocent:

| | Shoot | Not Shoot |
|---|---|---|
| Shoot | You: −3, Suspect: −1 | You: −1, Suspect: −2 |
| Not Shoot | You: −2, Suspect: −1 | You: 0, Suspect: 0 |

If the suspect is a criminal:

| | Shoot | Not Shoot |
|---|---|---|
| Shoot | You: 0, Suspect: 0 | You: 2, Suspect: −2 |
| Not Shoot | You: −2, Suspect: −1 | You: −1, Suspect: −1 |
An anonymous online survey of this game was conducted with 89 respondents. The results are shown in Table
The Game Theory prediction, survey result, and experiment results using QDM on A Sheriff's Dilemma.

| | Known suspect shoot | Known suspect not shoot | Unknown suspect shoot/not shoot |
|---|---|---|---|
| Survey shoot | 88.76% | 26.97% | 61.80% |
| Survey not shoot | 11.24% | 73.03% | 38.20% |
| QDM shoot | 87.52% | 29.29% | 58.27% |
| QDM not shoot | 12.48% | 70.71% | 41.73% |
The experiment is set as follows.
The state vector
When
When
When
Initial state
First stage vector for different
Rotation matrix
Set utility vector as
With the settings above, the experiment yields the following results.
When the suspect is known to shoot, the experiment gives a probability of 0.8752 that the sheriff chooses to shoot.
When the suspect is known not to shoot, the experiment gives a probability of 0.2929 that the sheriff chooses to shoot.
When it is unknown whether the suspect shoots, the experiment gives a probability of 0.5827 that the sheriff chooses to shoot.
This paper introduced a generalized quantum-inspired decision making model for intelligent agents, and the proposed model was verified by four experiments. The model aims to provide a tool that lets an intelligent agent make “human-like” decisions instead of “machine-like” decisions. Even though this paper limits the setting to two players, two-dimensional decision spaces are in fact the foundation of multiagent environments. Furthermore, the presented model is able to represent much larger and more complex decision spaces than previous research.
Some future work is considered. The first problem is that the number of decisions does not always follow
The authors declare that there is no conflict of interest regarding the publication of this paper.
The authors acknowledge a scholarship from the University of Malaya (Fellowship Scheme). The research is supported in part by HIR Grant UM.C/625/1/HIR/MOHE/FCSIT/10 from the University of Malaya.