A Generalized Quantum-Inspired Decision Making Model for Intelligent Agent

A novel decision making model for intelligent agents using a quantum-inspired approach is proposed. A formal, generalized solution to the problem is given. Mathematically, the proposed model can handle higher-dimensional decision problems than previous research. Four experiments are conducted, and for each one both the empirical results and the proposed model's results are reported. The experiments show that the model's results agree closely with the empirical results. The proposed model offers researchers a new direction for addressing cognitive biases in the design of intelligent agents.


Introduction
A decision making model is crucial to building a successful intelligent agent, so the study of such models plays a key role in improving agent performance. Traditionally, decision making is represented and implemented with Bayesian or Markov processes [1,2]. However, traditional methods often introduce bias at the design stage, and many empirical findings in cognitive science in recent years indicate that humans routinely violate the "rational decisions" these methods produce [3][4][5][6]. Hence, researchers in cognitive science, psychology, neuroscience, and artificial intelligence have proposed various explanations to complement the traditional methods [7][8][9]. However, none of these explanations resolves all of the observed violations of "rational decisions" completely.
Two major violations of "rational decision" making have been found in previous studies: the "sure thing principle" and "order effects." The sure thing principle claims that a person should prefer action A over action B if A is always at least as good as B in every possible state of the world. Tversky and Shafir [10] tested this principle in a simple two-stage gambling experiment and observed violations of it. The order effect shows that human decision patterns violate a fundamental requirement of classical probability theory, Pr(A ∩ B | H) = Pr(B ∩ A | H), which implies Pr(H | A ∩ B) = Pr(H | B ∩ A) according to Bayes' rule [9]. Some game theorists regard these violations of "rational decision" making as "wrong decisions" or "stupid decisions." For example, violating the sure thing principle is theoretically an obvious mistake, yet people routinely choose otherwise even when they completely understand the risks and benefits of the scenario, as in [10]. The order effect challenges classical theory even more fundamentally: commutativity does not hold in the human decision making process, so analyses based on classical probability theory introduce serious bias into models of that process. Traditional methods should therefore be enhanced or replaced for modeling and describing human decision making.
Recently, a quantum mechanics inspired explanation of these "rational violations" has been proposed and tested [11][12][13]. The quantum explanation accounts for both of the violations described above. Essentially, the noncommutativity and the superposition principle of quantum-inspired probability theory are natural tools for explaining them. Furthermore, the approach can also produce "human-like" decisions in simulation [11][12][13][14]. "Human-like" in this paper means that the intelligent agent makes decisions similar to those a human would make in the same scenario. On the other hand, previous research did not model the complex environments and decision spaces needed for practical implementation on an intelligent agent.
In this paper, a generalized quantum-inspired decision making model (QDM) is proposed. QDM extends previous research findings and models more complicated decision spaces. Four experiments were conducted to verify QDM, and in each the model's results agree with the empirical results almost perfectly. The cognitive biases in the decision making process are resolved in these experiments. QDM is expected to help researchers model real-life decision making and improve current intelligent agents' ability to generate "human-like" decisions.
This paper rests on three hypotheses. First, because QDM can explain human violations of "rational decisions," the authors believe that QDM can produce "human-like" decisions. Second, all decisions in a scenario can be quantified. Third, some parameters are predefined, because the paper focuses on the decision making model itself. The paper offers a preliminary account of QDM and its applications. The representation introduced here has its own advantages and limitations; more theoretical work on QDM is needed in the future, as is a more elegant representation.
The paper is organized as follows. Section 2 presents the methodology and mathematical description, Section 3 reports experiments that verify the model, and Section 4 concludes the paper and discusses future work.

Environment Setting.
In this section, two types of environment are defined for further discussion. Only two players are considered, in order to simplify the scenario and establish a fundamental analysis of the topic.

First Type Two Players Game.
A First Type Two Players Game (FTTP) contains two characters: Player 1 and Player 2. At least one of the players is an intelligent agent capable of providing and executing the necessary functionality and making decisions. Mathematically, let A = {a_1, a_2, ..., a_m | m ≥ 1} be Player 1's decision space and B = {b_1, b_2, ..., b_n | n = 2^k, k ≥ 1} be Player 2's decision space. Elements of A and B can carry any semantic description: names, codes, and so on.
This type of game is used as the main setting in the following sections to describe QDM.

Second Type Two Players Game.
A Second Type Two Players Game (STTP) contains two players: Player 1 and Player 2. At least one of the players is an intelligent agent capable of providing and executing the necessary functionality and making decisions. Both players share the same decision space D = {d_1, d_2, ..., d_n | n = 2^k, k ≥ 1}; as before, the elements of D can carry any semantic description. During the game, Player 1 performs a decision from D, and Player 2 has to select an appropriate decision from D in response.
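As a concrete illustration, the two environment types above can be sketched in code. The class and field names below are illustrative, not from the paper; only the constraints on the decision-space sizes (m ≥ 1 and n = 2^k) come from the definitions above.

```python
from dataclasses import dataclass

def _is_power_of_two(n):
    return n >= 2 and n & (n - 1) == 0

@dataclass
class FTTPGame:
    """First Type Two Players Game: separate decision spaces A and B."""
    player1_decisions: list  # A = {a_1, ..., a_m}, m >= 1
    player2_decisions: list  # B = {b_1, ..., b_n}, n = 2^k, k >= 1

    def __post_init__(self):
        assert len(self.player1_decisions) >= 1, "|A| must satisfy m >= 1"
        assert _is_power_of_two(len(self.player2_decisions)), "|B| must be 2^k"

@dataclass
class STTPGame:
    """Second Type Two Players Game: one shared decision space D."""
    decisions: list  # D = {d_1, ..., d_n}, n = 2^k, k >= 1

    def __post_init__(self):
        assert _is_power_of_two(len(self.decisions)), "|D| must be 2^k"

# Decision labels are arbitrary semantic descriptions, as the text notes.
game = FTTPGame(["cooperate", "defect"], ["cooperate", "defect"])
```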

Payoff.
Players receive rewards for performing decisions. A payoff matrix assigning a reward to each decision for both players is defined, and it must be produced before the game starts in both FTTP and STTP. The payoff a player actually receives is determined by a utility function or utility vector. The elements of the payoff matrix are not necessarily real numbers; however, the payoff itself has to be a real number, due to a limitation of the Hamiltonian operator (explained in Section 2.4).

Two-Stage Quantum Decision Model.
The Two-Stage Quantum Decision Model assumes that Player 1 makes a decision a_x and Player 2 then has to react with an appropriate decision b_y.
Before the game starts, the initial state Ψ_0 is set to the uniform superposition

Ψ_0 = (1/√(mn)) (1, 1, ..., 1)^T, (1)

where there are m × n elements in the state vector.

Stage One.
Assume Player 1 makes decision a_x and Player 2 recognizes it successfully; the state vector is transformed to a state Ψ_1 that rules out the other possibilities and retains only those consistent with a_x:

Ψ_1 = (1/√n) (0, ..., 0, 1, 1, ..., 1, 0, ..., 0)^T, (2)

where the n entries corresponding to a_x are 1 and all other entries are 0.

Stage Two.
According to the time-dependent Schrödinger equation, the time evolution is determined by (3), where i is the imaginary unit, i = √−1:

i dΨ(t)/dt = H Ψ(t). (3)

The solution to (3) is

Ψ(t) = e^(−iHt) Ψ(0), (4)

where t ≥ 0 and the Hamiltonian operator H is determined by the sum of two matrices:

H = H_A + H_B. (5)

The detailed description of the Hamiltonian operator can be found in Section 2.4.
State Ψ_1 is transformed to Ψ_2 by applying (4) with a given time t. Note that (4) is not the conventional representation of the time-dependent Schrödinger equation: in this paper the Planck constant ℏ is omitted (set to 1), and the authors assume that Ψ_1 and Ψ_0 are associated with t = 0.
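A minimal numerical sketch of the two-stage evolution follows, assuming the uniform-within-block form of Ψ_1 and applying e^(−iHt) via an eigendecomposition. The Hamiltonian used here is a toy block-diagonal example, not the full construction of Section 2.4.

```python
import numpy as np

def evolve(H, psi, t):
    """Apply the unitary exp(-iHt) to psi via eigendecomposition (H Hermitian)."""
    w, V = np.linalg.eigh(H)
    return V @ (np.exp(-1j * w * t) * (V.conj().T @ psi))

def two_stage_probabilities(H, m, n, x, t):
    """Two-stage QDM sketch: Player 1's decision a_x is known (stage one),
    then the state evolves under H for time t (stage two)."""
    psi1 = np.zeros(m * n, dtype=complex)
    psi1[x * n:(x + 1) * n] = 1 / np.sqrt(n)   # stage one: collapse onto block x
    psi2 = evolve(H, psi1, t)                  # stage two: Schrodinger evolution
    return np.abs(psi2) ** 2                   # Born-rule probabilities

# Toy Hermitian Hamiltonian (illustrative only; u is a stand-in payoff).
u = 1.0
HA = np.array([[u, 1.0], [1.0, -u]]) / np.sqrt(1 + u ** 2)
H = np.kron(np.eye(2), HA)                     # one block per Player 1 option
p = two_stage_probabilities(H, m=2, n=2, x=0, t=np.pi / 2)
assert np.isclose(p.sum(), 1.0)                # unitary evolution keeps total at 1
```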

One-Stage Quantum Decision Model.
The previous section presented a decision making strategy based on Player 1's decision. This section describes a one-stage quantum decision model, which does not require Player 1's decision as a reference and makes a decision directly.
This approach certainly produces a fuzzier decision result; however, it is extremely important when Player 2 cannot collect enough evidence to perform the two-stage QDM.
The concept behind the one-stage QDM is the same as in Section 2.2.2. Equation (4) is modified to

Ψ_2 = e^(−iHt) Ψ_0. (6)

Equation (6) indicates that the state Ψ_0 is transformed directly to Ψ_2 without knowing Ψ_1, which may seem unreasonable. In fact, by the superposition principle, (6) can be viewed as a summation over all possible Ψ_1:

Ψ_0 = (1/√m) Σ_x Ψ_1^x, (7)

where Ψ_1^x represents the state in which Player 1 makes decision a_x.
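Under the same assumptions as the two-stage sketch, the one-stage case simply replaces the collapsed state with the uniform initial state Ψ_0 of (1); a sketch (the Hamiltonian is again a toy example):

```python
import numpy as np

def one_stage_probabilities(H, m, n, t):
    """One-stage QDM sketch: Player 1's decision is unknown, so the evolution
    starts from the uniform superposition psi0 over all m*n basis states."""
    psi0 = np.full(m * n, 1 / np.sqrt(m * n), dtype=complex)
    w, V = np.linalg.eigh(H)                   # H Hermitian => exp(-iHt) unitary
    psi2 = V @ (np.exp(-1j * w * t) * (V.conj().T @ psi0))
    return np.abs(psi2) ** 2

u = 1.0
HA = np.array([[u, 1.0], [1.0, -u]]) / np.sqrt(1 + u ** 2)
H = np.kron(np.eye(2), HA)                     # toy block-diagonal Hamiltonian
p = one_stage_probabilities(H, m=2, n=2, t=np.pi / 2)
assert np.isclose(p.sum(), 1.0)                # probabilities remain normalized
```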

Hamiltonian Operator.
According to quantum mechanics, the Hamiltonian operator in matrix form must be at least a Hermitian matrix, to ensure that e^(−iHt) is a unitary operator. Due to this property of Hermitian matrices, the payoff has to be a real number. The Hamiltonian is used to rotate the state vector to the desired basis. A suggested construction of H_A and H_B is presented below.
H_A is a block diagonal matrix, H_A = diag(H_{A,1}, H_{A,2}, ..., H_{A,m}), which rotates the state toward the desired decision according to Player 1's decision; each block H_{A,x} ∈ R^{n×n} is defined as an edited Hadamard transform, where k is the same k defined in Section 2.1.
An adjust matrix U_x ∈ R^{n×n} is defined from u_x, the payoff Player 2 receives given a_x according to Player 2's utility function or utility vector. H_{A,x} is then defined as the entrywise product of the adjust matrix and the Hadamard transform, scaled so that

H_{A,x} = (1/√(u_x² + 2^k − 1)) (U_x ∘ H_k),

where 1/√(u_x² + 2^k − 1) is a scalar, H_k is the Hadamard transform, and "∘" represents the entrywise product.

H_B.
H_B exists in STTP only; in FTTP, H_B is set to the null matrix. Cognitive science findings suggest that if Player 1 chooses an action, Player 1 tends to think that Player 2 will choose the same decision. H_B is constructed to satisfy the following properties:
(i) H_B is a symmetric matrix.
(ii) The x-th row and x-th column of H_B must be full of 1s.
(iii) Other than the x-th row, each row contains n/2 positive 1s and n/2 negative 1s.
Let M_0 = 0 be the initial matrix, where 0 represents the null matrix. Let the index of H_B be (i_1, j_1) and the index of M_0 be (i_2, j_2); each 0 in M_0 is replaced by the corresponding element of H_B using relationship (11), applied for every x ∈ [1, n]. Normalizing the final product from above scales the result by −c/√n, where −c/√n is a scalar and c is a constant.
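Properties (i)-(iii) are exactly satisfied by an unnormalized Hadamard matrix whose all-ones row and column have been moved to position x. The index relationship (11) is garbled in this copy, so the sketch below takes that row/column move as one plausible reading, with the constant taken as the payoff u for illustration:

```python
import numpy as np

def HB(k, x, u):
    """Sketch of H_B: a 2^k x 2^k ±1 matrix that is symmetric (i), has all 1s
    in row and column x (ii), and n/2 entries of each sign in every other row
    (iii), scaled by -u/sqrt(n). The row/column swap is an assumed reading of
    the paper's garbled relationship (11)."""
    n = 2 ** k
    M = np.array([[1.0]])
    base = np.array([[1.0, 1.0], [1.0, -1.0]])
    for _ in range(k):
        M = np.kron(M, base)                   # symmetric; row 0 and column 0 all 1s
    M[[0, x]] = M[[x, 0]]                      # swap rows 0 and x
    M[:, [0, x]] = M[:, [x, 0]]                # swap columns 0 and x (keeps symmetry)
    return (-u / np.sqrt(n)) * M

B = HB(k=2, x=1, u=1.0)
assert np.allclose(B, B.T)                     # property (i)
assert np.allclose(B[1], B[1, 0])              # property (ii): row x is constant
```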

Payoff Matrix and Utility Function.
As discussed previously, the payoff matrix and the corresponding utility function/vector must be produced before the game and affect the result fundamentally. This section presents some suggested settings. The concepts of payoff matrix and utility function/vector are borrowed from Game Theory and are useful for representing a decision space in two dimensions. The payoff matrix can, for certain purposes, be abstracted and estimated from the environment. The utility function is used to calculate a player's expected payoff; there are many ways to realize this function in Game Theory and reinforcement learning, and the utility function/vector may be learned during a training process. A reliable utility function/vector increases the robustness of QDM.
Usually, the payoff matrix is easy to define or estimate. On the other hand, although the utility function is well defined in Game Theory, the actually received payoff differs from the mathematical formalization. For example, a famous hypothesis in Game Theory is that every participant in the game is "evil." Altruism, an important facet of humanity, is by contrast rarely considered. Involving an "altruistic" factor to adjust the utility function may help the model produce more "human-like" decisions.
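One way such an "altruistic" adjustment could be expressed is as a convex blend of the two players' payoffs. This is a hypothetical formula, not one given in the paper; the function name and parameter are illustrative:

```python
def adjusted_utility(own_payoff, other_payoff, altruism=0.0):
    """Hypothetical utility adjustment: blend a player's own payoff with the
    opponent's. altruism = 0 recovers the standard 'selfish' utility;
    altruism > 0 gives weight to the other player's outcome."""
    return (1 - altruism) * own_payoff + altruism * other_payoff

assert adjusted_utility(10, 0) == 10           # purely selfish baseline
assert adjusted_utility(10, 0, altruism=0.5) == 5.0
```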

Prisoner's Dilemma.
Prisoner's Dilemma is a canonical Game Theory problem that has been widely used to discuss and analyze human behavior and decision making. The payoff matrix is described in Table 1. In standard Game Theory, the Nash equilibrium requires both parties to defect; empirical studies, however, argue differently. Table 2 presents several well-known empirical studies on Prisoner's Dilemma. The experiment shows that the proposed quantum-inspired decision making model matches the empirical Prisoner's Dilemma results almost perfectly.
The experiment settings follow Section 2, and the resulting probabilities are summarized alongside the empirical studies in Table 2.

Splitting Money Game.
The Splitting Money Game is another frequently used example in Game Theory. The game is described as follows. You and your friend are splitting 7 dollars. Your friend makes you an offer between 0 and 7 dollars, such as 3 dollars or 5 dollars. If you accept the offer, you receive that amount and your friend takes the rest. If you reject the offer, both of you receive nothing and the money is donated. The payoff matrix is shown in Table 3.
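The payoff rule just described can be written down directly (the amounts come from the text; the function name is illustrative):

```python
TOTAL = 7  # dollars being split, per the game description

def splitting_payoffs(offer, accept):
    """Payoffs (you, friend) in the Splitting Money Game: accepting an offer
    of `offer` dollars gives you that amount and your friend the rest;
    rejecting gives both players nothing (the money is donated)."""
    if accept:
        return offer, TOTAL - offer
    return 0, 0

assert splitting_payoffs(3, True) == (3, 4)    # accept a 3-dollar offer
assert splitting_payoffs(5, False) == (0, 0)   # rejection: nobody is paid
```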
An online anonymous survey of this game was conducted and received 302 responses. The results are shown in Table 4.
The experiment is set as in Section 2, with the rotation matrix H_A a block diagonal matrix diag(H_{A,1}, ..., H_{A,m}). By performing this setting, the experiment produces probabilities that closely match the survey results in Table 4.
Table 2: Empirical studies and experiment results using QDM on Prisoner's Dilemma (the probability indicates that Player 2 chooses "defect" given known "defect," known "cooperate," or "unknown").

The Price Is Right?
The Price Is Right is a game in which a participant must choose the same price as the opponent in order to win. The game is described as follows. A Las Vegas casino proposes a new game. The dealer gives you four cards, each with a price on it; for example, card 1 is $1000, card 2 is $2000, and so on. Before the game starts, the dealer secretly writes down the price from one of the cards and seals it in an envelope; a witness makes sure nobody can touch the envelope during the game. Once the game starts, you choose one of the cards. After you make your choice, the witness opens the envelope and the dealer judges the result according to the following rules.
(1) If the price of the card you chose is the same as the written price, you win that amount of money. For example, if you choose a card with $1000 and the dealer also wrote $1000, you win $1000.
(2) If the price of the card you chose differs from the written price, you lose and must pay half of the difference. For example, if you choose the card with $1000 but the dealer wrote $4000, you pay (4000 − 1000)/2 = $1500 to the dealer.
The payoff matrix of The Price Is Right is presented in Table 5. An online anonymous survey of this game was conducted and received 72 responses. The results are shown in Table 6.
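The two rules above determine every entry of the payoff matrix. The text only illustrates the case where the written price exceeds the chosen one, so taking the absolute difference in the other direction is an assumption:

```python
def price_payoff(chosen, written):
    """Payoff in The Price Is Right: win the card's price on a match,
    otherwise pay half the difference (absolute value assumed for the case
    where the chosen price exceeds the written one)."""
    if chosen == written:
        return chosen
    return -abs(written - chosen) / 2

assert price_payoff(1000, 1000) == 1000        # match: win the card's price
assert price_payoff(1000, 4000) == -1500.0     # mismatch: pay half the difference
```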
The experiment is set as in the previous sections. By performing this setting, the experiment produces probabilities that closely match the survey results in Table 6.

A Sheriff's Dilemma.
A Sheriff's Dilemma is a classic Bayesian Game in Game Theory. A Bayesian Game introduces multiple payoff matrices, each with a corresponding probability, to describe the scenario. The game is described as follows. You, the sheriff, face a suspect who has a gun. You are pointing at each other, and you must now decide whether to shoot him (assume there is no way to talk). The suspect may be a criminal but may also be innocent; here, say it is half and half, so you cannot tell which. A criminal would rather shoot even if the sheriff does not, because the criminal would be caught otherwise. An innocent suspect would rather not shoot even if the sheriff shoots. The payoff matrix is presented in Table 7.
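In a Bayesian Game the sheriff's effective payoff mixes the per-type payoff matrices by the prior over suspect types. The 0.5/0.5 prior comes from the text; the payoff numbers below are illustrative placeholders, not entries of Table 7:

```python
P_CRIMINAL = 0.5  # prior that the suspect is a criminal, per the description

def sheriff_expected_payoff(payoff_vs_criminal, payoff_vs_innocent):
    """Bayesian-game utility: average the payoffs against each suspect type,
    weighted by the prior probability of that type."""
    return (P_CRIMINAL * payoff_vs_criminal
            + (1 - P_CRIMINAL) * payoff_vs_innocent)

# Illustrative: equal-magnitude opposite payoffs cancel under a 0.5 prior.
assert sheriff_expected_payoff(2.0, -2.0) == 0.0
```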
An online anonymous survey of this game was conducted and received 89 responses. The results are shown in Table 8.
The experiment is set as in the previous sections, with the rotation matrix H_A a block diagonal matrix. (In Table 7, rows are the sheriff's decisions and columns are the suspect's decisions.) By performing this setting, the experiment concludes the following results.
(i) For a known suspect decision of "shoot," the experiment produces a probability of 0.8752 that the sheriff chooses "shoot."
(ii) For a known suspect decision of "not shoot," the experiment produces a probability of 0.2929 that the sheriff chooses "shoot."
(iii) For an unknown suspect decision, the experiment produces a probability of 0.5827 that the sheriff chooses "shoot."

Conclusions and Future Work
This paper introduced a generalized quantum-inspired decision making model for intelligent agents, and the proposed model was verified by four experiments. The model aims to give an intelligent agent a tool for making "human-like" rather than "machine-like" decisions. Even though this paper limits the setting to two players, two-dimensional decision spaces are in fact the foundation of multiagent environments. Furthermore, the presented model can handle much more complex and larger decision spaces than previous research. Several directions for future work remain. First, the number of decisions does not always equal 2^k, so how to disable one or more unnecessary dimensions is important to study. Second, since the payoff matrix and utility function affect the results fundamentally, studying both may improve the model's performance. Third, more social studies and empirical results on human decision making are needed to adjust and improve the model. Fourth, the model's performance in multiagent environments is worth studying.