We consider the best-choice problem with disorder
and imperfect observation. The decision-maker observes sequentially
a known number of i.i.d. random variables from a known distribution
with the objective of choosing the largest. At a random time the distribution
law of the observations changes. The random variables cannot
be perfectly observed. Each time a random variable is sampled the
decision-maker is informed only whether it is greater than or less than
some level specified by her. The decision-maker can choose at most
one of the observations. The optimal rule is derived in the class of
Bayes' strategies.
1. Introduction
In this paper we consider the following best-choice problem with disorder and imperfect observations. A decision-maker sequentially observes $n$ independent random variables $\xi_1,\dots,\xi_{\theta-1},\xi_\theta,\dots,\xi_n$. The observations $\xi_1,\dots,\xi_{\theta-1}$ come from a continuous distribution function $F_1(x)$ (state $S_1$). At the random time $\theta$ the distribution of the observations changes to a continuous distribution function $F_2(x)$ (i.e., the disorder occurs; state $S_2$). The moment of the disorder has a geometric distribution with parameter $1-\alpha$. The observer knows the parameters $\alpha$, $F_1(x)$, and $F_2(x)$, but the exact moment $\theta$ is unknown.
Each time a random variable is sampled, the observer has to decide whether to accept the observation (and stop the observation process) or reject it (and continue the observation process). If the decision-maker accepts at step $k$ ($1\le k\le n$), she receives as the payoff the value of the random variable discounted by the factor $\lambda^{k-1}$, where $0<\lambda<1$. The random variables cannot be perfectly observed: the decision-maker is only informed whether the observation is greater than or less than some level specified by her.
The aim of the decision-maker is to maximize the expected value of the accepted discounted observation.
We seek the solution in the following class of strategies. At each moment $k$ ($1\le k\le n$), the observer estimates the a posteriori probability of the current state and specifies the threshold $s=s_{n-k}$. The decision-maker accepts the observation $x_k$ if and only if it is greater than the corresponding threshold $s$.
This problem generalizes the best-choice problem [1, 2] and the problem of the quickest detection of a change-point (disorder) [3–5]. Best-choice problems with imperfect information were treated in [6–8]. Only a few papers on the combined best-choice and disorder problem have been published [9–11]. Yoshida [9] considered the full-information case and found the optimal stopping rule which maximizes the probability that the accepted value is the largest of all $\theta+m-1$ random variables for a given integer $m$. Closely related to this study is the work of Sakaguchi [10], where the optimality equation for the optimal expected reward is derived for the full-information model. In [11], we constructed the solution of the combined best-choice and disorder problem in the class of single-level strategies; in this paper, we search for the Bayes' strategy which maximizes the expected reward in the model with imperfect observation.
2. Optimal Strategy
By the statement of the problem, the observer does not know the current state ($S_1$ or $S_2$), but she can estimate it using Bayes' formula:
\[
\pi_s=\pi(s)=P\{S_1\mid x\le s\}=\frac{P(S_1)\,P(x\le s\mid S_1)}{P(x\le s)}=\frac{\alpha\pi F_1(s)}{F_\pi(s)}.
\]
Here $s=s_i$ is the threshold specified by the decision-maker when $i$ steps remain until the end (i.e., at step $n-i$), $\pi$ is the a priori probability of the state $S_1$ (i.e., before receiving the information that $x\le s$), $F_\pi(s)=\pi F_1(s)+\bar\pi F_2(s)$, and $\bar\pi=1-\pi$.
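The update of the state probability can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `posterior_s1`, the helper `exp_cdf`, and the choice to raise an error when the conditioning event has probability zero are assumptions made here.

```python
import math

def exp_cdf(rate):
    """Return the c.d.f. of an exponential law (illustrative choice of F1, F2)."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0

def posterior_s1(pi, s, alpha, F1, F2):
    """Probability of state S1 after learning that the observation was <= s.

    pi    : prior probability of S1 before the observation
    alpha : probability that no disorder occurs at a step
    """
    f_pi = pi * F1(s) + (1.0 - pi) * F2(s)   # mixture c.d.f. F_pi(s)
    if f_pi <= 0.0:
        # Assumption: the update is left undefined for an event of probability zero.
        raise ValueError("P(x <= s) is zero; the posterior is undefined")
    return alpha * pi * F1(s) / f_pi

F1, F2 = exp_cdf(0.5), exp_cdf(1.0)
print(posterior_s1(0.5, 2.0, 0.9, F1, F2))
```

With these two exponential laws $F_1(s)<F_2(s)$ for $s>0$, so learning that the observation fell below the threshold lowers the probability of state $S_1$ below $\alpha\pi$.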
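We use the dynamic programming approach to derive the optimal strategy. Let $v_i(\pi)$ be the payoff that the observer expects to receive using the optimal strategy when $i$ steps remain until the end. The optimality equation is as follows:
\[
v_i(\pi)=\max_s \mathbb{E}\bigl[\lambda v_{i-1}(\pi_s)\,I\{x\le s\}+x\,I\{x>s\}\bigr],\quad i\ge 1,\qquad v_0(\pi)=0\ \ \forall\pi. \tag{2.2}
\]
Since $P(x\le s)=F_\pi(s)$ and $\mathbb{E}[x\,I\{x>s\}]=\pi E_1(s)+\bar\pi E_2(s)$, equation (2.2) simplifies to
\[
v_i(\pi)=\max_s\bigl[\lambda v_{i-1}(\pi_s)F_\pi(s)+\pi E_1(s)+\bar\pi E_2(s)\bigr],\quad i\ge 1,\qquad v_0(\pi)=0\ \ \forall\pi. \tag{2.3}
\]
Here $E_k(s)=\int_s^\infty x\,dF_k(x)$, $k=1,2$.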
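The recursion (2.3) can be explored numerically by backward induction on a grid. The sketch below is only an illustration under stated assumptions: it uses the exponential laws and parameter values of Section 3.2, a uniform grid for $\pi$, a finite grid of candidate thresholds, and linear interpolation of $v_{i-1}$ at $\pi_s$. All names (`backward_step`, `solve`, the grids) are hypothetical.

```python
import math

ALPHA, LAM = 0.9, 0.99   # alpha and discount factor, as in the paper's examples
R1, R2 = 0.5, 1.0        # exponential rates of F1 and F2 (Section 3.2 values)

def cdf(rate, s):
    return 1.0 - math.exp(-rate * s)

def tail_mean(rate, s):
    """E_k(s): integral of x dF_k(x) over (s, inf) for an exponential law."""
    return (s + 1.0 / rate) * math.exp(-rate * s)

PI_GRID = [j / 50 for j in range(51)]     # grid for pi in [0, 1] (assumption)
S_GRID = [0.1 * j for j in range(151)]    # candidate thresholds in [0, 15] (assumption)

def interp(values, pi):
    """Linear interpolation of a function tabulated on PI_GRID."""
    x = pi * (len(PI_GRID) - 1)
    j = min(int(x), len(PI_GRID) - 2)
    w = x - j
    return (1.0 - w) * values[j] + w * values[j + 1]

def backward_step(v_prev):
    """One application of (2.3): returns v_i and the maximizing thresholds."""
    v_new, s_new = [], []
    for pi in PI_GRID:
        best_v, best_s = -math.inf, 0.0
        for s in S_GRID:
            f_pi = pi * cdf(R1, s) + (1.0 - pi) * cdf(R2, s)
            # Updated probability of S1 after x <= s; irrelevant when f_pi = 0.
            pi_s = ALPHA * pi * cdf(R1, s) / f_pi if f_pi > 0.0 else 0.0
            v = (LAM * interp(v_prev, pi_s) * f_pi
                 + pi * tail_mean(R1, s) + (1.0 - pi) * tail_mean(R2, s))
            if v > best_v:
                best_v, best_s = v, s
        v_new.append(best_v)
        s_new.append(best_s)
    return v_new, s_new

def solve(n):
    """Iterate from v_0 = 0 up to v_n; thresholds[i-1] holds s_i on PI_GRID."""
    v = [0.0] * len(PI_GRID)
    thresholds = []
    for _ in range(n):
        v, s = backward_step(v)
        thresholds.append(s)
    return v, thresholds
```

Calling `solve(n)` returns the tabulated $v_n$ together with the thresholds for each number of remaining steps; the accuracy depends on the grid resolution chosen above.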
The following theorem represents the expected payoff in a form that is linear in $\pi$.
Theorem 2.1.
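For any $i$ the function $v_i(\pi)$ can be written in the form
\[
v_i(\pi)=\pi A_i(s_1,\dots,s_i)+B_i(s_1,\dots,s_i),
\]
where
\[
s_i=s_i(\pi)=\arg\max_s\bigl[\lambda v_{i-1}(\pi_s)F_\pi(s)+\pi E_1(s)+\bar\pi E_2(s)\bigr],\quad i\ge 1,\ 0\le\pi\le 1.
\]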
Proof.
Using formula (2.3), one can show that
\[
v_1(\pi)=\max_s\bigl[\pi(E_1(s)-E_2(s))+E_2(s)\bigr]=\pi A_1(s_1)+B_1(s_1),
\]
where $A_1(s_1)=E_1(s_1)-E_2(s_1)$, $B_1(s_1)=E_2(s_1)$, and
\[
s_1=s_1(\pi)=\arg\max_s\bigl[\pi(E_1(s)-E_2(s))+E_2(s)\bigr],\quad 0\le\pi\le 1.
\]
The threshold $s_1=s_1(\pi)$ is the maximizer in (2.3) for $i=1$ and $0\le\pi\le 1$.
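Assume the theorem holds for some $i=k$. Then, for $i=k+1$,
\[
\begin{aligned}
v_{k+1}(\pi)&=\max_s\bigl[\lambda\bigl(\pi_s A_k(s_1,\dots,s_k)+B_k(s_1,\dots,s_k)\bigr)F_\pi(s)+\pi E_1(s)+\bar\pi E_2(s)\bigr]\\
&=\max_s\bigl[\pi\bigl(\lambda\alpha F_1(s)A_k(s_1,\dots,s_k)+\lambda B_k(s_1,\dots,s_k)(F_1(s)-F_2(s))+E_1(s)-E_2(s)\bigr)\\
&\qquad\quad{}+\lambda B_k(s_1,\dots,s_k)F_2(s)+E_2(s)\bigr]\\
&=\pi A_{k+1}(s_1,\dots,s_{k+1})+B_{k+1}(s_1,\dots,s_{k+1}),
\end{aligned}
\]
where
\[
\begin{aligned}
A_{k+1}(s_1,\dots,s_{k+1})&=\lambda\alpha F_1(s)A_k(s_1,\dots,s_k)+\lambda B_k(s_1,\dots,s_k)(F_1(s)-F_2(s))+E_1(s)-E_2(s),\\
B_{k+1}(s_1,\dots,s_{k+1})&=\lambda B_k(s_1,\dots,s_k)F_2(s)+E_2(s),\\
s_i=s_i(\pi)&=\arg\max_s\bigl[\lambda v_{i-1}(\pi_s)F_\pi(s)+\pi E_1(s)+\bar\pi E_2(s)\bigr],\quad i\ge 1,\ 0\le\pi\le 1.
\end{aligned}
\]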
The theorem is proved.
The following lemma holds.
Lemma 2.2.
Assume that $E_k<\infty$, $k=1,2$. Then the expected payoff has a limit as $i\to\infty$: $v_i(\pi)\to v(\pi)$.
Proof.
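It is obvious that the sequence $v_i(\pi)$ is increasing in $i$.
Now we prove that the sequence of the expected payoffs is bounded above. With $E_k=\int_0^\infty x\,dF_k(x)$, $k=1,2$, we have
\[
\begin{aligned}
v_1(\pi)&\le\pi E_1+\bar\pi E_2,\\
v_2(\pi)&=\max_s\bigl[\lambda v_1(\pi_s)F_\pi(s)+\pi E_1(s)+\bar\pi E_2(s)\bigr]\le\lambda(\pi E_1+\bar\pi E_2)+\pi E_1+\bar\pi E_2.
\end{aligned}
\]
Further, one can show by induction that for any $i\ge 1$ and any $0\le\pi\le 1$ the expected payoff at step $i$ has the upper bound
\[
v_i(\pi)\le\frac{\pi E_1+\bar\pi E_2}{1-\lambda}.
\]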
The lemma is proved.
Corollary 2.3.
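Theorem 2.1 and Lemma 2.2 imply that there exist $A$ and $B$ such that
\[
\lim_{i\to\infty}v_i(\pi)=\lim_{i\to\infty}\bigl(\pi A_i(s_1,\dots,s_i)+B_i(s_1,\dots,s_i)\bigr)=\pi A+B=v(\pi).
\]
As $i\to\infty$ the expected payoff satisfies the following equation:
\[
v(\pi)=\lim_{i\to\infty}v_i(\pi)=\max_s\bigl[\lambda v(\pi_s)F_\pi(s)+\pi E_1(s)+\bar\pi E_2(s)\bigr].
\]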
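To find the components of the expected payoff for a large number of observations, we should solve the following equation:
\[
\pi A+B=\max_s\bigl[\pi\bigl(\lambda\alpha F_1(s)A+\lambda B(F_1(s)-F_2(s))+E_1(s)-E_2(s)\bigr)+\lambda BF_2(s)+E_2(s)\bigr];
\]
therefore,
\[
\begin{aligned}
A&=\lambda\alpha F_1(s)A+\lambda B(F_1(s)-F_2(s))+E_1(s)-E_2(s),\\
B&=\lambda BF_2(s)+E_2(s).
\end{aligned}
\]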
The solution of the system is as follows:
\[
A=\frac{E_1(s)(1-\lambda F_2(s))-E_2(s)(1-\lambda F_1(s))}{(1-\lambda F_2(s))(1-\lambda\alpha F_1(s))},\qquad B=\frac{E_2(s)}{1-\lambda F_2(s)}.
\]
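As a check of this solution, the system can be solved by back-substitution for a fixed threshold $s$. The worked step below abbreviates $F_k=F_k(s)$ and $E_k=E_k(s)$; it only rearranges the two equations of the system and introduces no new assumptions.

\[
\begin{aligned}
B(1-\lambda F_2)&=E_2\quad\Longrightarrow\quad B=\frac{E_2}{1-\lambda F_2},\\
A(1-\lambda\alpha F_1)&=\lambda B(F_1-F_2)+E_1-E_2\\
&=\frac{\lambda E_2(F_1-F_2)+(E_1-E_2)(1-\lambda F_2)}{1-\lambda F_2}\\
&=\frac{E_1(1-\lambda F_2)-E_2(1-\lambda F_1)}{1-\lambda F_2}.
\end{aligned}
\]

Dividing the last line by $1-\lambda\alpha F_1$ gives the stated expression for $A$.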
The expected payoff is
\[
v(\pi)=\max_s(\pi A+B)
\]
and the optimal threshold is
\[
s=s(\pi)=\arg\max_s(\pi A+B). \tag{2.18}
\]
The above results are summarized in the following theorem.
Theorem 2.4.
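For $i\to\infty$, the solution of (2.3) is defined as
\[
v(\pi)=\max_s(\pi A+B),
\]
where
\[
s=s(\pi)=\arg\max_s(\pi A+B),\qquad
A=\frac{E_1(s)(1-\lambda F_2(s))-E_2(s)(1-\lambda F_1(s))}{(1-\lambda F_2(s))(1-\lambda\alpha F_1(s))},\qquad
B=\frac{E_2(s)}{1-\lambda F_2(s)}.
\]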
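The threshold $s(\pi)$ of Theorem 2.4 can be approximated by a direct grid search over $s$. The sketch below takes the formulas for $A$ and $B$ at face value and uses exponential laws as an illustration; the function names, the grid step, and the search range are assumptions, and no claim is made that it reproduces the paper's numerical tables.

```python
import math

def exp_model(rate):
    """Return (F, E) for an exponential law: c.d.f. and E(s) = int_s^inf x dF(x)."""
    F = lambda s: 1.0 - math.exp(-rate * s)
    E = lambda s: (s + 1.0 / rate) * math.exp(-rate * s)
    return F, E

def coefficients(s, alpha, lam, F1, E1, F2, E2):
    """A(s) and B(s) from Theorem 2.4 for a fixed threshold s."""
    d2 = 1.0 - lam * F2(s)
    B = E2(s) / d2
    A = (E1(s) * d2 - E2(s) * (1.0 - lam * F1(s))) / (
        d2 * (1.0 - lam * alpha * F1(s)))
    return A, B

def best_threshold(pi, alpha, lam, F1, E1, F2, E2, s_max=20.0, step=0.001):
    """Grid search for argmax_s (pi*A(s) + B(s)); returns (threshold, value)."""
    best_s, best_v = 0.0, -math.inf
    for j in range(int(round(s_max / step)) + 1):
        s = j * step
        A, B = coefficients(s, alpha, lam, F1, E1, F2, E2)
        v = pi * A + B
        if v > best_v:
            best_s, best_v = s, v
    return best_s, best_v

F1, E1 = exp_model(0.5)
F2, E2 = exp_model(1.0)
for pi in (0.0, 0.5, 1.0):
    print(pi, best_threshold(pi, 0.9, 0.99, F1, E1, F2, E2))
```

A grid search is used here instead of a root finder because it needs no derivative of $\pi A+B$ and no assumption of unimodality; a finer `step` trades run time for accuracy.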
3. Examples
We now consider examples that compare the Bayes' strategy B, defined by formula (2.18), with two strategies whose constant thresholds do not depend on $\pi$.
3.1. Normal Distribution
Consider normally distributed random variables, where $F_1(x)$ and $F_2(x)$ have variance $\sigma^2=1$ and expectations $\mu_1=10$ and $\mu_2=9$, respectively.
Strategies A1 and A2 use constant thresholds defined as the solutions of the equation
\[
s=\frac{\lambda E(s)}{1-\lambda F(s)},
\]
where $F(s)\equiv F_1(s)$ and $E(s)\equiv E_1(s)$ for strategy A1, and $F(s)\equiv F_2(s)$ and $E(s)\equiv E_2(s)$ for strategy A2.
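The constant thresholds can be computed numerically. The sketch below solves $s(1-\lambda F(s))=\lambda E(s)$ by bisection for the normal case. This form, with the factor $\lambda$ on the right-hand side, is the first-order condition for maximizing $E(s)/(1-\lambda F(s))$, and it appears to reproduce the tabulated thresholds to about three decimals. The function names and the search bracket are assumptions.

```python
import math

def normal_model(mu, sigma=1.0):
    """Return (F, E) for a normal law: c.d.f. and E(s) = int_s^inf x dF(x)."""
    def F(s):
        return 0.5 * (1.0 + math.erf((s - mu) / (sigma * math.sqrt(2.0))))
    def E(s):
        z = (s - mu) / sigma
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        return mu * (1.0 - F(s)) + sigma * pdf
    return F, E

def constant_threshold(F, E, lam, lo=0.0, hi=1000.0, tol=1e-10):
    """Bisection for the root of g(s) = s*(1 - lam*F(s)) - lam*E(s).

    g is increasing (its derivative is 1 - lam*F(s) > 0), so the root is unique.
    Assumption: the root lies in [lo, hi], which holds for the examples here.
    """
    g = lambda s: s * (1.0 - lam * F(s)) - lam * E(s)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for lam in (0.99, 0.9, 0.7):
    s1 = constant_threshold(*normal_model(10.0), lam)   # strategy A1
    s2 = constant_threshold(*normal_model(9.0), lam)    # strategy A2
    print(lam, round(s1, 3), round(s2, 3))
```

The same `constant_threshold` function works for any pair `(F, E)`, so the exponential case only needs a different model function.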
The thresholds of strategies A1 and A2 for different discount rates are given in Table 1.

Table 1: The values of the thresholds of strategies A1 and A2.

| λ    | Strategy A1 | Strategy A2 |
|------|-------------|-------------|
| 0.99 | 10.851      | 9.902       |
| 0.9  | 9.088       | 8.210       |
| 0.7  | 7.000       | 6.300       |
Table 1 shows how strongly the discount rate affects the thresholds.
Figure 1 shows the graphs of the optimal thresholds for strategies A1 and A2 ($s_1$ and $s_2$, resp.) and strategy B ($s_{opt}$) as functions of $\pi$. As the figure shows, the threshold of strategy B depends on the a posteriori probability $\pi$ of the state $S_1$. As $\pi$ tends to zero, the optimal threshold of strategy B tends to the threshold $s_2$.
Figure 1: Graphs of the optimal thresholds for strategies A1, A2, and B for α=0.9, λ=0.99.
We compare the payoffs that the observer expects to receive using the different strategies. Define $V_\alpha$ as the expected payoff for $\pi=1$, regarded as a function of the parameter $\alpha$ (the probability of the disorder is $1-\alpha$).
Figure 2 shows the numerical results for the expected payoffs of the observer who uses strategies A1, A2, and B (thresholds $s_1$, $s_2$, and $s_{opt}$, resp.).
Figure 2: Expected payoffs of the observer who uses the strategies A1, A2, and B for α=0.9, λ=0.99.
The expected payoff of the observer who uses the Bayes' strategy B is greater than if she uses either of the strategies A1 or A2. The difference is significant for $\alpha\in[0.75,0.98]$, because of the uncertainty about the current state of the system.
Table 2 shows the numerical results for the main characteristics of the best-choice process.

Table 2: Main characteristics of the best-choice process for α=0.9, λ=0.99.

| Characteristic                                     | Strategy A1 | Strategy A2 | Strategy B |
|----------------------------------------------------|-------------|-------------|------------|
| Expected payoff                                    | 10.035      | 10.429      | 10.500     |
| Average time of accepting the observation          | 14.526      | 2.472       | 3.072      |
| Average number of steps after the disorder         | 30.406      | 4.503       | 5.031      |
| Number of the values accepted before the disorder, % | 64.100    | 83.066      | 79.738     |
For a small probability of the disorder ($1-\alpha=0.1$), the expected payoff under strategy A2 (10.429) is greater than under strategy A1 (10.035). The Bayes' strategy B, which depends on $\pi$, gives the largest expected payoff (10.500).
Table 2 shows that the average time of accepting the observation increases with the value of the threshold. Note that strategy A1 does not take the disorder into account, which leads to a large average time of accepting the observation. Both strategies A2 and B have a small average time of accepting the observation.
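Characteristics of this kind can be estimated by Monte Carlo simulation. The sketch below simulates a constant-threshold strategy over an unbounded horizon and is only an illustration. It assumes that the first observation is drawn in state $S_1$ (i.e., $\pi=1$), that the disorder occurs with probability $1-\alpha$ before each subsequent observation, and that the payoff is the accepted value discounted by $\lambda^{k-1}$. Its estimates depend on these conventions and are not claimed to reproduce Table 2. The function name and its parameters are hypothetical.

```python
import random

def simulate(threshold, alpha, lam, mu1=10.0, mu2=9.0, runs=20000,
             max_steps=100000, seed=0):
    """Estimate payoff, acceptance time and share accepted before the disorder."""
    rng = random.Random(seed)
    payoff = time = before = 0.0
    for _ in range(runs):
        in_s1 = True  # assumption: the first observation is drawn in state S1
        for k in range(1, max_steps + 1):
            x = rng.gauss(mu1 if in_s1 else mu2, 1.0)
            if x > threshold:              # the only information the observer gets
                payoff += lam ** (k - 1) * x
                time += k
                before += 1.0 if in_s1 else 0.0
                break
            if in_s1 and rng.random() >= alpha:
                in_s1 = False              # disorder: later draws come from F2
        else:
            time += max_steps              # nothing accepted within the cap
    return {"mean_payoff": payoff / runs,
            "mean_time": time / runs,
            "share_before_disorder": before / runs}

print(simulate(10.851, 0.9, 0.99))
```

Setting `alpha=1.0` switches the disorder off, which gives a simple sanity check: the acceptance time is then geometric with success probability $1-F_1(s)$.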
3.2. Exponential Distribution
Consider exponentially distributed observations. Let $F_1(x)$ and $F_2(x)$ be exponential distributions with parameters $\lambda_1=0.5$ and $\lambda_2=1$, respectively. As in the previous example, we compare the Bayes' strategy B with the strategies A1 and A2, whose constant thresholds solve
\[
s=\frac{\lambda E(s)}{1-\lambda F(s)},
\]
where $F(s)\equiv F_1(s)$ and $E(s)\equiv E_1(s)$ for strategy A1, and $F(s)\equiv F_2(s)$ and $E(s)\equiv E_2(s)$ for strategy A2.
Table 3 shows the values of the thresholds for the strategies A1 and A2 depending on the discount rate.

Table 3: The values of the thresholds of strategies A1 and A2.

| λ    | Strategy A1 | Strategy A2 |
|------|-------------|-------------|
| 0.99 | 6.756       | 3.378       |
| 0.9  | 3.358       | 1.679       |
As in the case of normally distributed observations, the optimal threshold of strategy B increases with $\pi$ and equals the threshold of strategy A2 at $\pi=0$. The graphs of the expected payoffs look similar to those in Figure 2. Table 4 shows the main characteristics of the best-choice process for the different strategies.

Table 4: Main characteristics of the best-choice process for α=0.9, λ=0.99.

| Characteristic                                     | Strategy A1 | Strategy A2 | Strategy B |
|----------------------------------------------------|-------------|-------------|------------|
| Expected payoff                                    | 2.355       | 4.438       | 4.499      |
| Average time of accepting the observation          | 678.930     | 15.397      | 16.923     |
| Average number of steps after the disorder         | 856.535     | 29.110      | 29.610     |
| Number of the values accepted before the disorder, % | 21.57     | 70.89       | 56.01      |
As in the previous example, the Bayes' strategy gives a better payoff than strategy A2, but it has a larger average time of accepting the observation. Strategy A1 is the worst on all of these characteristics.
4. Results
In this article, we consider the best-choice problem with disorder and imperfect observations. We propose the Bayes' strategy, in which the threshold depends on the a posteriori probability of the disorder. The numerical results show that this strategy gives a better expected payoff than the strategies with constant thresholds.
Acknowledgment
This work was supported by grants from the Russian Fund for Basic Research, Project 10-01-00089-a, and the Division of Mathematical Sciences, Program “Mathematical and algorithmic Problems of New Information Systems”.
References
[1] Gilbert, J. P., Mosteller, F. Recognizing the maximum of a sequence. 1966, vol. 61, pp. 35–73. MR 0198637. doi:10.2307/2283044.
[2] Berezovskiĭ, B. A., Gnedin, A. V. Nauka, Moscow, Russia, 1984, 197 pp. MR 768372.
[3] Širjaev, A. N. Vol. 38, American Mathematical Society, Providence, RI, USA, 1973, iv+174 pp. MR 0350990.
[4] Bojdecki, T. Probability maximizing approach to optimal stopping and its application to a disorder problem. 1979, vol. 3, no. 1, pp. 61–71. MR 546700. Zbl 0432.60051.
[5] Szajowski, K. On a random number of disorders. Forthcoming, 2011.
[6] Enns, E. G. Selecting the maximum of a sequence with imperfect information. 1975, vol. 70, no. 351, pp. 640–643. doi:10.2307/2285947. Zbl 0308.62082.
[7] Neumann, P., Porosiński, Z., Szajowski, K. On two person full-information best choice problem with imperfect observation. Vol. 2, Nova Science Publishers, Hauppauge, NY, USA, 1996, pp. 47–55. MR 1428249. Zbl 0871.90143.
[8] Porosiński, Z., Szajowski, K. Modified strategies in two person full-information best choice problem with imperfect observation. 2000, vol. 52, no. 1, pp. 103–112. MR 1783184. Zbl 1016.91017.
[9] Yoshida, M. Probability maximizing approach to a secretary problem with random change-point of the distribution law of the observed process. 1984, vol. 21, no. 1, pp. 98–107. MR 732675. doi:10.2307/3213668. Zbl 0534.62060.
[10] Sakaguchi, M. A best-choice problem for a production system which deteriorates at a disorder time. 2001, vol. 54, no. 1, pp. 125–134. MR 1880713. Zbl 1019.62110.
[11] Mazalov, V. V., Ivashko, E. E. Full-information best-choice problem with disorder. 2007, vol. 14, no. 2, pp. 215–224.