We consider the best-choice problem with disorder and imperfect observation. The decision-maker sequentially observes a known number of i.i.d. random variables from a known distribution with the objective of choosing the largest. At a random time, the distribution law of the observations changes. The random variables cannot be perfectly observed: each time a random variable is sampled, the decision-maker is informed only whether it is greater than or less than some level that the decision-maker specifies. The decision-maker can choose at most one of the observations. The optimal rule is derived in the class of Bayes' strategies.

In this paper, we consider the following best-choice problem with disorder and imperfect observations. A decision-maker observes sequentially

At each time in which a random variable is sampled, the observer has to make a decision to

The aim of the decision-maker is to maximize the expected value of the accepted discounted observation.
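To make the setting concrete, the following is a minimal simulation sketch, not the model as formalized in the paper. It assumes that the disorder occurs before each step with a fixed probability `alpha`, that the payoff is the accepted value multiplied by `discount ** (step - 1)`, and that the observer uses one fixed level for all steps. The function names, parameters, and the returned tuple layout are hypothetical and chosen only for illustration.

```python
import random


def simulate_threshold_rule(threshold, n_steps, alpha, discount,
                            sample_before, sample_after, rng):
    """Simulate one run of a fixed-level rule (illustrative assumption).

    Returns (payoff, accept_step, disorder_step). accept_step is None if
    no observation exceeded the level; disorder_step is None if the
    disorder did not occur before the run ended.
    """
    disordered = False
    disorder_step = None
    for step in range(1, n_steps + 1):
        # Assumption: the disorder occurs before each step with probability alpha.
        if not disordered and rng.random() < alpha:
            disordered = True
            disorder_step = step
        x = sample_after(rng) if disordered else sample_before(rng)
        # Imperfect observation: the rule uses only the comparison x > threshold.
        if x > threshold:
            return discount ** (step - 1) * x, step, disorder_step
    return 0.0, None, disorder_step


def estimate_expected_payoff(n_runs, seed=0, **kwargs):
    """Average the discounted payoff over n_runs independent runs."""
    rng = random.Random(seed)
    total = 0.0
    for _run in range(n_runs):
        payoff, _step, _disorder = simulate_threshold_rule(rng=rng, **kwargs)
        total += payoff
    return total / n_runs
```

The samplers are passed in as callables, for example `lambda r: r.gauss(mu, sigma)`, so the same sketch can be reused with any pair of pre- and post-disorder distributions.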

We seek the solution in the following class of strategies. At each moment

This problem is a generalization of the best-choice problem [

According to the problem statement, the observer does not know the current state (

Here,
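Because the state is hidden, a Bayes' strategy has to track the probability that the disorder has not yet occurred. The sketch below shows a generic two-state Bayes filter for that probability after a rejected observation, that is, after the observer learns only that the sampled value did not exceed the chosen level. It is an illustration under stated assumptions and not necessarily the update formula used in the paper: the per-step disorder probability `alpha`, the function name, and the argument names are all hypothetical.

```python
def update_pre_disorder_probability(pi, level, cdf_before, cdf_after, alpha):
    """Update P(no disorder yet) after learning that the observation <= level.

    pi         : prior probability that the disorder has not occurred
    level      : comparison level specified by the observer
    cdf_before : CDF of the observations before the disorder
    cdf_after  : CDF of the observations after the disorder
    alpha      : assumed per-step probability that the disorder occurs
    """
    like_before = cdf_before(level)  # P(x <= level | no disorder)
    like_after = cdf_after(level)    # P(x <= level | disorder)
    evidence = pi * like_before + (1.0 - pi) * like_after
    if evidence == 0.0:
        raise ValueError("the event 'x <= level' has zero probability")
    posterior = pi * like_before / evidence  # Bayes' rule
    # The process stays in the pre-disorder state with probability 1 - alpha.
    return (1.0 - alpha) * posterior
```

A rejection shifts the probability toward whichever state makes small values more likely, and the factor `1 - alpha` then accounts for the chance that the disorder occurs before the next observation.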

We use the dynamic programming approach to derive the optimal strategy. Let

Simplifying (

The following theorem gives a representation of the expected payoff in a form that is linear in

For any

Using the formula (

Assume the theorem holds for a certain

The following lemma holds.

Assuming

It is obvious that the sequence

Now we prove that the sequence of the expected payoffs is bounded above.

Theorem

As

To find the components of the expected payoff for the case of a large number of observations, we need to solve the following system of equations:

The solution of the system is as follows

The expected payoff is

The above results are summarized in the following theorem.

For

Consider examples of using the Bayes' strategy

Consider the example of normally distributed random variables, where the functions

Strategies

The values of the thresholds of strategies

The values of the thresholds of strategies

Strategy | Strategy | |
---|---|---|
0.99 | 10.851 | 9.902 |
0.9 | 9.088 | 8.210 |
0.7 | 7.000 | 6.300 |

Table

Figure

Graphs of the optimal thresholds for strategies

We compare the payoffs that the observer expects to receive using different strategies. Define

Figure

Expected payoffs of the observer who uses the strategies

The expected payoff of the observer who uses the Bayes' strategy

Table

Main characteristics of the best-choice process for

Characteristic | Strategy | Strategy | Strategy |
---|---|---|---|
Expected payoff | 10.035 | 10.429 | 10.500 |
Average time of accepting the observation | 14.526 | 2.472 | 3.072 |
Average number of steps after the disorder | 30.406 | 4.503 | 5.031 |
Number of the values accepted before the disorder, % | 64.100 | 83.066 | 79.738 |
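Characteristics of this kind can be estimated from simulated runs. The sketch below is an illustration only and is not the procedure used to produce the table. It assumes each run is recorded as a hypothetical tuple `(payoff, accept_step, disorder_step)`, where `accept_step` is `None` if nothing was accepted and `disorder_step` is the first step sampled from the post-disorder distribution (`None` if the disorder never occurred). In particular, the reading of "steps after the disorder" as `accept_step - disorder_step`, averaged over runs accepted after the disorder, is an assumption.

```python
from statistics import mean


def summarize_runs(runs):
    """Summarize simulated runs given as (payoff, accept_step, disorder_step)."""
    runs = list(runs)
    if not runs:
        raise ValueError("at least one run is required")
    accepted = [r for r in runs if r[1] is not None]
    # Accepted before the disorder: no disorder yet at the acceptance step.
    before = [r for r in accepted if r[2] is None or r[1] < r[2]]
    after = [r for r in accepted if r[2] is not None and r[1] >= r[2]]
    return {
        "expected_payoff": mean(r[0] for r in runs),
        "mean_accept_step": mean(r[1] for r in accepted) if accepted else None,
        # Assumed definition: delay between the disorder and the acceptance.
        "mean_steps_after_disorder": (
            mean(r[1] - r[2] for r in after) if after else None
        ),
        "accepted_before_disorder_pct": (
            100.0 * len(before) / len(accepted) if accepted else None
        ),
    }
```

Runs in which nothing is accepted still count toward the expected payoff (with payoff zero) but are excluded from the acceptance-time statistics.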

For a small probability of the disorder (

Table

Consider the example of exponentially distributed observations. Let

Table

The values of the thresholds of strategies

Strategy | Strategy | |
---|---|---|
0.99 | 6.756 | 3.378 |
0.9 | 3.358 | 1.679 |

The value of the optimal threshold of the strategy

Main characteristics of the best-choice process for

Characteristic | Strategy | Strategy | Strategy |
---|---|---|---|
Expected payoff | 2.355 | 4.438 | 4.499 |
Average time of accepting the observation | 678.930 | 15.397 | 16.923 |
Average number of steps after the disorder | 856.535 | 29.110 | 29.610 |
Number of the values accepted before the disorder, % | 21.57 | 70.89 | 56.01 |

As in the previous example, the Bayes' strategy gives a better payoff than the strategy

In this article, we considered the best-choice problem with disorder and imperfect observations. We propose a Bayes' strategy in which the threshold depends on the

This work was supported by the Russian Foundation for Basic Research (project 10-01-00089-a) and by the Division of Mathematical Sciences, Program “Mathematical and Algorithmic Problems of New Information Systems”.