Bayesian Adaptive Estimation with Theoretical Bound: An Exploration-Exploitation Approach

This paper investigates the theoretical bound on the reduction of parameter uncertainty in Bayesian adaptive estimation for psychometric functions and proposes an exploration-exploitation (E-E) approach to improve the computational efficiency of parameter estimation. As the experimental trials proceed, the uncertainty of the parameters decreases dramatically and the gap between the maximal mutual information and the theoretical bound narrows, so the advantage of the classical Bayesian adaptive estimation algorithm diminishes. The proposed approach trades off exploration (parameter posterior uncertainty) against exploitation (parameter mean estimation). The experimental results show that the proposed E-E approach estimates parameters for psychometric functions with the same convergence and reduces the computation time by at least 34.27% compared with classical Bayesian adaptive estimation.


Introduction
Bayesian adaptive estimation plays an important role in certain parameter estimations of psychometric functions [1][2][3][4][5]. In psychophysics, the psychometric function reflects the quantitative relationship between physical stimulation and the subject's psychological perception [2]. Watson and Pelli first applied the QUEST method in psychophysics [4]. Gradually, Bayesian adaptive estimation has been developed and widely used in psychophysics, behavioral and neural sciences [1,6], clinical fields [7,8], etc. It sequentially selects the stimulus so as to minimize the uncertainty of the parameters and then updates the parameter prior distribution, to effectively estimate the parameters.
More and more practical experiments are undertaken online [9] (e.g., research on driving behaviors [6,10], clinical applications [7,8,11], and visual perception [12][13][14]). Therefore, the challenge faced by researchers is the computational efficiency of optimizing the stimulus after collecting the subject's data during typical psychophysical experiments [10]. One way is to estimate multidimensional parameters simultaneously. Kontsevich and Tyler proposed the Ψ method to estimate two-dimensional parameters [5], and Kujala and Lukka applied this method to more general psychometric functions [1,2,10]. Then, psychometric functions with higher-dimensional parameters were estimated, such as four-dimensional parameters in the contrast sensitivity function and driving gap acceptance function [6,13]. Furthermore, Watson extended the QUEST method to estimate psychometric parameters with multiple dimensions [15]. On the other hand, it is well known that the optimization algorithm of Bayesian adaptive estimation aims to make full use of the information contained in the parameter distribution and emphasizes the convergence of the estimation. Kuss et al. discussed the importance of parameter prior distributions for extracting the information contained in experimental data [3]. The authors in [16][17][18][19][20] considered the estimation deviation of parameters by effectively using the limited measurement information to improve the estimation efficiency.
However, in each trial of Bayesian adaptive estimation, the optimization algorithm selects the most informative stimulus by searching the parameter space of the psychometric function [1,6,13]. If the parameter dimension increases, the time complexity of the stimulus selection increases exponentially. The upper bound of the information gained from the optimization algorithm [21,22] has not been well studied, and how the theoretical bound impacts the stimulus selection as well as the computational efficiency needs further investigation. Furthermore, the MSE curves of the estimated parameters in experiments usually become almost level after some trials [6], which also calls for an explanation.
This paper investigates the theoretical upper bound of the information gain of the parameters resulting from the optimization algorithm of Bayesian adaptive estimation. This bound theoretically decides how much information the estimation algorithm can gain trial by trial and explains why the advantage of information gain from Bayesian adaptive estimation diminishes as the uncertainty of the parameter distribution decreases. Therefore, this paper proposes the exploration-exploitation (E-E) approach, motivated by machine learning [21][23][24][25], to improve classical Bayesian adaptive estimation by selecting the stimulus randomly once low parameter uncertainty is detected. The proposed approach trades off exploration (parameter posterior uncertainty) against exploitation (parameter mean estimation). The exploitation trials do not need to search the stimulus space and parameter space to calculate the maximal mutual information repeatedly, which improves the computational efficiency substantially. The proposed E-E approach is applied to two parameter estimation instances, the contrast sensitivity function (CSF) and the heterogeneous gap acceptance function (GAF). Simulation results demonstrate that the computation time is reduced by 34.74% for CSF and 34.27% for GAF with the same MSE convergence. Thus, the proposed algorithm is more suitable than classical Bayesian adaptive estimation for practical online experiments.

Psychometric Function.
In psychophysics, the psychometric function is used to describe the probability of psychological feedback after a certain stimulus is applied to the individual subject [2]. Usually, the psychometric function with multidimensional parameter θ ∈ ℝ^l is represented as Φ(y, d, θ), where d is the stimulus and y is the random binary feedback which indicates that the subject "rejects" or "accepts" the given stimulus. When given the stimulus d, the conditional probability p(y | θ, d) of the subject's feedback y can be expressed as

$$p(y \mid \theta, d) = \Phi(y, d, \theta). \tag{1}$$

The objective is to estimate the subject's true parameter θ of the psychometric function in as few steps as possible, due to the cost of collecting the individual subject's data.

Bayesian Adaptive Estimation.
Bayesian adaptive estimation is mainly used to estimate the subject's parameter θ in the psychometric function Φ(y, d, θ). Let Y be the random variable of the subject's feedback and Θ be the random variable of the parameters. Given the feedback space 𝒴 = {0, 1}, stimulus space D ⊆ ℝ^{l′}, and parameter space Ξ ⊆ ℝ^l, it selects the most informative stimulus

$$d_t = \arg\max_{d \in D} I(\Theta; Y \mid d, p_t(\theta)), \tag{2}$$

where p_t(θ) is the parameter prior distribution for trial t and the mutual information I(Θ; Y | d, p_t(θ)) measures the information gain between parameter Θ and observation Y of the subject [1,10]. The subject's feedback y_t is observed after applying stimulus d_t. According to Bayes' rule, the parameter posterior probability is updated as

$$p_{t+1}(\theta) = p(\theta \mid y_t, d_t) = \frac{p(y_t \mid \theta, d_t)\, p_t(\theta)}{\sum_{\theta' \in \Xi} p(y_t \mid \theta', d_t)\, p_t(\theta')}, \tag{3}$$

which is the prior probability for trial t + 1. The details of the Bayesian adaptive estimation algorithm can be found in [1,5], and the flowchart is shown in Figure 1 [6].

The basic idea of Bayesian adaptive estimation is to find the most informative stimulus d_t to gain the maximal information in each trial t and thus to reduce the parameter posterior uncertainty maximally trial by trial, according to equation (3). Currently, Bayesian adaptive estimation adopts the gridding method to discretize the parameter space [29]. The parameters of the psychometric function are estimated by the mathematical expectation (MEAN) or the maximum a posteriori (MAP) value of the parameter posterior [6,13,29,30].
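The Bayes-rule update described above can be illustrated on a discretized parameter grid. The following is a minimal sketch rather than the authors' implementation: the function name `bayes_update`, the four-point grid, and the acceptance probabilities are hypothetical values chosen only for illustration.

```python
def bayes_update(prior, p_accept, y):
    """Return the posterior over a parameter grid after one binary response.

    prior    -- list of p_t(theta) over the grid points (sums to 1)
    p_accept -- list of p(y = 1 | theta, d_t) for the same grid points
    y        -- observed feedback, 0 or 1
    """
    # Likelihood of the observed response at every grid point.
    likelihood = [p if y == 1 else 1.0 - p for p in p_accept]
    unnormalized = [l * q for l, q in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # p(y | d_t), the normalizing constant
    if evidence == 0.0:
        raise ValueError("observed response has zero probability under the prior")
    return [u / evidence for u in unnormalized]


# Hypothetical example: uniform prior on four grid points, response y = 1.
posterior = bayes_update([0.25, 0.25, 0.25, 0.25], [0.1, 0.4, 0.6, 0.9], y=1)
print(posterior)
```

The posterior returned for trial t is used as the prior for trial t + 1, so the same function can be called once per trial.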

Theoretical Bound of Information Gain
Previous experiments indicate that the parameter posterior tends to be peaky and the uncertainty of the parameters decreases when the implementation of Bayesian adaptive estimation converges. The parameter posterior distribution is concentrated towards the mean of the distribution. Moreover, Kujala [31] and Paninski [32] presented the asymptotic theory about the convergence of Bayesian adaptive estimation, i.e., the parameter posterior distribution is asymptotically normal [31]. Bayesian adaptive estimation selects the most informative stimulus d_t to gain the maximal information. However, this maximal mutual information has an upper bound, which means that the room for information gain in trial t is limited. It is important to measure theoretically how much parameter uncertainty reduction, or room for information gain, can be anticipated by using this optimization strategy [1,10,21].
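Given the psychometric function Φ(y, d, θ) and the conditional probability p(y | θ, d) in equation (1), the mutual information can be formulated as [1,2,6,32]

$$I(\Theta; Y \mid d, p(\theta)) = \sum_{y \in \{0,1\}} \sum_{\theta \in \Xi} p(\theta)\, p(y \mid \theta, d) \log \frac{p(y \mid \theta, d)}{p(y \mid d)}, \qquad p(y \mid d) = \sum_{\theta \in \Xi} p(\theta)\, p(y \mid \theta, d).$$

By symmetry, the mutual information can be rewritten as [2,33]

$$I(\Theta; Y \mid d, p(\theta)) = h\Big(\sum_{\theta \in \Xi} p(\theta)\, p(y = 1 \mid \theta, d)\Big) - \sum_{\theta \in \Xi} p(\theta)\, h\big(p(y = 1 \mid \theta, d)\big),$$

where

$$h(p) = -p \log p - (1 - p) \log (1 - p)$$

is defined as the entropy of the binary distribution with probabilities p and 1 − p [6].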
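The binary-entropy form of the mutual information lends itself to a direct computation on a grid. The sketch below assumes natural logarithms and a precomputed table of acceptance probabilities; the function names and the two toy stimuli "a" and "b" are hypothetical and not taken from the paper.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a binary distribution with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def mutual_information(prior, p_accept):
    """I(Theta; Y | d) for one stimulus d.

    prior    -- list of p(theta) over the grid
    p_accept -- list of p(y = 1 | theta, d) over the same grid
    """
    marginal = sum(q * p for q, p in zip(prior, p_accept))  # p(y = 1 | d)
    expected = sum(q * binary_entropy(p) for q, p in zip(prior, p_accept))
    return binary_entropy(marginal) - expected


def most_informative(prior, p_accept_by_stimulus):
    """Return the stimulus whose response carries the most information."""
    return max(
        p_accept_by_stimulus,
        key=lambda d: mutual_information(prior, p_accept_by_stimulus[d]),
    )


# Hypothetical two-point grid: stimulus "a" cannot separate the two parameter
# values, stimulus "b" separates them perfectly.
table = {"a": [0.5, 0.5], "b": [0.0, 1.0]}
best = most_informative([0.5, 0.5], table)
print(best)
```

In this toy case stimulus "a" yields zero information while stimulus "b" yields log 2 nats, the largest value a single binary response can carry.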

Theorem 1. Let d be the stimulus and p(θ) be the prior distribution of parameter θ; then,

$$I(\Theta; Y \mid d, p(\theta)) \le H(\Theta),$$

where

$$H(\Theta) = -\sum_{\theta \in \Xi} p(\theta) \log p(\theta)$$

is the entropy of parameter θ.

Proof. For stimulus d, the following holds:

$$I(\Theta; Y \mid d, p(\theta)) = H(\Theta) - H(\Theta \mid Y, d) \le H(\Theta),$$

because the conditional entropy H(Θ | Y, d) of the discretized parameter is nonnegative. □

Proposition 1. For any stimulus d,

$$H(\Theta \mid Y, d) \le H(\Theta),$$

where H(Θ | Y, d) is the conditional entropy of parameter θ.

Proof. By the principle that information cannot hurt [33,34], i.e., conditioning does not increase entropy, we get H(Θ | Y, d) ≤ H(Θ). □

Theorem 1 indicates that the mutual information I(Θ; Y | d, p(θ)) for any stimulus d will never be greater than the entropy H(Θ) of parameter θ, i.e., the information gained from trial t will not exceed the uncertainty of parameter θ in the sequential decision of Bayesian adaptive estimation. Proposition 1 indicates that under the observations of the random subject's feedback Y, the parameter entropy H(Θ) can be reduced for any stimulus d and H(Θ) decreases monotonically and sequentially.

In the implementation of Bayesian adaptive estimation, the parameter posterior becomes peaky and asymptotically normal [31,32], such that the determinant of the posterior covariance in a certain neighborhood of the true subject parameter value is asymptotically minimal [32]. This can be explained by Proposition 1: the uncertainty of the parameters decreases monotonically. When the parameter uncertainty is low enough, the maximal mutual information will be close to the current parameter entropy. The information gained from the maximal mutual information decreases gradually, and the advantage obtained from Bayesian adaptive estimation diminishes continuously. This clarifies why the MSE curves of the estimated parameters become almost level after some trials, as mentioned in the Introduction. On the other hand, the room to reduce the parameter uncertainty is narrow, and by Proposition 1 the parameter uncertainty H(Θ) will continue to decrease with different stimuli. So, different stimuli do not create much difference in the information gain from the Bayesian inference, especially in the MSE curves of the parameters. In this case, we can use another strategy to select the stimulus instead of the most informative stimulus without hurting the accuracy of parameter estimation.

Algorithm 1: Exploration-exploitation Bayesian adaptive estimation.
Input: parameter space Ξ, initial parameter prior p_0(θ), stimulus space D, threshold ε, experiment trial T.
Output: estimated parameter θ̂.
Step 1: Initialize all inputs.
Step 2: Calculate the mutual information I(Θ; Y | d, p_t(θ)) for all d ∈ D by (5) and (7).
Step 3: Select d_t = argmax_{d∈D} I(Θ; Y | d, p_t(θ)) and compute the parameter entropy H_t(Θ). Apply the stimulus d_t to the subject and observe the subject's response y_t. Update the parameter prior distribution. If H_t(Θ) > ε, go to Step 2. Else, go to Step 4.
Step 4: Select d_t ∈ D randomly. Apply the stimulus d_t to the subject and observe the subject's response y_t. Update the parameter prior distribution.
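Theorem 1 and Proposition 1 can be checked numerically on a small grid. The sketch below computes the conditional entropy H(Θ | Y, d) directly from the two possible posteriors and compares it with the prior entropy; the four-point prior and the acceptance probabilities are hypothetical values, and natural logarithms are assumed.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in dist if q > 0.0)


def posterior_and_evidence(prior, p_accept, y):
    """Posterior over the grid for response y, plus p(y | d).

    Assumes p(y | d) > 0, which holds whenever 0 < p_accept < 1 somewhere
    on the support of the prior.
    """
    likelihood = [p if y == 1 else 1.0 - p for p in p_accept]
    evidence = sum(l * q for l, q in zip(likelihood, prior))
    return [l * q / evidence for l, q in zip(likelihood, prior)], evidence


def check_bounds(prior, p_accept):
    """Return H(Theta), H(Theta | Y, d), and their difference I(Theta; Y | d)."""
    h_prior = entropy(prior)
    h_cond = 0.0
    for y in (0, 1):
        post, p_y = posterior_and_evidence(prior, p_accept, y)
        h_cond += p_y * entropy(post)  # average over the random feedback Y
    return h_prior, h_cond, h_prior - h_cond


# Hypothetical prior and acceptance probabilities for one stimulus.
h_prior, h_cond, info = check_bounds([0.1, 0.2, 0.3, 0.4], [0.05, 0.3, 0.7, 0.95])
print(h_prior, h_cond, info)
```

Note that the conditional entropy is an average over both possible responses, so the inequality is a statement about the expected posterior entropy for a given stimulus.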

Exploration-Exploitation Approach for Bayesian Adaptive Estimation
It should be noted that the optimization algorithm of classical Bayesian adaptive estimation searches the parameter space and stimulus space to compute the maximal mutual information max_{d∈D} I(Θ; Y | d, p_t(θ)) for each trial t by calling the psychometric function repeatedly. According to Theorem 1, when the entropy of the parameters in the implementation of Bayesian adaptive estimation is low enough, the room to gain information narrows dramatically. In this case, we can try another strategy to select the stimulus and avoid the heavy computation. This paper proposes the exploration-exploitation (E-E) approach, which generates the stimulus randomly instead of searching for the most informative stimulus once low parameter entropy is detected, to enhance the computational efficiency. The proposed approach therefore trades off exploration (parameter posterior uncertainty) against exploitation (parameter mean estimation).

Exploration Based on Maximal Mutual Information.
For trial t, when the parameter distribution is still highly uncertain and Bayesian adaptive estimation has great advantages in exploring the stimulus space, the maximal mutual information max_{d∈D} I(Θ; Y | d, p_t(θ)) is far away from the current bound H(Θ), and the algorithm chooses d_t = argmax_{d∈D} I(Θ; Y | d, p_t(θ)) to gain the information maximally. Then, the subject's response is observed and the parameter prior distribution is updated by Bayesian inference.

Exploitation Based on Random Stimulus.
For trial t, when the parameter distribution has low uncertainty and the maximal mutual information is close to the bound, we carry out the exploitation strategy by randomly selecting one stimulus d_t ∈ D, observing the subject's response, and updating the parameter prior distribution. Because no search in the stimulus space and parameter space is required, this strategy greatly improves the computational efficiency. According to Proposition 1, this exploitation strategy will continuously reduce the parameter uncertainty H(Θ) and gradually sharpen the parameter distribution after Bayesian inference.

Algorithm for Exploration-Exploitation Approach.
Based on the above analysis, we propose the algorithm of the E-E approach, the exploration-exploitation Bayesian adaptive estimation. To implement the algorithm, we apply a threshold ε > 0 to the bound H(Θ) of the maximal mutual information. If H(Θ) > ε, the proposed algorithm selects the stimulus through the maximal mutual information; otherwise, the algorithm selects the stimulus randomly.
To estimate the parameters of a given psychometric function Φ(y, d, θ), all inputs of Algorithm 1 are initialized. Step 2 of Algorithm 1 calculates the mutual information for all stimuli d in the current experimental trial. Step 3 selects the most informative stimulus, using the exploration strategy to update the parameter prior for the next trial, and calculates the parameter entropy to decide whether to go to Step 4. Step 4 selects the stimulus randomly, using the exploitation strategy to update the parameter prior. The parameter estimator θ̂ is calculated as the MEAN of the parameter posterior.
The asymptotic theory presented by Paninski shows that Bayesian adaptive estimation converges for psychometric functions [32]. It is well known that the convergence also holds when the stimulus d_t is chosen randomly [32]. Thus, the E-E approach will eventually converge, no matter when it switches from the exploration procedure based on the maximal mutual information to the exploitation procedure based on the random stimulus.
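The E-E rule can be sketched end to end on a toy problem. Everything specific in the following code is an assumption made for illustration: the probit acceptance model Φ((d − T_cr)/σ), the grid and stimulus ranges, the threshold value, the simulated subject, and the choice to re-check the threshold in every trial (a one-way switch to exploitation is an equally plausible reading of the algorithm). It is not the authors' MATLAB implementation.

```python
import math
import random


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binary_entropy(p):
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def entropy(dist):
    return -sum(q * math.log(q) for q in dist if q > 0.0)


def accept_prob(d, theta):
    # ASSUMPTION: probit model with theta = (T_cr, sigma), clipped away
    # from 0 and 1 so that every response keeps a positive likelihood.
    t_cr, sigma = theta
    return min(max(norm_cdf((d - t_cr) / sigma), 1e-9), 1.0 - 1e-9)


def mutual_information(prior, p_accept):
    marginal = sum(q * p for q, p in zip(prior, p_accept))
    return binary_entropy(marginal) - sum(
        q * binary_entropy(p) for q, p in zip(prior, p_accept)
    )


def run_ee(true_theta, grid, stimuli, eps, n_trials, seed=0):
    rng = random.Random(seed)
    prior = [1.0 / len(grid)] * len(grid)  # non-informative uniform prior
    table = {d: [accept_prob(d, th) for th in grid] for d in stimuli}
    n_explore = 0
    for _ in range(n_trials):
        if entropy(prior) > eps:
            # Exploration: search all stimuli for the maximal mutual information.
            d_t = max(stimuli, key=lambda d: mutual_information(prior, table[d]))
            n_explore += 1
        else:
            # Exploitation: no search, pick a stimulus at random.
            d_t = rng.choice(stimuli)
        # Simulated subject answers according to the true parameter.
        y = 1 if rng.random() < accept_prob(d_t, true_theta) else 0
        likelihood = [p if y == 1 else 1.0 - p for p in table[d_t]]
        unnormalized = [l * q for l, q in zip(likelihood, prior)]
        evidence = sum(unnormalized)
        prior = [u / evidence for u in unnormalized]
    # MEAN estimator: posterior expectation of each parameter.
    mean = tuple(sum(q * th[k] for q, th in zip(prior, grid)) for k in range(2))
    return mean, prior, n_explore


# Hypothetical grids: T_cr in [5, 10], sigma in [0.5, 2], gaps d in [2, 14].
grid = [(5.0 + 0.5 * i, 0.5 + 0.25 * j) for i in range(11) for j in range(7)]
stimuli = [2.0 + 0.5 * k for k in range(25)]
mean, posterior, n_explore = run_ee((7.0, 1.0), grid, stimuli, eps=2.0, n_trials=60)
print(mean, n_explore)
```

Only the exploration branch touches the full stimulus-by-parameter table; the exploitation branch costs a single Bayes update per trial, which is where the time saving comes from.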

Experiment Simulations
To demonstrate the performance and computational efficiency of the proposed E-E approach for Bayesian adaptive estimation, we conduct simulations for the parameter estimation problems of the contrast sensitivity function (CSF) and the heterogeneous gap acceptance function (GAF). The CSF and GAF are classic empirical models from the fields of vision [35] and transportation [28], respectively, and the Bayesian adaptive estimation method for the CSF and GAF models was studied by Lesmes et al. [13] and Zhu and Zhang [6]. In this paper, we conduct computer simulations instead of real-world experiments. At each trial of the simulated experiment, the most informative design or a random design for the parameter estimation is computed, and the subject's feedback is observed. The performance of the proposed EE-BAE algorithm is compared with the classical Bayesian adaptive estimation algorithm, and both algorithms are implemented in MATLAB R2018a with CPU i5-10400F, RAM (16 GB DDR4), and GPU Nvidia GeForce RTX 2060s (8G).
In Bayesian adaptive estimation, the choice of the initial parameter prior distribution greatly influences the estimation convergence [2,3,6,13]. This paper focuses on the performance and computational efficiency of the E-E approach with the theoretical bound. Therefore, to avoid the influence of the initial prior distribution, the paper adopts the non-informative uniform prior [6] as the initial prior distribution for both the proposed E-E approach and classical Bayesian adaptive estimation. To reduce the effect of randomness in the simulations, each experiment is repeated 5000 times. To make fair comparisons, all initial settings and gridding settings are the same for both algorithms.
The mean square error (MSE) [6,13] between the estimated parameter value and the true value is assessed as the criterion for both algorithms. The MSE for the true parameter θ_true in the psychometric function is computed from the estimator θ̂_t = Σ_{θ∈Ξ} p_t(θ) × θ of θ_true in trial t.
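The posterior-mean estimator and an MSE over repeated runs can be sketched as follows. The averaging of squared errors across repeated simulations is an assumption here (a common convention, not necessarily the exact normalization used in the paper), and the numbers in the example are hypothetical.

```python
def posterior_mean(posterior, grid_values):
    """MEAN estimator: sum over the grid of p_t(theta) * theta."""
    return sum(p * v for p, v in zip(posterior, grid_values))


def mse(estimates, true_value):
    """ASSUMED definition: mean squared error of the trial-t estimates
    collected from repeated simulation runs."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


# Hypothetical one-parameter example.
estimate = posterior_mean([0.2, 0.5, 0.3], [1.0, 2.0, 3.0])
error = mse([1.0, 3.0], true_value=2.0)
print(estimate, error)
```

For a multidimensional parameter, the same two functions can be applied to each coordinate separately to obtain one MSE curve per parameter.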

Contrast Sensitivity Function.
Contrast sensitivity (CS) is a clinical measure to predict functional vision. The parameter estimation problem of the contrast sensitivity function (CSF) mainly investigates how grating sensitivity varies with spatial frequency and contrast in visual perception [13,35]. The CSF model follows [13,35], with μ = 4%, and its logarithmic sensitivity uses κ = log₁₀(2) and β′ = log₁₀(2β_1). The stimuli are f and c, where f is the grating frequency and c is the grating contrast. θ = (c_max, f_max, β_1, δ_1) is the parameter vector to be estimated. The subject has binary feedback y ∈ {0, 1}, where y = 1 indicates the correct response for the grating of frequency f and contrast c and y = 0 indicates the wrong response.
The ranges of the parameters in CSF are set as c_max ∈ [2, 2000], f_max ∈ [0.2, 100], β_1 ∈ [2, 128], and δ_1 ∈ [0.2, 3]. The ranges of the stimuli are set as f ∈ [log₁₀ 0.2, log₁₀ 36] and c ∈ [log₁₀ 0.001, log₁₀ 1] [13,35]. The experiments are conducted by grid search: 20 grid points are set for each parameter, and 20 points are set for each stimulus (f and c). The parameter estimation simulations are conducted for 250 trials, and the threshold of the E-E approach is set to ε = 1.5, which applies the exploitation strategy in 110 trials. The parameter entropy curves H_t(Θ) for the proposed E-E approach and classical Bayesian adaptive estimation are shown in Figure 2, and the MSE performances of the two approaches are compared in Figure 3.
Figure 2 shows that the H_t(Θ) curves of the two methods decrease monotonically, as discussed in Proposition 1. The curves of the E-E approach and classical Bayesian adaptive estimation decrease quickly, which means that the proposed E-E approach can effectively reduce the uncertainty of the parameters. It is reasonable that the red line (E-E approach) is a little higher than the black line (classical algorithm) after the exploitation strategy is applied, because classical Bayesian adaptive estimation selects the most informative stimulus for all trials. Both the proposed E-E approach and classical Bayesian adaptive estimation select the most informative stimulus when H_t(Θ) > ε, and the E-E approach selects the stimulus randomly once H_t(Θ) ≤ ε. The MSE curves of the E-E approach and classical Bayesian adaptive estimation converge and almost overlap, as shown in Figure 3. This is slightly different from the H_t(Θ) curves because we take the mean to compute the parameter estimator. Therefore, the proposed E-E approach trades off the parameter posterior uncertainty against the parameter mean estimation.
Figure 3 shows that both the proposed E-E approach and classical Bayesian adaptive estimation can accurately estimate all parameters in the CSF estimation, and the difference between their MSE performance is marginal. For 250 trials, the experiment with the E-E approach runs for 7.57 seconds, whereas classical Bayesian adaptive estimation runs for 11.60 seconds. The computation time of the E-E approach is thus shortened by 34.74%.
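To give a sense of the search cost behind these timings, the grid sizes stated for the CSF simulation can be multiplied out. The sketch below only performs that arithmetic; the count of likelihood terms per exploration trial is a rough estimate assuming every stimulus is evaluated over the full parameter grid, and both logarithm bases are shown because the base used for H_t(Θ) is not stated.

```python
import math

points_per_parameter = 20   # grid points for each of the four CSF parameters
n_parameters = 4
points_per_stimulus = 20    # grid points for each stimulus dimension (f and c)
n_stimulus_dims = 2

n_param_points = points_per_parameter ** n_parameters
n_stimuli = points_per_stimulus ** n_stimulus_dims

# Rough count of likelihood terms summed in one exploration trial.
terms_per_exploration_trial = n_param_points * n_stimuli

# Entropy of the uniform initial prior, the largest value H_t(Theta) can take.
max_entropy_nats = math.log(n_param_points)
max_entropy_bits = math.log2(n_param_points)

print(n_param_points, n_stimuli, terms_per_exploration_trial)
print(round(max_entropy_nats, 2), round(max_entropy_bits, 2))
```

An exploitation trial skips the search over the 400 stimuli entirely and needs only one pass over the 160,000 parameter grid points for the Bayes update.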

Heterogeneous Gap Acceptance Function.
The heterogeneous gap acceptance model studies the driver's response (acceptance or rejection) to different driving gaps when crossing a traffic stream, to characterize the driving propensity of the individual driver [6,28]. Miller's heterogeneous gap acceptance function with two parameters is given in [6,28]. The stimulus d is the gap which the driver faces, and Φ(·) is the standard normal cumulative distribution function. Binary feedback y = 1 indicates that the driver accepts the gap d when facing it, and y = 0 indicates that the driver rejects d. θ = (T_cr, σ) are the driver's parameters to be estimated.
The ranges of the parameters of GAF are set as T_cr ∈ [5, 10]. The experiments are run for 300 trials, and the threshold of the E-E approach is set to ε = 6.4, which applies the exploitation strategy in 160 trials. The parameter uncertainty in each experimental trial is compared between the E-E approach and classical Bayesian adaptive estimation in Figure 4. The MSE comparisons for GAF parameter estimation by the two methods are shown in Figure 5. Figure 4 shows that the parameter entropy H_t(Θ) of GAF decreases monotonically and quickly for both methods. Similar to the explanation for the results of CSF, the red E-E approach line lies a little above the black classical algorithm line. Figure 5 shows the performance comparisons of the parameter estimation of GAF between the E-E approach and classical Bayesian adaptive estimation. Similar to the results of the CSF experiments, the MSE curves of both the E-E approach and classical Bayesian adaptive estimation converge, and the MSE difference between the two methods is minor (as shown in Figure 5). The estimation of T_cr clearly converges faster than that of σ. GAF has two parameters, and CSF has four parameters to be estimated. The true parameter values for CSF are selected far away from the mean of the initial prior distribution, and the true parameter values for GAF are selected close to the mean of the initial prior distribution. So, the shapes of the MSE curves of GAF are different from those of CSF, and the convergence of the estimations takes more trials. For 300 trials, the experiment with the E-E approach runs for 2.11 seconds, whereas classical Bayesian adaptive estimation takes 3.21 seconds. The computation time of the E-E approach is thus reduced by 34.27%.
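The reported savings follow directly from the run times quoted for the two simulations. A short check of the arithmetic, using only the four timings stated in the text (the helper name `time_saving_percent` is introduced here for illustration):

```python
def time_saving_percent(classical_seconds, ee_seconds):
    """Relative reduction in computation time, in percent."""
    return 100.0 * (classical_seconds - ee_seconds) / classical_seconds


csf_saving = time_saving_percent(11.60, 7.57)  # CSF, 250 trials
gaf_saving = time_saving_percent(3.21, 2.11)   # GAF, 300 trials
print(round(csf_saving, 2), round(gaf_saving, 2))  # 34.74 34.27
```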

Conclusion
The paper investigates the theoretical bound of the information gained from Bayesian adaptive estimation for parameter estimation in psychometric functions. The advantage in information gain of classical Bayesian adaptive estimation is limited when the parameter posterior distribution becomes peaky. In particular, the bound of the information gain gradually decreases as the experimental trials proceed. Thus, the paper proposes the exploration-exploitation approach, which accelerates the computation by selecting the stimulus randomly once low parameter uncertainty is detected and trades off the parameter posterior uncertainty against the parameter mean estimation. The simulation results for the psychometric functions CSF and GAF indicate that the proposed approach reduces the computation time by 34.74% for CSF and 34.27% for GAF with the same estimation accuracy. This computational efficiency makes the approach well suited to online experiments. The proposed exploration-exploitation approach for Bayesian adaptive estimation can be applied to parameter estimation of various psychometric functions in psychophysics. It can also be extended to behavioral and neural sciences, clinical applications, and other fields that use the idea of Bayesian adaptive estimation.

Data Availability
No underlying data were collected or produced in this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.