Let X(t) be a controlled one-dimensional standard Brownian motion starting from x∈(−d,d). The problem of optimally controlling X(t) until |X(t)|=d for the first time is solved explicitly in a particular case. The maximal value that the instantaneous reward given for survival in (−d,d) can take is determined.
1. Introduction
Consider the one-dimensional controlled standard Brownian motion process $\{X(t), t \ge 0\}$ defined by the stochastic differential equation
$$dX(t) = b_0 [X(t)]^k u[X(t)]\,dt + dB(t), \tag{1.1}$$
where $u$ is the control variable, $b_0 > 0$, $k \in \{0, 1, \ldots\}$, and $\{B(t), t \ge 0\}$ is a standard Brownian motion. Assume that $X(0) = x \in (-d, d)$ and define the first passage time
$$T(x) = \inf\{t > 0 : |X(t)| = d \mid X(0) = x\}. \tag{1.2}$$
Our aim is to find the control $u^*$ that minimizes the expected value of the cost function
$$J(x) = \int_0^{T(x)} \left\{ \tfrac{1}{2} q_0 u^2[X(t)] - \lambda \right\} dt, \tag{1.3}$$
where $q_0$ and $\lambda$ are positive constants.
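To make the setup concrete, the following is a minimal simulation sketch, not part of the original analysis. It discretizes (1.1) with the Euler-Maruyama scheme and estimates the expected cost (1.3) by Monte Carlo for an arbitrary feedback control. The function name `simulate_cost`, the step size `dt`, and the number of paths are illustrative assumptions.

```python
import random


def simulate_cost(u, x0, d, b0, q0, lam, k=1, dt=1e-3, n_paths=1000, seed=0):
    """Monte Carlo estimate of E[J(x0)] in (1.3) under the feedback control u.

    Assumption: the SDE (1.1) is approximated by the Euler-Maruyama scheme,
    so the exit time is only detected on the time grid (a small bias).
    """
    rng = random.Random(seed)
    sqrt_dt = dt ** 0.5
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        while abs(x) < d:  # run until |X(t)| reaches d
            ux = u(x)
            cost += (0.5 * q0 * ux * ux - lam) * dt          # running cost
            x += b0 * x ** k * ux * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        total += cost
    return total / n_paths
```

With $u \equiv 0$ the process is a standard Brownian motion, for which $E[T(x)] = d^2 - x^2$, so `simulate_cost(lambda x: 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)` should return a value close to $-\lambda = -1$, up to discretization and sampling error.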
In the case when $k=0$, Lefebvre and Whittle [1] were able to find the optimal control $u^*$ by making use of a theorem in Whittle [2, page 289] that enables us to express the value function
$$F(x) := \inf_{u[X(t)],\, 0 \le t \le T(x)} E[J(x)] \tag{1.4}$$
in terms of a mathematical expectation for the uncontrolled Brownian motion $\{B(t), t \ge 0\}$ obtained by setting $u \equiv 0$ in (1.1). Moreover, Lefebvre [3] has also obtained the value of $u^*$ when $k=0$ if the cost function $J$ in (1.3) is replaced by
$$J_1(x) = \int_0^{T(x)} \left\{ \tfrac{1}{2} q_0 X^2(t)\, u^2[X(t)] + \lambda \right\} dt. \tag{1.5}$$
Although we cannot appeal to the theorem in Whittle [2] in that case, the author was able to express the function F(x) in terms of a mathematical expectation for an uncontrolled geometric Brownian motion.
In Section 2, we will find u* when k=1. The problem cannot then be reduced to the computation of a mathematical expectation for an uncontrolled diffusion process. Therefore, we will instead find the optimal control by considering the appropriate dynamic programming equation. Moreover, if the instantaneous reward λ given for survival in the interval (-d,d) is too large, then the value function F(x) becomes infinite. We will determine the maximal value that λ can take in Section 3.
2. Optimal Control
The value function $F(x)$ satisfies the following dynamic programming equation:
$$\inf_{u(x)} \left\{ \tfrac{1}{2} q_0 u^2(x) - \lambda + b_0 x\, u(x) F'(x) + \tfrac{1}{2} F''(x) \right\} = 0. \tag{2.1}$$
It follows that the optimal control is given by
$$u^*(x) = -\frac{b_0}{q_0}\, x F'(x). \tag{2.2}$$
Substituting this value into (2.1), we find that we must solve the nonlinear ordinary differential equation
$$-\lambda - \frac{b_0^2}{2 q_0}\, x^2 [F'(x)]^2 + \tfrac{1}{2} F''(x) = 0. \tag{2.3}$$
The boundary conditions are
$$F(d) = F(-d) = 0. \tag{2.4}$$
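The step from (2.1) to (2.3) can be spelled out as follows. The expression in braces in (2.1) is a convex quadratic in $u(x)$ (because $q_0 > 0$), so the infimum is attained where its derivative with respect to $u(x)$ vanishes:

$$\begin{aligned}
q_0 u(x) + b_0 x F'(x) = 0 \quad &\Longrightarrow \quad u^*(x) = -\frac{b_0}{q_0}\, x F'(x), \\
\tfrac{1}{2} q_0 [u^*(x)]^2 + b_0 x\, u^*(x) F'(x) &= \frac{b_0^2}{2 q_0}\, x^2 [F'(x)]^2 - \frac{b_0^2}{q_0}\, x^2 [F'(x)]^2 = -\frac{b_0^2}{2 q_0}\, x^2 [F'(x)]^2 .
\end{aligned}$$

Adding the remaining terms $-\lambda + \tfrac{1}{2} F''(x)$ gives the nonlinear equation (2.3).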
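Next, let
$$\delta = \sqrt{\frac{2\lambda}{q_0}}\; b_0. \tag{2.5}$$
Making use of a mathematical software program, we find that the solution of (2.3) can be expressed as
$$F(x) = -\frac{\sqrt{2\lambda q_0}}{b_0} \int_{-d}^{x} \frac{c_1 Y_{-1/4}(\delta z^2/2) + J_{-1/4}(\delta z^2/2)}{z \left[ c_1 Y_{3/4}(\delta z^2/2) + J_{3/4}(\delta z^2/2) \right]}\, dz, \tag{2.6}$$
where $J_\nu$ and $Y_\nu$ are Bessel functions of the first and second kind, respectively, and $c_1$ is a constant that must be chosen so that $F(d)=0$. Unfortunately, it seems very difficult to evaluate the integral explicitly. Notice, however, that we do not actually need to find $F(x)$: by (2.2), only $F'(x)$ is required to determine the optimal value of the control variable $u$.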
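The Bessel form of the solution can also be recovered by hand. The following is only a sketch of one possible route (the solution above was obtained with a software program); the auxiliary functions $G$ and $w$ are introduced here for illustration. Equation (2.3) is a Riccati equation for $G := F'$, which a standard substitution linearizes:

$$\begin{aligned}
G'(x) &= 2\lambda + \frac{b_0^2}{q_0}\, x^2 G^2(x), \\
G(x) = -\frac{q_0}{b_0^2}\, \frac{w'(x)}{x^2 w(x)} \quad &\Longrightarrow \quad w''(x) - \frac{2}{x}\, w'(x) + \delta^2 x^2 w(x) = 0, \qquad \delta^2 = \frac{2\lambda b_0^2}{q_0}, \\
w(x) &= x^{3/2} \left[ c_1 Y_{3/4}(\delta x^2/2) + J_{3/4}(\delta x^2/2) \right] \quad \text{(up to a multiplicative constant)} .
\end{aligned}$$

Using $\frac{d}{ds}\left[ s^{\nu} C_\nu(s) \right] = s^{\nu} C_{\nu-1}(s)$ for $C_\nu = J_\nu$ or $Y_\nu$, we obtain

$$\frac{w'(x)}{w(x)} = \delta x\, \frac{c_1 Y_{-1/4}(\delta x^2/2) + J_{-1/4}(\delta x^2/2)}{c_1 Y_{3/4}(\delta x^2/2) + J_{3/4}(\delta x^2/2)}, \qquad \frac{q_0 \delta}{b_0^2} = \frac{\sqrt{2\lambda q_0}}{b_0},$$

which reproduces the integrand in the expression for $F(x)$ above.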
We will prove the following proposition.
Proposition 2.1.
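The control $u^*(x)$ that minimizes the expected value of the cost function $J(x)$ defined in (1.3), when $k=1$ in (1.1), is given by
$$u^*(x) = -\sqrt{\frac{2\lambda}{q_0}}\; \frac{J_{1/4}\!\left( \sqrt{\lambda/(2q_0)}\; b_0 x^2 \right)}{J_{-3/4}\!\left( \sqrt{\lambda/(2q_0)}\; b_0 x^2 \right)} \quad \text{for } -d < x < d. \tag{2.7}$$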
Proof.
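We deduce from (2.6) that
$$F'(x) = -\frac{\sqrt{2\lambda q_0}}{b_0}\; \frac{c_1 Y_{-1/4}(\delta x^2/2) + J_{-1/4}(\delta x^2/2)}{x \left[ c_1 Y_{3/4}(\delta x^2/2) + J_{3/4}(\delta x^2/2) \right]}. \tag{2.8}$$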
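Moreover, from the formula (see Abramowitz and Stegun [5, page 358])
$$Y_\nu(z) = \frac{J_\nu(z) \cos(\nu\pi) - J_{-\nu}(z)}{\sin(\nu\pi)}, \tag{2.9}$$
which is valid for noninteger $\nu$, we obtain $Y_{-1/4} = -J_{-1/4} + \sqrt{2}\, J_{1/4}$ and $Y_{3/4} = -J_{3/4} - \sqrt{2}\, J_{-3/4}$, so that the function $F'(x)$ may be rewritten as
$$F'(x) = -\frac{\sqrt{2\lambda q_0}}{b_0 x}\; \frac{(1-c_1) J_{-1/4}(\delta x^2/2) + \sqrt{2}\, c_1 J_{1/4}(\delta x^2/2)}{(1-c_1) J_{3/4}(\delta x^2/2) - \sqrt{2}\, c_1 J_{-3/4}(\delta x^2/2)}. \tag{2.10}$$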
Now, because the optimizer is trying to maximize the time spent by $X(t)$ in the interval $(-d,d)$, taking the quadratic control costs into account, we can assert, by symmetry, that $u^*(x)$ should be equal to zero when $x=0$. One can check that this is indeed the case for any value of the constant $c_1$. Furthermore, the function $F(x)$ must have a minimum (that is, a maximum in absolute value) at $x=0$, so that $F'(0)=0$ as well.
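With the help of the formula (see Abramowitz and Stegun [5, page 360])
$$J_\nu(z) \sim \left( \tfrac{1}{2} z \right)^{\nu} \frac{1}{\Gamma(\nu+1)} \tag{2.11}$$
if $z \to 0$ and $\nu \ne -1, -2, \ldots$, we find that
$$\lim_{x \to 0} F'(x) = \frac{\sqrt{2\lambda q_0}}{b_0}\; \frac{(1-c_1)}{c_1}\; \frac{\delta^{1/2}}{2\sqrt{2}}\; \frac{\Gamma(1/4)}{\Gamma(3/4)}. \tag{2.12}$$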
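This limit can be checked term by term. Write $s = \delta x^2/2$ and assume $c_1 \ne 0$ and $c_1 \ne 1$. Keeping only the dominant term of the numerator and of the denominator as $s \to 0$ gives

$$\begin{aligned}
(1-c_1) J_{-1/4}(s) + \sqrt{2}\, c_1 J_{1/4}(s) &\sim (1-c_1)\, \frac{(s/2)^{-1/4}}{\Gamma(3/4)}, \\
(1-c_1) J_{3/4}(s) - \sqrt{2}\, c_1 J_{-3/4}(s) &\sim -\sqrt{2}\, c_1\, \frac{(s/2)^{-3/4}}{\Gamma(1/4)}, \\
F'(x) &\sim -\frac{\sqrt{2\lambda q_0}}{b_0 x} \cdot \frac{(1-c_1)\, \Gamma(1/4)}{-\sqrt{2}\, c_1\, \Gamma(3/4)} \left( \frac{s}{2} \right)^{1/2}, \qquad \left( \frac{s}{2} \right)^{1/2} = \frac{\delta^{1/2} x}{2} .
\end{aligned}$$

The factors of $x$ cancel, so the limit is a nonzero constant unless $1 - c_1 = 0$.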
Hence, we deduce that the constant $c_1$ must be equal to 1, so that
$$F'(x) = \frac{\sqrt{2\lambda q_0}}{b_0 x}\; \frac{J_{1/4}(\delta x^2/2)}{J_{-3/4}(\delta x^2/2)}. \tag{2.13}$$
Formula (2.7) for the optimal control then follows at once from (2.2).
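As a sanity check (not part of the proof), one can verify numerically that the closed form obtained for $F'(x)$ satisfies the differential equation (2.3). The sketch below uses only the Python standard library. The helper `bessel_j` is an assumed power-series implementation of $J_\nu$ that is adequate for the small positive arguments needed here, and `ode_residual` approximates $F''$ by a central difference; both names are illustrative.

```python
import math


def bessel_j(nu, z, terms=40):
    """Power series for J_nu(z), z > 0 (fine for the small arguments used here)."""
    return sum((-1) ** k * (z / 2) ** (2 * k + nu)
               / (math.factorial(k) * math.gamma(k + nu + 1))
               for k in range(terms))


def F_prime(x, lam, q0, b0):
    """Closed form of F'(x) with c1 = 1; requires x != 0 and a nonzero denominator."""
    delta = math.sqrt(2 * lam / q0) * b0
    s = delta * x * x / 2
    return (math.sqrt(2 * lam * q0) / (b0 * x)
            * bessel_j(0.25, s) / bessel_j(-0.75, s))


def ode_residual(x, lam, q0, b0, h=1e-5):
    """Left-hand side of (2.3), with F'' replaced by a central difference of F'."""
    Fp = F_prime(x, lam, q0, b0)
    Fpp = (F_prime(x + h, lam, q0, b0) - F_prime(x - h, lam, q0, b0)) / (2 * h)
    return -lam - b0 ** 2 / (2 * q0) * x ** 2 * Fp ** 2 + 0.5 * Fpp
```

For instance, `ode_residual(0.5, 1.0, 1.0, 1.0)` should be zero up to finite-difference error, and `F_prime(x, lam, q0, b0)` behaves like $2\lambda x$ for small $x$, consistent with $F'(0)=0$ and a minimum of $F$ at the origin.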
3. Maximal Value of λ
Because the optimizer wants $X(t)$ to remain in the interval $(-d,d)$ as long as possible and because $u[X(t)]$ is multiplied by $b_0 X(t)$ (with $b_0>0$) in (1.1), the drift $b_0 x\, u^*(x)$ must push the process toward the origin, so the optimal control $u^*(x)$ should always be negative when $x \ne 0$. However, if we plot $u^*$ against $x$ for particular values of the constants $\lambda$, $b_0$, $q_0$, and $d$, we find that it is sometimes positive. This is because the formula in Proposition 2.1 is actually valid only for $\lambda$ less than a critical value $\lambda_{\mathrm{crit}}$, which depends on the other parameters. Conversely, if we fix the value of $\lambda$, then we can find the largest value that $d$ can take.
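The sign change can be observed numerically. The sketch below evaluates the formula of Proposition 2.1 without checking whether $\lambda$ is admissible. The helper `bessel_j` is an assumed power-series implementation of $J_\nu$, and the parameter values in the usage note are arbitrary choices made for illustration.

```python
import math


def bessel_j(nu, z, terms=40):
    """Power series for J_nu(z), z > 0 (fine for the small arguments used here)."""
    return sum((-1) ** k * (z / 2) ** (2 * k + nu)
               / (math.factorial(k) * math.gamma(k + nu + 1))
               for k in range(terms))


def u_star(x, lam, q0, b0):
    """Formula of Proposition 2.1, evaluated without checking lam < lam_crit."""
    if x == 0:
        return 0.0  # by symmetry, no control is applied at the origin
    s = math.sqrt(lam / (2 * q0)) * b0 * x * x
    return -math.sqrt(2 * lam / q0) * bessel_j(0.25, s) / bessel_j(-0.75, s)
```

With $b_0 = q_0 = 1$ and $x = 0.9$ (inside the interval when $d = 1$), `u_star(0.9, 1.0, 1.0, 1.0)` is negative, whereas `u_star(0.9, 4.0, 1.0, 1.0)` is positive, which signals that $\lambda = 4$ is too large for these parameter values.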
One way to determine λcrit is to find out for what value of λ the value function becomes infinite. However, because we were not able to obtain an explicit expression for F(x) (without an integral sign), we must proceed differently.
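Another way to obtain the value of $\lambda_{\mathrm{crit}}$ is to determine the smallest positive value of $x$ for which the denominator in (2.13) vanishes.
Let
$$f(x) = x J_{-3/4}(x^2/2).$$
Using a mathematical software program, we find that the smallest positive zero of $f$ is (approximately) $x=1.455$. Because the denominator in (2.13) is equal to $b_0 f(\sqrt{\delta}\, x)/\sqrt{\delta}$, it first vanishes when $\sqrt{\delta}\, x \simeq 1.455$. Hence, we deduce that we must have
$$\sqrt{\delta}\; d \simeq 1.455.$$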
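The numerical zero can be reproduced without specialized software. The following sketch locates it by bisection; `bessel_j` is an assumed power-series implementation of $J_\nu$, and the bracket $[1, 2]$ is an assumption justified by the sign change of $f$ on that interval (the series shows that $f > 0$ on $(0, 1]$).

```python
import math


def bessel_j(nu, z, terms=40):
    """Power series for J_nu(z), z > 0 (fine for the small arguments used here)."""
    return sum((-1) ** k * (z / 2) ** (2 * k + nu)
               / (math.factorial(k) * math.gamma(k + nu + 1))
               for k in range(terms))


def f(x):
    """f(x) = x * J_{-3/4}(x^2 / 2), for x > 0."""
    return x * bessel_j(-0.75, x * x / 2)


def bisect(func, lo, hi, tol=1e-10):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


root = bisect(f, 1.0, 2.0)  # close to the value 1.455 quoted in the text
```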
We can now state the following proposition.
Proposition 3.1.
For fixed values of $b_0$, $q_0$, and $d$, the value of $\lambda_{\mathrm{crit}}$ is given by
$$\lambda_{\mathrm{crit}} \simeq \left( \frac{1.455}{d} \right)^{4} \frac{q_0}{2 b_0^2}.$$
To conclude, we give the value of $d_{\max}$ when $\lambda$ is fixed.
Corollary 3.2.
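For fixed values of $b_0$, $q_0$, and $\lambda$, the maximal value that $d$ can take is
$$d_{\max} \simeq \frac{1.455}{\sqrt{\delta}} = 1.455 \left( \frac{\sqrt{q_0}}{\sqrt{2\lambda}\; b_0} \right)^{1/2}.$$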
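The two formulas are inverse to each other, which is easy to confirm numerically. In the sketch below, the function names `lambda_crit` and `d_max` are illustrative, and the constant 1.455 is the approximate zero quoted in the text rather than a recomputed value.

```python
import math

X0 = 1.455  # approximate smallest positive zero of f(x) = x J_{-3/4}(x^2/2), from the text


def lambda_crit(d, b0, q0):
    """Critical reward rate of Proposition 3.1 for fixed b0, q0, and d."""
    return (X0 / d) ** 4 * q0 / (2 * b0 ** 2)


def d_max(lam, b0, q0):
    """Maximal half-width of Corollary 3.2 for fixed b0, q0, and lam."""
    delta = math.sqrt(2 * lam / q0) * b0
    return X0 / math.sqrt(delta)
```

For any admissible parameters, `d_max(lambda_crit(d, b0, q0), b0, q0)` returns `d` up to rounding, and $\lambda_{\mathrm{crit}}$ decreases like $d^{-4}$ as the interval widens.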
4. Conclusion
We have solved explicitly a problem of the type that Whittle [2] termed “LQG homing.” Actually, the expression “LQG homing” corresponds to the case when the parameter λ in the cost function J(x) is negative, so that the optimizer wants the controlled process to leave the continuation region as soon as possible.
The author has studied LQG homing problems in a number of papers (see Lefebvre [4], in particular). They were also considered recently by Makasu [6]. However, in the present paper, we did not appeal to the theorem in Whittle [2] to obtain the optimal solution by reducing the optimal control problem to a purely probabilistic problem. Although this interpretation is very interesting, it only works when a certain relation holds between the noise and control terms. In practice, the relation in question is seldom satisfied in more than one dimension.
We could determine the optimal control when the parameter λ in (1.3) is negative, so that the optimizer wants X(t) to leave the interval (-d,d) as soon as possible. This time, the function F(x) would have a maximum at x=0. We could also consider other values of the constant k in (1.1). However, the value k=1 is probably the most important one in practice, apart from k=0.
References
[1] M. Lefebvre and P. Whittle, "Survival optimization for a dynamic system," vol. 12, no. 1, pp. 101-119, 1988. MR946270.
[2] P. Whittle, John Wiley & Sons, Chichester, UK, 1982, xi+317 pp. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. MR670949.
[3] M. Lefebvre, "Using a geometric Brownian motion to control a Brownian motion and vice versa," vol. 69, no. 1, pp. 71-82, 1997. doi:10.1016/S0304-4149(97)00040-9. MR1464175.
[4] M. Lefebvre, "A homing problem for diffusion processes with control-dependent variance," vol. 14, no. 2, pp. 786-795, 2004. doi:10.1214/105051604000000107. MR2052902.
[5] M. Abramowitz and I. A. Stegun, Dover, New York, NY, USA, 1965.
[6] C. Makasu, "Risk-sensitive control for a class of homing problems," vol. 45, no. 10, pp. 2454-2455, 2009. doi:10.1016/j.automatica.2009.06.015.