Convergence in distribution of some self-interacting diffusions

The present paper is concerned with some self-interacting diffusions $(X_t, t \ge 0)$ living on $\mathbb{R}^d$. These diffusions are solutions to stochastic differential equations: $dX_t = dB_t - g(t)\nabla V(X_t - \mu_t)\,dt$, where $\mu_t$ is the empirical mean of the process $X$, $V$ is an asymptotically strictly convex potential and $g$ is a given positive function. We study the asymptotic behaviour of $X$ for three different families of functions $g$. If $g(t) = k \log t$ with $k$ small enough, then the process $X$ converges in distribution towards the global minima of $V$, whereas if $tg(t) \to c \in \,]0, +\infty]$ or if $g(t) \to g(\infty) \in [0, +\infty[$, then $X$ converges in distribution if and only if $\int x e^{-2V(x)}\,dx = 0$.


Introduction
The aim of this paper is to obtain necessary and sufficient conditions for the convergence in distribution of a self-interacting diffusion living on $\mathbb{R}^d$. Consider a smooth potential $V : \mathbb{R}^d \to \mathbb{R}_+$ and a map $g : \mathbb{R}_+ \to \mathbb{R}_+$. We study the asymptotic behaviour of the self-interacting diffusion $X$ given by $dX_t = dB_t - g(t)\nabla V(X_t - \mu_t)\,dt$, where $B$ is a standard Brownian motion and $\mu_t$ denotes the empirical mean of the process $X$: $\mu_t = \frac{1}{t}\int_0^t X_s\,ds$. This is a model of reinforcement that could be used to represent the (simplified) behaviour of some social insects. Some insects, such as ants, mark their paths with pheromones. This serves as a guide for other ants to return to the nest. The trail of pheromones is denoted by $\mu$ and its evaporation by $g$. Despite this evaporation, the path is reinforced and the insects gradually manage to find the best route.
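To fix ideas, the dynamics can be explored numerically. The following is a minimal sketch, not the method of the paper: it assumes the one-dimensional equation $dX_t = dB_t - g(t)\nabla V(X_t - \mu_t)\,dt$ with $\mu_t = \frac{1}{t}\int_0^t X_s\,ds$, discretized by an Euler-Maruyama scheme. The function name `simulate_self_interacting`, the step size, the convention $\mu_0 = X_0$, and the test potential $V(y) = y^4/4 + y^2/2$ are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_self_interacting(grad_V, g, x0=0.0, T=50.0, dt=1e-3, seed=0):
    """Euler-Maruyama sketch of dX = dB - g(t) grad_V(X - mu) dt in dimension 1.

    mu is the running time-average of X (left Riemann sum); mu_0 = x0 is a
    convention chosen here to avoid dividing by zero at t = 0.
    """
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    x = np.empty(n + 1)
    mu = np.empty(n + 1)
    x[0] = x0
    mu[0] = x0
    integral = 0.0  # approximates int_0^t X_s ds
    for i in range(n):
        t = i * dt
        drift = -g(t) * grad_V(x[i] - mu[i])
        x[i + 1] = x[i] + drift * dt + np.sqrt(dt) * rng.standard_normal()
        integral += x[i] * dt
        mu[i + 1] = integral / ((i + 1) * dt)
    return x, mu

# Hypothetical example: V(y) = y^4/4 + y^2/2 and g identically equal to 1.
x, mu = simulate_self_interacting(lambda y: y**3 + y, lambda t: 1.0, T=5.0)
print("X_T =", x[-1], " mu_T =", mu[-1])
```

Plotting `x - mu` against time gives a rough picture of the centered process studied in the sequel; a single trajectory of course says nothing about convergence in distribution.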
The same model has been already studied by Chambeu and Kurtzmann [1], in case of an unbounded increasing function .The authors have proven that, under certain conditions, the process satisfies a kind of pointwise ergodic theorem and that if  admits a unique minimum at 0, then   converges almost surely.In this paper, we do not suppose that either  increases to the infinity nor that  admits a unique minimum at 0. This will obviously change the asymptotic behaviour of , even if  will converge in distribution in most of the cases.We will essentially use two different techniques here.The first one is the well-known theory of simulated annealing, which has been developed a lot since the 80s with a huge literature, whereas the second one is simply a change of scale added to a change of "speed measure." Let us explain briefly the simulated annealing method.An important question for physical systems is to find the globally minimum energy states of the system.Experimentally, the ground states are reached by chemical annealing.One first melts a substance and then cools it slowly, being careful to pass slowly through the freezing temperature.If the temperature decreases too rapidly, then the system does not end up in a ground state, but in a local nonglobal minimum.On the other hand, if the temperature decreases too slowly, then the system approaches the ground states very slowly.The competition between these two effects determines the optimal speed of cooling, that is, the annealing schedule.The study of simulated annealing has involved the theory of nonhomogeneous Markov chains and diffusion processes, large deviation theory, spectral analysis of operators, and singular perturbation theory.Pioneering work was done by Freidlin and Wentzell [2].The initial problem consists in finding the global minima of a given function .Actually, one has to study the diffusion Markov process   in R  given by the Langevin-type Markov diffusion    =    − ∇(  ).If the temperature  is constant 
for a sufficiently large amount of time, then the process   and the fixed temperature process behave approximatively the same at the end of that time interval.The optimal annealing schedule, that is,  for the convergence criterion P(   ∈ Min) where Min denotes the set of all the global minima of , was first determined by Hajek [3] for a finite state space.Chiang et al. [4] studied the convergence rate of P  (   ∈ ⋅) via the large deviations of the transition density of   .They were one of the first to show the convergence of the algorithm of the simulated annealing for  2  = / log , for  large enough, related to the second eigenvalue of the corresponding (to   ) infinitesimal generator.Finally, Holley and Stroock [5] initiated a new method and proved, in the discrete case, the convergence of the simulated annealing algorithm via the Sobolev inequality.They went further in their study with Holley et al. [6].Later, Miclo [7] proved, through some functional inequalities, that the free energy (that is, the relative entropy of the distribution of the process at time  with respect to the invariant probability measure for the elliptic operator considered as a time-homogeneous operator by fixing ) satisfies a differential inequality, which implies (under some decreasing evolution of the temperature to zero) the convergence of the process to the global minima of the potential.And if the temperature  decreases too fast to zero, then the potential can freeze in a local minimum (depending on the initial condition) and so the process converges to this local minimum.
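The cooling mechanism described above can be illustrated on a toy example. The sketch below is only a heuristic experiment under stated assumptions: it simulates many independent copies of an annealed Langevin diffusion $dZ_t = \sigma_t\,dB_t - \nabla U(Z_t)\,dt$ with the schedule $\sigma_t^2 = k/\log(t + e)$ (the shift by $e$ is our own choice, made to avoid a singularity at small times). The tilted double-well potential $U$, the constant `k`, the horizon `T` and the function names are hypothetical and are not taken from the works cited.

```python
import numpy as np

def grad_U(x):
    # U(x) = (x^2 - 1)^2 - 0.5 x: local minimum near -0.93, global minimum near +1.06.
    return 4.0 * x * (x**2 - 1.0) - 0.5

def anneal(k=1.0, T=50.0, dt=1e-2, n_particles=2000, x0=-1.0, seed=0):
    """Euler scheme for dZ = sigma_t dB - grad U(Z) dt with sigma_t^2 = k / log(t + e).

    All particles start near the local (non-global) minimum.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, x0, dtype=float)
    n_steps = int(round(T / dt))
    for i in range(n_steps):
        t = i * dt
        sigma2 = k / np.log(t + np.e)
        noise = rng.standard_normal(n_particles)
        x = x - grad_U(x) * dt + np.sqrt(sigma2 * dt) * noise
    return x

final = anneal()
print("fraction of particles in the global well:", np.mean(final > 0.0))
```

Replacing the logarithmic schedule by a much faster one (for instance a power of $1/t$) in this experiment tends to leave a larger share of particles trapped near the local minimum, which is the freezing phenomenon mentioned above.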
We begin to study the R  -valued Markov process   :=   −   , which satisfies the following SDE: We will adapt the simulated annealing method to  for functions  large enough (that is,  does not go to zero) to prove the convergence in distribution of .
We wish to point out that a one-dimensional Brownian motion in a time-dependent potential has been recently studied by Gradinaru and Offret [8]: with  ,, (, ) = (/( + 1))(|| +1 /  )1  ̸ = −1 + (log ||/  )1 =−1 and  0 , , ,  ∈ R.This is quite close in spirit to the study of our process , even if the authors suppose in [8] that both  and 1/ are polynomial.They obtain conditions for the recurrence, transience, and convergence of the studied process .We refer to the survey of Ivanov et al. [9] for the existence and uniqueness of solutions to such equations.In the present paper, we do not suppose that  is polynomial and the dimension is  ≥ 1 and thus we obtain less precise results.
The remainder of the paper is organized as follows.First, in Section 2, we introduce some useful tools, such as the logarithmic Sobolev inequality and the Kullback information.They both will be needed for the simulated annealing study.Section 3 is devoted to the simulated annealing method in the case when  behaves asymptotically as  log .In this part, we will prove the (pointwise) ergodicity of the process  and the convergence in distribution of , depending on the potential .Finally, Section 4 deals with the convergence in distribution of  when () → +∞ and () →  ≥ 0, depending on the asymptotics of .

Some Useful Tools
2.1. Assumptions and Existence. In the following, $(\cdot, \cdot)$ denotes the Euclidean scalar product. We denote by $\mathcal{P}(\mathbb{R}^d)$ the set of probability measures on $\mathbb{R}^d$. We denote by $G$ the function $G(t) = \int_0^t g(s)\,ds$. We assume that the mapping $g : \mathbb{R}_+ \to \mathbb{R}_+$ is $C^1(\mathbb{R}_+)$. The precise hypothesis on $g$ will be given at the beginning of each section.
In the sequel, the technical assumptions on the potential  : R  → R + are the following: (1) (regularity and positivity)  ∈ C 2 (R  ) and  ≥ 0; (2) (convexity)  =  + , where  is   (>0)-strictly uniformly convex and  is a compactly supported function and there exists   > 0 such that ∇ is   -Lipschitz; (3) (growth) there exists  > 0 such that for all  ∈ R  , we have We also assume that  has a finite number of critical points.Let Max = { 1 ,  2 , . . .,   } be the set of the saddle points and local maxima of  and let Min = { 1 ,  2 , . . .,   } be the set of the local minima of , such that the Hessian matrix is nondegenerate for all local minimum.Without any loss of generality, we suppose that min  = 0.
Remark 1. The case of a potential $V$ with quadratic growth is excluded here, as it has been fully studied in [1].
Let us first prove the global strong existence and uniqueness of the process .Proposition 2. For any  ∈ R  ,  ∈ P(R  ), there exists a unique global strong solution (  ,  ≥ 0) of (1).
Proof.The local existence and uniqueness of such a process is standard.We only need to prove here that , hence,  (because   :=  +   + ∫ a finite time.To this aim, we apply Itô's formula to the function   → (): and introduce the sequence of stopping times  0 = 0 and By the convexity condition, we have (∇(), ) → and by the condition (5), there exists  > 0 such that E( ∧  ) ≤ E( 0 ) + .

Kullback Information
Definition 5. We define the free energy (up to an additive constant), known as the relative Kullback information, of a probability measure ] with respect to a probability measure Π by If we suppose that ] (resp., Π) has the density ] (resp., ) with respect to the Lebesgue measure , then one has In this paper, we will first prove the decrease to zero of the relative free energy of the law of   with respect to Π ,  .The classical Csiszár-Kullback-Pinsker inequality relates the total variation norm to the free energy in the following way (see for instance [5]): So as the total variation norm metrizes the convergence in distribution, once we have proven that the measure Π ,  converges weakly to a measure Π and (  | Π ,  ) goes to zero, then the distribution of   converges to Π.As   is the time-shifted process   , we obtain this way that  converges in distribution to Π.
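As a sanity check of the inequality just recalled, one can test it numerically on finite state spaces. The sketch below assumes the form $\|\nu - \Pi\|_{TV}^2 \le 2\,I(\nu \mid \Pi)$, with the total variation norm taken as the $\ell^1$ distance between probability vectors and $I(\nu \mid \Pi) = \sum_i \nu_i \log(\nu_i/\Pi_i)$; the helper names `kullback` and `tv_norm` are ours.

```python
import numpy as np

def kullback(nu, pi):
    """Relative entropy sum_i nu_i log(nu_i / pi_i), with the convention 0 log 0 = 0."""
    nu = np.asarray(nu, dtype=float)
    pi = np.asarray(pi, dtype=float)
    mask = nu > 0
    return float(np.sum(nu[mask] * np.log(nu[mask] / pi[mask])))

def tv_norm(nu, pi):
    """Total variation norm, taken here as the l1 distance sum_i |nu_i - pi_i|."""
    return float(np.sum(np.abs(np.asarray(nu, dtype=float) - np.asarray(pi, dtype=float))))

rng = np.random.default_rng(0)
for _ in range(5):
    nu = rng.dirichlet(np.ones(6))   # random probability vector
    pi = rng.dirichlet(np.ones(6))   # random reference probability vector
    lhs = tv_norm(nu, pi) ** 2
    rhs = 2.0 * kullback(nu, pi)
    print(f"{lhs:.4f} <= {rhs:.4f}")
```

The check illustrates why a free energy going to zero forces convergence in total variation, and hence in distribution.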
Our strategy to show that (  |  ,  ) goes to zero is the following.To shorten notation, let   := ( 0 ,  0 , , ⋅) be the distribution law of the process   conditioned on   0 =  0 .We recall that the family of probability measures (Π ,  ,  ≥ 0) satisfies a Sobolev logarithmic inequality LSI(()).We have also Π ,  () =  ,  ()().So, we choose ℎ  = √  / Definition 6.The process  is an asymptotic pseudotrajectory for the flow  if for all  > 0, It is shown in [12] that if  is an asymptotic pseudotrajectory for , then the -limit set of the flow generated by  is the same as the -limit set of the process .

Convergence in Distribution towards the Global Minima of
where we have defined   () := () + || 2 /2().Actually, we will prove that this nonhomogeneous Markov process converges in distribution to a measure that could correspond to its "invariant" probability measure.Of course, if we suppose that () ≡  and   ≡ , then the convergence in distribution is obvious.It happens that the spectral gap  appears naturally in our study.Heuristically, when the time is of order   −2  , the process is very close to the probability measure It remains to show the convergence of Π , when  goes to the infinity.

Lemma 7.
Let  > 0 be fixed.The probability measure Π , converges weakly, as Proof.We only need to recall that  2  () =  −1 () diverges with .More explicitly, the normalization constant is Let  be the compact set  := { | () ≤ 1}.There exists a constant  > 0 such that  is included in the ball centered in 0 and with radius .Then, on one hand, we get On the other hand we obtain But we know by the Laplace formula (see [14]) that where (  )  are the global minima of  (we recall that they form a finite set).As a consequence, By the same method, if  is a continuous function with compact support containing, for example, only the global minimum  1 , we have This gives the explicit form of lim  → 0 Π ∞, () = Π 0 ().
Consider for a moment Π ∞, .We remark that   converges to  when  goes to infinity.Hwang established in [14] that Π ∞, converges weakly when  converges to zero.Let  be the set of the global minima of .Hwang has proved the following: (i) if () > 0 (where  is the Lebesgue measure on R  ), then Π ∞, converges weakly to (1/())1  ; (iii) more generally, suppose that  is the finite union of some smooth manifolds (C 3 ) and each component is a compact connected smooth manifold and the determinant of the Hessian (normal to  in  ∈ ) det(∇ 2 ()) is not identically zero.Then, there exists a probability measure M, on the highest dimensional manifolds, such that Π ∞, converges weakly to We adapt to our setting the results of Hwang in the following proposition.
Proposition 8.The probability measure Π ,  converges weakly to Π 0 as  goes to infinity.Moreover, the probability measure Π 0 concentrates on the global minima of .
Proof.The result of Hwang shows that the probability measure Π ∞,  converges weakly to Π 0 as  goes to the infinity, and the probability measure Π 0 concentrates on the global minima of .We combine this result with Lemma 7 to prove the proposition.
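The concentration phenomenon behind these results can be observed numerically in dimension one. The following sketch assumes a Gibbs density proportional to $\exp(-2U(x)/\sigma^2)$ and evaluates it on a grid for a hypothetical tilted double-well potential $U$; it reports the mass carried by a small neighbourhood of the global minimum as $\sigma^2$ decreases. The potential, the grid and the radius are illustrative choices only.

```python
import numpy as np

def U(x):
    # Hypothetical potential: local minimum near -0.93, global minimum near +1.06.
    return (x**2 - 1.0)**2 - 0.5 * x

def gibbs_mass_near(x_star, sigma2, radius=0.3, lo=-3.0, hi=3.0, n=20001):
    """Mass that the density proportional to exp(-2 U / sigma2) gives to |x - x_star| < radius."""
    x = np.linspace(lo, hi, n)
    u = U(x)
    w = np.exp(-2.0 * (u - u.min()) / sigma2)  # shift by the minimum for numerical stability
    w /= w.sum()                               # uniform grid: normalized weights
    return float(w[np.abs(x - x_star) < radius].sum())

x_grid = np.linspace(-3.0, 3.0, 20001)
x_star = float(x_grid[np.argmin(U(x_grid))])   # grid approximation of the global minimizer
for sigma2 in (2.0, 1.0, 0.5, 0.1, 0.02):
    print(sigma2, gibbs_mass_near(x_star, sigma2))
```

As $\sigma^2$ decreases, the printed mass approaches one, in line with the weak convergence of the Gibbs measures towards a measure carried by the global minima.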
In order to show that  converges in distribution to a measure supported on the global minima of , we need two more technical results.We mix the approaches initiated by Holley et al. [6] and Miclo [7].Indeed, we will use some functional inequalities and show that the free energy (corresponding to our process) decreases.We suppose in the following that  ∘  −1 () = log / for some  sufficiently large (and the same proof actually reads when  ∘  −1 () is asymptotically equivalent to log /).Proof.Let  := sup{|| 2 ; () ≤ 1}.For any path , we easily have   () ≤  ∞ () + /().Then, by definition of   , we get As a consequence, there exists  > 0 such that and the result follows.
A very important theorem permits one to relate the height function to the second eigenvalue of the infinitesimal generator of   (i.e., the constant involved in the spectral gap inequality).
Proof. Hölder's inequality implies that the logarithmic Sobolev constant is smaller than the inverse of the spectral gap constant in Theorem 12.
We will now use some functional inequalities in order to prove the convergence of   (and thus   ) towards the global minima of .Let (, , , ) denote the density of the semigroup corresponding to the nonhomogeneous Markov process .Theorem 14. Suppose that  2  = / log , where  > 2.Then, for all initial  0 ,  0 , the free energy (( 0 ,  0 , , ⋅) | Π ,  ) converges to 0 as  goes to the infinity.
To prove Theorem 14, we need the following three technical results. We will first state them all, postponing their proofs, and then deduce Theorem 14 from them. Let us state the first technical result.
Lemma 16 (Miclo,[7,Lemma 6]).Let  : [0, ∞[ → R + be a continuous function such that almost surely where  and  are two continuous nonnegative functions such that We now need a technical lemma to conclude that the free energy converges to 0. Lemma 17.For all  ≥ 0, the quantity ⟨|| 2 ⟩ Π ,  is bounded.
We are now ready to prove Theorem 14.
Proof of Theorem 14.Let  0 ≥ 0 and  0 ∈ R  .Consider the process   , solution to the SDE We can rewrite the result of Proposition 15 in the following way, where we remind that   = ( 0 ,  0 , , ⋅) denotes the distribution law of the process  conditioned on We remind the reader that () ≥ || 2 out of a compact set and it is proved in [1] that E(  ) = (1).We therefore have E  (  ) =  (1).Moreover, the function   → () is nondecreasing, while   →   is nonincreasing.Thus, as We now use Lemma 16.We easily compute the timederivative of (): Using the explicit expression of   ; that is  2  = / log , we have As  −1 () is a nondecreasing function and because of the hypothesis on , the first term converges to 0 when  goes to the infinity.For the second term, we recall that log ()/() is bounded and so because () = ( 2 ).Lemma 16 asserts that if  satisfies For  2  = / log  with the given condition on the constant , we meet the required conditions and the result follows.
Let us now prove Proposition 15 and Lemma 17.
Proof of Proposition 15.To shorten notation, let   := ( 0 ,  0 , , ⋅) be the distribution law of the process   , knowing that   0 =  0 .We recall that the family of probability measures (Π ,  ,  ≥ 0) satisfies a logarithmic Sobolev inequality LSI(()).We also have Π ,  () =  ,  ()().Define ℎ  , such that ∫ ℎ 2  Π ,  = 1: By Corollary 13, there exists a constant () such that We now have to compute the derivative of ℎ  : We put this last estimate in the preceding inequality (43) and thus We recall that we are looking for an inequality including the time-derivative of the free energy .We have Our strategy is to find an upper bound for the two terms on the right-hand side.The Kolmogorov forward equation reads We also remark that we have the following estimates: where we have used the usual notation ⟨⟩ Π ,  = ∫ Π ,  .Moreover, we also find On the other hand, we obtain the following equality for the second integral involved in the time-derivative of : We put all the pieces together and this leads to the result.
Proof of Lemma 17.Let  be the compact set  := {; () ≤ }, where  is a given positive constant.As Π ,  converges weakly to Π 0 , we only need to prove that ⟨|| 2 1   ⟩ Π ,  is bounded.We have By Proposition 8, we know that (,   ) ∼ , and so there exists a positive constant C such that We will now describe the law of the limit process  ∞ .
Remark 19.It is known since the work of Freidlin and Wentzell [2] that the Gibbs measure Π ,  satisfies a large deviation principle.Therefore, the speed of convergence of Π ,  toward Π 0 is exponential ( − log /2 =  −1/2 ).
Corollary 20.Suppose that  2  = / log , where  > 2.Then the process  converges in distribution to a random variable which concentrates on the global minima of .Thus, the process  converges in distribution to a random variable  ∞ , which concentrates on the global minima of .
Remark 21.The function  is supposed to decrease slowly to zero.This is why we obtain the convergence of  to the global minima of .But if  goes too fast to zero, that is, lim  → ∞ () −1 log () =  with  ≤ 2, then  may freeze in a local minimum.So,  does not converge in that case.

Study of 𝑋.
We give necessary and sufficient conditions for the convergence in distribution of .As usual, we start to work with the process   =   −  .In order to link this section with the preceding one, we recall that  2  = ( ∘  −1 ()) −1 = / log .It implies that we consider functions  such that (asymptotically) log () = ().
Let us first recall a former result.
The process  satisfies the pointwise ergodic theorem.This means that almost surely the empirical measure of  converges weakly to a random measure, which is a convex combination of Dirac measures taken in the minimal points of .More precisely, there exist   ≥ 0 such that We are now able to conclude the study of the asymptotic behaviour of the process .
Theorem 23.Suppose that lim  → ∞ () −1 log () =  > 2.Then one of the following holds.  → 0, and we now need to find the rate of convergence in order to conclude the proof.Moreover, by [1, Proposition 5.3], we know that the speed of convergence of the empirical mean of the time-changed process   −1 () is  −1 (1 + ) −  −1 ().But we are looking for the speed of convergence for   itself.By an integration by part, we obtain that Corollary 20 implies that the first right-hand term converges in distribution to 0 because () ≤ ().So it converges in probability to 0. It remains to prove the convergence of the second term.We recall that, up to a multiplicative positive constant, ∘ −1 () = log(2+).Moreover, we also know that

Convergence in Distribution of 𝑋 to a Random Variable
In this section, we will prove that if $g$ converges to 1 or 0 slowly enough, then the process $X$ converges in distribution to an identified limit. We will first study the case $g \equiv 1$ and prove rigorously the convergence of $Y_t$. Then, we will consider the case $g(t) \to 0$ and $tg(t) \to +\infty$. The proof of the convergence of $Y_t$ will be exactly the same as in the case $g = 1$ and so we will not reproduce it. Nevertheless, the convergence of $X$ will be interesting and Section 4.2 is essentially devoted to its proof.

4.1.
If  Converges toward a Positive Constant.In this part, we suppose that  converges toward a positive constant, so that its primitive  goes to the infinity.In that case, we will show that the asymptotic behaviour of  is very close to the behaviour of , solution to Without any loss of generality, we suppose that () = 1 for  large enough.Actually, we will prove that  converges toward a random variable of law Π() = ( −2() /).
(Remark that the normalization constant  is well defined as  is strictly convex out of a compact set.)To this aim, we will use the exponential decrease to zero of the relative Kullback information between the law of   −1 () and Π.Once this is done, we study the convergence of the mean (1/) ∫  0   .Indeed, we will prove that the latter integral converges if and only if ∫ Π() = 0.
Theorem 24.  −  0 converges in distribution to  ∞ if and only if ∫  −2()  = 0.In that case,  ∞ has the distribution law  −2 /.The proof of this statement will be decomposed into several propositions and lemmas.We first present them all, postponing their proofs.Then, we deduce from them Theorem 24.Finally, we prove these intermediate results.
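The centering condition $\int x e^{-2V(x)}\,dx = 0$ is easy to test numerically for a given one-dimensional potential. The sketch below computes the mean of the probability measure with density proportional to $e^{-2V}$ by a Riemann sum on a uniform grid. The two potentials are hypothetical examples: a symmetric one, for which the condition holds, and the same potential shifted by $0.5$, for which the mean is $0.5$ by symmetry and the condition fails.

```python
import numpy as np

def gibbs_mean(V, lo=-6.0, hi=6.0, n=24001):
    """Mean of the probability density proportional to exp(-2 V) (Riemann sum on a uniform grid)."""
    x = np.linspace(lo, hi, n)
    v = V(x)
    w = np.exp(-2.0 * (v - v.min()))  # shift by the minimum for numerical stability
    return float(np.sum(x * w) / np.sum(w))

def V_sym(x):
    # Hypothetical symmetric potential: the centering condition holds.
    return x**4 / 4 + x**2 / 2

def V_shift(x):
    # Same potential centered at 0.5: the centering condition fails.
    return V_sym(x - 0.5)

print("symmetric potential, mean:", gibbs_mean(V_sym))
print("shifted potential, mean:  ", gibbs_mean(V_shift))
```

Only the first potential satisfies the condition appearing in Theorem 24; the sketch checks the condition itself and does not simulate the process.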
Let us state the first of the propositions mentioned, the one showing that the time-shifted process (  −1 () )  , and so (  )  converges in distribution.
Proposition 25.The process (  )  converges in distribution to a random variable  ∞ .The distribution law of  ∞ is  −2 /.We are now ready to prove Theorem 24.
Proof of Proposition 25.We will show that the process (  )  converges in distribution to  ∞ .Let   denote the law of   .By Lemma 4, the probability measure Π =  −2 / (where  denotes the normalization constant of Π) satisfies a logarithmic Sobolev inequality LSI( LS ).By inequality (16), we know that As ∇(√  /Π) = √  (  /2)(∇  /  + 2∇), we deduce that Moreover, by definition of the relative Kullback information, it is clear that (/)(  | Π) = ∫ ṗ  log(  /Π).The Kolmogorov-forward equation also reads and putting this last estimate in the previous time-derivative equation of , we have So (  | Π) converges to zero exponentially fast.This means that ‖  − Π‖ 2 TV ≤ 2(  | Π) → 0; that is,   converges in distribution toward a random variable  ∞ .The distribution law of  ∞ is Π and the speed of convergence is exponential.
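The exponential decay of the relative entropy can be seen in closed form in the Gaussian case. The sketch below uses the Ornstein-Uhlenbeck process $dZ_t = dB_t - Z_t\,dt$, that is, $V(x) = x^2/2$, purely as an explicit illustration (quadratic potentials are excluded from the assumptions of this paper). Starting from $Z_0 = z_0$, the law of $Z_t$ is Gaussian with mean $z_0 e^{-t}$ and variance $(1 - e^{-2t})/2$, and the invariant measure $e^{-2V}/Z$ is the centered Gaussian with variance $1/2$, so the relative entropy between the two is explicit.

```python
import numpy as np

def kl_to_invariant(t, z0):
    """Relative entropy of N(m_t, s2_t) with respect to N(0, 1/2) for dZ = dB - Z dt, Z_0 = z0."""
    m = z0 * np.exp(-t)                     # mean of Z_t
    s2 = 0.5 * (1.0 - np.exp(-2.0 * t))     # variance of Z_t
    v = 0.5                                 # variance of the invariant law
    return 0.5 * (s2 / v + m**2 / v - 1.0 - np.log(s2 / v))

times = np.array([0.1, 0.5, 1.0, 2.0, 5.0, 10.0])
values = kl_to_invariant(times, z0=2.0)
for t, val in zip(times, values):
    print(f"t = {t:5.1f}   I(p_t | Pi) = {val:.3e}")
```

The printed values decrease rapidly to zero; combined with the inequality $\|p_t - \Pi\|_{TV}^2 \le 2 I(p_t \mid \Pi)$, this is the mechanism used in the proof above.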
Proof of Proposition 26.Let   := (1/) ∫  0   .We have to show that   converges almost surely to  ∞ = ∫ Π().First, we decompose   in the following way: We then have Let us introduce the positive recurrent Kolmogorov process (  ,  ≥ 0), solution to   =   − ∇(  ).The invariant Journal of Probability and Statistics probability measure associated with  is precisely Π.As  is pointwise ergodic, we have for all ℎ ∈  1 (Π) with an exponential speed of convergence (see for instance [16]).
Let us now prove the almost sure convergence of   .We have (66) We will now need the following technical result.
Assuming the validity of this statement, the process (   )  is an asymptotic pseudotrajectory for the flow (/)  () =  ∞ −  (),  0 () = .The flow induced by  admits a unique limit point { ∞ }, which is exponentially attracted.Thus,   converges almost surely to  ∞ (with an exponential speed of convergence).
Let us now estimate the distance between  +  and  +  , knowing that    =    .As  is strictly convex and ∇ is   -Lipschitz, we obtain the following inequality: To prove Proposition 26, we use the decomposition (64).It is obvious from that decomposition that if  ∞ = ∫ Π() ̸ = 0, then   does not converge and in that case   ∼  ∞ log .Suppose now that  ∞ = 0.As and by ( 65 Proof of Lemma 27.Let  ≥ 0. We have Let us prove that the drift term  +  /( +   ) is negligible for  large enough.Let  ≥ 0. For any 0 ≤  ≤ , we have where    =    .We emphasize that  and  are driven by the same Brownian motion.We have already proved in (70) that the first right-hand term of (73) converges (exponentially fast) to 0. Let us now study the most right-hand side of (73).An integration by parts leads to The ergodicity (65) of  implies directly that (1/ ))V| = 0 almost surely.Finally, putting all the pieces together we have shown that    is an asymptotic pseudotrajectory for the flow generated by .

If 𝑔(𝑡) Goes to Zero and 𝑡𝑔(𝑡) Goes to 𝑐 ∈ ]0,+∞].
The technique we adopt here is a change of scale combined with a change of measure. This is useful whenever we wish to study the asymptotic or ergodic behaviour of a nonhomogeneous process, as it usually allows one to reduce the problem to the homogeneous case.
We also remark that this result is consistent with the basic Ornstein-Uhlenbeck case.

If 𝑡𝑔(𝑡) Goes to Infinity.
This study will be divided into two different cases. First, we suppose that there exists $0 < \alpha < 1$ such that $t^\alpha g(t)$ goes to a positive constant. In the second case, $t^\alpha g(t)$ goes to infinity for any $0 < \alpha < 1$ (this is for instance satisfied by $g(t) = 1/\log t$). The first part of the study is identical for the two cases, and only the end of the argument is treated separately.
Let us sketch the proof of Theorem 29 that is postponed to the end of the paragraph.As in the preceding paragraph, we first suppose that () = || 2 .Define  as the positive increasing solution to  ∘ () = (  ()) − .Consider the time and scale-changed process Ỹ defined by Ỹ :=  () /√  ().Proof of Theorem 29.Lemma 30 proves that (1/) ∫  0 Ỹ  converges almost surely to 0 if and only ∫  −2()  = 0.Moreover, we see by a similar equation (88) that Ỹ is an asymptotic pseudotrajectory for Ŷ with the speed of convergence 1/√().As Ŷ converges to its invariant probability measure  −2()  with an exponential speed of convergence, we find that (1/) ∫  0 Ỹ  converges almost surely to 0 if and only if ∫  −2()  = 0 and in that case, we have the following result, depending on the function .(2) Suppose now that   () → ∞ for all 0 <  < 1.
The probability measure  satisfies the logarithmic Sobolev inequality, with the constant  LS denoted by LSI( LS ), if for all function ℎ ∈  2 (), we have ,  satisfying ∫ ℎ 2  Π ,  = 1 and we will show in Corollary 13 the existence of () > 0 such that  (  | Π ,  ) = ∫   log 2 , the two terms E( V  (  )) and ⟨  ⟩ Π ,  do not play any role in the upper bound.It now remains to find a upper bound for ⟨ V  ⟩ Π ,  .To this aim, we use Lemma 17.Indeed, there exist two positive constants  1 ,  2 such that =   converges in distribution.And by Corollary 20,   converges in distribution to  ∞ which concentrates on the global minima of .So,   =   −   converges in probability to 0.To conclude, if  satisfies ∑ 1≤≤     ̸ = 0, then   = ∫  0   (/) does not converge and so   diverges.