A Natural Diffusion Distance and Equivalence of Local Convergence and Local Equicontinuity for a General Symmetric Diffusion Semigroup



Introduction
Diffusion semigroups play an important role in analysis, both theoretical and applied. Diffusion semigroups include the heat semigroup and, more generally, as discussed in, e.g., [1], arise from considering large classes of elliptic second-order (partial) differential operators on domains in Euclidean space or on manifolds. For examples of theoretical results involving diffusion semigroups, the interested reader may refer to Sturm [2] and Wu [3]. Some recent applications of diffusion semigroups to dimensionality reduction, data representation, multiscale analysis of complex structures, and the definition and efficient computation of natural diffusion distances can be found in, e.g., [4][5][6][7][8][9][10][11].
A particularly important issue in harmonic analysis is to connect the smoothness of a function with the speed of convergence of its diffused version to itself, in the limit as time goes to zero. For the Euclidean setting, see, for example, [12,13]. In order to consider the smoothness of diffusing functions in more general settings, a distance defined in terms of the diffusion itself seems particularly appropriate.
Defining diffusion distances is of interest in applications as well. As discussed in [5], dimensionality reduction of data and the concomitant issue of finding structures in data are highly important objectives in the fields of information theory, statistics, machine learning, sampling theory, etc. It is often useful to organize the given data as nodes in a weighted graph, where the weights reflect local interaction between data points. Random walks, or diffusion, on graphs may then help understand the interactions among the data points at increasing distance scales. To even consider different distance scales, it is necessary to define an appropriate diffusion distance on the constructed data graph.

Abstract and Applied Analysis
In this paper, we consider a general symmetric diffusion semigroup $\{T_t\}_{t \ge 0}$ on a topological space $X$ with a positive $\sigma$-finite measure (i.e., $X$ is a countable union of measurable sets with finite measure), given, for $t > 0$, by an integral kernel operator: $T_t f(x) \triangleq \int_X \rho_t(x, y) f(y)\,dy$. As part of their work in [7,11], Coifman and Leeb introduce a family of multiscale diffusion distances and establish quantitative results about the equivalence of a bounded function $f$ being Lipschitz, and the rate of convergence of $T_t f$ to $f$, as $t \to 0^+$ (we are discussing some of their results using a continuous time $t$ for convenience; most of Coifman's and Leeb's derivations are done for dyadically discretized times. Moreover, most of the authors' results are in fact established without the assumption of symmetry and under a weaker condition than positivity of the kernel, namely, an appropriate $L^1$ integrability statement (see [11])). To prove the implication that Lipschitz implies an appropriate estimate on the rate of convergence, Coifman and Leeb make a quantitative assumption about the decay of
$$\int_X \rho_t(x, y)\, d(x, y)\,dy \qquad (1)$$
for their distances $d$, namely, that
$$\int_X \rho_t(x, y)\, d(x, y)\,dy \le C t^{\alpha} \qquad (2)$$
for some $\alpha > 0$.
The authors show that their decay assumption holds for semigroups arising in many different settings (for which suitable decay and continuity assumptions are made on diffusion kernels relative to an intrinsic metric $d_X$ of the underlying space), and even for some examples of nonsymmetric diffusion kernels. Coifman and Leeb also establish that (2) above, in the case of positive diffusion kernels, is in fact equivalent to their conclusion about the rate of convergence of $T_t f$ to $f$, as $t \to 0^+$, for a Lipschitz function $f$. Additionally, Coifman and Leeb show that, in some of the settings they consider (with decay and continuity assumptions on the diffusion kernels relative to an intrinsic metric), their multiscale diffusion distance is equivalent to (localized) $d_X(x, y)^{\alpha}$, where $d_X(x, y)$ is the intrinsic metric of the underlying space and $\alpha$ is a positive number strictly less than 1. The authors emphasize that $\alpha$ cannot be taken to equal 1.
In the present paper, we introduce a new family of diffusion distances generated by the diffusion semigroup $\{T_t\}_{t \ge 0}$. We provide several reasons as to why we think our definition is natural; in particular, we show that, for a convolution diffusion kernel on $\mathbb{R}^n$, we achieve $\alpha = 1$ in the discussion just above; i.e., we can recover (local) Euclidean distance to the "full" power 1.
The implication established in [7,11] that smoothness of $f$ implies control of the speed of convergence of $T_t f$ to $f$ seems to us to be a more notable result than the converse (which the authors establish without assuming the decay of (1)). However, if $f$ is Lipschitz for the multiscale diffusion distance introduced in [7,11], as the authors themselves point out, their assumed estimate (2) almost tautologically leads to the desired estimate for the speed of convergence of $T_t f$ to $f$.
The main reason for our current work is that we wish to avoid making any assumptions about the decay of (1) and still establish a correspondence between some version of smoothness of a function $f$ and convergence of $T_t f$ to $f$, as $t \to 0^+$. Our main contribution is to establish, under almost no assumptions, that local equicontinuity (in $t$) is equivalent to local convergence; i.e., local control of the differences $T_t f(x) - T_t f(y)$ for all small $t$ is equivalent to local control of the differences $T_t f(x) - f(x)$ for all small $t$. Here "local" is defined relative to a representative of our family of proposed diffusion distances.
Our paper is organized as follows. Following a notation and assumptions section (Section 2), we define our version of a natural diffusion distance $d_g$ in Section 3:
$$d_g(x, y) \triangleq \sup_{0 < t \le 1} g(t)\, \|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1} \qquad (3)$$
for $g$ a bounded, nonnegative, increasing function on $(0, 1]$, with $\lim_{t \to 0^+} g(t) = 0$. We are led to our definition by requiring that a diffusion distance has the property that, for all functions $f$ bounded in magnitude by 1, $T_t f$ be Lipschitz with respect to the distance, independent of the particular $f$ (of course, we expect the Lipschitz constant to grow as $t$ goes to 0). This requirement arises from the intuitively reasonable demand that diffusion be smoothing in some sense. We then discuss some other reasons why our resulting distance is natural. In particular, for diffusion semigroups with convolution kernels on $\mathbb{R}^n$ (this class includes the Poisson and heat kernels), our distance is equivalent to (local) Euclidean or sub-Euclidean distances for certain choices of the function $g$.
In Section 4, we make the assumption that balls of positive radius with respect to the distance $d_g$ have positive measure. We show there is an equivalent topology, which does not depend on the function $g$, for which a corresponding statement about positive measure is equivalent to our assumption. The latter requirement, in turn, seems to be a mild and reasonable one.
In the main section, Section 5, we define our version of local convergence of $T_t f$ to $f$, as well as local equicontinuity of the family $\{T_t f\}_{t \ge 0}$. Both definitions use our distance $d_g$. We then establish that local convergence is equivalent to local equicontinuity. We next prove a corollary which extends an a.e. convergence result of Stein in [1]: for $t_0 > 0$, $T_{t + t_0} f$ converges locally to $T_{t_0} f$, as $t$ converges to $0^+$.
In the Appendix, we show that, for very general metrics $D$ on $X$, not necessarily arising from diffusion,
$$\int_X \rho_t(x, y)\, D(x, y)\,dy \to 0 \text{ a.e., as } t \to 0^+. \qquad (4)$$
This result is clearly a weaker statement than (2), but has the advantage of holding under virtually no assumptions.

Notation and Assumptions
Let $X$ be a topological space equipped with a positive $\sigma$-finite measure. For $t > 0$, $\rho_t(x, y)$ will denote a symmetric kernel on $X \times X$, with $\rho_t(x, y) \ge 0$ for all $x, y$. We assume that $\rho$ satisfies the semigroup property:
$$\rho_{t+s}(x, y) = \int_X \rho_t(x, u)\, \rho_s(u, y)\,du \qquad (5)$$
for all $x, y \in X$, and $s, t > 0$. In addition, we assume
$$\int_X \rho_t(x, y)\,dy = 1 \qquad (6)$$
for all $x \in X$ and all $t > 0$. We will refer to a kernel $\rho_t$ satisfying the conditions above as a symmetric diffusion kernel (at time $t$). A typical example for $\rho_t$ is the heat kernel on a Riemannian manifold (see [14], for example).
To avoid degeneracy, e.g., each $T_t$ being the averaging operator on a space of finite mass, we make an additional assumption: $T_t(f) \to f$ in $L^2$, as $t \to 0^+$.
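The assumptions of this section can be checked numerically in a finite setting. The following is a minimal sketch, assuming a finite weighted graph with counting measure and using the graph heat kernel $e^{-tL}$ as a stand-in for a symmetric diffusion kernel; the example graph, the function names, and the test vector are illustrative choices and do not come from the paper. The sketch also shows why the extra $L^2$ convergence assumption is needed: the averaging kernel passes the same algebraic checks but does not return $f$ as $t \to 0^+$.

```python
import numpy as np

def graph_heat_kernel(weights, t):
    """Return exp(-t L) for the graph Laplacian L = D - W (counting measure)."""
    lap = np.diag(weights.sum(axis=1)) - weights
    evals, evecs = np.linalg.eigh(lap)            # L is symmetric
    return (evecs * np.exp(-t * evals)) @ evecs.T

def check_kernel(rho_t, rho_s, rho_ts, tol=1e-10):
    """Check symmetry, nonnegativity, unit mass, and the semigroup property."""
    return {
        "symmetric": bool(np.allclose(rho_t, rho_t.T, atol=tol)),
        "nonnegative": bool((rho_t >= -tol).all()),
        "rows_sum_to_one": bool(np.allclose(rho_t.sum(axis=1), 1.0, atol=tol)),
        "semigroup": bool(np.allclose(rho_t @ rho_s, rho_ts, atol=tol)),
    }

# Hypothetical example: a 4-node path graph with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

heat_checks = check_kernel(graph_heat_kernel(W, 0.3),
                           graph_heat_kernel(W, 0.5),
                           graph_heat_kernel(W, 0.8))

# Degenerate example: the averaging kernel rho_t = 1/n for every t > 0.
n = 4
avg = np.full((n, n), 1.0 / n)
avg_checks = check_kernel(avg, avg, avg)

# Distance from T_t f to f: small for the heat kernel at small t,
# but independent of t (and not small) for the averaging kernel.
f = np.array([1.0, -2.0, 0.5, 3.0])
heat_gap = np.linalg.norm(graph_heat_kernel(W, 1e-4) @ f - f)
avg_gap = np.linalg.norm(avg @ f - f)
```

Both kernels satisfy the four algebraic conditions, so only the convergence requirement separates them in this toy setting.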
The symmetric diffusion operator $T_t$ has the properties of a symmetric diffusion semigroup; see Stein [1], in which the author derives various harmonic analysis results for symmetric diffusion semigroups without explicitly using kernels.

A Natural Diffusion Distance
We now define our diffusion distance.
Definition 1. For a bounded, nonnegative, increasing function $g$ on $(0, 1]$, with $\lim_{t \to 0^+} g(t) = 0$, and $g$ strictly positive on the interval $(0, 1]$, define the distance $d_g$ by
$$d_g(x, y) \triangleq \sup_{0 < t \le 1} g(t)\, \|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1}.$$
It is clear that the distance $d_g$ satisfies the triangle inequality. Note that the restriction that $t$ is bounded in the above supremum has the effect of making all "large" distances comparable to a constant, but this is not a drawback for smoothness considerations.
We would now like to discuss why we are using this particular diffusion distance and why we think it is a natural choice. Our starting point is the desire that, for a reasonable diffusion distance $d(\cdot, \cdot)$, $T_t f$ should be "smooth" for $t > 0$, even for "rough" functions $f$. This intuitive requirement is suggested by the idea that a diffusion semigroup be smoothing, in some sense. It would further be natural that the smoothness decays, for a general $f$, as $t \to 0^+$. We are thus led to impose a Lipschitz-like requirement, namely, that, for a diffusion distance $d(\cdot, \cdot)$, and for $t > 0$,
$$|T_t f(x) - T_t f(y)| \le C(t)\, d(x, y) \quad \text{for all } f \text{ with } \|f\|_{L^\infty} \le 1.$$
It is easy to see that
$$\sup_{\|f\|_{L^\infty} \le 1} |T_t f(x) - T_t f(y)| = \|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1}.$$
Note that, for any $x$ and $y$, $\|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1}$ is decreasing in $t$, using (5) and (6). Letting $g(t) = 1/C(t)$ we thus see that $g$ is increasing, and from (10) we conclude that
$$g(t)\, \|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1} \le d(x, y).$$
This last inequality motivates our Definition 1 of $d_g$. The restriction to $t \le 1$ is to ensure that $d_g(x, y)$ is finite for all $x$ and $y$ and is not stringent, due to the fact that $\|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1}$ is decreasing in $t$ and that for smoothness purposes we need to only concentrate on points $x$ and $y$ which are near each other.
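To make the definition concrete, here is a minimal numerical sketch of such a distance on a finite weighted graph. It assumes counting measure, takes the graph heat kernel $e^{-tL}$ as a stand-in for the diffusion kernel, uses the hypothetical weight $g(t) = t^{\alpha}$, and replaces the supremum over $(0, 1]$ by a maximum over a finite grid of times. The function names and the example graph are illustrative and are not from the paper.

```python
import numpy as np

def heat_kernels(weights, times):
    """Stack exp(-t L) for each t, where L = D - W is the graph Laplacian."""
    lap = np.diag(weights.sum(axis=1)) - weights
    evals, evecs = np.linalg.eigh(lap)
    return np.stack([(evecs * np.exp(-t * evals)) @ evecs.T for t in times])

def diffusion_distance(weights, alpha=0.5, num_times=200):
    """Grid approximation of sup_{0<t<=1} g(t) * ||rho_t(x,.) - rho_t(y,.)||_1."""
    times = np.linspace(1.0 / num_times, 1.0, num_times)   # grid inside (0, 1]
    kernels = heat_kernels(weights, times)                  # shape (T, n, n)
    # l1[k, x, y] = sum_z |rho_t(x, z) - rho_t(y, z)| at the k-th time
    l1 = np.abs(kernels[:, :, None, :] - kernels[:, None, :, :]).sum(axis=-1)
    g = times ** alpha                                      # assumed g(t) = t^alpha
    return (g[:, None, None] * l1).max(axis=0)

# Hypothetical example: a 5-node path graph with unit edge weights.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0

D = diffusion_distance(W)
```

Because each term $g(t)\,\|\cdot\|_{L^1}$ is symmetric and satisfies the triangle inequality, the grid maximum `D` inherits both properties, and it is bounded by $2\,g(1)$ since two probability vectors differ by at most 2 in $\ell^1$.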
A further indication of the naturality of our proposed diffusion distance $d_g$ is that the $L^1$ norm of the difference of two probability densities, $\|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1}$, occurring in the definition of $d_g$, is the (scaled) total variation distance between the probability distributions $\rho_t(x, \cdot)$ and $\rho_t(y, \cdot)$, i.e.,
$$\|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1} = 2 \sup_{A} |\mu_{x,t}(A) - \mu_{y,t}(A)|.$$
Here, $\mu_{x,t}$ is the measure given by $\mu_{x,t}(A) = \int_A \rho_t(x, z)\,dz$, and $\mu_{y,t}$ is the measure given by $\mu_{y,t}(A) = \int_A \rho_t(y, z)\,dz$ for measurable $A \subseteq X$; the supremum is taken over all measurable $A \subseteq X$ (see Chapter 4 of [15]).
As a final argument for the naturality of our proposed diffusion distance, we calculate $d_g$ for a special case considered by the authors of [7] (for their own version of diffusion distances). We take $X = \mathbb{R}^n$, $g(t) = t^{\alpha}$, and assume that the diffusion kernel has the form $\rho_t(x, y) = t^{-n\beta} \phi(t^{-\beta}(x - y))$.
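Here, $\alpha, \beta > 0$ and $\phi$ is a nonnegative radial $L^1$ function whose gradient is also in $L^1$. The case $\beta = 1/2$ is for the heat kernel (with the appropriate $\phi$), and the case $\beta = 1$ is for the Poisson kernel (with the appropriate $\phi$). Now,
$$\|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1} = \int_{\mathbb{R}^n} t^{-n\beta} \left| \phi(t^{-\beta}(x - u)) - \phi(t^{-\beta}(y - u)) \right| du = \int_{\mathbb{R}^n} \left| \phi(z + t^{-\beta}(x - y)) - \phi(z) \right| dz \equiv h(t^{-\beta}(x - y)),$$
where we made the change of variables $z = t^{-\beta}(y - u)$.
Proof. Using the notation for the special case above, we need to estimate $\sup_{0 < t \le 1} t^{\alpha} h(t^{-\beta}(x - y))$.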
Combining the above discussions for the two ranges of values of , the result follows.
Thus, for this special case of $X = \mathbb{R}^n$, $g(t) = t^{\alpha}$, and $\rho_t(x, y) = t^{-n\beta} \phi(t^{-\beta}(x - y))$, which includes both the heat kernel and the Poisson kernel, our definition of diffusion distance gives (local) Euclidean or sub-Euclidean distance (depending on the relative sizes of $\alpha$ and $\beta$). This result seems appropriate.
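The special case can be explored numerically for the one-dimensional heat kernel. The sketch below assumes the normalization $\rho_t(x, y) = (4\pi t)^{-1/2} e^{-(x-y)^2/(4t)}$ (so $\beta = 1/2$), for which the $L^1$ distance between two kernels centered $\Delta$ apart has the closed form $2\,\mathrm{erf}(\Delta / (4\sqrt{t}))$, and it replaces the supremum over $(0, 1]$ by a maximum over a geometric grid of times. The function names and grid parameters are illustrative assumptions, not part of the paper.

```python
import math

def heat_l1_gap(delta, t):
    """L^1 distance between two 1D heat kernels whose centers are delta apart."""
    return 2.0 * math.erf(delta / (4.0 * math.sqrt(t)))

def d_g_heat(delta, alpha, num_times=4000, t_min_exp=-12.0):
    """Grid approximation of sup_{0<t<=1} t^alpha * ||rho_t(x,.) - rho_t(y,.)||_1."""
    best = 0.0
    for k in range(num_times + 1):
        t = 10.0 ** (t_min_exp * k / num_times)    # geometric grid; t = 1 at k = 0
        best = max(best, t ** alpha * heat_l1_gap(delta, t))
    return best

def loglog_slope(alpha, d_small=1e-3, d_large=1e-1):
    """Empirical exponent p in d_g ~ delta^p, measured between two small deltas."""
    num = math.log(d_g_heat(d_large, alpha) / d_g_heat(d_small, alpha))
    return num / math.log(d_large / d_small)

slope_full = loglog_slope(alpha=0.5)    # alpha equal to beta = 1/2
slope_sub = loglog_slope(alpha=0.25)    # alpha smaller than beta = 1/2
```

In this sketch the measured exponent is close to 1 when $\alpha = \beta$ (local Euclidean behavior) and close to $1/2$ when $\alpha = \beta/2$ (sub-Euclidean behavior), which is consistent with the dependence on the relative sizes of $\alpha$ and $\beta$ described above.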

A Geometric Assumption about the Measure on 𝑋
We make the following reasonable assumption about our distance $d_g$: for any $x_0 \in X$ and any $\delta > 0$, $B(x_0, \delta) \equiv \{x : d_g(x_0, x) < \delta\}$, the ball of radius $\delta$ and center $x_0$, has positive measure.
Returning to our assumption that, for any $x_0 \in X$ and any $\delta > 0$, $B(x_0, \delta)$ has positive measure, Proposition 3 shows that it is equivalent to require the following: for any $x_0 \in X$, $t > 0$, and $\varepsilon > 0$, the set $A(x_0, t, \varepsilon)$ has positive measure. Note that the definition of the sets $A(x_0, t, \varepsilon)$ is more "universal" than that of the balls $B(x_0, \delta)$, since the former do not involve the function $g$.
The assumption that, for any $x_0 \in X$, $t > 0$, and $\varepsilon > 0$, the set $A(x_0, t, \varepsilon)$ has positive measure appears to us to be a very natural, and mild, one. In words, this requirement is saying that, for any time $t > 0$ and any $\varepsilon > 0$, the set of points in our space $X$ which have not diffused more than $\varepsilon$ away (in the $L^1$ sense) from the diffused point $x_0$, at time $t$, is not "thin" with respect to the underlying measure on $X$. This assumption seems reasonable in both the discrete case (each point has positive mass, and $x = x_0$ is "enough") and the continuous case (every point $x_0$ has "many" arbitrarily close points in the sense of diffusion).

Local Convergence Is Equivalent to Local Equicontinuity
In this section, we define local convergence and local equicontinuity for our situation and show that the two concepts are equivalent under our assumptions.
In what follows, $T_t$ is a symmetric diffusion operator as defined in Section 2.
Definition 4. Let $f \in L^p$, $1 \le p \le \infty$. Note that $f$ is actually an equivalence class of functions on the space $X$. Suppose there exists a particular representative of this equivalence class, which we will also call $f$, such that this representative $f$ is defined at every point of $X$, and for every $\varepsilon > 0$, there exist $t_0 > 0$ and $\delta > 0$ so that $|T_t f(x) - f(x)| < \varepsilon$, for all $t$ with $0 < t \le t_0$ and all $x \in B(x_0, \delta)$. We then say $T_t f$ converges to $f$ locally at $x_0$.
We also make the following.
Definition 5. Let $f \in L^p$, $1 \le p \le \infty$. Suppose there exists a particular representative of the equivalence class specified by $f$, which we will also call $f$, such that this representative $f$ is defined at every point of $X$, and for every $\varepsilon > 0$, there exist $t_0 > 0$ and $\delta > 0$ with the property that, for all $x \in B(x_0, \delta)$, we have $|f(x) - f(x_0)| < \varepsilon$ and, for all $t$ with $0 < t \le t_0$, $|T_t f(x) - T_t f(x_0)| < \varepsilon$. We then say the family $\{T_t f\}_{t \ge 0}$ is locally equicontinuous (in $t$) at $x_0$.
Our main result is the following.
Proposition 6. For $f \in L^2 \cap L^\infty$ and any $x_0 \in X$, the following are equivalent:
(i) $T_t f$ converges to (the representative) $f$ locally at $x_0$.
(ii) The family $\{T_t f\}_{t \ge 0}$ is locally equicontinuous at $x_0$.
Moreover, if a representative $f$ satisfies one of these statements, the same representative satisfies the other statement.
Proof. We first show that local convergence at $x_0$ implies local equicontinuity at $x_0$. We thus begin by assuming that $T_t f$ converges to a representative $f$ locally at $x_0$.
In the proof above, we used Stein's Maximal Theorem (see Chapter III, §3 in [1]) to state that $\lim_{t \to 0^+} T_t f = f$ a.e. Stein's convergence result, for $f \in L^2$ say, is the main place in our paper where the symmetry of the operators $T_t$ is needed: Stein requires symmetry to prove his Maximal Theorem.
We immediately have the following.
Using our notation, Stein in [1] mentions that $\lim_{t \to 0^+} T_{t + t_0} f(x) = T_{t_0} f(x)$ for almost all $x$, since he proves that $T_t f(x)$ is a real-analytic function of $t > 0$ for almost all $x$. Corollary 7 extends Stein's result (under our assumption discussed in Section 4) to show local convergence with respect to the distance $d_g$.

Conclusions and Future Work
In this paper, we have defined a diffusion distance which is natural if one imposes a reasonable Lipschitz condition on diffused versions of arbitrary bounded functions. We have next shown that the mild assumption that balls of positive radius have positive measure is equivalent to a similar, and an even milder looking, geometric demand. In the main part of the paper, we establish that local convergence of $T_t f$ to (a representative) $f$ at a point is equivalent to local equicontinuity of the family $\{T_t f\}_{t \ge 0}$ at that point.
It may well be useful to have a quantitative estimate on the rate of convergence of $T_t f$ to $f$ under the assumption that $f$ is Lipschitz, say, with respect to some distance $d$ (where $d$ may be our $d_g$). As essentially pointed out in the papers [7,11], a key issue is whether, and how rapidly,
$$\int_X \rho_t(x, y)\, d(x, y)\,dy \to 0, \text{ as } t \to 0^+. \qquad (32)$$
In the Appendix below, we show that, for very general metrics $D$ on $X$, not necessarily arising from diffusion,
$$\int_X \rho_t(x, y)\, D(x, y)\,dy \to 0 \text{ a.e., as } t \to 0^+. \qquad (33)$$
This result is certainly far from establishing the convergence in (32), much less a quantitative estimate. We plan to continue exploring for which (diffusion) distances the convergence in (32) holds and an estimate can be obtained.
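As a worked illustration of the kind of quantitative decay at issue (this example is not taken from the paper), assume the one-dimensional heat kernel $\rho_t(x, y) = (4\pi t)^{-1/2} e^{-(x-y)^2/(4t)}$ and the Euclidean distance $|x - y|$, which is an intrinsic metric rather than one of the bounded diffusion distances defined above. Substituting $z = y - x$ gives an explicit rate:

```latex
\begin{aligned}
\int_{\mathbb{R}} \rho_t(x, y)\, |x - y| \, dy
  &= \frac{1}{\sqrt{4\pi t}} \int_{\mathbb{R}} |z| \, e^{-z^2/(4t)} \, dz
   = \frac{2}{\sqrt{4\pi t}} \int_0^\infty z \, e^{-z^2/(4t)} \, dz \\
  &= \frac{2}{\sqrt{4\pi t}} \cdot 2t
   = 2 \sqrt{t/\pi}.
\end{aligned}
```

Under these assumptions the integral tends to 0 uniformly in $x$ at the rate $t^{1/2}$, that is, with constant $2/\sqrt{\pi}$ and exponent $1/2$.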