Homogeneous Besov spaces on stratified Lie groups and their wavelet characterization

We establish wavelet characterizations of homogeneous Besov spaces on stratified Lie groups, both in terms of continuous and discrete wavelet systems. We first introduce a notion of homogeneous Besov space $\dot{B}_{p,q}^s$ in terms of a Littlewood-Paley-type decomposition, in analogy to the well-known characterization of the Euclidean case. Such decompositions can be defined via the spectral measure of a suitably chosen sub-Laplacian. We prove that the scale of Besov spaces is independent of the precise choice of Littlewood-Paley decomposition. In particular, different sub-Laplacians yield the same Besov spaces. We then turn to wavelet characterizations, first via continuous wavelet transforms (which can be viewed as continuous-scale Littlewood-Paley decompositions), then via discretely indexed systems. We prove the existence of wavelet frames and associated atomic decomposition formulas for all homogeneous Besov spaces ${\dot B}_{p,q}^{s}$, with $1 \le p,q<\infty$ and $s \in \mathbb{R}$.


Introduction
To a large extent, the success of wavelets in applications can be attributed to the realization that wavelet bases are universal unconditional bases for a large class of smoothness spaces, including all homogeneous Besov spaces. Given a wavelet orthonormal basis $\{\psi_{j,k}\}_{j,k} \subset L^2(\mathbb{R}^n)$ (consisting of sufficiently regular wavelets with vanishing moments) and $f \in L^2(\mathbb{R}^n)$, the expansion $f = \sum_{j,k} \langle f, \psi_{j,k} \rangle \psi_{j,k}$ converges not only in $\|\cdot\|_{L^2}$, but also in any other Besov space norm $\|\cdot\|_{\dot{B}^s_{p,q}}$, as soon as $f$ is contained in that space. Furthermore, the latter condition can be read off the decay behaviour of the wavelet coefficients $\{\langle f, \psi_{j,k} \rangle\}_{j,k}$ associated to $f$ in a straightforward manner. This observation provided important background and heuristics for many wavelet-based methods in applications such as denoising and data compression, but it was also of considerable theoretical interest, e.g. for the study of operators. In this paper we provide similar results for simply connected stratified Lie groups. To our knowledge, studies of Besov spaces in this context have been restricted to the inhomogeneous cases: The definition of inhomogeneous Besov spaces on stratified Lie groups was introduced independently by Saka [27], and in a somewhat more general setting by Pesenson [24,25]. Since then, the study of Besov spaces on Lie groups remained restricted to the inhomogeneous cases [29,30,16,28,13], with the notable exception of [1] which studied homogeneous Besov spaces on the Heisenberg group. A further highly influential source for the study of function spaces associated to the sub-Laplacian is Folland's paper [9]. The first wavelet systems on stratified Lie groups (fulfilling certain technical assumptions) were constructed by Lemarié [20], by suitably adapting concepts from spline theory. Lemarié also indicated that the wavelet systems constructed by his approach were indeed unconditional bases of Saka's inhomogeneous Besov spaces.
Note in particular that an adaptation, say, of the arguments in [10] for a proof of such a characterization requires a sampling theory for bandlimited functions on stratified groups, which was established only a few years ago by Pesenson [26]; see also [12]. More recent constructions of both continuous and discrete wavelet systems were based on the spectral theory of the sub-Laplacian [14]. Given the central role of the sub-Laplacian both in [13] and [14], and in view of Lemarié's remarks, it seemed quite natural to expect a wavelet characterization of homogeneous Besov spaces, and it is the aim of this paper to work out the necessary details. The paper is structured as follows: After reviewing the basic notions concerning stratified Lie groups and their associated sub-Laplacians in Section 2, in Section 3 we introduce a Littlewood-Paley-type decomposition of functions and tempered distributions on $G$. It is customary to employ the spectral calculus of a suitable sub-Laplacian for the definition of such decompositions, see e.g. [13], and this approach is also used here (Lemma 3.7). However, this raises the issue of consistency: The spaces should reflect properties of the group, not of the sub-Laplacian used for the construction of the decomposition. Using a somewhat more general notion than the $\varphi$-functions in [10] allows us to establish that different choices of sub-Laplacian result in the same scale of Besov spaces (Theorem 3.11). In Section 4, we derive a characterization of Besov spaces in terms of the continuous wavelet transform, with a wide variety of wavelets to choose from (Theorem 4.4). As a special case one obtains a characterization of homogeneous Besov spaces in terms of the heat semigroup. (See the remarks before Theorem 4.4.) In Section 5, we study discrete characterizations of Besov spaces obtained by sampling the Calderón decomposition. For this purpose, we introduce the coefficient space $\dot{b}^s_{p,q}$.
The chief result is Theorem 5.4, establishing that the wavelet coefficient sequence of $f \in \dot{B}^s_{p,q}$ lies in $\dot{b}^s_{p,q}$. Section 5 introduces our most important tool to bridge the gap between continuous and discrete decompositions, namely oscillation estimates. We then proceed to study wavelet synthesis and frame properties of the wavelet system. Our main result in this respect is that for all sufficiently dense regular sampling sets $\Gamma$, the discrete wavelet system $(\psi_{j,\gamma})_{j\in\mathbb{Z},\gamma\in\Gamma}$ obtained by shifts from $\gamma \in \Gamma$ and dilations by powers of 2 is a universal Banach frame for all Besov spaces. In other words, the wavelet system allows the decomposition $f = \sum_{j,\gamma} r_{j,\gamma}\,\psi_{j,\gamma}$ converging unconditionally in $\dot{B}^s_{p,q}$ whenever $f \in \dot{B}^s_{p,q}$, with coefficients $\{r_{j,\gamma}\}_{j,\gamma} \in \dot{b}^s_{p,q}$ depending linearly and boundedly on $f$, and satisfying the norm equivalence $\|\{r_{j,\gamma}\}_{j,\gamma}\|_{\dot{b}^s_{p,q}} \asymp \|f\|_{\dot{B}^s_{p,q}}$.

Preliminaries and Notation
Following the terminology in [8], we call a Lie group $G$ stratified if it is connected and simply connected, and its Lie algebra $\mathfrak{g}$ decomposes as a direct sum $\mathfrak{g} = V_1 \oplus \cdots \oplus V_m$, with $[V_1, V_k] = V_{k+1}$ for $1 \le k < m$ and $[V_1, V_m] = \{0\}$. Then $\mathfrak{g}$ is nilpotent of step $m$, and generated as a Lie algebra by $V_1$. Euclidean spaces $\mathbb{R}^n$ and the Heisenberg group $\mathbb{H}^n$ are examples of stratified Lie groups. If $G$ is stratified, its Lie algebra admits a canonical (natural) family of dilations, namely $\delta_r(X_1 + X_2 + \cdots + X_m) = rX_1 + r^2X_2 + \cdots + r^mX_m$ ($X_j \in V_j$, $r > 0$), which are Lie algebra automorphisms. We identify $G$ with $\mathfrak{g}$ through the exponential map. Hence $G$ is a Lie group with underlying manifold $\mathbb{R}^n$, for some $n$, and the group product provided by the Campbell-Baker-Hausdorff formula. The dilations are then also group automorphisms of $G$. Instead of writing $\delta_a(x)$ for $x \in G$ and $a > 0$, we simply use $ax$, whenever confusion with the Lie group product is excluded. After choosing a basis of $\mathfrak{g}$ obtained as a union of bases of the $V_i$, and a possible change of coordinates, one therefore has for $x \in G$ and $a > 0$ that $ax = (a^{d_1}x_1, \ldots, a^{d_n}x_n)$ for integers $d_1 \le \cdots \le d_n$, where $d_i = j$ whenever the $i$-th basis vector belongs to $V_j$. Under our identification of $G$ with $\mathfrak{g}$, polynomials on $G$ are polynomials on $\mathfrak{g}$ (with respect to any linear coordinate system on the latter). Polynomials on $G$ are written as $p(x) = \sum_I c_I x^I$ (2), where $c_I \in \mathbb{C}$ are the coefficients, and $x^I = x_1^{I_1} x_2^{I_2} \cdots x_n^{I_n}$ the monomials associated to the multi-indices $I \in \mathbb{N}^{\{1,\ldots,n\}}$. For a multi-index $I$, define $d(I) = \sum_{i=1}^n I_i d_i$. A polynomial of the type (2) is called of homogeneous degree $k$ if $d(I) \le k$ holds for all multi-indices $I$ with $c_I \neq 0$. We write $\mathcal{P}_k$ for the space of polynomials of homogeneous degree $k$. We let $\mathcal{S}(G)$ denote the space of Schwartz functions on $G$. By definition $\mathcal{S}(G) = \mathcal{S}(\mathfrak{g})$. Let $\mathcal{S}'(G)$ and $\mathcal{S}'(G)/\mathcal{P}$ denote the space of tempered distributions and tempered distributions modulo polynomials on $G$, respectively. The duality between the spaces is denoted by the map $(\cdot,\cdot) : \mathcal{S}'(G) \times \mathcal{S}(G) \to \mathbb{C}$.
Most of the time, however, we will work with the sesquilinear version $\langle f, g \rangle = (f, \overline{g})$, for $f \in \mathcal{S}'(G)$ and $g \in \mathcal{S}(G)$. Left Haar measure on $G$ is induced by Lebesgue measure on its Lie algebra, and it is also right-invariant. The number $Q = \sum_{j=1}^m j\,\dim V_j$ will be called the homogeneous dimension of $G$. (For instance, for $G = \mathbb{R}^n$ and $\mathbb{H}^n$ we have $Q = n$ and $Q = 2n + 2$, respectively.) For any function $\phi$ on $G$ and $a > 0$, the $L^1$-normalized dilation of $\phi$ is defined by $D_a\phi(x) = a^Q \phi(ax)$. Observe that this action preserves the $L^1$-norm, i.e., $\|D_a\phi\|_1 = \|\phi\|_1$. We fix a homogeneous quasi-norm $|\cdot|$ on $G$ which is smooth away from 0, satisfies $|ax| = a|x|$ for all $x \in G$, $a \ge 0$, and $|x^{-1}| = |x|$ for all $x \in G$, with $|x| > 0$ if $x \neq 0$, and fulfills a triangle inequality $|xy| \le C(|x| + |y|)$, with constant $C > 0$. Confer [8] for the construction of homogeneous norms, as well as further properties. Moreover, by Proposition 1.15 [8], for any $r > 0$, there is a finite $C_r > 0$ such that $\int_{|x|>R} |x|^{-Q-r}\,dx = C_r R^{-r}$ for all $R > 0$. Our conventions for left-invariant operators on $G$ are as follows: We let $Y_1, \ldots, Y_n$ denote a basis of $\mathfrak{g}$, obtained as a union of bases of the $V_i$. In particular, $Y_1, \ldots, Y_l$, for $l = \dim(V_1)$, is a basis of $V_1$. Elements of the Lie algebra are identified in the usual manner with left-invariant differential operators on $G$. Given a multi-index $I \in \mathbb{N}_0^n$, we write $Y^I$ for $Y_1^{I_1} Y_2^{I_2} \cdots Y_n^{I_n}$. In addition, the norms $|\cdot|_N$ induce the topology of $\mathcal{S}(G)$ (see [8]). The sub-Laplacian operator on $G$ can be viewed as the analog of the Laplacian operator on $\mathbb{R}^n$ defined by $L = -\sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}$. Using the above conventions for the choice of basis $Y_1, \ldots, Y_n$ and $l = \dim(V_1)$, the sub-Laplacian is defined as $L = -\sum_{i=1}^{l} Y_i^2$. Note that a less restrictive notion of sub-Laplacians can also be found in the literature (e.g., any sum of squares of Lie algebra generators); we stress that the results in this paper crucially rely on the definition presented here.
A linear differential operator $T$ on $G$ is called homogeneous of degree $l$ if $T(f \circ \delta_a) = a^l\,(Tf) \circ \delta_a$ for any $f$ on $G$ and all $a > 0$. By choice of the $Y_i$ for $i \le \dim(V_1)$, these operators are homogeneous of degree 1; it follows that $L$ is homogeneous of degree 2, and $L^k$ is homogeneous of degree $2k$. Furthermore, any operator of the form $Y^I$ is homogeneous of degree $d(I)$.
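As a concrete illustration of these conventions, consider the three-dimensional Heisenberg group $\mathbb{H}^1$. The coordinates and the factor $1/2$ in the group law below are one common convention, assumed here for illustration and not taken from the text; other normalizations change constants but not the homogeneity statements. The sketch assumes the sign convention $L = -(X^2 + Y^2)$ for the sub-Laplacian.

```latex
% Heisenberg group H^1 = R^3 (assumed convention), V_1 = span{X, Y}, V_2 = span{T}.
% In the fourth line, ay is the second coordinate of the point \delta_a(x,y,t).
\begin{align*}
(x,y,t)\cdot(x',y',t') &= \bigl(x+x',\; y+y',\; t+t'+\tfrac12(xy'-yx')\bigr), \\
X = \partial_x - \tfrac{y}{2}\,\partial_t, \quad
Y &= \partial_y + \tfrac{x}{2}\,\partial_t, \quad
T = \partial_t = [X,Y], \\
\delta_a(x,y,t) &= (ax,\; ay,\; a^2 t), \qquad Q = 1\cdot 2 + 2\cdot 1 = 4, \\
X(f\circ\delta_a)(x,y,t)
 &= a\,(\partial_x f)(\delta_a(x,y,t)) - \tfrac{y}{2}\,a^2\,(\partial_t f)(\delta_a(x,y,t)) \\
 &= a\,\bigl[\partial_x f - \tfrac{ay}{2}\,\partial_t f\bigr](\delta_a(x,y,t))
  = a\,(Xf)(\delta_a(x,y,t)), \\
L &= -(X^2 + Y^2), \qquad L(f\circ\delta_a) = a^2\,(Lf)\circ\delta_a .
\end{align*}
```

The same computation applies to $Y$, and $T = [X,Y]$ is homogeneous of degree 2, in line with the value $Q = 2n+2$ for $n = 1$.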
When restricted to $C_c^\infty$, $L$ is formally self-adjoint: for any $f, g \in C_c^\infty(G)$, $\langle Lf, g \rangle = \langle f, Lg \rangle$. (For more see [14].) Its closure has domain $D = \{u \in L^2(G) : Lu \in L^2(G)\}$, where we take $Lu$ in the sense of distributions. From this fact it quickly follows that this closure is self-adjoint and is in fact the unique self-adjoint extension of $L|_{C_c^\infty}$; we denote this extension also by the symbol $L$. Suppose that $L$ has spectral resolution $L = \int_0^\infty \lambda\, dP_\lambda$, where $dP_\lambda$ is the projection measure. For a bounded Borel function $\hat{f}$ on $[0, \infty)$, the operator $\hat{f}(L) = \int_0^\infty \hat{f}(\lambda)\, dP_\lambda$ is a bounded integral operator on $L^2(G)$ with a convolution distribution kernel in $L^2(G)$ denoted by $f$, and $\hat{f}(L)\eta = \eta * f$ for all $\eta \in \mathcal{S}(G)$. An important fact to be used later on is that for rapidly decaying smooth functions $\hat{f} \in \mathcal{S}(\mathbb{R}^+)$, the kernel associated to $\hat{f}(L)$ is a Schwartz function. For a function $f$ on $G$ we define $\tilde{f}(x) = f(x^{-1})$ and $f^* = \overline{\tilde{f}}$. We will frequently use that for $f \in L^2(G) \cap L^1(G)$, the adjoint of the convolution operator $g \mapsto g * f$ is provided by $g \mapsto g * f^*$.
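In the Euclidean case $G = \mathbb{R}^n$ with $L = -\Delta$, the spectral calculus reduces to Fourier multipliers, which gives a quick consistency check for these conventions. The following sketch assumes the Fourier transform normalization $\mathcal{F}f(\xi) = \int f(x)\,e^{-i x\cdot\xi}\,dx$ and the dilation $D_a f(x) = a^Q f(ax)$ with $Q = n$; it is meant as an illustration only, not as part of the general argument.

```latex
% Euclidean model case: L = -\Delta acts on the Fourier side as multiplication by |\xi|^2.
\begin{align*}
\mathcal{F}\bigl(\hat f(L)u\bigr)(\xi) &= \hat f(|\xi|^2)\,\mathcal{F}u(\xi)
  \quad\Longrightarrow\quad \hat f(L)u = u * f, \qquad \mathcal{F}f(\xi) = \hat f(|\xi|^2), \\
\mathcal{F}(D_a f)(\xi) &= \int_{\mathbb{R}^n} a^{n} f(ax)\,e^{-i x\cdot\xi}\,dx
  = \mathcal{F}f(\xi/a) = \hat f\bigl(a^{-2}|\xi|^2\bigr), \\
\text{hence}\quad D_a f &\ \text{is the convolution kernel of}\ \hat f(a^{-2}L).
\end{align*}
```

With $a = 2^j$ this is the Euclidean instance of the scaling relation between the operators $\hat\psi(2^{-2j}\,\cdot)(L)$ and dilated kernels that is used for Littlewood-Paley decompositions.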

Homogeneous Besov Spaces on Stratified Lie Groups
In this section we define homogeneous Besov spaces on stratified Lie groups via Littlewood-Paley decompositions of distributions $u$ as $u = \sum_{j\in\mathbb{Z}} u * \psi_j^* * \psi_j$, where $\psi_j$ is a dilated copy of a suitably chosen Schwartz function $\psi$. In the Euclidean setting, it is customary to construct $\psi$ by picking a dyadic partition of unity on the Fourier transform side and applying Fourier inversion. The standard way of transferring this construction to stratified Lie groups consists in replacing the Fourier transform by the spectral decomposition of a sub-Laplacian $L$, see Lemma 3.7 below. However, this approach raises the question to which extent the construction depends on the choice of $L$. It turns out that the precise choice of sub-Laplacian obtained from a basis of $V_1$ is irrelevant. In order to prove this, we study Littlewood-Paley decompositions in somewhat different terms. The right setting for the study of such decompositions is the space of tempered distributions modulo polynomials, and the easiest approach to this convergence is via duality to a suitable space of Schwartz functions.
with absolute convergence of the integral.
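Under our identification of $G$ with $\mathfrak{g}$, the inversion map $x \mapsto x^{-1}$ is identical to the additive inversion map, i.e., $x^{-1} = -x$, and it follows that $\tilde{p} \in \mathcal{P}_N$ for all $p \in \mathcal{P}_N$. Thus, if $f$ has vanishing moments of order $N$, then for all $p \in \mathcal{P}_{N-1}$, $\int_G \tilde{f}(x)\,p(x)\,dx = \int_G f(x)\,\tilde{p}(x)\,dx = 0$, i.e., $\tilde{f}$ has vanishing moments of order $N$ as well.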
Vanishing moments are central to most estimates in wavelet analysis, by the following principle: In a convolution product of the type g * D t f , vanishing moments of one factor together with smoothness of the other result in decay. Later on, we will apply the lemma to Schwartz functions f, g, where only the vanishing moment assumptions are nontrivial. The more general version given here is included for reference.
Lemma 3.2. Let N, k ∈ N be arbitrary.
(a) Let f ∈ C k , such that Y I (f ) is of decay order N, for all I with d(I) ≤ k. Let g have vanishing moments of order k and decay order N + k + Q + 1. Then there exists a constant, depending only on the decay of Y I (f ) and g, such that In particular, if p > Q/N, (b) Now suppose that g ∈ C k , with Y I (g) of decay order N for all I with d(I) ≤ k. Let f have vanishing moments of order k and decay order N + k + Q + 1. Then there exists a constant, depending only on the decay of f and Y I (g), such that In particular, if p > Q/N, Proof. First, let us prove (a): Let 0 < t < 1. For x ∈ G, let P k x,Dtf denote the left Taylor polynomial of D tf with homogeneous degree k − 1, see [8, 1.44]. By that result, with suitable positive constants C k and b. We next use the homogoneity properties of the partial derivatives [8, p.21], together with the decay condition on Y I f to estimate for I with d( where the penultimate inequality used [8, 1.10], and the final estimate used |ty| = t|y| ≤ |y|. Thus Next, using vanishing moments of g, and the integral is finite by [8, 1.15]. This proves (5), and (6) follows by For part (b), we first observe that Our assumptions on f, g allow to invoke part (a) withg,f replacing f, g, and (7) follows immediately. (8) is obtained from this by straightforward integration.
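The mechanism behind the lemma can be seen in a one-dimensional model case. The following sketch takes $G = \mathbb{R}$, so $Q = 1$, assumes the dilation $D_t f(x) = t f(tx)$, and only derives a uniform bound; it ignores the decay in $x$ that the lemma also tracks, so it should be read as a simplified illustration rather than as the actual proof.

```latex
% Model case G = R, Q = 1, D_t f(x) = t f(tx), 0 < t < 1 (uniform bound only).
% Assumptions: \int g(y) y^j dy = 0 for 0 <= j < k, f^{(k)} bounded, \int |g(y)| |y|^k dy finite.
\begin{align*}
(g * D_t f)(x) &= \int_{\mathbb{R}} g(y)\, t\, f\bigl(t(x-y)\bigr)\,dy \\
&= t \int_{\mathbb{R}} g(y)\,\Bigl[ f\bigl(t(x-y)\bigr)
   - \sum_{j=0}^{k-1} \frac{f^{(j)}(tx)}{j!}\,(-ty)^j \Bigr]\,dy, \\
\bigl|(g * D_t f)(x)\bigr| &\le t \int_{\mathbb{R}} |g(y)|\,
   \frac{\|f^{(k)}\|_\infty}{k!}\,|ty|^k\,dy
 = t^{k+1}\,\frac{\|f^{(k)}\|_\infty}{k!} \int_{\mathbb{R}} |g(y)|\,|y|^k\,dy .
\end{align*}
```

In the second line the Taylor polynomial in $y$ may be subtracted at no cost because it is annihilated by the vanishing moments of $g$; the Taylor remainder then produces the gain $t^{k+Q}$ with $Q = 1$.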
We let Z(G) denote the space of Schwartz functions with all moments vanishing. We next consider properties of Z(G) as a subspace of S(G) with the relative topology. Proof. By definition, Z(G) is the intersection of kernels of a family of tempered distributions, hence a closed subspace. For p ∈ P and f ∈ Z(G), one has by unimodularity of G that p,f = p, f = 0, sincep is a polynomial. But then, for any g ∈ S(G) and f ∈ Z(G), one has for all polynomials p on G that g * f, p = g, p * f = g, 0 = 0 , since f ∈ Z(G) implies p * f = 0 (translation on G is polynomial). Thus g * f ∈ Z(G). All further properties of Z(G) follow from the corresponding statements concerning Z(R n ). For identification of Z ′ (R n ) with the quotient space S(R n ) ′ /P, we first observe that a tempered distribution ϕ vanishes on Z(R n ) iff its (Euclidean) Fourier transform is supported in {0}, which is well-known to be the case iff ϕ is a polynomial. Using this observation, we map u ∈ Z ′ (R n ) toũ + P, whereũ is a continuous extension of u to all of S(R n ); such an extension exists by the Hahn-Banach theorem. The map is well-defined because the difference of two extensions of u annihilate Z(R n ), hence differ by a polynomial. Linearity follows from well-definedness. Furthermore, the inverse of the mapping is clearly obtained by assigning w + P to the restriction w| Z(G) .
In the following, we will usually not explicitly distinguish between u ∈ S ′ (G) and its equivalence class modulo polynomials, and we will occasionally write u ∈ S ′ (G)/P. The topology of S ′ (G)/P is just the topology of pointwise convergence on the elements of Z(G): For any net (u j + P) j∈I , u j + P → u + P holds if and only if u j , ϕ → u, ϕ , for all ϕ ∈ Z(G). We next study convolution on S ′ (G)/P. Lemma 3.4. For every ψ ∈ S(G), the map u → u * ψ is a well-defined and continuous operator S ′ (G)/P → S ′ (G)/P. If ψ ∈ Z(G), the associated convolution operator is a well-defined and continuous operator S ′ (G)/P → S ′ (G).
The definition of homogeneous Besov spaces requires taking L p -norms of elements of S ′ (G)/P. The following remark clarifies this.
Remark 3.5. Throughout this paper, we use the canonical embedding L p (G) ⊂ S ′ (G).
For p < ∞ this gives rise to an embedding L p (G) ⊂ S ′ (G)/P, using that P ∩L p (G) = {0}. Consequently, given u ∈ S ′ (G)/P, we let assigning the value ∞ otherwise. Here the fact that P ∩ L p (G) = {0} guarantees that the decomposition is unique, and thus (10) well-defined. By contrast, · ∞ can only be defined on S ′ (G), if we assign the value ∞ to u ∈ S ′ (G) \ L ∞ (G).
Note that with these definitions, the Hausdorff-Young inequality u * f p ≤ u p f 1 remains valid for all f ∈ S(G), and all u ∈ S ′ (G)/P (for p < ∞), resp. u ∈ S ′ (G) We now introduce a general Littlewood-Paley-type decomposition. For this purpose we define for ψ ∈ S(G): holds, with convergence in the Schwartz space topology. Duality entails the convergence for all u ∈ S ′ (G)/P.
The following lemma yields the chief construction of LP-admissible functions Lemma 3.7. Letφ be a function in C ∞ with support in [0, 4] such that 0 ≤φ ≤ 1 and Pick a sub-Laplacian L, and let ψ denote the convolution kernel associated to the bounded left-invariant operator ψ(L). Then ψ is LP-admissible, with ψ ∈ Z(G).
Proof. Let us first comment on the properties of ψ that are immediate from the construction via spectral calculus: ψ ∈ S(G) follows from [19], and vanishing moments by [14, Proposition 1]. Now let g ∈ Z(G). First note that 2-homogeneity of L implies that the convolution kernel associated to ψ(2 −2j ·)(L) coincides with ψ j . Then, by the spectral theorem and (13), where φ ∈ S(G) is the convolution kernel of φ(L). Since φ is a Schwartz function, it follows by [8, Proposition (1.49)] that g * D 2 N+1 φ → c φ g, for N → ∞, for all g ∈ S(G), with convergence in S(G) and a suitable constant c φ . We next show that g * D t f → 0 in S(G), as t → 0, for any f ∈ S(G). Fix a multi-index I and N, k ∈ N with k ≥ N. Then left-invariance and homogeneity of Y I yield Here the first inequality is an application of (5); the constant C f,g can be estimated in terms of |f | M , |g| M , for M sufficiently large. But this proves g * D t f → 0 in the Schwartz topology. Summarizing, |j|≤N g * ψ * j * ψ j → c φ g in S(G), and in addition by (14), |j|≤N g * ψ * j * ψ j → g in L 2 , whence c φ = 1 follows.
Note that an LP-admissible function ψ as constructed in 3.7 fulfills the convenient equality Remark 3.8. By spectral calculus, we find that ψ = L k g k , with g k ∈ Z(G). In particular, the decomposition We now associate a scale of homogeneous Besov spaces to the function ψ.
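The homogeneous Besov space associated to $\psi$ is defined as $\dot{B}^{s,\psi}_{p,q} = \{ u \in \mathcal{S}'(G)/\mathcal{P} : \{2^{js}\|u * \psi_j^*\|_p\}_{j\in\mathbb{Z}} \in \ell^q(\mathbb{Z}) \}$, with associated norm $\|u\|_{\dot{B}^{s,\psi}_{p,q}} = \bigl\| \{2^{js}\|u * \psi_j^*\|_p\}_{j\in\mathbb{Z}} \bigr\|_{\ell^q(\mathbb{Z})}$.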
Remark 3.10. The definition relies on the conventions regarding $L^p$-norms of distributions (modulo polynomials), as outlined in Remark 3.5. Definiteness of the Besov norm holds because of (12).
The combination of Lemma 3.7 with Definition 3.9 shows that we cover the homogeneous Besov spaces defined in the usual manner via the spectral calculus of sub-Laplacians. Hence the following theorem implies in particular that different sub-Laplacians yield the same homogeneous Besov spaces (at least within the range of sub-Laplacians that we consider).
Theorem 3.11. Let ψ 1 , ψ 2 ∈ Z(G) be LP-admissible. Let s ∈ R and 1 ≤ p, q ≤ ∞. ThenḂ s,ψ 1 p,q =Ḃ s,ψ 2 p,q , with equivalent norms. Proof. It is sufficient to prove the norm equivalence, and here symmetry with respect to ψ 1 and ψ 2 immediately reduces the proof to showing, for a suitable constant C > 0, in the extended sense that the left-hand side is finite whenever the right-hand side is. Hence assume that u ∈Ḃ s,ψ 2 p,q ; otherwise, there is nothing to show. In the following, let with convergence in S ′ (G)/P. Accordingly, where the convergence on the right-hand side holds in S ′ (G), by 3.4. We next show that the right-hand side also converges in L p . For this purpose, we observe that where k > s is a fixed integer. For ℓ − j ≥ 0, this follows directly from (8), using ψ 1 , ψ 2 ∈ S(G), and vanishing moments of ψ 1 , whereas for ℓ − j < 0, the vanishing moments of ψ 2 allow to apply (6). Using Young's inequality, we estimate with C from above that Next observe that By assumption, the sequence (2 js u * ψ * j,2 p ) j∈Z is in ℓ q , in particular bounded. Therefore, k − |s| > 0 yields that (20) converges. But then the right-hand side of (18) converges unconditionally with respect to · p . This limit coincides with the S ′ (G)/P-limit u * ψ * Now an application of Young's inequality for convolution over Z, again using k − |s| > 0, provides (17).
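The final step of the proof, Young's inequality for convolution over $\mathbb{Z}$, can be spelled out as follows. The abbreviations $a_j$, $b_\ell$ and $\varepsilon$ are introduced here only for this sketch, and $C$ stands for the constant from the $L^1$-estimate on the convolved kernels.

```latex
% Assumed abbreviations: a_j = 2^{js} \|u * \psi_{j,1}^*\|_p,
% b_\ell = 2^{\ell s} \|u * \psi_{\ell,2}^*\|_p, and \varepsilon = k - |s| > 0.
\begin{align*}
a_j &\le C \sum_{\ell\in\mathbb{Z}} 2^{(j-\ell)s}\,2^{-|j-\ell|k}\, b_\ell
     \le C \sum_{\ell\in\mathbb{Z}} 2^{-|j-\ell|\varepsilon}\, b_\ell , \\
\|(a_j)_{j}\|_{\ell^q}
    &\le C\,\Bigl(\sum_{m\in\mathbb{Z}} 2^{-|m|\varepsilon}\Bigr)\,\|(b_\ell)_{\ell}\|_{\ell^q}
     = C\,\frac{1+2^{-\varepsilon}}{1-2^{-\varepsilon}}\,\|(b_\ell)_{\ell}\|_{\ell^q} .
\end{align*}
```

The first line uses $2^{(j-\ell)s} \le 2^{|j-\ell|\,|s|}$, and the second is the inequality $\|c * b\|_{\ell^q} \le \|c\|_{\ell^1}\|b\|_{\ell^q}$ applied to $c_m = 2^{-|m|\varepsilon}$, which is summable precisely because $k > |s|$.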
As a consequence, we write $\dot{B}^s_{p,q} = \dot{B}^{s,\psi}_{p,q}$, for any LP-admissible $\psi \in \mathcal{Z}(G)$. These spaces coincide with the homogeneous Besov spaces for the Heisenberg group in [1], and with the usual definitions in the case $G = \mathbb{R}^n$. In the remainder of the section we note some functional-analytic properties of Besov spaces and Littlewood-Paley decompositions for later use. Lemma 3.12. For all $1 \le p, q \le \infty$ and all $s \in \mathbb{R}$, one has continuous inclusion maps $\mathcal{Z}(G) \hookrightarrow \dot{B}^s_{p,q} \hookrightarrow \mathcal{S}'(G)/\mathcal{P}$, as well as $\mathcal{Z}(G) \hookrightarrow \dot{B}^{s*}_{p,q}$, where the latter denotes the dual of $\dot{B}^s_{p,q}$. For $p, q < \infty$, $\mathcal{Z}(G) \subset \dot{B}^s_{p,q}$ is dense.
Proof. We pick ψ as in Lemma 3.7 and define ∆ j g = g * ψ * j for g ∈ S ′ (G). For the inclusion Z(G) ⊂Ḃ s p,q , note that (6) and (8) allow to estimate for all g ∈ Z(G) and k ∈ N that Here the constant C k is a suitable multiple of |g| M , for M = M(k) sufficiently large. But this implies that Z(G) ⊂Ḃ s p,q continuously. For the other embedding, repeated applications of Hölder's inequality yield the estimate valid for all f ∈ Z(G) ⊂Ḃ −s p ′ ,q ′ and g ∈Ḃ s p,q . Here p ′ , q ′ are the conjugate exponents of p, q, respectively. But this estimate implies continuity of the embeddingsḂ s p,q ⊂ S ′ (G)/P and Z(G) ⊂Ḃ s * p,q . For the density statement, let u ∈Ḃ s p,q , and ǫ > 0. For convenience, we pick ψ according to Lemma 3.7. Since q < ∞, there exists N ∈ N such that Let w = u * K N . By assumption on u and Young's inequality, w ∈ L p (G), and since p < ∞, there exists g ∈ S(G) with w − g p < ǫ 1/q . Let f = g * K N , then f ∈ Z(G), and for j ∈ Z, For |j| ≤ N − 1, the construction of ψ j and K N implies that K N * ψ * j = ψ * j , whereas for |j| > N + 1, one has K N * ψ * j = 0. As a consequence, one finds for |j| < N − 1 with some constant C > 0 depending only on ψ. For instance, for j = N, A straigthforward application of triangle and Young's inequality yields:

Similar considerations applied to
as desired. ✷ Remark 3.13. Let ψ as in Lemma 3.7. As a byproduct of the proof, we note that the space holds with finitely many nonzero terms.
We next extend the Littlewood-Paley decomposition to the elements of the Besov space.
For simplicity, we prove the result only for certain LP-admissible functions.
Proposition 3.14. Let $1 \le p, q < \infty$, and let $\psi \in \mathcal{Z}(G)$ be an LP-admissible vector constructed via Lemma 3.7. Then the decomposition (11) converges for all $g \in \dot{B}^s_{p,q}$ in the Besov space norm.
Proof. Consider the operators Σ N :Ḃ s p,q →Ḃ s p,q , By suitably adapting the arguments proving the density statement of Lemma 3.12, it is easy to see that the family of operators (Σ N ) N ∈N is bounded in the operator norm. As noted in 3.13, the Σ N strongly converges to the identity operator on a dense subspace. But then boundedness of the family implies strong convergence everywhere.
A further class of spaces for which the decomposition converges is $L^p$: Proposition 3.15. Let $1 < p < \infty$, and let $\psi \in \mathcal{Z}(G)$ be an LP-admissible vector constructed via Lemma 3.7. Then the decomposition (11) converges with respect to $\|\cdot\|_p$, for all $g \in L^p(G)$.
Proof. Let the operator family (Σ N ) N ∈N be defined as in the previous proof. Then and Young's inequality implies that the sequence of operators is norm-bounded. It therefore suffices to prove the desired convergence on the dense subspace S(G). By [8, 1.20] and thus Again by [8, 1.20], (D 2 N g * φ) → c g φ, in particular, Hence Σ N g → c φ g, and the case p = 2 yields c φ = 1.
Proof. Completeness is the only issue here. Again, we pick ψ ∈ Z(G) an LP-admissible vector via Lemma 3.7. Suppose that {u n } n∈N ⊂Ḃ s p,q is a Cauchy sequence. As a consequence, one has in particular, for all j ∈ Z, that {u n * ψ * j } n∈N ⊂ L p (G) is a Cauchy sequence, hence u n * ψ * j → v j , for a suitable v j ∈ L p (G). Furthermore, the Cauchy property of {u n } n∈N ⊂Ḃ s p,q implies that is a Cauchy sequence. On the other hand, the sequence converges pointwise to {2 js v j p } j , whence We define Now, using (22) and Z(G) ⊂Ḃ −s p ′ ,q ′ , where p ′ , q ′ are the conjugate exponents of p, q, respectively, a straightforward calculation as in the proof of Lemma 3.12 shows that the sum defining u converges in S ′ (G)/P. Furthermore, (22) and (15) easily imply that u ∈Ḃ s p,q . Finally, for the proof of u n → u, we employ (15) together with the equality Summarizing, the sequence 2 js (u n − u) * ψ * j p j∈Z n∈N ∈ ℓ q (N) is a Cauchy sequence, converging pointwise to 0. But then u n − u Ḃs p,q → 0 follows.

Characterization via Continuous Wavelet Transform
The following definition can be viewed as a continuous-scale analog of LP-admissibility. The next theorem reveals a large class of Z-admissible wavelets. In fact, all the wavelets studied in [14] are also Z-admissible in the sense considered here. Its proof is an adaptation of the argument showing [14, Theorem 1].
Theorem 4.2. Let $\hat{h} \in \mathcal{S}(\mathbb{R}^+)$, and let $\psi$ be the distribution kernel associated to the operator $L\hat{h}(L)$. Then $\psi$ is $\mathcal{Z}$-admissible up to normalization.
Proof. The main idea of the proof is to write, for f ∈ Z(G), with suitable g ∈ S(G). Once this is established, f * D A g → c g f for A → ∞ follows by [8, Proposition (1.49)], with convergence in the Schwartz topology. Moreover, f ∈ Z(G) entails that f * D ǫ g → 0 in the Schwartz topology: Given any N > 0 and I ∈ N n with associated left-invariant differential operator Y I , we can employ (5) to estimate which converges to zero for ǫ → 0, as soon as M ≥ N and k > M − Q − d(I). But this implies f * D ǫ g → 0 in S(G), by [8]. Thus it remains to construct g. To this end, define which is clearly in S(R + ), and let g denote the associated convolution kernel of g(L). By the definition, g ∈ S(G). Let ϕ 1 , ϕ 2 be in S(G), and let dλ ϕ 1 ,ϕ 2 denote the scalar-valued Borel measure associated to ϕ 1 , ϕ 2 by the spectral measure. Then, by spectral calculus and the invariance properties of da/a, as desired.
(b) There exist Z-admissible ψ ∈ C ∞ c (G) with vanishing moments of arbitrary finite order.
Given a tempered distribution u ∈ S ′ (G)/P and a Z(G)-admissible function ψ, the continuous wavelet transform of u is the family (u * D a ψ * ) a>0 of convolution products. We will now prove a characterization of Besov spaces in terms of the continuous wavelet transform. Another popular candidate for defining scales of Besov spaces is the heat semigroup; see e.g. [27] for the inhomogeneous case on stratified groups, or rather [2] for the general treatment. In our setting, the heat semigroup associated to the sub-Laplacian is given by right convolution with h t (x) = D t h(x), where h is the kernel of h(L) with h(ξ) = e −ξ . Theorem 4.2 implies that ψ = L k h is Z-admissible; it can be viewed as an analog of the well-known Mexican Hat wavelet. (This wavelet on the general groups was studied for the first time in [14].) The wavelet transform of f ∈ S ′ (G) associated to ψ is then very closely related to the k-fold time derivative of the solution to the heat equation with initial condition f : By choice of h, denotes the solution of the heat equation associated to L, with initial condition f . A formal calculation using left invariance of L then yields Thus the following theorem also implies a characterization of Besov spaces in terms of the heat semigroup.
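The formal calculation alluded to above can be sketched as follows. The sketch assumes the dilation convention $D_a h(x) = a^Q h(ax)$, under which the kernel of $e^{-tL}$ is $h_t = D_{t^{-1/2}} h$, and it writes $\psi = L^k h$; these normalizations are assumptions made for the illustration, and the manipulations are formal.

```latex
% Assumptions: \hat h(\xi) = e^{-\xi}, h_t = kernel of e^{-tL} = D_{t^{-1/2}} h, \psi = L^k h,
% and u(t) = f * h_t solves \partial_t u = -L u with u(0) = f.
\begin{align*}
\partial_t^k u(t) &= (-L)^k (f * h_t) = (-1)^k\, f * (L^k h_t)
   && \text{(left invariance of $L$)} \\
L^k (D_a h) &= a^{2k}\, D_a (L^k h) = a^{2k}\, D_a \psi
   && \text{(homogeneity of degree $2k$)} \\
\partial_t^k u(t) &= (-1)^k\, t^{-k}\, f * D_{t^{-1/2}}\psi
   && (a = t^{-1/2}).
\end{align*}
```

Since $\hat{h}$ is real-valued, one has $\psi^* = \psi$, so under these assumptions the substitution $a = t^{-1/2}$, $da/a = \tfrac12\,dt/t$ turns $a^s \|f * D_a\psi^*\|_p$ into $t^{k - s/2}\,\|\partial_t^k u(t)\|_p$, which is the familiar form of a heat semigroup characterization.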
Theorem 4.4. Let ψ ∈ S(G) be Z-admissible, with vanishing moments of order k. Then, for all s ∈ R with |s| < k, and all 1 ≤ p < ∞, 1 ≤ q ≤ ∞, the following norm equivalence holds: ∀u ∈ S ′ (G)/P : u Ḃs p,q ≍ a → a s u * D a ψ * Here the norm equivalence is understood in the extended sense that one side is finite iff the other side is. If ψ ∈ Z(G), the equivalence is also valid for the case p = ∞.
Proof. The strategy consists in adapting the proof of Theorem 3.11 to the setting where one summation over scales is replaced by integration. This time however, we have to deal with both directions of the norm equivalence. In the following estimates, the symbol C denotes a constant that may change from line to line, but in a way that is independent of u ∈ S ′ (G). Let us first assume that R a sq u * D a ψ * q p da a < ∞ , for u ∈ S ′ (G)/P, 1 ≤ p, q ≤ ∞, for a Z-admissible function ψ ∈ S(G) with k ψ > |s| vanishing moments (ψ ∈ Z(G), if p = ∞). Let ϕ ∈ Z(G) be LP-admissible. Then, for all j ∈ Z, holds in S ′ (G), by 3.4. We next prove that the right-hand side of (25) converges in L p . For this purpose, introduce We estimate where we used that da/a is scaling-invariant. Note that the last inequality is Hölder's inequality for q < ∞. In this case, taking qth powers and summing over j yields Using vanishing moments and Schwartz properties of ψ and ϕ, we can now employ (6) and (8) with a constant independent of a ∈ [1, 2]. But then, since k > |s|, we may proceed just as in the proof of 3.11 to estimate the integrand in (28) via Summarizing, we obtain In particular, c j < ∞. But then the right-hand side of (25) converges to u * ϕ * j in L p . The Minkowski-inequality for integrals yields u * ϕ * j p ≤ c j , and thus u qḂ as desired. In the case q = ∞, (29) yields that Thus, by (26) sup The remainder of the argument is the same as for the case q < ∞. Next assume u ∈Ḃ s p,q . Then, for all a ∈ [1, 2] and ℓ ∈ Z, with convergence in S ′ (G)/P; for ψ ∈ Z(G) convergence holds even in S ′ (G). As before, Again, we have ϕ j * D a2 ℓ ψ * 1 2 −|j−ℓ|k with a constant independent of a. Hence one concludes in the same fashion as in the proof of Theorem 3.11 that, for all a ∈ [1, 2], again with a constant independent of a. In the case q = ∞, this finishes the proof immediately, and for q < ∞, we integrate the qth power over a ∈ [1,2] and sum over ℓ to obtain the desired inequality. 
As an application of the characterization via continuous wavelet transforms, we exhibit certain of the homogeneous Besov spaces as homogeneous Sobolev spaces, and we investigate the mapping properties of sub-Laplacians between Besov spaces of different smoothness exponents: Lemma 4.6. $\dot{B}^0_{2,2} = L^2(G)$, with equivalent norms.
Proof. Pick ψ by Lemma 3.7. Then spectral calculus implies that for all f ∈ Z(G) Since Z(G) is dense in both spaces, and both spaces are complete, it follows thatḂ 0 2,2 = L 2 (G).
The next lemma investigates the mapping properties of sub-Laplacians between Besov spaces of different smoothness exponents. Its proof is greatly facilitated by the characterization via continuous wavelet transforms.
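Lemma 4.7. Let $L$ denote a sub-Laplacian. For all $u \in \mathcal{S}'(G)/\mathcal{P}$, $1 \le p, q < \infty$, $s \in \mathbb{R}$ and $k \ge 0$: $\|u\|_{\dot{B}^s_{p,q}} \asymp \|L^k u\|_{\dot{B}^{s-2k}_{p,q}}$, in the extended sense that one side is infinite iff the other side is. In particular, $L^k : \dot{B}^s_{p,q} \to \dot{B}^{s-2k}_{p,q}$ is a bijection, and it makes sense to extend the definition to negative $k$. Thus, for all $k \in \mathbb{Z}$, $L^k : \dot{B}^s_{p,q} \to \dot{B}^{s-2k}_{p,q}$ is a topological isomorphism of Banach spaces.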
Proof. Pick a nonzero real-valued h ∈ S(R + ), an integer m > |s| and let ψ denote the distribution kernel of L m h(L). Hence ψ is admissible by Theorem 4.2, with vanishing moments of order 2m and ψ * = ψ. On L 2 (G), the convolution operator u → u * D a ψ * can be written as Ψ a (L) with a suitable function Ψ a . For u ∈ Z(G) ⊂ L 2 (G), spectral calculus implies where we employed left invariance to pull L k past u in the convolution. Note that up to normalization, L k ψ is admissible with vanishing moments of order 2m + 2k > |s − 2k|. Thus, applying Theorem 4.4, we obtain Now assume that L k u ∈Ḃ s−2k p,q . Then, combining the density statements from Lemma 3.12 and Remark 3.8, we obtain a sequence (u n ) n∈N ⊂ Z(G) with L k u n → L k u inḂ s−2k p,q ; thus also with convergence in S ′ (G)/P. The norm equivalence on Z(G) and completeness ofḂ s p,q yield that u n → v ∈Ḃ s p,q , for suitable v ∈Ḃ s p,q . Again, this implies convergence in S ′ (G)/P. Since L k is continuous on that space, it follows that L k u n → L k v, establishing that L k v = L k u. Since any distribution annihilated by L k is a polynomial, this finally yields u = v ∈Ḃ s p,q , and u Ḃs p,q ≍ L k u Ḃ s−2k p,q follows by taking limits. A similar but simpler argument establishes the norm equivalence under the assumption that u ∈ B s p,q . This observation shows that we can regard certain Besov spaces as homogeneous Sobolev spaces, or, more generally, as generalizations of Riesz potential spaces.
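The scaling identity at the heart of this proof can be written out as follows for $u \in \mathcal{Z}(G)$. This is a sketch under the assumptions that $D_a\psi(x) = a^Q\psi(ax)$ and that $\psi^* = \psi$ and $(L^k\psi)^* = L^k\psi$, which hold for kernels of real-valued functions of $L$.

```latex
% Sketch for u in Z(G); \Psi_a(L) denotes the operator u -> u * D_a \psi^*, which commutes with L^k.
\begin{align*}
(L^k u) * D_a\psi^* &= L^k\bigl(u * D_a\psi^*\bigr)
   && \text{(spectral calculus)} \\
 &= u * L^k\bigl(D_a\psi^*\bigr)
   && \text{(left invariance)} \\
 &= a^{2k}\, u * D_a\bigl(L^k\psi\bigr)^*
   && \text{(homogeneity of degree $2k$)}, \\
a^{s-2k}\,\bigl\|(L^k u) * D_a\psi^*\bigr\|_p &= a^{s}\,\bigl\|u * D_a\bigl(L^k\psi\bigr)^*\bigr\|_p .
\end{align*}
```

Taking the $L^q(\mathbb{R}^+, da/a)$-norm of both sides and applying the continuous wavelet characterization once with the wavelet $\psi$ and once with $L^k\psi$ then compares $\|L^k u\|_{\dot{B}^{s-2k}_{p,q}}$ with $\|u\|_{\dot{B}^{s}_{p,q}}$.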
As a further corollary, we obtain the following interesting result relating two sub-Laplacians $L_1$ and $L_2$: For all $k \in \mathbb{Z}$, the operator $L_1^k L_2^{-k}$ is densely defined and has a bounded extension with bounded inverse. More general analogues involving more than two sub-Laplacians are also easily formulated. For the Euclidean case, this is easily derived using the Fourier transform, which can be viewed as a joint spectral decomposition of commuting operators. In the general, nonabelian case, however, this tool is not readily available, and we are not aware of a direct proof of this observation, nor of a previous source containing it.

Characterization of Besov Spaces by Discrete Wavelet Systems
We next show that the Littlewood-Paley characterization of $\dot B^{s}_{p,q}$ can be discretized by sampling the convolution products $f * \psi_j^*$ over a given discrete set $\Gamma \subset G$. This is equivalent to the study of the analysis operator associated to a discrete wavelet system $\{\psi_{j,\gamma}\}_{j\in\mathbb{Z},\gamma\in\Gamma}$, defined by
$$\psi_{j,\gamma}(x) = 2^{jQ}\,\psi(\gamma^{-1}\cdot 2^{j}x).$$
Throughout the rest of the paper, we assume that the wavelet $\psi \in \mathcal{Z}(G)$ has been chosen according to Lemma 3.7. We first define the discrete coefficient spaces which will be instrumental in the characterization of the Besov spaces: The coefficient space $\dot b^{s}_{p,q}(\Gamma)$ associated to $\dot B^{s}_{p,q}$ and $\Gamma$ is then defined as
$$\dot b^{s}_{p,q}(\Gamma) := \Big\{ \{c_{j,\gamma}\}_{j\in\mathbb{Z},\gamma\in\Gamma} : \big\|\{c_{j,\gamma}\}_{j\in\mathbb{Z},\gamma\in\Gamma}\big\|_{\dot b^{s}_{p,q}} < \infty \Big\}.$$
We simply write $\dot b^{s}_{p,q}$ if $\Gamma$ is understood from the context.
We define the analysis operator $A_\psi$ associated to the function $\psi$ and $\Gamma$, assigning each $u \in \mathcal{S}'(G)/\mathcal{P}$ the family of coefficients $A_\psi(u) = \{\langle u, \psi_{j,\gamma}\rangle\}_{j,\gamma}$. Note that the analysis operator is implicitly assumed to refer to the same set $\Gamma$ that is used in the definition of $\dot b^{s}_{p,q}$.

We next formulate the properties of the sampling sets we intend to use in the following. We shall focus on regular sampling, as specified in the next definition. Most of the results are obtainable for less regular sampling sets, at the cost of more intricate notation.
Definition 5.2. A subset $\Gamma \subset G$ is called a regular sampling set if there exists a relatively compact Borel neighborhood $W \subset G$ of the identity element of $G$ satisfying $\bigcup_{\gamma\in\Gamma} \gamma W = G$ (up to a set of measure zero) as well as $|\gamma W \cap \alpha W| = 0$ for all distinct $\gamma, \alpha \in \Gamma$. Such a set $W$ is called a $\Gamma$-tile. A regular sampling set $\Gamma$ is called $U$-dense if some $\Gamma$-tile $V$ satisfies $V \subset U$.

Note that the definition of $U$-dense used here is somewhat more restrictive than, e.g., in [12]. A particular class of regular sampling sets is provided by lattices, i.e., cocompact discrete subgroups $\Gamma \subset G$. Here, $\Gamma$-tiles are systems of representatives mod $\Gamma$. However, not every stratified Lie group admits a lattice. By contrast, there always exist sufficiently dense regular sampling sets, as the following result shows.

Lemma 5.3. For every neighborhood $U$ of the identity, there exists a $U$-dense regular sampling set.

Proof. By [12, 5.10] there exist $\Gamma \subset G$ and a relatively compact $W$ with nonempty open interior, such that the sets $\gamma W$ tile $G$ (up to sets of measure zero). Then $V = W x_0^{-1}$ is a $\Gamma$-tile, for some point $x_0$ in the interior of $W$. Finally, choosing $b > 0$ sufficiently small ensures that $bV \subset U$, and $bV$ is a $b\Gamma$-tile.
The chief result of this section is the following theorem which shows that the Besov norms can be expressed in terms of discrete coefficients. Note that the constants arising in the following norm equivalences may depend on the space, but the same sampling set is used simultaneously for all spaces.
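Theorem 5.4. There exists a neighborhood $U$ of the identity, such that for all $U$-dense regular sampling sets $\Gamma$, for all $u \in \mathcal{S}'(G)/\mathcal{P}$ and all $1 \le p, q \le \infty$, the following implication holds:
$$u \in \dot B^{s}_{p,q} \ \Longrightarrow\ \{\langle u, \psi_{j,\gamma}\rangle\}_{j\in\mathbb{Z},\gamma\in\Gamma} \in \dot b^{s}_{p,q}(\Gamma).$$
Furthermore, the induced coefficient operator $A_\psi : \dot B^{s}_{p,q} \to \dot b^{s}_{p,q}$ is a topological embedding. In other words, on $\dot B^{s}_{p,q}$ one has the norm equivalence
$$\|u\|_{\dot B^{s}_{p,q}} \asymp \big\|\{\langle u, \psi_{j,\gamma}\rangle\}_{j\in\mathbb{Z},\gamma\in\Gamma}\big\|_{\dot b^{s}_{p,q}} \tag{35}$$
with constants depending on $p$, $q$, $s$ and $\Gamma$.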
Remark 5.5. As a byproduct of the discussion in this section, we will obtain that the tightness of the frame estimates approaches 1, as the density of the sampling set increases. I.e., the wavelet frames are asymptotically tight.
For the proof of Theorem 5.4, we need to introduce some notation. In the following, we write $X_j = \{u * \psi_j^* : u \in \mathcal{S}'(G)\}$, which is a space of smooth functions, as well as $X_j^p = X_j \cap L^p(G)$. Furthermore, let $\Gamma_j = 2^{-j}\Gamma$, so that $\langle u, \psi_{j,\gamma}\rangle$ is the value of $u * \psi_j^*$ at the point $2^{-j}\gamma \in \Gamma_j$, and denote by $R_{\Gamma_j} : X_j \ni g \mapsto g|_{\Gamma_j}$ the restriction operator. In order to prove Theorem 5.4, it is enough to prove the following sampling result for the spaces $X_j$; the rest of the argument consists in summing over $j$. In particular, note that the sampling set $\Gamma$ is independent of $p$ and $j$, and the associated constants are independent of $j$.
Lemma 5.6. There exists a neighborhood U of the identity, such that for all U-dense regular sampling sets Γ, the implication holds. Furthermore, with suitable constants 0 < c(p) ≤ C(p) < ∞ (for 1 ≤ p ≤ ∞), the inequalities hold for all j ∈ Z and all u ∈ X j .
Proof. Here we only show that the case $j = 0$ implies the other cases; the rest will be established below. Hence assume (37) is known for $j = 0$. Let $g = u * \psi_j^* \in X_j$. For arbitrary $j$ we have that $\psi_j^* = 2^{jQ}\,\psi_0^* \circ \delta_{2^j}$, and thus
$$g(x) = (u * \psi_j^*)(x) = (v_j * \psi_0^*)(2^j x).$$
Here $v_j = u \circ \delta_{2^{-j}}$, where the dilation action on distributions is defined in the usual manner by duality. The last equality follows from the fact that $\delta_{2^j}$ is a group homomorphism.
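The homomorphism property of the dilations used in the last step can be checked concretely on the Heisenberg group $\mathbb{H}^1$, the simplest noncommutative stratified group. The following Python sketch is an illustration only: the exponential coordinates, the group law with the factor 1/2, and the helper names `mul`, `dilate` and `inverse` are choices made here and are not notation from the paper.

```python
from fractions import Fraction


def mul(a, b):
    """Group law of the Heisenberg group H^1 in exponential coordinates (assumed convention)."""
    x, y, t = a
    x2, y2, t2 = b
    return (x + x2, y + y2, t + t2 + Fraction(1, 2) * (x * y2 - y * x2))


def dilate(r, a):
    """Anisotropic dilation delta_r: the central coordinate t scales like r**2."""
    x, y, t = a
    return (r * x, r * y, r * r * t)


def inverse(a):
    """Group inverse; in these coordinates it is simply the negative."""
    x, y, t = a
    return (-x, -y, -t)


# Exact rational arithmetic, so equality checks are not affected by rounding.
a = (Fraction(1), Fraction(2), Fraction(3))
b = (Fraction(-4), Fraction(5), Fraction(1, 7))
r = Fraction(1, 4)  # plays the role of 2^{-j} with j = 2

# delta_r(a * b) versus delta_r(a) * delta_r(b)
lhs = dilate(r, mul(a, b))
rhs = mul(dilate(r, a), dilate(r, b))
is_homomorphism = lhs == rhs

# The group is noncommutative: a * b and b * a differ in the central coordinate.
is_commutative = mul(a, b) == mul(b, a)
```

For this group the homogeneous dimension is $Q = 4$, since the two horizontal coordinates scale with weight 1 and the central coordinate with weight 2.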
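Recall that for any $j$ and $\gamma$, $\psi_{j,\gamma}(x) = 2^{jQ}\,\psi(\gamma^{-1}\cdot 2^{j}x)$. Applying the case $j = 0$ to $v_j * \psi_0^*$, we obtain for $p < \infty$ a bound on $\sum_{\gamma\in\Gamma} |\langle u, \psi_{j,\gamma}\rangle|^p$, which is the upper estimate for arbitrary $j$. The lower estimate and the case $p = \infty$ follow by similar calculations.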
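The reduction to $j = 0$ can be made explicit by the following bookkeeping. It is a sketch under assumptions that are not spelled out in the surviving text: the convolution convention $f * g(x) = \int_G f(y)\,g(y^{-1}x)\,dy$, the involution $\psi^*(x) = \overline{\psi(x^{-1})}$, the identification $\psi_0 = \psi$, and a $j = 0$ sampling estimate of the form $c\,\|h\|_p \le \|h|_\Gamma\|_{\ell^p} \le C\,\|h\|_p$ for $h \in X_0^p$. For distributions $u$ the integrals are to be read formally. The weight $2^{j(s - Q/p)}$ in the last line is what these assumptions suggest for the coefficient norm; it is not quoted from the definition of $\dot b^{s}_{p,q}$.

```latex
% g = u * \psi_j^*, \quad \psi_j^* = 2^{jQ}\,\psi_0^*\circ\delta_{2^j}, \quad v_j = u\circ\delta_{2^{-j}}
g(x) = \int_G u(y)\,2^{jQ}\,\psi_0^*\big(2^{j}(y^{-1}x)\big)\,dy
     = \int_G u(2^{-j}z)\,\psi_0^*\big(z^{-1}\cdot 2^{j}x\big)\,dz
     = (v_j * \psi_0^*)(2^{j}x),
\\[4pt]
\|g\|_p = 2^{-jQ/p}\,\|v_j * \psi_0^*\|_p ,
\qquad
g(2^{-j}\gamma) = (v_j * \psi_0^*)(\gamma) = \langle u, \psi_{j,\gamma}\rangle ,
\\[4pt]
c\,2^{jQ/p}\,\|u * \psi_j^*\|_p
  \;\le\; \Big(\sum_{\gamma\in\Gamma} |\langle u, \psi_{j,\gamma}\rangle|^p\Big)^{1/p}
  \;\le\; C\,2^{jQ/p}\,\|u * \psi_j^*\|_p ,
\\[4pt]
\Big(\sum_{j\in\mathbb{Z}} 2^{jsq}\,\|u * \psi_j^*\|_p^q\Big)^{1/q}
  \;\asymp\;
\Big(\sum_{j\in\mathbb{Z}} \Big(\sum_{\gamma\in\Gamma}
   \big(2^{j(s-Q/p)}\,|\langle u, \psi_{j,\gamma}\rangle|\big)^p\Big)^{q/p}\Big)^{1/q}.
```

The first line uses the substitution $y = 2^{-j}z$ with $dy = 2^{-jQ}\,dz$ together with the homomorphism property of $\delta_{2^j}$; the constants $c$ and $C$ do not depend on $j$ because the same set $\Gamma$ is sampled at every scale.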
For the remainder of this section, we will therefore be concerned with the case $j = 0$, which will be treated using ideas similar to the ones in [12], relying mainly on oscillation estimates. Given any function $f$ on $G$ and a set $U \subset G$, we define the oscillation
$$\mathrm{osc}_U(f)(x) = \sup_{y \in U} |f(x) - f(xy^{-1})|.$$
We can then formulate the following result.
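Proposition 5.7. Let $X_0 \subset \mathcal{S}'(G)$ be a space of continuous functions. Suppose that there exists $K \in \mathcal{S}(G)$ such that, for all $f \in X_0$, $f = f * K$ holds pointwise. Define $X_0^p = X_0 \cap L^p(G)$, for $1 \le p \le \infty$. Let $\epsilon < 1$, and let $U$ be a neighborhood of the unit element fulfilling $\|\mathrm{osc}_U(K)\|_1 \le \epsilon$. Then, for all $U$-dense regular sampling sets $\Gamma$ the following implication holds:
$$f \in X_0^p \ \Longrightarrow\ f|_\Gamma \in \ell^p(\Gamma). \tag{38}$$
The restriction map $R_\Gamma : f \mapsto f|_\Gamma$ induces a topological embedding $(X_0^p, \|\cdot\|_p) \to \ell^p(\Gamma)$. More precisely, for $p < \infty$,
$$|W|^{-1/p}(1-\epsilon)\,\|f\|_p \le \|R_\Gamma f\|_{\ell^p} \le |W|^{-1/p}(1+\epsilon)\,\|f\|_p , \tag{39}$$
where $W \subset U$ denotes a $\Gamma$-tile, and
$$(1-\epsilon)\,\|f\|_\infty \le \|R_\Gamma f\|_{\ell^\infty} \le (1+\epsilon)\,\|f\|_\infty .$$

Proof. We introduce the auxiliary operator $T : \ell^p(\Gamma) \to L^p(G)$ defined by
$$Tc = \sum_{\gamma\in\Gamma} c_\gamma\,\chi_{\gamma W},$$
with $c = (c_\gamma)_{\gamma\in\Gamma}$ and $\chi_{\gamma W}$ the indicator function of $\gamma W$. Since the sets $\gamma W$ are pairwise disjoint, $T$ is a multiple of an isometry, $\|Tc\|_p = |W|^{1/p}\,\|c\|_p$. In particular, $T$ has a bounded inverse on its range, and $Tc \in L^p(G)$ implies $c \in \ell^p(\Gamma)$ for any sequence $c \in \mathbb{C}^\Gamma$.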
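The scaling of the auxiliary operator can be illustrated numerically in the simplest commutative model. The sketch below assumes $G = \mathbb{R}$, $\Gamma = b\mathbb{Z}$ restricted to finitely many points, and the tile $W = [-b/2, b/2)$; the function names `T`, `lp_norm_sequence` and `lp_norm_function` are hypothetical helpers introduced for this example. It compares a midpoint Riemann sum for the $L^p$ norm of the step function $Tc$ with $|W|^{1/p}\,\|c\|_p$.

```python
import math


def lp_norm_sequence(c, p):
    """l^p norm of a finite coefficient sequence."""
    return sum(abs(v) ** p for v in c) ** (1.0 / p)


def T(c, b):
    """Step function sum_k c[k] * indicator of the tile k*b + [-b/2, b/2)."""
    def f(x):
        k = math.floor(x / b + 0.5)  # index of the tile containing x
        return c[k] if 0 <= k < len(c) else 0.0
    return f


def lp_norm_function(f, lo, hi, p, n=20000):
    """Midpoint Riemann sum approximation of the L^p norm of f on [lo, hi)."""
    h = (hi - lo) / n
    total = sum(abs(f(lo + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)


b = 0.25                       # tile width, so |W| = b
p = 3.0
c = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.5, 0.25]

# The tiles cover [-b/2, len(c)*b - b/2); the grid is aligned with the tile edges.
lhs = lp_norm_function(T(c, b), -b / 2, len(c) * b - b / 2, p)
rhs = b ** (1.0 / p) * lp_norm_sequence(c, p)
```

Because the integration grid is aligned with the tiles, the two quantities `lhs` and `rhs` agree up to floating-point rounding, which is the one-dimensional instance of the statement that $T$ is a multiple of an isometry.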
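The equation $f = f * K$ implies the pointwise inequality
$$\mathrm{osc}_U(f) \le |f| * \mathrm{osc}_U(K)$$
(see [12, p. 185]). Now Young's inequality provides for $f \in X_0^p$:
$$\|\mathrm{osc}_U(f)\|_p \le \|f\|_p\,\|\mathrm{osc}_U(K)\|_1 \le \epsilon\,\|f\|_p .$$
Since the $\gamma W$'s are disjoint, we may then estimate, for all $f \in X_0^p$,
$$\|f - T R_\Gamma f\|_p^p = \sum_{\gamma\in\Gamma} \int_{\gamma W} |f(x) - f(\gamma)|^p\,dx \le \|\mathrm{osc}_U(f)\|_p^p \le \epsilon^p\,\|f\|_p^p .$$
In particular, $T R_\Gamma f \in L^p(G)$, whence $R_\Gamma f \in \ell^p(\Gamma)$. In addition, we obtain the upper bound of the sampling inequality for $f \in X_0^p$:
$$\|R_\Gamma f\|_{\ell^p} = |W|^{-1/p}\,\|T R_\Gamma f\|_p \le |W|^{-1/p}\big(\|f\|_p + \|f - T R_\Gamma f\|_p\big) \le |W|^{-1/p}(1+\epsilon)\,\|f\|_p .$$
The lower bound follows similarly by
$$\|f\|_p \le \|f - T R_\Gamma f\|_p + \|T R_\Gamma f\|_p \le \epsilon\,\|f\|_p + |W|^{1/p}\,\|R_\Gamma f\|_{\ell^p} .$$
Thus (39) and (38) are shown, for $1 \le p < \infty$. For $p = \infty$ we note that $\|T\|_\infty = \|T^{-1}\|_\infty = 1$. Furthermore,
$$\|f - T R_\Gamma f\|_\infty \le \|\mathrm{osc}_U(f)\|_\infty \le \epsilon\,\|f\|_\infty .$$
Now the remainder of the proof is easily adapted from the case $p < \infty$.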
It remains to check the conditions of the proposition for $X_0$.

Lemma 5.8. There exists a Schwartz function $K$ acting as a reproducing kernel for $X_0$, i.e., $f = f * K$ holds for all $f \in X_0^p$.

Proof. We pick a real-valued $C_c^\infty$-function $k$ on $\mathbb{R}^+$ that is identically 1 on the support of $\widehat{\psi}_0$, and let $K$ be the distribution kernel associated to $k(L)$. Then $\psi_0^* = \psi_0^* * K$, whence $f = f * K$ follows, for all $f \in X_0$.

Lemma 5.9. Let $K$ be a Schwartz function. For every $\epsilon > 0$ there exists a compact neighborhood $U$ of the unit element such that $\|\mathrm{osc}_U(K)\|_1 < \epsilon$.
Proof. First observe that by continuity, $\mathrm{osc}_U(K) \to 0$ pointwise, as $U$ runs through a neighborhood base at the identity element. Thus by dominated convergence it suffices to prove $\|\mathrm{osc}_V(K)\|_1 < \infty$, for some neighborhood $V$. Let $V = \{x \in G : |x| < 1\}$. A straightforward application of the mean value theorem [8, Here $C$ and $\beta$ are constants depending on $G$. The Sobolev estimate [21, (5.13)] for $p = 1$ yields that for all $z$ with $|z| < \beta$ where $Y$ runs through all possible $Y^I$ with $d(I) \le Q + 1$, including the identity operator corresponding to $I = (0, \ldots, 0)$. Furthermore, $W = \{x \in G : |x| < \beta\}$, and $C' > 0$ is a constant. Now integrating against Haar measure (which is two-sided invariant) yields and the last integral is finite because $K$ is a Schwartz function.

Now Lemma 5.6 is a direct consequence of Proposition 5.7 and Lemmas 5.8 and 5.9. Note that the tightness in Proposition 5.7 converges to 1, as $U$ runs through a neighborhood base at the identity. This property is then inherited by the norm estimates in Theorem 5.4.
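The behaviour described in Lemma 5.9 can be observed numerically in a one-dimensional stand-in. The sketch below assumes $G = \mathbb{R}$, the Gaussian $K(x) = e^{-x^2}$ in place of the reproducing kernel, $U = [-\delta, \delta]$, and the oscillation $\mathrm{osc}_U(K)(x) = \sup_{y\in U} |K(x) - K(x - y)|$; the supremum is approximated on a grid, and the function name `osc_l1_norm` is hypothetical.

```python
import math


def K(x):
    """Gaussian used as a stand-in for a Schwartz reproducing kernel."""
    return math.exp(-x * x)


def osc_l1_norm(delta, h=0.01, span=8.0):
    """Riemann-sum approximation of the L^1 norm of osc_U(K) for U = [-delta, delta].

    The supremum over y in U is replaced by a maximum over grid shifts k*h,
    and the integral over the real line by a sum over [-span, span].
    """
    n_shift = round(delta / h)
    shifts = [k * h for k in range(-n_shift, n_shift + 1)]
    n_pts = round(2 * span / h)
    total = 0.0
    for i in range(n_pts + 1):
        x = -span + i * h
        kx = K(x)
        total += max(abs(kx - K(x - y)) for y in shifts)
    return total * h


# Shrinking neighbourhoods U = [-delta, delta] of the identity 0.
norms = [osc_l1_norm(d) for d in (0.4, 0.2, 0.1)]
```

The computed norms decrease as $\delta$ shrinks, roughly in proportion to $\delta$, in line with the first-order estimate $\mathrm{osc}_U(K)(x) \approx \delta\,|K'(x)|$ that the mean value theorem provides in this model.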

Banach Wavelet Frames for Besov Spaces
In Hilbert spaces a norm equivalence such as (35) would suffice to imply that the wavelet system is a frame, thus entailing a bounded reconstruction from the discrete coefficients. For Banach spaces one needs to use the extended definition of frames [17], i.e., to show the invertibility of the associated frame operator. In this section we will establish these statements for wavelet systems in Besov spaces. We retain the assumption that the wavelet $\psi$ was chosen according to Lemma 3.7.
For this purpose, we first prove that any linear combination of wavelet systems with coefficients in $\dot b^{s}_{p,q}$ converges unconditionally in $\dot B^{s}_{p,q}$, compare [10, Theorem 3.1]. We then show that for all sufficiently dense choices of the sampling set $\Gamma$, the wavelet system $\{2^{-jQ}\psi_{j,\gamma}\}$ constitutes a Banach frame for $\dot B^{s}_{p,q}$. Recall that the sampled convolution products studied in the previous sections can be read as scalar products $f * \psi_j^*(2^{-j}\gamma) = \langle f, \psi_{j,\gamma}\rangle$, where $\psi_{j,\gamma}(x) = 2^{jQ}\psi(\gamma^{-1}\cdot 2^{j}x)$ denotes the wavelet of scale $2^{-j}$ at position $2^{-j}\gamma$. In the following, the wavelet system is used for synthesis purposes, i.e., we consider linear combinations of discrete wavelets. The next result can be viewed in parallel to synthesis results, e.g., in [28]. It establishes synthesis for a large class of systems. Note in particular that the functions $g_{j,\gamma}$ need not be obtained by dilation and shifts from a single function $g$.
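(a) Suppose that we are given tempered distributions $(g_{j,\gamma})_{j\in\mathbb{Z},\gamma\in\Gamma}$ satisfying the following decay conditions: For all $N, \theta \in \mathbb{N}$ there exist constants $c_1, c_2$ such that for all $j, l \in \mathbb{Z}$, $\gamma \in \Gamma$, $x \in G$: Then for all $\{c_{j,\gamma}\}_{j\in\mathbb{Z},\gamma\in\Gamma} \in \dot b^{s}_{p,q}(\Gamma)$, the sum $\sum_{j,\gamma} c_{j,\gamma}\,g_{j,\gamma}$ converges unconditionally in the Besov norm, with
$$\Big\|\sum_{j,\gamma} c_{j,\gamma}\,g_{j,\gamma}\Big\|_{\dot B^{s}_{p,q}} \le c\,\big\|\{c_{j,\gamma}\}_{j\in\mathbb{Z},\gamma\in\Gamma}\big\|_{\dot b^{s}_{p,q}} \tag{43}$$
for some constant $c$ independent of $\{c_{j,\gamma}\}_{j\in\mathbb{Z},\gamma\in\Gamma}$. In other words, the synthesis operator $\dot b^{s}_{p,q}(\Gamma) \to \dot B^{s}_{p,q}$ associated to the system $(g_{j,\gamma})_{j,\gamma}$ is bounded.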
In order to motivate the following somewhat technical lemmas, let us give a short sketch of the proof strategy for the theorem. It suffices to show (43) for all finitely supported sequences; the rest follows by density arguments, using that $\dot B^{s}_{p,q}$ is a Banach space. Hence, given a finitely supported coefficient sequence $\{c_{j,\gamma}\}$ and $f = \sum_{j,\gamma} c_{j,\gamma}\,g_{j,\gamma}$, we need estimates for the $L^p$-norms of $f * \psi_l^*$, $l \in \mathbb{Z}$. These estimates are obtained by first looking at the summation over $\gamma$, with $j$ fixed, and then summing over $j$. In both steps, we use the decay condition (41).
The following lemma shows that (41) is fulfilled for $g_{j,\gamma} = \psi_{j,\gamma}$, and thus allows us to conclude part (b) of Theorem 6.1.
Then the operator is bounded on ℓ p (Γ), with operator norm ≤ M.
Lemma 6.4. Let η, j ∈ Z, with η ≤ j and N ≥ Q + 1. Let Γ ⊂ G be separated. Then for any x ∈ G one has where the constant C depends only on N and Γ.
Proof. By assumption there exists an open set $W$ such that $\gamma W \cap \gamma' W = \emptyset$, for $\gamma, \gamma' \in \Gamma$ with $\gamma \ne \gamma'$. In addition, we may assume $W$ relatively compact. Then For $y \in 2^{-j}(\gamma W)$, the triangle inequality of the quasi-norm yields with the last inequality due to $\eta \le j$. Accordingly, where the inequality used disjointness of the $\gamma W$. For $N \ge Q + 1$, the integral is finite.
The next lemma is an analog of Lemma 3.4 of [10], which we will need for the proof of Theorem 6.1.
Lemma 6.5. Let $1 \le p \le \infty$ and $j, \eta \in \mathbb{Z}$ be fixed with $\eta \le j$. Suppose that $\Gamma \subset G$ is a regular sampling set. For any $\gamma \in \Gamma$, let $f_{j,\gamma}$ be a function on $G$. Assume that the $f_{j,\gamma}$ fulfill the decay estimate with a constant $C_1 > 0$. Define $F = \sum_{\gamma\in\Gamma} c_{j,\gamma}\,f_{j,\gamma}$, where $\{c_{j,\gamma}\}_\gamma \in \ell^p(\Gamma)$. Then the series converges unconditionally in $L^p$, with a constant $C_2$ independent of $j$, $\gamma$, $\eta$, and of the coefficient sequence.
Thus, by the case $j = 0$, $2^{-jQ/p}\,\|v_j * \psi_0^* * \psi_0 - g \circ \delta_{2^{-j}}\|_p = \epsilon\,2^{-jQ/p}\,\|v_j * \psi_0^* * \psi_0\|_p = \epsilon\,\|u * \psi_j^*\|_p$, as desired.

Now, invertibility of the frame operator is easily established. In fact, we can even show the existence of a dual frame and an atomic decomposition for our homogeneous Besov spaces. Note however that the notation of the following theorem is somewhat deceptive: The dual wavelet frame might depend on the space $\dot B^{s}_{p,q}$, whereas the well-known result for wavelet bases in the Euclidean setting allows one to take $\tilde\psi_{j,k} = \psi_{j,k}$, regardless of the Besov space under consideration.

Theorem 6.7 (Atomic decomposition). Let $1 \le p, q < \infty$. There exists a neighborhood $U$ of the identity such that for all $U$-dense regular sampling sets $\Gamma \subset G$ the frame operator $S_{\psi,\Gamma}$ is an automorphism of $\dot B^{s}_{p,q}(G)$. In this case, there exists a dual wavelet family $\{\tilde\psi_{j,\gamma}\}_{j,\gamma} \subset \dot B^{s*}_{p,q}$, such that for all $f \in$

Proof. Fix $0 < \epsilon < 1$, and choose the neighborhood $U$ according to the previous lemma, with $\epsilon$ replaced by
$$\epsilon_0 = \frac{\epsilon}{(2^{-sq} + 1 + 2^{sq})^{1/q}\,3^{(q-1)/q}\,\|\psi_0\|_1}.$$
Let $\Gamma$ be a $U$-dense regular sampling set, and let $W$ denote a $\Gamma$-tile. Let $f \in D$, where $D \subset \dot B^{s}_{p,q}(G)$ is the dense subspace of functions for which $f = \sum_{j\in\mathbb{Z}} f * \psi_j^* * \psi_j$

Remark 6.8. We wish to stress that an appropriate choice of $\Gamma$ provides a wavelet frame in $\dot B^{s}_{p,q}$, simultaneously valid for all $1 \le p, q < \infty$ and all $s \in \mathbb{R}$. As the discussion in Section 5 shows, the tightness of the oscillation estimates converges to one with increasing density of the quasi-lattices. As a consequence, the tightness of the wavelet frame in $\dot B^{s}_{p,q}$ converges to one also, at least when measured with respect to the Besov norm from 3.9, applied to the same window $\psi$. However, the tightness will depend on $p$, $q$ and $s$.

Remark 6.9. We expect to remove the restriction on $p$ and $q$ in our future work and prove the existence of (quasi-)Banach frames for all homogeneous Besov spaces $\dot B^{s}_{p,q}$ with $0 < p, q \le \infty$ and $s \in \mathbb{R}$.
Remark 6.10. Our treatment of discretization problems via oscillation estimates is heavily influenced by the work of Feichtinger and Gröchenig on atomic decomposition, in particular the papers [17,7] on coorbit spaces. A direct application of these results to our problem is difficult, since the representations underlying our wavelet transforms are not irreducible if the group $G$ is noncommutative, whereas irreducibility is an underlying assumption in [17,7]. However, the recent extensions of coorbit theory, most notably [5], provide a unified approach to our results (see [4]).