Detailed Large Deviation Analysis of a Droplet Model Having a Poisson Equilibrium Distribution

One of the main contributions of this paper is to illustrate how large deviation theory can be used to determine the equilibrium distribution of a basic droplet model that underlies a number of important models in material science and statistical mechanics. The model is simply defined. $K$ distinguishable particles are placed at random onto the $N$ sites of a lattice, where the ratio $K/N$, the average number of particles per site, equals a constant $c \in (1,\infty)$. We focus on configurations for which each site is occupied by at least one particle. The main result is the large deviation principle (LDP), in the limit where $K \rightarrow \infty$ and $N \rightarrow \infty$ with $K/N = c$, for a sequence of random, number-density measures, which are the empirical measures of dependent random variables that count the droplet sizes. The rate function in the LDP is the relative entropy $R(\theta | \rho^*)$, where $\theta$ is a possible asymptotic configuration of the number-density measures and $\rho^*$ is a Poisson distribution restricted to the set of positive integers. This LDP reveals that $\rho^*$ is the equilibrium distribution of the number-density measures, which in turn implies that $\rho^*$ is the equilibrium distribution of the random variables that count the droplet sizes. We derive the LDP via a local large deviation estimate of the probability that the number-density measures equal $\theta$ for any probability measure $\theta$ in the range of these random measures.


Introduction
This paper is an expanded version of the companion paper [12]; it contains the following additional material: full details of routine proofs omitted from [12], four appendices, and extra background information.
These two papers are motivated by a natural and simply stated question. Given $b \in \mathbb{N}$ and $c > b$, $K$ distinguishable particles are placed, each with equal probability $1/N$, onto the $N$ sites of a lattice. Under the assumption that $K/N = c$ and that each site is occupied by a minimum of $b$ particles, what is the equilibrium distribution, as $N \to \infty$, of the number of particles per site? We prove in Corollary 2.3 that this equilibrium distribution is a Poisson distribution $\rho_{b,\alpha_b(c)}$ restricted to the set of positive integers $n$ satisfying $n \ge b$; the parameter $\alpha_b(c)$ is chosen so that the mean of $\rho_{b,\alpha_b(c)}$ equals $c$. As we explain at the end of the introduction, this equilibrium distribution has important applications to technologies using sprays and powders.
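Before any proofs, the question can be probed concretely by exact enumeration at small $N$. The following sketch is ours, not the paper's: all parameter choices are illustrative. With $b = 1$ and $c = 2$ we enumerate every configuration, condition on all sites being occupied, and compare the conditional law of the occupancy of site 1 with the restricted Poisson distribution of mean $c$.

```python
from itertools import product
from math import exp, factorial

# Exact small-N illustration (our sketch; the paper works in the limit N -> infinity).
N, K, b = 4, 8, 1                      # c = K / N = 2
c = K / N

counts, total = {}, 0
for omega in product(range(N), repeat=K):      # omega_i = site of particle i
    occ = [omega.count(site) for site in range(N)]
    if min(occ) >= b:                          # keep configurations with every site occupied
        total += 1
        counts[occ[0]] = counts.get(occ[0], 0) + 1
cond = {j: v / total for j, v in counts.items()}   # conditional law of K_1

# By symmetry (the K_l are identically distributed and sum to K),
# the conditional mean of K_1 is exactly c.
mean_K1 = sum(j * p for j, p in cond.items())

# Restricted Poisson rho_{1,alpha} with mean c: solve alpha / (1 - e^{-alpha}) = c
# by bisection; alpha plays the role of the paper's alpha_b(c) for b = 1.
lo, hi = 1e-9, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mid / (1 - exp(-mid)) < c else (lo, mid)
alpha = (lo + hi) / 2
Z = exp(alpha) - 1.0
rho = {j: alpha ** j / factorial(j) / Z for j in range(1, K + 1)}

tv = 0.5 * sum(abs(cond.get(j, 0.0) - rho[j]) for j in range(1, K + 1))
print(mean_K1, round(tv, 3))
```

Even at $N = 4$ the conditional law is already close to the restricted Poisson distribution (total variation distance on the order of 0.1), and the agreement sharpens as $N$ grows.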
We answer this question about the equilibrium distribution by first proving a large deviation principle (LDP) for a sequence of random, number-density measures, which are the empirical measures of a sequence of dependent random variables that count the droplet sizes. This LDP is stated in Theorem 2.1. The space for which we prove the LDP is a natural choice, being the smallest convex subset of probability measures containing the range of the number-density measures. Our proof of the LDP avoids general results in the theory of large deviations, many of which do not apply because the space for which we prove the LDP is not a complete, separable metric space. Our proof is completely self-contained and starts from first principles, using techniques that are familiar in statistical mechanics. For example, the proof of the local large deviation estimate in Theorem 3.1, a key step in the proof of the LDP for the number-density measures, is based on combinatorics, Stirling's formula, and Laplace asymptotics. Our self-contained proof of the LDP perfectly matches the simplicity and elegance of our main result on the equilibrium distribution stated in the preceding paragraph.
In order to define the droplet model and to formulate the LDP for the number-density measures, a standard probabilistic model is introduced. We begin as in the first paragraph. Given $b \in \mathbb{N}$ and $c > b$, $K$ distinguishable particles are placed, each with equal probability $1/N$, onto the $N$ sites of the lattice $\Lambda_N = \{1, 2, \ldots, N\}$. In section 2 we also consider the case $b = 0$. The large deviation limit, or in statistical mechanical terminology the thermodynamic limit, is defined by taking $K \to \infty$ and $N \to \infty$ with $K/N$ equal to $c$. The ratio $K/N$ equals the average number of particles per site or the average size of a droplet. The configuration space for the droplet model is the set $\Omega_N = \Lambda_N^K$ consisting of all $\omega = (\omega_1, \omega_2, \ldots, \omega_K)$, where $\omega_i$ denotes the site in $\Lambda_N$ occupied by the $i$'th particle. The cardinality of $\Omega_N$ equals $N^K$. Denote by $P_N$ the uniform probability measure that assigns equal probability $1/N^K$ to each of the $N^K$ configurations $\omega \in \Omega_N$. For subsets $A$ of $\Omega_N$, $P_N(A) = \mathrm{card}(A)/N^K$, where $\mathrm{card}$ denotes cardinality.
The asymptotic analysis of the droplet model involves the following two random variables, which are functions of the configuration $\omega \in \Omega_N$: for $\ell \in \Lambda_N$, $K_\ell(\omega)$ denotes the number of particles occupying the site $\ell$ in the configuration $\omega$; for $j \in \mathbb{N} \cup \{0\}$, $N_j(\omega)$ denotes the number of sites $\ell \in \Lambda_N$ for which $K_\ell(\omega) = j$.
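These two random variables can be computed mechanically from a configuration; here is a minimal sketch (our code, with an illustrative configuration of our choosing):

```python
from collections import Counter

# K_l(omega): occupancy of site l; N_j(omega): number of sites with occupancy j.
N, K = 5, 10                                  # c = K / N = 2
omega = (1, 1, 2, 3, 3, 3, 4, 4, 5, 5)        # omega_i = site occupied by particle i

K_l = {site: omega.count(site) for site in range(1, N + 1)}
N_j = Counter(K_l.values())

# conservation laws (1.1): sum_j N_j = N and sum_j j * N_j = K
assert sum(N_j.values()) == N
assert sum(j * n for j, n in N_j.items()) == K
print(K_l, dict(N_j))
```

For this configuration the occupancies are $(2, 1, 3, 2, 2)$, so $N_1 = 1$, $N_2 = 3$, $N_3 = 1$, and both conservation laws hold.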
We focus on the subset of $\Omega_N$ consisting of all configurations $\omega$ for which every site of $\Lambda_N$ is occupied by at least $b$ particles. Because of this restriction $N_j(\omega)$ is indexed by $j \in \mathbb{N}_b = \{n \in \mathbb{Z} : n \ge b\}$. It is useful to think of each particle as having one unit of mass and of the set of particles at each site $\ell$ as defining a droplet. With this interpretation, for each configuration $\omega$, $K_\ell(\omega)$ denotes the mass or size of the droplet at site $\ell$. The $j$'th droplet class has $N_j(\omega)$ droplets and mass $jN_j(\omega)$. Because the number of sites in $\Lambda_N$ equals $N$ and the sum of the masses of all the droplet classes equals $K$, the following conservation laws hold for such configurations:
$$\sum_{j \in \mathbb{N}_b} N_j(\omega) = N \quad \text{and} \quad \sum_{j \in \mathbb{N}_b} jN_j(\omega) = K. \tag{1.1}$$
In addition, since the total number of particles is $K$, it follows that $\sum_{\ell \in \Lambda_N} K_\ell = K$. These equality constraints show that the random variables $N_j$ and the random variables $K_\ell$ are not independent.
In order to carry out the asymptotic analysis of the droplet model, we introduce a quantity $m = m(N)$ that converges to $\infty$ sufficiently slowly with respect to $N$; specifically, we require that $m(N)^2/N \to 0$ as $N \to \infty$. In terms of $b$ and $m$ we define the subset $\Omega_{N,b,m}$ of $\Omega_N$ consisting of all configurations $\omega$ for which every site of $\Lambda_N$ is occupied by at least $b$ particles and at most $m$ of the quantities $N_j(\omega)$ are positive. This second condition is a key technical device that allows us to control the errors in several estimates.
The random quantities in the droplet model for which we formulate an LDP are the number-density measures $\Theta_{N,b}$. For $\omega \in \Omega_{N,b,m}$ these random probability measures assign to $j \in \mathbb{N}_b$ the probability $N_j(\omega)/N$, which is the number density of the $j$'th droplet class. Thus for any subset $A$ of $\mathbb{N}_b$, $\Theta_{N,b}(\omega)(A) = \frac{1}{N}\sum_{j \in A} N_j(\omega)$. Because of the two conservation laws in (1.1) and because $K/N = c$, for $\omega \in \Omega_{N,b,m}$, $\Theta_{N,b}(\omega)$ is a probability measure on $\mathbb{N}_b = \{n \in \mathbb{Z} : n \ge b\}$ having mean $c$. The probability measure $P_{N,b,m}$ defining the droplet model is obtained by restricting $P_N$ to $\Omega_{N,b,m}$; thus for any subset $A$ of $\Omega_{N,b,m}$,
$$P_{N,b,m}(A) = P_N(A \mid \Omega_{N,b,m}) = \frac{1}{\mathrm{card}(\Omega_{N,b,m})} \cdot \mathrm{card}(A).$$
In the language of statistical mechanics $P_{N,b,m}$ defines a microcanonical ensemble that incorporates the conservation laws for number and mass expressed in (1.1). A natural question is to determine two equilibrium distributions: the equilibrium distribution $\rho^*$ of the number-density measures and the equilibrium distribution $\rho^{**} = \sum_{j \in \mathbb{N}_b} \rho^{**}_j \delta_j$ of the droplet-size random variables $K_\ell$. These distributions are defined by the following two limits: for any $\varepsilon > 0$, any $\ell \in \Lambda_N$, and all $j \in \mathbb{N}_b$
$$\lim_{N \to \infty} P_{N,b,m}(\Theta_{N,b} \in B(\rho^*, \varepsilon)) = 1 \quad \text{and} \quad \lim_{N \to \infty} P_{N,b,m}(K_\ell = j) = \rho^{**}_j,$$
where $B(\rho^*, \varepsilon)$ denotes the open ball with center $\rho^*$ and radius $\varepsilon$ defined with respect to an appropriate metric on $\mathcal{P}_{\mathbb{N}_b,c}$. We make the following observations concerning these equilibrium distributions.
1. We first determine the equilibrium distribution $\rho^*$ of $\Theta_{N,b}$ and then prove that $\rho^*$ is also the equilibrium distribution of $K_\ell$.
2. As in many models in statistical mechanics, an efficient way to determine the equilibrium distribution $\rho^*$ of $\Theta_{N,b}$ is to prove an LDP for $\Theta_{N,b}$, which we carry out in Theorem 2.1.
The content of Theorem 2.1 is the following: as $N \to \infty$ the sequence of number-density measures $\Theta_{N,b}$ satisfies the LDP on $\mathcal{P}_{\mathbb{N}_b,c}$ with respect to the measures $P_{N,b,m}$. The rate function is the relative entropy $R(\theta|\rho_{b,\alpha})$ of $\theta \in \mathcal{P}_{\mathbb{N}_b,c}$ with respect to the Poisson distribution $\rho_{b,\alpha}$ on $\mathbb{N}_b$ having components
$$\rho_{b,\alpha;j} = \frac{1}{Z_b(\alpha)} \cdot \frac{\alpha^j}{j!} \quad \text{for } j \in \mathbb{N}_b.$$
In this formula $Z_b(\alpha)$ is the normalization that makes $\rho_{b,\alpha}$ a probability measure, and $\alpha$ equals the unique value $\alpha_b(c)$ for which $\rho_{b,\alpha_b(c)}$ has mean $c$ [Thm. C.1(a)]. Using the fact that $R(\theta|\rho_{b,\alpha_b(c)})$ equals 0 at the unique measure $\theta = \rho_{b,\alpha_b(c)}$, we apply the LDP for $\Theta_{N,b}$ to conclude in Theorem 2.2 that $\rho_{b,\alpha_b(c)}$ is the equilibrium distribution of $\Theta_{N,b}$. Corollary 2.3 then implies that $\rho_{b,\alpha_b(c)}$ is also the equilibrium distribution of $K_\ell$. The space $\mathcal{P}_{\mathbb{N}_b,c}$ is the most natural space on which to formulate the LDP for $\Theta_{N,b}$ in Theorem 2.1. Not only is $\mathcal{P}_{\mathbb{N}_b,c}$ the smallest convex set of probability measures containing the range of $\Theta_{N,b}$ for all $N \in \mathbb{N}$, but also the union over $N \in \mathbb{N}$ of the range of $\Theta_{N,b}$ is dense in $\mathcal{P}_{\mathbb{N}_b,c}$. As we explain in part (a) of Theorem 2.4, $\mathcal{P}_{\mathbb{N}_b,c}$ is not a complete, separable metric space, a situation that prevents us from applying any of the general results in the theory of large deviations that require the setting of a complete, separable metric space. In our opinion the fact that we avoid using such general results makes our self-contained proof of the LDP even more attractive.
The droplet model is defined in section 2. Our proof of the LDP for Θ N,b consists of the following three steps, the first of which is the topic of section 3 and the second and third of which are the topics of section 4.

1. Step 1 is to derive the local large deviation estimate in part (b) of Theorem 3.1. This local estimate, one of the centerpieces of the paper, gives information not available in the LDP for $\Theta_{N,b}$, which involves global estimates. It states that as $N \to \infty$, for any probability measure $\theta$ in the range of the number-density measure $\Theta_{N,b}$
$$\frac{1}{N} \log P_{N,b,m}(\Theta_{N,b} = \theta) = -R(\theta|\rho_{b,\alpha_b(c)}) + o(1), \tag{1.2}$$
where $o(1)$ is an error term converging to 0 uniformly for all measures $\theta$ in the range of $\Theta_{N,b}$. Showing that the parameter of the Poisson distribution $\rho_{b,\alpha_b(c)}$ in the local large deviation estimate equals $\alpha_b(c)$ is one of the crucial elements of the proof. The proof of the local large deviation estimate involves combinatorics, Stirling's formula, and Laplace asymptotics.
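The combinatorial ingredient in estimates of this kind can be made concrete. The brute-force check below is our sketch, using the classical occupancy count that we believe underlies the counting step: the number of configurations whose droplet profile is $(N_j) = (\nu_j)$ equals $\frac{N!}{\prod_j \nu_j!} \cdot \frac{K!}{\prod_j (j!)^{\nu_j}}$.

```python
from collections import Counter
from itertools import product
from math import factorial

# Compare a brute-force count of configurations for each achievable occupancy
# profile against the closed-form occupancy count.
N, K = 3, 6
profiles = Counter()
for omega in product(range(N), repeat=K):
    profiles[tuple(sorted(omega.count(site) for site in range(N)))] += 1

def occupancy_count(occ):
    nu = Counter(occ)                 # j -> N_j = number of sites with occupancy j
    ways_sites = factorial(N)         # N! / prod_j N_j! ways to assign occupancies to sites
    for n in nu.values():
        ways_sites //= factorial(n)
    ways_particles = factorial(K)     # K! / prod_j (j!)^{N_j} placements of labeled particles
    for j, n in nu.items():
        ways_particles //= factorial(j) ** n
    return ways_sites * ways_particles

assert all(profiles[occ] == occupancy_count(occ) for occ in profiles)
print(len(profiles), sum(profiles.values()))   # prints: 7 729
```

Taking $\frac{1}{N}\log$ of such counts and applying Stirling's formula is the standard route from exact combinatorics to entropy-type expressions like the right side of (1.2).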

2. Step 2 is to lift this local large deviation estimate to the large deviation limit for $\Theta_{N,b}$ lying in open balls and certain other subsets of $\mathcal{P}_{\mathbb{N}_b,c}$. This is done in Theorem 4.1 as a consequence of the general formulation given in Theorem 4.2 and the approximation procedure proved in appendix B.

3. Step 3 is to lift the large deviation limit for open balls and certain other subsets to the LDP for $\Theta_{N,b}$ stated in Theorem 2.1, thus proving this LDP. This is done by applying the general formulation given in Theorem 4.3.
The paper has four appendices. In appendix A we derive properties of the relative entropy needed in a number of our results. Appendix B is devoted to the proof of the approximation procedure to which we just referred in item 2 above. In appendix C we prove the existence of the quantity α b (c) that defines the Poisson distribution ρ b,α b (c) and derive a number of properties of this quantity. Our proof of the existence of α b (c) for general b is subtle. This proof should be contrasted with the straightforward proof of the existence of α b (c) for b = 1, which is given in Theorem C.2. We now explain the contents of appendix D. In order to control several errors in our self-contained proof of the LDP, we must introduce the restriction involving the quantity m = m(N) that, as mentioned earlier, requires no more than m of the quantities N j to be positive. This restriction is explained in detail in section 2; it is incorporated in the definition (2.1) of the set of configurations Ω N,b,m and the definition (2.3) of the microcanonical ensemble P N,b,m . In appendix D we present evidence supporting the conjecture that this restriction can be eliminated. Eliminating this restriction would enable us to present our results in a more natural form.
The paper [13] explores how our work on the droplet model was inspired by the work of Ludwig Boltzmann on a simple model of a discrete ideal gas, for which the Maxwell-Boltzmann distribution is the equilibrium distribution. The main connection is via the local large deviation estimate in part (b) of Theorem 3.1. The form of the Maxwell-Boltzmann distribution can be derived using Sanov's theorem, which proves the LDP for the empirical measures of i.i.d. random variables [13, §4]. As we show just before Corollary 2.3, $\Theta_{N,b}$ is the empirical measure of the random variables $K_\ell$. However, Sanov's theorem for empirical measures of i.i.d. random variables cannot be applied because the $K_\ell$ are dependent and, since their distributions depend on $N$, they form a triangular array. In section 7 of [13] we explore how Sanov's theorem, although not applicable as stated, can be used to give a heuristic motivation of the LDP for $\Theta_{N,b}$.
The main application of the results in this paper is to technologies using sprays and powders, which are ubiquitous in many fields, including agriculture, the chemical and pharmaceutical industries, consumer products, electronics, manufacturing, material science, medicine, mining, paper making, the steel industry, and waste treatment. In this paper we focus on sprays; our theory also applies to powders with only changes in terminology. The behavior of sprays can be complex, depending on various parameters including evaporation, temperature, and viscosity. Our goal here is to consider the simplest model, in which the only assumption made is on the average size of droplets in the spray. In many situations it is important to have good control over the sizes of the droplets, a requirement that can be translated into properties of probability distributions. The size distributions are important because they determine reliability and safety in each particular application.
Interestingly, there does not seem to be a rigorous theory that predicts the equilibrium distribution of droplet sizes, analogous to the Maxwell-Boltzmann distribution of energy levels in a discrete ideal gas [17,20]. Our goal in the present paper is to provide such a theory. We do so by focusing on one aspect of the problem related to the relative entropy, an approach that characterizes the equilibrium distribution of droplet sizes as being a Poisson distribution restricted to N b . We expect that this distribution will dominate experimental observations. A full understanding of droplet behavior under dynamic conditions requires treating many other aspects and is beyond the scope of this paper. A comparison of our results with experimental data will appear elsewhere. In addition we plan to apply the ideas in this paper to understand the entropy of dislocation networks.
Because of the length of this paper and its many technicalities, we would like to help the reader by summarizing the main results and explaining how one proceeds from the local large deviation estimate stated in (1.2) and proved in part (b) of Theorem 3.1 to the LDP for the number-density measures Θ N,b stated in Theorem 2.1. We also summarize the theorems proved in appendices A, B, C, and D.
• Theorem 2.1. This theorem states that the sequence of $P_{N,b,m}$-distributions of the number-density measures $\Theta_{N,b}$ satisfies the LDP on $\mathcal{P}_{\mathbb{N}_b,c}$ with rate function $R(\theta|\rho_{b,\alpha_b(c)})$.
• Theorem 2.2. In this theorem we identify the Poisson distribution ρ b,α b (c) as the equilibrium distribution of Θ N,b with respect to P N,b,m . It is a consequence of Theorem 2.1.
• Corollary 2.3. The Poisson distribution ρ b,α b (c) is shown in this corollary to be also the equilibrium distribution of the droplet-size random variables K ℓ with respect to P N,b,m . It is a consequence of Theorem 2.2.
• Theorem 2.4. This theorem proves a number of properties of two spaces of probability measures that arise in the large deviation analysis of Θ N,b .
• Theorem 3.1. In part (a) of this theorem we show that there exists a unique value α = α b (c) ∈ (0, ∞) for which the measure ρ b,α b (c) has mean c; the components of ρ b,α b (c) are defined in (2.7). In part (b) we prove the local large deviation estimate (1.2).
• Theorem A.1. In this theorem we collect a number of properties of the relative entropy used throughout the paper.
• Theorem B.1. This result is an approximation theorem that allows us to approximate an arbitrary probability measure $\theta \in \mathcal{P}_{\mathbb{N}_b,c}$ by a sequence of probability measures $\theta^{(N)}$ in the range of $\Theta_{N,b}$ having the following property: the sequence of relative entropies $R(\theta^{(N)}|\rho_{b,\alpha_b(c)})$ converges to $R(\theta|\rho_{b,\alpha_b(c)})$ as $N \to \infty$. This approximation theorem is applied in two key places. First, it allows us to prove the asymptotic estimate in Lemma 3.3, which is a basic ingredient in the proof of the local large deviation estimate in part (b) of Theorem 3.1. Second, it allows us to lift this local large deviation estimate to the large deviation limit for open balls and certain other subsets as formulated in Theorem 4.1.
• Theorem C.1. This theorem studies a number of properties of the quantity $\alpha_b(c)$ that defines the Poisson-type equilibrium distribution $\rho_{b,\alpha_b(c)}$.
• Theorem C.2. This theorem studies a number of properties of the quantity $\alpha_b(c)$ for $b = 1$.

Acknowledgments. We thank Luc Rey-Bellet for valuable conversations concerning large deviation theory, and Michael Sullivan for his generous help with a number of topological issues arising in this paper. We are also grateful to Jonathan Machta for suggesting the generalization, explained in section 2, from a minimum of 1 particle at each site to a minimum of $b$ particles at each site, where $b$ is any positive integer, and for helping us with the proof of part (a) of Theorem C.1.

Definition of Droplet Model and Main Theorem
After defining the droplet model, we state the main theorem in the paper, Theorem 2.1. The content of this theorem is the LDP for the sequence of random, number-density measures, which are the empirical measures of a sequence of dependent random variables that count the droplet sizes in the model. As we show in Theorem 2.2 and in Corollary 2.3, the LDP enables us to identify a Poisson distribution as the equilibrium distribution both of the number-density measures and of the droplet-size random variables. Finally, in Theorem 2.4 we prove a number of properties of two spaces of probability measures in terms of which the LDP for the numberdensity measures is formulated. We start by fixing parameters b ∈ N ∪ {0} and c ∈ (b, ∞). The droplet model is defined by a probability measure P N,b parametrized by N ∈ N and the nonnegative integer b. The measure depends on two other positive integers, K and m, where 2 ≤ m ≤ N < K. Both K and m are functions of N in the large deviation limit N → ∞. In this limit -which is the same as the thermodynamic limit in statistical mechanics -we take K → ∞ and N → ∞, where K/N, the average number of particles per site, stays equal to c. Thus K = Nc. In addition, we take m → ∞ sufficiently slowly by choosing m to be a function m(N) satisfying m(N) → ∞ and m(N) 2 /N → 0 as N → ∞; e.g., m(N) = N δ for some δ ∈ (0, 1/2). Throughout this paper we fix such a function m(N). The parameter b and the function m = m(N) first appear in the definition of the set of configurations Ω N,b,m in (2.1), where these quantities will be explained.
Because K and N are integers, c must be a rational number. This in turn imposes a restriction on the values of N and K. If c is a positive integer, then N → ∞ along the positive integers and K → ∞ along the subsequence K = cN. If c = x/y, where x and y are positive integers with y ≥ 2 and x and y relatively prime, then N → ∞ along the subsequence N = yn for n ∈ N and K → ∞ along the subsequence K = cN = xn. Throughout this paper, when we write N ∈ N or N → ∞, it is understood that N and K satisfy the restrictions discussed here. In the droplet model K distinguishable particles are placed, each with equal probability 1/N, onto the sites of the lattice Λ N = {1, 2, . . . , N}. This simple description corresponds to a simple probabilistic model. The configuration space is the set Ω N = Λ K N consisting of all sequences ω = (ω 1 , ω 2 , . . . , ω K ), where ω i ∈ Λ N denotes the site in Λ N occupied by the i'th particle. Let ρ (N ) be the measure on Λ N that assigns equal probability 1/N to each site in Λ N , and let P N = (ρ (N ) ) K be the product measure on Ω N with equal one-dimensional marginals ρ (N ) . Thus P N is the uniform probability measure that assigns equal probability 1/N K to each of the N K configurations ω ∈ Ω N ; for subsets A of Ω N we have P N (A) = card(A)/N K , where card denotes cardinality.
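The subsequence restriction is easy to make concrete; here is a small sketch (our code, with an illustrative value of $c$):

```python
from fractions import Fraction

# For c = x/y in lowest terms, N runs through the multiples of y and
# K = cN = xn stays an integer, so K/N = c along the whole subsequence.
c = Fraction(5, 2)                    # x = 5, y = 2
pairs = [(c.denominator * n, c.numerator * n) for n in range(1, 6)]
assert all(Fraction(K, N) == c for N, K in pairs)
print(pairs)      # [(2, 5), (4, 10), (6, 15), (8, 20), (10, 25)]
```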
The asymptotic analysis of the droplet model involves two random variables that we now introduce. Our goal is to prove a large deviation principle (LDP) for a sequence of random probability measures defined in terms of these random variables. The LDP is stated in Theorem 2.1.
For $\ell \in \Lambda_N$, $K_\ell(\omega)$ denotes the number of particles occupying the site $\ell$ in the configuration $\omega$; for $j \in \mathbb{N} \cup \{0\}$, $N_j(\omega)$ denotes the number of sites $\ell \in \Lambda_N$ for which $K_\ell(\omega) = j$. The dependence of $K_\ell(\omega)$ and $N_j(\omega)$ on $N$ is not indicated in the notation. Because the distributions of both random variables depend on $N$, both $K_\ell$ and $N_j$ form triangular arrays. We now specify the role played by the nonnegative integer $b$, first focusing on the case where $b$ is a positive integer. The case where $b = 0$ is discussed later. For $\omega \in \Omega_N$, in general there exist sites $\ell \in \Lambda_N$ for which $K_\ell(\omega) = 0$; i.e., sites that are occupied by 0 particles. For this reason the quantity $N_j(\omega)$ just defined is indexed by $j \in \mathbb{N} \cup \{0\}$. The next step in the definition of the droplet model is to specify a subset $\Omega_{N,b,m}$ of configurations $\omega \in \Omega_N$ for which every site is occupied by at least $b$ particles and another constraint holds. In the following definition of $\Omega_{N,b,m}$, $\mathbb{N}_b$ denotes the set $\{n \in \mathbb{Z} : n \ge b\}$. Thus $\mathbb{N}_0$ is the set of nonnegative integers.
1. Given $b \in \mathbb{N}$, for any configuration $\omega \in \Omega_{N,b,m}$ every site of $\Lambda_N$ is occupied by at least $b$ particles. In other words, for each $\ell \in \Lambda_N$ there exist at least $b$ values of $i \in \{1, 2, \ldots, K\}$ such that $\omega_i = \ell$. Equivalently, in the configuration $\omega$ and for each $\ell \in \Lambda_N$ we have $K_\ell(\omega) \ge b$. It follows that for $\omega \in \Omega_{N,b,m}$, $N_j(\omega)$ is indexed by $j \in \mathbb{N}_b$.
2. For any configuration $\omega \in \Omega_{N,b,m}$ at most $m$ of the components $N_j(\omega)$ for $j \in \mathbb{N}_b$ are positive. As specified at the start of this section, $m = m(N) \to \infty$ and $m(N)^2/N \to 0$ as $N \to \infty$.
We denote by $N(\omega)$ the sequence $\{N_j(\omega), j \in \mathbb{N}_b\}$ and define $|N(\omega)|_+ = \mathrm{card}\{j \in \mathbb{N}_b : N_j(\omega) > 0\}$, the number of positive components of $N(\omega)$. In terms of this notation
$$\Omega_{N,b,m} = \{\omega \in \Omega_N : K_\ell(\omega) \ge b \ \text{for all } \ell \in \Lambda_N, \ |N(\omega)|_+ \le m\}. \tag{2.1}$$
Constraint 2, which restricts the number of positive components of $N(\omega)$, is a useful technical device that allows us to control the errors in several estimates. In appendix D we explain why we impose this constraint and give evidence supporting the conjecture that this restriction can be eliminated. Because of the two constraints, the maximum number of particles that can occupy any site is $K - (N-1)b$. When $b$ is a positive integer, for each $\omega \in \Omega_{N,b,m}$ each site in $\Lambda_N$ is occupied by at least $b$ particles. In this case it is useful to think of each particle as having one unit of mass and of the set of particles at each site $\ell$ as defining a droplet. With this interpretation, for each configuration $\omega$, $K_\ell(\omega)$ denotes the mass or the size of the droplet at site $\ell$. The $j$'th droplet class has $N_j(\omega)$ droplets and mass $jN_j(\omega)$. Because the number of sites in $\Lambda_N$ equals $N$ and the sum of the masses of all the droplet classes equals $K$, it follows that the quantities $N_j(\omega)$ satisfy the following conservation laws for all $\omega \in \Omega_{N,b,m}$:
$$\sum_{j \in \mathbb{N}_b} N_j(\omega) = N \quad \text{and} \quad \sum_{j \in \mathbb{N}_b} jN_j(\omega) = K. \tag{2.2}$$
We now consider the modifications that must be made in these definitions when $b = 0$. In this case constraint 1 in the definition of $\Omega_{N,b,m}$ disappears because we allow sites to be occupied by 0 particles, and therefore $N_j(\omega)$ is indexed by $j \in \mathbb{N}_0 = \mathbb{N} \cup \{0\}$. On the other hand, we retain constraint 2 in the definition of $\Omega_{N,0,m}$, which requires that for any configuration $\omega \in \Omega_{N,0,m}$ at most $m$ of the components $N_j(\omega)$ for $j \in \mathbb{N}_0$ are positive. In terms of $|N(\omega)|_+$ the definition of $\Omega_{N,0,m}$ becomes
$$\Omega_{N,0,m} = \{\omega \in \Omega_N : |N(\omega)|_+ \le m\}.$$
Because the choice $b = 0$ allows sites to be empty, we lose the interpretation of the set of particles at each site as being a droplet. However, for $\omega \in \Omega_{N,0,m}$ the two conservation laws (2.2) continue to hold.
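The two defining constraints of $\Omega_{N,b,m}$ can be checked mechanically; a minimal sketch (the function name and the sample configuration are ours):

```python
# Constraint 1: every site holds at least b particles (K_l >= b for all l).
# Constraint 2: at most m of the components N_j are positive, i.e. |N(omega)|_+ <= m.
def in_Omega(omega, N, b, m):
    occ = [omega.count(site) for site in range(1, N + 1)]
    if min(occ) < b:
        return False
    return len(set(occ)) <= m      # number of distinct occupancy values = |N(omega)|_+

omega = (1, 1, 2, 3, 3, 3, 4, 4, 5, 5)       # N = 5, K = 10, occupancies (2, 1, 3, 2, 2)
assert in_Omega(omega, N=5, b=1, m=3)        # three droplet classes, all sites occupied
assert not in_Omega(omega, N=5, b=2, m=3)    # site 2 holds only one particle
assert not in_Omega(omega, N=5, b=1, m=2)    # three positive N_j exceed m = 2
print("membership checks pass")
```

For $b = 0$ the same code applies: an empty site contributes the occupancy value 0, which then counts as a positive component $N_0$ toward the bound $m$.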
For the remainder of this paper we work with any fixed nonnegative integer $b$. The probability measure $P_{N,b,m}$ defining the droplet model is obtained by restricting the uniform measure $P_N$ to the set $\Omega_{N,b,m}$. Thus $P_{N,b,m}$ equals the conditional probability $P_N(\cdot|\Omega_{N,b,m})$. For subsets $A$ of $\Omega_{N,b,m}$, $P_{N,b,m}(A)$ takes the form
$$P_{N,b,m}(A) = \frac{P_N(A)}{P_N(\Omega_{N,b,m})} = \frac{\mathrm{card}(A)}{\mathrm{card}(\Omega_{N,b,m})}. \tag{2.3}$$
The second equality in this formula follows from the fact that $P_N$ assigns equal probability $1/N^K$ to every $\omega \in \Omega_{N,b,m}$. In the language of statistical mechanics $P_{N,b,m}$ defines a microcanonical ensemble that incorporates the conservation laws for number and mass expressed in (2.2).
Having defined the droplet model, we introduce the random probability measures whose large deviations we will study. For $\omega \in \Omega_{N,b,m}$ these measures are the number-density measures $\Theta_{N,b}$ that assign to $j \in \mathbb{N}_b$ the probability $N_j(\omega)/N$. This ratio represents the number density of droplet class $j$. Thus for any subset $A$ of $\mathbb{N}_b$
$$\Theta_{N,b}(\omega)(A) = \frac{1}{N} \sum_{j \in A} N_j(\omega). \tag{2.4}$$
By the two formulas in (2.2), $\Theta_{N,b}(\omega)$ is a probability measure on $\mathbb{N}_b$ having mean
$$\int_{\mathbb{N}_b} x \, \Theta_{N,b}(\omega)(dx) = \frac{1}{N} \sum_{j \in \mathbb{N}_b} jN_j(\omega) = \frac{K}{N} = c. \tag{2.5}$$
We next introduce several spaces of probability measures that arise in the large deviation analysis of the droplet model. $\mathcal{P}_{\mathbb{N}_b}$ denotes the set of probability measures on $\mathbb{N}_b = \{n \in \mathbb{Z} : n \ge b\}$. Thus $\theta \in \mathcal{P}_{\mathbb{N}_b}$ has the form $\sum_{j \in \mathbb{N}_b} \theta_j \delta_j$, where the components $\theta_j$ satisfy $\theta_j \ge 0$ and $\theta(\mathbb{N}_b) = \sum_{j \in \mathbb{N}_b} \theta_j = 1$. We say that a sequence of measures $\{\theta^{(n)}, n \in \mathbb{N}\}$ in $\mathcal{P}_{\mathbb{N}_b}$ converges weakly to $\theta \in \mathcal{P}_{\mathbb{N}_b}$, and write $\theta^{(n)} \Rightarrow \theta$, if for any bounded function $f$ mapping $\mathbb{N}_b$ into $\mathbb{R}$, $\int_{\mathbb{N}_b} f \, d\theta^{(n)} \to \int_{\mathbb{N}_b} f \, d\theta$ as $n \to \infty$. $\mathcal{P}_{\mathbb{N}_b}$ is topologized by the topology of weak convergence. There is a standard technique for introducing a metric structure on $\mathcal{P}_{\mathbb{N}_b}$ for which we quote the main facts. Because $\mathbb{N}_b$ is a complete, separable metric space with metric $d(x,y) = |x - y|$, there exists a metric $\pi$ on $\mathcal{P}_{\mathbb{N}_b}$ called the Prohorov metric with the following properties:
• Convergence with respect to the Prohorov metric is equivalent to weak convergence [14, Thm. 3.3.1]; i.e., $\theta^{(n)} \Rightarrow \theta$ if and only if $\pi(\theta^{(n)}, \theta) \to 0$ as $n \to \infty$.
• With respect to the Prohorov metric, $\mathcal{P}_{\mathbb{N}_b}$ is a complete, separable metric space [14, Thm. 3.1.7].
We denote by $\mathcal{P}_{\mathbb{N}_b,c}$ the set of measures in $\mathcal{P}_{\mathbb{N}_b}$ having mean $c$. Thus $\theta \in \mathcal{P}_{\mathbb{N}_b,c}$ has the form $\sum_{j \in \mathbb{N}_b} \theta_j \delta_j$, where the components $\theta_j$ satisfy $\theta_j \ge 0$, $\sum_{j \in \mathbb{N}_b} \theta_j = 1$, and $\int_{\mathbb{N}_b} x\,\theta(dx) = \sum_{j \in \mathbb{N}_b} j\theta_j = c$. By (2.5) the number-density measures $\Theta_{N,b}$ defined in (2.4) take values in $\mathcal{P}_{\mathbb{N}_b,c}$. In part (a) of Theorem 2.4 we prove two properties of $\mathcal{P}_{\mathbb{N}_b,c}$: with respect to the Prohorov metric, $\mathcal{P}_{\mathbb{N}_b,c}$ is a relatively compact, separable subset of $\mathcal{P}_{\mathbb{N}_b}$; however, $\mathcal{P}_{\mathbb{N}_b,c}$ is not a closed subset of $\mathcal{P}_{\mathbb{N}_b}$ and thus is not a compact subset or a complete metric space. The fact that $\mathcal{P}_{\mathbb{N}_b,c}$ is not a closed subset of $\mathcal{P}_{\mathbb{N}_b}$ is easily motivated. If $\theta^{(n)}$ is a sequence in $\mathcal{P}_{\mathbb{N}_b,c}$ such that $\theta^{(n)} \Rightarrow \theta$ for some $\theta \in \mathcal{P}_{\mathbb{N}_b}$, then some of the mass of $\theta^{(n)}$ could escape to $\infty$, causing $\theta$ to have a mean strictly less than $c$; an example is given in (2.12). Although $\mathcal{P}_{\mathbb{N}_b,c}$ is the natural space in which to formulate the LDP for $\Theta_{N,b}$ in Theorem 2.1, the fact that $\mathcal{P}_{\mathbb{N}_b,c}$ is not a closed subset of $\mathcal{P}_{\mathbb{N}_b}$ gives rise to a number of unique features in the LDP.
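The mass-escape phenomenon behind the non-closedness is easy to exhibit numerically. The sequence below is our own construction, in the spirit of the example the paper gives in (2.12):

```python
# Each theta^(n) lives on {b, n}, has mean exactly c, and converges weakly to
# delta_b, whose mean is b < c: the excess mass c - b escapes to infinity.
b, c = 1, 2.0

def theta_n(n):
    eps = (c - b) / (n - b)            # weight placed at the far-away point n
    return {b: 1.0 - eps, n: eps}

for n in (10, 100, 1000):
    th = theta_n(n)
    mean = sum(j * p for j, p in th.items())
    assert abs(mean - c) < 1e-9        # the mean is c for every n ...
    print(n, round(th[b], 4))          # ... while theta^(n){b} -> 1
```

Since the weight at the point $n$ tends to 0 while the point itself runs off to $\infty$, the weak limit is $\delta_b$, which lies outside $\mathcal{P}_{\mathbb{N}_b,c}$.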
Because $\mathcal{P}_{\mathbb{N}_b,c}$ is not a closed subset of $\mathcal{P}_{\mathbb{N}_b}$, it is natural to introduce the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$. As we prove in part (b) of Theorem 2.4, the closure of $\mathcal{P}_{\mathbb{N}_b,c}$ in $\mathcal{P}_{\mathbb{N}_b}$ equals $\mathcal{P}_{\mathbb{N}_b,[b,c]}$, which is the set of measures in $\mathcal{P}_{\mathbb{N}_b}$ having mean lying in the closed interval $[b,c]$. For any $\theta \in \mathcal{P}_{\mathbb{N}_b}$ the minimum value of the mean of $\theta$ is $b$, which occurs if and only if $\theta = \delta_b$. Being the closure of the relatively compact, separable metric space $\mathcal{P}_{\mathbb{N}_b,c}$, $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ is a compact, separable metric space with respect to the Prohorov metric. This space appears in the formulation of the large deviation upper bound in part (c) of Theorem 2.1.
We next state Theorem 2.1, which is the LDP for the sequence of distributions $P_{N,b,m}(\Theta_{N,b} \in d\theta)$ on $\mathcal{P}_{\mathbb{N}_b,c}$ as $N \to \infty$. The rate function in the LDP is the relative entropy of $\theta$ with respect to a certain measure $\rho_{b,\alpha_b(c)} = \sum_{j \in \mathbb{N}_b} \rho_{b,\alpha_b(c);j} \delta_j$ defined in (2.7), where each $\rho_{b,\alpha_b(c);j} > 0$. Thus any $\theta \in \mathcal{P}_{\mathbb{N}_b,c}$ is absolutely continuous with respect to $\rho_{b,\alpha_b(c)}$. For $\theta \in \mathcal{P}_{\mathbb{N}_b,c}$ the relative entropy of $\theta$ with respect to $\rho_{b,\alpha_b(c)}$ is defined by
$$R(\theta|\rho_{b,\alpha_b(c)}) = \sum_{j \in \mathbb{N}_b} \theta_j \log \frac{\theta_j}{\rho_{b,\alpha_b(c);j}}.$$
For $j \in \mathbb{N}_b$ the components of the measure $\rho_{b,\alpha_b(c)}$ appearing in the LDP have the form
$$\rho_{b,\alpha_b(c);j} = \frac{1}{Z_b(\alpha_b(c))} \cdot \frac{[\alpha_b(c)]^j}{j!}, \tag{2.7}$$
where $\alpha_b(c) \in (0,\infty)$ is chosen so that $\rho_{b,\alpha_b(c)}$ has mean $c$ and $Z_b(\alpha_b(c))$ is the normalization making $\rho_{b,\alpha_b(c)}$ a probability measure; thus $Z_0(\alpha_0(c)) = e^{\alpha_0(c)}$, and for $b \in \mathbb{N}$,
$$Z_b(\alpha_b(c)) = \sum_{j \in \mathbb{N}_b} \frac{[\alpha_b(c)]^j}{j!} = e^{\alpha_b(c)} - \sum_{j=0}^{b-1} \frac{[\alpha_b(c)]^j}{j!}.$$
As we show in part (a) of Theorem C.1, there exists a unique value of $\alpha_b(c)$. For $b \in \mathbb{N}$ the Poisson-type distribution $\rho_{b,\alpha_b(c)}$ differs from a standard Poisson distribution because the former has 0 mass at $0, 1, \ldots, b-1$ while the latter has positive mass at these points. In fact, $\rho_{b,\alpha_b(c)}$ can be identified as the distribution of a Poisson random variable with parameter $\alpha_b(c)$ conditioned to take values in $\mathbb{N}_b$. Despite this difference we shall also refer to $\rho_{b,\alpha_b(c)}$ as a Poisson distribution.
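The defining condition for $\alpha_b(c)$ lends itself to a quick numerical sketch (our code; the truncation level and bisection bracket are implementation choices). Restricting a Poisson distribution to $\{j \ge b\}$ can only raise its mean, so $\alpha_b(c) \le c$ and bisection on $(0, c]$ applies:

```python
from math import factorial

def mean_rho(b, alpha, J=150):
    # mean of rho_{b,alpha}; Z_b(alpha) = sum_{j>=b} alpha^j / j! computed by truncation
    z = num = 0.0
    term = alpha ** b / factorial(b)
    for j in range(b, J):
        z += term
        num += j * term
        term *= alpha / (j + 1)
    return num / z

def alpha_bc(b, c, tol=1e-12):
    # mean_rho(b, .) increases from b to infinity, so the root is unique (cf. Thm. C.1(a))
    lo, hi = tol, float(c)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mean_rho(b, mid) < c else (lo, mid)
    return (lo + hi) / 2

a1 = alpha_bc(1, 2.0)      # b = 1: alpha solves alpha / (1 - e^{-alpha}) = 2
a2 = alpha_bc(2, 3.0)
print(round(a1, 4), round(mean_rho(2, a2), 4))
```

For $b = 1$, $c = 2$ the solver returns $\alpha \approx 1.59$, and in each case the resulting restricted Poisson distribution has mean $c$ to high accuracy.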
According to part (a) of Theorem 2.1, $R(\cdot|\rho_{b,\alpha_b(c)})$ has compact level sets in $\mathcal{P}_{\mathbb{N}_b,c}$. It is well known that the relative entropy has compact level sets in the complete space $\mathcal{P}_{\mathbb{N}_b}$. The level sets are also compact in $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ because the latter is a compact subset of $\mathcal{P}_{\mathbb{N}_b}$. However, because $\mathcal{P}_{\mathbb{N}_b,c}$ is not closed in $\mathcal{P}_{\mathbb{N}_b}$, the compactness of the level sets in $\mathcal{P}_{\mathbb{N}_b,c}$ is not obvious.
As a consequence of the fact that $\mathcal{P}_{\mathbb{N}_b,c}$ is not closed in $\mathcal{P}_{\mathbb{N}_b}$, the large deviation upper bound takes two forms depending on whether the subset $F$ of $\mathcal{P}_{\mathbb{N}_b,c}$ is compact or whether $F$ is closed. When $F$ is compact, in part (b) we obtain the standard large deviation upper bound for $F$ with $-R(F|\rho_{b,\alpha_b(c)})$ on the right hand side. When $F$ is closed, in part (c) we obtain a variation of the standard large deviation upper bound; $-R(F|\rho_{b,\alpha_b(c)})$ on the right hand side is replaced by $-R(\overline{F}|\rho_{b,\alpha_b(c)})$, where $\overline{F}$ is the closure of $F$ in the compact space $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ and is therefore compact. When $F$ is compact, its closure in $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ is $F$ itself. In this case the large deviation upper bounds in parts (b) and (c) coincide.
The refinement in part (c) is important. It is applied in the proof of Theorem 2.2 to show that ρ b,α b (c) is the equilibrium distribution of the number-density measures Θ N,b . In turn, Theorem 2.2 is applied in the proof of Corollary 2.3 to show that ρ b,α b (c) is the equilibrium distribution of the droplet-size random variables K ℓ .
In the next theorem we assume that $m$ is the function $m(N)$ appearing in the definition of $\Omega_{N,b,m}$. For a subset $A$ of $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ we write $R(A|\rho_{b,\alpha_b(c)}) = \inf_{\theta \in A} R(\theta|\rho_{b,\alpha_b(c)})$.

Theorem 2.1. (a) $R(\cdot|\rho_{b,\alpha_b(c)})$ has compact level sets in $\mathcal{P}_{\mathbb{N}_b,c}$.
(b) For any compact subset $F$ of $\mathcal{P}_{\mathbb{N}_b,c}$ we have the large deviation upper bound
$$\limsup_{N \to \infty} \frac{1}{N} \log P_{N,b,m}(\Theta_{N,b} \in F) \le -R(F|\rho_{b,\alpha_b(c)}).$$
(c) For any closed subset $F$ of $\mathcal{P}_{\mathbb{N}_b,c}$, let $\overline{F}$ denote the closure of $F$ in $\mathcal{P}_{\mathbb{N}_b,[b,c]}$. We have the large deviation upper bound
$$\limsup_{N \to \infty} \frac{1}{N} \log P_{N,b,m}(\Theta_{N,b} \in F) \le -R(\overline{F}|\rho_{b,\alpha_b(c)}). \tag{2.9}$$
(d) For any open subset $G$ of $\mathcal{P}_{\mathbb{N}_b,c}$ we have the large deviation lower bound
$$\liminf_{N \to \infty} \frac{1}{N} \log P_{N,b,m}(\Theta_{N,b} \in G) \ge -R(G|\rho_{b,\alpha_b(c)}).$$

As noted in the comments after the statement of Theorem 4.3, Theorem 2.1 is a consequence of that theorem and several other results proved in the paper. Part (b) of Theorem 3.1 proves a local large deviation estimate for probabilities of the form $P_{N,b,m}(\Theta_{N,b} = \theta)$, where $\theta$ is a probability measure in the range of $\Theta_{N,b}$. This local estimate is one of the centerpieces of this paper, giving information not available in the LDP for $\Theta_{N,b}$, which involves global estimates. In the next theorem $[B_\pi(\rho_{b,\alpha_b(c)}, \varepsilon)]^c$ denotes the complement in $\mathcal{P}_{\mathbb{N}_b,c}$ of the open ball defined by
$$B_\pi(\rho_{b,\alpha_b(c)}, \varepsilon) = \{\theta \in \mathcal{P}_{\mathbb{N}_b,c} : \pi(\theta, \rho_{b,\alpha_b(c)}) < \varepsilon\}.$$
There is a subtlety in the proof in the next theorem that $\rho_{b,\alpha_b(c)}$ is the equilibrium distribution of $\Theta_{N,b}$. To prove this, we need an exponentially decaying estimate on the probability that $\Theta_{N,b}$ lies in $[B_\pi(\rho_{b,\alpha_b(c)}, \varepsilon)]^c$. Because this set is closed in $\mathcal{P}_{\mathbb{N}_b,c}$ but is not compact, we obtain this estimate by applying the large deviation upper bound in part (c) of Theorem 2.1 to $[B_\pi(\rho_{b,\alpha_b(c)}, \varepsilon)]^c$ and using the fact that the closure of this set in $\mathcal{P}_{\mathbb{N}_b,[b,c]}$ does not contain $\rho_{b,\alpha_b(c)}$.

Theorem 2.2.
We assume the hypotheses of Theorem 2.1. The following results hold for any ε > 0.
(a) The quantity x⋆, defined as the infimum of R(·|ρ_{b,α_b(c)}) over the closure of [B_π(ρ_{b,α_b(c)}, ε)]^c in P_{N_b,[b,c]}, is positive. (b) For any number y in the interval (0, x⋆) and all sufficiently large N, P_{N,b,m}(Θ_{N,b} ∈ [B_π(ρ_{b,α_b(c)}, ε)]^c) ≤ exp(−Ny). This upper bound implies that as N → ∞, P_{N,b,m}(Θ_{N,b} ∈ B_π(ρ_{b,α_b(c)}, ε)) → 1 and, for any bounded, continuous function g mapping P_{N_b,c} into R, E^{P_{N,b,m}}[g(Θ_{N,b})] → g(ρ_{b,α_b(c)}). (2.8) We now prove part (a) of Theorem 2.2. Since R(θ|ρ_{b,α_b(c)}) has compact level sets in P_{N_b,[b,c]} and vanishes only at θ = ρ_{b,α_b(c)}, which lies outside the closure of [B_π(ρ_{b,α_b(c)}, ε)]^c, the infimum defining x⋆ is attained and is positive. This completes the proof of part (a). The inequality in part (b) is an immediate consequence of part (a) and the large deviation upper bound (2.9). This inequality yields the limit P_{N,b,m}(Θ_{N,b} ∈ B_π(ρ_{b,α_b(c)}, ε)) → 1, which in turn implies (2.8). The proof of Theorem 2.2 is complete.
We now apply Theorem 2.2 to prove that ρ_{b,α_b(c)} is also the equilibrium distribution of the random variables K_ℓ, which count the droplet sizes at the sites of Λ_N. Although these random variables are identically distributed, they are dependent because for each ω ∈ Ω_{N,b,m} they satisfy the equality constraint Σ_{ℓ∈Λ_N} K_ℓ(ω) = K. Except for one step, the proof that ρ_{b,α_b(c)} is also the equilibrium distribution of K_ℓ is completely algebraic and requires only the condition that the K_ℓ are identically distributed; their dependence does not affect the proof. A key observation needed in the proof is that Θ_{N,b} is the empirical measure of these random variables; i.e., for ω ∈ Ω_{N,b,m}, Θ_{N,b}(ω) assigns to subsets A of N_b the probability Θ_{N,b}(ω)(A) = (1/N) Σ_{ℓ∈Λ_N} δ_{K_ℓ(ω)}(A). This characterization of Θ_{N,b} follows from the fact that the empirical measure of the K_ℓ assigns to j ∈ N_b the probability (1/N) card{ℓ ∈ Λ_N : K_ℓ(ω) = j}. (2.10)

Corollary 2.3.
We assume the hypotheses of Theorem 2.1. Then for any site ℓ ∈ Λ_N and any j ∈ N_b, lim_{N→∞} P_{N,b,m}(K_ℓ = j) = ρ_{b,α_b(c);j}. Proof. Since the random variables K_ℓ are identically distributed, it suffices to prove the corollary for ℓ = 1. For fixed j ∈ N_b, the limit (2.8) with g(θ) = θ_j yields E^{P_{N,b,m}}[Θ_{N,b;j}] → ρ_{b,α_b(c);j}, and by the empirical-measure characterization (2.10), E^{P_{N,b,m}}[Θ_{N,b;j}] = P_{N,b,m}(K_1 = j). This completes the proof.
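The convergence in Corollary 2.3 can be illustrated numerically. The following sketch is not part of the paper: the rejection sampler and all parameter values are our own illustrative choices. It places K = Nc labeled particles uniformly on N sites, conditions on every site being occupied (b = 1) by rejection, pools the droplet-size frequencies, and compares them with the restricted Poisson distribution ρ_{1,α_1(c)}.

```python
import math
import random
from collections import Counter

def sample_conditioned(N, K, b, rng):
    """Place K labeled particles uniformly on N sites; reject until every
    site holds at least b particles (i.e., condition on the event Omega_{N,b})."""
    while True:
        counts = Counter(rng.randrange(N) for _ in range(K))
        if len(counts) == N and min(counts.values()) >= b:
            return counts

b, c, N, samples = 1, 2, 6, 400
K = N * c
rng = random.Random(0)

hist = Counter()
for _ in range(samples):
    hist.update(sample_conditioned(N, K, b, rng).values())
emp = {j: hist[j] / (samples * N) for j in sorted(hist)}

# For b = 1 the mean of rho_{1,alpha} is alpha/(1 - e^{-alpha}); bisect for
# the unique alpha_1(c) with mean c (its existence is Theorem C.2).
lo, hi = 1e-9, 50.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mid / (1 - math.exp(-mid)) < c else (lo, mid)
alpha = (lo + hi) / 2
Z1 = math.exp(alpha) - 1                       # Z_1(alpha) = e^alpha - 1
rho = {j: alpha**j / (math.factorial(j) * Z1) for j in range(1, 25)}

# The pooled empirical mean equals c exactly, by the constraint sum_l K_l = K.
empirical_mean = sum(j * p for j, p in emp.items())
```

For small N the empirical frequencies only roughly match ρ_{1,α_1(c)}; the constraint Σ_ℓ K_ℓ = K forces the empirical mean to equal c exactly for every N.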
The last theorem in this section proves several properties of P_{N_b,c} and P_{N_b,[b,c]} with respect to the Prohorov metric that are needed in the paper. (a) P_{N_b,c}, the set of probability measures on N_b having mean c, is a relatively compact, separable subset of P_{N_b}. However, P_{N_b,c} is not a closed subset of P_{N_b} and thus is neither a compact subset nor a complete metric space.
In the present setting the relative compactness of P_{N_b,c} is easy to prove from the tightness of P_{N_b,c} without appealing to the general formulation of Prohorov's Theorem. Given any sequence θ^{(n)} ∈ P_{N_b,c}, a diagonal argument yields a subsequence θ^{(n′)} such that θ_j = lim_{n′→∞} θ^{(n′)}_j exists for every j ∈ N_b. To see that θ^{(n′)} converges weakly to θ, let f be any nonzero bounded function mapping N_b into R and choose ξ ∈ N_b so large that the contribution of the indices j > ξ is at most ε. The latter bound is possible since by Fatou's Lemma c = lim inf_{n′→∞} Σ_{j∈N_b} jθ^{(n′)}_j ≥ Σ_{j∈N_b} jθ_j, so that the tails of the measures are uniformly small. Since the remaining indices lie in the finite set {b, . . . , ξ} and ε > 0 is arbitrary, the weak convergence of θ^{(n′)} to θ is proved. Taking f to be identically 1 verifies that θ ∈ P_{N_b}, which must be the case since P_{N_b} is complete.
We now prove that P_{N_b,c} is not a closed subset of P_{N_b} by exhibiting a sequence θ^{(n)} ∈ P_{N_b,c} having a weak limit that does not lie in P_{N_b,c}. To simplify the notation, we denote the mean of σ ∈ P_{N_b} by ⟨σ⟩. Let θ be any measure in P_{N_b} with ⟨θ⟩ < c. The sequence θ^{(n)} defined in (2.12) has the property that θ^{(n)} ∈ P_{N_b,c} and that θ^{(n)} ⇒ θ ∉ P_{N_b,c}. We conclude that P_{N_b,c} is not a closed subset of P_{N_b}. This completes the proof of part (a).
(b) Let θ be any measure in P_{N_b,[b,c]}. If ⟨θ⟩ = c, then θ ∈ P_{N_b,c}. If ⟨θ⟩ < c, then we use the sequence θ^{(n)} in (2.12), which converges weakly to θ. We conclude that θ lies in the closure of P_{N_b,c} and thus that P_{N_b,[b,c]} is a subset of the closure of P_{N_b,c} in P_{N_b}. This completes the proof of part (b). The proof of Theorem 2.4 is done.
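The non-closedness in part (a) can be made concrete. The following sketch is our own illustrative instance of a sequence of the type used with (2.12), with b = 1, c = 2, and the two-point supports chosen by us: each θ^(n) has mean exactly c, yet the mass at j = 1 tends to 1, so the weak limit is the unit mass at 1, whose mean is 1 < c.

```python
def theta_n(n):
    """theta^(n) puts mass 1 - 1/(n-1) at j = 1 and mass 1/(n-1) at j = n,
    so its mean is (1 - eps)*1 + eps*n = 1 + (n - 1)*eps = 2."""
    eps = 1.0 / (n - 1)
    return {1: 1.0 - eps, n: eps}

for n in (10, 100, 1000):
    th = theta_n(n)
    mean = sum(j * p for j, p in th.items())
    assert abs(mean - 2.0) < 1e-9    # every theta^(n) lies in P_{N_1,2} ...
assert theta_n(1000)[1] > 0.998      # ... but all mass drifts to j = 1
```

The escaping mass at the single large site n is exactly the tightness-versus-mean tension that makes P_{N_b,c} relatively compact but not closed.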
We end this section by giving examples of closed, noncompact subsets of P_{N_b,c} and of compact subsets of P_{N_b,c}. We do this to emphasize the care that must be taken in dealing with the nonclosed metric space P_{N_b,c} and the necessity of having separate large deviation upper bounds for compact sets in part (b) of Theorem 2.1 and for closed sets in part (c) of Theorem 2.1. We construct these examples as level sets U_M = {θ ∈ P_{N_b,c} : I(θ) ≤ M} of lower semicontinuous functions I mapping P_{N_b,c} into [0, ∞] and having the form I(θ) = Σ_{j∈N_b} g(j)θ_j. For the next set of examples, we assume that g is a nondecreasing function mapping N_b into [0, ∞) and satisfying g(j) → ∞ and g(j)/j → 0 as j → ∞. In this case, as in the proof in part (a) of Theorem 2.4 that P_{N_b,c} is relatively compact, Prohorov's Theorem implies that the level set U_M is relatively compact. However, in general U_M is not compact because it is not closed. For the final set of examples, we assume that g is a nondecreasing function mapping N_b into [0, ∞) and satisfying g(j)/j → ∞ as j → ∞. Again Prohorov's Theorem implies that U_M is relatively compact. In addition, because of the assumption on g, U_M is uniformly integrable; i.e., lim_{ξ→∞} sup_{θ∈U_M} Σ_{j>ξ} jθ_j = 0. This implies that if θ^{(n)} ∈ U_M converges weakly to θ ∈ P_{N_b}, then c = ⟨θ^{(n)}⟩ → ⟨θ⟩. This standard consequence of uniform integrability, proved in Proposition 2.3 in the appendix of [14], can be proved in the present setting as in (2.11) if θ^{(n′)} is replaced by θ^{(n)} and f(j) is replaced by j for j ∈ N_b. It follows that θ has mean c and so lies in P_{N_b,c}, and therefore in U_M because U_M is closed in P_{N_b,c}. We conclude that U_M is both relatively compact and closed in P_{N_b,c} and hence compact.
The rate function in Theorem 2.1 is the relative entropy R(θ|ρ_{α(c)}), a lower semicontinuous function mapping P_{N_b,c} into [0, ∞] that does not have the simple form of I. The proof that R(·|ρ_{α(c)}) has compact level sets in P_{N_b,c} relies on Lemma 5.1 in [7] and the fact that ρ_{α(c)} has a finite moment generating function. In the next section we present the local large deviation estimate that will be used in section 4 to prove the LDP for Θ_{N,b} in Theorem 2.1.

Local Large Deviation Estimate Yielding Theorem 2.1
The main result needed to prove the LDP in Theorem 2.1 is the local large deviation estimate stated in part (b) of Theorem 3.1. The first step is to introduce a set A_{N,b,m} that plays a central role in this paper. Fix a nonnegative integer b and a rational number c ∈ (b, ∞). Given N ∈ N define K = Nc, and let m be the function appearing in the definition of Ω_{N,b,m} in (2.1) and satisfying m(N) → ∞ and m(N)²/N → 0 as N → ∞. A_{N,b,m} consists of the vectors ν = (ν_j, j ∈ N_b) with components ν_j ∈ N_0 satisfying Σ_{j∈N_b} ν_j = N, Σ_{j∈N_b} jν_j = K, and |ν|_+ ≤ m, where |ν|_+ = card{j ∈ N_b : ν_j ≥ 1}. Because ν_j ∈ N_0, the two sums involve only finitely many terms.
For ν ∈ A_{N,b,m}, θ_{N,b,ν} = ν/N defines a probability measure in P_{N_b,c}, the set of probability measures on N_b having mean c. It follows that the set B_{N,b,m} = {θ_{N,b,ν} : ν ∈ A_{N,b,m}} is a subset of P_{N_b,c} coinciding with the range of Θ_{N,b}. In part (b) of the next theorem we state the local large deviation estimate for the event {Θ_{N,b} = θ_{N,b,ν}}. In Theorem C.2 we give the straightforward proof of the existence of α_b(c) for b = 1. The proof of the existence of α_b(c) for general b ∈ N is much more subtle than the proof for b = 1; it is given in appendix C of the present paper, where it is the content of part (a) of Theorem C.1.
We comment on the proof of part (a) of the next theorem for b ∈ N because the existence of α_b(c) is not obvious. The existence of such a solution is a consequence of three steps, which are carried out in appendix C. To carry out step 3, we note that the mean of ρ_{b,α} can be expressed first in terms of an incomplete gamma function and then in terms of a moment generating function. The log-convexity of the moment generating function and a short calculation involving power series complete the proof.
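Numerically, α_b(c) is easy to compute once one trusts the monotonicity of the mean in α established in appendix C. The sketch below is our own; the helper names `mean_restricted_poisson` and `alpha_b` are hypothetical, not the paper's notation.

```python
import math

def mean_restricted_poisson(alpha, b, jmax=400):
    """Mean of the Poisson(alpha) distribution conditioned on {j >= b}."""
    w = alpha**b / math.factorial(b)     # unnormalized weight at j = b
    Z = m = 0.0
    for j in range(b, jmax):
        Z += w
        m += j * w
        w *= alpha / (j + 1)             # Poisson recursion w_{j+1} = w_j * alpha/(j+1)
    return m / Z

def alpha_b(c, b):
    """Bisect for the unique alpha whose restricted-Poisson mean is c;
    monotonicity of the mean in alpha makes bisection legitimate."""
    lo, hi = 1e-12, 1.0
    while mean_restricted_poisson(hi, b) < c:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_restricted_poisson(mid, b) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a1 = alpha_b(2.0, 1)   # for b = 1 this solves alpha/(1 - e^{-alpha}) = 2
```

For b = 1 the result can be checked against the closed-form mean α/(1 − e^{−α}); for general b the recursion avoids overflow in the factorials.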
(b) For any ν ∈ A_{N,b,m}, the relative entropy R(θ_{N,b,ν}|ρ_{b,α_b(c)}) is finite, and we have the local large deviation estimate P_{N,b,m}(Θ_{N,b} = θ_{N,b,ν}) = exp[−N(R(θ_{N,b,ν}|ρ_{b,α_b(c)}) − ε_N(ν))]. The quantity ε_N(ν) → 0 uniformly for ν ∈ A_{N,b,m} as N → ∞.
We now prove the local large deviation estimate in part (b) of Theorem 3.1. This proof is based on a combinatorial argument that is reminiscent of, and as natural as, the combinatorial argument used to prove Sanov's theorem for empirical measures defined in terms of i.i.d. random variables having a finite state space [13, §3]. Part (b) of Theorem 3.1 is proved by analyzing the asymptotic behavior of the product of two multinomial coefficients that we now introduce.
Given ν ∈ A_{N,b,m}, our goal is to estimate the probability P_{N,b,m}(Θ_{N,b} = θ_{N,b,ν}). Since P_{N,b,m} is the uniform distribution on Ω_{N,b,m}, it follows that P_{N,b,m}(Θ_{N,b} = θ_{N,b,ν}) = card(∆_{N,b,m;ν})/card(Ω_{N,b,m}). Our first task is to determine the asymptotic behavior of card(∆_{N,b,m;ν}). In determining the asymptotic behavior of card(Ω_{N,b,m}), we will use the fact that Ω_{N,b,m} can be written as the disjoint union of the sets ∆_{N,b,m;ν} over ν ∈ A_{N,b,m}. Let ν ∈ A_{N,b,m} be given. We start by expressing card(∆_{N,b,m;ν}) as a product of two multinomial coefficients. For each configuration ω ∈ ∆_{N,b,m;ν}, K particles are distributed onto the N sites of the lattice Λ_N with j particles going onto ν_j sites for j ∈ N_b. We carry this out in two stages. In stage one, the K particles are placed into N bins, ν_j of which receive j particles for j ∈ N_b; the number of ways of making this placement equals the multinomial coefficient K!/∏_{j∈N_b}(j!)^{ν_j}, which is well-defined since Σ_{j∈N_b} jν_j = K. Given this placement of the K particles into N bins, the number of ways of moving the particles from the bins onto the sites 1, 2, . . . , N of the lattice Λ_N equals the multinomial coefficient N!/∏_{j∈N_b} ν_j!, which is well-defined since Σ_{j∈N_b} ν_j = N. We conclude that the cardinality of ∆_{N,b,m;ν} is given by the product of these two multinomial coefficients: card(∆_{N,b,m;ν}) = [N!/∏_{j∈N_b} ν_j!] · [K!/∏_{j∈N_b}(j!)^{ν_j}]. (3.9) Since |ν|_+ ≤ m, at most m of the components ν_j are positive, so both products involve only finitely many factors different from 1. A related version of this formula, well known in combinatorial analysis, is derived in Example III.23 of [16]. The next two steps in the proof of the local estimate given in part (b) of Theorem 3.1 are to prove the asymptotic formula for card(∆_{N,b,m;ν}) in Lemma 3.2 and the asymptotic formula for card(Ω_{N,b,m}) in part (b) of Lemma 3.3. The proof of Lemma 3.2 is greatly simplified by a substitution in line 3 of (3.16). This substitution involves a parameter α ∈ (0, ∞), which, we emphasize, is arbitrary in this lemma.
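The product formula for card(∆_{N,b,m;ν}) can be checked by brute force for tiny parameters. The sketch below is our own illustration: N = 3 sites, K = 5 particles, and the occupancy vector ν with ν_1 = 1 and ν_2 = 2, enumerating all N^K placements of labeled particles.

```python
import math
from collections import Counter
from itertools import product

N, K = 3, 5
nu = {1: 1, 2: 2}   # nu_j sites carry j particles: sum nu_j = N, sum j*nu_j = K

# Formula (3.9): card(Delta) = [N!/prod_j nu_j!] * [K!/prod_j (j!)^{nu_j}].
m1 = math.factorial(N) // math.prod(math.factorial(v) for v in nu.values())
m2 = math.factorial(K) // math.prod(math.factorial(j) ** v for j, v in nu.items())
formula = m1 * m2

# Brute force: count placements whose occupancy profile matches nu.
target = Counter(nu)
brute = sum(
    1
    for placement in product(range(N), repeat=K)
    if Counter(Counter(placement).values()) == target
)
# Here m1 = 3, m2 = 30, and both counts equal 90.
```

The inner `Counter(placement)` maps each site to its particle count; the outer one records how many sites carry each count, which is exactly the occupancy vector ν.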
The substitution in line 3 of (3.16) allows us to express the asymptotic behavior of both card(∆_{N,b,m;ν}) in Lemma 3.2 and card(Ω_{N,b,m}) in Lemma 3.3 directly in terms of the relative entropy R(θ_{N,b,ν}|ρ_{b,α}), where ρ_{b,α} is the probability measure on N_b having the components defined in part (a) of Theorem 3.1. One of the major issues in the proof of part (b) of Theorem 3.1 is to show that the arbitrary parameter α appearing in Lemmas 3.2 and 3.3 must take the value α_b(c), which is the unique value of α guaranteeing that the mean of ρ_{b,α} equals c. We show that α must equal α_b(c) after the statement of Lemma 3.3.
Proof. The proof is based on a weak form of Stirling's approximation, which states that for all N ∈ N satisfying N ≥ 2 and for all n ∈ N satisfying 1 ≤ n ≤ N, 1 ≤ log(n!) − (n log n − n) ≤ 2 log N. (3.10) By (3.10) the term denoted by O(log N) satisfies 1 ≤ O(log N) ≤ 2 log N. We will also use (3.10) with N replaced by K and by other quantities in the model. To simplify the notation, we rewrite (3.9) in the form card(∆_{N,b,m;ν}) = M_1(N, ν) · M_2(K, ν), where M_1(N, ν) denotes the first multinomial coefficient on the right side of (3.9) and M_2(K, ν) denotes the second multinomial coefficient on the right side of (3.9). We have log card(∆_{N,b,m;ν}) = log M_1(N, ν) + log M_2(K, ν). (3.12) The asymptotic behavior of the first term on the right side of the last display is easily calculated. Since ν ∈ A_{N,b,m}, there are |ν|_+ ∈ {1, 2, . . . , m} positive components ν_j. Because of this restriction on the number |ν|_+ of positive components of ν, we are able to control the error in line 3 of (3.13). We define Ψ_N(ν) = {j ∈ N_b : ν_j ≥ 1}. Using the fact that Σ_{j∈Ψ_N(ν)} ν_j = N, we obtain the asymptotic formula (3.13) for log M_1(N, ν). By the inequality noted after (3.11) and the fact that |ν|_+ ≤ m, the error term in (3.13) is controlled uniformly in ν. We now study the asymptotic behavior of the second term on the right side of (3.12). Since K = Nc, we obtain (3.14) for all K ≥ 2. The weak form of Stirling's formula is used to rewrite the term log(K!) in the last display, but not to rewrite the terms log(j!), which we leave untouched. Substituting (3.13) and (3.14) into (3.12), we obtain (3.15), and we conclude that ζ_N(ν) → 0 uniformly for ν ∈ A_{N,b,m} as N → ∞. Now comes the key step, the purpose of which is to express the sum in the last line of (3.15) as the relative entropy R(θ_{N,b,ν}|ρ_{b,α}), where α ∈ (0, ∞) is arbitrary. To do so, we rewrite the sum as shown in line 3 of (3.16). We obtain the next-to-last equality in (3.16) by using the fact that Σ_{j∈N_b} θ_{N,b,ν;j} = 1 and Σ_{j∈N_b} jθ_{N,b,ν;j} = c. The proof of Lemma 3.2 is complete.
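The weak form of Stirling's approximation in (3.10) is easy to verify numerically. The check below is our own, using `math.lgamma` to evaluate log(n!); it confirms the two-sided bound for all n up to N = 2000.

```python
import math

N = 2000
for n in range(1, N + 1):
    gap = math.lgamma(n + 1) - (n * math.log(n) - n)   # log(n!) - (n log n - n)
    assert 1.0 - 1e-9 <= gap <= 2.0 * math.log(N) + 1e-9
# The lower bound is attained at n = 1, where log(1!) - (1*log 1 - 1) = 1.
```

The gap behaves like (1/2) log(2πn) for large n, which stays well below the generous upper bound 2 log N used in the proof.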
The next step in the proof of the local large deviation estimate in part (b) of Theorem 3.1 is to prove the asymptotic formula for card(Ω N,b,m ) stated in part (b) of the next lemma. The proof of this lemma uses Lemma 3.2 in a fundamental way. After the statement of this lemma we show how to apply it and Lemma 3.2 to prove part (b) of Theorem 3.1. An important component of this proof is to calculate the quantity min θ∈P N b ,c R(θ|ρ b,α ), which appears in part (b) of the next lemma. The proof of part (b) of the lemma depends on part (a), which is also used to verify hypothesis (i) of Theorem 4.2 in the setting of Theorem 4.1.
Before proving Lemma 3.3, we derive the local large deviation estimate in part (b) of Theorem 3.1 by applying Lemmas 3.2 and 3.3. An integral part of the proof is to show how the arbitrary value of α ∈ (0, ∞) appearing in these lemmas is replaced by the specific value α_b(c) appearing in Theorem 3.1. As in the statement of part (b) of Theorem 3.1, let ν be any vector in A_{N,b,m}. Substituting the asymptotic formula for log card(∆_{N,b,m;ν}) derived in Lemma 3.2 and the asymptotic formula for log card(Ω_{N,b,m}) given in part (b) of Lemma 3.3 yields (3.19). We now consider the first two terms on the right side of the last line of (3.19). By assertion (ii) in part (f) of Theorem A.1, R(θ_{N,b,ν}|ρ_{b,α}) − min_{θ∈P_{N_b,c}} R(θ|ρ_{b,α}) = R(θ_{N,b,ν}|ρ_{b,α_b(c)}). With this step we have succeeded in replacing the relative entropy R(θ_{N,b,ν}|ρ_{b,α}) with respect to ρ_{b,α}, which appears in Lemma 3.2, by the relative entropy R(θ_{N,b,ν}|ρ_{b,α_b(c)}) with respect to ρ_{b,α_b(c)}, which appears in Theorem 3.1. Substituting the last equation into (3.19) gives the conclusion of part (b) of Theorem 3.1.
We now complete the proof of part (b) of Theorem 3.1 by proving Lemma 3.3.

Proof of Lemma 3.3. (a) To estimate the cardinality of A_{N,b,m}, we write A_{N,b,m} as the disjoint union, over k ∈ {1, 2, . . . , m}, of the subsets of vectors ν ∈ A_{N,b,m} having exactly k positive components.
Thus we can bound the cardinality of A_{N,b,m} by bounding separately the cardinality of each of the disjoint sets in the union. By [2, Cor. 2.5] the number of elements in the set indexed by k equals the binomial coefficient C(N − 1, k − 1). Since by assumption m/N → 0 as N → ∞, for all sufficiently large N the quantities C(N − 1, k − 1) are increasing in k for k ∈ {1, . . . , m} and are maximal when k = m. An application of the weak form of Stirling's formula yields, for all m ≥ 2 and all N ≥ m + 2, a bound implying that (1/N) log card(A_{N,b,m}) → 0 as N → ∞. This completes the proof of part (a).
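The counting fact cited from [2, Cor. 2.5], namely that the number of vectors of k positive integers summing to N (the compositions of N into k parts) equals C(N − 1, k − 1), can be confirmed by brute force for small N; the check below is our own.

```python
from itertools import product
from math import comb

def count_compositions(N, k):
    """Brute-force count of k-tuples of positive integers summing to N."""
    return sum(1 for v in product(range(1, N + 1), repeat=k) if sum(v) == N)

for N in (5, 6):
    for k in range(1, N + 1):
        assert count_compositions(N, k) == comb(N - 1, k - 1)
```

The identity is the classical "stars and bars" count: a composition corresponds to choosing k − 1 of the N − 1 gaps between N unit "stars".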
(b) The starting point is (3.6), which expresses card(Ω_{N,b,m}) as the sum of card(∆_{N,b,m;ν}) over ν ∈ A_{N,b,m}. It follows from part (a) that δ_N → 0 as N → ∞.
We continue with the estimation of card(Ω_{N,b,m}). By Lemma 3.2 and the fact that the logarithm is an increasing function, log card(Ω_{N,b,m}) is controlled by −min over the range of Θ_{N,b} of R(·|ρ_{b,α}). Under the assumption that R(·|ρ_{b,α}) attains its infimum over P_{N_b,c}, we define η_N by (3.21). In the last two paragraphs of this proof, we show that η_N → 0 as N → ∞. Given this fact, the last equation yields the asymptotic formula (3.17) in part (b). We now prove that η_N → 0 as N → ∞. To do this, we use (3.21) to write η_N as a sum of terms. Like the second and third terms on the right side, the first term on the right side is nonnegative because the minimizing measure θ⋆ attains the infimum of R(·|ρ_{b,α}) over P_{N_b,c}. Given the existence of min_{θ∈P_{N_b,c}} R(θ|ρ_{b,α}), this assertion is certainly plausible since, as shown in Corollary B.2, the measures θ_{N,b,ν} are dense in P_{N_b,c}. However, this detail is not needed in the present proof, which we would like to keep as self-contained as possible. We prove (3.22) by applying Theorem B.1 to θ = θ⋆, obtaining a sequence θ^{(N)} with the following properties: θ^{(N)} lies in the range of Θ_{N,b}, θ^{(N)} ⇒ θ⋆, and R(θ^{(N)}|ρ_{b,α}) → R(θ⋆|ρ_{b,α}). The limit in (3.22) follows from these properties and the accompanying inequalities. This completes the proof of Lemma 3.3 and thus the proof of the local estimate in part (b) of Theorem 3.1.
We end this section by explaining the insight behind the key step in the proof of Lemma 3.2, which is to rewrite the sum in line 2 of (3.16) as shown in line 3. This allows us to express the sum in line 3 as the relative entropy R(θ_{N,b,ν}|ρ_{b,α_b(c)}) plus terms that are independent of θ_{N,b,ν}. We now motivate this step. In order to streamline this motivation, we drop all error terms and avoid rigor.
Our starting point is line 2 of (3.16). If we do not rewrite the sum as shown in line 3 of that display, then we obtain a modification of the conclusion of Lemma 3.2, which in turn leads to a modification of Lemma 3.3. Hence it is plausible that as N → ∞ the minimum in the last display can be replaced by a minimum over θ ∈ P_{N_b,c}. To determine this minimum, we introduce two Lagrange multipliers corresponding to the two equality constraints Σ_{j∈N_b} θ_j = 1 and Σ_{j∈N_b} jθ_j = c satisfied by θ ∈ P_{N_b,c}. A formal calculation, which we omit, suggests that the minimum is attained at the unique θ ∈ P_{N_b,c} whose components are proportional to α^j/j! for an appropriate α; i.e., at θ = ρ_{b,α_b(c)}. Except for the error terms, the resulting formula coincides with the conclusion of part (b) of Theorem 3.1. The calculation just presented was our first attempt to prove Lemmas 3.2 and 3.3. It also guided us to the much more efficient current proofs both of Lemma 3.2, where the sum in line 2 of (3.16) is written directly in terms of the relative entropy, and of Lemma 3.3. An analogous but much simpler calculation motivates the solution of a finite-dimensional problem involving the minimum of a relative entropy over a set of probability measures having fixed mean. This simpler calculation is directly related to the present paper because it gives the form of the Maxwell-Boltzmann distribution for a discrete ideal gas. For details see section 6.4 of [10], sections 4-5 of [11], and section 4 of [13], each of which emphasizes different aspects of the calculation. This completes the motivation of the proof of Lemma 3.2.
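The Lagrange-multiplier heuristic can be probed numerically: within a truncated state space one can check that the tilted measure ρ_{1,α_1(c)} has the smallest relative entropy with respect to ρ_{1,α} among mean-c measures. The sketch below is our own; the truncation level J, the reference parameter α = 1, and the competitor measure are arbitrary illustrative choices, and b = 1, c = 2.

```python
import math

b, alpha0, c, J = 1, 1.0, 2.0, 60        # J truncates the (negligible) tail

def restricted_poisson(alpha):
    w = [alpha**j / math.factorial(j) for j in range(b, J)]
    Z = sum(w)
    return [x / Z for x in w]

def mean(p):
    return sum((b + i) * pi for i, pi in enumerate(p))

def rel_ent(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Tilting rho_{1,alpha0} by e^{beta j} stays inside the restricted-Poisson
# family, so the formal minimizer is rho_{1,alpha_1(c)}; bisect for it.
lo, hi = 1e-9, 50.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mean(restricted_poisson(mid)) < c else (lo, mid)
star = restricted_poisson((lo + hi) / 2)

rho = restricted_poisson(alpha0)
R_star = rel_ent(star, rho)

# Mixing star with any other mean-c measure keeps the mean but raises R.
other = [0.0] * (J - b)
other[0], other[2] = 0.5, 0.5            # mass 1/2 at j = 1 and j = 3: mean 2
for lam in (0.25, 0.5, 0.75):
    theta = [(1 - lam) * s + lam * o for s, o in zip(star, other)]
    assert abs(mean(theta) - c) < 1e-6
    assert rel_ent(theta, rho) > R_star
```

This is only a spot check along one line segment in P_{N_b,c}; uniqueness of the minimizer is the content of the strict convexity argument in Theorem A.1.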
In the next section we show how the local large deviation estimate in part (b) of Theorem 3.1 yields the LDP in Theorem 2.1. The limits in the next theorem will be used to prove the large deviation upper bound for closed subsets in part (c) of Theorem 2.1. (a) Let θ be a measure in P_{N_b,c} and take r > 0. Then for any open ball B_π(θ, r) in P_{N_b,c}, R(B_π(θ, r)|ρ_{b,α_b(c)}) is finite, and we have the large deviation limit lim_{N→∞} (1/N) log P_{N,b,m}(Θ_{N,b} ∈ B_π(θ, r)) = −R(B_π(θ, r)|ρ_{b,α_b(c)}). (b) Let θ be a measure in P_{N_b,[b,c]} and take r > 0. Then the set B̄_π(θ, r) ∩ P_{N_b,c} is nonempty, R(B̄_π(θ, r) ∩ P_{N_b,c}|ρ_{b,α_b(c)}) is finite, and we have the corresponding large deviation limit. We prove Theorem 4.1 by applying the local large deviation estimate in part (b) of Theorem 3.1. A key step is to approximate probability measures in B_π(θ, r) and in B̄_π(θ, r) ∩ P_{N_b,c} by appropriate sequences of probability measures in the range of Θ_{N,b}. This procedure allows one to show in part (a) that the infimum R(B_π(θ, r)|ρ_{b,α_b(c)}) can be approximated by the infimum of R(σ|ρ_{b,α_b(c)}) over σ lying in the intersection of B_π(θ, r) and the range of Θ_{N,b}; a similar statement holds for the infimum in part (b). A set of hypotheses that allows one to carry out this approximation procedure is given in Theorem 4.2, a general formulation that yields Theorem 4.1 as a special case.
Theorem 4.2 is formulated for a complete, separable metric space X containing a relatively compact subset W that is not closed. We define Z to be the closure of W in X. In the application to Theorem 4.1, X equals P_{N_b}, the set of probability measures on N_b; W equals P_{N_b,c}, the subset of P_{N_b} containing the probability measures with mean c; and Z equals P_{N_b,[b,c]}. For N ∈ N let (Ω_N, F_N, Q_N) be a sequence of probability spaces. Let X be a complete, separable metric space, W a relatively compact subset of X that is not closed and thus not compact, and Z the closure of W in X; thus Z is compact. Also let Y_N be a sequence of random vectors mapping Ω_N into W, and let I be a function mapping X into [0, ∞]. For A a subset of X we denote the infimum of I over A by I(A). We assume the following four hypotheses.
(i) For ω ∈ Ω_N the range of Y_N(ω) is a finite subset W_N of W, and the cardinality of W_N satisfies lim_{N→∞} (1/N) log card(W_N) = 0. (ii) For each y ∈ W_N we have I(y) < ∞ and the local large deviation estimate Q_N(Y_N = y) = exp[−N(I(y) − ε_N(y))], where ε_N(y) → 0 as N → ∞ uniformly for y ∈ W_N. (iii) There exists a dense subset D of W such that I(y) < ∞ for all y ∈ D.
(iv) For any y ∈ W satisfying I(y) < ∞, there exists a sequence y_N ∈ W_N for which y_N → y and I(y_N) → I(y) as N → ∞.
Under these hypotheses the following conclusions hold.
For the last sum in this equation we have upper and lower bounds involving card(B ∩ W_N); in addition, for the term max_{y∈B∩W_N} exp[−N(I(y) − ε_N(y))] we have upper and lower bounds in terms of exp[−N I(B ∩ W_N)]. It follows that (1/N) log Q_N(Y_N ∈ B) differs from −I(B ∩ W_N) by terms that vanish as N → ∞. Since ε_N(y) → 0 uniformly for y ∈ W_N, by hypothesis (i) the proof is done once we show that lim sup_{N→∞} I(B ∩ W_N) ≤ I(B). (4.2) For any δ > 0 there exists y⋆ ∈ B such that I(y⋆) ≤ I(B) + δ < ∞. Hypothesis (iv) guarantees the existence of a sequence y_N ∈ W_N such that y_N → y⋆ and I(y_N) → I(y⋆). Since for all sufficiently large N we have y_N ∈ B ∩ W_N, it follows that I(B ∩ W_N) ≤ I(y_N). Hence lim sup_{N→∞} I(B ∩ W_N) ≤ I(y⋆) ≤ I(B) + δ. Taking δ → 0 gives (4.2) and thus proves the limit (4.1). This completes the proof of part (a).
To prove the limit in part (b), we proceed as in the proof of the limit in part (a), replacing the set B in part (a) by the set B̄ ∩ W. Since W_N ⊂ W, we have B̄ ∩ W ∩ W_N = B̄ ∩ W_N. By the local large deviation estimate in hypothesis (ii), exactly as in the proof of part (a), the proof is done once we show that lim sup_{N→∞} I(B̄ ∩ W_N) ≤ I(B̄ ∩ W). (4.4) Since ε_N(y) → 0 uniformly for y ∈ W_N, hypothesis (i) again controls the remaining terms. For any δ > 0 there exists y⋆ ∈ B̄ ∩ W such that I(y⋆) ≤ I(B̄ ∩ W) + δ < ∞. Hypothesis (iv) guarantees the existence of a sequence y_N ∈ W_N such that y_N → y⋆ and I(y_N) → I(y⋆).
Since for all sufficiently large N we have y_N ∈ B̄ ∩ W_N, it follows that I(B̄ ∩ W_N) ≤ I(y_N). Hence lim sup_{N→∞} I(B̄ ∩ W_N) ≤ I(y⋆) ≤ I(B̄ ∩ W) + δ. Taking δ → 0 gives (4.4) and thus proves the limit (4.3). This completes the proof of part (b) and thus the proof of the theorem.
We now prove Theorem 4.1 as an application of Theorem 4.2, in which we make the following identifications for N ∈ N. • The function I is the relative entropy R(·|ρ_{b,α_b(c)}).
The LDP in the next theorem has a number of unique features because W is not a closed subset of X. The large deviation upper bound takes two forms depending on whether the subset F of W is compact or closed. When F is compact, in part (a) we obtain the standard large deviation bound for F with −I(F) on the right-hand side. When F is closed, in part (b) we obtain a different form of the standard large deviation upper bound: −I(F) on the right-hand side is replaced by −I(F̄), where F̄ is the closure of F in the compact space Z. When F is compact, its closure in Z is F itself. In this case the large deviation upper bounds in parts (a) and (b) coincide.
Then, as N → ∞, with respect to the measures Q N , the sequence Y N satisfies the LDP on W with rate function I in the following sense.
(a) For any compact subset F of W we have the large deviation upper bound lim sup_{N→∞} (1/N) log Q_N(Y_N ∈ F) ≤ −I(F).

(b) For any closed subset F of W we have the large deviation upper bound lim sup_{N→∞} (1/N) log Q_N(Y_N ∈ F) ≤ −I(F̄),
where F̄ denotes the closure of F in Z. This corresponds to the limit (4.6) in Theorem 4.3. In the application to Theorem 2.1, W is the relatively compact, nonclosed subset P_{N_b,c} of X = P_{N_b} and Z is the compact subset P_{N_b,[b,c]} of P_{N_b}. According to parts (a) and (b) of Theorem A.1, R(·|ρ_{b,α_b(c)}) maps P_{N_b,c} into [0, ∞] and is lower semicontinuous on P_{N_b}, while part (d) of that theorem proves that R(·|ρ_{b,α_b(c)}) has compact level sets in P_{N_b,c}. This last property of the relative entropy is needed for part (a) of Theorem 2.1.

Proof of Theorem 4.3. We prove the three large deviation bounds in the order (c), (a), and (b).
(c) Let G be any open subset of W. We denote by τ the metric on X. For any point x ∈ G there exists ε > 0 such that the open ball B_τ(x, ε) = {y ∈ W : τ(x, y) < ε} is a subset of G. The limit (4.5) implies that lim inf_{N→∞} (1/N) log Q_N(Y_N ∈ G) ≥ lim_{N→∞} (1/N) log Q_N(Y_N ∈ B_τ(x, ε)) = −I(B_τ(x, ε)) ≥ −I(x). Since x is an arbitrary point in G, it follows that lim inf_{N→∞} (1/N) log Q_N(Y_N ∈ G) ≥ −I(G). This completes the proof of the large deviation lower bound for any open set G in W.
(a) Let F be any compact subset of W. We first prove the large deviation upper bound for F under the assumption that I(F) < ∞; the proof when I(F) = ∞ is given afterward. We start by showing that for each x ∈ F, lim inf_{ε→0+} I(B_τ(x, ε)) ≥ I(F). (4.7) Let ε_n be any positive sequence converging to 0, and take any δ > 0. For any n ∈ N there exists x_n ∈ B_τ(x, ε_n) such that I(B_τ(x, ε_n)) + δ ≥ I(x_n). Since x_n → x, the lower semicontinuity of I on W and the fact that x ∈ F imply that lim inf_{n→∞} I(B_τ(x, ε_n)) + δ ≥ lim inf_{n→∞} I(x_n) ≥ I(x) ≥ I(F). Sending δ → 0 yields (4.7) because ε_n is an arbitrary positive sequence converging to 0. We now prove the large deviation upper bound in part (a). Take any η > 0. By (4.7), for each x ∈ F there exists ε_x > 0 such that I(B_τ(x, ε_x)) ≥ I(F) − η. The open balls {B_τ(x, ε_x), x ∈ F} cover F. Since F is compact, there exist T < ∞ and finitely many points x_i ∈ F, i = 1, 2, . . . , T, such that the balls B_τ(x_i, ε_{x_i}) cover F. By Lemma 1.2.15 in [6] and by the limit (4.5) applied to each of these balls, lim sup_{N→∞} (1/N) log Q_N(Y_N ∈ F) ≤ −I(F) + η. (4.8) Sending η → 0, we obtain the large deviation upper bound for any compact subset F of W under the assumption that I(F) < ∞. We now assume that I(F) = ∞, which implies that I(x) = ∞ for each x ∈ F. The proof of the large deviation upper bound when I(F) = ∞ rests on the assertion that for each x ∈ F there exists ε_x > 0 such that I(B_τ(x, ε_x)) = ∞. Indeed, if this assertion were false, then there would exist a sequence x_n ∈ W satisfying I(x_n) < ∞ and x_n → x. Since I is lower semicontinuous on W, it would follow that lim inf_{n→∞} I(x_n) ≥ I(x) = ∞, which in turn would imply that I(x_n) = ∞ for all sufficiently large n. This contradiction completes the proof that for each x ∈ F there exists ε_x > 0 such that I(B_τ(x, ε_x)) = ∞. As in the case when I(F) < ∞, the open balls {B_τ(x, ε_x), x ∈ F} cover F. Since F is compact, finitely many of these balls cover F. By the same steps as in (4.8), lim sup_{N→∞} (1/N) log Q_N(Y_N ∈ F) = −∞. This completes the proof of the large deviation upper bound for any compact subset F of W when I(F) = ∞. The proof of part (a) is complete.
(b) Let F be any closed subset of W. We claim that F equals F̄ ∩ W, where F̄ is the closure of F in Z. Since Z is compact, the closed subset F̄ is also compact. Clearly F ⊂ F̄ ∩ W. On the other hand, any x ∈ F̄ ∩ W is a limit point lying in W of a sequence x_n in F. Since F is closed in W, any x ∈ F̄ ∩ W lies in F. This completes the proof that F = F̄ ∩ W, a special case of a general result in topology stated in Theorem 17.2 of [18].
We first prove the large deviation upper bound for F under the assumption that I(F̄) < ∞; the proof when I(F̄) = ∞ is given afterward. The proof proceeds as in part (a), essentially by replacing the balls B_τ(x, ε) for x ∈ W by B̃_τ(x, ε) ∩ W for x ∈ Z, where B̃_τ(x, ε) = {y ∈ Z : τ(x, y) < ε}. As in the proof of part (a), we start by showing that for each x ∈ F̄, lim inf_{ε→0+} I(B̃_τ(x, ε) ∩ W) ≥ I(F̄). (4.9) Let ε_n be any positive sequence converging to 0, and take any δ > 0. For any n ∈ N there exists x_n ∈ B̃_τ(x, ε_n) ∩ W such that I(B̃_τ(x, ε_n) ∩ W) + δ ≥ I(x_n). Since x_n → x, the lower semicontinuity of I and the fact that x ∈ F̄ imply that lim inf_{n→∞} I(x_n) ≥ I(x) ≥ I(F̄). Sending δ → 0 yields (4.9) because ε_n is an arbitrary positive sequence converging to 0. We now prove the large deviation upper bound in part (b). Take any η > 0. By (4.9), for each x ∈ F̄ there exists ε_x > 0 such that I(B̃_τ(x, ε_x) ∩ W) ≥ I(F̄) − η. The open balls {B̃_τ(x, ε_x), x ∈ F̄} cover F̄. Since F̄ is compact, there exist T < ∞ and finitely many points x_i ∈ F̄, i = 1, 2, . . . , T, such that these balls cover F̄. Since F = F̄ ∩ W, we have again by Lemma 1.2.15 in [6] lim sup_{N→∞} (1/N) log Q_N(Y_N ∈ F) ≤ max_{1≤i≤T} lim sup_{N→∞} (1/N) log Q_N(Y_N ∈ B̃_τ(x_i, ε_{x_i}) ∩ W). We now apply the limit (4.6) to each term on the right side, obtaining (4.10) and (4.11). Sending η → 0, we obtain the large deviation upper bound for any closed subset F of W under the assumption that I(F̄) < ∞.
We now assume that I(F̄) = ∞, which implies that I(x) = ∞ for each x ∈ F̄. The proof of the large deviation upper bound when I(F̄) = ∞ rests on the assertion that for each x ∈ F̄ there exists ε_x > 0 such that I(B̃_τ(x, ε_x) ∩ W) = ∞. As in the proof of part (a), this assertion is a consequence of the lower semicontinuity of I. As in the proof of the large deviation upper bound when I(F̄) < ∞, the open balls {B̃_τ(x, ε_x), x ∈ F̄} cover F̄. Since F̄ is compact, there exist T < ∞ and finitely many points x_i ∈ F̄, i = 1, 2, . . . , T, such that these balls cover F̄. By the same steps as in (4.10) and (4.11), lim sup_{N→∞} (1/N) log Q_N(Y_N ∈ F) = −∞. This completes the proof of the large deviation upper bound for any closed subset F of W when I(F̄) = ∞. The proof of part (b) as well as the proof of the theorem are done.
This paper contains four appendices. In appendix A we prove properties of the relative entropy needed in the paper. Theorem B.1 in appendix B states a basic approximation result that is applied in two crucial places in the paper. In appendix C we study a number of properties of the quantity α b (c) appearing in part (a) of Theorem 3.1. In appendix D we discuss why we impose the constraint involving m = m(N) in the definitions of Ω N,b,m in (2.1) and P N,b,m in (2.3) and how, if this constraint could be eliminated, then our results could be formulated in a more natural way.

Appendices A Properties of Relative Entropy
We fix a nonnegative integer b and a real number c ∈ (b, ∞). Given a probability measure θ on N_b = {n ∈ Z : n ≥ b}, the mean ∫_{N_b} x θ(dx) of θ is denoted by ⟨θ⟩. In Theorem A.1 we study properties of the relative entropies R(θ|ρ_{b,α}) and R(θ|ρ_{b,α_b(c)}) for θ in each of the following three spaces: P_{N_b}, the set of probability measures on N_b; P_{N_b,c}, the set of θ ∈ P_{N_b} satisfying ⟨θ⟩ = c; and P_{N_b,[b,c]}, the set of θ ∈ P_{N_b} satisfying ⟨θ⟩ ∈ [b, c]. The Prohorov metric introduces a topology on P_{N_b} that is equivalent to the topology of weak convergence. These three spaces have the following properties: P_{N_b} is a complete, separable metric space; P_{N_b,c} is a relatively compact, separable subset of P_{N_b} that is not closed in P_{N_b} and therefore is not complete; P_{N_b,[b,c]} is the closure of P_{N_b,c} in P_{N_b} and is a compact, separable metric space. The properties of P_{N_b} are proved in Theorems 3.3.1 and 3.1.7 of [14], and the properties of P_{N_b,c} and P_{N_b,[b,c]} are proved in Theorem 2.4.
We recall that for α ∈ (0, ∞), ρ_{b,α} denotes the Poisson distribution on N_b having components ρ_{b,α;j} = (1/Z_b(α)) · α^j/j! for j ∈ N_b, where Z_0(α) = e^α and, for b ∈ N, Z_b(α) = e^α − Σ_{j=0}^{b−1} α^j/j!. According to part (a) of Theorem 3.1 there exists a unique value α_b(c) of α for which the mean of ρ_{b,α} equals c. Assertion (ii) in part (f) of the next theorem plays an important role in the main part of the paper: after the statement of Lemma 3.3 we use this assertion to show that the arbitrary parameter α in Lemmas 3.2 and 3.3 must have the value α_b(c) in Theorem 3.1. Fix a nonnegative integer b and a real number c ∈ (b, ∞). For any α ∈ (0, ∞) the relative entropy R(θ|ρ_{b,α}) = Σ_{j∈N_b} θ_j log(θ_j/ρ_{b,α;j}) has the following properties. (b) R(·|ρ_{b,α}) is a convex, lower semicontinuous function on P_{N_b}; in other words, for θ and σ in P_{N_b}, λ ∈ (0, 1), and θ^{(N)} a sequence in P_{N_b} converging weakly to θ, R(λθ + (1−λ)σ|ρ_{b,α}) ≤ λR(θ|ρ_{b,α}) + (1−λ)R(σ|ρ_{b,α}) and lim inf_{N→∞} R(θ^{(N)}|ρ_{b,α}) ≥ R(θ|ρ_{b,α}). (c) R(·|ρ_{b,α}) is strictly convex on appropriate subsets A; in other words, if θ ≠ σ are two measures in A, then for λ ∈ (0, 1) the convexity inequality is strict. (d) R(·|ρ_{b,α}) has compact level sets in P_{N_b}, in P_{N_b,[b,c]}, and in P_{N_b,c}; in other words, for Y equal to any of these three spaces and any M < ∞, the set {θ ∈ Y : R(θ|ρ_{b,α}) ≤ M} is compact. (e) Define g(α, b, c) = c log(α_b(c)/α) + log(Z_b(α)/Z_b(α_b(c))), where Z_0(α) = e^α and, for b ∈ N, Z_b(α) = e^α − Σ_{j=0}^{b−1} α^j/j!. Then for any θ ∈ P_{N_b,c}, R(θ|ρ_{b,α}) = R(θ|ρ_{b,α_b(c)}) + g(α, b, c).
(f) The following two assertions hold.
(i) R(θ|ρ_{b,α}) attains its infimum over θ ∈ P_{N_b,c} at the unique measure θ = ρ_{b,α_b(c)}. Proof of (d). Each level set of R(·|ρ_{b,α}) in P_{N_b,[b,c]} is a closed subset of the compact space P_{N_b,[b,c]} and hence is a compact subset of P_{N_b,[b,c]}. This completes the proof that R(·|ρ_{b,α}) has compact level sets in P_{N_b,[b,c]}. The proof that R(·|ρ_{b,α}) has compact level sets in P_{N_b,c} is more subtle. If θ^{(n)} is any sequence in P_{N_b,c} satisfying R(θ^{(n)}|ρ_{b,α}) ≤ M, then since θ^{(n)} ∈ P_{N_b} and R(·|ρ_{b,α}) has compact level sets in P_{N_b}, there exist θ ∈ P_{N_b} and a subsequence θ^{(n′)} such that θ^{(n′)} ⇒ θ and R(θ|ρ_{b,α}) ≤ M. To complete the proof that R(·|ρ_{b,α}) has compact level sets in P_{N_b,c}, we must show that θ ∈ P_{N_b,c}; i.e., that ⟨θ⟩ = c. By Fatou's Lemma ⟨θ⟩ ≤ c. In addition, for any w ∈ (0, ∞), Lemma 5.1 in [7] implies that the sequence θ^{(n′)} is uniformly integrable; it follows that ⟨θ^{(n′)}⟩ → ⟨θ⟩ and hence that ⟨θ⟩ = c. (e) For any θ ∈ P_{N_b,c} we have Σ_{j∈N_b} θ_j = 1 and Σ_{j∈N_b} jθ_j = c. Hence R(θ|ρ_{b,α}) = R(θ|ρ_{b,α_b(c)}) + g(α, b, c). This completes the proof of part (e).
(f) (i) As stated in part (b), R(·|ρ_{b,α}) is lower semicontinuous on P_{N_b} and thus on P_{N_b,c}. Since R(·|ρ_{b,α}) has compact level sets in P_{N_b,c}, it attains its infimum over P_{N_b,c}. By part (a), R(·|ρ_{b,α_b(c)}) attains its minimum value of 0 over P_{N_b,c} at the unique measure ρ_{b,α_b(c)}. Hence part (e) implies that the minimum value of R(·|ρ_{b,α}) over P_{N_b,c} equals g(α, b, c); the last equality follows by applying part (e) with θ = ρ_{b,α_b(c)}. This display shows that R(·|ρ_{b,α}) attains its infimum over P_{N_b,c} at ρ_{b,α_b(c)}. Now assume that R(·|ρ_{b,α}) attains its infimum over P_{N_b,c} at another measure θ⋆ ≠ ρ_{b,α_b(c)}. Then for any λ ∈ (0, 1), the strict convexity in part (c) gives R(λθ⋆ + (1−λ)ρ_{b,α_b(c)}|ρ_{b,α}) < λR(θ⋆|ρ_{b,α}) + (1−λ)R(ρ_{b,α_b(c)}|ρ_{b,α}), in which the right side equals the minimum value while the left side is at least the minimum value. The equality of the extreme terms contradicts the strict inequality, proving that R(·|ρ_{b,α}) attains its infimum over P_{N_b,c} at the unique measure ρ_{b,α_b(c)}. This completes the proof of assertion (i) in part (f).
(ii) By assertion (i) min θ∈P N b ,c R(θ|ρ b,α ) = g(α, b, c). Substituting this into part (e) yields assertion (ii). This completes the proof of part (f). The proof of Theorem A.1 is done.
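Part (e) of Theorem A.1 can be checked numerically. The sketch below is a minimal illustration, assuming the component formula ρ b,α;j = α^j/(j! Z b (α)); for two different finitely supported measures with mean c, the difference R(θ|ρ b,α₁ ) − R(θ|ρ b,α₂ ) is the same constant, which a direct computation identifies as c log(α₂/α₁) + log(Z b (α₁)/Z b (α₂)).

```python
import math

def Z(b, alpha):
    # Z_b(alpha) = e^alpha - sum_{j=0}^{b-1} alpha^j / j!  (Z_0(alpha) = e^alpha)
    return math.exp(alpha) - sum(alpha**j / math.factorial(j) for j in range(b))

def rho(b, alpha, j):
    # assumed component formula of the restricted Poisson distribution rho_{b,alpha}
    return alpha**j / (math.factorial(j) * Z(b, alpha))

def rel_entropy(theta, b, alpha):
    # R(theta | rho_{b,alpha}) for a finitely supported theta given as {j: theta_j}
    return sum(t * math.log(t / rho(b, alpha, j)) for j, t in theta.items() if t > 0)

b, c = 1, 2.0
# two different probability measures on N_1, both with mean c = 2
theta = {1: 0.5, 2: 0.2, 3: 0.15, 4: 0.1, 5: 0.05}
sigma = {1: 0.25, 2: 0.5, 3: 0.25}
a1, a2 = 0.8, 1.7
d_theta = rel_entropy(theta, b, a1) - rel_entropy(theta, b, a2)
d_sigma = rel_entropy(sigma, b, a1) - rel_entropy(sigma, b, a2)
# the difference depends only on (a1, a2, b, c), not on the measure
g_diff = c * math.log(a2 / a1) + math.log(Z(b, a1) / Z(b, a2))
```

Since both test measures have mean c, the θ-dependent terms cancel in the difference, which is exactly the mechanism behind part (e).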
This completes our discussion of properties of the relative entropy. The main theorem in appendix B is a basic approximation result that is applied in two crucial places in the paper.
Fix a nonnegative integer b and a rational number c ∈ (b, ∞). P N b ,c is the set of probability measures on N b = {n ∈ Z : n ≥ b} having mean c. We recall the definitions of the sets A N,b,m and B N,b,m , which are introduced at the beginning of section 3, as well as the formula defining A N,b,m . For any θ ∈ P N b ,c , Theorem B.1 constructs a sequence θ (N ) lying in B N,b,m with the following properties: (a) θ (N ) ⇒ θ as N → ∞.
We also need the following corollary, which is applied to verify hypothesis (iii) in Theorem 4.2 in the setting of Theorem 4.1. It also shows that P N b ,c is separable, a fact needed in parts (a) and (b) of Theorem 2.4. Proof. Given any θ ∈ P N b ,c and any ε > 0, let B π (θ, ε) denote the open ball with center θ and radius ε defined in terms of the Prohorov metric π. We apply part (a) of Theorem B.1 with α = α b (c). Since the measures θ (N ) constructed in part (a) of that theorem converge weakly to θ, for all sufficiently large N we have θ (N ) ∈ B π (θ, ε). The fact that only finitely many of the components θ (N ) j are nonzero implies that R(θ (N ) |ρ b,α b (c) ) < ∞ for all N. Since ∪ N ∈N B N,b,m is a countable set, the proof is complete. Given θ ∈ P N b ,c , we determine a sequence ν (N ) ∈ A N,b,m such that the probability measures θ (N ) with components θ (N ) j = ν (N ) j /N converge weakly to θ. The definition below specifies the components of ν (N ) for all j ∈ N b except for the two values j = j ⋆ and j = j ⋆ + 1. The two components corresponding to these two values of j will then be defined so that ν (N ) satisfies the two summation constraints in the definition of A N,b,m . In order to simplify the notation, the components ν (N ) j are written as ν j . For x ∈ R we denote by ⌊x⌋ the largest integer less than or equal to x. The definition of the components is the following: We make a few simple observations. If j ⋆ = b, then the first line of this definition is vacuous.
In addition, for b ≤ j ≤ j ⋆ − 1, we have ν j /N = 0 = θ j . If for some j satisfying j ⋆ + 2 ≤ j ≤ j ⋆ + m − 1 we have θ j = 0, then ν j = 0. We now define ν j for j = j ⋆ and j = j ⋆ + 1 so that ν j /N → θ j for these two values and so that the following two summation constraints in the definition of A N,b,m are valid: Σ_{j∈N b } ν j = N and Σ_{j∈N b } jν j = cN. With these definitions of ν j ⋆ and ν j ⋆ +1 , we have |ν| + ≤ m. According to part (d) of Lemma B.3, the resulting vector ν lies in A N,b,m for all sufficiently large N.
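The construction just described can be sketched numerically. In the minimal Python sketch below (the measure θ, the lattice size N, and the window size m are illustrative choices, not taken from the paper), ν j = ⌊Nθ j ⌋ away from j ⋆ and j ⋆ + 1, and the two remaining components are obtained by solving the two summation constraints Σ ν j = N and Σ jν j = cN.

```python
import math

# illustrative target measure theta on N_1 with mean c = 2 and finite support
theta = {1: 0.5, 2: 0.2, 3: 0.15, 4: 0.1, 5: 0.05}
b, c, N = 1, 2, 1000
K = c * N
j_star = min(j for j, t in theta.items() if t > 0)  # smallest index with theta_j > 0
m = 10                                              # window wide enough for the support

# nu_j = floor(N * theta_j) on the window {j_star + 2, ..., j_star + m - 1}
nu = {j: math.floor(N * theta.get(j, 0.0)) for j in range(j_star + 2, j_star + m)}
s0 = sum(nu.values())
s1 = sum(j * v for j, v in nu.items())

# solve the two summation constraints sum_j nu_j = N and sum_j j * nu_j = c * N
nu[j_star + 1] = K - s1 - j_star * (N - s0)
nu[j_star] = N - s0 - nu[j_star + 1]

theta_N = {j: v / N for j, v in nu.items()}  # the approximating probability measure
```

Both constraints hold exactly by construction, and the components ν j /N track θ j , which is the content of parts (c) and (d) of Lemma B.3 in this simple case.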
In order to keep the notation manageable, we introduce the set of m − 2 indices Φ(j ⋆ , m) = {j ∈ N : j ⋆ + 2 ≤ j ≤ j ⋆ + m − 1}. Since ν j = 0 for b ≤ j ≤ j ⋆ − 1 and for j ≥ j ⋆ + m, the two equalities in (B.3) can be rewritten in the form of two linear equations for the two unknowns ν j ⋆ and ν j ⋆ +1 . Solving them for the two unknowns and inserting ν j = ⌊Nθ j ⌋ for j ∈ Φ(j ⋆ , m), we obtain the following definitions of ν j ⋆ and ν j ⋆ +1 : The next lemma states a number of facts about ν j for j ∈ N b that are needed to prove Theorem B.1. Parts (a) and (b) give upper and lower bounds on ν j ⋆ and ν j ⋆ +1 that follow from (B.6) and (B.7). The reason for imposing the condition that m 2 /N → 0 as N → ∞ in Theorem B.1 is the appearance of this quantity as an error term in parts (a) and (b). Part (c) focuses on the convergence of ν j /N to θ j for j ⋆ ≤ j ≤ j ⋆ + m − 1. Part (d) shows that for all sufficiently large N the vector ν (N ) with components ν j is an element of A N,b,m and the measure θ (N ) with components θ (N ) j = ν j /N is an element of B N,b,m . In order to prove part (b) of Theorem B.1 concerning the convergence R(θ (N ) |ρ b,α ) → R(θ|ρ b,α ), we will use the fact, stated in part (e), that for all j ∈ N b satisfying j ≠ j ⋆ + 1 we have θ (N ) j ≤ θ j . The conclusion of part (f) is that such a bound does not exist for j = j ⋆ + 1 and that in general there does not exist M < ∞ such that for any N ∈ N, ν j ⋆ +1 /N ≤ Mθ j ⋆ +1 . Fix a nonnegative integer b and a rational number c ∈ (b, ∞), and let θ be any probability measure in P N b ,c . Let m be the function m(N) appearing in the definition of Ω N,b,m in (2.1) and satisfying m(N) → ∞ and m(N) 2 /N → 0 as N → ∞. We define β m = Σ_{j≥j ⋆ +m} θ j and γ m = Σ_{j≥j ⋆ +m} jθ j ; since θ ∈ P N b ,c , β m → 0 and γ m → 0 as N → ∞. The following conclusions hold.
Proof. (a) We first prove the lower bound. According to (B.2), ν j ≥ N(θ j − 1/N) for all j ∈ Φ(j ⋆ , m). Since for all j ∈ Φ(j ⋆ , m) we have j > j ⋆ + 1, the first line of (B.6) implies that We now use the facts that In addition Substituting (B.9), (B.10), and (B.11) into the last expression in (B.8), we conclude that This is the lower bound in part (a). We now prove the upper bound in part (a). According to (B.2), ν j ≤ Nθ j for all j ∈ Φ(j ⋆ , m). Since for all j ∈ Φ(j ⋆ , m) we have j > j ⋆ + 1, the first line of (B.6) implies that Except for the absence of the term containing 1/N, this is the same expression that appears in the second line of (B.8). Hence by a calculation similar to that yielding the lower bound in part (a) We now use the fact that Substituting this inequality into the preceding display shows that ν j ⋆ ≤ Nθ j ⋆ . This is the upper bound in part (a). The proof of part (a) is complete.
(b) We first prove the upper bound. According to (B.2), ν j ≥ N(θ j − 1/N) for all j ∈ Φ(j ⋆ , m). Since for all j ∈ Φ(j ⋆ , m) we have j > j ⋆ , the first line of (B.7) implies that As in the proof of (B.11), Substituting this inequality as well as the equalities in (B.9) and (B.10) into the last expression in (B.12), we conclude that This is the upper bound in part (b).
We now prove the lower bound in part (b). According to (B.2), ν j ≤ Nθ j for all j ∈ Φ(j ⋆ , m). Since for all j ∈ Φ(j ⋆ , m) we have j > j ⋆ , the first line of (B.7) implies that Except for the absence of the term containing 1/N, this is the same expression that appears in the second line of (B.12). Hence by a calculation similar to that yielding the upper bound in part (b), we obtain the second inequality in part (b). We now use the fact that This is the third inequality in part (b). The proof of part (b) is complete.
(c) For j = j ⋆ and j = j ⋆ + 1 the limits lim N →∞ ν j /N = θ j are immediate consequences of parts (a) and (b) since each of the quantities β m , γ m , and m 2 /N converges to 0 as N → ∞. For j ∈ N satisfying j ≥ j ⋆ + 2 the limit lim N →∞ ν j /N = θ j follows from (B.2) and the fact that m → ∞ as N → ∞. Finally, for j ∈ N b satisfying b ≤ j ≤ j ⋆ − 1, ν j /N = 0 = θ j . The proof of part (c) is complete.
(d) According to (B.1), for all j ∈ N b satisfying j ≠ j ⋆ , j ⋆ + 1 we have ν j ∈ N 0 for all N. We now consider ν j ⋆ . As N → ∞, each of the quantities β m , γ m , and m 2 /N converges to 0. Since θ j ⋆ > 0, it follows from the lower bound in part (a) of this lemma that ν j ⋆ > 0 for all sufficiently large N. The definition of ν j ⋆ in (B.6) shows that ν j ⋆ is an integer for all N. It follows that ν j ⋆ ∈ N for all sufficiently large N. Finally we consider ν j ⋆ +1 . The lower bound in part (b) of this lemma shows that ν j ⋆ +1 ≥ 0. The definition of ν j ⋆ +1 in (B.7) shows that ν j ⋆ +1 is an integer for all N. It follows that ν j ⋆ +1 ∈ N 0 for all N. We conclude that for all sufficiently large N the vector ν (N ) is an element of N N 0 . In addition, since ν j = 0 for all j ∈ N b satisfying b ≤ j ≤ j ⋆ − 1 and j ≥ j ⋆ + m, we have |ν (N ) | + ≤ m; i.e., at most m of the components ν j are positive. These correspond to the indices j ∈ N b satisfying j ⋆ ≤ j ≤ j ⋆ + m − 1. (f) Assume that θ j ⋆ +1 > 0. By the upper bound in part (b) of this lemma, γ m − j ⋆ β m + m 2 /N → 0 as N → ∞. Hence for all sufficiently large N, ν j ⋆ +1 /N ≤ 2θ j ⋆ +1 . However, even if θ j ⋆ +1 > 0, the upper bound ν j ⋆ +1 /N ≤ θ j ⋆ +1 cannot hold for any N because of the three additional terms in the upper bound in part (b); while γ m and β m can be 0 for sufficiently large N, the term m 2 /N > 0 for all N. This proves the first two assertions in part (f). Concerning the third assertion, let us see how the bound ν j ⋆ +1 /N ≤ Mθ j ⋆ +1 can fail. We assume that θ j ⋆ +1 = 0 and that there exists a subsequence j ′ → ∞ such that θ j ′ > 0 along this subsequence. By the lower bound in part (a) of this lemma Since θ j ′ > 0 along the subsequence j ′ → ∞, it follows that for all N ∈ N and all j ′ Since θ j ⋆ +1 = 0 and ν j ⋆ +1 /N > 0 for all N ∈ N, the bound ν j ⋆ +1 /N ≤ Mθ j ⋆ +1 cannot hold for any M < ∞. This completes the proof of part (f). The proof of Lemma B.3 is done.
We are now ready to prove Theorem B.1. Given θ ∈ P N b ,c , θ (N ) in this theorem is the sequence with components θ (N ) j = ν j /N, where the components ν j are defined in (B.1), (B.6), and (B.7). In the proof of the theorem we work with sufficiently large N ∈ N guaranteeing, according to part (d) of Lemma B.3, that θ (N ) is a probability measure lying in B N,b,m . Proof of part (a) of Theorem B.1. We prove that θ (N ) ⇒ θ by showing that for any bounded function f on N b the sums Σ_{j∈N b } f (j)θ (N ) j converge to Σ_{j∈N b } f (j)θ j as N → ∞. We use the facts that ν j = 0 = θ j for b ≤ j ≤ j ⋆ − 1, ν j = 0 for j ≥ j ⋆ + m, and ν j ≤ Nθ j for j ∈ Φ(j ⋆ , m). These facts, which follow from (B.1) and (B.2), give the upper bound By part (c) of Lemma B.3 ν j ⋆ /N → θ j ⋆ and ν j ⋆ +1 /N → θ j ⋆ +1 as N → ∞. Since m/N → 0 and Σ_{j≥j ⋆ +m} θ j → 0 as N → ∞, it follows that This completes the proof of part (a) of Theorem B.1.

Proof of part (b) of Theorem B.1. Let θ be a probability measure in P N b ,c .
We use the following facts.

For all
2. For all j ∈ N b satisfying j ≠ j ⋆ + 1, we have θ (N ) j ≤ θ j . By contrast, the bound θ (N ) j ⋆ +1 ≤ θ j ⋆ +1 is not valid for any N, and in general there does not exist M < ∞ such that for any N ∈ N, θ (N ) j ⋆ +1 ≤ Mθ j ⋆ +1 . Because of this anomaly the term in R(θ (N ) |ρ b,α ) corresponding to j = j ⋆ + 1 must be handled separately. To prove part (b) of Theorem B.1 we must justify the following interchange of the limit N → ∞ and the sum over j ∈ N b \ {j ⋆ + 1}: We justify the interchange of the limit and the sum over j ∈ N b \ {j ⋆ + 1} by applying the Dominated Convergence Theorem. This procedure requires finding constants a j for j ∈ N b \ {j ⋆ + 1} such that for all sufficiently large N ∈ N, ρ b,α;j |ϕ(θ (N ) j /ρ b,α;j )| ≤ a j . The key to applying the Dominated Convergence Theorem is to use two properties of ϕ(x) = x log x: its boundedness on the interval [0, 1) and its monotonicity on the interval [1, ∞).
In terms of these sets we write For j ∈ C N the boundedness of ϕ on [0, 1) implies that For j ∈ D N the monotonicity of ϕ on [1, ∞) and the bound θ (N ) j ≤ θ j imply that Using the fact that R(θ|ρ b,α ) < ∞, we prove that Σ_{j∈N b \Ψ} a j < ∞. We have In terms of these sets we write It follows that Substituting the last display into (B.13), we conclude that This completes the proof of part (b). The proof of Theorem B.1 is done.
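The two properties of ϕ(x) = x log x used above can be confirmed with a quick numerical check (a sketch; the grids are arbitrary choices): |ϕ| is bounded by 1/e on [0, 1), since ϕ attains its minimum −1/e at x = 1/e, and ϕ is nondecreasing on [1, ∞), where ϕ′(x) = log x + 1 ≥ 1.

```python
import math

def phi(x):
    # phi(x) = x log x, with the convention phi(0) = 0
    return 0.0 if x == 0 else x * math.log(x)

# boundedness on [0, 1): phi attains its minimum -1/e at x = 1/e
grid01 = [k / 1000 for k in range(1000)]
bound_ok = all(abs(phi(x)) <= 1 / math.e + 1e-12 for x in grid01)

# monotonicity on [1, infinity): phi'(x) = log x + 1 >= 1 there
grid1 = [1 + k / 100 for k in range(500)]
monotone_ok = all(phi(grid1[i]) <= phi(grid1[i + 1]) for i in range(len(grid1) - 1))
```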
In appendix C we prove part (a) of Theorem 3.1 as well as a number of other properties of the parameter α b (c) that defines the Poisson equilibrium distribution ρ b,α b (c) .

C Proof of Part (a) of Theorem 3.1 re α b (c)
The goal of this appendix is to prove Theorem C.1.
Let ρ b,α be the probability measure on N b whose components are defined by ρ b,α;j = α^j/(j! Z b (α)) for j ∈ N b . The following conclusions hold.
Either of these bounds implies that α b (c) is asymptotic to c as c → ∞; i.e., lim c→∞ α b (c)/c = 1.
Before we prove Theorem C.1, we state a second theorem that focuses on the case b = 1. In this case the equilibrium distribution ρ 1,α 1 (c) is a probability measure on N 1 = N. In part (a) we give the proof of the existence of α b (c) for b = 1, which is much more straightforward than the proof for general b. In parts (b) and (c) we give two iterative procedures for calculating α 1 (c), while in part (d) we derive two sets of inequalities that are tighter than the inequalities for α b (c) for general b given in part (b) of Theorem C.1. Like the inequalities in part (b) of Theorem C.1, the inequalities in part (d) of the next theorem imply that α 1 (c) is asymptotic to c as c → ∞. (b) Let α 1 = c and consider the following iterative procedure defined for n ∈ N: α n+1 = c(1 − e −α n ). Then the sequence {α n , n ∈ N} is monotonically decreasing and lim n→∞ α n = α 1 (c).
(c) Let β 1 = log c and consider the following iterative procedure defined for n ∈ N: β n+1 = c(1 − e −β n ).
Proof. (a) The measure ρ 1,α is a probability measure on N having mean γ 1 (α) = αe α /(e α − 1). Thus ρ 1,α has mean c if and only if α satisfies αe α = c(e α − 1). We prove part (a) by showing that this equation has a unique solution α 1 (c) ∈ (0, ∞) for any c > 1.
Since γ 1 (α) → ∞ as α → ∞, we conclude that there exists a unique value α = α 1 (c) ∈ (0, ∞) solving γ 1 (α 1 (c)) = c and thus solving α 1 (c)e α 1 (c) = c(e α 1 (c) − 1). This completes the proof of part (a). (b) Since e −c < 1, we have the inequality We use induction to prove that the sequence α n is monotonically decreasing. For n ∈ N, n ≥ 2, under the assumption that α n < α n−1 , this property of the sequence is a consequence of the following calculation: We now use induction to prove that the sequence α n is bounded below by log c. For n = 1, α 1 = c > log c. Assuming that α n > log c, we have The last inequality follows from the facts that when c = 1, c − 1 = 0 = log c and that for c ∈ (1, ∞), (c − 1) ′ = 1 > 1/c = (log c) ′ . This completes the proof that α n > log c for all n ∈ N. Since α n is a monotonically decreasing sequence bounded above by c and below by log c, we conclude that α ⋆ = lim n→∞ α n exists and satisfies both α ⋆ ∈ (log c, c) and α ⋆ = c(1 − e −α ⋆ ). Because α 1 (c) is the unique positive solution of this equation, it follows that lim n→∞ α n = α 1 (c). This completes the proof of part (b). (c) Since β 1 = log c, we have the inequality We use induction to prove that the sequence β n is monotonically increasing. For n ∈ N, n ≥ 2, under the assumption that β n−1 < β n , this is a consequence of the following calculation: We now use induction to prove that the sequence β n is bounded above by c. For n = 1, β 1 = log c < c. Assuming that β n < c, we have This completes the proof that β n is bounded above by c. Since β n is a monotonically increasing sequence bounded above by c and below by log c, we conclude that β ⋆ = lim n→∞ β n exists and satisfies both β ⋆ ∈ (log c, c) and β ⋆ = c(1 − e −β ⋆ ). Because α 1 (c) is the unique positive solution of this equation, it follows that lim n→∞ β n = α 1 (c). This completes the proof of part (c).
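The two iterative procedures can be illustrated numerically. The sketch below assumes that both recursions apply the map x ↦ c(1 − e^{−x}), whose unique positive fixed point is α 1 (c), the procedure of part (b) starting from α 1 = c and that of part (c) from β 1 = log c; this explicit form of the recursions is reconstructed from the fixed-point equations α ⋆ = c(1 − e −α ⋆ ) and β ⋆ = c(1 − e −β ⋆ ) in the proof, not quoted from the paper's displays.

```python
import math

c = 2.0  # any value c > 1

# part (b): decreasing iteration started at alpha_1 = c
# (assumed recursion alpha_{n+1} = c * (1 - exp(-alpha_n)))
alphas = [c]
for _ in range(60):
    alphas.append(c * (1 - math.exp(-alphas[-1])))

# part (c): increasing iteration started at beta_1 = log c (same assumed map)
betas = [math.log(c)]
for _ in range(60):
    betas.append(c * (1 - math.exp(-betas[-1])))

a = alphas[-1]  # common limit, an approximation of alpha_1(c)
```

The two sequences bracket α 1 (c) from above and below, so together they give computable error bounds for the limit.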
(d) We first prove that c(1 − e −c ) > α 1 (c). This follows immediately from the iterative procedure discussed in part (b), which implies that c = α 1 > α 2 = c(1 − e −c ) > α 1 (c). One can obtain the weaker upper bound c > α 1 (c) directly if one writes the equation solved by α 1 (c) in the form α 1 (c) = c(1 − e −α 1 (c) ) (C.1) and uses the fact that e −α 1 (c) ∈ (0, 1). We now prove a series of three lower bounds, the last two of which, in combination with the upper bound c(1 − e −c ) > α 1 (c), imply that α 1 (c) ∼ c as c → ∞. The first lower bound is α 1 (c) > log c. To prove this, we use the fact that α 1 (c) > 0 to write e α 1 (c) − 1 > α 1 (c). It follows that α 1 (c)e α 1 (c) = c(e α 1 (c) − 1) > cα 1 (c), or equivalently that e α 1 (c) > c. This implies that α 1 (c) > log c, as claimed.
We now bootstrap this lower bound into a tighter lower bound by substituting α 1 (c) > log c into the right hand side of (C.1), obtaining the second lower bound α 1 (c) > c(1 − e −log c ) = c(1 − 1/c) = c − 1. This implies that lim c→∞ α 1 (c)/c = 1 or that α 1 (c) is asymptotic to c as c → ∞. By bootstrapping the lower bound in (C.2), we obtain a yet tighter lower bound on α 1 (c), which gives a second proof that α 1 (c) ∼ c. To do this, we substitute α 1 (c) > c − 1 into the right hand side of (C.1), obtaining the third lower bound α 1 (c) > c(1 − e −c+1 ). It follows that α 1 (c)/c > 1 − e −(c−1) . This implies lim c→∞ α 1 (c)/c = 1 at a rate that is at least exponentially fast. By contrast, (C.2) shows a much slower rate of convergence to 1 that is only of the order 1/c. Interestingly, iterating this procedure again does not give a tighter lower bound than that in (C.3). This completes the proof of Theorem C.2.
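The chain of bounds proved in part (d) can be checked numerically for a sample value of c (here c = 3, an arbitrary choice): log c < c − 1 < c(1 − e −(c−1) ) < α 1 (c) < c(1 − e −c ) < c.

```python
import math

def alpha1(c, n_iter=300):
    # compute alpha_1(c) as the fixed point of x -> c * (1 - exp(-x))
    x = c
    for _ in range(n_iter):
        x = c * (1 - math.exp(-x))
    return x

c = 3.0  # an arbitrary value of c > 1
a = alpha1(c)
# the chain of inequalities from part (d), in increasing order
chain = [math.log(c), c - 1, c * (1 - math.exp(-(c - 1))), a,
         c * (1 - math.exp(-c)), c]
```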
We now turn to the proof of Theorem C.1. According to part (a) of this theorem, for b ∈ N, α b (c) is the unique solution of αZ b−1 (α)/Z b (α) = c. The heart of the proof of Theorem C.1, and its most subtle step, is to prove that the function γ b (α) = αZ b−1 (α)/Z b (α) satisfies γ ′ b (α) > 0 for α ∈ (0, ∞) and thus is monotonically increasing on this interval. This fact is proved in the next lemma. Fix a positive integer b and a real number c ∈ (b, ∞). For α ∈ (0, ∞) the function γ b then satisfies γ ′ b (α) > 0. The key to proving that γ ′ b (α) > 0 is to represent log Z b (α) in terms of the moment generating function of a probability measure. We do this by first expressing Z b (α) in terms of the upper incomplete gamma function Γ(b, α) = ∫_α^∞ x^{b−1} e^{−x} dx via the formula Z b (α) = e^α (1 − Γ(b, α)/(b − 1)!). This formula is easily proved by induction. For b = 1 the right side equals e α − 1 = Z 1 (α).
Assuming that it is true for b = n, we prove that it is true for b = n + 1 by integrating by parts, which gives Γ(n + 1, α) = nΓ(n, α) + α^n e^{−α} and hence e^α (1 − Γ(n + 1, α)/n!) = Z n (α) − α^n/n! = Z n+1 (α). This completes the proof of (C.4) for all b ∈ N. As suggested in [19], we now make the change of variables x = yα, obtaining the representation Z b (α) = e^α α^b g b (α)/b!. The function g b is the moment generating function of the probability measure on R having the density h b (y) = b(−y)^{b−1} on [−1, 0]. For α ∈ (0, ∞) let σ b,α be the probability measure on R having the density e αy h b (y)/g b (α) on [−1, 0]. A straightforward calculation shows that It follows that (log g b ) ′′ (α) > 0 for all α ∈ (0, ∞). The last display shows that (log g b ) ′ (α) is the mean of σ b,α , and (log g b ) ′′ (α) is the variance of σ b,α .
Using (C.5) and the power series representations of Z b−1 (α) and Z b (α), we conclude that γ ′ b (α) > 0 for all α ∈ (0, ∞). This completes the proof of the lemma.
We are now ready to prove Theorem C.1.
Proof of Theorem C.1. (a) We first consider b = 0. In this case ρ 0,α is a standard Poisson distribution on N 0 having mean α. It follows that α 0 (c) = c is the unique value for which ρ 0,α 0 (c) has mean c and thus lies in P N 0 ,c . This completes the proof of part (a) for b = 0. We now consider b ∈ N. In this case ρ b,α is a probability measure on N b having mean γ b (α) = αZ b−1 (α)/Z b (α). Thus ρ b,α has mean c if and only if α satisfies γ b (α) = c. We prove part (a) by showing that γ b (α) = c has a unique solution α b (c) ∈ (0, ∞) for all b ∈ N and any c > b.
Having completed steps 1, 2, and 3, we have proved part (a) for all b ∈ N. Since we also validated part (a) for b = 0, the proof of part (a) for all nonnegative integers b is done.
(b) We first prove that α b (c) < c for b ∈ N by observing that for any α ∈ (0, ∞) we have γ b (α) > α, since Z b−1 (α) > Z b (α); setting α = α b (c) yields α b (c) < γ b (α b (c)) = c. A similar observation, namely that γ b (α) < α + b, yields the first lower bound α b (c) > c − b. We now bootstrap this lower bound into the tighter lower bound indicated in part (b). To do this we note that for any α ∈ (0, ∞) The first lower bound α b (c) > c − b now yields the tighter lower bound This completes the proof of the bounds in part (b). Either of these bounds implies that lim c→∞ α b (c)/c = 1. This proves that α b (c) is asymptotic to c as c → ∞, completing the proof of part (b).
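Since γ b is strictly increasing on (0, ∞), α b (c) can be computed by bisection. The sketch below (with the illustrative choice b = 2, c = 3, and the assumed component formula ρ b,α;j = α^j/(j! Z b (α)) for a direct check of the mean) also confirms the bounds c − b < α b (c) < c of part (b).

```python
import math

def Z(b, alpha):
    # Z_b(alpha) = e^alpha - sum_{j=0}^{b-1} alpha^j / j!
    return math.exp(alpha) - sum(alpha**j / math.factorial(j) for j in range(b))

def gamma(b, alpha):
    # gamma_b(alpha) = alpha * Z_{b-1}(alpha) / Z_b(alpha), the mean of rho_{b,alpha}
    return alpha * Z(b - 1, alpha) / Z(b, alpha)

def alpha_b(b, c, tol=1e-12):
    # bisection, justified by the strict monotonicity of gamma_b on (0, infinity);
    # gamma_b(alpha) is near b for small alpha and gamma_b(c) exceeds c
    lo, hi = 1e-6, c
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gamma(b, mid) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b, c = 2, 3.0  # illustrative choice with c > b
a = alpha_b(b, c)
# direct check of the mean of rho_{b,a} from a truncated series
mean = sum(j * a**j / (math.factorial(j) * Z(b, a)) for j in range(b, 60))
```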
By substituting the power series representation for Z b (α), we find that .
This completes the proof of part (d). The proof of Theorem C.1 is done, as is the proof of part (a) of Theorem 3.1.
In the next and final appendix we explore how the restriction involving m = m(N) could be avoided in the definition of the set of configurations Ω N,b,m in (2.1) and in the definition of the microcanonical ensemble P N,b,m in (2.3). Avoiding this restriction would enable us to present our results in a more natural form.

D Avoiding Restriction Involving m = m(N )
In this appendix we explore a more natural formulation of our results, and we explain the issues that make such a formulation so challenging. Among these issues there is a limitation that seems to be inherent in the approximation procedure we use to prove our results. This discussion makes contact with several interesting ideas including Stirling numbers of the second kind and associated Stirling numbers of the second kind. Let us review the notation. We start with the configuration space Ω N = Λ N ^K . For ω ∈ Ω N , K ℓ (ω) is the droplet-size random variable denoting the number of particles occupying the site ℓ ∈ Λ N , and N j (ω) is the number of sites for which K ℓ (ω) = j. We also introduce |N(ω)| + , which is the number of indices j for which N j (ω) ≥ 1. Given b a nonnegative integer, we focus on the configuration space Ω N,b,m consisting of all ω ∈ Ω N for which every site of Λ N is occupied by at least b particles and for which |N(ω)| + ≤ m. The quantity m is a function m(N) satisfying m(N) → ∞ and m(N) 2 /N → 0 as N → ∞. In symbols (D.1) The first constraint involving K ℓ is intrinsic to the definition of the model. By contrast, the second constraint involving m is not intrinsic to the definition of the model, but rather is a useful technical device that enables us to control the errors that arise at various stages of the analysis. A more natural configuration space would be the set Ω N,b consisting of all ω ∈ Ω N for which every site of Λ N is occupied by at least b particles but for which there is no restriction on the number of positive quantities N j (ω). In symbols We now come to the main point. Let P N be the uniform probability measure on Ω N that assigns equal probability 1/N^K to each of the N^K configurations in Ω N . All of the results in the paper are formulated for the probability measure P N,b,m , defined as the restriction of P N to Ω N,b,m .
However, because the second constraint in the definition of Ω N,b,m involving m is not intrinsic to the definition of the model, it would be more natural to formulate our results for the probability measure P N,b , defined as the restriction of P N to the larger and more natural configuration space Ω N,b .
In order to understand why our results are formulated for P N,b,m and not for P N,b , we explain how the constraint involving m arises in the paper. There are three sources. First, in Lemma 3.2 we require that m log N/N → 0 as N → ∞ to prove that the error ζ The stronger condition that m 2 /N → 0 as N → ∞ means that m → ∞ at a slower rate than √N. What we find fascinating is the fact that the relationship between m and √N is also central to another component of our analysis. As we show in the next theorem, if m → ∞ at a faster rate than √N, then for all sufficiently large N every configuration in Ω N,b automatically satisfies the constraint |N(ω)| + ≤ m. Proof. (a) For ω ∈ Ω N,b , N(ω) denotes the sequence {N j (ω), j ∈ N b }. Let κ(ω) = |N(ω)| + , and let 1 ≤ j 1 < j 2 < . . . < j κ(ω) denote the indices for which N j (ω) ≥ 1. We have strict inequality since the |N(ω)| + droplet classes have different sizes. Since for each of these indices we have j k ≥ k, the second conservation law in (2.2) implies that It follows that 2cN ≥ κ(ω)(κ(ω) + 1) = (κ(ω) + 1/2) 2 − 1/4, which in turn implies that κ(ω) ≤ (2cN + 1/4) 1/2 − 1/2. Now let ω be any configuration in Ω N,b for which N k (ω) = 1 for k = 1, 2, . . . , |N(ω)| + . In this case an example of such a function would be m = N δ for some δ ∈ (1/2, 1). If we could find such an approximation procedure, then all our results formulated for P N,b,m would automatically hold for the more natural measure P N,b . Unfortunately, despite great effort, we were unsuccessful.
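The key counting step can be made concrete: since the κ = |N(ω)| + occupied droplet sizes are distinct and satisfy j k ≥ k, the constraint κ(κ + 1) ≤ 2cN forces κ ≤ (2cN + 1/4)^{1/2} − 1/2 ≈ (2cN)^{1/2}. A minimal check (the values of c and N are arbitrary illustrative choices):

```python
import math

def kappa_max(c, N):
    # if the kappa occupied droplet sizes j_1 < ... < j_kappa satisfy j_k >= k,
    # then kappa*(kappa+1)/2 <= sum_k j_k <= c*N, so kappa*(kappa+1) <= 2*c*N,
    # which gives kappa <= sqrt(2*c*N + 1/4) - 1/2
    return math.floor(math.sqrt(2 * c * N + 0.25) - 0.5)

c, N = 2, 10_000
k = kappa_max(c, N)
```

This is why any m growing faster than √N eventually dominates the maximal possible number of distinct droplet sizes, making the constraint |N(ω)| + ≤ m vacuous.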
Because of this situation it is worthwhile to look more closely at the two components of the approximation procedure presented in appendix B. Given any measure θ ∈ P N b ,c , this procedure constructs a sequence θ (N ) lying in the range B N,b,m of Θ N,b and having the following two properties: We are able to construct a number of sequences θ (N ) ∈ B N,b,m that satisfy property (a) under the hypothesis that m/N → 0. However, none of these satisfy property (b) with a function m satisfying m/ √ N → ∞. On the basis of this experience, we conjecture that there exists no sequence θ (N ) ∈ B N,b,m satisfying both properties (a) and (b) under a hypothesis that is weaker than the current condition that m 2 /N → 0.
This setback motivated us to seek an alternate approach that would allow us to replace the probability measure P N,b,m , which is the restriction of the uniform measure P N to Ω N,b,m , with the probability measure P N,b , which is the restriction of P N to Ω N,b . This completes the proof of part (a).
In order to prove part (b) of Theorem D.2, we now show that condition (D.5) holds if b = 0 or b = 1. To prove this we compare the asymptotic behavior of card(Ω N,b,m ) with that of card(Ω N,b ) for these values of b. A formula for the asymptotic behavior of card(Ω N,b,m ) for any nonnegative integer b is derived in part (b) of Lemma 3.3. In the next proposition we express this formula in a different and more useful form for b = 0 and b = 1. Although we do not apply it here, in part (c) we give the analogous formula for b ∈ N satisfying b ≥ 2. where η N → 0 as N → ∞. We now appeal to item (i) in part (f) of Theorem A.1, which shows that Substituting this formula into the preceding display, we obtain where η N → 0 as N → ∞. We next use (D.7) to prove part (a) for b = 0 and part (b) for b = 1. Part (c) for b ∈ N satisfying b ≥ 2 is obtained by specializing (D.7) to these values. It follows that log Z 1 (α 1 (c)) = α 1 (c) + log α 1 (c) − log c. We conclude that (1/N) log[card(Ω N,0,m )/card(Ω N,0 )] = η N → 0 as N → ∞.
We conclude that condition (D.5) holds for b = 0 and thus that the large deviation lower bound (D.6) is valid for b = 0. This completes the proof.
The verification of condition (D.5) for b = 1 is much deeper than that for b = 0.
Proof of part (b) of Theorem D.2 for b = 1. This proof depends on the relationship between card(Ω N,1 ) and Stirling numbers of the second kind. Given c a rational number in (1, ∞), let K and N be positive integers satisfying K/N = c. We denote by S(K, N) the Stirling number of the second kind, which is the number of ways to partition a set of K elements into N nonempty subsets [2, pp. 96-97]. The N! permutations of the class of all such partitions correspond to all the ways of placing the K particles in the droplet model onto the N sites of Λ N and therefore are in one-to-one correspondence with the elements of Ω N,1 . It follows that card(Ω N,1 ) = N! · S(K, N). The computation of N −1 log card(Ω N,1 ) is given in part (b) of the next theorem. This computation is based on a deep, classical result on the asymptotic behavior of S(K, N) that is derived in Example 5.4 in [1] and is stated in part (a) of the next theorem in our notation. The quantities in [1] denoted by n, k, and r correspond respectively to our K, N, and α 1 (c).
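The identity card(Ω N,1 ) = N! · S(K, N) can be verified by brute force for small K and N. The sketch below computes S(K, N) by the standard inclusion-exclusion formula for counting surjections and counts directly the placements of K labeled particles on N sites that leave no site empty (K = 6, N = 3 is an illustrative choice, not the regime K/N = c of the paper).

```python
import math
from itertools import product

def stirling2(K, N):
    # Stirling number of the second kind via inclusion-exclusion;
    # N! * S(K, N) = number of surjections from K labeled items onto N labeled cells
    surj = sum((-1)**i * math.comb(N, i) * (N - i)**K for i in range(N + 1))
    return surj // math.factorial(N)

def card_omega_N1(K, N):
    # brute-force count of placements of K labeled particles on N sites
    # in which every site receives at least one particle
    return sum(1 for w in product(range(N), repeat=K) if len(set(w)) == N)

K, N = 6, 3
brute = card_omega_N1(K, N)
formula = math.factorial(N) * stirling2(K, N)
```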
We now apply part (b) of Proposition D.3 and the conclusion of the next theorem; the former involves the error term η N → 0 as N → ∞, and the latter involves the error term ε N → 0 as N → ∞. Except for the error terms the asymptotic formulas are identical. Hence we obtain where ε N → 0 as N → ∞.
Proof. We start with the asymptotic formula for S(K, N) derived in Example 5.4 in [1] and stated here in our notation. For any δ ∈ (0, 1) and any M < ∞, uniformly for c ∈ (1 + δ, M) the asymptotic behavior of S(K, N) is given by The quantities in [1] denoted by n, k, and r correspond respectively to our K, N, and α 1 (c). It follows that For b ∈ N satisfying b ≥ 2, an analogous argument would require an asymptotic formula for the associated Stirling numbers of the second kind like the formula for S(K, N) in [1]. However, we are unable to locate such a formula. Nevertheless, based on our calculation for b = 0 and b = 1 it is reasonable to conjecture that condition (D.5) holds for any b ∈ N satisfying b ≥ 2, which would imply the large deviation lower bound (D.8) for these values.
We now explore whether we can extend to P N,b the large deviation upper bound proved in parts (c) and (d) of Theorem 2.1 for P N,b,m . If we could do this, then we could transfer to P N,b the fact, proved in Theorem 2.2 and Corollary 2.3, that with respect to P N,b,m , ρ b,α b (c) is the equilibrium distribution of Θ N,b and of K ℓ . Unfortunately, we are unable to prove the large deviation upper bound for P N,b using either of two possible approaches explained briefly below.
Concerning the statement about the equilibrium distribution, the best that we can do is to use the large deviation lower bound for b = 0 and b = 1 to prove that with respect to P N,b for these values of b, ρ b,α b (c) is the equilibrium distribution of Θ N,b in the following weak form: for any ε > 0, lim N →∞ (1/N) log P N,b {Θ N,b ∈ B π (ρ b,α b (c) , ε)} = 0. If we could prove that −R(F |ρ b,α b (c) ) is greater than or equal to the second expression on the right side of the last line, then we would be able to transfer the large deviation upper bound to P N,b . Unfortunately, however, we are unable to prove that −R(F |ρ b,α b (c) ) is greater than or equal to the second expression on the right side of the last line. The second approach to transferring to P N,b the large deviation upper bound in parts (c) and (d) of Theorem 2.1 rests on a careful analysis of how these upper bounds follow from the local estimate in part (b) of Theorem 3.1 and from Theorem 4.2 as applied to Theorem 4.1, for which we need only the large deviation upper bound for the sets appearing in Theorem 4.1. Omitting the details, we claim that the crucial step is to show that At the end of the proof of part (b) of Lemma 3.3 we prove this limit by applying the approximation procedure in appendix B, which requires the condition that m 2 /N → 0 as N → ∞. If we could prove this limit without invoking the approximation procedure and under a condition that is compatible with m/√N → ∞ as N → ∞, then the large deviation upper bound in parts (c) and (d) of Theorem 2.1 would hold with P N,b replacing P N,b,m . Unfortunately, we have not been able to carry this out.
We end this section by proposing an interesting test case for gaining insight into whether the conditioned measure P N,b,m could be replaced by P N,b in the LDP for Θ N,b in Theorem 2.1. This test case would be to use the methods of this paper to prove Sanov's Theorem for the empirical measures of i.i.d. random variables taking values in N b . This theorem, of course, can be proved directly without the methods of this paper [6, Thm. 6.2.10], [7, Thm. 4.5]. If one uses the methods of this paper, then one would first have to prove it for the analogue of the measure P N,b,m restricted to the analogue of the restricted configuration space Ω N,b,m , where the number of positive components of N j is restricted by m = m(N), with m(N) → ∞ at an appropriate rate. It would be instructive to see if this restriction can be eliminated using one of the approaches proposed in this appendix.