Polya Tree Distributions for Statistical Modeling of Censored Data

Polya tree distributions extend the idea of the Dirichlet process as a prior for Bayesian nonparametric problems. Finite-dimensional distributions are defined through conditional probabilities in P. This allows for a specification of prior information that carries greater weight where it is deemed appropriate, according to the choice of a partition of the sample space. Muliere and Walker [7] construct a partition so that the posterior from right censored data is also a Polya tree. A point of contention is that the specification of the prior is partially dependent on the data. In general, the posterior from censored data will be a mixture of Polya trees. This paper presents a straightforward method for determining the mixing distribution.


Introduction
In a nonparametric statistical model, the unknown of interest is the probability measure responsible for generating the data at hand. The Bayesian approach to nonparametric problems is to place a prior directly on the class of probability measures. A traditional choice is to use the Dirichlet process.
Ferguson has extended the idea of Bayesian estimation for a probability vector, where the conjugate prior would be the Dirichlet distribution, to the case where one wishes to estimate an entire probability measure. By defining the finite-dimensional distributions of $(P(B_1), \dots, P(B_k))$, for every measurable partition $B_1, \dots, B_k$ of the sample space Ω, as Dirichlet with parameters $(\alpha(B_1), \dots, \alpha(B_k))$ determined through a measure α, a distribution on the probability P is implied. The weight of the prior information is the constant $\alpha(\Omega)$.
Polya tree distributions ([5], [6]) extend the idea of the Dirichlet process by defining the finite-dimensional distributions through conditional probabilities in P. This allows for a specification of prior information that carries greater weight in subsets of Ω where it is deemed appropriate, according to the choice of a partition of Ω.
In this paper, we present an analysis based on nonparametric Bayesian methods when the prior is a Polya tree and the available data are subject to censoring. Section 2 contains the needed mathematical background on Polya tree distributions. Muliere and Walker [7] (subsequently referred to as MW) examine Polya tree priors in survival analysis, constructing a partition so that the posterior from right censored data will also be from the family of Polya tree distributions. A point of contention is that the specification of the prior is partially dependent on the data. Without this construction, the posterior from censored data will be a mixture of Polya trees. We present in Section 3 a straightforward method for determining this mixing distribution. Our concluding section contains an example as well as a discussion of how the choice of a partition influences the posterior estimate.

Polya Tree Distributions
The necessary partitioning of the sample space Ω for defining a Polya tree is done in stages; that is, we work with a sequence of ever finer partitions.
The particular Polya tree distribution is determined by the partitions in Π and the Beta parameters in A.
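The displayed definition did not survive in this copy. As a reference point, the standard construction of Lavine [5] can be restated as follows in the notation used in the rest of the paper (our restatement, with $E^0 = \{\emptyset\}$, $B_\emptyset = \Omega$, and $Y_\varepsilon$ playing the role of $P(B_{\varepsilon 0} \mid B_\varepsilon)$):

$$
\begin{aligned}
&E^m = \{0,1\}^m, \qquad E^* = \textstyle\bigcup_{m \ge 0} E^m, \qquad \pi_m = \{B_\varepsilon : \varepsilon \in E^m\}, \qquad \Pi = \{\pi_0, \pi_1, \dots\},\\
&B_\varepsilon = B_{\varepsilon 0} \cup B_{\varepsilon 1}, \qquad B_{\varepsilon 0} \cap B_{\varepsilon 1} = \emptyset, \qquad \mathcal{A} = \{\alpha_\varepsilon : \varepsilon \in E^* \setminus \{\emptyset\}\},\\
&Y_\varepsilon \overset{\text{ind}}{\sim} \mathrm{Beta}(\alpha_{\varepsilon 0}, \alpha_{\varepsilon 1}), \qquad
P(B_{\varepsilon_1\cdots\varepsilon_m}) = \prod_{j:\,\varepsilon_j = 0} Y_{\varepsilon_1\cdots\varepsilon_{j-1}} \prod_{j:\,\varepsilon_j = 1} \bigl(1 - Y_{\varepsilon_1\cdots\varepsilon_{j-1}}\bigr).
\end{aligned}
$$

A random measure built this way is written $P \sim PT(\Pi, \mathcal{A})$.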
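The weight of the information on $P(B_{\varepsilon 0} \mid B_\varepsilon)$ is given by $\alpha_{\varepsilon 0} + \alpha_{\varepsilon 1}$. Polya trees are generalizations of Dirichlet processes: a Polya tree is a Dirichlet process if $\alpha_\varepsilon = \alpha_{\varepsilon 0} + \alpha_{\varepsilon 1}$ for every $\varepsilon \in E^m$, $m \ge 1$ ([5]). Without this restriction, one has the option of choosing $\alpha_{\varepsilon 0} + \alpha_{\varepsilon 1}$ large to reflect greater weight, or small to reflect greater uncertainty, as to the nature of the true probability P on the set $B_\varepsilon$.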
The choice of a partition plays no role in defining a Dirichlet process, so long as the above restriction is met. We shall see that the choice of the sets $B_\varepsilon$, $\varepsilon \in E^*$, in the partition can play an important role in specifying a Polya tree prior.
Random variables $W_1, \dots, W_n$ are said to be a sample of size n from P if $W_1, \dots, W_n \mid P \overset{\text{iid}}{\sim} P$ for $P \sim PT(\Pi, \mathcal{A})$.
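The unconditional distribution of an observation $W_i$ is given, for $\varepsilon = \varepsilon_1\cdots\varepsilon_m \in E^m$, as

$$\Pr(W_i \in B_\varepsilon) = E[P(B_\varepsilon)] = \prod_{j=1}^{m} \frac{\alpha_{\varepsilon_1\cdots\varepsilon_j}}{\alpha_{\varepsilon_1\cdots\varepsilon_{j-1}0} + \alpha_{\varepsilon_1\cdots\varepsilon_{j-1}1}}.$$

We can think of this expression as a prior estimate $\hat P_0(B_\varepsilon)$ of $P(B_\varepsilon)$.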
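As a numerical check of the product formula, the following Python sketch (our own illustration; the helper names `level`, `prior_estimate`, and `sample_measure` are hypothetical) draws P from a Polya tree truncated at level M, using independent Beta variables $Y_\varepsilon = P(B_{\varepsilon 0} \mid B_\varepsilon)$, and compares the Monte Carlo mean of $P(B_\varepsilon)$ with the product of Beta means. Sets are indexed by binary strings, and the parameters are those of the two-level example used later in this section.

```python
import random
from itertools import product


def level(m):
    """Index set E^m: all binary strings of length m (E^0 = [""] is the root)."""
    return ["".join(bits) for bits in product("01", repeat=m)]


def prior_estimate(eps, alpha):
    """Pr(W in B_eps): product of the Beta means alpha_{..j} / (alpha_{..0} + alpha_{..1})."""
    prob = 1.0
    for j in range(1, len(eps) + 1):
        parent = eps[: j - 1]
        prob *= alpha[eps[:j]] / (alpha[parent + "0"] + alpha[parent + "1"])
    return prob


def sample_measure(alpha, M, rng):
    """Draw (P(B_eps), eps in E^M) from PT(Pi, A), stopping the tree at level M."""
    probs = {"": 1.0}  # P(B_root) = P(Omega) = 1
    for m in range(M):
        for eps in level(m):
            y = rng.betavariate(alpha[eps + "0"], alpha[eps + "1"])  # Y_eps = P(B_eps0 | B_eps)
            probs[eps + "0"] = probs[eps] * y
            probs[eps + "1"] = probs[eps] * (1.0 - y)
    return {eps: probs[eps] for eps in level(M)}


# Parameters of the two-level example discussed below.
alpha = {"0": 1, "1": 3, "00": 1, "01": 1, "10": 3, "11": 1}
exact = {eps: prior_estimate(eps, alpha) for eps in level(2)}

rng = random.Random(2024)
draws = [sample_measure(alpha, 2, rng) for _ in range(20000)]
monte_carlo = {eps: sum(d[eps] for d in draws) / len(draws) for eps in level(2)}

for eps in level(2):
    print(eps, round(exact[eps], 4), round(monte_carlo[eps], 4))
```

Stopping at a finite level M mirrors the practice of sampling an approximate measure by terminating the construction early; the two columns should agree up to Monte Carlo error.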

A. NEATH
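The posterior distribution given a complete observation W is obtained ([5]) as

$$P \mid W \sim PT\bigl(\Pi, \mathcal{A}(W)\bigr), \qquad \alpha_\varepsilon(W) = \begin{cases}\alpha_\varepsilon + 1, & W \in B_\varepsilon,\\ \alpha_\varepsilon, & \text{otherwise}.\end{cases} \tag{1}$$

A complete observation W thus increases by 1 all parameters corresponding to sets $B_\varepsilon$ for which $W \in B_\varepsilon$.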
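A posterior estimate of the probability measure evaluated at $B_\varepsilon$, $\varepsilon \in E^m$, is the product formula above with the updated parameters:

$$\hat P(B_\varepsilon) = E[P(B_\varepsilon) \mid W] = \prod_{j=1}^{m} \frac{\alpha_{\varepsilon_1\cdots\varepsilon_j}(W)}{\alpha_{\varepsilon_1\cdots\varepsilon_{j-1}0}(W) + \alpha_{\varepsilon_1\cdots\varepsilon_{j-1}1}(W)}. \tag{2}$$

Our interest is in using Polya tree distributions for modeling problems with censored data. We look at how the posterior is computed from an observation W known only to be contained in some set A. If A is itself one of the sets in Π, say $A = B_\delta$, then the posterior will still be a Polya tree distribution. The updated sequence of parameters is defined through

$$\alpha_\varepsilon(A) = \begin{cases}\alpha_\varepsilon + 1, & A \subseteq B_\varepsilon,\\ \alpha_\varepsilon, & \text{otherwise}.\end{cases} \tag{3}$$

Result (3) will be demonstrated by a small example. Let m = 2 and consider sets $B_0, B_1$ and $B_{00}, B_{01}, B_{10}, B_{11}$. Suppose the Polya tree is parametrized by $\alpha_0 = 1$, $\alpha_1 = 3$ and $\alpha_{00} = 1$, $\alpha_{01} = 1$, $\alpha_{10} = 3$, $\alpha_{11} = 1$. The prior estimate of $P(B_0)$ is 1/4, and the prior estimates of $P(B_{00} \mid B_0)$ and $P(B_{10} \mid B_1)$ are 1/2 and 3/4, respectively. Thus, the probability vector $[P(B_{00}), P(B_{01}), P(B_{10}), P(B_{11})]$ is estimated a priori as

$$[\hat P_0(B_{00}), \hat P_0(B_{01}), \hat P_0(B_{10}), \hat P_0(B_{11})] = [\tfrac14\cdot\tfrac12,\ \tfrac14\cdot\tfrac12,\ \tfrac34\cdot\tfrac34,\ \tfrac34\cdot\tfrac14] = [.125, .125, .5625, .1875].$$

Suppose the first observation is known only to satisfy $W_1 \in A_1 = B_{01}$. By (3), $\alpha_0$ and $\alpha_{01}$ each increase by 1, giving $\alpha_0 = 2$ and $\alpha_{01} = 2$, and

$$[\hat P(B_{00}), \hat P(B_{01}), \hat P(B_{10}), \hat P(B_{11})] = [.133, .267, .45, .15].$$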
Note the estimate of $P(B_{00})$ has increased even though it is known that $W_1 \notin B_{00}$. This is a consequence of the partitioning, which reflects a belief that an observation in $B_{01}$ signals a greater likelihood for event $B_{00}$ than for events $B_{10}$ and $B_{11}$. For the second observation, $A_2 \subseteq B_1$. The only updated parameter is $\alpha_1 = 4$. Since no further information is available from this observation, the parameters within the second partition are unchanged. We now have

$$[\hat P(B_{00}), \hat P(B_{01}), \hat P(B_{10}), \hat P(B_{11})] = [.111, .222, .5, .167].$$
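The arithmetic of this example can be reproduced with a short Python sketch of update (3). It is our own illustration: the helper names are hypothetical, and it takes the first observation to be known only to lie in $B_{01}$, which is what the parameters reported after two observations ($\alpha_0 = 2$, $\alpha_{01} = 2$) imply. Sets are indexed by binary strings, and a set $A = B_\delta$ updates every $\alpha_\varepsilon$ whose index is a prefix of δ.

```python
from itertools import product


def level(m):
    """Index set E^m as binary strings."""
    return ["".join(bits) for bits in product("01", repeat=m)]


def estimate(eps, alpha):
    """Current estimate of P(B_eps): product of conditional Beta means down the tree."""
    prob = 1.0
    for j in range(1, len(eps) + 1):
        parent = eps[: j - 1]
        prob *= alpha[eps[:j]] / (alpha[parent + "0"] + alpha[parent + "1"])
    return prob


def update_in_set(alpha, delta):
    """Update (3) for an observation known only to lie in A = B_delta:
    alpha_eps gains 1 exactly when A is a subset of B_eps, i.e. eps is a prefix of delta."""
    new = dict(alpha)
    for j in range(1, len(delta) + 1):
        new[delta[:j]] += 1
    return new


def vector(alpha, M=2):
    """Estimates of P(B_eps) for all eps in E^M, rounded for display."""
    return [round(estimate(eps, alpha), 4) for eps in level(M)]


alpha = {"0": 1, "1": 3, "00": 1, "01": 1, "10": 3, "11": 1}
prior_vec = vector(alpha)            # [0.125, 0.125, 0.5625, 0.1875]
alpha = update_in_set(alpha, "01")   # W_1 known only to lie in B_01
after_first = vector(alpha)          # [0.1333, 0.2667, 0.45, 0.15]
alpha = update_in_set(alpha, "1")    # W_2 known only to lie in B_1
after_second = vector(alpha)         # [0.1111, 0.2222, 0.5, 0.1667]
print(prior_vec, after_first, after_second, sep="\n")
```

The rise of the first entry from .125 to .133 is the effect discussed above: the increment to $\alpha_0$ raises $\hat P(B_0)$ by more than the increment to $\alpha_{01}$ lowers $\hat P(B_{00} \mid B_0)$.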
MW [7] show how Polya tree priors update from right censored data provided the partitions are constructed with respect to the censoring times.We take up the problem of determining the posterior in a setting which is slightly broader.

Mixtures of Polya Tree Distributions
We begin with the following result for a single censored observation.
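Theorem 1. Suppose $P \sim PT(\Pi, \mathcal{A})$. Let W be a sample from P and suppose that our information consists of $W \in A$. Then $P \mid \xi, W \in A \sim PT(\Pi, \mathcal{A}(\xi))$, where $\xi \overset{d}{=} (W \mid W \in A)$ and $\mathcal{A}(\xi)$ is obtained from (1) with ξ in place of W.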
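We discuss Theorem 1 before presenting a formal proof. Bayesian updating after a censored observation is accomplished by introducing the variable ξ to represent the unobserved realization W from P. To better understand a mixture of Polya trees, we write our result as

$$P \mid W \in A \;\sim\; \int PT\bigl(\Pi, \mathcal{A}(\xi)\bigr)\, dH(\xi), \qquad H:\ \xi \overset{d}{=} (W \mid W \in A). \tag{4}$$

Consider the finite-dimensional distributions defined through the random variables in

$$Y_M = \{\, Y_\varepsilon = P(B_{\varepsilon 0} \mid B_\varepsilon) : \varepsilon \in E^m,\ m = 0, \dots, M \,\}, \tag{5}$$

with $B_\varepsilon = \Omega$ for the empty string. A priori these variables are independent, so

$$\Pr(Y_M \le y) = \prod_{m=0}^{M} \prod_{\varepsilon \in E^m} \beta(y_\varepsilon \mid \alpha_{\varepsilon 0}, \alpha_{\varepsilon 1}), \tag{6}$$

where $\beta(\cdot \mid a, b)$ denotes the cumulative distribution function for Beta(a, b). Under (4), then,

$$\Pr(Y_M \le y \mid W \in A) = \sum_{\delta \in E^{M+1}} \Pr(\xi \in B_\delta) \prod_{m=0}^{M} \prod_{\varepsilon \in E^m} \beta\bigl(y_\varepsilon \mid \alpha_{\varepsilon 0}(\delta), \alpha_{\varepsilon 1}(\delta)\bigr), \tag{7}$$

where for i = 0, 1

$$\alpha_{\varepsilon i}(\delta) = \begin{cases}\alpha_{\varepsilon i} + 1, & B_\delta \subseteq B_{\varepsilon i},\\ \alpha_{\varepsilon i}, & \text{otherwise}.\end{cases}$$

The induced posterior on $Y_M$ is a weighted average of product-Beta distributions, with the weights determined by probabilities on the index variable ξ.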
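Theorem 1 can be checked numerically on a small tree. The Python sketch below (our own; helper names are hypothetical) computes $E[P(B_\varepsilon) \mid W \in A]$ as the ξ-weighted average of Polya tree components, with $\Pr(\xi \in B_\delta) = \Pr(W \in B_\delta)/\Pr(W \in A)$, and compares it with a Monte Carlo estimate of $E[P(B_\varepsilon)P(A)]/E[P(A)]$ under the prior. It uses the parameters of the running example after two observations and a hypothetical censored set $A = B_{01} \cup B_{10}$, which is not itself a set in Π.

```python
import random
from itertools import product


def level(m):
    """Index set E^m as binary strings."""
    return ["".join(bits) for bits in product("01", repeat=m)]


def estimate(eps, alpha):
    """Pr(W in B_eps) under PT(Pi, alpha): product of conditional Beta means."""
    prob = 1.0
    for j in range(1, len(eps) + 1):
        parent = eps[: j - 1]
        prob *= alpha[eps[:j]] / (alpha[parent + "0"] + alpha[parent + "1"])
    return prob


def complete_update(alpha, delta):
    """Update (1) for xi in B_delta: every alpha_eps with B_delta inside B_eps gains 1."""
    new = dict(alpha)
    for j in range(1, len(delta) + 1):
        new[delta[:j]] += 1
    return new


def sample_measure(alpha, M, rng):
    """One draw of (P(B_eps), eps in E^M) from PT(Pi, alpha), truncated at level M."""
    probs = {"": 1.0}
    for m in range(M):
        for eps in level(m):
            y = rng.betavariate(alpha[eps + "0"], alpha[eps + "1"])
            probs[eps + "0"], probs[eps + "1"] = probs[eps] * y, probs[eps] * (1.0 - y)
    return {eps: probs[eps] for eps in level(M)}


def mixture_estimate(alpha, A, M):
    """E[P(B_eps) | W in A] from Theorem 1: xi-weighted average of Polya tree components.
    A is a list of level-M strings, i.e. a finite union of sets in Pi."""
    p_A = sum(estimate(d, alpha) for d in A)
    est = dict.fromkeys(level(M), 0.0)
    for d in A:
        weight = estimate(d, alpha) / p_A      # Pr(xi in B_d) = Pr(W in B_d) / Pr(W in A)
        component = complete_update(alpha, d)  # parameters A(xi) for any xi in B_d
        for eps in est:
            est[eps] += weight * estimate(eps, component)
    return est


alpha = {"0": 2, "1": 4, "00": 1, "01": 2, "10": 3, "11": 1}  # after two observations
A = ["01", "10"]  # hypothetical censored set: union of B_01 and B_10
exact = mixture_estimate(alpha, A, 2)

# Monte Carlo check: E[P(B_eps) | W in A] = E[P(B_eps) P(A)] / E[P(A)] under the prior.
rng = random.Random(7)
num, den = dict.fromkeys(level(2), 0.0), 0.0
for _ in range(40000):
    P = sample_measure(alpha, 2, rng)
    p_A = sum(P[d] for d in A)
    den += p_A
    for eps in num:
        num[eps] += P[eps] * p_A
monte_carlo = {eps: num[eps] / den for eps in num}
print({eps: (round(exact[eps], 4), round(monte_carlo[eps], 4)) for eps in exact})
```

Because A here is a union of level-2 sets, the mixture has only two components, one for each set in which ξ may fall.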

Proof:
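We look to determine the induced posterior on $Y_M$, with M arbitrary. By Bayes' theorem,

$$\Pr(Y_M \in dy \mid W \in A) = \frac{\Pr(W \in A \mid Y_M = y)\,\Pr(Y_M \in dy)}{\Pr(W \in A)}, \qquad \Pr(W \in A \mid Y_M = y) = \sum_{\delta \in E^{M+1}} \Pr(W \in A \cap B_\delta \mid Y_M = y).$$

Since the variables $Y_\varepsilon$ at levels beyond M are independent of $Y_M$, each term factors as $P(B_\delta)\,\Pr(W \in A \mid W \in B_\delta)$, where $P(B_\delta)$ is the product of the $y_\varepsilon$ and $1 - y_\varepsilon$ along the path to δ, and the second factor does not involve y. Multiplying a Beta(a, b) density by y (or by 1 − y) gives $a/(a+b)$ (or $b/(a+b)$) times a Beta(a + 1, b) (or Beta(a, b + 1)) density. Hence $P(B_\delta)\Pr(Y_M \in dy)$ equals $\Pr(W \in B_\delta)$ times the product-Beta distribution with parameters $\alpha_{\varepsilon i}(\delta)$, where $\alpha_{\varepsilon i}(\delta) = \alpha_{\varepsilon i} + 1$ when $B_\delta \subseteq B_{\varepsilon i}$ and $\alpha_{\varepsilon i}(\delta) = \alpha_{\varepsilon i}$ otherwise. Summing over the (M + 1)th partition, we obtain

$$\Pr(Y_M \in dy \mid W \in A) = \sum_{\delta \in E^{M+1}} \frac{\Pr(W \in A \cap B_\delta)}{\Pr(W \in A)} \prod_{m=0}^{M} \prod_{\varepsilon \in E^m} \mathrm{Beta}\bigl(dy_\varepsilon \mid \alpha_{\varepsilon 0}(\delta), \alpha_{\varepsilon 1}(\delta)\bigr). \tag{8}$$

The proof is done since this is the same form as (7) with $\xi \overset{d}{=} (W \mid W \in A)$, for which $\Pr(\xi \in B_\delta) = \Pr(W \in A \cap B_\delta)/\Pr(W \in A)$.

Now consider a sample of censored observations. An updating scheme will be developed through induction. Let $\xi_n$ denote the random vector representing the unobserved $W_1, \dots, W_n$, and let $D_n$ denote the information $[W_1 \in A_1, \dots, W_n \in A_n]$. Then the distribution on P prior to the (n + 1)th censored observation can be described by the mixture of Polya trees in (4), with $\xi_n$ in place of ξ:

$$P \mid D_n \sim \int PT\bigl(\Pi, \mathcal{A}(\xi_n)\bigr)\, dH_n(\xi_n).$$

Applying Theorem 1 to each component, and Bayes' theorem to the mixing weights, yields the posterior given $[W_{n+1} \in A_{n+1}]$ as

$$P \mid D_{n+1} \sim \int PT\bigl(\Pi, \mathcal{A}(\xi_n, \xi_{n+1})\bigr)\, dH_{n+1}(\xi_n, \xi_{n+1}),$$

$$dH_{n+1}(\xi_n) = \frac{\Pr(W_{n+1} \in A_{n+1} \mid \xi_n)}{\Pr(W_{n+1} \in A_{n+1} \mid D_n)}\, dH_n(\xi_n), \qquad \xi_{n+1} \mid \xi_n \overset{d}{=} (W_{n+1} \mid \xi_n,\ W_{n+1} \in A_{n+1}), \tag{9}$$

where $\Pr(\cdot \mid \xi_n)$ is computed under $PT(\Pi, \mathcal{A}(\xi_n))$.

Note the distribution of $\xi_n$ is changed through the updating procedure. This prevents a stochastic process depiction of the mixing distribution, a technique which has been utilized in similar problems ([1], [2], [8]). We continue with our example to show the use of result (9). After the first two observations, the Polya tree has parameters $\alpha_0 = 2$, $\alpha_1 = 4$, $\alpha_{00} = 1$, $\alpha_{01} = 2$, $\alpha_{10} = 3$, $\alpha_{11} = 1$, and the current estimate of the probability vector is $[\hat P(B_{00}), \hat P(B_{01}), \hat P(B_{10}), \hat P(B_{11})] = [.111, .222, .5, .167]$.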
Recall that this also represents the unconditional distribution for the next observation.Suppose the next two censored observations are given as the posterior will become a mixture of Polya trees.

Now, after observing $[W_4 \in B_{00} \cup B_{01}]$, we must adjust the probabilities on the variable $\xi_3$ according to (9). The denominator in (9) may also be computed from the current estimate of the probability vector. We see the probability on event $[W_3 \in B_{01}]$ has increased after observing $[W_4 \in B_{00} \cup B_{01}]$. The mathematics follows the logic which says an observation at $B_{01}$ increases the likelihood of that event.
Conditional on $\xi_3$, one can then calculate the probabilities on $\xi_4$; combined with the adjusted probabilities on $\xi_3$, these give the mixing distribution on the variables $(\xi_3, \xi_4)$. In this manner, we can determine the mixing distribution for any number of censored observations. For our example, the observed sets could be written as finite unions of sets in Π. This condition does not impose a significant restriction: as the sample space is partitioned into smaller sets, the observed sets will become closely approximated by sets in Π at some level M. The motivation is similar to that for sampling an approximate probability measure from a Polya tree distribution by terminating the procedure at a finite level.
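The sequential scheme (9) can be sketched in Python for sets that are finite unions of level-M sets in Π. This is our own illustration with hypothetical helper names: each imputed path $(\delta_3, \delta_4, \dots)$ records which level-M set each unobserved W falls in, and a new censored set multiplies the old weight by $\Pr(W_{n+1} \in B_\delta \mid \xi_n)$ before renormalizing. The set $A_3$ used below ($B_{01} \cup B_{10}$) is a hypothetical stand-in, since the third observed set is not given here; $A_4 = B_{00} \cup B_{01}$ is taken from the text.

```python
def estimate(eps, alpha):
    """Pr(W in B_eps) under PT(Pi, alpha): product of conditional Beta means."""
    prob = 1.0
    for j in range(1, len(eps) + 1):
        parent = eps[: j - 1]
        prob *= alpha[eps[:j]] / (alpha[parent + "0"] + alpha[parent + "1"])
    return prob


def complete_update(alpha, delta):
    """Update (1) for an imputed value in B_delta: each prefix of delta gains 1."""
    new = dict(alpha)
    for j in range(1, len(delta) + 1):
        new[delta[:j]] += 1
    return new


def add_censored(mixture, alpha, A):
    """One step of (9). `mixture` maps an imputed path (delta_1, ..., delta_n), one
    level-M set per earlier censored observation, to its mixing probability;
    A lists the level-M sets whose union is the new censored set."""
    new = {}
    for path, weight in mixture.items():
        current = alpha
        for d in path:
            current = complete_update(current, d)  # A(xi_n) for this imputed path
        for d in A:
            # weight * Pr(W_{n+1} in B_d | xi_n): summing over d reweights xi_n by
            # Pr(W_{n+1} in A | xi_n); fixing the path gives the law of xi_{n+1} | xi_n
            new[path + (d,)] = weight * estimate(d, current)
    total = sum(new.values())  # Pr(W_{n+1} in A | data so far), the denominator of (9)
    return {path: w / total for path, w in new.items()}


def marginal(mixture, k):
    """Mixing probabilities on the k-th imputed variable alone."""
    out = {}
    for path, w in mixture.items():
        out[path[k]] = out.get(path[k], 0.0) + w
    return out


alpha = {"0": 2, "1": 4, "00": 1, "01": 2, "10": 3, "11": 1}  # after W_1 and W_2
mix3 = add_censored({(): 1.0}, alpha, ["01", "10"])  # hypothetical A_3: B_01 union B_10
mix4 = add_censored(mix3, alpha, ["00", "01"])       # A_4 = B_00 union B_01, as in the text
print(marginal(mix3, 0))  # xi_3 before [W_4 in B_00 union B_01]
print(marginal(mix4, 0))  # xi_3 after it
print(mix4)               # joint mixing distribution on (xi_3, xi_4)
```

With this hypothetical $A_3$, the probability on $[\xi_3 \in B_{01}]$ moves from 4/13 ≈ .308 to .4 after $[W_4 \in B_{00} \cup B_{01}]$, in line with the direction described above.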

An Application
We use the data given in Kaplan and Meier [4] and analyzed by MW [7]:

Deaths occurred at: 0.8, 3.1, 5.4, 9.2
Censoring times at: 1.0, 2.7, 7.0, 12.1

The partition defined by MW so that the posterior on the survival distribution will also be a Polya tree is given in Table 1. We consider a partition based upon breaking the interval [0, ∞) into subintervals of equal length, as seen in Table 2. Polya tree modeling allows for the prior to be centered on a particular distribution $P_0$ through the choice of the parameters in $\mathcal{A}$. For $\varepsilon \in E^m$, let $\alpha_\varepsilon = \gamma_m P_0(B_\varepsilon)$, where $\gamma_m > 0$; each ratio in the product formula for $\hat P_0(B_\varepsilon)$ is then $P_0(B_{\varepsilon_1\cdots\varepsilon_j})/P_0(B_{\varepsilon_1\cdots\varepsilon_{j-1}})$, so the product telescopes to $\hat P_0(B_\varepsilon) = P_0(B_\varepsilon)$. We determine the parameters for our Polya tree prior by letting $P_0$ correspond to the exponential distribution with failure rate λ = 0.12. The prior estimate of the survival function becomes $\hat S_0(t) = \exp\{-0.12\,t\}$. Also, we take $\gamma_m = m^2$. This choice is sufficient for the Polya tree prior to place probability 1 on distributions which are absolutely continuous.
Aside from differences in the partition Π, our prior is equivalent to the one considered by MW. The posterior estimates of survival can be compared using Tables 3 and 4. Here we can see the effect of Π: our estimates of survival are larger than those from MW.
We will take the set (0, 1.0] as a basis of comparison. In the MW partition, this set is $B_0$. For the alternate partition, the set (0, 1.0] is an element of Π at level m = 4, namely $B_{0000}$, rather than at level m = 1. An observed death represents a complete realization and updates the distribution on P according to (1). There are four complete observations in our data set, and both partitions are updated by them through (1). At this point, the estimates of S(1.0), which were equal a priori, are

MW: $\hat S(1.0) = 1 - \hat P(B_0^{MW}) = .6855$
alt: $\hat S(1.0) = 1 - \hat P(B_{0000}) = 1 - \hat P(B_0)\,\hat P(B_{00} \mid B_0)\,\hat P(B_{000} \mid B_{00})\,\hat P(B_{0000} \mid B_{000}) = .867$
An observation known to be in a set $B_\varepsilon$, $\varepsilon \in E^m$, will have a greater impact on the probability estimate $\hat P(B_\varepsilon)$ when m is small. Lavine [5] states "choosing $\alpha_\varepsilon$ for small m typically involves judgments about sets which have large initial probability. For large m, the consideration is that $\alpha_\varepsilon$ can express a belief about the smoothness of P." An indication of this effect is seen here, with the observed death at 0.8 having a large influence under the MW partition.
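The level effect can be illustrated with a Python sketch of the centered prior $\alpha_\varepsilon = \gamma_m P_0(B_\varepsilon)$, $\gamma_m = m^2$, with $P_0$ exponential at rate 0.12. Because Tables 1 and 2 are not reproduced here, both partitions below are hypothetical stand-ins: one places (0, 1.0] at level m = 1, and the other is an equal-length dyadic split of (0, 16] whose last set at each level also absorbs the tail, so that (0, 1.0] is $B_{0000}$. Only the single death at 0.8 is incorporated through (1), so the numbers are not those of Tables 3 and 4 or the values .6855 and .867.

```python
import math

LAMBDA = 0.12  # failure rate of the exponential centering distribution P0


def p0(lo, hi):
    """P0((lo, hi]) under the exponential distribution; hi may be math.inf."""
    upper = 0.0 if hi == math.inf else math.exp(-LAMBDA * hi)
    return math.exp(-LAMBDA * lo) - upper


def alpha(eps, interval, deaths):
    """alpha_eps = gamma_m P0(B_eps) with gamma_m = m^2, plus 1 per death in B_eps (update (1))."""
    lo, hi = interval(eps)
    m = len(eps)
    return m ** 2 * p0(lo, hi) + sum(lo < t <= hi for t in deaths)


def estimate(eps, interval, deaths=()):
    """Estimate of P(B_eps): product of the conditional Beta means down the tree."""
    prob = 1.0
    for j in range(1, len(eps) + 1):
        parent = eps[: j - 1]
        num = alpha(eps[:j], interval, deaths)
        den = alpha(parent + "0", interval, deaths) + alpha(parent + "1", interval, deaths)
        prob *= num / den
    return prob


def coarse(eps):
    """Hypothetical partition with (0, 1.0] = B_0 at level m = 1."""
    return {"0": (0.0, 1.0), "1": (1.0, math.inf)}[eps]


def equal_length(eps, width=16.0):
    """Hypothetical equal-length dyadic partition of (0, 16]; the last set at each level
    also absorbs the tail (16, inf), so (0, 1.0] = B_0000 sits at level m = 4."""
    m, k = len(eps), int(eps, 2)
    step = width / 2 ** m
    hi = math.inf if k == 2 ** m - 1 else (k + 1) * step
    return (k * step, hi)


prior_coarse = estimate("0", coarse)
prior_fine = estimate("0000", equal_length)
post_coarse = estimate("0", coarse, deaths=(0.8,))
post_fine = estimate("0000", equal_length, deaths=(0.8,))
print(f"prior P(0, 1.0]: {prior_coarse:.4f} vs {prior_fine:.4f}")
print(f"after one death at 0.8: {post_coarse:.4f} (m = 1) vs {post_fine:.4f} (m = 4)")
```

Both priors give $\hat P_0((0, 1.0]) = 1 - e^{-0.12}$, as the telescoping argument predicts, but the same death moves the estimate much further when (0, 1.0] sits at level 1.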
The four censored observations are considered next.The MW partition, by design, keeps the posterior within the Polya tree family.The alternate partition gives a mixture of Polya trees for its posterior and thus is not represented as nicely mathematically.It was chosen, however, to reflect true prior uncertainty, not for mathematical tractability.

Table 3. Survival estimates based upon the MW partition

Table 4. Survival estimates based upon an equal-length interval partition