Maximum Likelihood Estimates with Order Restrictions on Probabilities and Odds Ratios A Geometric Programming Approach

The problem of assigning cell probabilities to maximize a multinomial likelihood with order restrictions on the probabilies and or restrictions on the local odds ratios is modeled as a posynomial geometric program GP a class of nonlinear optimization problems with a well developed duality theory and collection of algorithms Local odds ratios provide a measure of association between categorical random variables A constrained multinomial MLE example from the literature is solved and the quality of the solution is compared with that obtained by the iterative method of El Barmi and Dykstra which is based upon Fenchel duality Exploiting the proximity of the GP model of MLE problems to linear programming LP problems we also describe as an alternative in the absence of special purpose GP software an easily implemented successive LP approximation method for solving this class of MLE problems using one of the readily available LP solvers


Introduction
Suppose that we h a ve observed frequencies (f 1 f 2 : : : f r ) of a discrete random variable Y which w e assume to have m ultinomial distribution with parameters n and p = ( p 1 p 2 ::: p r ).In estimating p, i t w ould be appropriate to use any prior knowledge of the relationships among the probabilities.For example, one might require an estimating distribution for which p i+1 p i , i.e., the probabilities are nonincreasing (or more generally, p i+1 i p i ) or else p i+1 p i , i.e., the probabilities are nondecreasing(or more generally, p i+1 i p i for positive i ).Hence, nding the maximum likelihood estimator will require the solution of max p r Y i=1 p fi i (1.1) subject to either the restrictions p i+1 i p i 1 i = 1 2 : : : r ; 1 (1.2) in the case of nonincreasing probabilities, or, in the case of nondecreasing probabilities, the restriction i p i p i+1 1 i = 1 2 : : : r ; 1 (1.3) and, of course, r X i=1 p i = 1 (1.4) p i 0 i = 1 2 ::: r (1.5)A special case of the constraints arise when all the 0 i s are one.This important case is solved by Robertson, Wright, and Dykstra 14].Among early appearances of the more general constraints with non{unitary 0 i s was the thesis of Willy Gochet,11], where the purpose was to identify these problems as geometric programs 10].
In this paper we review a simple modi cation for expressing the above and other more complex MLE problems with constraints on odds ratios as posynomial geometric programs.Section 2 of this paper summarizes some of the properties of geometric programming (GP), and Section 3 explains how these MLE problems can beformulated as a GP problem, for which special-purpose optimization software is available.Section 4 describes the proximity of the GP model to a log-linear system, and explains how, by a p p r o ximating the sum of the probabilities by a monomial, complete log-linearity i s achieved, so that, in the absence of specialized GP software, the problem might b e s o l v ed using commonly-available linear programming (LP) software.Section 5 presents a small example to illustrate the LP approach described in Section 4. In Section 6, we consider maximum likelihood estimation of joint probability distributions of two or three discrete ordinal random variables, when it is possible to specify positive or negative association between two random variables, or local 3-factor interaction in the case of three random variables.Section 7 solves a GP model of a MLE problem from the literature, using a state-of-the-art GP solver.The nal section presents a summary of the paper.

Geometric Programming Model
The general primal problem of geometric programming (GP) (Du n et al. 10]) i s to Minimize g 0 (x) subject to g i (x) 1 i = 1 2 : : : m (2.1) x > 0 MAXIMUM LIKELIHOOD ESTIMATES WITH ORDER RESTRICTIONS 55 where x R m and each function g i (x) i s a posynomial with T i terms, i.e., The exponents a ijk are arbitrary real numbers, but the coe cients c ij are assumed to be positive constants and the decision variables x n are required to be strictly positive.
The corresponding posynomial GP dual problem is to ij 0 j = 1 2 : : : T i i = 0 1 2 : : : m (2.7)Note that there are two sets of dual variables: dual variable i corresponds to posynomial i (i.e., the primal objective if index i = 0, or constraint i f i 1) dual variable ij is associated with term j of posynomial i.This dual problem o ers several computational advantages: after using (2.5) to eliminate , the logarithm of the objective (2.3) is a concave function to be maximized over a linear system.
This linear system has T variables, where T = T 0 + T 1 + ::: + T m , and (N+1) equations, and hence the di erence T-(N+1) is referred to as its degree of di culty.
In case the degree of di culty is negative, i.e., T < N + 1, the dual feasible region is overdetermined and may be empty this "pathological" case rarely occurs for well-formulated problems.(Cf.Bricker et al.,7].)The GP dual problem does possess some undesirable properties which make it not amenable to solution by a general purpose nonlinear optimization algorithm, most notably the fact that the objective function is nondi erentiable when one or more variables are zero.) If an optimal dual solution ( ) i s k n o wn, then the following relationships may be used to compute a primal solution x under suitable conditions, 10], T h e o r e m 1, p80: x aijn n j = 1 2 : : : where g 0 (x ) = v( ) a n d , f o r i > 0, g i (x ) = 1 i f i 6 = 0 .Note that, from these relationships, one may obtain a system of equations linear in the logarithms of the D. L. BRICKER, K. O. KORTANEK, AND L. XU optimal values of the primal variables: N X k=1 a ijk ln x k = l n ( ij g i (x ) i c ij ) j = 1 2 : : : T j (2.9) for each i = 0 1 : : : m such t h a t i 6 = 0 Typically, but not always, the system of linear equations (2.9) uniquely determines the optimal x .See Dembo 8] for a discussion of the recovery of primal solutions from the dual solution in general.).Wong 15] has used the GP model to establish the relationship between the maximum likelihood estimation problem and the entropy maximization formulation of the trip distribution problem in tra c studies.
To the best of our knowledge, however, no discussion of the use of GP models for MLE with restrictions on odds ratios (cf.Section 6).

Approximation by a Log-Linear System
Of course, if it were not for constraint (3.2), the above MLE problem would be log-linear, i.e., which, after a logarithmic transformation z i = l n p i is a linear programming problem (in which the variables are unrestricted in sign.)Unfortunately, because (3.2) contains multiple terms, the logarithmic transformation of (3.2) would yield a nonseparable and nonlinear constraint.It is possible, however, to approximate (3.2) with a monomial constraint b y means of the arithmetic-geometric mean inequality 10], p110.This inequality might be stated in the form: Given i > 0,i=1,2,...,r, such that P r i=1 i = 1, then for all p i > 0, i.e., the arithmetic mean of two positive n umbers is at least as great (5.3) and p 1 + p 2 + p 3 + p 4 1 (5.4) as well as nonnegativity restrictions on the probabilities.
The solution p obtained above is the same as the maximum likelihood estimator restricted to antitonic functions on Y (i.e.satisfying the constraints p 1 p 2 p 3 p 4 ) as described in Section 2 of Chapter 7 in Robertson, Wright, and Dykstra 14], 1988.Robertson et al. demonstrate that, geometrically, p i is the slope of the linear segment to the left of y i in the least concave majorant o f F 4 (where F 4 (y) i s the Empirical Distribution Function (EDF) of p).Thus, the speci c MLE example problem (with = 1 :0) considered in this section might b e s o l v ed more easily without the use of a mathematical programming model.Our introduction of this GP model here was to provide a simple illustration of the successive LP approach, which is applicable to MLE with more general constraints, e.g.order restrictions with < 1, or the MLE problems in the following section requiring estimates of joint distributions with restrictions on local odds ratios.The above successive L P approach (which m a y also be referred to as a cutting-plane approach) has the advantage that it utilizes widely-available LP software as one might expect, however, it su ers from the fact that the added linear constraints become nearly colinear as the algorithm progresses, limiting the level of precision which is attainable.The example MLE problem in Section 7, with restrictions on odds ratios, which will beintroduced in the next section, will be solved by a m uch superior interior-point algorithm which does not exhibit this undesirable behavior.

Cross Classi cations
Consider two ordinal discrete random variables Y 1 and Y 2 , and an r c rectangular array i n which the (i j) th cell contains the relative frequency f ij with which the event \ Y 1 = i and Y 2 = j " w as observed.Let p ij be the true probability of this event.The local odds ratios ij = p ij p i+1 j+1 p i+1 j p i j+1 (6.1) can be used to specify properties of association between Y 1 and Y 2 .For example, ij =1 for all i and j implies independence, while ij ( )1 implies a positive (negative) association between the two random variables (cf.Agresti 1]).
Just as the order restriction (1.2) discussed earlier may be expressed as a log-linear constraint, so also the constraints p ij p i+1 j+1 p i+1 j p i j+1 ij (6.2) and p ij p i+1 j+1 p i+1 j p i j+1 ij or equivalently p i+1 j p i j+1 ij p ij p i+1 j+1 1 (6.3) where ij > 0, may both be expressed as log-linear constraints.Thus, the maximum So in this case also, the restricted maximum likelihood estimator is the optimum of a posynomial geometric programming problem.
Although an order restriction on marginal probabilities, e.g., P j p i;1 j P j p ij , cannot be expressed as a posynomial GP constraint, by applying condensation to the denominator of the equivalent constraint Consider further the case of a three-dimensional r c l array where the third dimension (level) corresponds to a third ordinal factor, and let p ijk denote the corresponding joint probability.Within each l e v el k (1 k l) w e de ne the odds ratios ij(k) = p ijk p i+1 j+1 k p i+1 j k p i j+1 k ( The ratio of odds ratios ijk = ij( may be used to specify local three-factor interaction: a value of 1.0 for all i, j, and k speci es no such i n teraction, while ijk 1, i.e., ij(k+1) ij(k) , for all i, j, and k, speci es nondecreasing local conditional association between Y 1 and Y 2 as Y 3 increases.(In a similar fashion, nonincreasing local conditional association may be speci ed.)In the next section, we present an example estimation problem illustrating this type of order restriction on the odds ratios.

Example II
We illustrate the geometric programming approach using data found in J. R. Ashford and R. Snowden 3].Medical examinations were performed on a sample of coalminers in the United Kingdom.Each of 18,000 subjects was classi ed according to age (the third factor), as well as to whether he exhibits each o f t wo s y m ptoms, namely "breathlessness" and "wheeze" (factors 1 and 2, respectively).The frequencies f ijk of these classi cations are presented in Table 7.1.
It is hypothesized that the two factors "breathlessness" and "wheeze" have a positive association within each age level, i.e., ij(k) 1 but, however, that this positive association become weaker as the third factor, age, increases, i.e., 11(k) 11(k+1) for k=1,...8 (7.1) and p ijk 0 for all i, j, and k.This geometric programming model has N = 3 6 primal variables and 10 constraints (7.3), (7.5) and (7.6).With a total of T = 4 6 terms in the objective and constraints, the dual geometric program has 46 dual variables ij with N + 1 = 3 7 constraints, so that the degree of di culty is T ; (N + 1 ) = 9 .
The solution was easily found using a code written by Xiaojie Xu, an implimentation of the algorithm described in Kortanek, Xu, and Ye 1 2 ] .The optimal joint probabilities found by this algorithm are shown in Table 7.2.For example, the probability that a coalminer between 20-24 years of age exhibits both wheeze and breathlessness is 0.51811 percent, while for a coalminer in the next age category (25-29 years), this probability has increased to 1.22203 percent.The logarithm of the objective function (7.2) at this solution is 1:28427 10 4 .On the other hand, the solution provided by El Barmi and Dykstra 6], shown in Table 7.3, evaluated by the same objective function (7.2), has a somewhat larger value, 1:31267 10 4 .(Recall that the objective function is the reciprocal of the likelihood, a n d i s t o b e minimized.)Note that, for both estimates, the odds ratios all exceed 1.0 as required, but that in the case of the solution given by El Barmi and Dykstra, the odds ratio for the age category 25-29 exceeds slightly that for age category 20-24, violating the restriction that the association be nonincreasing with respect to age.The geometric programming solution, on the other hand, satis es the restrictions that the positive association be weaker as the age increases, and has an objective v alue which is approximately 2.2 percent better than the solution of El Barmi and Dykstra.Aside from the (perhaps insigni cant) improvement in the quality of the solution, an advantage to using GP instead of the El Barmi and Dykstra approach i s the availability of software for solving posynomial GP problems.

Summary
It has long been known that the likelihood function ts well into the posynomial geometric programming framework.The purpose of this paper has been to review the applicability of GP to MLE problems in which restrictions have been placed on the relative magnitudes of the probability estimates and also to demonstrate that restrictions on the odds ratios t well into the GP model.By doing so, one is able to utilize GP software for the computation.
The fact that only one constraint o f these GP models has multiple terms (viz., the restriction that the sum of the probabilities cannot exceed 1) makes them especially amenable to solution by a successive LP method, since after a monomial approximation to that constraint, the problem becomes log-linear.While inferior to state-of-the-art geometric programming algorithms, this approach m a y g i v e satisfactory estimates if high precision is not required.

3 .(
Formulation of the Constrained MLE Problem as a GPReplace the maximization objective (1.1) with the equivalent objective to minimize its reciprocal, the GP primal constraints must be inequalities, replace the equation (Because the objective (3.1) is strictly nonincreasing, the inequality constraint (3.2) will certainly be "tight" at an optimal p.)And so(3.1), (3.2),(1.5),and either ordering constraint (1.2) or (1.3) constitute an instance of a posynomial geometric programming problem with N = r variables and T = 2r terms, having T ; (N + 1 ) = 2 r ; (r + 1 ) = r ; 1 degrees of di culty.The utility of geometric programming (GP) for solving another particular MLE problem (an overlapping sample frame problem) was long ago noted by Alldredge and Armstrong 2] (cf.also Mazumdar and Je erson 13] s o l v ed by the successive linear programming approach which w as applied above.El Barmi and Dykstra 6] (cf.also 5]) have a l s o d e v eloped a convergent iterative algorithm for this problem, based upon Fenchel duality t h e o r y , for the case = 1 .In order to formulate this problem as a geometric program, the equality (6.5) must be relaxed to the inequality l l surely remain "tight" at the optimum, and the objective restated as a minimization (of the reciprocal of the likelihood): a posynomial GP constraint.Hence, a MLE with order restrictions on the marginal distributions might be computed by solving a succession of approximating GP problems.(Cf.Avriel et al. 4]).
One hundred observations of the ordinal random variable Y were measured, with Y =1, 2, 3, and 4 with frequencies 32, 35, 15, and 18, respectively.We w i s h t o n d the MLE of the distribution of Y , subject to the restriction that PfY = i + 1 g PfY = ig, i=1,2,3, i.e., probabilities are nonincreasing.This problem may be formulated as the GP problem in which w e wish to minimize the reciprocal of the likelihood function, namely is a relaxation of constraint (3.2), i.e., p i satisfying (4.6) have a sum exceeding 1, unless, of course, equality is attained in (4.3).The conditions for this equality occurring for our choice of is, according to(4.4), that beidentical to the distribution p.The monomial constraint (4.6) is equivalent to the log-linear constraint

Table 7 .
Note that, since the rst and second factors have only a single pair of levels each, only a single odds ratio 11(k) is de ned for level k of the third factor (age).)The maximum likelihood estimator is to be computed so as to satisfy this hypothesis, that is, we wish to

Table 7 .
3. Joint Probabilities (percent) of Breathlessness and Wheeze by Age from El Barmi and Dykstra

Table 7 .
4shows the local odds ratios evaluated for the two solutions.