Bayesian Hierarchical Modeling for Categorical Longitudinal Data from Sedation Measurements

We investigate a Bayesian hierarchical model for the analysis of categorical longitudinal data from sedation measurement for Magnetic Resonance Imaging (MRI) and Computerized Tomography (CT). Data for each patient is observed at different time points within the time up to 60 min. A model for the sedation level of patients is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response, and then subsequent terms are introduced. To estimate the model, we use the Gibbs sampling given some appropriate prior distributions.


Introduction
Magnetic Resonance Imaging (MRI) and Computerized Tomography (CT) require the patient to lie still for periods of up to 60 min. These two diagnostic procedures also require strict immobility and sedation for a successful result. If a child cannot remain adequately still for examination, sedation may be necessary. Optimal sedation management of children before MRI and CT has received attention in the last decade [1,2]. The sedation medications must be chosen carefully for children's safety and effectiveness. Many researches related to the comparison of different sedation medications have been performed successfully [3,4]. In these studies, for each medication group sedation levels were obtained at different time points within the time up to 60 min. In addition to sedation level measurements, the other multiple assessments of the same patient were recorded, and the within subject ones, such as sedation levels at different time points for a given patient, were correlated. This case is an example when a longitudinal study is made with responses being measured repeatedly on the same patient across time. In medical studies, statistical analysis of the data set described earlier has been performed by many researchers, who use the known methods such as ANOVA, MANOVA, and Linear Models, assuming that the repeated observations from each patient are uncorrelated. Since repeated observations are made on the same patient, observed responses are generally correlated. For robust analysis, this association must be accounted for. Weighted least squares model is used for repeated categorical data. This model works well for large sample size, no missing data, a small number of response variables, and discrete independent variables. Recent years have witnessed new statistical methods of analysing for data that do not meet these conditions. Mathematical models for multiple regression, linear models, and time series are generally useful where random variables are approximately normal and can be explained by some linear structure. However, data can be clearly nonnormal when they represent categorical or frequency observations. Generalized Linear Models (GLMs) offer convenient and highly applicable tools for these kinds of data. They allow for more general structures and more general distributions than linear regression and ANOVA. Nelder and Wedderburn developed the concept of GLMs [5], and an extensive treatment was given by [6]. With the introduction of GLM, a much more flexible instrument for statistical modeling was created. As special cases, they include multiple linear regression, logit and probit models for quanta responses, and log linear response models for counts. Introduced Generalized 2 Computational and Mathematical Methods in Medicine Estimating Equations (GEEs) [7] were developed to extend the GLM introduced by [5].
Longitudinal researches are defined as studies in which the response of each patient is observed on two or more occasions. They are often used in medical and health research. The methods used for the analysis of longitudinal data differ from the traditional regression analysis such as multiple regressions. Longitudinal data sets consist of repeated observations of an patient and a set of covariates for each of many patients which may be fixed or which may be changed with time. Longitudinal data sets are defined by the fact that repeated observations for a patient are correlated [8]. Therefore the modeling of the correlation structure is required. When the response variable is normal, a large class of linear models is available for analysis. However, when the response variable is categorical, other methods must be considered. In recent years, considerable effort has gone into the development of statistical methods for the analysis of longitudinal categorical response data. While much of this effort has focused on methods for binary or Poisson data, relatively little attention has been given to nominal categorical data.
More generally, hierarchical models describe efficiently complex datasets incorporating correlation or including other properties in our model. Hence, when multivariate or repeated responses are observed, correlation can be incorporated in the model via a common "random" effect for all measurements referring to the same individual. This introduces a marginal correlation between repeated data, while interpretation is based on the conditional means. Therefore, given the random effects, the structure and the interpretation are similar to common generalized linear models. Accordingly, hierarchical models naturally appear, for example, when modeling spatiotemporal data in which correlation between time and space can be added by using common random effects on adjacent (in time or space) responses. Hierarchical models can also be used to imply a complicated marginal distribution but (at the same time) keep the conditional structure as simple as possible [9].
Bayesian analyses of hierarchical linear models have been considered for at least forty years [10] and have remained a topic of theoretical and applied interest [11][12][13][14]. Reference [15] reviews much of the extensive literature in the course of comparing Bayesian and non-Bayesian inferences for hierarchical models. As part of their article, Browne and Draper consider some different prior distributions for variance parameters; here, we explore the principles of hierarchical prior distributions in the context of a specific class of models. Hierarchical (multilevel) models are central to modern Bayesian statistics for both conceptual and practical reasons. At a practical level, hierarchical models are flexible tools for combining information and partial pooling of inferences [16][17][18].
In this study, we use a Bayesian approach to fit several hierarchical models of increasing complexity to assess the significance of both fixed and random effects on sedation levels and investigate a Bayesian hierarchical model for the analysis of categorical longitudinal data from sedation measurement for MRI and CT. A model for the sedation level of patients is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced.

Material and Method
There are several methods that may be used to estimate the determinants of sedation levels with categories (1, . . . , 6).
First method we considered is the multinomial logit approach. The model where Pr( = / ) is the probability that patient has outcome at time given covariates of the patient at that time.
In our analysis the response has six levels ( = 1, . . . , 6). For identifiability, = 1 is set as the reference category so that the parameters estimated from the multinomial logistic model are interpreted as the logarithm of the change in the odds of being outcome relative to that of being outcome 2 for a one unit change in the corresponding explanatory variable at time .
We investigate the relationship between sedation levels and both categorical and continuous explanatory variables by specifying a Bayesian hierarchical model for the multinomial response. We also include the lagged response variable in the model to assess the probability of transition between the times. Reference [19] considers a dynamic multinomial logit panel model with random effects to explain the labour market level of individuals in urban Mexico. The individual effects are assumed to be independent of the observed characteristics and to follow a multivariate normal distribution. We use a similar model for explaining the sedation levels of patients in which the selected covariates and sedation levels at the time of the previous time may influence the sedation levels of a patient.
Assume that patient (= 1, . . . , ) can be in any of possible levels at time (= 1, . . . , ). In the first level of the Bayesian hierarchical model, the . = ( 1 , . . . , ) are assumed to be distributed as multinomial random variables. The response takes the value ∈ {1, 2, 3, 4, 5, 6}. The model may be written as where is the probability of patient being in level at time .
The second level of the model relates the probabilities to the regression effects, lagged effects, and random effects such that = ∑ , where is a matrix of explanatory variables, is a matrix of lagged level variables, and are vectors of parameters Computational and Mathematical Methods in Medicine 3 To identify the model, we choose first level of sedation to be reference level ( = 1) with 1 , 1 , and 1 set to 0. It follows that log( 1 ) = 0, and, hence, So log( ) can be interpreted as the log of the probability of being in level relative to the probability of being in level 1.
The posterior distributions for all parameters are as follows where we have assumed that unknown parameters , , and Σ are a priori independent and depend only on Σ. MCMC methods are used to sample from the posterior distributions of the unknown parameters. We have used the WinBug software which uses the Gibbs sampling to form the posterior distribution for each unknown parameter by drawing samples from their full conditional distributions. Our primary interest in modeling sedation levels is to investigate the effects of the some covariates as well as the transition from one sedation level to another as time since arrival progresses. To do this we consider five variations of the model in Table 1. The models contain combinations of terms to capture the covariate effects, the transition effects, and a random effects term to capture over dispersion in the form of between-subject variability.
In the model 1, given sedation level , the regression effects remain constant but each individual is considered as a cluster of responses over time ( = 1, 2, 3). A random intercept term which is allowed to vary between individuals, given level , is included in the model to account for time constant unobserved variability. In the model 2, given sedation level , this model includes constant regression effects for the covariates as well as constant regression effects for the lagged response variable . The term representing the lagged response may be useful in explaining the transition between sedation levels and absorb some of the unobserved variability between individuals. The model 3 is similar to model 2 with a random effect term included to capture any additional between-subject variation. In the model 4, given sedation level , this model includes constant regression effects for the covariates but differs to model 2  Anxious or restless or both Sed2 Cooperative, orientated, and tranquil Sed3 Responding to commands Sed4 Brisk response to stimulus Sed5 Sluggish response to stimulus Sed6 No response to stimulus as it also includes time-varying effects for the lagged response variable, . These effects are included to capture any change in the transition in sedation levels between the different times. The model 5 is similar to model 4 with a random effect term included to capture any additional between-subject variation.
The ability to fit complex hierarchical models using MCMC techniques presents a need for methods to compare alternative models. Standard model comparison techniques such as the Akaike Information Criterion (AIC) [20] and the Bayesian Information Criterion (BIC) [21] require the specification of the number of parameters in each model. For hierarchical models which contain random effects, the number of parameters is not generally obvious and so an alternative method of comparison is required. The Deviance Information Criterion (DIC) is a hierarchical modeling generalization of the AIC and BIC. It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by MCMC simulation. Like AIC and BIC it is an asymptotic approximation as the sample size becomes large.
DIC was developed by [22]. The DIC statistic is a measure of model complexity and fit and is defined as where ( ) is the deviance given the model parameters , ( ) is the posterior mean of the deviance, ( ) is the deviance evaluated at the posterior mean , and = ( ) − ( ) is the effective number of parameters in the model.    The quantities ( ) and ( ) are easily computed from an MCMC simulation chain.

Application to Sedation Data
A part of the data was used by [23]. They compared the effects of Midazolam, Diazepam, Luminal, and Cardiac Cocktail in terms of sedation level. Also 127 children who received MRI and CT were included in this study. Group M ( = 30) received Midazolam, Group D ( = 31) received Diazepam, Group L ( = 32) received Luminal, and Group C ( = 34) received Cardiac Cocktail. Systolic Blood pressures, Pulse rates, the number of breathe, and oxygen saturation were monitored. The other measurements, which may affect the sedation level, such as weight, disease status, test status, complication status, age, and adaptation status, were also recorded. Descriptions of predictor values used in the analysis are given in Table 2.
Models in Table 1 were constructed according to the assumption that sedation levels are distributed as a multinomial random variables with the six possible categories as in Table 3. Sedation levels were maintained in the range of Ramsey Scale from 1 to 6 for the 15th minute, 30th minute, and 60th minute. The Ramsay Sedation Scale was given in Table 3.
The models were constructed according to the assumption that sedation levels are distributed as multinomial random variables with the six possible categories. Since there is little information available about the parameters, we choose noninformative prior distributions for the parameters. For regression parameters ∼ Normal(0, 10 3 ) and ∼ Normal(0, 10 3 ), we assume that the random effects are drawn from a multivariate normal distribution with zero mean and a variance-covariance matrix Σ. Noninformative uniform priors were determinated for the individual elements of Σ. Σ 11 and Σ 22 were given uniform (0, 100) priors and Σ 12 = Σ 21 was assigned uniform (−√Σ 11 Σ 22 , √Σ 11 Σ 22 ) prior.
Gibbs sampler was run for 10.000 iterations with the first 1000 as burn-in. Convergence for the posterior distributions of all models was achieved. We set up five multinomial models with six possible sedation levels for each model in Table 1. Therefore 30 models were constructed. Posterior calculations were calculated for all models. As an example posterior summaries for the effect on log[ (Sed6)/ (Sed1)] using model 1 are represented in Table 4.
It is easy to say from Table 4 that there are associations between the response and some explanatory variables. The explanatory variable group D, weight, comp, SPS, and PUL have significant effect on log[ (Sed6)/ (Sed1)]. We have the similar posterior results for all thirty models. Estimated posterior means and %95 intervals for the effects of all explanatory variables in Table 2 on the log of the probability of a patient being in Sed6 relative to the probability of being in Sed1 from models 1, 2, 3, 4, and 5 were obtained. They are given in Table 5. The variable Sed-lev.( − 1) refers to the sedation levels of the patient in the previous time. The corresponding effect in the model is averaged over the 2 steps between times. The variable Sed-lev.( , − 1) refers to the previous sedation levels at time for = 2, 3.
From Table 5, we can say that the explanatory variable group D, weight, comp, SPS, and PUL have significant effect on log[ (Sed6)/ (Sed1)] for models 1, 2, 3, 4, and 5. We also certainly state that there is relationship between the current sedation level and the sedation level at the previous time of measurements for all models.
For model comparisons, DIC values for all effect for each model were calculated. The DIC values were given in Table 6.
Firstly, Deviance Information Criteria (DIC) value was obtained at three times for all models with different effect. Table 6 compares the models when the deviance is obtained at three times. Model 2 and Model 4 are log models and essentially condition time 1 and model 1 explains time 1, with model 1 which explain time 1. We also calculate the DIC * at times 2 and 3. Therefore we focus on prediction of these times only.

Conclusions
Results in Table 6 show that the DIC for model is smaller than the DIC for the other models. Model 1 which contains a random effects term for each patient and sedation level over time shows better performance than the other models.
Model 2, which includes a transition variable, shows the similar performance with models 3, 4, and 5. If we are concerned with the prediction of times 2 and 3 only, model comparisons results in Table 6 show that the DIC * for models 2, 4, and 5 is smaller than model 1. Models 2, 4, and 5 provide better understanding of the effect of the changes over the three waves than Model 1. For this aim, we prefer to consider models 2, 3, 4, and 5.
For models 4 and 5, Table 6 shows that there is a significant difference between the transitions in sedation levels for times 1 to 2, and from times 2 to 3. Therefore we may prefer models 4 and 5 to the other models for the transitions.
We say that an important characteristic of hierarchical models is that each parameter referring to a specific group from the corresponding parameters of the other group.
Using Bayesian approach makes hierarchical model more flexible than classic hierarchical models. That is why they describe the data better. Bayesian hierarchical approach simplifies the interpretation and computation of the model.