Bayesian Correction for Misclassification in Multilevel Count Data Models

Covariate misclassification is well known to yield biased estimates in single level regression models. The impact on hierarchical count models has been less studied. A fully Bayesian approach to modeling both the misclassified covariate and the hierarchical response is proposed. Models with a single diagnostic test and with multiple diagnostic tests are considered. Simulation studies show the ability of the proposed model to appropriately account for the misclassification by reducing bias and improving performance of interval estimators. A real data example further demonstrated the consequences of ignoring the misclassification. Ignoring misclassification yielded a model that indicated there was a significant, positive impact on the number of children of females who observed spousal abuse between their parents. When the misclassification was accounted for, the relationship switched to negative, but not significant. Ignoring misclassification in standard linear and generalized linear models is well known to lead to biased results. We provide an approach to extend misclassification modeling to the important area of hierarchical generalized linear models.


Introduction
Misclassification and measurement error is well known to cause bias in estimation. The exposure variable in epidemiologic studies is often subject to misclassification [1][2][3]. While the impact of misclassification is well documented for generalized linear models with a single level, considerable less work has been done on multilevel models. Covariate misclassification has been considered for continuous outcome models [4,5]. Here, our focus is on multilevel models with a count response where the primary exposure of interest is potentially misclassified.
Partner violence can have impacts on a wide range of outcomes on the abused partner, for instance, depression and even suicide attempts [6,7]. The impact on children in the household of domestic violence has also been studied [8][9][10]. Domestic violence is well known to be misclassified, especially in survey type data [11].
Our analysis uses India National Health Survey (NFHS-3) 2005-2006. It is a nationally representative household survey that provides a set of key variables for the study, including the number of surviving children a woman has and the occurrence of partner violence in the previous generation which is subject to misclassification. Thus our interest is in determining if previous generation spousal abuse impacts number of children correcting for potential misclassification in reported spousal abuse.
Our paper is organized as follows. We first describe the National Family and Health Survey. Next, we discuss the proposed Bayesian hierarchical model that accounts for covariate misclassification. We discuss the results of a simulation experiment in which we compare the proposed model to the naïve model that ignores misclassification. We apply the proposed model to the data from National Family and Health Survey and conclude with a discussion.

Data Description
According to the National Family and Health Survey in 2005, total lifetime prevalence of domestic violence was 33.5% and 8.5% for sexual violence among women aged 15-49. More notably, lifetime prevalence of domestic abuse ranged from 2 Computational and Mathematical Methods in Medicine 18% to 45% in states across India. In [12], it is explored whether female employment enhances women's bargaining power within the home and reduce the risk of intimate partner violence. Further, [13] examined whether female employment challenges the culture of male dominance and provokes more violence. Lastly, [14] used many of the factors from the previous studies but also addressed misclassification bias in violence reporting due to the significant underreporting of intimate partner violence in surveys [15]. We explore whether being raised in a household with spousal abuse significantly impacts the number of children the child exposed has in their lifetime. Data used in the analysis came from India National Health Survey (NFHS-3) 2005-2006. The sample is restricted to urban (mega city, large city, small city, and large town) women from 27 different states and key variables are physical spousal violence experience of their parents and the number of children they have. Additional covariates include a wide range of demographic and socioeconomic variables. The sample includes 19,026 urban women that have no missing variables. Lastly, we assume the spousal abuse rate varies across the 27 different states as well as the number of children.

Model
We follow the general approach of [16] for measurement error models. The approach is to specify three models, the outcome model, exposure model, and measurement model. The outcome model we consider has two levels with = 1, 2, . . . , denoting subjects and = 1, 2, . . . , denoting clusters. The count data at level 1 are assumed to be Poisson: where is an offset, often time or space, and is the rate of occurrence which is related to covariates through a log-linear model: ( The exposure, , is assumed to be potentially misclassified. Other covariates ( ) are assumed to be measured without error. 0 represents a random intercept for each cluster: with ∼ (0, 2 ). 1 is the exposure effect of primary interest while , > 1, represents the impact of other covariates. The exposure model for binary covariate is assumed to be a logistic model: with and random intercept with V ∼ (0, ] 2 ) and 0 . Finally, the measurement model depends on several assumptions and the amount of data available. Here, we assume nondifferential misclassification and either one or two tests available used as surrogates for the true exposure, . In the one test case, we assume the fallible method has sensitivity 1 and specificity 1 and conditional on the true value we have fallible assessment 1 which takes values of 0 and 1 with probability If there are two conditionally independent assessments, then the measurement model is in terms of two fallible outcomes Figure 1 provides a graph showing the relationship between all the variables in the model. Because we are using the Bayesian framework we require prior distributions for the model parameters. In the absence of prior information or expert opinion, diffuse normal prior distributions are often used for logistic and Poisson regression coefficients; see, for example, [17,18]. Thus, we assume that 0 , 1 , . . . , have independent normal prior distributions with mean 0 and variance 10. Similarly, the Poisson regression coefficients, 0 , 1 , . . . , , are also assumed to be independent and normally distributed but we assume a variance of 100 for these. We assume uniform priors for the random effects standard deviations and ] [19]. Finally, beta priors are used for the sensitivities and specificities. For the two assessment case, noninformative beta (1, 1) priors can be used. If only one assessment is used, the model is not identifiable and thus requires informative priors for both 1 and 1 .
The joint posterior is proportional to the product of the likelihoods from the outcome, exposure, and measurement models along with the prior distributions. The resulting marginal posteriors for all parameters of interest are not available in closed form. Packages such as OpenBUGS and JAGS can be used to fit the model. Our OpenBUGS code is available from the first author upon request.

A Simulation Study
To determine the size of the impact of the covariate misclassification on the hierarchical count model we performed a simulation experiment. We generate the simulated data in order to mimic the real data. Specifically, we assume a single exposure variable that is subject to misclassification along with four other covariates that are assumed to be measured correctly. The true exposure model is a binomial model with a logit link: while the Poisson log-linear model for the outcome is where ] ∼ (0, 0.1 2 ) and ∼ (0, 0.1 2 ). These random effect standard deviations of 0.1 are relatively small. We also considered the cases of 0.2 and 0.3 and found the results to be similar. For the measurement model we consider two cases. The first is with a single diagnostic test with sensitivity 1 and specificity 1 . As a second case, we assume two conditionally independent tests, the first with sensitivity 1 and specificity 1 and the second with sensitivity 2 and specificity 2 . For the first test we fix 1 = 0.55 and 1 = 0.95. For the second test we have 2 = 0.7 and 2 = 0.8.
For the simulation we set N, the number of clusters, to be 25 and = 200 for all clusters. We simulated 50 data sets and fit three models. The first is the naïve model where the misclassified covariate is assumed to be measured without error. The second model is the case of a single fallible diagnostic test. Finally, we consider the case of two conditionally independent diagnostic tests. For the case of a single diagnostic test we require informative priors for the sensitivity and specificity. We use a beta (10, 8) for 1 and a beta (165.7, 9.7) for 1 . Note the prior mean for 1 is 0.55 and the prior mean for 1 is 0.94. So the priors are centered near the truth, but the 95% prior interval for 1 is (0.33, 0.77), which allows for a wide range of possible values. These priors are consistent with [15] which accounts for errors in the domestic abuse question in a survey. Note there is considerably more certainty about the specificity than the sensitivity. This makes intuitive sense in that people who are not victims are unlikely to say they are, but victims are much The naïve model performs particularly poorly illustrating the problem of ignoring the misclassification. The average posterior mean across the 50 data sets is approximately 0 and none of the 95% intervals contained the true value. The two corrected models performed significantly better with both having empirical coverage over 90% and significantly less bias. The main cost of the misclassification model over the naïve is larger variability. For instance, the posterior standard deviation for 1 averages only 0.025 for the naïve model but is about 0.11 for both misclassification models. Accounting for misclassification and measurement error commonly inflates variability in estimates, but this is preferred to the large bias and low coverage of the naïve model. Though 1 has the largest degree of bias for the naïve model, most of the other parameters are also somewhat biased and coverages of the intervals below nominal. The performance for all the parameters are provided in Appendix.

Application
We apply the proposed Poisson regression model with a misclassified covariate to the India National Health Survey (NFHS-3) 2005-2006 data accounting for the 27 states as a cluster random effect. We consider five covariates: a binary indicator for whether a parent reported being abused, the age of the female, the education level of the female, a binary indicator as to whether the family considers themselves religious, and a continuous variable indicating their wealth. The number of children each woman surveyed has is the outcome variable and we assume the counts are distributed Poisson. All of the variables are considered to be reported correctly other than the binary variable indicating whether a female was abused or not. Because the one test Poisson model is nonidentifiable we must supply the prior on the sensitivity and specificity relatively informative beta distributions. Information from [14,15,19] on reporting of partner violence in developing countries indicates the distribution of the sensitivity could easily vary between 20 and 60%. Using the same sources, the specificity is found most likely to be above 90%. Using this information, we specify the priors as S ∼ Beta (5, 7.6) and C ∼ Beta (165. 7, 9.7). The priors indicate a substantial degree  of underreporting with a prior mean of only 40% for the sensitivity. Conversely, the specificity is expected to be quite high, with a prior mean over 94%. We also consider the naïve model and report the posterior estimates of the response model; however, it is unnecessary to consider the exposure model. We fit the data with both the proposed model and a naïve model that does not account for the misclassification. The models were fit in OpenBUGS and inferences were based on 20,000 iterations after 10,000 burn-ins. The results for both models are provided in Table 2. The main parameter of interest is the spousal abuse parameter, 1 . For the naïve model, the posterior of 1 is positive and the 95% interval is completely above 0 indicating statistical significance. Thus the inference would be that parental partner violence would lead to an increase in number of children. For the misclassification model, the coefficient is not significant as the interval does contain 0. This is not unusual because misclassification is well known to bias estimates towards the null. What is particularly interesting here is that though the interval contains 0, the posterior mean is negative, thus by accounting for the misclassification the majority of the posterior shifted from positive to negative. This further illustrates the importance of accounting for covariate misclassification in hierarchical count models. The other covariates in the response model are all significant and are very similar in both models. Age has a positive relationship with number of children while education, religiosity, and wealth all have negative relationships. The naïve model has no estimates for the parameters or the sensitivity and specificity since these are all concerned with the misclassification model. For the misclassification model, we see that age, education, and wealth all reduce the probability of abuse among the parents and are statistically significant. Religiosity appears to have a positive relationship with abuse, but the 95% interval contains 0, so the relationship is not strong. The standard deviation of the random slopes for the outcome model is relatively modest with a posterior mean of 0.13. This indicates only a small degree of heterogeneity of the baseline average number of children across the 27 states. For the misclassification model, the random effect standard deviation has a posterior mean of 1.12 indicating considerably more heterogeneity of the baseline probability of abuse across the 27 states. Finally, the sensitivity and specificity both have posteriors that are centered around larger values than the priors. This indicates there is less misclassification in the data than was expected, but still considerable underreporting since the sensitivity is still centered around a value less than 0.5.

Conclusion
In this paper, we have addressed the problem of covariate misclassification in a hierarchical count model. Through simulation, we illustrated that ignoring the misclassification can lead to biased estimates and undercoverage of interval estimators. Our real data example demonstrated an extreme possibility in that the naïve model yielded results that were statistically significant (95% interval completely above 0) while the bias corrected model had a point estimate that was negative, though not statistically significant. There are several extensions to the model we have proposed. In some cases, the count response may also be subject to misclassification. Accounting for under-or overreporting of the response in a hierarchical model such as this would be an interesting follow-up work. An important limitation to note is that while the information for our priors for the example is from developing countries including studies from India specifically, we have not matched subjects with our current data; thus there is no guarantee that the populations match perfectly. This is one reason why the two diagnostic case approaches are preferred because the information on the sensitivity and specificity come from the current data. However, the one diagnostic test case essentially works as a Monte Carlo sensitivity analysis where the priors dictate a range of likely values for the sensitivity and specificity. In that sense, our results are robust with respect to a large number of possible true values of the sensitivity and specificity.