
Covariate misclassification is well known to yield biased estimates in single-level regression models. Its impact on hierarchical count models has been less studied. We propose a fully Bayesian approach that models both the misclassified covariate and the hierarchical response. Models with a single diagnostic test and with multiple diagnostic tests are considered. Simulation studies show that the proposed model appropriately accounts for the misclassification, reducing bias and improving the performance of interval estimators. A real data example further demonstrates the consequences of ignoring misclassification. Ignoring misclassification yielded a model indicating a significant, positive effect on the number of children for women who had observed spousal abuse between their parents. When the misclassification was accounted for, the relationship became negative and was not significant. Ignoring misclassification in standard linear and generalized linear models is well known to lead to biased results; we provide an approach that extends misclassification modeling to the important area of hierarchical generalized linear models.

Misclassification and measurement error are well known to cause bias in estimation. The exposure variable in epidemiologic studies is often subject to misclassification [

Partner violence can affect a wide range of outcomes for the abused partner, such as depression and even suicide attempts [

Our analysis uses data from the 2005-2006 India National Family Health Survey (NFHS-3), a nationally representative household survey that provides the key variables for this study, including the number of surviving children a woman has and the occurrence of partner violence in the previous generation, which is subject to misclassification. Thus, our interest is in determining whether spousal abuse in the previous generation affects a woman's number of children, after correcting for potential misclassification in reported spousal abuse.

Our paper is organized as follows. We first describe the National Family Health Survey. Next, we present the proposed Bayesian hierarchical model that accounts for covariate misclassification. We then discuss the results of a simulation experiment comparing the proposed model with the naïve model that ignores misclassification. Finally, we apply the proposed model to data from the National Family Health Survey and conclude with a discussion.

According to the 2005 National Family Health Survey, the total lifetime prevalence among women aged 15–49 was 33.5% for domestic violence and 8.5% for sexual violence. Notably, the lifetime prevalence of domestic abuse ranged from 18% to 45% across states in India. In [

We follow the general approach of [

Directed acyclic graph illustrating how the outcome model, exposure model, and measurement model are connected.
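For concreteness, one generic way to write the three components in the graph is sketched below for subject $i$ in cluster (state) $j$, with reported exposure $W_{ij}$, true exposure $X_{ij}$, correctly measured covariates $\mathbf{Z}_{ij}$, and cluster random effect $u_j$. This notation is ours, chosen for illustration, and the exact specification in the paper may differ (for example, in which covariates enter the exposure model or in how a second diagnostic test is added).

$$
\begin{aligned}
\text{Outcome:}\quad & Y_{ij} \mid X_{ij}, \mathbf{Z}_{ij}, u_j \sim \operatorname{Poisson}(\lambda_{ij}), \qquad \log \lambda_{ij} = \beta_0 + \beta_x X_{ij} + \mathbf{Z}_{ij}^{\top}\boldsymbol{\beta} + u_j, \qquad u_j \sim N(0, \sigma_u^2),\\
\text{Exposure:}\quad & X_{ij} \mid \mathbf{Z}_{ij} \sim \operatorname{Bernoulli}(\pi_{ij}), \qquad \operatorname{logit}(\pi_{ij}) = \gamma_0 + \mathbf{Z}_{ij}^{\top}\boldsymbol{\gamma},\\
\text{Measurement:}\quad & P(W_{ij} = 1 \mid X_{ij} = 1) = S_e, \qquad P(W_{ij} = 0 \mid X_{ij} = 0) = S_p.
\end{aligned}
$$

Assuming the reported exposure depends only on the true exposure (nondifferential misclassification), summing over the unobserved $X_{ij}$ gives each subject's likelihood contribution:

$$
p(y_{ij}, w_{ij} \mid \mathbf{Z}_{ij}, u_j) = \sum_{x \in \{0, 1\}} p(y_{ij} \mid x, \mathbf{Z}_{ij}, u_j)\, p(x \mid \mathbf{Z}_{ij})\, p(w_{ij} \mid x).
$$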

Because we work in a Bayesian framework, we require prior distributions for the model parameters. In the absence of prior information or expert opinion, diffuse normal prior distributions are often used for logistic and Poisson regression coefficients; see, for example, [

The joint posterior is proportional to the product of the likelihoods from the outcome, exposure, and measurement models and the prior distributions. The resulting marginal posteriors for the parameters of interest are not available in closed form, so they are approximated by Markov chain Monte Carlo sampling; software such as OpenBUGS or JAGS can be used to fit the model. Our OpenBUGS code is available from the first author upon request.
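As a sketch of what the likelihood part of this posterior looks like for a single subject, the Python function below sums out the unobserved true exposure under a one-test model. It assumes a Poisson outcome with log link and a cluster random intercept, a logistic exposure model, and a measurement model defined by sensitivity `se` and specificity `sp`. All function and parameter names are hypothetical, and this is not the authors' OpenBUGS code; BUGS-type samplers typically treat the true exposure as a latent node and sample it rather than summing it out.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability of count y with mean lam."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def subject_log_lik(y, w, z, u_j, beta0, beta_x, beta, gamma0, gamma, se, sp):
    """Log-likelihood of one subject's (Y, W) with the true exposure X summed out.

    y: observed count; w: reported exposure (0 or 1), possibly misclassified;
    z: correctly measured covariates; u_j: random effect of the subject's cluster.
    All parameter names are hypothetical placeholders.
    """
    eta_z = beta0 + sum(b * zk for b, zk in zip(beta, z)) + u_j
    pi = inv_logit(gamma0 + sum(g * zk for g, zk in zip(gamma, z)))  # P(X = 1 | Z)
    total = 0.0
    for x in (0, 1):
        p_y = poisson_pmf(y, math.exp(eta_z + beta_x * x))  # outcome model
        p_x = pi if x == 1 else 1.0 - pi  # exposure model
        if x == 1:  # measurement model via sensitivity ...
            p_w = se if w == 1 else 1.0 - se
        else:  # ... and specificity
            p_w = 1.0 - sp if w == 1 else sp
        total += p_y * p_x * p_w
    return math.log(total)
```

Summing `subject_log_lik` over all subjects and adding the log prior densities would give an unnormalized log posterior that a generic MCMC sampler could target. Summing probabilities directly, rather than working in log space, is adequate for small counts such as numbers of children.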

To determine the size of the impact of covariate misclassification on the hierarchical count model, we performed a simulation experiment. The simulated data were generated to mimic the real data. Specifically, we assume a single exposure variable that is subject to misclassification, along with four other covariates that are assumed to be measured correctly. The true exposure model is a binomial model with a logit link:

For the simulation we set

The results across the 50 simulations for the main parameter of interest,

Averages across 50 simulated data sets for

| Model | Mean | SD | Coverage |
|---|---|---|---|
| Naïve | 0.001 | 0.03 | 0 |
| One test | 0.38 | 0.11 | 1 |
| Two tests | 0.34 | 0.11 | 0.91 |
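The simulation design described above can be sketched as follows: generate one data set with a misclassified binary exposure, four correctly measured covariates, and cluster random intercepts, and then summarize the estimates across replicates as mean, SD, and coverage. Every default parameter value is hypothetical (27 clusters is borrowed from the number of states in the real data), and the function names are ours. Whether the SD column above is the average posterior SD or the spread of point estimates is not stated in this section; the sketch uses the empirical SD of the point estimates.

```python
import math
import random
import statistics


def draw_poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_dataset(n_clusters=27, n_per_cluster=50, beta0=1.0, beta_x=0.3,
                     beta=(0.2, -0.1, 0.05, 0.1), gamma0=-1.0,
                     gamma=(0.3, 0.0, -0.2, 0.1), sigma_u=0.2,
                     se=0.8, sp=0.9, seed=1):
    """Simulate one hierarchical data set; every default value is hypothetical."""
    rng = random.Random(seed)
    records = []
    for j in range(n_clusters):
        u_j = rng.gauss(0.0, sigma_u)  # cluster-level random intercept
        for _ in range(n_per_cluster):
            z = [rng.gauss(0.0, 1.0) for _ in beta]  # four correctly measured covariates
            eta_x = gamma0 + sum(g * zk for g, zk in zip(gamma, z))
            x = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta_x)) else 0  # true exposure
            log_lam = beta0 + beta_x * x + sum(b * zk for b, zk in zip(beta, z)) + u_j
            y = draw_poisson(rng, math.exp(log_lam))  # count outcome
            if x == 1:  # exposed subjects are reported as exposed with probability se
                w = 1 if rng.random() < se else 0
            else:  # unexposed subjects are reported as unexposed with probability sp
                w = 0 if rng.random() < sp else 1
            records.append({"cluster": j, "y": y, "w": w, "x": x, "z": z})
    return records


def summarize_replicates(estimates, intervals, truth):
    """Average estimate, spread of estimates, and interval coverage across replicates."""
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)  # empirical SD of the point estimates
    covered = sum(1 for lower, upper in intervals if lower <= truth <= upper)
    return mean, sd, covered / len(intervals)
```

In a full replication, each of the 50 simulated data sets would be fit with the naïve and misclassification models (for example in OpenBUGS or JAGS), and the resulting posterior means and 95% intervals for the exposure coefficient would be passed to `summarize_replicates` with `truth=beta_x`. The naïve fit uses `w` in place of the unobserved `x`.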

We apply the proposed Poisson regression model with a misclassified covariate to the 2005-2006 India National Family Health Survey (NFHS-3) data, with a cluster random effect for each of the 27 states. We consider five covariates: a binary indicator for whether a parent reported being abused, the age of the female, the education level of the female, a binary indicator of whether the family considers itself religious, and a continuous measure of wealth. The outcome is the number of children of each surveyed woman, which we assume follows a Poisson distribution. All variables are assumed to be reported correctly except the binary abuse indicator. Because the one-test Poisson model is not identifiable, we must assign relatively informative beta prior distributions to the sensitivity and specificity. Information from [
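The specific prior values used in the example come from the cited studies and are not reproduced here. As a sketch of one common way to turn a prior guess into a relatively informative beta prior, the hypothetical helper below matches a Beta(a, b) distribution to a chosen mean and standard deviation by the method of moments, using mean = a / (a + b) and variance = mean (1 - mean) / (a + b + 1).

```python
def beta_from_mean_sd(mean, sd):
    """Method-of-moments Beta(a, b) with the given mean and standard deviation."""
    var = sd ** 2
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < mean < 1 and 0 < sd**2 < mean * (1 - mean)")
    concentration = mean * (1.0 - mean) / var - 1.0  # equals a + b
    return mean * concentration, (1.0 - mean) * concentration
```

For example, a hypothetical prior belief that sensitivity is about 0.8 with standard deviation 0.05 gives Beta(50.4, 12.6); a smaller standard deviation gives a larger a + b and hence a more informative prior.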

We fit the data with both the proposed model and a naïve model that does not account for the misclassification. Both models were fit in OpenBUGS, and inferences were based on 20,000 iterations after a burn-in of 10,000 iterations. The results for both models are provided in Table
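Posterior summaries of the kind reported below (mean, SD, and 95% interval) can be computed from one parameter's chain as in the following sketch, which discards a 10,000-draw burn-in and uses an equal-tailed interval with linearly interpolated quantiles. The function names are hypothetical, OpenBUGS computes its own summaries, and the use of equal-tailed intervals is an assumption.

```python
import math
import statistics


def _quantile(sorted_draws, q):
    """Linear-interpolation quantile of already-sorted draws."""
    pos = q * (len(sorted_draws) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (pos - lo) * (sorted_draws[hi] - sorted_draws[lo])


def posterior_summary(chain, burn_in=10_000):
    """Mean, SD, and equal-tailed 95% interval after discarding burn-in draws."""
    kept = sorted(chain[burn_in:])
    lower, upper = _quantile(kept, 0.025), _quantile(kept, 0.975)
    return {
        "mean": statistics.fmean(kept),
        "sd": statistics.stdev(kept),
        "lower": lower,
        "upper": upper,
        # "significant" in the sense used here: the 95% interval excludes 0
        "excludes_zero": lower > 0 or upper < 0,
    }
```

With 30,000 stored draws per parameter, `posterior_summary(chain)` keeps the last 20,000, matching the run length described above.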

Posterior summaries for naïve and misclassification models.

| | Mean | SD | 95% interval |
|---|---|---|---|
| Naïve results | | | |
| Assuming misclassification | | | |

In this paper, we have addressed the problem of covariate misclassification in a hierarchical count model. Through simulation, we showed that ignoring the misclassification can lead to biased estimates and undercoverage of interval estimators. Our real data example illustrated an extreme case: the naïve model yielded a statistically significant result (95% interval entirely above 0), while the bias-corrected model had a negative point estimate that was not statistically significant. Several extensions of the proposed model are possible. In some cases, the count response may also be subject to misclassification; accounting for under- or overreporting of the response in a hierarchical model such as this would be interesting follow-up work. An important limitation is that, although the prior information used in our example comes from studies in developing countries, including India, those studies did not involve the subjects in our data, so there is no guarantee that the populations match. This is one reason why the two-diagnostic-test approach is preferred: the information on sensitivity and specificity comes from the current data. However, the one-test model essentially works as a Monte Carlo sensitivity analysis in which the priors specify a range of plausible values for the sensitivity and specificity. In that sense, our results are robust across the wide range of sensitivity and specificity values supported by the priors.
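The Monte Carlo sensitivity analysis idea can be seen in a much simpler setting than the hierarchical model. The sketch below uses the standard Rogan-Gladen correction for a single observed prevalence, (p_obs + Sp - 1) / (Se + Sp - 1), and repeatedly draws Se and Sp from beta priors so that the spread of corrected values reflects uncertainty about diagnostic accuracy. This is a swapped-in textbook technique rather than the paper's model, and the function names, prevalence, and beta parameters are hypothetical.

```python
import random
import statistics


def rogan_gladen(p_obs, se, sp):
    """Correct an observed prevalence for misclassification; result clipped to [0, 1]."""
    corrected = (p_obs + sp - 1.0) / (se + sp - 1.0)
    return min(1.0, max(0.0, corrected))


def monte_carlo_correction(p_obs, se_ab, sp_ab, n_draws=10_000, seed=0):
    """Draw (se, sp) from Beta priors given as (a, b) pairs; return corrected prevalences."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        se = rng.betavariate(*se_ab)
        sp = rng.betavariate(*sp_ab)
        if se + sp <= 1.0:
            continue  # a test no better than chance cannot be corrected for
        draws.append(rogan_gladen(p_obs, se, sp))
    return draws
```

For example, `statistics.median(monte_carlo_correction(0.26, (80, 20), (90, 10)))` summarizes the corrected prevalence under hypothetical priors centered at Se = 0.8 and Sp = 0.9; widening the priors widens the spread of corrected values, which is the sense in which the priors act as a sensitivity analysis.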

In Tables

Simulation results for naïve model.

| Parameter | Truth | Mean | SD | Coverage |
|---|---|---|---|---|
| | | | 0.02 | 0.04 |
| | | | 0.03 | 0 |
| | | | 0.01 | 0.66 |
| | | | 0.01 | 0.78 |
| | | | 0.01 | 0.73 |
| | | | 0.01 | 0.39 |
| | | | 0.02 | 0.96 |

Simulation results for one diagnostic test case.

| Parameter | Truth | Mean | SD | Coverage |
|---|---|---|---|---|
| | 0.85 | 0.86 | 0.04 | 0.87 |
| | 0.4 | 0.37 | 0.11 | 1 |
| | | | 0.01 | 0.98 |
| | | | 0.01 | 0.93 |
| | | | 0.01 | 0.96 |
| | | | 0.01 | 0.97 |
| | | | 0.40 | 0.94 |
| | | | 0.11 | 1 |
| | | | 0.1 | 0.96 |
| | | | 0.11 | 0.9 |
| | | | 0.12 | 0.96 |
| | | | 0.09 | 0.94 |
| | | | 0.02 | 0.88 |
| | | | 0.13 | 0.98 |
| | | | 0.02 | 0.94 |

Simulation results for two diagnostic test cases.

| Parameter | Truth | Mean | SD | Coverage |
|---|---|---|---|---|

The authors declare that there are no conflicts of interest regarding the publication of this paper.