Modeling Chickenpox Dynamics with a Discrete Time Bayesian Stochastic Compartmental Model

We present a Bayesian stochastic susceptible-exposed-infectious-recovered model in discrete time to understand chickenpox transmission in the Valencian Community, Spain. During the last decades, different strategies have been introduced in the routine immunization program in order to reduce the impact of this disease, which remains a public health’s great concern. Under this scenario, a model capable of explaining closely the dynamics of chickenpox under the different vaccination strategies is of utter importance to assess their effectiveness. The proposed model takes into account both heterogeneous mixing of individuals in the population and the inherent stochasticity in the transmission of the disease. As shown in a comparative study, these assumptions are fundamental to describe properly the evolution of the disease. The Bayesian analysis of the model allows us to calculate the posterior distribution of the model parameters and the posterior predictive distribution of chickenpox incidence, which facilitates the computation of point forecasts and prediction intervals.


Introduction
Chickenpox is a highly contagious disease caused by the varicella-zoster virus. Its main symptoms are a blister-like rash, itching, tiredness, and fever. It is a rarely fatal disease that mainly affects children younger than ten years of age, although older children and adults can also get it. Most of the people suffer the disease during childhood and reinfection is very strange [1]. Because of the contagious behavior of the disease, which spreads easily through the coughs and sneezes of an infected person or by touching the virus particles that come from chickenpox blisters, the best modeling approach would be obtained by taking into account individual movement and contact behavior. However, information at individual level is scarcely ever available and models based on aggregated data are usually considered.
Compartmental models are commonly used in epidemiology to understand the underlying mechanisms that influence disease transmission when aggregated data are available. These models divide the population being studied into different compartments according to the disease status and describe the evolution of the disease through changes in the number of individuals in each compartment. There may be susceptible, exposed, infectious, recovered, and immune individuals depending on the nature of the disease. Compartmental models are usually written using ordinary differential equations or difference equations. In addition, they are normally defined assuming that all the individuals in the population are equally likely to contact any other individual in that population [2,3]. Models based on difference equations have been considered to describe the progression of chickenpox disease (see, e.g, [4,5]). However, contact patterns in real populations are indeed heterogeneous. Therefore, models assuming homogeneous mixing should be replaced by models incorporating stochastic effects [2]. Stochastic models are able to accommodate the stochasticity inherent in the transmission of infectious diseases and provide an improved and more natural description of disease dynamics [6][7][8]. In addition, stochastic models can be analysed from a Bayesian viewpoint (see, e.g., [9]).
In the case of chickenpox, symptoms appear about two weeks (from ten to twenty-one days) after exposure to a 2 Complexity contagious person. A person with chickenpox can spread the disease from one to two days before they get the rash until all the blisters have fully crusted over. The condition usually resolves by itself within a week and the infected individual usually acquires permanent immunity [1]. Therefore, a susceptible-exposed-infectious-recovered (SEIR) model seems an appropriate model to describe chickenpox dynamics.
In this paper, we present a Bayesian stochastic SEIR model in discrete time to understand chickenpox dynamics in the Valencian Community, Spain. The proposed compartmental model takes into account both heterogeneous mixing of individuals in the population and the inherent stochasticity in the transmission of the disease. In addition, the discrete formulation of the model better adjusts to the available data (weekly counts) and simplifies its Bayesian analysis. The analysis of the model from a Bayesian viewpoint allows us to consider a probability distribution for the observed counts of disease as well as for the parameters of the model. The posterior distribution for these parameters and the predictive distribution for future observations can then be straightforwardly obtained. Note that point forecasts and prediction intervals can be of utter importance to implement effective public health measures to reduce the burden of the disease.
The remainder of this paper is organized as follows. In Section 2 we present the available chickenpox data. Section 3 describes the SEIR model and shows its Bayesian analysis. In Section 4 we show the results obtained in the analysis of the data. Finally, we conclude with a general discussion of the proposed model and provide directions for further research.

Data
In this paper we analyse weekly cases of chickenpox reported by the health sentinel network in the Valencian Community, Spain, for the period 2007-2012 (http://www.sp.san.gva.es). It is important to point out that there were no changes in the vaccination strategy during the period analysed. Starting in 2006, the chickenpox vaccine was introduced in the routine immunization schedule of the Valencian Community and, during the period 2007-2012, [10][11][12][13][14] year-old susceptible children were vaccinated. Because varicella vaccine was available in pharmacies, younger children could also be vaccinated against varicella beyond official guidelines.
For 2007-2012, an average of 19,000 cases of chickenpox have been reported per year in the region of Valencia, representing an average incidence rate of around 375 cases per 100,000 inhabitants. Figure 3 shows the time plot of the series, which presents a clear seasonal pattern. As can be seen, chickenpox incidence reaches the highest rates during spring. From June the trend is descending, with very few cases during the summer and part of autumn. In the end of autumn, we can observe an upward trend that continues during the winter to complete the seasonal cycle.
Chickenpox is a mandatory notifiable disease in the Valencian Community. On the other hand, population coverage by the Spanish Public Health System is almost universal (99.5%) and even persons not covered by it can also be attended to in public centers. Therefore, underreported chickenpox cases are assumed to be insignificant.

Model
Although it can vary from person to person, an individual takes around two weeks to show symptoms after being infected and, as we mentioned before, a person with chickenpox can spread the disease from one to two days before the rash appears until all the blisters have crusted over (usually five to six days after the start of the rash). Hence, in order to describe as accurately as possible the different stages of chickenpox while keeping a relatively simple model, we make the following assumptions: once an individual comes into contact with a person with chickenpox, there is an incubation period of two weeks and then the individual becomes infectious for a week. Even though there is a proportion of infected individuals that, after being recovered, become susceptible again, that proportion is so small that it will not be considered here. Moreover, reinfected individuals do not transmit the disease. Consequently, chickenpox dynamics can be explained via a compartmental model that divides the population into susceptible, exposed, infectious, and recovered individuals. In addition, this model should focus on the first movement, that is, the transition from the susceptible compartment to the exposed one. Note that the following transitions (to the infectious compartment and finally to the recovered one) come naturally due to the nature of the disease.
Let 1 be the number of people at the first week of their incubation period. Because the data reported by the health sentinel network refer to the number of infectious people at each week, represented by , it is necessary to transform the available data into the time-series { 1 }. This can be easily done taking into account the fact that +2 was 1 . At the first level of the model hierarchy, we can assume that 1 follows a Binomial distribution: where −1 is the number of susceptible individuals at week − 1 and is the probability of getting the virus at week . In order to take into account the transmissible nature of the disease, we assume that the probability of infection depends on the number of infectious individuals at the previous week. Besides, this probability must lie in the intervals 0-1. In order to avoid the problem of estimates of falling outside the 0-1 limits, we model logit( ) instead, which lies in the interval (−∞, ∞). In particular, we assume that which is equivalent to where exp{ } is the transmission rate and the mixing parameter in (2) allows for heterogeneous mixing of individuals Figure 1: Flowchart of the proposed chickenpox model. Boxes represent compartments, horizontal arrows represent transitions between compartments, and downward arrows represent people leaving the system. in the population [10]. Setting = 1 would correspond to the assumption of mass-action law. We assume here that follows the uniform distribution in the interval (0, 1). A key feature of our model formulation is that the transmission rate is allowed to vary stochastically over time.
In particular, we model as follows: where represents the number of sine-cosine waves needed to capture the seasonal variation in the disease transmission. Its value depends on the data under study, and so has to be estimated together with the other parameters of the model. Because the seasonal pattern varies slightly from year to year, the random effect ∼ (0, 2 ) allows for unspecified features of week . The parameters { } are assumed to have zero mean Gaussian distributions with standard deviations . The uniform distribution in the interval (0, 5) is assigned to all the standard deviations [11].
The number of individuals in each compartment is examined at discrete time steps through the following recursive equations: where is the number of susceptible individuals, 1 is the number of individuals at the first week of their incubation period (which represents the data at hand, and so it is just set equal to its value), 2 is the number of individuals at the second week of their incubation period, is the number of infectious individuals, and is the number of recovered (immune) individuals at week . Although the common formulation of the SEIR model includes only one compartment of exposed individuals, we divide here the compartment into two classes, 1 and 2 , in order to simplify the update of the recursive equations. By using this formulation, all the individuals in compartment 1 at week −1 move to compartment 2 at week and so on. So the only compartments where individuals can stay longer than one unit time are the susceptible and recovered compartments. The average proportion of individuals vaccinated each week is represented by V. The parameter represents the number of births at week and the population size, which is assumed to be constant over time ( = + 1 + 2 + + , ∀ ). To keep the population size constant, individuals must leave the system each week, proportionally to the sizes of the compartments; that is, . The flowchart diagram for the model is described in Figure 1.
Using demographic data from the Spanish National Institute of Statistics (http://www.ine.es), we can estimate the average weekly number of births in the Valencian Community for years 2007-2012 as 1, 002, and so we set = 1, 002 ∀ and = 5,061,245. The annual vaccination coverage among children younger than 14 years of age in the Valencian Community is approximately 45% [12]. Thus, parameter V can be estimated as 0.45 times the percentage of population aged less than 14 years divided by 52. Because the vaccination program started in 2006, we assume that children older than 14 years old are already protected.
Taking into account the following relationship between exposed and infectious individuals, the initial conditions for the recurrence equations can be computed as follows: 4 Complexity

Results
In  (6)). A total of = 260 data were used to estimate the model. Data for the first 3 weeks of 2007 ( 1 , 2 , and 3 ) were used to compute the initial conditions for the recursive equations with ini = 1 (see (7)). Finally, data for the last 49 weeks (from week 4 to week 52, 2012) were left to evaluate the ability of our model to predict the dynamics of chickenpox infection.
As it happens with many statistical models, the posterior distribution of the model parameters is not analytically tractable and this makes necessary the use of efficient simulation algorithms to obtain a sample from that posterior distribution. All the analysis was implemented using the free statistical software with the package R OpenBUGS that allows us to call OpenBUGS within to carry out posterior sampling using Markov chain Monte Carlo simulation [13]. The code for the proposed model can be seen in Pseudocode 1.
To assess the convergence of MCMC chains, we fixed a burnin period of 50,000 iterations. We then kept one posterior sample in ten iterations until a set of 5,000 iterations was obtained. Figure 4 shows the estimated transmission rate together with its seasonal component and the estimated probability of getting the virus at each time point. In this case study, the seasonal component is given by the sum of three harmonic waves ( = 3). Higher values of were not significant. As can be seen, seasonality has a strong influence on disease transmission. Nevertheless, it is important to incorporate random effects into the model to accommodate variability in the transmission rate from year to year. The posterior mean and 95% credible intervals for parameters and { } are shown in Table 1. Note that the posterior mean of parameter is 0.59. This value of smaller than 1 corroborates that the mixing of individuals in the population is heterogeneous. Figure 5 displays the fit of the model to the data originally reported by the health sentinel network. The estimated data for week + 2, +2 , has been calculated as −1 ⋅̂⋅ (1 − / ) 2 (see (1) and (6)). The fitting root mean squared error (RMSE) is 4.41.
second formulation is the deterministic counterpart, that is, a deterministic model in discrete time with a seasonal transmission rate. The last two models correspond to deterministic continuous-time formulations. Alternative (stochastic model in discrete time with a deterministic seasonal transmission rate (SMDT-DTR)). This alternative formulation is similar to the one proposed in this paper, but the transmission rate is modeled in a deterministic way. The transmission rate is allowed to vary seasonally, but we assume that the seasonal pattern repeats over time; that is, there are no weekly random effects that account for overdispersion in the transmission rate. In particular, the SMDT-DTR assumes that We have implemented this model in a Bayesian framework using the package R OpenBUGS in a similar way to our model. Alternative (deterministic model in discrete time with a seasonal transmission rate (DMDT)). This model is the deterministic counterpart of the stochastic discrete time model with a deterministic seasonal transmission rate. Within this model formulation, the number of people at the first week of their incubation period is estimated as where is modeled as in (10). So, instead of analysing directly 1 and considering that 1 is a random variable with its associated probability distribution, the model is defined in terms of an estimate, 1 , which is given by the mean of the Binomial distribution in (1). In this case, the recursive equations are given by In order to fit this deterministic model, we minimized the sum of squared errors ( 1 − 1 ) using the optim function in .
Alternative (deterministic continuous-time model with a seasonal transmission rate (DMCT)). As mention in the Introduction, deterministic compartmental models in continuous time are widely used to understand the dynamics of infectious diseases. A deterministic continuous-time SEIR model accounting for heterogeneous mixing patterns in the population can be formulated as follows: where ( ) is defined as 0 + 1 cos(2 + ) to account for seasonality, is the proportion of births, and V is the proportion of individuals vaccinated. The values of and are set taking into account the fact that the average time to show symptoms after being infected is 14 days and the average time to recover from chickenpox disease is 7 days. Therefore, in the continuous scenario, it is not necessary to distinguish between 1 and 2 . Finally, we assume that + + + = 1, ∀ . The flowchart diagram for this model is described in Figure 2.

Alternative
(deterministic continuous-time model with a seasonal transmission rate and assuming homogeneous mixing (DMCT-Homo)). This model is similar to the previous one, but it assumes that all the individuals in the population are equally likely to contact any other individual (homogeneous mixing), and so the model is formulated as in (13) setting = 1.
In order to estimate the parameters of these two previous continuous-time deterministic models, we have used the package deSolve together with the optim function. Table 2 shows the fitting and forecast RMSEs for the Bayesian stochastic SEIR model in discrete time proposed in this paper and the four alternative formulations previously described. As can be seen, the proposed model leads to an improved goodness of fit as judged by a lower RMSE. The comparison between the fitting error of the proposed model and that of the SMDT-DTR corroborates the idea that the incorporation of random effects when modeling the transmission rate is fundamental to explaining particular features of annual epidemics, which are different from year to year. The difference between the corresponding forecast errors is smaller due to the fact that the random effects in the transmission rate are set equal to zero when forecasting. Comparison between the fitting errors of the stochastic models and those associated with the deterministic approaches highlights the importance of taking into account the stochasticity inherent in the transmission dynamics. We can also emphasize the importance of using models that do not imply mass-action mixing of individuals in the population.
Finally, Figures 7 and 8 compare, respectively, the estimates and point forecasts of chickenpox infections obtained with the proposed model, the SMDT-DTR model and the DMCT model. As expected, both the SMDT-DTR and the DMCT models, which consider a seasonal pattern that is constant over time, are not able to properly describe annual epidemics. Besides, in this case study, the commonly used deterministic continuous-time approach cannot explain properly epidemic peaks.

Conclusions
We have proposed a flexible stochastic compartmental model in discrete time to understand the dynamics of chickenpox, which is an infectious disease with a latency period. Unlike standard formulations, the proposed model does not imply homogeneous mixing of individuals in the population and  Although the results obtained are encouraging, we must note that the proposed model has some limitations. We have  assumed that the number of births equals the number of deaths so that the total population size is constant over time. In addition, we do not consider an age structure into the formulation of the compartmental model. It would be valuable to extend the proposed model to allow for different age groups. An age-structured model would also provide an important tool to study the effects of alternative vaccination strategies.
Another very fruitful area for further research is the extension of the model into the spatial domain. Space can play a significant role in disease dynamics. In addition, a spatiotemporal model would help to detect high-risk areas in need of more strong intervention strategies to reduce the burden of disease. The incorporation of covariates affecting disease transmission, such as temperature or wind speed, may also improve description and prediction of the pattern of disease.

Conflicts of Interest
The authors declare that they have no conflicts of interest.