A Bayesian Inferential Approach to Quantify the Transmission Intensity of Disease Outbreak

Background. The emergence of infectious diseases such as pandemic influenza A (H1N1) 2009 has become a great concern and poses new challenges to health authorities worldwide. To control these diseases, various studies have been developed in the field of mathematical modelling, which is a useful tool for understanding epidemiological dynamics and their dependence on social mixing patterns. Method. We used a Bayesian approach to quantify the disease outbreak through a key epidemiological parameter, the basic reproduction number ($R_0$), using effective contacts, defined as the sum of the products of incident cases and the probabilities of the generation time distribution. We estimated $R_0$ from daily case incidence data for pandemic influenza A/H1N1 2009 in India for the initial phase. Result. The estimated $R_0$ with 95% credible interval is consistent with several other studies on the same strain. Sensitivity analysis indicates that infectiousness affects the estimate of $R_0$. Conclusion. The basic reproduction number $R_0$ gives the public health system useful information for controlling the disease through mitigation strategies such as vaccination, quarantine, and so forth.


Introduction
Influenza is an emerging infectious disease, and influenza-like illness (ILI) is a clinical illness caused by the influenza virus, which has given rise to human pandemics such as the 1918 Spanish flu (H1N1), the 1957 Asian flu (H2N2), the 1968 Hong Kong flu (H3N2), and most recently the H1N1 pandemic of 2009. Influenza A (H1N1) was originally referred to as "swine flu" because laboratory testing showed that many of the genes in this new virus were very similar to those found in pigs in North America. It was later found that the new virus has gene segments from swine, avian, and human flu viruses. Scientists call this a "quadruple reassortant" virus, and hence this new (novel) virus is named "influenza-A (H1N1) virus" [1].
A reassortant influenza virus was first detected in Mexico on March 18, 2009, and rapidly spread to the United States, Canada, and subsequently all regions worldwide, including India. The first case of H1N1 in India was reported on May 17, 2009, at Hyderabad Airport in a young boy who had travelled from the USA, and the virus later spread throughout the country [2].
According to the World Health Organization (WHO) update of November 20, 2009, the virus had spread to more than 206 countries, resulting in 6770 deaths [3]. In June 2009 the WHO raised the level of pandemic alert from phase 5 to phase 6 [4]. According to the update of November 19, 2009, from the Directorate General of Health Services, Government of India, New Delhi, 16,044 laboratory-confirmed cases had been recorded in India, resulting in 537 deaths [5].
To obtain a valid and reliable estimate of the transmissibility and spread of an outbreak we have used statistical modelling, which improves our understanding of the mechanism of disease spread. To assess the intensity of an outbreak, the transmission potential can be quantified by the basic reproduction number $R_0$, that is, the average number of secondary cases generated by a single primary case in a completely susceptible population [6,7]. $R_0$ is a key concept in epidemic theory, and its importance becomes most apparent when an emerging infectious disease strikes a population. If $R_0$ is less than or equal to one, then transmission in the population goes stochastically extinct with probability one after a small number of infections. If $R_0$ is greater than one, then there is a positive probability of a large epidemic. Statistical estimation of $R_0$ is used to understand the transmission dynamics and evolution of an infectious disease, which helps in designing effective public health intervention strategies and mitigation policies [8].
The aim of the present study is to quantify the intensity of pandemic influenza A/H1N1 2009 in India. To achieve this goal we have calculated the basic reproduction number from the H1N1 time series data set through a Bayesian approach to contact patterns. We have estimated the basic reproduction number $R_0$ with its 95% credible interval.

Data Sources.
Time series data for the 2009 influenza A/H1N1 pandemic in India were obtained from the Ministry of Health and Family Welfare (MoHFW), Government of India [9]. A total of 31,924 infected cases with 1525 deaths were reported from May 17, 2009, to May 17, 2010, covering two complete waves of the epidemic. Data for the first wave, from May to October 2009, are shown in Figure 1. For the analysis we have used daily reported cases (incidence) for the initial phase of the epidemic, from June 11 to August 13, 2009 [10].

Statistical Method.
Statistical inference of $R_0$ is still being developed, and it is recognized that the estimate is very sensitive to the dispersal of disease progression [11]. Estimation of $R_0$ can be illustrated with the time-since-infection model, which traces the origin of transmission of infectious diseases and is the counterpart of compartmental models such as SI, SIS, and so forth. Both types of model originated in the basic paper of Kermack and McKendrick [12], and both the SIR model and the simplest time-since-infection model are known as "the Kermack-McKendrick age structure model." The time-since-infection model identifies the key epidemiological parameter through a simple renewal process that leads to the basic reproduction number $R_0$.
Let $j(t)$ represent the number of new infections (incidence) at calendar time $t$, and suppose that each infected individual on average generates secondary cases at a rate $A(\tau)$ at time since infection $\tau$. The incidence then satisfies a specific case of the renewal equation of a birth process [13][14][15]. Consider

$$j(t) = \int_0^\infty A(\tau)\, j(t-\tau)\, d\tau. \tag{1}$$

Since $R_0$ represents the average number of secondary cases that a primary case generates during his/her entire infectious period, the estimate is given by [16,17]

$$R_0 = \int_0^\infty A(\tau)\, d\tau. \tag{2}$$

The Kermack and McKendrick assumption that a "single infection causes an independent process of infection with host" allows an age representation for the state of infection, that is, the infectivity of an individual. The time elapsed since infection is called the infection age or time since infection, and $A(\tau)$, the expected infectivity of an individual at time since infection $\tau$, becomes the rate of secondary transmission per single primary case at time since infection $\tau$ [18]:

$$A(\tau) = \beta(\tau)\, \Gamma(\tau), \tag{3}$$

so that the renewal equation (1) reads

$$j(t) = \int_0^\infty \beta(\tau)\, \Gamma(\tau)\, j(t-\tau)\, d\tau, \tag{4}$$

where $\beta(\tau)$ is the transmission rate, which depends on the frequency of contact and the infectiousness at infection age $\tau$, and $\Gamma(\tau)$ is the probability of being infectious at infection age $\tau$.
Further, we consider the probability density of the generation time, where generation time is defined as the time from infection of an individual to the infection of a secondary case by that individual. It is denoted by $g(\tau)$ and obtained as the normalized density of secondary transmission [16,17,19]. We have

$$g(\tau) = \frac{\beta(\tau)\, \Gamma(\tau)}{\int_0^\infty \beta(s)\, \Gamma(s)\, ds} = \frac{A(\tau)}{R_0}. \tag{5}$$

Using (5) in (4) we get

$$j(t) = R_0 \int_0^\infty g(\tau)\, j(t-\tau)\, d\tau. \tag{6}$$

Then, the basic reproduction number $R_0$ is

$$R_0 = \frac{j(t)}{\int_0^\infty g(\tau)\, j(t-\tau)\, d\tau}. \tag{7}$$

In reality, case incidences are reported in discrete form, as daily or weekly counts. The discretized analogue of (7) can be derived as

$$R_0 = \frac{j_t}{\sum_{\tau=0}^{t} g_\tau\, j_{t-\tau}}. \tag{8}$$

By taking the inverse of both sides of (8) we get

$$\frac{1}{R_0} = \frac{\sum_{\tau=0}^{t} g_\tau\, j_{t-\tau}}{j_t}, \tag{9}$$

where $\tau$ is the discretized time since infection (usually in days), $t$ is the discretized calendar time (usually in days or weeks), and $g_\tau$ is the generation time distribution for an infectious disease, that is, the probability distribution of the time from infection of an individual to the infection of a secondary case by that individual [19,20].
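The discrete ratio of incidence to generation-time-weighted past incidence can be checked numerically. The following is a minimal Python sketch, not the authors' MATLAB algorithm: the function name `discrete_r0`, the four-point generation time distribution, and the synthetic incidence series are all hypothetical and chosen only so that the known value of $R_0$ can be recovered.

```python
def discrete_r0(incidence, gen_dist, t):
    """Incidence on day t divided by the generation-time-weighted past incidence."""
    # Weighted sum over time since infection tau; days before the series starts are skipped.
    weighted_past = sum(
        g * incidence[t - tau]
        for tau, g in enumerate(gen_dist)
        if t - tau >= 0
    )
    if weighted_past == 0:
        raise ValueError("no past incidence to weight on this day")
    return incidence[t] / weighted_past


# Hypothetical generation time distribution g_tau for tau = 0, 1, 2, 3 (sums to 1).
gen_dist = [0.0, 0.5, 0.3, 0.2]

# Synthetic incidence generated from the discrete renewal equation with a known R0.
true_r0 = 2.0
incidence = [10.0]
for day in range(1, 8):
    past = sum(
        g * incidence[day - tau]
        for tau, g in enumerate(gen_dist)
        if 0 < tau <= day
    )
    incidence.append(true_r0 * past)

estimate = discrete_r0(incidence, gen_dist, t=7)
print(estimate)
```

Because the synthetic series is built from the same renewal relation, the ratio returns the value of $R_0$ used to generate it. With real surveillance data the ratio varies from day to day, which is one motivation for the probabilistic treatment that follows.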
Before taking up the Bayesian method of estimation, Figure 2 illustrates the concept of $R_0$ through contact patterns. In Figure 2(a), the transmission tree with $R_0 = 2$ represents who infected whom, where each primary case on average generates 2 secondary cases. Secondary transmissions from primary to secondary cases are given by the basic reproduction number $R_0 = 2$. Figure 2(b) restructures the transmission tree: given all the potential contacts made by primary cases, the probability that each possible contact results in a secondary transmission is $1/R_0$. A transmission tree of this type, which represents who infected whom, is unobservable in nature unless rigorous contact tracing is implemented [21,22].
The numerator of the right-hand side of (9) represents the total number of effective contacts made by possible primary cases on day (or week) $t$, each of which has an equal probability of resulting in secondary transmission; that is, the probability that a secondary case is linked to an effective contact made by a single primary case at time $t$ is $1/R_0$. Effective contacts lead to potential secondary cases, each with an equal chance of acquiring the infection from the primary cases at time $t$. Not every contact leads to successful transmission of infection. An effective contact is defined as a contact that is sufficient to lead to transmission of infection between an infectious and a susceptible individual, and the number of such contacts is uncertain. The total number of effective contacts made by potential primary cases at time $t$ is the sum of the products of the incident cases and the generation time distribution over the generation interval. This indicates a simple binomial law for the uncertainty in effective contacts [22].
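The total number of effective contacts during the $t$th day is $C_t = \sum_{\tau=0}^{t} g_\tau\, j_{t-\tau}$, with $C_t \sim \mathrm{Bin}(j_t, 1/R_0)$ for all $t = 1, 2, \ldots, n$, where $1/R_0$ is the probability of an effective contact, $j_t$ denotes the incidence (new cases) at calendar time $t$, $g_\tau$ denotes the generation time distribution at time since infection $\tau$, and $n$ is the number of days observed. Then the probability mass function of effective contacts becomes

$$P(C_t = c_t) = \binom{j_t}{c_t} \left(\frac{1}{R_0}\right)^{c_t} \left(1 - \frac{1}{R_0}\right)^{j_t - c_t}.$$

The likelihood function of $1/R_0$ is as follows:

$$L\!\left(\frac{1}{R_0}\right) = \prod_{t=1}^{n} \binom{j_t}{C_t} \left(\frac{1}{R_0}\right)^{C_t} \left(1 - \frac{1}{R_0}\right)^{j_t - C_t}.$$

Here, a Bayesian inferential approach is used to estimate the parameter $R_0$; it provides a different but related estimate by combining prior belief with the evidence observed. As more evidence is gathered, the prior distribution is updated into the posterior distribution, which represents the uncertainty over the parameter values. The posterior distribution is derived from the Bayes formula [23]

$$\pi(\theta \mid x) = \frac{L(x \mid \theta)\, \pi(\theta)}{\int_\Theta L(x \mid \theta)\, \pi(\theta)\, d\theta},$$

where $\int_\Theta L(x \mid \theta)\, \pi(\theta)\, d\theta$ is a normalization constant, $x$ indicates the data, $\theta$ is the unknown quantity, $\pi(\theta)$ is the prior distribution, $L(x \mid \theta)$ is the likelihood function, and the posterior distribution $\pi(\theta \mid x)$ completely describes the uncertainty. There are two key advantages of Bayesian theory: (i) once the uncertainty is expressed through the posterior probability distribution, statistical inference can be automated, and (ii) available prior information is incorporated into the statistical model in a reasonable way. Now, the posterior estimate of the parameter $R_0$ is derived using its prior information, where the likelihood function follows a binomial distribution with a conjugate prior given by the beta distribution of the first kind [24]. That is, $1/R_0 \sim \mathrm{Beta}(a, b)$, with a realistic choice of parameters $(a, b)$:

$$\pi\!\left(\frac{1}{R_0}\right) = \frac{1}{B(a, b)} \left(\frac{1}{R_0}\right)^{a-1} \left(1 - \frac{1}{R_0}\right)^{b-1}, \quad a, b > 0.$$

The posterior distribution of $1/R_0$ is proportional to the likelihood times the prior. Therefore,

$$\pi\!\left(\frac{1}{R_0} \,\Big|\, j_1, \ldots, j_n\right) \propto \left(\frac{1}{R_0}\right)^{a + \sum_{t=1}^{n} C_t - 1} \left(1 - \frac{1}{R_0}\right)^{b + \sum_{t=1}^{n} (j_t - C_t) - 1},$$

which is a $\mathrm{Beta}\!\left(a + \sum_{t} C_t,\; b + \sum_{t} (j_t - C_t)\right)$ distribution. We are interested in estimating $R_0$ with its 95% credible interval (CrI), which is obtained by simulating $1/R_0$ from the above posterior distribution, transforming each draw to $R_0$, and repeating this for different choices of the beta prior.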
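The conjugate update described above amounts to adding the summed effective contacts to the first beta parameter and the summed remainder of the incidence to the second. The following Python sketch illustrates this under stated assumptions: the function names, the three-point generation time distribution, the incidence counts, and the Beta(2, 2) prior are hypothetical and are not taken from the study data.

```python
def effective_contacts(incidence, gen_dist, t):
    """C_t: sum over tau of g_tau * j_(t - tau), skipping days before the series starts."""
    return sum(
        g * incidence[t - tau]
        for tau, g in enumerate(gen_dist)
        if t - tau >= 0
    )


def beta_posterior(incidence, gen_dist, a, b, start=1):
    """Beta parameters of the posterior of 1/R0 under a Beta(a, b) prior."""
    successes = 0.0  # running sum of effective contacts C_t
    failures = 0.0   # running sum of j_t - C_t
    for t in range(start, len(incidence)):
        c_t = effective_contacts(incidence, gen_dist, t)
        successes += c_t
        failures += incidence[t] - c_t
    return a + successes, b + failures


# Hypothetical inputs for illustration only.
gen_dist = [0.0, 0.6, 0.4]      # g_tau for tau = 0, 1, 2
incidence = [4, 6, 9, 14, 21]   # daily new cases j_0 .. j_4

a_post, b_post = beta_posterior(incidence, gen_dist, a=2.0, b=2.0)
posterior_mean_inverse_r0 = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean_inverse_r0)
```

In this toy series the effective contacts on days 1 to 4 are 2.4, 5.2, 7.8, and 12.0, so the posterior is Beta(29.4, 24.6). If incidence were falling, $j_t - C_t$ could become negative, which is the situation described under the limitations of the method.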
Theoretically, it is hard to find the posterior distribution of $R_0$ when $1/R_0$ is a beta variable. Through simulation we have generated 10000 samples from the beta posterior distribution and estimated the 95% CrI from them. We have considered beta priors with several values of the mean (ranging from 0.4 to 0.8) and chose the values of $(a, b)$ accordingly. The second and fourth columns of Table 1 give our estimates of $R_0$, which are posterior means along with 95% CrI. Figure 3 displays the posterior distribution of $R_0$ for the different prior choices.
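The simulation step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the MATLAB code in the Supplementary Material: the function names, the fixed seed, the equal-tailed form of the interval, and the posterior parameters 60 and 40 are all hypothetical.

```python
import random
import statistics


def sample_r0(a_post, b_post, n_samples=10000, seed=0):
    """Draw 1/R0 from Beta(a_post, b_post) and invert each draw to obtain R0."""
    rng = random.Random(seed)
    return [1.0 / rng.betavariate(a_post, b_post) for _ in range(n_samples)]


def summarize(samples, level=0.95):
    """Sample mean and equal-tailed credible interval taken from the sorted draws."""
    ordered = sorted(samples)
    n = len(ordered)
    lower = ordered[int((1 - level) / 2 * n)]
    upper = ordered[int((1 + level) / 2 * n) - 1]
    return statistics.fmean(ordered), (lower, upper)


# Hypothetical posterior parameters, for illustration only.
r0_samples = sample_r0(a_post=60.0, b_post=40.0)
r0_mean, (r0_low, r0_high) = summarize(r0_samples)
print(r0_mean, r0_low, r0_high)
```

Since every beta draw lies between 0 and 1, every transformed draw of $R_0$ is at least 1, so the simulated posterior of $R_0$ has no mass below one.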
Generation time is another important characteristic in infectious disease epidemiology, since $R_0$ indicates only the average number of secondary infections that one primary infection produces in one disease generation. When we consider disease transmission on a real time scale such as days or weeks, it matters a great deal how long one disease generation lasts. Generation time is the average time taken for secondary infections to be produced by a primary infection [25]. The generation interval, or generation time distribution, is assumed to follow a known Weibull distribution, which is a biologically plausible choice [20,26,27,28], with means of 1.78 and 2.48 days and standard deviations (SD) of 0.66 and 1.06 days for generation intervals of length 7 and 10 days, respectively [29,30] (Tables S1 and S2). All the sensitivity analysis was done using MATLAB (Supplementary Material, Algorithm).
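A discretized Weibull generation time distribution with a given mean and SD can be obtained as in the following Python sketch. The moment-matching by bisection, the daily binning rule $g_\tau = F(\tau) - F(\tau - 1)$ with $g_0 = 0$, and the renormalization over the truncated interval are assumptions made for illustration; the discretization actually used for Tables S1 and S2 is not stated in the text.

```python
import math


def weibull_from_moments(mean, sd):
    """Weibull shape k and scale lam that match a given mean and SD (bisection on k)."""
    target_cv2 = (sd / mean) ** 2

    def cv2(k):
        g1 = math.gamma(1 + 1 / k)
        g2 = math.gamma(1 + 2 / k)
        return g2 / g1**2 - 1

    lo, hi = 0.2, 50.0  # the squared coefficient of variation decreases as k grows
    for _ in range(200):
        mid = (lo + hi) / 2
        if cv2(mid) > target_cv2:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    lam = mean / math.gamma(1 + 1 / k)
    return k, lam


def discretize(k, lam, max_days):
    """Daily probabilities g_0 .. g_max_days, with g_0 = 0, renormalized to sum to 1."""

    def cdf(x):
        return 1 - math.exp(-((x / lam) ** k))

    raw = [0.0] + [cdf(day) - cdf(day - 1) for day in range(1, max_days + 1)]
    total = sum(raw)
    return [p / total for p in raw]


# Mean and SD quoted in the text for the 7-day generation interval.
k, lam = weibull_from_moments(mean=1.78, sd=0.66)
gen_dist = discretize(k, lam, max_days=7)
print(k, lam, gen_dist)
```

The resulting list can be used directly as the weights $g_\tau$ when computing effective contacts.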

Sensitivity Analysis.
Sensitivity analysis is a statistical technique that provides insight into how uncertainty in the input variables affects the model outputs and which input variables tend to drive variation in the outputs [31]. We performed a sensitivity analysis to quantify the effect of such changes on $R_0$, that is, to determine how sensitive the estimate of the parameter is. It is usually performed as a series of tests in which different sets of hyperparameter values are used to see how the estimate changes. Our analysis is based on the Bayesian estimates of the basic reproduction number for pandemic influenza A/H1N1 2009 in India; we used the daily reported cases to calculate effective contacts. We calculated the posterior distribution of $R_0$ using a beta prior with different choices of parameter values. From Figure 3 we see that as the prior choice changes, the shape of the posterior distribution also changes.
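A sensitivity check of this kind can be sketched in Python as follows. Several elements are assumptions made only for illustration: the prior is parameterized by its mean with a fixed sum $a + b$ (called `concentration` here), the data totals are hypothetical, and the posterior mean of $R_0$ is computed from the standard identity that the reciprocal of a Beta($a$, $b$) variable has mean $(a + b - 1)/(a - 1)$ for $a > 1$, rather than by simulation.

```python
def prior_from_mean(mean, concentration):
    """Beta(a, b) parameters with the given mean and a + b equal to `concentration`."""
    return mean * concentration, (1 - mean) * concentration


def posterior_mean_r0(a, b, successes, failures):
    """Mean of 1/p for p ~ Beta(a + successes, b + failures); needs a + successes > 1."""
    a_post, b_post = a + successes, b + failures
    return (a_post + b_post - 1) / (a_post - 1)


# Hypothetical totals: summed effective contacts and summed (j_t - C_t).
successes, failures = 600.0, 400.0

sensitivity = {}
for prior_mean in (0.4, 0.5, 0.6, 0.7, 0.8):
    a, b = prior_from_mean(prior_mean, concentration=10.0)
    sensitivity[prior_mean] = posterior_mean_r0(a, b, successes, failures)

for prior_mean, r0_mean in sensitivity.items():
    print(prior_mean, round(r0_mean, 4))
```

In this toy setting a larger prior mean for $1/R_0$ pulls the posterior mean of $R_0$ slightly downward, and the effect shrinks as the data totals grow relative to $a + b$.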

Results and Discussion
Estimates of $R_0$ for the 2009 H1N1 influenza pandemic have mainly been based on data obtained in the first few months of the pandemic or on data for the whole first wave. Most of these $R_0$ estimates range from 1.1 to 2 [32][33][34][35][36][37]. Our estimated value of the basic reproduction number indicates a milder intensity of disease transmission in India. This estimated $R_0$ with its 95% credible interval is consistent with several other studies on the same strain [35] and with estimates for many European countries [28]. Notably, the credible interval is narrow, which makes the estimate more reliable; see Table 1. Statistical inference of $R_0$ is based on incidence (reported cases) and a known generation time distribution. Some differences among these estimates are due to the choice of generation time distribution, because estimation of $R_0$ relies heavily on the assumptions made about that distribution [38]. In general, a shorter mean generation time may lead to smaller $R_0$ estimates. From Table 1, we conclude that the generation time, or infectiousness of an individual, affects the basic reproduction number. This method does not require an exponential growth assumption. Still, our estimate is greater than one, so effort is needed to control the disease through control strategies, which typically aim to bring this number below one and keep it there, as this will lead to eventual extinction of the epidemic.

Limitations.
This method applies only to the initial stage of the epidemic (exponential phase), when there is no intervention such as quarantine, isolation, vaccination, and so forth. If the basic reproduction number satisfies $R_0 < 1$, then $1/R_0$ exceeds one and is no longer a valid probability (for example, $R_0 = 0.8$ gives $1/R_0 = 1.25$), so the method cannot be applied.