A Heterogeneous Branching Process with Immigration Modeling for COVID-19 Spreading in Local Communities in China



Introduction
COVID-19 spread alarmingly fast in Wuhan in late January 2020, before the city's lockdown began on Jan. 23. According to public reports on the number of confirmed cases, the prevalence of COVID-19 outside Hubei Province was reduced to a controllable size by late February. The Wuhan lockdown ended on Apr. 8, 76 days after it began, once COVID-19 was confirmed to be under control in China.
An extensive amount of research has been conducted to understand the spreading features of COVID-19. There are two primary directions. One concerns clinical features, focusing on the virus itself [1,2], such as estimates of the basic reproduction number R0 [3-5] and the effective reproduction number Rt [6], and basic statistics obtained from confirmed cases, such as the incubation period, the serial time, and the secondary attack rate [7-9]. The other direction is the spreading dynamics, which is studied mainly through mathematical models, such as SEIR models [10-14] and branching models [15,16]. The effect of the Wuhan lockdown [17,18] and of different levels of isolation has also been considered based on generalized or specialized SEIR models [15].
In this paper, public reports of line-list confirmed cases in Anhui, Henan, Jiangsu, and Zhejiang provinces from Jan. 21 to Feb. 19, 2020, covering 30 days, were collected and analyzed. Owing to effective containment measures, such as advising people to stay at home, wear masks, and wash hands, and tracing close contacts, the epidemic was brought under control within about one month in these four provinces, which corresponds to approximately two or three generations according to the serial time. Such short transmission chains are poorly suited to the SEIR model, which usually simulates transmission over many iterations. Therefore, instead of SEIR, we propose a branching process to model the spreading of COVID-19 in well-prevented regions in China. Based on the statistical results extracted from our data, two influential factors for the propagation of COVID-19 are considered: the migration from outside a particular community and the efficacy of containment within the local communities.
SEIR models and branching models are both strong candidates for modeling classical epidemic spreading. Under certain conditions, such as when the total population is large enough, the two models are equivalent in modeling general epidemic dynamics; see [19] for theoretical support. In SEIR models, transmission may proceed over many iterations and produce longer transmission chains than those actually observed. In contrast, the branching process can model containment efficacy directly, through a cutoff of the transmission chain in the model assumptions. In addition, the branching model can distinguish confirmed cases of COVID-19 by their source of contact, that is, imported or local, which are modeled by the immigration and branching parts, respectively. In SEIR models, it is difficult to distinguish the sources of contact.
This comparison is summarized in Table 1.
To sum up, instead of the well-known SEIR model, a heterogeneous branching process with immigration is established to explore the diffusion of COVID-19 in well-prevented local communities in China. In our branching model, heterogeneity is caused by the distribution of the serial time, immigration represents the confirmed cases arriving in a local community from outside, and the secondary cases infected by the imported infectors are modeled as their offspring under a specific branching mechanism. Further transmissions are modeled as further offspring with similar rules. All parameters in the model are extracted and approximated from real data. The feasibility of this approach is verified by back analysis, that is, by choosing proper values of the parameters that represent isolation strength and social distance. It turns out that our model matches the real data very well. The efficacy of the containment measures is also simulated with our branching model. Our findings reveal the spreading mechanism of COVID-19 from the individual to the population level in well-prevented local communities. The effectiveness of isolation measures in local communities obtained in our work can shed light on preventing the global pandemic spreading of COVID-19.
This paper is organized as follows. The data description is given in Section 2. The branching model is built in Section 3, with parameters obtained by statistical analysis of the real data. The validation of our branching model and the impact analysis of the isolation parameters are explored in Section 4. Conclusions and discussions are given in Section 5.

Data Description
The data in this paper are extracted from the reports of confirmed cases collected in Anhui (887 cases in total), Henan (1279 cases), Jiangsu (577 cases), and Zhejiang (1137 cases) from Jan. 21 to Feb. 19, 2020. The locations of these four provinces, as well as Hubei, are illustrated in Figure 1. The color refers to the number of confirmed cases we collected in each region up to Feb. 19, 2020. A typical reported item reads as follows.
"Patient ID: Huainan-25. The patient Huainan-25 is a 59-year-old woman who is the wife of the Huainan-26 patient. On Feb. 12, she developed fever, muscle soreness, and other symptoms. On Feb. 14, she went to the hospital for treatment and stayed at the hospital for observation. On Feb. 15, her nucleic acid test was deemed positive, and doctors diagnosed her as a suspected patient. Two days later, she was confirmed. Doctors have traced back three close contacts, all of whom have been quarantined for medical observation. During the Chinese New Year's holiday, she had close contact with her daughter, son-in-law, and granddaughter. Her son-in-law, an asymptomatic patient with a history of suspicious exposure in Hefei, stayed at a designated hospital for observation. Doctors have traced back his 46 close contacts, all of whom have been quarantined for medical observation."
For the patient with ID "Huainan-25," Huainan refers to the city where the patient lives, and 25 means that she is the 25th confirmed case in Huainan. The cases we selected are those with some or all of the following information: (1) date of confirmation, (2) whether the case is imported (that is, infected outside the local community), (3) date of the infector's confirmation, and (4) relationship between a primary case (infector) and a secondary case (infectee). After extracting the necessary information, the sample sizes for (1) and (2) are 831 for Anhui, 967 for Henan, 299 for Jiangsu, and 1,051 for Zhejiang. For (3) and (4), 411 cases (234 from Anhui and 177 from Henan) are obtained.
Based on the actual data, statistical results concerning the key features during the spreading are illustrated in Section 3, including (1) the imported and local new cases evolving with time, (2) the main relationships between infector and infectee, and (3) the serial interval, that is, the time interval of confirmation times between each pair of infector and infectee.

Model Description: Heterogeneous Branching Process with Immigration
A strict isolation policy is urgently needed to prevent the spread of a highly infectious disease for which no pharmaceutical measures are available. The detailed reports of confirmed cases provide the information necessary to understand the mechanism of COVID-19 transmission. Tracing back and isolating close contacts efficiently cuts off the transmission chain, so that cases imported to a specific region can transmit the virus for only a few more generations. The serial interval and the incubation period are two key factors for prevention policymaking, from which the suggested length of isolation is commonly set to at least 14 days.
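A heterogeneous branching process with immigration is established considering three ingredients for modeling: (1) the temporal pattern of serial time, (2) the structural pattern of transmission considering containment measures, and (3) the import of confirmed cases, which begins the prevalence of COVID-19 in local communities. The framework of our branching model is given in Section 3.1. The values of parameters and distributions of random variables in our branching model are extracted from the real data in Sections 3.2, 3.3, and 3.4. The validation of our model and simulation results for different isolation levels and social distances are given in Section 4.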

The Framework of the Branching Model.
Heterogeneous branching processes with immigration are well suited to describing the temporal evolution of populations in which individuals appear randomly over time according to two distinct mechanisms. One mechanism, called immigration, is the influx of new individuals into a population of which they are not natives. The other mechanism, referred to as branching, is how individuals of the population generate new offspring. In this paper, we consider a heterogeneous branching process with immigration, in which
(i) the branching mechanism models the spreading of the virus in local communities, with heterogeneity caused by the serial time;
(ii) the immigration is a time-dependent Poisson process, modeling the imported cases coming from outside a certain region.
In the following, immigration, offspring distribution, and serial time are discussed in detail, with values or distributions obtained from statistical analysis of the real data described in Section 2.

Immigration.
The imported cases with contact history from outside a local region are described as immigration. In our branching model, the immigration process is modeled by a time-dependent Poisson process with a varying rate r(t). That is, the number of immigrants arriving on the t-th day, denoted by I(t) for t = 1, 2, ..., has the following distribution law:

$$P\big(I(t) = k\big) = \frac{r(t)^{k}}{k!}\, e^{-r(t)}, \qquad k = 0, 1, 2, \ldots$$

From the line-list reports, the number of imported cases and the number of local cases changing with time are obtained. Figure 2 illustrates the data for the four provinces we considered. The red and black curves refer to the imported and local cases, respectively. The immigration process in our model is extracted from the imported sequences of the four provinces, that is, the red curves in Figure 2.

Table 1
Model | SEIR | Branching
Merit | Simple and easy for modeling the epidemic spreading with a mean-field approach under the well-mixed assumption | (i) Flexible for modeling different sources of contact; (ii) flexible for specific transmission rules during the spreading
Limitation | (i) The iteration procedure results in longer transmission chains; (ii) it is difficult to model agents with different sources of contact | The simulation procedure is more complex because every transmission tree is traced
It is apparent that the initial spreading in a local community is due to the import of confirmed cases from outside that community. In the beginning, imported cases outnumber local ones. Then, several days later, local cases begin to increase. Once the import path is completely cut off at an early stage, whether an outbreak happens depends on the prevention policy of the local communities.
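The immigration mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`sample_poisson`, `sample_immigration`) and the daily rates in `example_rates` are hypothetical, whereas in the paper r(t) is extracted from the imported case series in Figure 2.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    if rate <= 0:
        return 0
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_immigration(rates, seed=0):
    """Return one realization I(1), ..., I(T) with I(t) ~ Poisson(r(t))."""
    rng = random.Random(seed)
    return [sample_poisson(r, rng) for r in rates]


# Hypothetical daily rates r(t): a rise followed by a decline (illustration only).
example_rates = [2.0, 5.0, 9.0, 12.0, 10.0, 7.0, 4.0, 2.0, 1.0, 0.5]
imported = sample_immigration(example_rates, seed=42)
print(imported)  # one possible sequence of daily imported case counts
```

Each day is sampled independently given its rate, which is what a time-dependent Poisson process with daily counts amounts to.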

Offspring.
The number of potential secondary cases produced by each infective individual is called the offspring in our branching model. It comprises two parts, according to where the infection happens.
One part is within a family, drawn from a binomial distribution Bin(N − 1, p), where N is a random variable representing the number of family members and p is the probability that a family member is infected by the first infected member. Note that, for N = 1, there is no other family member to be infected. Moreover, the high transmissibility of COVID-19 results in a large value of p. In our model, the transmission within a family happens with probability p = 0.9, according to the statistical result that about 90% of infections happened between family members. The relationship between each pair of infector and infectee is counted. Owing to home quarantine and the high transmissibility of COVID-19, the family members of imported infectious cases are at very high risk of being infected. The top three relationships between an infector and an infectee are between spouses, from parent to child, and from child to parent. The number and ratio of cases for the three relationships are shown in Table 2.
For the number of family members N, the reference distribution comes from the Chinese statistical yearbook of 2020 and is given in Table 3.
The other part of the offspring arises outside the home. Assume that the probability of leaving home is α ∈ (0, 1) and that the number of potential infectees outside the home follows a Poisson distribution Poi(λ), λ > 0. In other words, α represents the strength of home quarantine, and λ measures the effect of social distancing. A smaller α and/or λ means stricter containment of COVID-19 in a region.
The final assumption comes from the isolation and contact tracing policy, under which the behavior of the secondary infectors is slightly adjusted. Firstly, family infectees do not transmit the virus to family members, since all infected family members are treated as the offspring of the first infected family member who imported the virus. Secondly, family infectees may have secondary out-of-home infectees, but the probability of leaving home changes from α to α². The probability of leaving home is assumed to decay exponentially across generations, reflecting the cumulative awareness of isolation; therefore, it is set to α² for the second generation rather than following a linear or other relation between generations. Thirdly, the social infectees of the imported cases can transmit the virus to their family members, and they may also have secondary out-of-home infectees, again with the probability of leaving home changed from α to α². Finally, no transmission happens after two generations, owing to the strict contact tracing measures taken in local communities.
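The offspring rules above can be sketched as follows for a single imported case. This is an illustrative reading of the stated assumptions rather than the authors' code: the function names are hypothetical, and `FAMILY_SIZE_PROBS` is a made-up household-size distribution standing in for Table 3, whose values are not reproduced here.

```python
import math
import random

# Hypothetical household-size distribution P(N = n); the paper uses Table 3 instead.
FAMILY_SIZE_PROBS = {1: 0.2, 2: 0.3, 3: 0.25, 4: 0.15, 5: 0.1}


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def family_infectees(rng, p):
    """Bin(N - 1, p) infections among the other members of a random household."""
    sizes = list(FAMILY_SIZE_PROBS)
    weights = list(FAMILY_SIZE_PROBS.values())
    n = rng.choices(sizes, weights=weights)[0]
    return sum(rng.random() < p for _ in range(n - 1))


def social_infectees(rng, leave_prob, lam):
    """With probability leave_prob the case leaves home and infects Poi(lam) people."""
    if rng.random() < leave_prob:
        return sample_poisson(lam, rng)
    return 0


def offspring_of_imported_case(rng, alpha, lam, p=0.9):
    """Return (home, social, secondary) counts generated by one imported case."""
    home = family_infectees(rng, p)
    social = social_infectees(rng, alpha, lam)
    secondary = 0
    # Family infectees infect only outside the home, with leaving probability alpha**2.
    for _ in range(home):
        secondary += social_infectees(rng, alpha ** 2, lam)
    # Social infectees infect their own household and, with probability alpha**2, outsiders.
    for _ in range(social):
        secondary += family_infectees(rng, p)
        secondary += social_infectees(rng, alpha ** 2, lam)
    # Contact tracing: second-generation cases produce no further offspring.
    return home, social, secondary


rng = random.Random(2020)
print(offspring_of_imported_case(rng, alpha=0.4, lam=1.4))
```

The three returned counts correspond to the Home, Social, and Secondary components that the simulation section reports separately; timing by serial interval is deliberately left out of this sketch.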

Heterogeneity.
The heterogeneity in our branching model comes from the serial time, denoted by T, which is the time interval between the onset times of a new infectee and its infector. The serial interval distribution extracted from the data is illustrated in Figure 3.
The empirical distribution of T in Figure 3 is obtained from the 80.05% of serial intervals that are positive. Negative and zero serial intervals are also present; their ratios are 5.35% and 14.60% of our sample, respectively, as shown in the bar plot in Figure 3. Two known distributions, the Gamma distribution with mean 4.43 and variance 10.23 and the Weibull distribution with mean 4.45 and variance 10.31, are fitted to the empirical distribution of the positive serial interval and drawn in Figure 3 as references. Notably, a shifted Weibull distribution is used in Ref. [6] for the serial time with different datasets, which is consistent with our result.
The numerical values of our empirical distribution, which are used in our simulation, are listed in Table 4 for reference.
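For readers who want to reproduce the reference curves, the shape and scale parameters implied by the reported means and variances can be recovered by moment matching. The sketch below is an assumption about parameterization only; the text does not state how the Gamma and Weibull fits were obtained, and the function names are hypothetical.

```python
import math


def gamma_from_moments(mean, variance):
    """Gamma(shape, scale) with the given mean = shape*scale and variance = shape*scale**2."""
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


def weibull_from_moments(mean, variance, lo=0.2, hi=20.0, tol=1e-10):
    """Weibull(shape, scale) matching mean and variance; shape found by bisection."""
    target = variance / mean ** 2  # squared coefficient of variation

    def cv2(k):
        g1 = math.gamma(1 + 1 / k)
        g2 = math.gamma(1 + 2 / k)
        return g2 / g1 ** 2 - 1

    # cv2 decreases as the shape k grows, so move lo up while cv2 is too large.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    scale = mean / math.gamma(1 + 1 / shape)
    return shape, scale


gamma_shape, gamma_scale = gamma_from_moments(4.43, 10.23)
weibull_shape, weibull_scale = weibull_from_moments(4.45, 10.31)
print(gamma_shape, gamma_scale)
print(weibull_shape, weibull_scale)
```

For the Gamma reference this gives a shape of roughly 1.92 and a scale of roughly 2.31 days. The simulation itself uses the empirical distribution in Table 4, so these parametric forms serve only as references.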
To sum up, the parameters and variables of our branching model, together with their descriptions and the values or distributions used for further investigation and simulation, are listed in Table 5. The tested parameters are α and λ, representing the isolation level and social distance, respectively. The other parameters and variables are kept fixed during the simulation.

Simulation Results
Firstly, we show in Section 4.1 that our model matches the real data well. Then, the effectiveness of staying at home (parameter α) and of keeping social distance (parameter λ) is examined in Section 4.2. In the following simulations, the distributions of the number of family members N and the serial interval T are those listed in Table 5, and the transmission probability between family members p is fixed at 0.9.

Fitting Real Spreading Processes.
To match the real data, the imported data series of each province, shown as the red curves in Figure 2, is used as the immigration of the branching process. To validate our branching structure, the simulation results should match the black curves in Figure 2. Therefore, suitable values of the tested parameters, i.e., the probability of leaving home α and the mean number of secondary cases due to social activities λ, must be chosen carefully. The best-fit values of α and λ are obtained by minimizing the mean absolute error (MAE) between the real local series and the simulated ones. Each simulated series is obtained by averaging 50 experimental trials. Figure 4 shows the simulation results. The simulated and real local confirmed case series match well for all four provinces, which validates our model and indicates that the branching structure built in this paper is adequate for modeling the spreading of COVID-19 in well-prevented local communities.
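The fitting procedure, averaging repeated trials and minimizing the MAE over candidate values of α and λ, can be sketched as a plain grid search. The search strategy and grid are assumptions, since the text does not say how the minimization was carried out, and `toy_simulate` is a hypothetical stand-in used only to make the example runnable; it is not the branching model.

```python
import random


def mean_absolute_error(observed, simulated):
    """MAE between two daily series of equal length."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def average_series(simulate, alpha, lam, trials, seed):
    """Average `trials` simulated daily series for one parameter pair."""
    rng = random.Random(seed)
    runs = [simulate(alpha, lam, rng) for _ in range(trials)]
    days = len(runs[0])
    return [sum(run[d] for run in runs) / trials for d in range(days)]


def grid_search(simulate, observed, alphas, lams, trials=50, seed=0):
    """Return (best MAE, alpha, lam) over the grid of candidate parameters."""
    best = None
    for alpha in alphas:
        for lam in lams:
            averaged = average_series(simulate, alpha, lam, trials, seed)
            error = mean_absolute_error(observed, averaged)
            if best is None or error < best[0]:
                best = (error, alpha, lam)
    return best


def toy_simulate(alpha, lam, rng):
    """Hypothetical stand-in simulator (NOT the branching model): noisy linear trend."""
    return [20 * alpha + lam * day + rng.uniform(-0.5, 0.5) for day in range(30)]


# Synthetic "observed" series generated from alpha = 0.4 and lam = 1.0 without noise.
observed = [20 * 0.4 + 1.0 * day for day in range(30)]
alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
lams = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
best = grid_search(toy_simulate, observed, alphas, lams)
print(best)  # (MAE, alpha, lam) of the best grid point
```

In practice, `toy_simulate` would be replaced by a function that runs the branching model with a province's imported series as immigration and returns the simulated daily local cases.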

Simulation Results.
In this section, experiments are conducted to investigate the combined effect of staying at home and keeping social distance. For this purpose, the immigration rate r(t) is set as the average of the four series from the four provinces, smoothed with a moving average of order 5, which is illustrated as the red curve with circles in Figure 5(b). First of all, the spatial stratified heterogeneity (SSH) among provinces is measured by calculating the so-called q-statistic, which tests the significance of differences among provinces. The value of q ranges from 0 to 1, where 0 means no association between the number of cases and province, while 1 means that they are perfectly associated.
The q-statistic can be calculated with the following equation [20]:

$$q = 1 - \frac{\mathrm{SSW}}{\mathrm{SST}},$$

where $\mathrm{SSW} = \sum_{h=1}^{L} N_h \sigma_h^2$ and $\mathrm{SST} = N\sigma^2$; $N$ and $\sigma^2$ are the number of units and the variance in the study area, which is composed of $L$ strata, and $N_h$ and $\sigma_h^2$ are the number of units and the variance in stratum $h$. A larger value of q means larger spatial heterogeneity in the study area. For significance testing, q can be transformed so that it follows the noncentral F-distribution:

$$F = \frac{N - L}{L - 1}\,\frac{q}{1 - q} \sim F(L - 1,\, N - L;\, \lambda),$$

with

$$\lambda = \frac{1}{\sigma^2}\left[\sum_{h=1}^{L} \bar{Y}_h^2 - \frac{1}{N}\left(\sum_{h=1}^{L} \sqrt{N_h}\,\bar{Y}_h\right)^2\right],$$

where $\lambda$ is the noncentrality parameter and $\bar{Y}_h$ is the mean value in stratum $h$. Then, the q-statistic and the corresponding p value in Table 6 can be used to test whether the cases concerned have significant differences of variances in different strata. According to the p values in Table 6 and the curves in Figure 5, the SSH is significant for all series at a level slightly above 0.05, except for the new immigrant cases. As can be seen from Figure 5, the extreme fluctuation of Zhejiang Province is the main reason for this nonsignificance. To sum up, the SSH for the time series we considered is significant. Despite the SSH, our branching model can fit the different provinces quite well with different parameters. In the following simulation for the isolation parameter α and social distance parameter λ, the immigration rate r(t) is fixed as the reference, namely, the moving average of order 5 of the mean immigrant series of the four provinces (the red curve in Figure 5(b)).
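As a small numerical illustration of the q-statistic defined through SSW and SST, the sketch below computes q for toy strata. The function name and the toy numbers are hypothetical and are not the provincial data; population variances are assumed for both the strata and the whole study area.

```python
from statistics import pvariance


def q_statistic(strata):
    """q = 1 - SSW/SST for a list of strata, each a list of observations."""
    all_values = [value for stratum in strata for value in stratum]
    sst = len(all_values) * pvariance(all_values)
    ssw = sum(len(stratum) * pvariance(stratum) for stratum in strata)
    return 1 - ssw / sst


# Strata that differ only between groups: all variance is between strata.
print(q_statistic([[1, 1], [3, 3]]))
# Strata with identical contents: stratification explains nothing.
print(q_statistic([[1, 3], [1, 3]]))
```

The first toy example has zero within-stratum variance, so q equals 1; in the second, each stratum reproduces the overall variance, so q equals 0. The significance test additionally requires the noncentral F-distribution, which is outside the Python standard library and is not sketched here.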
In the following, we conduct the simulation for our parameters. The values of the isolation parameter α and the social distance parameter λ are selected as α = 0.1, 0.4, and 0.7 and λ = 0.4, 1.4, and 4.8. The simulation results are illustrated in Figures 6 and 7, each with nine subfigures for the nine combinations of α and λ. Figure 6 shows the evolution of local cases (the black curves) over time, with the same immigration curve (the red ones) as the reference. Figure 7 gives the detailed components of local infectees: Home, Social, and Secondary. Home and Social refer to the cases infected by the imported cases at home and out of home, respectively. Secondary refers to the infectees caused by Home and Social. The red, blue, and black curves in Figure 7 are the local infectees of Home, Social, and Secondary, respectively. Based on our assumption, the branching model evolves for only two generations owing to the contact tracing policy. In the following, detailed results for the isolation parameter α and social distance parameter λ are given.
Firstly, either strict isolation or keeping a strict social distance is effective in preventing the spreading. The effect of strict isolation can be seen in Figures 6(a)-6(c), in which α = 0.1. The confirmed local cases (black curves) grow as λ increases but remain within a controllable size. In Figures 7(a)-7(c), the number of Home infectees (red curves) stays stable, while the Social and Secondary infectees increase slightly as the gathering parameter λ increases. Therefore, the most effective measure for preventing the spreading is staying at home for about two weeks.
Secondly, when it is necessary to leave home, keeping a social distance is the second line of defense. Since it is difficult to stay at home for a couple of weeks without going out, good personal protection is crucial to avoid being infected. The effectiveness of social distance can be seen in Figures 6(a), 6(d), and 6(g), in which λ = 0.4. The local cases increase as α increases but remain within a controllable size. In Figures 7(a), 7(d), and 7(g), the numbers of Social and Secondary infectees increase slightly as the probability of leaving home α increases. Thus, as long as social distance is sufficient, isolation can be relaxed.
Finally, if isolation fails, for instance, because people have a strong need to leave home, it is crucial to keep social distance; otherwise, the disastrous consequences of gathering will emerge. As shown in Figures 6(g)-6(i) with α = 0.7, that is, a 70% probability of going out, the local infectees grow very fast as λ increases from 0.4 through 1.4 to 4.8. In Figures 7(g)-7(i), the Home infectees stay stable, while the Social and Secondary infectees increase markedly as λ increases. Notably, our assumption on contact tracing leads to a complete cutoff of the third and further generations. However, with α = 0.7 and λ = 4.8, the number of infectious individuals is so large that, given limited medical resources, it would be quite difficult to isolate the infected individuals, let alone to trace back and isolate their close contacts. Therefore, an outbreak would take place in local communities with high probability.
To sum up, when faced with the COVID-19 pandemic, the least costly and most effective measure is staying at home. It requires the effort not of a few people but of everyone and, more importantly, it should be carried out simultaneously. However, considering the trade-off between the prevention of COVID-19 and economic affairs, keeping a proper social distance is more important.

Conclusion
Based on the confirmed cases reported outside the epicenter in China, temporal and structural patterns are extracted from the actual data. Moreover, an age-dependent branching process with immigration is built to mimic the transmission mechanism of COVID-19 in particular local communities. Our model matches the actual data quite well, which supports its validity. The effectiveness of isolation and social distance is also tested with the branching model. We reveal that the spreading chain can be cut efficiently under strict isolation, which might be the main reason for the success of COVID-19 prevention in China. However, owing to the trade-off between economic considerations and prevention of the pandemic, keeping a proper social distance is more important when leaving home for social activities. Our findings reveal the effectiveness of isolation in China outside Hubei Province and may shed light on preventing the global pandemic spreading of COVID-19.
The branching structure is suitable for modeling the spreading of COVID-19, as shown in Figure 4. Although the situations we considered are well-prevented local communities, the basic features, such as the serial interval, the composition of infectees, and the immigration structures, can be applied to more general situations for investigating other containment measures.

Figure 1: The location of Hubei, Anhui, Henan, Jiangsu, and Zhejiang. The color refers to the number of confirmed cases we collected in each region up to Feb. 19, 2020, that is, 887 cases in Anhui, 1279 in Henan, 577 in Jiangsu, and 1137 in Zhejiang.

Figure 2: The number of imported and local confirmed cases changing with time for (a) Anhui Province (831 cases), (b) Henan Province (967 cases), (c) Jiangsu Province (299 cases), and (d) Zhejiang Province (1051 cases). The red and black curves are the imported and the local cases, respectively.

Figure 3: The empirical distribution of the positive serial time for 329 confirmed cases. The reference fitted distributions are the Gamma distribution with mean 4.43 and variance 10.23 (the blue line) and the Weibull distribution with mean 4.45 and variance 10.31 (the red dash-dotted line). The inset is the histogram of all serial times for 411 confirmed cases, including nonpositive ones.

Figure 4: The simulation results and the real time series. The parameters for the best fit of the local series in each province are provided on top of each panel. The blue curves are obtained from our model as the average of 50 trials. The black curves are the original time series. (a) Anhui (α = 0.8 and λ = 0.4). (b) Henan (α = 0.1 and λ = 1.0). (c) Jiangsu (α = 0.7 and λ = 0.2). (d) Zhejiang (α = 0.1 and λ = 0.1).

Figure 5: The time series of new cases and cumulative cases for different provinces. Legends for provinces are shown in subfigure (b). The total, immigrant, and local cases are considered, respectively. The rate r(t) of the immigration I(t) used for simulations, which is the moving average of order 5 of the mean immigrant series of the four provinces, is shown as the red curve in (b). (a) Total. (b) Immigrant. (c) Local. (d) Total. (e) Immigrant. (f) Local.

Table 1: The comparison of SEIR and branching in modeling COVID-19 in well-prevented regions.

Table 2: The top three infector-infectee relationships within family members based on 411 confirmed cases.

Table 3: The reference distribution of the number of family members N from the Chinese statistical yearbook of 2020.

Table 4: Empirical distribution law of the positive serial interval T.

Table 5: Descriptions of the parameters and variables, with values or distributions, for our model.

Table 6: The q-statistic and corresponding p value for the SSH test.