Influenza pandemics pose a wide-ranging threat to people’s health and property all over the world. Developing effective strategies for predicting influenza outbreaks, which may prevent or at least help prepare for a new pandemic, is now a top global public health priority. However, because influenza outbreaks involve the spatial and temporal characteristics of both biological and social systems, real-time monitoring of outbreaks is a challenging task. In this study, by exploring the rich dynamical information of the city network during influenza outbreaks, we developed a computational method, the minimum-spanning-tree-based dynamical network marker (MST-DNM), to identify the tipping point or critical stage prior to an influenza outbreak. Using historical records of influenza outpatients between 2009 and 2018, the MST-DNM strategy was validated by accurate predictions of influenza outbreaks in three Japanese cities/regions, i.e., Tokyo, Osaka, and Hokkaido. In these applications, the early-warning signal was detected on average 4 weeks ahead of each influenza outbreak. The results show that our method has considerable potential for use in public health surveillance.

Influenza, a seasonal, contagious, and widespread respiratory illness, has always been a major threat to people’s health. According to the World Health Organization, up to 650,000 deaths annually are associated with respiratory diseases caused by seasonal influenza. In the United States, seasonal influenza leads to an average of 610,660 life-years lost and 3.1 million hospitalized days per year [

In this study, by exploring the rich dynamical information provided by high-dimensional records of clinic hospitalization data, we developed a practical computational method, i.e., the minimum-spanning-tree-based dynamical network marker (MST-DNM), to quantitatively measure the dynamical change of a city network and thus detect the early-warning signal of an influenza outbreak. The theoretical basis of MST-DNM is our recently proposed concept, the so-called dynamical network marker (DNM) [

The MST-DNM is a novel network-based computational method that combines the DNM with the minimum spanning tree to accurately detect the early-warning signal of an influenza outbreak. The spread of an infectious disease in a region is described as the dynamical evolution of a nonlinear system, while the influenza outbreak is regarded as a qualitative state transition of that system. Without loss of generality, there are three states for the influenza outbreak (Figure

Schematic illustration of detecting the early-warning signal of an influenza outbreak based on MST-DNM. (a) The historical records of clinic visits caused by influenza between 1 January 2009 and 1 May 2019 were collected from three regions of Japan: Tokyo, Osaka, and Hokkaido. (b) By building a city network, weighting its edges, and tracking the changes of its minimum spanning tree, the MST-DNM method can monitor the progress of influenza in real time and issue early-warning signals in a timely manner. (c) Based on the MST-DNM method, the outbreak process of influenza can be divided into three states, i.e., the normal state, the preoutbreak state, and the flu outbreak state. An abrupt increase of the MST-DNM score signals the arrival of the preoutbreak state.

The spread and outbreak of influenza form a complex dynamic process of a nonlinear system. According to the DNM theory, when a complex system approaches a tipping point or critical transition point, there is a dominant group, i.e., the DNM, which satisfies the following three essential properties [

The correlation (

The correlation (

The standard deviation (

In general, these properties mean that the emergence of a DNM group with strong fluctuations and high correlations signifies an upcoming critical transition. Thus, they can be used as three criteria to identify the critical state of a complex biological system.
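The two quantities behind these criteria, the standard deviation of a node and the Pearson correlation coefficient between two nodes, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sliding-window width, and the weekly counts are all hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for constant series (assumption)
    return cov / sqrt(var_x * var_y)


def window_stats(series_a, series_b, end, width):
    """Return (SD of a, SD of b, |PCC|) over the `width` weeks ending at `end`."""
    a = series_a[end - width + 1:end + 1]
    b = series_b[end - width + 1:end + 1]
    return pstdev(a), pstdev(b), abs(pearson(a, b))


# Hypothetical weekly visits per clinic for two adjacent districts.
district_a = [2, 3, 2, 3, 2, 3, 5, 9, 14, 22]
district_b = [1, 2, 2, 1, 2, 2, 4, 8, 13, 20]

quiet = window_stats(district_a, district_b, end=5, width=4)
rising = window_stats(district_a, district_b, end=9, width=4)
print("quiet window :", quiet)
print("rising window:", rising)
```

In the rising window both the fluctuation and the absolute correlation are larger than in the quiet window, which is the qualitative pattern the criteria describe.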

Based on the DNM theory, we developed the MST-DNM method to accurately detect the early-warning signal of an influenza outbreak by combining the DNM with the minimum spanning tree of a city network. In our method, the evolution of a flu outbreak is modeled as three distinct stages or states (Figure

The sketch of the MST-DNM method is presented in Figure

The overall algorithm structure of MST-DNM method. First, model a city network based on its administrative divisions and the geographical relationship and map the corresponding clinic-visiting record matrix into the city network. Then, regard a week

1: Model a city network

2: Map the hospitalization data into the corresponding nodes in the network

3:

4:

5: Weight the edge

6:

7:

8: Calculate the minimum spanning tree’s weight sum

9:

10: the week

11: Break

12:

13:

A city network is modeled from the geographic locations of its administrative divisions and their adjacency. As demonstrated in Figure

For each district of a city, the weekly raw data should be averaged over the total number of clinics within the district, because the number of visits differs enormously between clinics. The processed data is then mapped to the city network.
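The per-clinic averaging step can be sketched as follows. The function name, the dictionary layout, and the district labels are hypothetical choices made for illustration only.

```python
def visits_per_clinic(weekly_visits, clinic_counts):
    """Divide each district's weekly visit totals by its number of clinics."""
    normalized = {}
    for district, totals in weekly_visits.items():
        n_clinics = clinic_counts[district]
        if n_clinics <= 0:
            raise ValueError(f"district {district!r} has no clinics")
        normalized[district] = [total / n_clinics for total in totals]
    return normalized


# Hypothetical raw data: total influenza visits per week in two districts.
weekly_visits = {"A": [10, 20], "B": [9, 30]}
clinic_counts = {"A": 5, "B": 3}

node_values = visits_per_clinic(weekly_visits, clinic_counts)
print(node_values)
```

Each resulting list is the weekly time series attached to the corresponding node of the city network.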

The city network can be represented as a graph

First, we consider the number of clinic visits per week of a district as a sample

Second, for each edge

Third, when the city network is at week

1:

2:

3: MAKE-SET

4:

5: sort the edges of

6:

7:

8:

9: UNION

10:

11:

12:
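The MAKE-SET, sort, and UNION steps listed above are those of Kruskal's algorithm. A minimal sketch follows, assuming the edge weights for the current week have already been computed; the node labels and weights in the example are hypothetical, and the function returns both the tree and its weight sum.

```python
def kruskal_mst(nodes, weighted_edges):
    """Return (tree_edges, weight_sum) of a minimum spanning tree.

    `weighted_edges` is an iterable of (weight, u, v) tuples for an
    undirected graph whose nodes are listed in `nodes`.
    """
    parent = {node: node for node in nodes}  # MAKE-SET for every node

    def find(node):
        # Find the set representative, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    # Visit edges in nondecreasing order of weight.
    for weight, u, v in sorted(weighted_edges, key=lambda edge: edge[0]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge joins two different components
            parent[root_u] = root_v   # UNION
            tree.append((u, v, weight))
    return tree, sum(weight for _, _, weight in tree)


# Hypothetical 4-district network with already-computed edge weights.
nodes = [1, 2, 3, 4]
edges = [(0.9, 1, 2), (0.2, 2, 3), (0.5, 1, 3), (0.7, 3, 4), (0.4, 2, 4)]

tree, weight_sum = kruskal_mst(nodes, edges)
print(tree, weight_sum)
```

Recomputing the tree every week and tracking its weight sum gives one scalar per week that summarizes the whole network.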

In the ideal case, when the network system approaches a tipping point, there are the following two properties for the relationship between nodes in the network:

The nodes in the city network are all DNM members. The standard deviation of these members and the Pearson’s correlation coefficient between these members both dramatically increase

There are DNM and non-DNM members in the city network. The standard deviation of the DNM members dramatically increases, but the Pearson’s correlation coefficient between DNM members and non-DNM members decreases significantly, i.e., its absolute value increases significantly

Meanwhile, the proposed city network’s MST-DNM score

After the above procedure, it is possible to quantitatively analyze and monitor the dynamical process of influenza spreading based on the indicator

Logistic regression, which is essentially a linear model passed through the sigmoid function, is used to analyze datasets with binary labels, i.e., to solve two-class (0 or 1) problems, and to explore the relationship with their independent variables. Assume a dataset with

According to the above form, the key to the logistic regression model is to train a suitable parameter

In order to prevent our model from overfitting, the

In this study, we used the MST score of each week as
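A minimal sketch of a one-feature logistic regression classifier is given below. It assumes an L2 penalty as the regularization term and plain gradient descent as the optimizer; the learning rate, penalty strength, step count, and the example scores and labels are hypothetical and are not taken from the study.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, l2=0.01, lr=0.1, steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent with an L2 penalty on w."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + l2 * w
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (preoutbreak) if the predicted probability reaches the threshold."""
    return int(sigmoid(w * x + b) >= threshold)


# Hypothetical weekly scores with labels: 0 = normal, 1 = preoutbreak.
scores = [0.2, 0.3, 0.4, 0.5, 1.8, 2.1, 2.5, 3.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(scores, labels)
print(predict(w, b, 0.3), predict(w, b, 2.5))
```

The fitted model replaces a hand-picked cutoff with a decision boundary learned from labeled weeks.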

It is usually too complicated to express the influenza transmission kinetics before a sudden outbreak mathematically, because influenza spread involves a large number of parameters from both biological and social systems. According to dynamical systems theory, there exists a so-called bifurcation point at which a network undergoes dramatic fluctuations or a qualitative transformation from its normal state [

As shown in Figure

As presented in Figure

The predictions of annual influenza outbreak in Tokyo city between 2009 and 2019. For each year, our MST-DNM method timely issues the early-warning signal of influenza outbreak only based on the clinic-visiting information. For each figure, the

To better demonstrate the dynamical process of influenza spread at the network level, the evolution of the minimum spanning tree of the city network can also be presented. As shown in Figure

The dynamic evolution of the minimum spanning tree of the city network in Tokyo during years 2013-2014. The nodes are colored by the average number of clinic visits of the corresponding district, and the thickness of the edges represents the correlations between corresponding nodes (the detailed calculation is in Materials and Methods). It is clear that the edges become thicker before the nodes turn red in week 54, which indicates that the early-warning signals from our method appear before the flu outbreak.

To illustrate the generality of our MST-DNM method, we also applied it to detect the early-warning signals of flu outbreaks in Hokkaido and Osaka. Following the same processing flow as for Tokyo, a 30-node city network was modeled for the Hokkaido region and an 11-node city network for Osaka city. Then, we mapped the clinic-visiting data to the corresponding network and calculated the minimum spanning tree. Finally, a logistic regression model trained on MST-DNM scores was applied to detect the tipping point of influenza for each year.

As shown in Figures

To demonstrate the key role of the minimum spanning tree in our approach, we compared the results obtained with and without the minimum spanning tree in 2010, as presented in Figure

Comparison of the method with and without the minimum spanning tree in 2010. (a) Without the minimum spanning tree, the early-warning signal of the DNM method is far from the real influenza outbreak point, whereas that of the MST-DNM method is measurable. (b) The minimum spanning tree avoids the abnormal correlations around node 7 in week 45, which makes the MST-DNM method more accurate.

The minimum spanning tree of an undirected, edge-weighted network is the tree that connects all nodes of the original network while minimizing the sum of the edge weights. It reflects the overall changes in the network structure and avoids the impact of local abnormal correlations, such as those around node 7 in week 45, which indicates that the minimum spanning tree plays a key role in predicting outbreak points.

In previous work, we developed a network-based approach for predicting influenza outbreaks, the so-called landscape dynamic network marker, which used an empirical fold-change threshold to recognize significant changes in the DNM score and obtain the early-warning signal. We compared the performance of the proposed MST-DNM method under two tipping-point determination strategies, i.e., a threshold determined by logistic regression and an empirical threshold, as presented in Figure

The performance of the MST-DNM method under different critical-state determination strategies, i.e., logistic regression and a 2-fold change threshold. The MST-DNM method based on logistic regression performs better than that based on the 2-fold change threshold: the AUC of MST-DNM with logistic regression is 0.8986, while that of MST-DNM with the 2-fold change threshold is 0.7391.
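The two ingredients of this comparison can be sketched as follows: the AUC computed as the fraction of correctly ordered positive/negative pairs, and a fold-change detector. The detector assumes that the fold change is taken relative to the previous week, which is one possible reading of the rule; all names and numbers are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def first_fold_change_week(scores, fold=2.0):
    """Index of the first week whose score is >= `fold` times the previous week's."""
    for week in range(1, len(scores)):
        previous = scores[week - 1]
        if previous > 0 and scores[week] / previous >= fold:
            return week
    return None  # no week met the threshold


# Hypothetical classifier outputs and true labels (1 = preoutbreak week).
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
# Hypothetical weekly score series.
signal_week = first_fold_change_week([1.0, 1.1, 1.2, 2.6, 3.0])
print(example_auc, signal_week)
```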

Japan suffered a serious influenza outbreak at the beginning of 2019. According to reports from about 5000 designated medical institutions across Japan, there was an average of 57.09 influenza patients per institution in the week of January 21 to 27, the highest level since statistics began in 1999. Influenza epidemics cause school suspensions and the absence of a large number of workers, which further reduces social productivity and affects economic development. It is estimated that the direct economic losses caused by the 2009 influenza pandemic amounted to about 0.5% to 1.5% of countries' gross domestic product (GDP) [

Based on the DNM theory, which was applied in our previous works to detect the tipping point or analyze the critical transition of complex diseases from genomic data, and combined with the minimum spanning tree and logistic regression, a novel computational method called MST-DNM was developed to identify the early-warning signal of influenza outbreaks in Tokyo, Osaka, and Hokkaido, Japan. In our MST-DNM method, we first extract the crucial characteristics of the preoutbreak state of influenza from high-dimensional, longitudinal clinic-visiting counts using the DNM and the minimum spanning tree. Then, a logistic regression model trained with leave-one-out cross-validation is applied to identify the preoutbreak state and issue an early-warning signal based on these characteristics. As shown in Figures

The historical raw data is available from Tokyo Metropolitan Infectious Disease Surveillance Center (link:

The authors declare that there is no conflict of interest regarding the publication of this paper.

The authors are grateful to Professor Yongjun Li for the valuable discussion. The work was supported by the National Natural Science Foundation of China (Nos. 11771152, 11901203, and 11971176), the Guangdong Basic and Applied Basic Research Foundation (2019B151502062), the China Postdoctoral Science Foundation funded project (Nos. 2019M662895 and 2020T130212), and the Fundamental Research Funds for the Central Universities (2019MS111).

Figure S1: the predictions of annual influenza outbreak in Hokkaido city between 2011 and 2015. Figure S2: the predictions of annual influenza outbreak in Osaka city between 2012 and 2017. Figure S3: the dynamic evolution of the minimum spanning tree of the city network in Hokkaido during years 2014-2015. Figure S4: the dynamic evolution of the minimum spanning tree of the city network in Osaka during years 2017-2018.