The science of networks has revolutionised research into the dynamics of interacting elements. It could be argued that epidemiology in particular has embraced the potential of network theory more than any other discipline. Here we review the growing body of research concerning the spread of infectious diseases on networks, focusing on the interplay between network theory and epidemiology. The review is split into four main sections, which examine: the types of network relevant to epidemiology; the multitude of ways these networks can be characterised; the statistical methods that can be applied to infer the epidemiological parameters on a realised network; and finally simulation and analytical methods to determine epidemic dynamics on a given network. Given the breadth of areas covered and the ever-expanding number of publications, a comprehensive review of all work is impossible. Instead, we provide a personalised overview of the areas of network epidemiology that have seen the greatest progress in recent years or have the greatest potential to provide novel insights. As such, considerable importance is placed on analytical approaches and statistical methods, which are both rapidly expanding fields. Throughout this review we restrict our attention to epidemiological issues.
The science of networks has revolutionised research into the dynamics of interacting elements. The associated techniques have had a huge impact in a range of fields, from computer science to neurology, from social science to statistical physics. However, it could be argued that epidemiology has embraced the potential of network theory more than any other discipline. There is an extremely close relationship between epidemiology and network theory that dates back to the mid-1980s [
Here, we review the growing body of research concerning the spread of infectious diseases on networks, focusing on the interplay between network theory and epidemiology. The paper is split into four main sections, which examine the types of network relevant to epidemiology, the multitude of ways these networks can be characterised, the statistical methods that can be applied to infer either the likely network structure or the epidemiological parameters on a realised network, and finally simulation and analytical methods to determine epidemic dynamics on a given network. Given the breadth of areas covered and the ever-expanding number of publications (over seven thousand papers have been published concerning infectious diseases and networks), a comprehensive review of all work is impossible. Instead, we provide a personalised overview of the areas of network epidemiology that have seen the greatest progress in recent years or have the greatest potential to provide novel insights. As such, considerable importance is placed on analytical approaches and statistical methods, which are both rapidly expanding fields. We note that a range of other network-based processes (such as the spread of ideas or panic) can be modelled in a similar manner to the spread of infection. In these contexts, however, the transmission process is far less clear; throughout this paper, we therefore restrict our attention to epidemiological issues.
A wide range of network structures and types have been utilised when considering the spread of infectious diseases. Here, we consider the most common forms and explain their uses and limitations. Later, we review the implications of these structures for the spread and control of infectious diseases.
We start our examination of network forms by considering the ideal network that would allow us to completely describe the spread of any infectious pathogen. Such a network would be derived from an omniscient knowledge of individual behaviour. We define
Obviously, the reality of transmission networks is far from this ideal. Information on the potential transmission routes within a population tends to be limited in a number of aspects. Firstly, it is rare to have information on the entire population; most networks rely on obtaining personal information on participants, and therefore participation is often limited. Secondly, information is generally only recorded on a single transmission route (e.g., face-to-face conversation or sexual partnership) and often this is merely recorded as the presence or absence of a contact rather than attempting to quantify the strength or frequency of the interaction. Finally, data on contact networks are rarely dynamic; what is generally recorded is whether a contact was present during a particular period with little consideration given to how this pattern may change over time. In the light of these departures from the ideal, it is important to consider the specifics of different networks that have been recorded or generated and understand their structure, uses, and limitations.
One of the few examples of where many of the potential transmission routes within a population have been documented comes from the spread of sexually transmitted infections (STIs). In contrast with airborne infections, STIs have very obvious transmission routes—sex acts (or sharing needles during intravenous drug use)—and as such these potential transmission routes should be easily remembered (Figure
Examples of networks used in epidemiology. (a) Contacts between 22 intravenous drug users, as recorded in [
Despite these problems, and motivated by the desire to better understand the spread of HIV and other STIs, several pioneering studies were performed. Probably the earliest is discussed by Klovdahl [
One of the few early examples of the simulation of disease transmission on an observed network comes from a study of a small network of 22 injection drug users and their sexual partners [
A different approach to gathering social network and behavioural data was initiated by the Human Dynamics group at MIT and illustrates how modern technology can assist in the process of determining transmission networks. One of the first approaches was to take advantage of the fact that most people carry mobile phones [
A more recent study into the encounters between wild Tasmanian devils in the Narawntapu National Park in northern Tasmania utilised a similar technological approach [
Given the huge logistical difficulties of capturing the full network of interactions between individuals within a population, a variety of methods have been developed to generate synthetic networks from known attributes. Generally, such methods fall into two classes: those that utilise egocentric information and those that attempt to simulate the behaviour of individuals.
Egocentric data generally consist of information on a number of individuals (the egos) and their contacts (the alters). As such, the information gathered is very similar to that collected in the sexual contact network studies in Manitoba and Colorado Springs, but with only the initial step of the snowball sampling performed; the difference is that for the majority of egocentric data the identity of partners (alters) is unknown, and therefore connections between egos cannot be inferred (Figure
The general configuration model approach of randomly linking together “half-links” from each ego [
A slightly more general approach to the generation of model sexual networks was employed by Ghani et al. [
The alternative approach of simulating the behaviour of individuals is obviously highly complex and fraught with a great deal of uncertainty. Despite these problems, three groups have attempted just such an approach: Longini’s group at Emory [
Whilst smallpox has been eradicated, concern remains about the possibility of a deliberate release of the disease. The stochastic simulation models of the Longini group have predominantly focused on methods of controlling this infection [
The Ferguson models have primarily been used to consider the spread and control of pandemic influenza, examining its potential spread from an initial source in South-East Asia [
An alternative source of network information comes from the recorded movements of individuals. Such data frequently describe a relatively large network as information on movements is often collected by national or international bodies. The network of movements, therefore, has nodes representing locations (rather than individuals) and edges weighted to capture the number of movements from one location to another—as such the network is rarely symmetric. Four main forms of movement network have played important roles in understanding the spread of infectious diseases: the airline transportation network [
Probably the earliest work using detailed movement data to drive simulations comes from the spread of 1918 pandemic influenza in the Canadian Subarctic, based on records kept by the Hudson’s Bay Company [
The movement of passenger aircraft as collated by the International Air Transport Association (IATA) provides very useful information about the long-distance movement of individuals and hence how rapidly infection is likely to travel around the globe [
A wide variety (and in practice the vast majority) of movements are not made by aircraft but are regular commuter movements to and from work. The network of such movements has also been studied in some detail for both the UK and USA [
Undoubtedly, one of the largest and most comprehensive data sets of movements between locations comes from the livestock tracing schemes run in Great Britain and being adopted in other European countries. The Cattle Tracing Scheme in particular is spectacularly detailed, containing information on the movements of all cattle between farms in Great Britain; as such, this scheme generates daily networks of contacts between over 30,000 working farms in Great Britain [
The early spread of foot and mouth disease (FMD) in 2001 was primarily due to livestock movements, particularly of sheep [
The use of static networks to model the very dynamic movement of livestock is questionable. Expanding on earlier work, Green et al. [
Contact tracing and hence the networks generated by this method can take two distinct forms. The first is when contact-tracing is used to initiate proactive control. This is often the case for STIs, where identified cases are asked about their recent sexual partners, and these individuals are traced and tested; if found to be infected, then contact tracing is repeated for these secondary cases. Such a process is related to the snowball sampling that was discussed earlier, with the notable exception that tracing is only performed from known cases. Similar contact-tracing may operate for the early stages of an airborne epidemic (as was seen for the 2009 H1N1 pandemic), but here, the tracing is not generally iterative as contacts are generally traced and treated so rapidly that they are unlikely to have generated secondary cases. An alternative form of contact-tracing is when a transmission pathway is sought between all identified cases [
These forms of network have two main advantages but one major disadvantage. The network is often accompanied by test results for the individuals within the network; as such, we have information not only on the contact process but also on the resultant transmission of infection. In addition, when contact tracing is only performed to define an infection tree, there is the added advantage that the infection process itself defines the network of contacts, and hence there is no need for human interpretation of which forms of contact may be relevant. Unfortunately, the reliance on the infection process to drive the tracing means that the network only reflects one realisation of the epidemic process and, therefore, may ignore contacts that are of potential importance and would be needed if the epidemic were to be simulated. Therefore, while such networks can inform about past outbreaks, they have little predictive power.
Obtaining large-scale and reliable information on who contacts whom is obviously very difficult; therefore, there is a temptation to rely on alternative data sets, where network information can be extracted far more easily and where the data have already been collected. The movement networks and contact tracing networks discussed above are examples of such surrogate networks, although their connection to the physical processes of infection transmission is far clearer. Other examples of networks abound [
Given the huge complexity involved in obtaining large-scale and reliable data on real transmission networks, many researchers have instead relied on theoretically constructed networks. These networks are usually highly simplified but aim to capture some of the known (or postulated) features of real transmission networks; often the simplifications are so extreme that some analytical traction can be gained. Here, we briefly outline some of the commonly used theoretical networks and identify which features they capture; some of the results of how infection spreads on such networks are discussed more fully in Section
One of the simplest forms of network is to allow each individual to have a set of contacts that it wishes to make (in more formal language, each node has a set of half-links); these contacts are then made at random with other individuals based on the number of contacts that they wish to make (half-links are randomly connected) [
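The half-link construction described above can be illustrated with a minimal sketch. The function name `configuration_model`, the example degree list `wanted`, and the choice to discard self-loops and repeated links are assumptions made for this illustration; other treatments of such links are equally possible.

```python
import random


def configuration_model(degrees, seed=None):
    """Randomly pair half-links (stubs) and return a sorted list of edges.

    Self-loops and repeated edges are discarded, which is one common
    simplification (an assumption of this sketch), so realised degrees
    can fall slightly below the requested ones.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the sum of degrees must be even")
    rng = random.Random(seed)
    # One stub per desired contact: node i appears degrees[i] times.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    edges = set()
    # Pair consecutive stubs in the shuffled list.
    for u, v in zip(stubs[0::2], stubs[1::2]):
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


# Hypothetical numbers of desired contacts for eight individuals.
wanted = [3, 3, 2, 2, 2, 2, 1, 1]
edges = configuration_model(wanted, seed=1)

# Degrees actually realised after discarding self-loops and repeats.
realised = [0] * len(wanted)
for u, v in edges:
    realised[u] += 1
    realised[v] += 1
print(edges)
print(realised)
```

For large sparse networks the proportion of discarded half-links is small, which is why this shortcut is often acceptable in practice.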
An alternative formulation that offers a compromise between tractability and realism occurs when individuals that exist in fully interconnected cliques have randomly assigned links within the entire population [
Both lattice networks and small world networks begin with the same formulation: individuals are regularly spaced on a grid (usually in just one or two dimensions), and each individual is connected to their
Small world networks improve upon the rigid structure of the lattice by allowing a low number of random contacts across the entire space (Figure
Spatial networks, as the name suggests, are generated using the spatial location of all individuals in the population; as such, lattices and small worlds are a particular form of spatial network. The general methodology initially positions each individual
In recent years, there has been growing interest in exponential random graph models (ERGMs) for social networks, also called the
Here, we have shown that a wide variety of network structures have been measured or synthesised to understand the spread of infectious diseases. Clearly, with such a range of networks, no clear consensus can be drawn on the types of underlying network structures that are generally present; in part, this is because different studies have focused on different infectious diseases and different diseases require different transmission routes. However, three factors emerge that are key components of epidemiological networks: heterogeneity in the number of contacts such that some individuals are at a higher risk of both catching and transmitting infection, clustering of contacts such that groups of individuals are often highly interconnected, and some reflection of spatial separation such that contacts usually form locally, but occasional long-range connections do occur.
Three fundamental problems still exist in the study of networks. Firstly, are there relatively low-dimensional ways of capturing key aspects of a network’s structure? What constitutes a key aspect will vary with the problem being studied, but for epidemiological applications, it should be hoped that a universal set of network characteristics may emerge. There is then the task of assessing reasonable and realistic ranges for these key variables based on values computed for known transmission networks—unfortunately very few transmission networks have been recorded in any degree of detail although modern electronic devices may simplify the process in the future. Secondly, there is the related statistical problem of inferring plausible complete networks from the partial information collected by methods such as contact tracing. This is equivalent to seeking an underlying model for the network connections that is consistent with the known partial information, and hence, has strong resonance with the more mechanistically motivated models in Section
Real networks can exhibit staggering levels of complexity. The challenge faced by researchers is to make sense of these structures and reduce the complexity in a meaningful way. To this end, researchers over several decades have defined a large variety of measurable properties that can be used to characterise certain key aspects [
In general, networks are not necessarily connected; in other words, not all parts of the network are reachable from all others. The component to which a node belongs is the set of nodes that can be reached from it by paths running along edges of the network. A network is said to have a
The concept of a giant component is central when considering disease propagation in networks. The extent of the epidemic is necessarily limited to the number of nodes in the component that it begins in, since there are no paths to nodes in other components. In directed networks, in the case of a single initial infected individual, only the out-component of that node is at risk from infection. More generally, the strongly connected component contains those nodes that can be reached from each other. Members of the strongly connected component are most at risk from infection imported at a random node, since a single introduction of infection will be able to reach all nodes in the component.
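The component notions used above can be computed with a breadth-first search. The following is a minimal sketch on two small hypothetical networks (`undirected` and `directed` are illustrative data, not taken from any study); the strongly connected component of a node is obtained here as the intersection of its out-component and its in-component.

```python
from collections import deque


def reachable(adj, start):
    """Set of nodes reachable from `start` by following edges (includes start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen


def components(adj):
    """Connected components of an undirected network, largest first."""
    unvisited = set(adj)
    found = []
    while unvisited:
        comp = reachable(adj, next(iter(unvisited)))
        found.append(comp)
        unvisited -= comp
    return sorted(found, key=len, reverse=True)


def reverse(adj):
    """Directed network with every edge reversed."""
    rev = {node: [] for node in adj}
    for node, targets in adj.items():
        for target in targets:
            rev.setdefault(target, []).append(node)
    return rev


# Hypothetical undirected network: every edge is listed in both directions.
undirected = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: [5], 5: [4], 6: []}
comps = components(undirected)
print([len(c) for c in comps])  # component sizes; the first is the largest

# Hypothetical directed network: adj[u] lists the nodes that u can infect.
directed = {0: [1], 1: [2], 2: [0, 3], 3: [], 4: [0]}
out_component = reachable(directed, 0)           # nodes at risk if 0 is infected
in_component = reachable(reverse(directed), 0)   # nodes that could infect 0
strongly_connected = out_component & in_component
print(out_component, strongly_connected)
```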
The
Comparison of random and scale-free networks. (a) Degree distributions for two classes of networks: scale free and random networks. (b) Example random network with 100 nodes and 300 links. All nodes have similar numbers of links. (c) Example scale-free network with 100 nodes and 300 links. Most nodes have few links, with a few nodes having many links.
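As a rough illustration of panel (b), the sketch below builds a random network with 100 nodes and 300 links and tabulates its degree distribution. The construction used here (drawing distinct node pairs uniformly at random until 300 links exist) is an assumption of this example and is not necessarily how the figure was produced; the function names are likewise illustrative.

```python
import random
from collections import Counter


def random_network(n_nodes, n_links, seed=None):
    """Choose `n_links` distinct undirected links uniformly at random."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_links:
        u, v = rng.sample(range(n_nodes), 2)  # two different nodes
        edges.add((min(u, v), max(u, v)))
    return edges


def degree_distribution(n_nodes, edges):
    """Proportion of nodes having each degree k."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree)
    return {k: counts[k] / n_nodes for k in sorted(counts)}


edges = random_network(100, 300, seed=0)
p_k = degree_distribution(100, edges)
# Every link contributes to two degrees, so the mean degree is 2 * 300 / 100.
mean_degree = sum(k * p for k, p in p_k.items())
print(p_k)
print(mean_degree)
```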
The degree distribution is one of the most important ways of characterising a network as it naturally captures the heterogeneity in individuals’ potential to become infected as well as cause further infection. Intuitively, the higher the number of edges a node has, the more likely it is to be a neighbour of an already infected node. Also, the more neighbours a node has, the more likely it is to cause a large number of onward cases. Thus, knowing the form of
For the extreme case of
The degree distribution provides very useful information on uncorrelated networks such as those produced by configuration models. However, real networks are in general correlated with respect to degree; that is, the probability of finding a node with given degree,
Characterising degree correlations is important for understanding disease spread. The classic example is the existence of strong correlations in sexual networks which were shown to be a key factor in understanding HIV spread [
In a network, the
Characterising networks in terms of the number of steps needed to reach any node from any other is also important. Real networks frequently display the small-world property; that is, the vast majority of nodes are reachable in a small number of steps. This has clear implications for disease spread and its control. Percolation approaches have shown that the effects of the small-world phenomenon can be profound [
The notion of shortest distance through a network can be used to quantify how central a given node is in the network. Many measures have been used [
An important example of an observable property of any network is the
Locally, the clustering coefficient for each node,
Degree and clustering characterise some aspects of network structure at an individual level. Considering distances between nodes provides information about the global organisation of the network. Intermediate scales are also present, and characterising these can help in our understanding of network structure and therefore the dynamics of spread.
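The individual-level and global measures mentioned here can be sketched as follows. We assume the usual definition of the local clustering coefficient (the proportion of pairs of a node's neighbours that are themselves connected) and, as a convention of this example only, assign a value of zero to nodes with fewer than two neighbours. The small network `adj` is hypothetical.

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, node):
    """Proportion of pairs of neighbours of `node` that are linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention assumed in this sketch
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * linked / (k * (k - 1))


def distances_from(adj, source):
    """Shortest path length (in steps) from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Hypothetical connected, undirected network: a triangle (0, 1, 2) with a tail.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}

clustering = {node: local_clustering(adj, node) for node in adj}
nodes = sorted(adj)
path_lengths = [distances_from(adj, u)[v] for u, v in combinations(nodes, 2)]
mean_path = sum(path_lengths) / len(path_lengths)
diameter = max(path_lengths)
print(clustering)
print(mean_path, diameter)
```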
At the simplest level, networks can be thought of as comprising a collection of subgraphs. The simplest subgraph, the
Considering higher order structures can be very informative but is more involved. Milo and coworkers began by looking for specific patterns of connections between nodes in small subgraphs, dubbed
In the above definitions, a subgraph has been defined only in reference to itself. A different approach is to compare the number of internal edges to the number of external edges, arising from the intuitive notion that a
Although the impact of communities in transmission processes has not been fully explored, a few studies have shown it can have a profound impact on disease dynamics [
All of the above definitions have concentrated on networks where the edges remain unchanged over time and all edges have equal weight. Both of these constraints can naturally be relaxed, but generally, this calls for a higher-dimensional characterisation of the edges within the network. It is a matter of common experience that social interactions which can lead to infection do change, with some contacts being repeated regularly, while others are more sporadic. The frequency, intensity, and duration of contacts are all time-varying. How these inherently dynamic networks are represented for the purposes of modelling can have a significant impact on the model outcomes [
In the sections that follow, we discuss how these network properties can be inferred statistically and the improvements in our understanding of the transmission of infection in networks that have come as a result.
One of the key advantages of the simulation of disease processes on networks is that it enables the study of systems that are too complex for analytical approaches to be tractable. With that in mind, it is worth briefly considering efficient approaches to disease simulation on networks.
There are two main types of simulation model for infectious diseases on networks: discrete-time and continuous-time models; of these, discrete-time simulations are more common, so we discuss them first. In a discrete-time simulation, at every time step, disease may be transmitted along every edge from an infectious node to a susceptible node with a particular probability (which may be the same for all extant edges or may vary according to properties of the two nodes or the edge). Also, nodes may recover (becoming immune, or reverting to being susceptible) during each time-step. Within a time-step, every infection and recovery event is assumed to occur simultaneously. In a dynamic network simulation, the network is typically updated every time step—for example, in a livestock movement network, during time-step
Whilst algorithms for discrete-time simulations are not complex, some simple implementation techniques (arising from the observation that most networks of epidemiological interest are sparse) can significantly enhance software performance. In a directed network with
The alternative approach to simulating disease processes on networks is to simulate a series of stochastic Markovian events—the continuous-time approach. Essentially, given the state of the system, it is possible to calculate the probability distributions of when possible subsequent events (i.e., recovery of an infectious node or infection of a susceptible node) will occur. Random draws from these distributions are then made to determine which event occurs next, the state of the system updated, and the process repeated. This approach was pioneered by Gillespie to study the dynamics of chemical reactions [
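A minimal sketch of this event-driven approach for a susceptible-infectious-recovered process on a static network is given below. The per-link transmission rate `tau`, the recovery rate `gamma` (assumed positive), the ring-shaped test network, and all function names are assumptions of this illustration. For clarity the list of susceptible-infectious links is rebuilt at every event, which is simple but not efficient for large networks.

```python
import random


def gillespie_sir(adj, tau, gamma, initial_infected, seed=None):
    """Continuous-time SIR on a network.

    Returns (history, state), where history is a list of
    (time, number infectious) pairs and state maps node -> 'S', 'I' or 'R'.
    """
    rng = random.Random(seed)
    state = {node: "S" for node in adj}
    for node in initial_infected:
        state[node] = "I"
    infectious = set(initial_infected)
    t = 0.0
    history = [(t, len(infectious))]
    while infectious:
        # Every link from an infectious to a susceptible node transmits at rate tau.
        si_links = [(i, s) for i in sorted(infectious)
                    for s in adj[i] if state[s] == "S"]
        rate_infection = tau * len(si_links)
        rate_recovery = gamma * len(infectious)
        total = rate_infection + rate_recovery
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        # Choose the event type in proportion to its rate.
        if rng.random() * total < rate_infection:
            _, target = rng.choice(si_links)
            state[target] = "I"
            infectious.add(target)
        else:
            node = rng.choice(sorted(infectious))
            state[node] = "R"
            infectious.remove(node)
        history.append((t, len(infectious)))
    return history, state


# Hypothetical test network: 50 nodes on a ring, each linked to 4 nearest neighbours.
n = 50
ring = {i: [(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)}
history, final_state = gillespie_sir(ring, tau=1.0, gamma=0.5,
                                     initial_infected=[0], seed=3)
print(len(history), history[-1])
```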
The continuous-time approach is clearly in closer agreement with the ideal of standard disease models; however, utilising this method may be computationally prohibitive, especially when large networks are involved. Discrete-time models may provide a viable alternative for three main reasons. Firstly, as the time steps involved in the discrete-time model become sufficiently small, we would expect the two models to converge. Secondly, inaccuracies due to the discrete-time formulation are likely to be less substantial in network models compared to random-mixing models, provided that two events do not occur in the same neighbourhood during the same time step. Finally, the daily cycle of contacts that regulates most of our lives means that using time steps of less than 24 hours may falsely represent the temporal accuracy that can be attributed to any simulation of the real world.
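For comparison, a discrete-time simulation of the kind described earlier in this section can be sketched as below. The per-step probabilities `p_transmit` and `p_recover`, the ring-shaped test network, and the function name are assumptions of this example. Storing the network as an adjacency list means that each step only visits links leaving infectious nodes, which exploits sparsity; all infections and recoveries within a step are collected first and applied together.

```python
import random


def discrete_time_sir(adj, p_transmit, p_recover, initial_infected,
                      seed=None, max_steps=10_000):
    """Discrete-time SIR on a static network with synchronous updates.

    Returns (counts, state): the number infectious at the start of each
    step, and the final node -> 'S' / 'I' / 'R' mapping.
    """
    rng = random.Random(seed)
    state = {node: "S" for node in adj}
    for node in initial_infected:
        state[node] = "I"
    counts = []
    for _ in range(max_steps):
        infectious = [node for node in adj if state[node] == "I"]
        counts.append(len(infectious))
        if not infectious:
            break
        newly_infected, newly_recovered = set(), []
        for i in infectious:
            # Only links leaving infectious nodes need to be examined.
            for nb in adj[i]:
                if state[nb] == "S" and rng.random() < p_transmit:
                    newly_infected.add(nb)
            if rng.random() < p_recover:
                newly_recovered.append(i)
        # Apply all events of this time step simultaneously.
        for node in newly_infected:
            state[node] = "I"
        for node in newly_recovered:
            state[node] = "R"
    return counts, state


# Hypothetical test network: 50 nodes on a ring, each linked to 4 nearest neighbours.
n = 50
ring = {i: [(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)}
counts, final_state = discrete_time_sir(ring, p_transmit=0.3, p_recover=0.2,
                                        initial_infected=[0], seed=3)
print(counts)
```

In this sketch a node infected during a step can neither transmit nor recover until the following step, which is one of several possible conventions.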
In this section, we use the word “analytic” broadly, to imply models that are directly numerically integrable, without the use of Monte Carlo simulation methods, rather than systems for which all results can be written in terms of fundamental functions, of which there are very few in epidemiology. Analytic approaches to transmission of infection on networks fall into three broad categories. Firstly, there are approaches that calculate exact invasion thresholds and final sizes for special networks. Secondly, there are approaches for calculating exact transient dynamics, including epidemic peak heights and times, but again, these only hold in special networks. Finally, there are approaches based on moment closure that give approximately correct dynamics for a wide class of networks.
Before considering these approaches on networks, it is worth considering what is meant by nonnetwork mixing and showing explicitly how it leads to the standard transmission terms of familiar differential equation models. Nonnetwork mixing can be taken to have one of two meanings, depending on context: either that every individual in the population is weakly connected to every other (the mean-field assumption), or that an Erdős-Rényi random graph defines the transmission network. To see how this determines the epidemic dynamics, we consider a population of
In either case, having defined an appropriate population-level transmission rate, a stochastic susceptible-infectious model of transmission is defined through a Markov chain, in which a population with
In the case of exponentially distributed infectious periods and recovery from infection offering long-lasting immunity, the standard
For nonnetwork mixing, the threshold for invasion is given by the basic reproductive ratio
Provided there are no short closed loops in the network,
Once an appreciable number of short closed loops are present in the network, exact threshold parameters can still sometimes be defined, but these typically depart from the standard verbal definition of
The most sophisticated and general way to obtain exact results for the expected final size of a major outbreak on a network is called the
The main way to calculate approximate final sizes is given by percolation-based methods. These were reviewed by Bansal et al. [
Some of the earliest work on infectious diseases involved the exact solution of master equations (where the probability of the population being in each possible configuration is calculated) on small, fully connected graphs as summarised in Bailey [
Nevertheless, other specific routes do exist that allow exact systems of equations of lower dimensionality to be derived for special networks. For static networks constructed using the configuration model (where individuals have heterogeneous degree but connections are made at random such that the presence of short loops can be ignored in a large network, see Figure
While
For more structured networks with a local-global distinction, there are two limits in which exact dynamics can also be derived. If the network is composed of
In the limit where
Both of these two local-global models, the metapopulation model (
While all the exact results above are an important guide to intuition, they only hold for very specialised networks. A large class of models exists that form a bridge between “mean-field” models and simulation by using spatial or network moment closure equations. These are highly versatile models. In general, invasion thresholds and final sizes can be calculated rigorously, but exact calculation of transient dynamics is only possible for very special networks. If one wants to calculate transient effects in general network models—most importantly, peak heights and times—then moment closure is really the only versatile way of calculating desired quantities without relying on full numerical simulation.
It is also worth noting that there are many results derived through these “approximate” approaches that are the same as exact results or are numerically indistinguishable from exact results and simulation. We give some examples below and also note that the dynamical PGF approach [
The idea of moment closure is to start with an exact but unclosed set of equations for the time evolution of different units of structure. Here, we show how these can be derived by considering the rates of change of both types of individual and types of connected pair. Such pairwise moment closure models are a natural extension to the standard (random-mixing) models, given that infection is passed between pairs of infected individuals
Common notation.
Concept/Measure | Other common names | Our notation | Other common notation |
---|---|---|---|
Network | Graph | ||
Node | Vertex, point, site, actor | ||
Edge | Link, tie, bond | ||
Adjacency matrix | Connectivity matrix | ||
Number of nodes | Size of network | ||
Number of edges | Graph size | ||
Centrality | |||
Degree | Connectivity | ||
Betweenness | |||
Degree distribution | Connectivity distribution | ||
Shortest path distance | Geodesic distance | ||
Clustering | Transitivity | ||
Number of nodes of type A | |||
Number of | |||
Diameter | Maximal shortest path |
Clearly, the system (
Equation (
The work on dynamical PGF models [
There are many other extensions of this general methodology that are possible. Writing ODEs for the time evolution of triples and closing at a higher order allows the consideration of the epidemiological consequences of varying motif structure [
Another approach is to approximate the transmission dynamics in the standard (mean-field) differential equations models. Essentially, this is a form of moment closure at the level of pairs rather than triples. For example, in Roy and Pascual [
Finally, a very recent work [
This diversity of approaches leads to some important points about methods based on moment closure. These methods are extremely general and can be applied to consider almost any aspect of network structure or disease natural history; they can be applied to populations not currently amenable to direct simulation due to their size, and they do not require a complete description of the network to run—only certain statistical properties. However, there are currently no general methods for the proposal of appropriate closure regimes nor any derivation of the limits on dynamical biases introduced by closure. Therefore, closure methods sit somewhere in between exact results for highly specialised kinds of network and stochastic simulation, where intuitive understanding and general analysis are more difficult.
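To make the idea of pairwise closure concrete, the sketch below integrates one widely used pairwise system for a susceptible-infectious-recovered process on a network in which every node has the same number of contacts and there is no clustering. The symbols are assumptions of this example rather than the notation of this review: `SS` and `SI` count ordered pairs, `tau` is the transmission rate across a link, `gamma` is the recovery rate, and triples are closed as [ABC] ≈ ((n − 1)/n)[AB][BC]/[B] for degree n. A simple fixed-step Euler scheme is used for brevity.

```python
def pairwise_sir(n_nodes, degree, tau, gamma, i0, t_end, dt=0.01):
    """Euler integration of a pairwise SIR model on a regular, unclustered network.

    Returns a list of (time, S, I, R) tuples.
    """
    S, I, R = float(n_nodes - i0), float(i0), 0.0
    # Initial pair counts assume infection is seeded at random.
    SS = degree * S * S / n_nodes
    SI = degree * S * I / n_nodes
    closure = (degree - 1) / degree
    trajectory = [(0.0, S, I, R)]
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        # Triples approximated from pairs (the moment closure).
        SSI = closure * SS * SI / S
        ISI = closure * SI * SI / S
        dS = -tau * SI
        dI = tau * SI - gamma * I
        dR = gamma * I
        dSS = -2.0 * tau * SSI
        dSI = tau * (SSI - ISI - SI) - gamma * SI
        S += dt * dS
        I += dt * dI
        R += dt * dR
        SS += dt * dSS
        SI += dt * dSI
        trajectory.append((step * dt, S, I, R))
    return trajectory


# Hypothetical parameter values chosen only for illustration.
trajectory = pairwise_sir(n_nodes=10_000, degree=4, tau=1.0, gamma=1.0,
                          i0=10, t_end=60.0)
final = trajectory[-1]
peak = max(trajectory, key=lambda row: row[2])
print("peak time and height:", peak[0], peak[2])
print("final size:", final[3])
```

Only the degree and the initial numbers of individuals and pairs are needed to run this model, which illustrates the point above that closure methods require statistical properties of the network rather than its complete description.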
In the papers that introduced them, the differential-equation-based approximate dynamical systems above were compared to stochastic simulations on appropriate networks. Two recent papers making a comparison of different dynamical systems with simulation are Bansal et al. [
Firstly, it is necessary to define what is meant by agreement between a smooth, deterministic epidemic curve and the rough trajectories produced by simulation. Limiting results about the exactness of different ODE models assume that both the number of individuals infectious and the network size are large, and so the early behaviour of simulations, when there are few infectious individuals, is often dominated by stochastic effects. There are different ways to address this issue, but even after this has been done, there are two sources of deviation of simulations from their deterministic limit. The first of these is the number of simulations realised. If there is a summary statistic such as the mean number of infectious individuals over time, then the confidence interval in such a statistic can be made arbitrarily small by running additional simulations, but agreement between the deterministic limit and a given realisation may still be poor. The second source of deviation is the network size. By increasing the number of nodes, the prediction interval within which the infection curve will fall can be made arbitrarily small; however, the computational resources needed to simulate extremely large networks can quickly become overwhelming.
More generally, each approximate model is designed with a different application in mind. Models that perform well in one context will often perform poorly in another, and this means that “performance” of a given model in terms of agreement with simulation will primarily be determined by the discrete network system on which simulations are performed.
The above considerations motivate the example comparisons with simulation that we show in Figure
Comparison of simulation and deterministic models for six networks. (a) Two-group configuration model network. (b) Two-group assortative network. (c) Static regular network. (d) Dynamical regular network. (e) Regular clustered network. (f) One-dimensional lattice.
In Figures
Figure
In order to be predictive, epidemic models rely on valid values for parameters governing outbreak dynamics, conditional on the population structure. However, obtaining these parameters is complicated by the fact that even when knowing the underlying contact network structure, infection events are censored—it is only when disease is detected either from symptoms or laboratory tests that a case becomes apparent. In attempting to surmount this difficulty, parameter estimates are often obtained by making strong assumptions as to the infectious period or through ad hoc methods with unknown certainty. Measuring the uncertainty in such estimates is as important as obtaining the estimates themselves in providing an honest risk prediction. Given these difficulties, inference for epidemic processes has perhaps received little attention in comparison to its simulation counterpart.
The presence of contact network data for populations provides a unique opportunity to estimate the importance of various modes of disease transmission from disease incidence or contact tracing data. For example, given knowledge of the rate of contact between two individuals, it is possible to infer the probability that a contact results in an infection. If data on mere connectivity (i.e., a 1 if the individuals are connected and 0 otherwise) is available, then it is still possible to infer a rate of infection between connected individuals. Thus, the detail of the inference is determined to a large extent by the available detail in the network data [
Epidemic models are defined in terms of times of transitions between infection states, for example a progression from susceptible, to infected, to removed (i.e., recovered with lifelong immunity or dead) in the so-called “SIR” model. Statistical inference requires firstly that observations of the disease process are made: at the very least, this comprises the times of case detections, remembering that infection times are always censored (you only ever know you have a cold a few days after you caught it). In addition, covariate data on the individuals provides structure to the population and begins to enable the statistician to make statements about the importance of individuals’ relationships to one another in terms of disease transmission. Therefore, any covariate data, however slight, effectively implies a network structure upon which disease transmission can be superimposed.
As long as populations are relatively small (e.g., populations of farms in livestock disease analysis), it is common for models to operate at the individual level, providing detailed information on case detection times and perhaps even information on epidemiologically significant historical contact events [
In a heterogeneous population, the behaviour of an epidemic within any particular locality is governed by the relationship between infected and susceptible individuals. For inference in the early stages of an epidemic, it is important to quantify the amount of uncertainty in the underlying contact networks as the early growth of the epidemic is known to be subexponential due to the depletion of the local susceptible population. This contrasts markedly with the exponential growth observed in a large homogeneously mixing population [
For homogeneous models the basic reproduction number, or
Although inference for
Obviously, complete or near-complete epidemic data is rare and hence it is desirable to perform inference based upon partial observation. This is particularly relevant for real time estimation of
The tradeoff in the simplicity of estimating
Inference for households models is well developed in comparison to inference for other “network” models, for three main reasons. Firstly, it is a reasonable initial approximation to assume that infection either occurs within the home or from a random source in the population. Secondly, entire households can be serologically sampled following an epidemic, such that the distribution of cases in households of given sizes can be ascertained. Finally, it is often a reasonable approximation that, following introduction of infection into the household, the within-household epidemic will go extinct before any further introductions, which dramatically simplifies the mathematics.
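As a minimal sketch of how household final-size data can be used, the following assumes a Reed-Frost chain binomial within each household, a single introduction of infection, and one unknown parameter q (the probability that a susceptible escapes infection by a given infective). The data layout, function names, and grid search are illustrative assumptions rather than the procedure of any particular study.

```python
from math import comb, log


def final_size_distribution(m, q):
    """Distribution of how many of m initial susceptibles are ever infected.

    Assumes a Reed-Frost chain binomial started by one infective, where q is
    the probability that a susceptible escapes infection by one infective.
    """
    dist = [0.0] * (m + 1)

    def step(susceptible, infectives, prob):
        if infectives == 0 or susceptible == 0:
            dist[m - susceptible] += prob  # the chain has ended
            return
        escape_all = q ** infectives
        for new in range(susceptible + 1):
            p = (comb(susceptible, new)
                 * (1 - escape_all) ** new
                 * escape_all ** (susceptible - new))
            step(susceptible - new, new, prob * p)

    step(m, 1, 1.0)
    return dist


def log_likelihood(q, counts):
    """counts maps (m, k) to the number of households with m initial
    susceptibles of which k were ultimately infected (hypothetical layout)."""
    total = 0.0
    for m in {size for size, _ in counts}:
        dist = final_size_distribution(m, q)
        total += sum(c * log(dist[k])
                     for (size, k), c in counts.items() if size == m)
    return total


def maximum_likelihood_q(counts, grid_size=99):
    """Crude grid search over q in (0, 1); adequate for a single parameter."""
    grid = [(j + 1) / (grid_size + 1) for j in range(grid_size)]
    return max(grid, key=lambda q: log_likelihood(q, counts))


if __name__ == "__main__":
    # Synthetic counts, invented purely to exercise the code.
    counts = {(2, 0): 49, (2, 1): 29, (2, 2): 22}
    print(maximum_likelihood_q(counts))
```

The recursion enumerates every chain explicitly, which is only practical for small households; it is used here for transparency rather than efficiency.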
The first methods proposed for such inference are maximum likelihood procedures based upon chain-binomial models, such as the Reed-Frost model, or the stochastic formulation of the Kermack-McKendrick model considered by Bartlett [
Additionally, fixed probabilities
The ability to relax assumptions further has been predominantly due to the use of Markov chain Monte Carlo (MCMC) methods as first considered by O’Neill et al. [
Whilst new developments are appearing at an increasing rate, the significant majority of methods are based upon final size data and are developed for SIR disease models, perhaps due in part to the simplification of arguments for deriving final size distributions. One key, but still unanswered question from these analyses of household epidemics is how the transmission rate between any two individuals in the household scales with the total number of individuals in the household (compare Longini and Koopman [
Perhaps the holy grail of statistical inference on epidemics is to make use of an individual-level model to describe heterogeneous populations at the limit of granularity. In this respect, Bayesian inference on stochastic mechanistic models using MCMC has perhaps shown the most promise, allowing inference to be made on transmission parameters and, through data augmentation, on the infectious period.
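The basic mechanics of such an MCMC scheme can be sketched in a deliberately simplified setting. The example below assumes that every infectious-susceptible link is fully observed, with a known exposure duration and a known outcome, so no data augmentation is needed; it also assumes an Exponential(1) prior on the transmission rate. The data layout, function names, and tuning values are hypothetical, and real applications with censored infection times are considerably more involved.

```python
import math
import random


def simulate_links(beta, n_links, rng):
    """Synthetic (duration, transmitted) pairs, used only to exercise the sampler."""
    links = []
    for _ in range(n_links):
        duration = rng.uniform(0.5, 3.0)
        transmitted = rng.random() < 1.0 - math.exp(-beta * duration)
        links.append((duration, transmitted))
    return links


def log_posterior(beta, links):
    """Log posterior for the transmission rate beta, up to a constant.

    Each link transmits with probability 1 - exp(-beta * duration);
    an Exponential(1) prior on beta is assumed.
    """
    total = -beta  # log of the Exponential(1) prior
    for duration, transmitted in links:
        if transmitted:
            total += math.log1p(-math.exp(-beta * duration))
        else:
            total += -beta * duration
    return total


def metropolis(links, iterations, rng, step=0.2, start=1.0):
    """Random-walk Metropolis on log(beta); returns the sampled beta values."""
    log_beta = math.log(start)
    # Adding log_beta accounts for the Jacobian of the log transform.
    current = log_posterior(start, links) + log_beta
    samples = []
    for _ in range(iterations):
        proposal = log_beta + rng.gauss(0.0, step)
        candidate = log_posterior(math.exp(proposal), links) + proposal
        if rng.random() < math.exp(min(0.0, candidate - current)):
            log_beta, current = proposal, candidate
        samples.append(math.exp(log_beta))
    return samples


if __name__ == "__main__":
    rng = random.Random(42)
    links = simulate_links(0.5, 500, rng)
    kept = metropolis(links, 4000, rng)[1000:]  # discard burn-in
    print(sum(kept) / len(kept))
```

Sampling on the log scale keeps the rate positive without rejecting proposals at the boundary, which is one common design choice among several.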
An analysis of the 1861 outbreak of measles in Hagelloch by Neal and Roberts [
Whilst representing the cutting edge of inference on infectious disease processes, these approaches are currently limited by computing power, with the cost of their algorithms scaling as the product of the number of infectives and the number of susceptibles. However, with advances in computer technology expected at an increasing rate, and small approximations made in the calculation of the statistical likelihoods needed in the MCMC algorithms, these techniques may well form the mainstay of epidemic inference in the future.
In livestock diseases, part of the standard response to a case detection is to gather contact tracing information from the farmer. The resulting data are a list of contacts that have been made in and out of the infected farm during a stipulated period prior to the notification of disease [
Much has been written on how contact tracing may be used to decrease the time between infection and detection (notification) during epidemics. However, this focuses on the theoretical aspects of how contact tracing efficiency is related to both epidemic dynamics and population structure (see, e.g., Eames and Keeling [
Qualitative results from simulations indicate that epidemics on networks, for some parameter values, show features that distinguish them from homogeneous models. The principal features are a very variable length slow-growth phase, followed by a rapid increase in the infection rate and a slower decline after the peak [
However, with appropriate assumptions some results can be obtained, which provide some insight into what more could be achieved. When the network is taken to be an Erdös-Rényi graph with unknown parameter
The use of networks is clearly a rapidly growing field in epidemiology. By assessing (and quantifying) the potential transmission routes between individuals in a population, researchers are able to both better understand the observed distribution of infection as well as create better predictive models of future prevalence. We have shown how many of the structural features in commonly used contact networks can be quantified and how there is an increasing understanding of how such features influence the propagation of infection. However, a variety of challenges remain.
Several open problems remain if networks are to continue to influence predictive epidemiology. The majority of these stem from the difficulty in obtaining realistic transmission networks for a range of pathogens. Although some work has been done to elucidate the interconnected structure of sexual encounters (and hence the sexual transmission network), the resulting networks are still relatively small-scale compared to the population size and suffer from a range of potential biases. Determining comparable networks for airborne infections is a far greater challenge due to the less precise definition of a potential contact.
One practical issue is therefore whether new techniques can be developed that allow contact networks to be assessed remotely. Proximity loggers, such as those used by Hamede and colleagues [
There is now a growing set of diary-based studies that have attempted to record the personal contacts of a large number of individuals; of these, POLYMOD is currently the most comprehensive [
Associated with the desire to have realistic contact networks for entire populations, comes the need to characterise such networks in a relatively parsimonious manner that provides important insights into the types of epidemiological dynamics that could be realised. Such a characterisation would allow for different networks (from different times or different locations) to be compared in a manner that is epidemiologically significant and would allow artificial networks to be created that matched particular known network features. This clearly relies on both existing measures of network structure (as outlined in Section
To date the vast majority of the work into disease transmission on networks has focused on static networks where all links are of equal strength and, therefore, associated with the same basic rate of transmission. However, it is clear that contact networks change over time (both on the short-time scale of who we meet each day, and on the longer time-scale of who our main work and social contacts are), and that links have different weights (such that some contacts are much more likely to lead to the transmission of infection than others). While the simulation of infection on such weighted time-varying networks is feasible, it is unclear how the existing sets of network properties or the existing literature of analytical approaches can be extended to such higher-dimensional networks.
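To indicate why simulation on such networks is straightforward even though analysis is not, the following is a minimal sketch of infection spreading over a list of timed, weighted contacts. It assumes SI dynamics (no recovery), a transmission probability of 1 - exp(-beta * w) for a contact of weight w, and a hypothetical tuple layout for the contact data; none of these choices is drawn from a specific study.

```python
import math
import random
from itertools import groupby


def spread_on_temporal_network(contacts, seed, beta, rng):
    """Return the set of nodes ever infected.

    contacts: iterable of (time, node_a, node_b, weight) tuples (a
    hypothetical layout). A contact of weight w between an infected and a
    susceptible node transmits with probability 1 - exp(-beta * w).
    Infection is permanent (SI dynamics), an illustrative simplification.
    """
    infected = {seed}
    ordered = sorted(contacts, key=lambda c: c[0])
    for _, step_contacts in groupby(ordered, key=lambda c: c[0]):
        newly_infected = set()
        for _, a, b, weight in step_contacts:
            for source, target in ((a, b), (b, a)):
                if source in infected and target not in infected:
                    if rng.random() < 1.0 - math.exp(-beta * weight):
                        newly_infected.add(target)
        # Nodes infected at this time can only transmit at later times.
        infected |= newly_infected
    return infected


if __name__ == "__main__":
    rng = random.Random(0)
    forward = [(0, "A", "B", 1.0), (1, "B", "C", 1.0)]
    backward = [(0, "B", "C", 1.0), (1, "A", "B", 1.0)]
    # The same links in a different order give different reachable sets.
    print(sorted(spread_on_temporal_network(forward, "A", 1e9, rng)))
    print(sorted(spread_on_temporal_network(backward, "A", 1e9, rng)))
```

The two contact lists contain identical links and weights, yet infection seeded at A can reach C only when the A-B contact precedes the B-C contact. This ordering effect is one reason static network measures do not carry over directly to time-varying networks.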
For any methodology to have any substantive use in the field, it is important both to have effective data gathering protocols in place and to have the statistical techniques in place to analyse the resulting data. Here, three issues are perhaps most critical. Firstly, data gathering resources are almost always limited. Therefore, carefully designed randomised sampling schemata should be employed to maximise the power of the statistical techniques used to analyse data, rather than having to rely on data augmentation techniques to work around the problems present in ad hoc datasets. This aspect is particularly important when working on network data derived from population samples. Secondly, any inference on both network and infectious disease models should be backed up by a careful analysis of model fit. Although recent advances in statistical epidemiology have given us an unprecedented ability to measure population/disease dynamics based on readily available field data, epidemic model diagnostics are currently in their infancy in comparison to techniques in other areas of statistics. Therefore, it is expected that with the growth in popularity of network models for analysing disease spread, much research effort will be required in designing such methodology.
We have highlighted that the study of contact networks is fundamentally important to epidemiology and provides a wealth of tools for understanding and predicting the spread of a range of pathogens. As we have outlined above, many challenges still exist, but with growing interest in this highly interdisciplinary field and ever increasing sophistication in the mathematical, statistical and remote-sensing tools being used, these problems may soon be overcome. We conclude, therefore, that now is an exciting time for research into network epidemiology as many of the practical difficulties are surmounted and theoretical concepts are translated into results of applied importance in infection control and public health.
This work is funded by the Medical Research Council (L. Danon, M. J. Keeling, M. C. Vernon), the Biotechnology and Biological Sciences Research Council (C. P. Jewell, M. J. Keeling, G. O. Roberts), the Engineering and Physical Sciences Research Council (T. House, A. Ford, M. J. Keeling), the Centre for Research in Statistical Methodology funded by the Engineering and Physical Sciences Research Council (G. O. Roberts), the Research and Policy for Infectious Disease Dynamics (RAPIDD) program of the Science and Technology Directorate (M. J. Keeling), and the Australian Research Council's Discovery Projects funding scheme, Project no. DP110102893 (J. V. Ross). We would like to thank Kieran Sharkey for use of pre-publication MATLAB code and two anonymous reviewers for their comments. All authors contributed equally to this manuscript.