Network Models: An Underutilized Tool in Wildlife Epidemiology?

Although the approach of contact network epidemiology has been increasing in popularity for studying transmission of infectious diseases in human populations, it has generally been an underutilized approach for investigating disease outbreaks in wildlife populations. In this paper we explore the differences between the type of data that can be collected on human and wildlife populations, provide an update on recent advances that have been made in wildlife epidemiology by using a network approach, and discuss why networks might have been underutilized and why networks could and should be used more in the future. We conclude with ideas for future directions and a call for field biologists and network modelers to engage in more cross-disciplinary collaboration.


Introduction
Conventional methods for studying infectious disease dynamics include a repertoire of modeling techniques: traditional mean field (or Susceptible-Infected-Recovered (SIR) compartmental), metapopulation, lattice-based, reaction-diffusion, and network models [1]. Many of these modeling approaches have been around for decades. The contact network approach, originally developed for applications in the field of statistical physics, has only recently gained in popularity. In network terminology, individuals, or groups of individuals, are defined as nodes, connections between those nodes are edges, and the number of edges attached to a node is its degree (Figure 1). In network epidemiology, diseases spread from node to node following the edges. If the transmission probability along edges is high enough, an epidemic can occur. A very appealing property of networks is their ability to depict the complexity of the real world. In particular, the degree distribution captures heterogeneity in transmission among hosts, allowing the disproportionate role of highly connected individuals (superspreaders) to be easily investigated [2,3]. Networks also often include attributes attached to nodes or edges that describe between-edge variation in disease transmission or between-host variation in infectiveness or pathogen excretion patterns.
The network approach is not inherently different from the other modeling tools. It is simply a more general way of representing epidemiological systems. In fact, most alternative models can be considered as particular cases of network models. For example, modeling an epidemic using an SIR compartmental model is equivalent to using a complete network model in which all the nodes are connected to each other (Figure 2(a)). A lattice-based model can also be replaced by a network model in which nodes that are neighbors on the lattice are connected to each other and all nodes have the same degree.
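To make the terminology concrete, the following minimal Python sketch builds the two special cases mentioned above and tabulates their degree distributions. It is an illustration only: the function names and the use of a one-dimensional ring as the lattice are our assumptions, not part of any cited study.

```python
from collections import Counter

def complete_graph(n):
    """Every node is linked to every other node (the mass-action special case)."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}

def ring_lattice(n, k=2):
    """One-dimensional lattice closed into a ring (an assumed, simple lattice):
    each node is linked to its k nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def degree_distribution(adj):
    """Map each degree value to the number of nodes that have that degree."""
    return Counter(len(neighbors) for neighbors in adj.values())
```

In both cases every node has the same degree, so the degree distribution collapses to a single value; heterogeneous networks are those for which this table has many entries.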
Because it offers more flexibility, the network approach can be used to answer new questions and to improve disease control. Since all individuals and all potential transmission paths are represented in the network, it becomes possible to identify individuals or edges that play a key role in disease transmission. Epidemiologists can then propose measures to alter the network in order to prevent or stop disease percolation. For example, vaccinating super-spreading individuals changes the degree distribution, which may be a more efficient way to achieve herd immunity than random vaccination. For these reasons, network techniques have been increasingly used for the study of human diseases [2,7,8]. To obtain parameters for these contact network models, humans can self-report contact data quickly and efficiently through contact tracing or contact diaries [9]; there are also clever proxies, such as mining the number of flu-related internet queries as a proxy for disease incidence [10] or using mobile phone locations as a proxy for human movement or contact-tracing studies [11]. Despite their advantages over traditional disease models, networks are still underused for the study of diseases in wild animal populations. In this paper we describe the state of the art for wildlife network modeling, discuss reasons why network models are an underused tool in wildlife epidemiology, and suggest how contact network epidemiology could become more widely used by biologists. For clarity, we restrict our discussion to microparasites with simple life cycles and focus on between-host disease dynamics.
Figure 1 (caption; opening words lost in extraction): ... [4]. In the within-pride network, the nodes (circles) are individuals and edges (lines between circles) are contacts observed on a short time scale (this is a cartoon, not based on data). The between-pride network is derived from behavioral observations of individually known lions as in [4], where nodes represent prides and edges represent contacts between prides. The histogram represents the degree distribution of the between-pride network.

Current Use of Network Models in Wildlife Epidemiology
This section provides an overview of the current state of wildlife network epidemiology. We first present the main reasons why network models are particularly suited to wildlife epidemiology. We then describe a critical distinguishing feature of wildlife epidemiology: the type of data that are collected. Finally, we present recent advances made in the field.

Why Might Network Models Be Preferred to Traditional Modeling Techniques?
Traditional compartmental or metapopulation epidemiological models assume that individuals constituting an epidemiological system can be pooled in a small number of functional groups within which the disease incidence rate is simply proportional to the number of susceptible and infectious individuals. These models are often referred to as "mass-action models". Within these functional groups, all the individuals are therefore assumed to be epidemiologically identical. The originality of network models resides in their ability to take into account interindividual or intergroup (i.e., internode) variations in epidemiological properties (e.g., degree, infectiveness, recovery rate). With this high resolution, the role played by each individual in the network can be assessed. Since network models capture more heterogeneity among nodes than traditional models, fitted network models can be used to predict the impact of interventions targeting individuals that are critical for disease percolation. Network models therefore constitute powerful tools to analyze highly heterogeneous epidemiological systems. For example, when the degree distribution is strongly right-skewed, the small number of individuals with the highest degree values tend to be infected very early during the epidemic, and subsequently redistribute the disease to a large number of individuals. These "hub" individuals are then responsible for very high incidence rates at the beginning of the epidemic, which traditional models are unable to predict (see Figure 2). If a mass-action model is used to fit prevalence data collected during this epidemic, the associated goodness-of-fit will be poor, and the estimates of epidemiological parameters will be biased. As explained below, wild animal populations are typical examples of heterogeneous systems and therefore greatly benefit from the network approach.
Figure 2 (caption; text for panel (a) lost in extraction): (b) Random network with a Poisson degree distribution and mean degree = 4, generated following the Erdős-Rényi model [5]. (c) Scale-free network generated using the Barabási-Albert preferential attachment algorithm [6], with mean degree = 4 and a power-law degree distribution. The network is created by starting with one node and no edges. At each time step, a node is added and connected to two other nodes chosen in proportion to their current degree. This network is characterized by a few highly connected nodes, which may act as superspreaders during epidemics. (d) Stochastic SIR simulations of disease dynamics through the three networks (120 runs per network type). Squares, circles, and triangles correspond to networks (a), (b), and (c), respectively. The final epidemic size (attack rate) is represented in relation to the intergroup transmission β. The recovery rate is fixed at 0.1. Note that even when the mean connectivity is kept constant, disease impacts vary with network structure.
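The kind of comparison summarized in Figure 2 can be sketched in a few lines of Python. This is a hedged illustration, not the code used to produce the figure: we assume a discrete-time SIR process in which `beta` is a per-contact, per-time-step transmission probability and `gamma` a per-time-step recovery probability, and all function names are ours.

```python
import random

def erdos_renyi(n, mean_degree, rng):
    """Random network: each pair of nodes is linked with the same probability."""
    p = mean_degree / (n - 1)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def preferential_attachment(n, m, rng):
    """Grow a network one node at a time; each new node links to up to m
    existing nodes chosen in proportion to their current degree."""
    adj = {0: set()}
    ends = []  # every edge contributes both endpoints, so draws are degree-weighted
    for new in range(1, n):
        targets = set()
        while len(targets) < min(m, new):
            # Until the first edge exists there are no degrees to weight by
            targets.add(rng.choice(ends) if ends else 0)
        adj[new] = set()
        for old in targets:
            adj[new].add(old)
            adj[old].add(new)
            ends.extend((new, old))
    return adj

def sir_final_size(adj, beta, gamma, rng):
    """Run one stochastic SIR epidemic from a random index case and return
    the fraction of nodes that were ever infected."""
    first = rng.choice(list(adj))
    infectious = {first}
    ever_infected = {first}
    while infectious:
        newly = set()
        for i in infectious:
            for j in adj[i]:
                if j not in ever_infected and rng.random() < beta:
                    newly.add(j)
        recovered = {i for i in infectious if rng.random() < gamma}
        infectious = (infectious - recovered) | newly
        ever_infected |= newly
    return len(ever_infected) / len(adj)
```

Averaging `sir_final_size` over many runs for `erdos_renyi(200, 4, rng)` and `preferential_attachment(200, 2, rng)` at several values of `beta` gives a rough analogue of panel (d); the network size of 200 is an arbitrary choice for illustration.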

How Is Wildlife Data Different from Human Data?
From the epidemiological point of view, wildlife systems differ in four important respects from human systems: (i) the underlying structure of the population, (ii) the tools available to collect data on the network structure, (iii) available epidemiological data, and (iv) potential control options. First, wild animal populations are often highly structured. Numerous species live in groups, which generally interact nonrandomly. Within a given area, several species susceptible to the same disease can also interact. In such a complex system, the global contact network is modular. It can be decomposed into elements corresponding to the different observation scales: within-group networks (level n = 1), between-group networks (n = 2), and sometimes higher-order networks such as between-species networks (n = 3). Level-n networks (with n ≥ 2) can then be considered as metanetworks, that is, networks of networks, with the networks of level n-1 constituting the nodes of level n (Figure 1). Wildlife epidemiologists need to estimate basic structural parameters of their study population in order to know how these different networks are combined. Basically they need to answer the following questions: do the animals live in groups? If so, what is the group size distribution? How do individuals interact within a group? How do groups interact? Does the disease of interest likely involve several species in the study area? Are there other potentially relevant hierarchical levels, such as subgroups (groups inside groups) or subpopulations (groups of groups)?
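The metanetwork idea can be illustrated with a short Python sketch that collapses individual-level contacts (level 1) into a weighted between-group network (level 2). The data structures and names are hypothetical and chosen only for illustration.

```python
from collections import Counter

def between_group_network(individual_edges, group_of):
    """Collapse contacts between individuals into a between-group network.

    individual_edges: iterable of (individual_a, individual_b) contacts.
    group_of: dict mapping each individual to its group label.
    Returns a dict mapping (group_1, group_2) to the number of contacts
    observed between members of the two groups.
    """
    weights = Counter()
    for a, b in individual_edges:
        ga, gb = group_of[a], group_of[b]
        if ga != gb:  # within-group contacts stay in the level-1 networks
            weights[tuple(sorted((ga, gb)))] += 1
    return dict(weights)
```

Applying the same function again, with groups as "individuals" and species or subpopulations as "groups", would give the next level of the hierarchy.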
Second, wildlife biologists face multiple challenges when collecting contact data [12-14]. Behavioral observations of animals rarely allow exact, full contact networks to be inferred, as it is basically impossible to watch all individuals of a population at the same time. The use of indirect measures (through technology) can help with this problem, although the number of individuals that can be simultaneously monitored is often limited by logistical difficulties or the high cost of the technology (but see [15], which might have recorded a full network of a study lizard population). More commonly, a representative subset of the study population is chosen and then either directly observed using standard behavioral sampling methods or indirectly monitored using biologgers, radio telemetry, mark-recapture, or other methods (for a discussion of methods see Table 1). When choosing a technique to inform a contact model, it is important to take into account whether the species is habituated or not, whether it is captive or wild, the local environment of the population (e.g., heavily forested or underwater), the size of the animal, the resolution of the data needed to create a contact network specific to that animal's behavior, the budget, and the sample size needed. It is also important that these techniques not change the animal's behavior, for example through observer presence near timid animals or heavy tags that limit movement.
There are a few specific challenges in constructing contact networks from empirical data. (i) Contact networks are normally derived from healthy individuals, and an animal's behavior, and hence the topology of the contact network, might change upon infection. Often it is unknown whether infection would alter the network structure by causing more contacts (e.g., "furious" rabies) or fewer contacts ("dumb" rabies). In this case, a sensitivity analysis could be used to hedge against any changes in contact rates due to infection. (ii) It is quite difficult to define a "contact"; clearly the definition of a contact will depend on the transmission of the pathogen of interest. Is the pathogen sexually transmitted? Aerosol borne? Does it persist in the environment? What is an effective contact? (Of course, a contact does not necessarily mean a transmission event.) The best way to get answers to these questions is to do controlled transmission experiments, but this can be ethically challenging, especially for wild animals of conservation concern. (iii) Once a definition of "contact" is created, and a technique chosen to capture these contacts, it is then difficult to measure other types of social interactions that are not being monitored. (iv) Despite the recent technological advances allowing the collection of biologically relevant contact data for the majority of a population, how to sample a network is still a problem. For example, technological failures can lead to incomplete networks even if the whole population was successfully tagged, and there are often edge effects with other populations [24]. (v) The type of method used to collect the contact data can influence the properties of the network; hence the infectious period of the disease must be taken into account when choosing a method [27].
Because another behavioral variable is normally being used as a proxy for contact (i.e., proximity data), the raw data collected from these indirect measures does not immediately yield a contact network. But, after adequate processing, it becomes possible to reconstruct contact networks that will not exactly match the actual full network, but will rather have the same statistical, and hence epidemiological, properties.
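As one hedged example of such processing, the sketch below turns raw proximity-logger records into a contact network by applying a distance cutoff and a minimum cumulative duration. Both thresholds, the record layout, and the function name are assumptions made for illustration; an appropriate definition of "contact" depends on the pathogen, as discussed above.

```python
from collections import defaultdict

def contacts_from_proximity(records, max_distance_m, min_total_seconds):
    """Build a contact network from proximity records.

    records: iterable of (animal_a, animal_b, distance_m, duration_s).
    A pair counts as a contact if the animals spent at least
    min_total_seconds in total within max_distance_m of each other.
    Returns a dict mapping (animal_1, animal_2) to total seconds in proximity.
    """
    total = defaultdict(float)
    for a, b, distance, duration in records:
        if a != b and distance <= max_distance_m:
            total[frozenset((a, b))] += duration  # pairs are unordered
    return {tuple(sorted(pair)): seconds
            for pair, seconds in total.items()
            if seconds >= min_total_seconds}
```

Re-running the function over a range of thresholds is a simple way to check how sensitive the resulting network is to the chosen definition of contact.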
Wild animal contact networks also often, if not always, exhibit temporal variation, creating a dynamic network. For example, individuals or groups can migrate to a different area (e.g., reindeer [40], wildebeest [41], birds [42], monarch butterflies [43]), individuals can transfer to a different group (e.g., [44]), and animal societies can undergo fission-fusion dynamics (e.g., hyenas [45], chimpanzees [46], bottlenose dolphins [47], elephants [48], lions [49], and guppies [50]). In addition, contact networks can change over long time scales due to demographic processes such as births and deaths. Theoretical studies have shown that the spread of infectious diseases in dynamic networks differs from that in static networks [51]. Significant changes in contact patterns during the course of an epidemic need to be accounted for, and the data describing contact network dynamics can be obtained using direct observation or technology as listed above and in Table 1.
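A simple way to represent such a dynamic network, sketched here under the assumption that each contact carries a timestamp, is to slice the contact record into consecutive time windows and build one static snapshot per window. The window length and names are illustrative choices, not a recommendation.

```python
from collections import defaultdict

def network_snapshots(timed_contacts, window):
    """Split time-stamped contacts into consecutive static snapshots.

    timed_contacts: iterable of (time, animal_a, animal_b).
    window: snapshot length, in the same units as time.
    Returns a dict mapping snapshot index to a set of unordered pairs.
    """
    snapshots = defaultdict(set)
    for time, a, b in timed_contacts:
        snapshots[int(time // window)].add(frozenset((a, b)))
    return dict(snapshots)
```

A natural choice is a window comparable to the infectious period of the pathogen of interest, so that each snapshot approximates the contacts available to one infectious individual.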
Third, epidemiological parameters can also be challenging to collect [52]. Incidence can be recorded through passive surveillance operations or direct observation for only the few diseases where wild animals exhibit overt clinical signs (e.g., rabies [53]). However, the majority of wild animals do not show visual signs of disease and most wild animals simply disappear when they die. In the field it is often difficult to detect carcasses and, more worryingly, even to detect any sort of die-off (e.g., [54]). As another example, out of over 1000 lions suspected to have died in a fatal canine distemper virus outbreak in the Serengeti in 1994, only 11 carcasses were recovered from a highly monitored population [55]. Prevalence data can be collected through active surveillance methods such as serological surveys. Blood can be screened directly for pathogens or indirectly for antibodies to pathogens to provide insight on disease dynamics [56]. Longitudinal surveys are generally the preferred type of serological survey; cross-sectional serological surveys can be misleading because antibodies persist long after the end of the infection [57]. Collecting blood samples is only possible if the study animals can be trapped or darted. It is generally expensive, time-consuming, and potentially risky to the animal. However, in recent years, noninvasive disease screening methods have been developed, such as immunoglobulin measurement in urine and feces (e.g., SIV in chimps [58]) or parasite genotyping in feces (e.g., malaria in great apes [59]). In contrast to human diseases, multiple hosts are often involved in wildlife diseases. Human outbreaks often involve animals, but generally only at the very beginning, whereas in wildlife, multiple species are often involved during the entire course of the outbreak. This increases the complexity of building a multihost network, and it is often challenging to obtain accurate assessments of contact networks from multiple hosts, forcing researchers to fall back on mean-field models [60].
Table 1 (fragment; the header row and earlier rows were lost in extraction). Fields in each row are separated by " | ".
[method and target animals lost] | Good indicator of (i) scale of interaction, but gives coarse resolution of a "contact"; (ii) mixing between groups of animals, but not within groups; and (iii) den-sharing contacts. Presence of fieldworkers may alter behavior. | [27,34-36]
Trapping and bait marking | Easily trapped and handled individuals who use latrines to mark territories | Good data on home range overlap and intergroup movement rates | [37]
Video tracking from animal's perspective | Animal must be able to be caught and wear something like a video backpack | Great contact data from individual perspective | [38]
Video trapping from fixed perspective (automated) | Social insects that can be individually tagged and the group monitored | Great resolution of contact data; software records duration and frequency of contacts | [39]
Finally, despite constraints on inferring the structure of the contact network and collecting disease data, network models allow us to easily evaluate a wide range of disease control interventions in wildlife populations. In humans, because there are numerous ways to modify human networks, such as school closure, travel warnings, and airport closure in certain cases, public health actions often focus on improving epidemiological surveillance and implementing subsequent vaccination campaigns. In wildlife epidemiology, altering networks is also possible, but in very different ways [61]. For example, oral vaccination baits can occasionally be used with success [62]. Parenteral vaccination can be used for small wild animal populations [63], but is logistically challenging and sometimes considered too invasive. Wildlife contact networks can also be modified by reducing contacts between domestic animals, humans, and wildlife to avoid spillover to wildlife in the first place; this is a type of quarantine [64-66]. Population density can also be reduced through culling or decreasing birth rates [34,67]. An important benefit of the network approach is the ability to identify central individuals likely responsible for most transmission events. When those individuals are targeted for intervention, fewer animals need to be culled or vaccinated.
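The effect of targeting central individuals can be explored with a short Python sketch. Here we assume, for illustration only, that "central" means highest degree and that a vaccinated or culled animal is simply removed from the network; the size of the largest remaining connected component then gives an upper bound on how far an outbreak could spread. The function names are ours.

```python
def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once removed nodes are ignored.

    adj: dict mapping each node to the set of its neighbors.
    """
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:  # depth-first search over the remaining network
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best

def top_degree_nodes(adj, k):
    """The k nodes with the most contacts (ties broken arbitrarily)."""
    return set(sorted(adj, key=lambda node: len(adj[node]), reverse=True)[:k])
```

Comparing `largest_component(adj, top_degree_nodes(adj, k))` with the result of removing `k` randomly chosen nodes indicates how much a targeted intervention could gain over a random one in a given network.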

Recent Progress in Wildlife Network Epidemiology
In the past 5-10 years, wildlife biologists have made solid progress in characterizing contact networks in wildlife populations. Through the use of these contact networks we have been able to address novel questions relating to wildlife and their diseases. For example, superspreading animals have been found in some populations (e.g., deer mice and possums [23,28]), while they are not obvious in other systems (e.g., Tasmanian devil and African lion populations [4,24]). The Tasmanian devil network was found to be one giant component, meaning that the whole endangered population is threatened by a novel infectious cancer [24]. Well-connected individuals were more likely to be infected in some wildlife populations (brushtail possums, sleepy lizards, skinks, and bumble-bees [15,29,32,39]) but not in others (meerkats [16]). In a study of possums, density was found to be uncorrelated with contact rates [25]. In contrast, abundance thresholds above which disease can spread (percolation thresholds) have been identified in gerbils with plague [68] and in multihost plague systems of mice and prairie dogs [69]. With networks, researchers were able to distinguish spatial patterns of disease spillover from epidemic waves [17]. Temporal changes in contact patterns were also identified as critical for the spread of respiratory diseases in wild chimpanzees [70]. Issues of different spatial scales have been tackled with networks, specifically the relative importance of local versus long-range transmission events in driving disease spread [68]. Finally, multilevel network models have been developed and successfully applied to tuberculosis transmission between badger groups and cows [26]. For a more extensive list of insights we refer to Table 1 of a recent review [71].

Despite These Advances, Why Are Network Models Underused in Wildlife Epidemiology?
The network approach might need a public relations campaign in the literature. Networks have been used in other fields like statistical physics for decades, yet have only in the past 10 years really taken off in the human epidemiology literature, and are now at the cutting edge of wildlife epidemiology. First, contact network models are likely not well known in the wildlife community. The number of studies using this approach is still relatively limited, but studies combining networks and wildlife often get published in high-profile journals, potentially indicating that wildlife network epidemiology is still in its infancy. Second, the network literature, especially the physics literature, is quite hard to grasp and, at first sight, may seem complicated to the field biologist. The analytical treatment of network epidemiological models is, however, only slightly more difficult than solving the systems of differential equations of mass-action models. A few articles are notable for presenting the mathematics of network models in a way that is accessible to biologists [72-74]. However, finding analytical solutions is not always necessary. Agent-based stochastic simulations can be good alternatives and are relatively easy to implement. They are particularly useful for modeling complex systems that have many parameters and hence cannot be described by closed-form equations. Unfortunately, there is limited network software with easy-to-use graphical user interfaces implementing these methods, and people often program their own network simulation models (Table 2). Coding, even in basic programming languages (e.g., R, Python, or NetLogo), often requires a skill set different from that of a field biologist who successfully collects contact and epidemiological data on wild populations.
Third, networks are indeed data intensive, and wildlife systems are unfortunately often data limited. Individual-level data can be expensive and time-consuming to collect [71]. For example, in constructing a contact network for a population of Serengeti lions, only 36 pride-to-pride contacts were observed in 1294 hours of daylight observation over a 3-year period [4]. In addition, wild animals cannot be continuously observed, and dealing with gaps and missing data is often challenging (see Section 4).
Finally, contact networks are inferred from contact data collected for a specific species, for a specific ecosystem, and for a specific period of time. Therefore it can be difficult to generalize epidemiological results obtained with a network model to other circumstances (e.g., [4]).

What Can Be Done to Dispel Doubts about Networks?
Network approaches need to become better known and more accessible. Wildlife epidemiologists should be encouraged to promote their network approach at meetings and in journals that have not normally embraced a network approach.
More training sessions such as SISMID (University of Washington) or INSNA's workshops at Sunbelt would be useful. A formal comparison between network and mean-field models would also help spread the word. Currently, there are few papers comparing the performance of mean-field and network models. Although scientists might have tried multiple modeling approaches during the course of a study, normally only one approach is published. Mass-action models normally work "well enough," but we are unaware of any formal quantitative comparison of the pros and cons of using mean-field versus network models for a range of empirical and theoretical systems. It is likely that network models are only going to get more complex: they will include more parameters and variables. "Complexity" is an intrinsic, objective property of a model. It is not necessarily synonymous with "complicated", which is a subjective judgment of the difficulty of the modeling task. For example, it is important to note that stochastic agent-based modeling handles very complex models but is generally not complicated. In the last few years, biologists have increasingly used these types of models to investigate the respective effects of different variables on biological phenomena. Several software packages and user-friendly computer languages are particularly suited to developing network epidemiological agent-based models (Table 2). It has even become possible to fit such models to field data. The recent developments in approximate Bayesian computation (ABC), a set of methods initially developed by population geneticists, greatly facilitate agent-based model fitting [84-86]. We would like to draw the attention of epidemiologists to these methods, which we believe will be used extensively in the future.
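To give a flavor of how ABC can fit an agent-based model, the sketch below implements the simplest variant, rejection sampling, for a single parameter. It is a minimal illustration under several assumptions of ours: a discrete-time network SIR simulator, a Uniform(0, 1) prior on the per-contact transmission probability `beta`, and the final epidemic size as the only summary statistic. It is not the procedure used in [84-86].

```python
import random

def simulate_final_size(adj, beta, gamma, rng):
    """One stochastic SIR run on a network; returns the fraction ever infected."""
    first = rng.choice(list(adj))
    infectious = {first}
    ever_infected = {first}
    while infectious:
        newly = set()
        for i in infectious:
            for j in adj[i]:
                if j not in ever_infected and rng.random() < beta:
                    newly.add(j)
        recovered = {i for i in infectious if rng.random() < gamma}
        infectious = (infectious - recovered) | newly
        ever_infected |= newly
    return len(ever_infected) / len(adj)

def abc_rejection(adj, observed, n_draws, tolerance, gamma, rng):
    """Keep the prior draws of beta whose simulated final size falls within
    `tolerance` of the observed final size."""
    accepted = []
    for _ in range(n_draws):
        beta = rng.random()  # draw from the assumed Uniform(0, 1) prior
        simulated = simulate_final_size(adj, beta, gamma, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(beta)
    return accepted
```

The accepted values approximate the posterior distribution of `beta`; a smaller tolerance gives a better approximation at the cost of more rejected simulations. In practice several summary statistics would usually be needed to identify more than one parameter.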
Interdisciplinary Perspectives on Infectious Diseases
Technology is helping to bridge the gap between data-intensive network models and the challenges inherent in collecting contact data. There has been a burst of new technology such as satellite GPS radio telemetry, proximity data loggers, camera and video traps, tracking, proximity data radio collars, powder marking, PIT tags and antennae, and capture-mark-recapture (Table 1). These methods almost always collect data at discrete time intervals. Ignoring the gaps inherent to these datasets can lead to biased estimates of contact network or epidemiological parameters. For example, if an animal is observed susceptible at time t and reobserved infected 10 days later, should we assume that it became infected on the first day, on the last day, or maybe after five days? The answer is that none of these assumptions is necessary. At least three statistical methods can be used to deal with this uncertainty. First, survival analyses, which were initially developed to estimate survival rates using date-of-last-observation data, can also be applied, for example, to estimate rates of seroconversion. One simply has to assume that seroconversion is equivalent to death [87]. Second, multistate capture-mark-recapture (CMR) models have proven very useful to estimate animal migration rates, survival rates, and rates of change of individual state (for example, states S, I, or R). Although user-friendly software exists to fit CMR models (Table 2), these models could be more broadly used in wildlife epidemiology, both to estimate network parameters and epidemiological parameters. Third, agent-based models, coupled with an ABC fitting procedure, can easily circumvent the problem of missing data [84-86]. Our purpose here is to draw the attention of epidemiologists to these methods rather than to describe them in detail, so we encourage interested readers to consult the references cited.
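The 10-day example above can be handled with a small interval-censored survival calculation, sketched here in Python. We assume, purely for illustration, a constant infection rate per day, so that an animal seen susceptible and then infected d days later contributes a likelihood of 1 - exp(-rate × d), while an animal still susceptible after d days contributes exp(-rate × d). The function names and the ternary search are our choices; dedicated survival or CMR software would normally be used instead.

```python
import math

def log_likelihood(rate, intervals):
    """Log-likelihood of a constant daily infection rate.

    intervals: list of (days_between_observations, became_infected).
    """
    total = 0.0
    for days, infected in intervals:
        if infected:
            # Infection happened at some unknown time within the gap
            total += math.log(-math.expm1(-rate * days))
        else:
            # Animal escaped infection for the whole gap
            total += -rate * days
    return total

def fit_infection_rate(intervals, low=1e-9, high=10.0, iterations=200):
    """Maximize the (concave) log-likelihood by ternary search on [low, high].

    If every animal, or no animal, became infected, the estimate simply
    runs to one of the search bounds.
    """
    for _ in range(iterations):
        a = low + (high - low) / 3
        b = high - (high - low) / 3
        if log_likelihood(a, intervals) < log_likelihood(b, intervals):
            low = a
        else:
            high = b
    return (low + high) / 2
```

No assumption about the exact day of infection is needed: the likelihood uses only the fact that infection occurred somewhere inside each observation gap, and gaps of different lengths can be mixed in the same dataset.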
An exciting and useful push for future directions would be to develop theoretical advances for network models that allow us to develop "universal principles". As stated above, current network models are generally inferred from contact data. But one generally does not know what rules govern the establishment of contact patterns. Understanding these rules, in particular how ecological variables such as food resource distribution, distribution of conspecifics, and climate influence contact patterns, would make it possible to identify universal principles governing network structure. It would then become possible to extrapolate these mechanistic models to other populations, areas, or time periods.
Network epidemiological modeling is inherently interdisciplinary. This is even more pertinent to wildlife network epidemiology, because new fields such as behavioral ecology, capture-mark-recapture, and advanced statistics are combined. Collaborative work is an efficient way to do network modeling. Field biologists know their system and know how to collect data, while theoreticians can work on the more technical modeling aspects. We would like to promote better collaboration between modelers and field biologists. Modelers may need to be seen as more approachable by field biologists. Importantly, we believe collaborations should take place at all stages of epidemiological studies, from the design of the data collection protocol to the end of the modeling stage.

Conclusion
Network models have promising applications in the field of wildlife epidemiology. Although using this approach requires some substantial training, the learning curve is not as steep as it seems, and several software packages and interpreted computer languages have been developed that will make this step easier. In any case, we strongly believe that the benefits far outweigh these costs. The range of applications of network models in wildlife epidemiology is already broad, and will keep increasing. New application domains beg to be explored. For example, network models are well suited to combining network and genetic data, potentially for viral diseases such as feline immunodeficiency virus and simian immunodeficiency virus. Contact network epidemiology using directed networks (where there is stronger transmission in one direction) has been applied to animals using the same resting spots for indirectly transmitted pathogens [15,29], and could be expanded to freshwater organisms because river networks are easy to map, have a good spatial component, and pathogens might travel downstream.
We feel that developing collaboration between field biologists and network modelers will be a key factor in bringing advances to wildlife epidemiology. We need to become more multidisciplinary and cross-disciplinary [14,88].