Comparison of Mycobacterium Tuberculosis Genomes Reveals Frequent Deletions in a 20 kb Variable Region in Clinical Isolates

The Mycobacterium tuberculosis complex is associated with a remarkably low level of structural gene polymorphism. As part of a search for alternative forms of genetic variation that may act as a source of biological diversity in M. tuberculosis, we have identified a region of the genome that is highly variable amongst a panel of unrelated clinical isolates. Fifteen of 24 isolates examined contained one or more copies of the M. tuberculosis-specific IS6110 insertion element within this 20 kb variable region. In nine of the isolates, including the laboratory-passaged strain H37Rv, genomic deletions were identified, resulting in loss of between two and 13 genes. In each case, deletions were associated with the presence of a copy of the IS6110 element. Absence of flanking tri- or tetra-nucleotide repeats identified homologous recombination between adjacent IS6110 elements as the most likely mechanism of the deletion events. IS6110 insertion into hot-spots within the genome of M. tuberculosis provides a mechanism for generation of genetic diversity involving a high frequency of insertions and deletions.


Introduction
Mycobacterium tuberculosis is a highly successful human pathogen, infecting up to one-third of the global population and causing around two million deaths annually (Dye et al., 1999). Successful pathogenesis is dependent on a combination of attributes, including the ability to resist killing by phagocytes, to establish a prolonged latent infection in healthy individuals and to exploit opportunities for active growth and aerosol transmission. Definition of the cellular and molecular mechanisms underlying these different stages of infection presents an important challenge in tuberculosis research. It has long been recognized that isolates of M. tuberculosis cultured from individual patients vary in their ability to survive under different conditions in the laboratory and to cause progressive infection in animals (Mitchison et al., 1963; Ordway et al., 1995). Similarly, epidemiological studies demonstrate heterogeneity in transmissibility and virulence amongst different isolates, as judged by rates of skin test conversion and onset of clinical disease amongst exposed contacts (Valway et al., 1998; Rhee et al., 1999). Investigation of genetic variation amongst M. tuberculosis isolates may provide insights into mycobacterial pathogenesis. By matching genetic differences between isolates with particular clinical or epidemiological patterns, it may be possible to identify genes that influence the biology of tuberculosis infection, and to understand the forces of microbial evolution that contribute to the success of this pathogen in a wide range of historical and geographical settings.
Comparative analysis of nucleotide sequences of a panel of structural genes has been used to investigate the extent of strain diversity amongst members of the M. tuberculosis complex, which includes the predominant veterinary pathogens M. bovis and M. microti as well as the major human pathogens M. tuberculosis and M. africanum (Sreevatsan et al., 1997). This approach uncovered a striking absence of allelic variation, consistent with a model in which M. tuberculosis evolved to its present form through an evolutionary 'bottleneck' around 15 000 to 20 000 years ago. In contrast, whole genome comparisons within the M. tuberculosis complex have revealed more extensive diversity (Behr et al., 1999; Gordon et al., 1999). A series of chromosomal deletions was identified, with around 100 of the 4000 genes present in M. tuberculosis being absent from the genome of M. bovis isolates. A further 30 genes were found to have been selectively lost during the years of in vitro culture of M. bovis that led to generation of the current set of attenuated isolates used for preparation of the BCG (bacille Calmette-Guérin) vaccine. These genetic differences may underlie variations in host specificity of the pathogenic isolates, and are most probably responsible for attenuation of the vaccine strains (Behr et al., 1999).
Amongst isolates of M. tuberculosis itself, genome variation is evident from patterns revealed by pulsed-field gel electrophoresis (Zhang et al., 1992). Factors known to contribute to this genetic heterogeneity include differences in the copy number and location of an M. tuberculosis-specific insertion sequence (IS6110); variation amongst a set of polymorphic GC-rich sequences (PGRS) present in genes encoding a family of proteins of unknown function; and short variable sequences interspersed within an apparently noncoding region of the genome characterized by a series of direct repeat elements (the 'DR region') (van Embden et al., 1993; Chaves et al., 1996; Kamerbeek et al., 1997). These markers have been extensively employed for strain typing, proving particularly useful in identification of clusters associated with transmission chains (Small et al., 1994), but their potential contribution to variation in the phenotypic properties of the different isolates has received less attention.
The present study was initiated with the aim of identifying sites of genetic variation amongst M. tuberculosis isolates that may contribute to differences in the biological properties of the organisms. To identify candidate loci, the genome of M. tuberculosis H37Rv (a well-characterized isolate for which the complete sequence has been determined; Cole et al., 1998) was analysed for evidence of IS6110-mediated gene disruption. There are 16 copies of the IS6110 insertion element in H37Rv; five of these are inserted within predicted open reading frames (ORFs) (Sampson et al., 1999). One of the IS6110 copies, annotated as Rv1756c and Rv1757c, is associated with truncation of the N-terminal regions of two ORFs that are arranged in opposite orientations and predicted to encode phospholipase C (plcD) and a cutinase (Rv1755c and Rv1758, respectively). The N-terminal portions missing from these two H37Rv ORFs are present in the genome of a second well-characterized clinical isolate, CDC1551 (Valway et al., 1998), together with three additional intervening ORFs that are not found in H37Rv (Figure 1). The missing genes are also present in the laboratory-attenuated strain H37Ra and correspond to the RvD2 region, initially described as present in M. bovis but absent from H37Rv. In the present study, we have analysed this region in a panel of clinical isolates of M. tuberculosis, and have identified it as an area of extensive genetic diversity resulting from frequent insertion and deletion events.

Sequence comparison
The nucleotide sequence of M. tuberculosis H37Rv from the Sanger Centre (http://www.sanger.ac.uk/Projects/M_tuberculosis/) was compared with the sequence of cosmid Y28 (Accession No. Z95890) and with the incomplete, unannotated nucleotide sequence of M. tuberculosis CDC1551 (The Institute for Genomic Research, http://www.tigr.org, October 1998 version) using the BLAST 2.0 programme at the National Center for Biotechnology Information (NCBI, http://www4.ncbi.nlm.nih.gov/BLAST). The GeneMark programme (European Bioinformatics Institute, http://www2.ebi.ac.uk/genemark) was then used to predict ORFs in sequences present in the CDC1551 strain. These ORFs were compared against the EMBL databases, using BLAST to look for similar sequences. Partial annotations for ORFs of the CDC1551 strain subsequently became available (http://www.tigr.org/tdb/CMR/gmt/htmls/SplashPage.html, July 1999 version). These are prefixed in the text with 'MT' followed by the ORF number, whereas ORFs of the H37Rv strain are prefixed with 'Rv'.

Clinical isolates and DNA extraction
Clinical isolates of M. tuberculosis were taken from cultures derived from clinical specimens isolated in the Mycobacteria Laboratory at St Mary's Hospital, Paddington, London, UK. All of the patients were infected with drug-sensitive strains, and responded to conventional therapy. A culture of the clinical isolate CDC1551 was kindly provided by Dr Jack Crawford at the Centers for Disease Control and Prevention, Atlanta, GA, USA. Cultures were grown in Middlebrook 7H9 medium, supplemented with albumin, dextrose and catalase (Difco). Genomic DNA was extracted from cultures in the exponential phase of growth by standard phenol-chloroform extraction, as described previously (Goyal et al., 1997), and made up in HPLC grade water at a concentration of approximately 50 µg/ml.

Amplification by polymerase chain reaction (PCR)
All reactions were carried out using a 48-well Hybaid Touchdown PCR thermocycler (Hybaid). A standard 25 µl reaction mixture was used, containing 25 pmol of each primer in 1 µl, 2.5 µl of a 2 mM deoxyribonucleotide mix, 2.5 µl 10× reaction buffer (Promega), 2 µl 5 mM MgCl2 and 1.5 units Taq polymerase (Promega). To this was added 3 µl DNA solution, and the volume was made up to 25 µl with HPLC grade water (Sigma-Aldrich). Thermocycling parameters were as follows: 1 cycle at 94 °C for 30 s, held at 85 °C while the Taq enzyme was added, followed by 35 cycles of 94 °C for 30 s, 30 s at the specified annealing temperature and 72 °C for the specific extension time required, followed by a final cycle at 72 °C for 3 min. For predicted products greater than 2 kb, PCR was performed using the eLONGase enzyme mix (Gibco/BRL). Each 50 µl reaction mixture contained 25 pmol of each primer, 5 µl 2 mM deoxyribonucleotide mix, 5 µl 5× eLONGase Buffer A, 5 µl 5× eLONGase Buffer B (unless otherwise stated) and 2 µl eLONGase enzyme mix. To this was added 3 µl DNA solution, and the volume was made up to 50 µl with HPLC grade water (Sigma-Aldrich). Thermocycling parameters were as follows: 1 cycle at 94 °C for 30 s, held at 85 °C while the enzyme mix was added, followed by 35 cycles of 94 °C for 30 s, 30 s at the specified annealing temperature and 68 °C for the specific extension time required, followed by a final cycle at 68 °C for 10 min. Details of primer pairs, specific annealing temperatures and extension times are listed in Table 1.
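The final working concentrations implied by the two reaction recipes follow from the dilution rule C1V1 = C2V2. The sketch below applies that rule to the volumes and stock concentrations stated above. It assumes that the "2 mM deoxyribonucleotide mix" means 2 mM of each dNTP, and that eLONGase Buffers A and B are intended to be used together (each ends up at 0.5×); neither assumption is stated in the protocol itself. The helper names are illustrative only.

```python
def final_concentration(stock: float, volume_added_ul: float, total_volume_ul: float) -> float:
    """Dilution rule C1*V1 = C2*V2, rearranged to C2 = C1*V1/V2.

    The result is in the same units as `stock` (mM, x-fold buffer, etc.).
    """
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock * volume_added_ul / total_volume_ul


def primer_uM(pmol: float, total_volume_ul: float) -> float:
    # 1 pmol per microlitre is numerically equal to 1 micromolar
    return pmol / total_volume_ul


# Standard Taq reaction: volumes and stocks as given in the text (25 ul total)
taq = {
    "dNTP mix (mM)": final_concentration(2.0, 2.5, 25.0),
    "reaction buffer (x)": final_concentration(10.0, 2.5, 25.0),
    "each primer (uM)": primer_uM(25.0, 25.0),
}

# eLONGase reaction: volumes and stocks as given in the text (50 ul total)
elongase = {
    "dNTP mix (mM)": final_concentration(2.0, 5.0, 50.0),
    "Buffer A (x)": final_concentration(5.0, 5.0, 50.0),
    "Buffer B (x)": final_concentration(5.0, 5.0, 50.0),
    "each primer (uM)": primer_uM(25.0, 50.0),
}

for name, mix in (("Taq", taq), ("eLONGase", elongase)):
    for component, value in mix.items():
        print(f"{name}: {component} = {value:g}")
```

Both recipes give 0.2 mM dNTPs. The primers are at 1 µM in the 25 µl reaction and 0.5 µM in the 50 µl reaction, because the same 25 pmol is diluted into twice the volume.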
PCR products were examined by routine gel electrophoresis performed on 1% (w/v) agarose gels in TAE buffer. Products for sequencing were excised using a sterile scalpel blade and purified using the GENECLEAN II kit (Bio 101), according to the manufacturer's instructions.

Dot-blot hybridization
A 2 µl sample of PCR product amplified using the M1/C3 primer pair was denatured by heating at 94 °C for 2 min, snap-cooled on ice, applied to a Hybond-N+ membrane (Amersham) and air-dried. The DNA was cross-linked to the membrane using a UV cross-linker (Stratagene). As a probe, 1 µg of the INS1/INS2 amplicon from IS6110 was labelled for 16 h using the DIG High Prime DNA labelling kit (Boehringer-Mannheim), in accordance with the manufacturer's instructions. The membrane was prehybridized using DIG Easy Hyb (Boehringer-Mannheim) for 1 h at 37 °C. The probe was denatured by boiling for 5 min, snap-cooled in ice-water, then added to the hybridization bottle and incubated overnight at 37 °C. Post-hybridization washes and chemiluminescence detection steps were carried out according to the manufacturer's protocol, using the Boehringer-Mannheim Detection kit.

Automated DNA sequencing
Cycle sequencing of PCR products was performed using a Hybaid Touchdown PCR machine and dichlororhodamine dye terminator ready reaction mixture (PE Biosystems) in accordance with the manufacturer's protocol. Subsequent analysis was performed on an ABI 310 Genetic analyser (PE Biosystems). The primers used for sequencing were PLC2, 5′-CAC TAG CCG AGA CGA TCA AC-3′, and PLC3, 5′-CGC CTG GCG CAC CCA CTT AC-3′.

Analysis of plcD-cutinase region
To assess the frequency of diversity in the plcD-cutinase region (Figure 1), a panel of 22 unrelated clinical isolates of M. tuberculosis was collected from patients attending the tuberculosis clinic at St. Mary's Hospital. The isolates were selected to represent a range of disease presentations, including involvement of pulmonary and extrapulmonary sites, and smear-positive and smear-negative specimens at the time of diagnosis (Table 2). Together with the well-characterized H37Rv and CDC1551 isolates, this panel was screened using a series of PCR assays to characterize the plcD gene, the cutinase gene and the intervening region.

Phospholipase C
A PCR assay was set up using primers P1 and P2 to amplify a 0.6 kb fragment from the plcD gene described in the M. tuberculosis H37Rv genome sequence (Cole et al., 1998). Initial standardization of this assay, using H37Rv cultures from our laboratory stock, generated a larger than expected product of approximately 2 kb. Analysis of this fragment demonstrated the presence of an additional copy of the IS6110 insertion element interrupting the region encoding the C-terminal portion of the protein. A similar insertion was previously reported during analysis of cosmid clone Y28 at an intermediate stage of the H37Rv sequencing project (Z95890, EMBL release 52, May 1997), but was not found in the BAC library used to complete the definitive H37Rv genome sequence (Cole et al., 1998). Analysis of the nucleotide sequence in our local H37Rv isolate identified the point of insertion at position 1 987 086, a location identical to that described in the Y28 sequence. This finding demonstrates a degree of heterogeneity in the pattern of IS6110 insertions in the plcD-cutinase region amongst H37Rv cultures derived from a single initial source. Application of the PCR assay to the panel of clinical isolates identified one further sample (TH11) with an IS6110 insertion affecting the C-terminal region of phospholipase C, in this case at a position 371 bp downstream from the H37Rv insertion. Amongst the other isolates, 15 generated the expected 0.6 kb fragment (TH2, 3, 5, 6, 8, 9, 10, 14, 16, 17, 18, 19, 20, 22 and CDC1551), while the remaining seven produced no detectable PCR product (TH1, 4, 7, 12, 13, 15, 21) (Table 3).
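The reason a larger-than-expected product points to an insertion can be shown with simple size arithmetic. A simple insertion adds the full element plus one duplicated target site to the amplified stretch. The sketch below assumes an IS6110 length of about 1355 bp and a 3 bp target-site duplication; both values are assumptions for illustration and should be checked against the reference sequence in use. The function names are hypothetical.

```python
from typing import Optional

# Assumed length of IS6110 (about 1.36 kb); verify against the reference sequence used.
IS6110_BP = 1355
# IS6110 insertion duplicates a short (3 or 4 bp) target site; 3 bp is assumed here.
TARGET_DUPLICATION_BP = 3


def expected_amplicon_bp(base_bp: int, n_insertions: int = 0) -> int:
    """Size of a PCR product whose template carries n simple IS6110 insertions."""
    return base_bp + n_insertions * (IS6110_BP + TARGET_DUPLICATION_BP)


def inferred_insertions(observed_bp: int, base_bp: int, tolerance_bp: int = 250) -> Optional[int]:
    """Most likely number of insertions explaining a gel-estimated product size.

    Returns None when no whole number of insertions fits within the tolerance
    (for example, a product shortened by a deletion).
    """
    step = IS6110_BP + TARGET_DUPLICATION_BP
    n = round((observed_bp - base_bp) / step)
    if n < 0:
        return None
    if abs(observed_bp - expected_amplicon_bp(base_bp, n)) > tolerance_bp:
        return None
    return n


print(expected_amplicon_bp(600, 1))     # 1958: P1/P2 plcD product with one insertion
print(inferred_insertions(2000, 600))   # 1: the ~2 kb product from the local H37Rv stock
print(inferred_insertions(3000, 1500))  # 1: 3 kb versus 1.5 kb M1/C3 products
```

The generous tolerance reflects the fact that product sizes read from an agarose gel are estimates. Sequencing across the junctions, as done here, remains the definitive test.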

Cutinase
A second PCR assay was established using primers C1 and C2 to amplify a 0.4 kb fragment identified in the H37Rv cutinase sequence. In this case, the expected product was obtained from 16 of the isolates. Of the remaining six isolates that did not amplify (TH1, 7, 9, 12, 13, 21), all but one (TH9) had also failed to produce a result with the plcD assay. Three PCR assays were then used to characterize the regions encoding the N-terminal portions of the phospholipase and cutinase enzymes, together with the region linking the two genes. Primers spanning the plcD and cutinase genes (NP2/C3) amplified a 1.5 kb fragment from H37Rv (corresponding to the IS6110 insertion element recorded in the genome sequence), but failed to generate a product with CDC1551 or with any of the clinical isolates. Products were obtained, however, when the reaction was redesigned to amplify two shorter products, using primers based on the sequence of MT1802, an ORF absent from H37Rv but lying between plcD and the cutinase gene in CDC1551 (Figure 1). The isolates could be divided into three groups on the basis of results with primers M1 and C3, which amplify the region from MT1802 to the cutinase (Figure 2). Four isolates, TH8, 10, 18 and 22, generated a 3 kb product (identical to that predicted from the CDC1551 sequence). A 1.5 kb product was obtained from a further 10 isolates (TH2, 3, 5, 6, 11, 14, 16, 17, 19, 20), while the remaining eight isolates (TH1, 4, 7, 9, 12, 13, 15, 21) resembled H37Rv in generating no product. Hybridization experiments identified the presence of the IS6110 insertion element in each of the 3 kb products. The point of insertion was determined by sequencing the flanking regions at both ends of the IS6110 element.
In two of the isolates, TH8 and TH22, the insertion was located within the cutinase gene at a location identical to that observed in the H37Rv and CDC1551 sequences (position 1 989 056), although in TH22 the orientation of IS6110 was opposite to that seen in the characterized strains. In the two other isolates (TH10 and TH18), the insertion occurred at a point 22 bp further downstream. Primers NP2 and M2, designed to amplify the region from plcD to MT1802, generated a 5.7 kb fragment from CDC1551 and from the 14 clinical isolates positive in the plcD PCR assay.

Figure 2. Gel electrophoresis of PCR products generated with M1/C3 primers. Three different patterns were found following PCR amplification of a region spanning genes encoding membrane protein MT1801 and cutinase, characterized by the presence of a 3.0 kb fragment, a 1.5 kb fragment, or the absence of any detectable product. Lane 1, negative control; 2, CDC1551; 3, TH6; 4, TH16; 5, TH20; 6, TH14; 7, TH22; 8, TH8; 9, TH10; 10, TH19; 11, TH3; 12, 1 kb ladder.
The results of the above analyses (summarized in Table 3) demonstrated that in nine of the isolates (TH2, 3, 5, 6, 14, 16, 17, 19, 20) the intact plcD and cutinase genes were present, together with the three intervening genes, in the absence of any IS6110 insertion. In a further four isolates (TH8, 10, 18, 22) and in CDC1551, all of the genes were present, but in each case the cutinase gene was interrupted by IS6110, inserted in either orientation at one of two sites. In one isolate (TH11), the genes were intact with the exception of an insertion in plcD. Characterization of the plcD-cutinase region in the remaining eight isolates (TH1, 4, 7, 9, 12, 13, 15, 21) was hampered by the absence of either or both of the plcD and cutinase genes.
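The way the three screening PCRs combine into the categories summarized above can be written as a small decision rule. The sketch below encodes the per-isolate results exactly as reported in the preceding paragraphs and counts the resulting categories. The rule itself (any missing locus flags a suspected deletion; otherwise product size locates the insertion) is our reading of the text, not a procedure stated by the authors. The function and variable names are hypothetical.

```python
from collections import Counter


def classify(plcd: str, cutinase: bool, m1c3: str) -> str:
    """Combine the three screening PCRs into the categories used in the text.

    plcd:     P1/P2 product, "0.6kb", "2kb" (IS6110 in plcD) or "none"
    cutinase: True if the 0.4 kb C1/C2 product was obtained
    m1c3:     M1/C3 product, "3kb", "1.5kb" or "none"
    """
    if plcd == "none" or not cutinase or m1c3 == "none":
        # At least one locus is missing: map the deletion with flanking-gene PCRs
        return "deletion suspected"
    if plcd == "2kb":
        return "intact, IS6110 in plcD"
    if m1c3 == "3kb":
        return "intact, IS6110 in cutinase"
    return "intact, no IS6110"


# Results for the TH panel as reported in the text (isolate numbers only)
PLCD_NONE = {1, 4, 7, 12, 13, 15, 21}
PLCD_2KB = {11}
CUTINASE_NONE = {1, 7, 9, 12, 13, 21}
M1C3_3KB = {8, 10, 18, 22}
M1C3_NONE = {1, 4, 7, 9, 12, 13, 15, 21}


def results_for(th: int):
    plcd = "none" if th in PLCD_NONE else "2kb" if th in PLCD_2KB else "0.6kb"
    m1c3 = "none" if th in M1C3_NONE else "3kb" if th in M1C3_3KB else "1.5kb"
    return plcd, th not in CUTINASE_NONE, m1c3


calls = {f"TH{n}": classify(*results_for(n)) for n in range(1, 23)}
calls["CDC1551"] = classify("0.6kb", True, "3kb")

for category, count in Counter(calls.values()).items():
    print(f"{category}: {count}")
```

Run on the reported results, the rule reproduces the four groups described in the text: nine intact isolates without IS6110, five (including CDC1551) with IS6110 in the cutinase gene, one with IS6110 in plcD, and eight requiring further mapping.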

Upstream genes
To characterize the remaining eight clinical isolates, a series of PCR primers was designed to test for the presence of genes lying outside the plcD-cutinase region (Figure 3). All eight isolates generated positive results using a PCR assay (primers PR1/PR2) designed to amplify a 0.2 kb fragment from the ORF directly upstream of plcD (Rv1754c, encoding a proline-rich protein). For the two isolates that were positive for the cutinase gene (TH4 and TH15), primers based on cutinase and Rv1754c amplified 1.8 kb fragments, each containing a copy of the IS6110 insertion element. Sequence analysis identified adjacent, but distinct, insertion points in the two isolates (Table 3). In each case the IS6110 element joined Rv1754c directly to the cutinase gene, with loss of the intervening genes.

Downstream genes
Positive results were generated from the remaining six cutinase-negative isolates (TH1, 7, 9, 12, 13, 21) using PCR primers R1 and R2, directed towards a 0.2 kb fragment overlapping the downstream ORFs Rv1766 and Rv1767 (Figure 3). Further amplification reactions using R2 in combination with primers from Rv1754c (PR3) and MT1802 (M1) generated products from each of the isolates. Each of the products contained a copy of the IS6110 insertion sequence, and in each case the insertion was associated with loss of intervening genes. In four of the isolates (TH1, 7, 12, 13), IS6110 linked a portion of the plcD gene to an intergenic region annotated as having homology to a possible insertion element (ISB9). In two of these isolates (TH1, 7) the plcD insertion was at the same point as the additional C-terminal insertion identified in cosmid Y28 and in our local H37Rv culture; in the other two isolates (TH12, 13) the insertion point was 369 bp closer to the 5′ end of the gene. In a fifth isolate, TH21, the IS6110 insertion element linked Rv1754c directly to the ISB9 region, with loss of all of the intervening ORFs, while the final isolate (TH9) had an insertion joining the cutinase gene to MT1812, an ORF present in CDC1551 but absent from H37Rv (Figure 3).
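Once the two genes joined by an IS6110 copy are known, the extent of a deletion follows from the gene order of an intact reference such as CDC1551. The sketch below illustrates that inference. The reference order is a hypothetical simplification: ORF_A to ORF_C stand in for the three ORFs between plcD and cutinase, and ORF_D for unnamed genes further downstream. It therefore does not reproduce the actual gene counts (two to 13) reported for these isolates. The function name is also hypothetical.

```python
from typing import List, Sequence


def genes_lost(gene_order: Sequence[str], left_junction: str, right_junction: str) -> List[str]:
    """Return the genes lying strictly between the two genes that IS6110 joins.

    The junction genes themselves may also be truncated; only genes that are
    wholly removed are returned. The order of the two junction names does not matter.
    Raises ValueError if either junction gene is not in the reference order.
    """
    i = gene_order.index(left_junction)
    j = gene_order.index(right_junction)
    lo, hi = sorted((i, j))
    return list(gene_order[lo + 1:hi])


# Hypothetical, simplified reference order for an intact (CDC1551-like) region.
REFERENCE = ["Rv1754c", "plcD", "ORF_A", "ORF_B", "ORF_C", "cutinase",
             "ORF_D", "ISB9-like", "Rv1766"]

print(genes_lost(REFERENCE, "Rv1754c", "cutinase"))  # TH4/TH15-type junction
print(genes_lost(REFERENCE, "plcD", "ISB9-like"))    # TH1/7/12/13-type junction
```

Using a real annotated gene list in place of `REFERENCE` would give the actual set of deleted ORFs for each junction.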

IS6110 and gene deletion
A potential mechanism to account for the presence of IS6110 elements in association with gene deletion has been proposed by Fang et al. (1999). This involves homologous recombination between adjacent copies of IS6110, and is accompanied by loss of the characteristic three- or four-nucleotide direct repeat that flanks IS6110 elements as a result of the normal insertion process. To evaluate the possible contribution of this mechanism to the results described above, the sequences flanking each side of the IS6110 insertions were determined. In every case in which IS6110 was linked to the loss of genes, the flanking direct repeat was absent, consistent with the recombination-mediated deletion mechanism.
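The flanking-repeat test described above amounts to comparing the genomic sequence on either side of an IS6110 copy. After a simple insertion, the last 3 or 4 bases before the element equal the first 3 or 4 bases after it. After recombination between two copies, the two junctions come from different insertion events and usually no longer match. The sketch below is a minimal version of that comparison. The example flanks are invented sequences, not data from this study. Because short repeats can match by chance, a positive call is suggestive rather than conclusive.

```python
from typing import Optional, Sequence


def flanking_direct_repeat(left_flank: str, right_flank: str,
                           sizes: Sequence[int] = (4, 3)) -> Optional[str]:
    """Return the direct repeat shared by the two junctions of an IS6110 copy, or None.

    left_flank:  genomic sequence immediately 5' of the element
    right_flank: genomic sequence immediately 3' of the element, on the same strand
    A simple insertion duplicates the target site, so the last n bases of the
    left flank equal the first n bases of the right flank (n = 3 or 4).
    Longer repeats are tried first.
    """
    left, right = left_flank.upper(), right_flank.upper()
    for n in sizes:
        if len(left) >= n and len(right) >= n and left[-n:] == right[:n]:
            return left[-n:]
    return None


# Invented example flanks, for illustration only
print(flanking_direct_repeat("GGATCCTAG", "CTAGTTAA"))   # CTAG: simple insertion
print(flanking_direct_repeat("aaaggc", "ggcttt"))        # GGC: 3 bp repeat
print(flanking_direct_repeat("GGATCCTAG", "AACCGGTT"))   # None: unflanked copy
```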
Screening of sequences flanking IS6110 insertion elements in the H37Rv genome (Cole et al., 1998) identifies four copies lacking the trinucleotide repeat. Each of these unflanked copies has been shown to be associated with a deletion from H37Rv when compared with other M. tuberculosis and M. bovis isolates; the deletions have been annotated as RvD2 to RvD5 (Brosch et al., 2000). One copy is involved in the plcD-cutinase deletion (RvD2), and a second is associated with the deletion of MT1812 and MT1813 (RvD3), illustrated in Figure 3. A third unflanked copy occurs between two PPE genes (RvD4), and the final example, Rv3325-3326, corresponds to the IS1547-associated copy described by Fang et al. (1999) as characteristic of the variable ipl locus (RvD5). A PCR assay (primers MolyF and MolyR) designed to amplify this locus generated a range of products from the different isolates (Figure 4), demonstrating that this region is also variable amongst the panel of isolates included in the present study.

Discussion
Analysis of a 20 kb region of the M. tuberculosis chromosome in a panel of 22 clinical isolates has revealed surprisingly extensive genetic diversity. Insertion and deletion events were identified in more than half of the isolates, with effects ranging from disruption of individual open reading frames to loss of as many as 13 genes. Variability within this region is associated with a high frequency of IS6110 insertions. We have identified a total of 18 discrete insertion sites, clustered predominantly in three subregions comprising plcD and the adjacent Rv1754c, the cutinase gene, and an ISB9-like element. A similarly high frequency of IS6110 insertion events within this region was observed in a recent study of South African isolates (Sampson et al., 1999). Where IS6110 is associated with loss of genes, the absence of flanking direct repeat elements strongly suggests that homologous recombination between adjacent copies of the insertion element has been the cause of the deletion event. No clinical isolate was found to have a double IS6110 insertion in this region. Two closely spaced IS elements may render the local structure of the genome unstable, thus precipitating homologous recombination. The presence of other IS6110-associated deletion regions in M. tuberculosis H37Rv (RvD3 to RvD5) (Brosch et al., 2000) suggests that the 20 kb region is not unique and probably represents an example of a general phenomenon. These findings identify gene deletion as an important source of genetic diversity amongst clinical isolates of M. tuberculosis.

Figure 4. Genetic diversity in the IS1547 region. PCR amplification using primers spanning Rv3324c to Rv3328c revealed extensive polymorphism amongst the panel of M. tuberculosis clinical isolates, with products ranging in size from 3 to 10 kb. Lane 1, 1 kb ladder; 2, negative control; 3, H37Rv; 4, negative control; 5, TH19; 6, TH15; 7, TH12; 8, TH8; 9, TH4; 10, TH2; 11, TH1; 12, 1 kb ladder.
Two important questions arise from this study. Firstly, is the variable island unique, or is it representative of a general mechanism of genetic diversity in M. tuberculosis? Secondly, does the loss of genes, by insertion or by deletion, have a significant effect on the biological properties of the different isolates?
Diversity within the 20 kb region would appear to be the result of its status as a 'hot-spot' for IS6110 insertion. Other IS6110 hot-spots have been described (Fomukong et al., 1997; Fang et al., 1997; Kurepina et al., 1998), and two mechanisms may account for their occurrence. First, insertion at a particular position may confer some selective advantage, which is then preserved during subsequent strain diversification. Alternatively, local structural features may make certain regions of the genome particularly susceptible to insertion events. The frequent finding of clustered insertions, as well as of insertions in similar locations but with different orientations, indicates the existence of regions prone to multiple insertion events, as envisaged in the second mechanism. Similarly, variation in insertion patterns amongst contemporary H37Rv isolates, as reflected in the present study and in the comparison with H37Ra, is consistent with a high frequency of IS6110 activity associated with the plcD gene, rather than immortalization of a single rare event. Interestingly, the 20 kb region overlaps with the RD14 BCG Pasteur deletion described by Behr et al. (1999). The absence of IS6110 in this case suggests that this region may also be susceptible to alternative mechanisms of deletion.
Turning to the biological consequences of genotypic variation in the plcD-cutinase region: recovery of the deletion strains from patients with active tuberculosis demonstrates that they retain the ability to cause disease. An influence of the deleted genes on the course of infection is not excluded, however. Phospholipase C has been identified as a virulence factor for several bacterial pathogens (Titball, 1993), whilst cutinases, although principally associated with the ability of fungi to penetrate the cutin layers of plants (Schafer, 1993), may have been adapted for hydrolysis of some related polymer during mammalian infection. Both genes belong to multicopy families. Interestingly, three of the phospholipase C genes are deleted from M. bovis, leaving plcD as the sole source of enzyme activity in the case of bovine infection (Behr et al., 1999; Gordon et al., 1999), and some clinical isolates of M. tuberculosis show evidence of polymorphism in this region (Vera-Cabrera et al., 1997). The different homologues could provide some diversity in precise substrate specificity, or may simply increase the overall amount of enzyme that can be produced. The loss of one or more copies could therefore affect the balance of the host-pathogen interaction. Other ORFs identified in the variable region encode proteins sharing homology with glycosyl transferases (MT1800), oxidoreductases (MT1801) and enzymes involved in the synthesis of antibiotics (Rv1760). The high prevalence of deletion events suggests that they may be subject to positive selection, either through some specific influence on the process of infection, as discussed above, or simply as a result of removal of genes no longer required by the bacteria. If this is the case, the deletion strains would be expected to represent a later stage of M. tuberculosis evolution than those having an intact complement of genes within the variable region.
Sreevatsan et al. (1997) have proposed an evolutionary lineage for M. tuberculosis isolates based on specific nucleotide substitutions in codons of the katG and gyrA genes. Examination of these sequences within the present panel failed to demonstrate any correlation with deletion genotype: deletion and non-deletion isolates were distributed amongst both early and late Sreevatsan groups (Sales MPU, Ho TBL, Taylor GM, unpublished observations). Thus, if deletion events do represent an evolutionary progression, it is not one that is coordinated with the sequence of katG/gyrA substitutions.
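The katG/gyrA grouping referred to above is essentially a lookup on two codons. As we understand the scheme of Sreevatsan et al. (1997), principal genetic group 1 carries CTG (Leu) at katG codon 463 and ACC (Thr) at gyrA codon 95, group 2 carries CGG (Arg) with ACC, and group 3 carries CGG with AGC (Ser). The table below encodes that understanding; these codon assignments are ours and should be checked against the original paper before use. The function name is hypothetical.

```python
from typing import Optional

# Our reading of the Sreevatsan et al. (1997) scheme: (katG codon 463, gyrA codon 95) -> group
PRINCIPAL_GENETIC_GROUPS = {
    ("CTG", "ACC"): 1,  # katG Leu463, gyrA Thr95
    ("CGG", "ACC"): 2,  # katG Arg463, gyrA Thr95
    ("CGG", "AGC"): 3,  # katG Arg463, gyrA Ser95
}


def principal_genetic_group(katg463: str, gyra95: str) -> Optional[int]:
    """Return the group for a pair of codons, or None for a combination outside the scheme."""
    return PRINCIPAL_GENETIC_GROUPS.get((katg463.upper(), gyra95.upper()))


print(principal_genetic_group("ctg", "ACC"))  # 1
print(principal_genetic_group("CGG", "AGC"))  # 3
print(principal_genetic_group("CTG", "AGC"))  # None
```

Cross-tabulating the resulting group against deletion status for each isolate is the comparison summarized in the paragraph above.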
The very limited panel employed in the present study did not reveal any obvious association between genotype and clinical presentation or geographical origin. However, a more extensive analysis, such as that described recently by Rhee et al. (1999), will be required to assess the possible contribution of the deleted genes to the overall process of human infection.