Whole-Genome Sequencing of Mexican Strains of Anaplasma marginale: An Approach to the Causal Agent of Bovine Anaplasmosis.

Anaplasma marginale is the main etiologic agent of bovine anaplasmosis, and it is extensively distributed worldwide. We have previously reported the first genome sequence of a Mexican strain of A. marginale (MEX-01-001-01). In this work, we report the genomic analysis of one strain from Hidalgo (MEX-14-010-01), one from Morelos (MEX-17-017-01), and two strains from Veracruz (MEX-30-184-02 and MEX-30-193-01). We found that the average genome size is 1.16-1.17 Mbp, with a GC content close to 49.80%. The genomic comparison reveals that most of the A. marginale genomes are highly conserved, and the phylogeny shows that Mexican strains cluster with Brazilian strains. The genomic information contained in the four draft genomes of A. marginale from Mexico will contribute to understanding the molecular landscape of this pathogen.


Introduction
Bovine anaplasmosis is an infectious, tick-borne disease caused mainly by Anaplasma marginale; typical signs include anemia, fever, abortion, weight loss, decreased milk production, jaundice, and potentially death. Although a sick bovine may recover when antibiotics are administered, it usually remains a carrier for life and is therefore a source of infection for susceptible cattle. Anaplasma marginale is an obligate intracellular Gram-negative bacterium whose genetic composition is highly diverse among geographical isolates [1]. Currently, there are no fully effective vaccines against bovine anaplasmosis; therefore, the economic losses due to the disease persist. Whole-genome sequencing (WGS) has been an applicable tool for the study of many pathogenic bacteria since 1995, when the first bacterial genomes were determined [2,3]. Vaccine formulation is a hard task for pathogens as diverse as Anaplasma marginale, and almost all efforts have been directed toward Outer Membrane Proteins (Omp), the Type IV Secretion System (T4SS), and Major Surface Proteins (Msp) [4][5][6][7][8]. To date, several A. marginale genomes have been reported, but only one is from a Mexican strain [9]. New data could be useful for focusing on alternative antigens that induce specific and protective responses against bovine anaplasmosis. In this work, we present draft genomes from four Mexican strains of Anaplasma marginale. In addition, we show a first approach to comparative analyses between them and Brazilian, Australian, and North American strains. Further analyses are necessary to advance the identification of potential vaccine molecules and the understanding of the pathogenicity, transmission and infection mechanisms, and genetic diversity of Anaplasma marginale.
2.2. Genome Sequencing, Assembly, and Annotation. We used 200 μl of bovine blood from each isolate to extract genomic DNA with the UltraClean DNA BloodSpin kit (Mo Bio Laboratories). Library preparation was performed by the University of Arizona Genetics Core using a DNA TruSeq library construction kit (Illumina). Two micrograms of genomic DNA from each isolate were sequenced on the MiSeq platform (Illumina). The MiSeq instrument from Illumina uses sequencing-by-synthesis (SBS) chemistry. The Illumina adapter sequences were removed from the paired-end reads using the ILLUMINACLIP trimming step of the Trimmomatic (version 0.36) program with default settings [10]. Low-quality bases were removed using the DynamicTrim algorithm of the SolexaQA++ (version 3.1.7.1) suite [11] with a Phred quality score of Q < 13. The resulting paired-end reads were de novo assembled using the SPAdes (version 3.11.1) program [12] with the following options: (i) run only the assembly module (--only-assembler), (ii) reduce the number of mismatches (--careful), and (iii) k-mer lengths between 21 and 127. Contigs of the four Mexican strains were differentiated from contigs belonging to other organisms (i.e., bovine genomes) based on the G+C content of each assembled contig, computed with a Python script (https://github.com/FernandoMtzMx/GC_content_MultiFasta); A. marginale genomes reported in databases have a G+C content between 46 and 52%. We also aligned the sequence of each assembled contig against the nucleotide collection (nr/nt) database, with Anaplasma marginale as the organism name, using the BLASTN suite [13]. Alignments with a coverage higher than 50% and an identity higher than 70% were considered "reasonably good" [14], and the corresponding contigs were assigned to the A. marginale genomes. The features of the four draft genomes were evaluated using the QUAST (version 4.6.2) program [15].
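The G+C screening step described above can be illustrated with a minimal Python sketch. This is not the authors' script from the repository cited in the text; the function names (`read_fasta`, `gc_percent`, `filter_by_gc`) and the toy contigs are hypothetical, and only the 46-52% window is taken from the text.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(seq: str) -> float:
    """G+C percentage computed over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def filter_by_gc(records, low: float = 46.0, high: float = 52.0) -> Dict[str, str]:
    """Keep contigs whose G+C content lies inside the [low, high] window."""
    return {h: s for h, s in records if low <= gc_percent(s) <= high}


# Toy input (hypothetical contigs, far shorter than real ones).
example = """>contig_1
ATGCATGC
>contig_2
ATATATATGC
""".splitlines()

kept = filter_by_gc(read_fasta(example))
print(sorted(kept))
```

With a real assembly, the lines would come from an open file handle on the SPAdes contigs file instead of the inline string. A G+C window alone cannot exclude every foreign contig, which is why the text combines it with a BLASTN screen.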
The draft genomes of four Mexican strains were annotated automatically using the RAST (version 2.0) server [16], and the 16S rRNA gene sequences were obtained using the RNAmmer (version 1.2) server [17].
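The BLASTN screening thresholds used during assembly (coverage above 50% and identity above 70%) can be applied programmatically. The sketch below assumes, purely for illustration, that BLASTN results are available as tab-separated lines with the columns `qseqid sseqid pident qcovs`; the text does not state which output format was used, and the function name and the example identifiers are hypothetical.

```python
from typing import Iterable, Set


def select_contigs(blast_lines: Iterable[str],
                   min_cov: float = 50.0,
                   min_ident: float = 70.0) -> Set[str]:
    """Return query IDs with at least one hit above both thresholds.

    Assumed column order (an assumption, not stated in the text):
    qseqid, sseqid, pident, qcovs.
    """
    kept = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, _sseqid, pident, qcovs = line.rstrip("\n").split("\t")[:4]
        if float(qcovs) > min_cov and float(pident) > min_ident:
            kept.add(qseqid)
    return kept


# Hypothetical hits: contig_1 passes, contig_2 fails on coverage,
# contig_3 fails on identity.
hits = [
    "contig_1\tsubject_A\t98.5\t92",
    "contig_2\tsubject_A\t99.0\t35",
    "contig_3\tsubject_B\t65.2\t80",
]
print(sorted(select_contigs(hits)))
```

Both comparisons are strict, matching the wording "higher than" in the text.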

Genomic Comparison. The Blast Ring Image Generator (BRIG) (v0.95) program [18] was used to compare the genomes of the Mexican A. marginale strains with those of six strains from Australia, Brazil, and the United States. The circular comparative genomic map was constructed by BRIG using the GenBank files (gbk format) with default parameters and the local NCBI BLAST+ suite (version 2.9.0).
(Tlapacoyan, Veracruz), and MEX-30-193-01 (Veracruz, Veracruz). The gene sequence datasets of the Mexican strains were compared to 13 downloaded gene sequence datasets of A. marginale, A. centrale, A. ovis, A. phagocytophilum, and Ehrlichia canis and E. ruminantium (as outgroup), which were obtained from the GenBank database (https://www.ncbi.nlm.nih.gov/) using the nucleotide BLAST suite (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Multiple alignments of all gene sequence datasets were made using the MUSCLE (v3.8.31) program [19]. The aligned sequences of each genome were concatenated using a Python script. The jModelTest (v2.1.10) program [20] was used to select the best model of nucleotide substitution according to the Akaike information criterion (AIC). The phylogenetic tree was inferred by maximum likelihood using the PhyML (v3.1) program [21] with 1000 bootstrap replicates. The phylogenetic tree was visualized and edited using the FigTree (v1.4.3) program (http://tree.bio.ed.ac.uk/software/figtree/).
Table 2 shows the information derived from the SEED subsystem of the RAST server for each strain.
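The concatenation of per-gene alignments mentioned in the phylogenetic methods above can be sketched as follows. The authors' own Python script is not shown in the text, so this is only an illustration: the function name, the gap-padding of genomes missing from a gene alignment, and the strain labels are assumptions.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate per-gene alignments into one sequence per genome.

    Each element maps genome name -> aligned sequence for one gene.
    Genomes absent from a gene alignment are padded with gaps
    (an assumption made here for illustration).
    """
    genomes = sorted({g for aln in gene_alignments for g in aln})
    parts: Dict[str, List[str]] = {g: [] for g in genomes}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for g in genomes:
            parts[g].append(aln.get(g, "-" * width))
    return {g: "".join(chunks) for g, chunks in parts.items()}


# Hypothetical two-gene example; strain_B lacks the second gene.
gene1 = {"strain_A": "ATG-", "strain_B": "ATGC"}
gene2 = {"strain_A": "GG"}
supermatrix = concatenate_alignments([gene1, gene2])
print(supermatrix)
```

Every concatenated sequence has the same length, which is what maximum likelihood programs such as PhyML expect from an input alignment.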

Genomic Comparison and Phylogeny.
We compared the four Mexican draft genomes of A. marginale with Brazilian, Australian, and North American strains. Figure 1 shows the genomic comparison. Although most of the genomes are highly conserved, the Dawn and Gypsy Plains strains showed some differences from the Mexican, North American, and Brazilian strains. We randomly selected fourteen ORFs predicted as membrane proteins in the genomic annotation and located them in the genomes; as observed, most of these proteins are conserved in all genomes (Figure 2).

Genome Synteny Analysis. The genome synteny of 11 A. marginale genomes of Australian, Brazilian, Mexican, and North American strains shows that the first 100,000 bases have a rearrangement of several small fragments (Figure 3). In addition, the genome synteny of A. marginale shows that the Australian, Brazilian, and North American strains have a highly conserved genome structure, while the genomes of Mexican strains show some rearrangement and inversion of genomic segments (Figure 3). In general, the structure
The genomic analysis reveals that the size of the four draft genomes (ranging from 1,167,111 bp to 1,176,681 bp) and their GC content (about 49.79%) are very similar to those of other A. marginale strains reported in GenBank, such as the reference genome of the St. Maries strain, with a genome size of 1,197,690 bp and a GC content of 49.80%.
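As a quick worked check of the size comparison above, the relative difference between the draft genomes and the St. Maries reference can be computed from the figures given in the text. The helper name is hypothetical; only the three genome sizes come from the text.

```python
ST_MARIES_BP = 1_197_690                    # reference genome size from the text
DRAFT_RANGE_BP = (1_167_111, 1_176_681)     # smallest and largest draft genome


def percent_smaller(size_bp: int, reference_bp: int) -> float:
    """Percentage by which size_bp falls short of reference_bp."""
    return 100.0 * (reference_bp - size_bp) / reference_bp


diffs = [round(percent_smaller(s, ST_MARIES_BP), 2) for s in DRAFT_RANGE_BP]
print(diffs)  # [2.55, 1.75]
```

The draft assemblies are therefore roughly 1.8-2.6% shorter than the reference, a gap that is plausible for draft genomes but that this sketch does not attempt to explain.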
The numbers of genes and CDSs are very similar in the four strains. In the genome annotation, using the subsystem classification of the RAST server, we identified genes related to cell wall and capsule; virulence, disease, and defense; membrane transport; and protein and DNA metabolism, among others. In the virulence, disease, and defense category, we found genes associated with cobalt-zinc-cadmium resistance, fluoroquinolone resistance, copper homeostasis, and beta-lactamase. We also identified genes of the Mycobacterium virulence operon involved in protein synthesis (SSU and LSU ribosomal proteins) and of the Mycobacterium virulence operon involved in DNA transcription. The Mycobacterium virulence operon is present in several species, including Mycobacterium tuberculosis, Streptococcus pneumoniae, Bartonella bovis, and Streptococcus suis, among other animal and plant pathogens [23][24][25].
In the stress response category, we found genes associated with oxidative stress, cold shock, heat shock, periplasmic stress response, and detoxification. For most obligate intracellular bacteria, peptidoglycan is not necessarily needed to maintain the integrity of the bacterial cell. In A. marginale, there are no reports of the analysis or isolation of its peptidoglycan [26]; however, we identified genes associated with the cell wall and capsule, specifically with peptidoglycan biosynthesis. An interesting feature of the A. marginale genomes is the role of the genes that we found in nitrogen metabolism. In alphaproteobacteria, nitrogen metabolism may be essential for full virulence [27].
The phylogenetic analysis indicates that the Mexican strains are more closely related to Brazilian strains than to North American ones. The genomic comparison of the strains reveals a high percentage of identity between A. marginale genomes, as also observed in the genome synteny analysis, where most of the strains are highly conserved in their structure and the Mexican strains have some rearrangements and inversions in certain genomic sequences.
International Journal of Genomics
The report of four draft genomes of A. marginale found in Mexico represents a first approach to unveil information that could help to develop new strategies for the design of

Conclusions
Here we present the genomic report and analyses of four Mexican strains of A. marginale, the causal agent of bovine anaplasmosis. So far, only one genome of a Mexican strain has been reported; with this contribution, we compare our results with information on strains from the USA, Brazil, and Australia and provide more information on this pathogen.