Virulence Factors and Antimicrobial Resistance of Uropathogenic Escherichia coli EQ101 UPEC Isolated from UTI Patient in Quetta, Balochistan, Pakistan

Infectious diseases are increasing markedly as organisms of even the normal flora become opportunistic and cause infection, and Escherichia coli is one of them. Urinary tract infections (UTIs) are caused by various microorganisms, but Escherichia coli is the primary cause, accounting for 70%–90% of all UTIs. It has multiple strains possessing diverse virulence factors that contribute to its pathogenicity. Furthermore, these virulent strains can cause overlapping pathogenesis by sharing resistance and virulence factors with each other. The current study aimed to analyze the genetic variants associated with multi-drug-resistant (MDR) E. coli using a whole genome sequencing platform. The study included 100 uropathogenic Escherichia coli (UPEC) isolates obtained from urine samples, of which 44% were MDR. Bacteria were isolated, and antimicrobial susceptibility testing (AST) was performed by the disk diffusion method on Mueller-Hinton agar plates as recommended by the Clinical and Laboratory Standards Institute (CLSI) 2020. One isolate that showed resistance to most of the antibiotics (E. coli EQ101) was selected and analyzed by whole genome sequencing (WGS), followed by data and phylogenetic analysis. On AST, the organisms showed resistance to ampicillin (10 μg), cefixime (5 μg), ceftriaxone (30 μg), nalidixic acid (30 μg), ciprofloxacin (5 μg), and ofloxacin (5 μg). WGS of the selected isolate identified 25 virulence genes (air, astA, chuA, fyuA, gad, hra, iha, irp2, iss, iucC, iutA, kpsE, kpsMII_K1, lpfA, mchF, ompT, papA_F43, sat, senB, sitA, terC, traT, usp, vat, and yfcV) and seven housekeeping genes (adk, fumC, gyrB, icd, mdh, purA, and recA). Among the resistance genes, seven (TolC, emrR, evgA, qacEdelta1, H-NS, cpxA, and mdtM) were involved in antibiotic efflux, three (aadA5, mphA, and CTX-M-15) were involved in antibiotic inactivation, and two (sul1 and dfrA14) were involved in antibiotic target replacement. Our data identified the antibiotic resistance and virulence genes of the isolate. We suggest further research to establish a region-based resistance profile in comparison with the global resistance pattern.


Introduction
The World Health Organization (WHO) has issued a report highlighting common healthcare-associated bacterial organisms showing increased rates of antimicrobial resistance globally [1]. One study estimated that mortality could reach around ten million deaths per year by 2050 if no action is taken against antimicrobial resistance [2].
Among the various resistant organisms, E. coli has gained great attention globally as a serious pathogen with diverse virulence capabilities [3].
Urinary tract infections (UTIs) are among the most prevalent bacterial outpatient infections. They occur frequently in women, accounting for almost 50-60% of infections in adult women, and affect 150 million people per annum globally [4,5]. This high prevalence has been linked to several risk factors, including catheterization, surgical manipulation, and disruption of the urinary tract, mainly among diabetic and immunosuppressed patients, along with recurrent hospitalizations and other comorbidities [6]. UTIs are caused by various microorganisms, but uropathogenic Escherichia coli (UPEC) is the primary cause (70%-90% of all UTIs) [7,8]. In developing countries like Pakistan, it is a serious threat to public health because of its association with high morbidity and mortality rates [9].
E. coli is a gram-negative organism and usually resides in the human intestine [10]. Structurally, the genome size of E. coli varies between strains, differing among pathogenic variants and commensal strains and with the presence of extragenetic elements [11].
Despite the knowledge of such a wide range of virulence genes of E. coli, antibacterial resistance is still a major concern around the globe [20]. Treatment has become progressively more complicated because of resistance to what are usually the first-line antimicrobial drugs [21]. Antimicrobial resistance in UPEC has increased remarkably in the last few years and has become a major public health issue [14,22,23]. UPEC strains were observed to be resistant mainly to trimethoprim-sulfamethoxazole, ampicillin, and ampicillin-sulbactam [21,24]. E. coli can acquire AMR genes through mobile genetic elements (MGE), which include plasmids, integrons, gene cassettes, insertion sequences, and transposons [25]. A large number of resistance-encoding MGE, specifically plasmids, are shared between different members of the Enterobacteriaceae and thus further promote the spread of resistance genes [26]. MGE can also carry virulence factors, reflecting the interplay between virulence and antimicrobial resistance [23]. It is therefore necessary to identify the genes responsible for resistance in order to guide therapy and improve treatment outcomes.
Most E. coli strains are isolated and identified by their O-antigen (lipopolysaccharide), H-antigen (flagellar), and K-antigen (capsular). The detailed structure of the genome and the genes responsible for resistance can only be discovered through specialized molecular testing techniques, such as whole genome sequencing [27,28]. With the advent of sequencing technologies, it has become easier to gain better insight into bacterial pathogenesis and to identify alternative therapeutics [29]. Therefore, the current study aimed to characterize a local MDR E. coli strain specifically associated with UTIs and to analyze the genetic variants associated with it using a whole genome sequencing approach. In this study, we aim to identify the virulence and AMR profile of the local UPEC strain. Eventually, the sequenced genome will be used in epidemiological studies of local E. coli strains, which will help to relate them to the global pattern using a pangenomic approach, as very little genome sequencing data is available from Pakistan (especially from Balochistan). Moreover, the global burden of UTIs calls for detailed in silico analyses of UPEC in order to identify possible therapeutic strategies.

Materials and Methods
2.2. DNA Extraction. Bacterial genomic DNA was extracted from the cultured isolate using the CTAB method with a few modifications [33]. The isolated bacterial colonies were mixed with CTAB buffer containing 0.2% β-mercaptoethanol and proteinase K (20 mg/mL) in a 1.5 mL safe-lock tube. The mixture was incubated at 60 °C for 30 minutes, after which chloroform-isoamyl alcohol (24 : 1) was added to the tube. The tube was centrifuged, and the aqueous phase was collected into a new tube. An equal volume of isopropanol was added, and the tube was incubated at 4 °C for 30 minutes and centrifuged. DNA was collected as a pellet while the supernatant was discarded. The precipitated DNA was washed twice with 70% ethanol, and the DNA pellet was air-dried. The DNA pellet was dissolved in 1× TE buffer and stored at 4 °C until further processing. The quality of genomic DNA was assessed using 1% agarose gel electrophoresis, while the quantity was estimated with a dsDNA high sensitivity kit on a Qubit fluorometer [34].
2.3. DNA Sequencing and Assembly. The DNA library of MDR E. coli EQ101 was prepared using the Nextera XT kit (Illumina, San Diego, CA, US) according to the manufacturer's guidelines. High molecular weight genomic DNA (5 ng) was fragmented (~300 bases) by transposomes. The adapters and indexes were ligated to both DNA ends, and the adapter-ligated library was then amplified by PCR. The amplified library was purified with Agencourt AMPure XP beads. The quantity of purified library was estimated with the dsDNA HS Qubit kit, while the size distribution of the library was evaluated by 3% agarose gel electrophoresis. The library was denatured and diluted to 16 pM with hybridization buffer (HT1) [35]. The diluted library was loaded into the sequencing cartridge for high-throughput sequencing on the Illumina MiSeq platform. Paired-end sequencing was carried out with 2 × 151 cycles using a V3 flow cell.

In Silico Genome Characterization and Resistome Analysis.
The paired-end sequencing data were obtained in fastq format containing short sequencing reads. The sequenced reads were assembled using Unicycler v0.4.9, which uses SPAdes v3.13.0 for assembly, followed by polishing with the aligned reads using Pilon v1.8 with a minimum threshold of 300 bases of contig length. Contigs shorter than 500 bases were filtered out. The serotype of the sequenced isolate was determined by submitting the sequenced data to the Center for Genomic Epidemiology (http://www.genomicepidemiology.org) and using the web-based serotyping tool SerotypeFinder 2.0 for E. coli with default parameters. FimH was identified using FimTyper 1.0, and virulence genes were identified using the VirulenceFinder 2.0 database with the following parameters: identity threshold 90% and minimum length 60%. The assembled genome was annotated using the RAST tool kit. The Comprehensive Antibiotic Resistance Database (CARD) [36] package was used to predict the antimicrobial resistance genes [37]. The Resistance Gene Identifier criteria were set to perfect and strict hits, and complete genes only.
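The contig length filter described above can be illustrated with a minimal sketch. It assumes the assembly is available as FASTA text; the function names (`read_fasta`, `filter_contigs`) and the toy contigs are hypothetical and are not part of the pipeline used in the study.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(records, min_length=500):
    """Keep contigs whose length is at least min_length bases."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# Toy assembly (hypothetical contigs, not EQ101 data).
toy_fasta = StringIO(
    ">contig_1\n" + "A" * 600 + "\n"
    ">contig_2\n" + "C" * 499 + "\n"
    ">contig_3\n" + "G" * 250 + "\n" + "T" * 250 + "\n"
)
kept = filter_contigs(read_fasta(toy_fasta))
print([name for name, _ in kept])
```

Here a contig of exactly 500 bases is retained, which matches the statement that contigs with fewer than 500 bases were removed.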

Pangenome Analysis.
Here, 63 UPEC genomes (40 complete and 22 draft genomes retrieved from the NCBI database, plus EQ101) were annotated by Prokka using the default parameters [38]. The pangenome analysis and the core-genome SNP-based phylogenetic analysis were performed by PanRV [39], which uses Roary [40] for the pangenome estimation.
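As an illustration of how a gene presence/absence table can be partitioned into core, accessory, and unique genes, a minimal sketch follows. It assumes the simplest definitions (core: present in every genome; unique: present in exactly one; accessory: everything in between); the thresholds actually applied by PanRV or Roary may differ. The function name, cluster names, and strain names other than EQ101 are hypothetical.

```python
def classify_pangenome(presence, n_genomes):
    """Split gene clusters into core, accessory, and unique sets.

    presence maps a gene cluster name to the collection of genomes carrying it.
    Assumed definitions: core = all genomes, unique = exactly one genome,
    accessory = any other non-zero count.
    """
    categories = {"core": [], "accessory": [], "unique": []}
    for gene, genomes in sorted(presence.items()):
        count = len(set(genomes))
        if count == 0:
            continue  # cluster not observed in any genome
        if count == n_genomes:
            categories["core"].append(gene)
        elif count == 1:
            categories["unique"].append(gene)
        else:
            categories["accessory"].append(gene)
    return categories


# Toy presence/absence data for three genomes (hypothetical, not study data).
toy_presence = {
    "cluster_1": ["EQ101", "strain_B", "strain_C"],
    "cluster_2": ["EQ101", "strain_B"],
    "cluster_3": ["strain_C"],
}
result = classify_pangenome(toy_presence, n_genomes=3)
print({name: len(genes) for name, genes in result.items()})
```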

Results
All samples showed the presence of E. coli after being processed through culture and sensitivity testing. Organisms showing resistance against a list of antibiotics were selected as multi-drug-resistant.
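The text does not state the exact criterion used to label an isolate as MDR. The sketch below assumes a widely used working definition, namely resistance to at least one agent in three or more antimicrobial classes. The class assignments are a simplified grouping chosen for illustration, and the function name `is_mdr` is hypothetical.

```python
# Simplified drug-to-class grouping (an assumption for illustration only).
ANTIBIOTIC_CLASS = {
    "ampicillin": "penicillins",
    "cefixime": "cephalosporins",
    "ceftriaxone": "cephalosporins",
    "nalidixic acid": "quinolones",
    "ciprofloxacin": "quinolones",
    "ofloxacin": "quinolones",
}


def is_mdr(resistant_drugs, min_classes=3):
    """Return True if resistance spans at least min_classes antibiotic classes."""
    classes = {
        ANTIBIOTIC_CLASS[drug]
        for drug in resistant_drugs
        if drug in ANTIBIOTIC_CLASS
    }
    return len(classes) >= min_classes


# Resistance to all six listed agents spans three classes in this grouping.
all_six = list(ANTIBIOTIC_CLASS)
print(is_mdr(all_six))
# Resistance confined to one class does not meet the assumed criterion.
print(is_mdr(["ciprofloxacin", "ofloxacin"]))
```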
The de novo whole genome assembly of E. coli EQ101 was done by Unicycler using SPAdes. The assembled genome comprised 918 contigs with a total length of 5,764,348 bases, an average G+C content of about 50.89%, and an N50 of 9,699. The serotype of the sequenced isolate was identified as H18, with 100% identity of the H type (fliC gene) to GenBank accession number AY250001, while no hit was identified for O-type genes. FimH64 was identified in the sequenced isolate using FimTyper 1.0 with a threshold of 95%.
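For readers unfamiliar with the assembly statistics reported above, the following minimal sketch shows how N50 and G+C content are conventionally computed from a set of contigs. The function names and toy values are hypothetical and are not EQ101 data.

```python
def n50(lengths):
    """Return the N50: the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first reaches half
    of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def gc_percent(sequences):
    """Return the percentage of G and C bases over all sequences."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in sequences)
    return 100.0 * gc / total


# Toy values: total length 30, so N50 is reached at the second-longest contig.
print(n50([2, 4, 6, 8, 10]))
print(gc_percent(["GGCC", "AATT"]))
```

A low N50 relative to the total assembly length, as reported here, indicates a fragmented assembly made up of many short contigs.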
Genome annotation of the assembled whole genome was carried out by PATRIC. The assembled genome consisted of 6,277 protein coding sequences (CDS), 53 transfer RNA (tRNA) genes, and 2 ribosomal RNA (rRNA) genes. The sequenced reads were aligned with the E. coli reference genome MG1655. The genome coverage of the sequenced isolate was found to be around 93.7%, and the mean depth of coverage was 60.82×.
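The two coverage figures above measure different things: genome coverage (breadth) is the share of reference positions covered by at least one read, while mean depth is the average number of reads per position. A minimal sketch, assuming per-position depth values over the reference are already available and that the mean is taken over all reference positions (the study does not specify this), is given below. The function name is hypothetical.

```python
def coverage_summary(depths, min_depth=1):
    """Return (breadth_percent, mean_depth) for a list of per-position depths.

    breadth_percent: share of positions with depth >= min_depth.
    mean_depth: average depth over all positions, covered or not (assumed).
    """
    if not depths:
        return 0.0, 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    breadth = 100.0 * covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# Toy reference of four positions, one of them uncovered (hypothetical data).
breadth, mean_depth = coverage_summary([0, 10, 20, 30])
print(breadth, mean_depth)
```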

BioMed Research International
The annotation included 737 hypothetical proteins and 5,540 proteins with functional assignments. The proteins with functional assignments included 1,573 proteins with Enzyme Commission (EC) numbers, 1,304 with Gene Ontology (GO) assignments, and 1,104 proteins that were mapped to KEGG pathways. The PATRIC annotation included two types of protein families: the genus-specific protein families (PLFams), to which 5,956 proteins of the sequenced genome were assigned, and the cross-genus protein families (PGFams), involving 6,070 proteins.
A subsystem is a collection of proteins that together implement a specific biological process or structural complex, and the PATRIC annotation assigns genes to such subsystems. The largest number of genes is involved in metabolism, followed by energy, protein processing, membrane transport, stress response, defense, virulence, cellular processes, etc.
3.1. Pangenome-Based Phylogenetic Analysis. The pangenome of the selected UPEC strains consists of 21,585 genes. The core-genome-based phylogenetic analysis grouped EQ101 with a Brazilian strain, BR-14 DEC (Figures 2 and 3). The two strains have 99.95% identity. The EQ101 genome did not contain any uniquely present or absent genes because of its contig-level assembly.
TolC (UniProtKB:P02930) is an outer membrane channel, which is required for the function of several efflux systems such as AcrAB-TolC, AcrEF-TolC, EmrAB-TolC, and MacAB-TolC. These systems are involved in the export of antibiotics and other toxic compounds from the cell. evgA appears to control the expression of a multidrug efflux operon. H-NS (UniProtKB:P0ACF8) is involved in bacterial chromosome organization and compaction; it binds to the regions upstream and downstream of initiating RNA polymerase, trapping it in a loop and thereby preventing transcription. It can also increase the translational efficiency of mRNA with suboptimal Shine-Dalgarno sequences. Hydroxyisobutyrylation on Lys-121 decreases the DNA-binding activity of H-NS, promotes the expression of acid-resistance genes, and enhances bacterial survival under extreme acid stress.
cpxA (UniProtKB:P0AE82) responds to envelope stress by activating the expression of downstream genes and is involved in several diverse cellular processes, including the functioning of acetohydroxyacid synthase I, the biosynthesis of isoleucine and valine, the TraJ protein activation activity for tra gene expression in the F plasmid, and the synthesis, translocation, or stability of cell envelope proteins. cpxA is also involved in cell adhesion and thus takes part in biofilm formation. mdtM (UniProtKB:P39386) plays a role in cellular transport and confers resistance to acriflavine, chloramphenicol, and norfloxacin. emrR (UniProtKB:C3SY57) and evgA (UniProtKB:P0ACZ4) play a role in transcription. CTX-M-15 (UniProtKB:W8YE54) has beta-lactamase activity and is involved in antibiotic resistance.
sul1 (UniProtKB:P0C002) is implicated in resistance to sulfonamide. dfrA14 (UniProtKB:B6SCG1) is a key enzyme in folate metabolism. The reaction catalysed by the dfrA14 enzyme, a dihydrofolate reductase, is essential for de novo glycine and purine synthesis and for DNA precursor synthesis.

Discussion
Identifying genes related to resistance to particular types of antibiotics was the key aim of the present study. Therefore, the genes having roles in antimicrobial resistance were analyzed and compared with the phenotypic resistance pattern.
Among the various genes, the traT virulence gene identified in the isolate was observed to be involved in antimicrobial resistance. Its role was also supported by a previous study in which Rezatofighi et al. [42] reported a significant prevalence of pathogenicity-associated island, papAH, papEF, fimH, fyuA, and traT genes in UPEC isolates. Another study revealed the presence of certain virulence genes, including iha, lpfA, aafC, nfaE, eilA, eae, and bfpA, for adherence to host cells. It further identified virulence factors, including senB, astA, and pic, that promote toxin production. Furthermore, the toxin genes associated with protease production in E. coli included sat and vat [43].
The variability of the housekeeping genes has been reported previously: among the seven genes, fumC and gyrB presented the highest degree of nucleotide diversity and the greatest number of polymorphic nucleotide sites [3], which may contribute to a high-degree resistance pattern such as the one identified in this study.
Among the resistance genes identified in this study, mdtM and qacEdelta1 were found to be associated with antibiotic efflux. This is consistent with earlier reports that the same genes are involved in multidrug resistance and help the organism to tolerate higher concentrations of antibiotics, changes in external pH, and stress conditions involving an alkaline environment [44,45]. Furthermore, the mphA gene, which was observed to be involved in antibiotic inactivation, has been found to be responsible for resistance to azithromycin [46,47]. Another gene, sul1, was observed to be responsible for antibiotic target replacement in this study and has also been reported as a sulfonamide resistance gene in other studies [46,47].
The rest of the identified genes were found to be involved in affecting antibiotic targets, and their overexpression resulted in reduced permeability and antibiotic efflux. The presence of such a high number of resistance gene cassettes in a single sequence indicates higher resistance patterns in the country and requires further sequence-based studies and population-based comparisons of such gene clusters across the globe to cope with antimicrobial resistance and improve treatment outcomes.
The pangenome analysis revealed that more than 70% of genes were part of the unique genome, which confirms the high diversity of ubiquitous E. coli strains and thus greater divergence in the phylogenetic relationship, as also discussed in earlier studies [48-50].

Conclusion
The sequenced organism (E. coli) was chosen based on its higher resistance to most of the available antibiotics and on genotype-phenotype correlations. Among the genes correlating with antibiotic resistance, certain virulence, housekeeping, and resistance genes were identified that are also present in other UPEC strains. Pangenome analysis confirmed the ubiquitous nature of E. coli and identified the closest phylogenetic relative of EQ101. Despite the identification of several genes involved in the resistance pattern, further research is required to establish a region-based resistance pattern in comparison with the global resistance pattern.

Figure 1: Pan-genome analysis of E. coli strains causing urinary tract infections in humans. (a) The pie chart shows the number of core, accessory, and unique genes in 63 genomes of UPEC strains. (b) The pangenome vs. core-genome plot of UPEC genomes.

Figure 3: Circularized SNP tree of 63 global UPEC strains indicating that EQ101 and BR-14 DEC fall in the same clade. Most of the other Pakistani strains shared a similar clade. A color key indicating countries is provided alongside the circular tree.

Table 2: The phenotypic resistance profile of EQ101.

Table 3: The genomic features and characteristics of the E. coli strain EQ101.

Table 4: Phenotypic and genotypic resistance profile of EQ101.