Analysis of Bacterial Diversity in Different Heavy Oil Wells of a Reservoir in South Oman with Alkaline pH

The identification of potential hydrocarbon-utilizing bacteria is an essential requirement in microbial enhanced oil recovery (MEOR). Molecular approaches such as proteomic and genomic characterization of isolates are replacing traditional identification by systematic classification. Genotypic profiling of isolates includes fingerprint- or pattern-based techniques and sequence-based techniques. Understanding community structure and dynamics is essential for studying diversity profiles and is challenging in the case of microbial analysis. The present study aims to characterize the bacterial community composition of heavy oil contaminated soil samples collected from geographically related oil well areas in Oman and to identify cultivable, spore-forming, hydrocarbon-utilizing bacteria. The V4 region of the 16S rDNA was the sequencing target for the Ion PGM™. A total of 825081 raw sequences were obtained from the Ion Torrent run across all 10 soil samples. Species richness and evenness were moderate in all samples, with four main phyla, Firmicutes, Bacteroidetes, Proteobacteria, and Actinobacteria, of which Firmicutes was the most abundant. Bacillus sp. dominated all samples, followed by Paenibacillus, Brevibacillus, Planococcus, and Flavobacterium. Principal Coordinate Analysis (PCoA) and a UPGMA dendrogram clustered the 10 soil samples into four main groups. The weighted UniFrac significance test showed a significant difference among the communities in the soil samples examined. It can be concluded that the microbial community differed among the 10 soil samples, with Bacillus and Paenibacillus as the predominant genera. 16S rDNA sequencing of the cultivable spore-forming bacteria identified the hydrocarbon-utilizing isolates as Bacillus and Paenibacillus sp., and the nucleotide sequences were submitted to NCBI GenBank under accession numbers KP119097–KP119115.
Bacillus and Paenibacillus sp., which were relatively abundant in the oil fields, can be recommended to be chosen as candidates for hydrocarbon utilization study.


Introduction
Oil production has been declining in many parts of the world as oilfields mature; the major oilfields in the North Sea are one example [1]. Another major concern is the increasing energy demand due to global population growth, together with the difficulty of discovering new oilfields to replace exploited ones. There is therefore an urgent need for alternative technologies that increase oil recovery from existing oilfields around the world. Fossil fuels will remain the key source of energy regardless of the substantial investments in other energy sources such as biofuels, solar energy, and wind energy. This is highlighted by current global energy production from fossil fuels, which stands at about 80-90%, with oil and gas representing about 60% [2]. During oil production, primary recovery can account for 30-40% of the oil, while an additional 15-25% can be recovered by secondary methods such as water injection, leaving about 35-55% of the oil as residual oil in the reservoirs [3]. This residual oil is usually the target of enhanced oil recovery technologies, and it amounts to about 2-4 trillion barrels [4] or about 67% of the total oil reserves [5]. Residual oil recovery is now a necessity for many oil companies, so there is a constant search for a cheap and efficient technology that will raise global oil production and extend the productive life of many oilfields. The recovery of this residual oil is accomplished by enhanced oil recovery (EOR), or tertiary recovery, methods, which are used in the oil industry to increase the production of crude oil. Indigenous or in situ microbial enhanced oil recovery (IMEOR) is one technique in which the microbes inhabiting the oil reservoirs are stimulated to enhance oil recovery [6]. IMEOR is reported to be a cost-effective method [7].
The microbial community composition is influenced by oil reservoir geological conditions and other external factors like nutrient and water flooding, biosurfactant and biopolymer application, and steam injection [8].
For the advancement of a reliable MEOR protocol, understanding the microflora that exists in the oil reservoir is essential. Conventional culture-based techniques have been widely used for bacterial identification and enumeration in oil fields. However, culture-based techniques are selective and identify only a few of the microbes present. The problem is even worse in extreme environments such as oil fields, where only microbes that can withstand harsh conditions survive. Genomic analysis has therefore become an important tool for understanding ecological biodiversity, because it avoids the need for laboratory cultivation and isolation of individual isolates.
Complex environmental samples require a technique that can read multiple sequences in parallel. Next-Generation Sequencing (NGS) can obtain DNA sequences directly from environmental samples [9,10]. In this study, the distribution of bacteria in steam-injected heavy oil wells (South Oman) with alkaline pH was characterized using a high-throughput sequencing technique, the Ion Torrent Personal Genome Machine (PGM). The spore-forming bacteria in the samples were also identified using a culture-based method.

Materials and Methods

Sample Collection and Preparation.
The heavy crude oil contaminated subsurface soil was collected from different heavy oil wells in Southern Oman as described previously [11]. A total of ten soil samples were collected aseptically in sterile sampling bags and stored at 4 °C for further studies. The samples were kindly provided by a local oil company.

Genomic DNA Isolation.
Cultivable spore-forming bacteria were obtained by heating soil samples suspended in 10 ml distilled water at 90 °C for 30 min to kill all vegetative cells, followed by enrichment in Bushnell-Haas (BH) medium containing 1% heavy crude oil. The flasks inoculated with the supernatant of the heated samples were incubated at 40 °C for 24 h, and the cultures were plated on fresh BH agar plates to obtain pure cultures. The genomic DNA (gDNA) from the isolates and the soil samples was isolated using the PowerSoil® DNA Isolation Kit (Mo Bio Laboratories Inc.). The nucleic acid concentration and purity were measured with a Thermo Fisher NanoDrop™ 2000/2000c spectrophotometer.

Identification of Cultivable Spore-Forming Isolates Using 16S rDNA Sequencing.
Cultivable spore-forming bacterial isolates were identified by 16S rDNA sequencing using the 27F and 1492R universal primers as described previously [11]. Briefly, the genomic DNA was extracted using the PowerSoil DNA Isolation Kit (Mo Bio Laboratories Inc.), and the amplification was performed using a T100™ thermal cycler (Bio-Rad, USA). The PCR products were purified using the QIAquick PCR purification kit (QIAGen). The BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems™) was used for de novo sequencing. The sequencing was done using a 3130 XL Genetic Analyzer (Applied Biosystems, Hitachi) at the Central Analytical and Applied Research Unit (CAARU), Sultan Qaboos University, and the sequences were submitted to NCBI GenBank, USA. The dendrogram was constructed using the maximum likelihood (ML) method in PHYLIP, the Phylogeny Inference Package. The ML program uses a Hidden Markov Model (HMM) method of inferring different rates of evolution at different sites [12].

NGS Analysis of Microbial Community in Soil Samples.
The metagenomic analysis of all 10 soil samples was done following the instruction manual of the Ion PGM System. The total gDNA extracted from the soil samples was amplified using 16S primer sets targeting the hypervariable V4 region: 515F (5′-GTGCCAGCMGCCGCGGTAA-3′) and 806R (5′-GGACTACVSGGGTATCTAAT-3′). The PCR program consisted of an initial denaturation at 95 °C for 10 min; 25 cycles of denaturation at 95 °C for 30 sec, annealing at 58 °C, and extension at 72 °C for 20 sec; a hold at 72 °C for 7 min; and a final indefinite hold at 4 °C. The amplified product was purified using Agencourt® AMPure® XP Reagent and 70% ethanol. The DNA input for library preparation was calculated using an Agilent® 2100 Bioanalyzer®, and the library was prepared using the Ion Plus Fragment Library Kit, following the user instructions. The pooled short amplicons were end-repaired using 5x End Repair Buffer and End Repair Enzyme and purified using Agencourt AMPure XP Reagent and 70% ethanol. Barcoded libraries were prepared using the Ion Xpress™ Barcode Adapters 1-16 Kit. The adapters were ligated using DNA Ligase, and the nicks were repaired using Nick Repair Polymerase. The DNA template for the Ion PGM System was prepared using the Ion PGM Template OT2 400 Kit and the Ion OneTouch™ 2 System, following the instructions in the Ion PGM Template OT2 400 Kit User Guide. The library was sequenced using the Ion Personal Genome Machine® (PGM™) System and the Ion PGM Sequencing 400 Kit.
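Both V4 primers contain IUPAC degenerate bases (M in 515F; V and S in 806R), so each primer stands for a small pool of concrete oligonucleotides. The following is a minimal sketch of that expansion, not part of the original workflow; the function name `expand_primer` is hypothetical and introduced only for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Primer sequences as given in the text above.
forward_515f = "GTGCCAGCMGCCGCGGTAA"
reverse_806r = "GGACTACVSGGGTATCTAAT"

print(len(expand_primer(forward_515f)))  # 2 variants (M = A or C)
print(len(expand_primer(reverse_806r)))  # 6 variants (V x S = 3 x 2)
```

Listing the variants this way can help when checking primer coverage against reference sequences by exact matching.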

Data Analysis.
Primary data analysis was performed with Torrent Suite™ Software v4.0, with automated secondary analysis using Ion Reporter™ Software v4.0. Further analysis was done using a variety of computer packages including Past3, XLstat, NCSS 2007, "R", and NCSS 2010. Alpha diversity analysis was conducted using the QIIME pipeline (version 1.8.0) (QIIME, 2016). Significance reported for any analysis is defined as $p$ < 0.05.
Short sequences (< 200 bp) were removed after depleting primers and barcodes; sequences with ambiguous base calls were also removed. Sequences with homopolymer runs exceeding 6 bp were removed. Sequences were then denoised and chimeras were removed. Operational taxonomic units (OTUs) were formed at 97% similarity (3% divergence) using UCLUST after removing singleton sequences [13]. RDP classifier version 2.2 was used for taxonomic assignments within the GreenGenes taxonomy [14,15]. The OTUs were rarefied (randomly subsampled) to 30000 sequences and the signified sequences were aligned to 16S reference sequences with PyNAST [16]. The aligned sequences were used to create a tree in FastTree [17].
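The three per-read filters described above (minimum length, no ambiguous bases, bounded homopolymer runs) can be sketched as a simple predicate. This is an illustrative stand-in under stated assumptions, not the QIIME or Torrent Suite implementation; the names `passes_filters` and `longest_homopolymer` are hypothetical, and the sketch assumes primers and barcodes have already been trimmed.

```python
import re

MIN_LENGTH = 200       # reads shorter than this are dropped
MAX_HOMOPOLYMER = 6    # longest homopolymer run that is still accepted

def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    runs = (len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))
    return max(runs, default=0)

def passes_filters(seq):
    """Apply the length, ambiguity, and homopolymer filters to one read."""
    seq = seq.upper()
    if len(seq) < MIN_LENGTH:
        return False
    if re.search(r"[^ACGT]", seq):  # ambiguous base call such as N
        return False
    return longest_homopolymer(seq) <= MAX_HOMOPOLYMER

# Toy reads, made up for illustration only.
reads = {
    "ok": "ACGT" * 60,
    "too_short": "ACGT" * 40,
    "ambiguous": "ACGT" * 60 + "N",
    "homopolymer": "A" * 7 + "CGTA" * 60,
}
kept = [name for name, seq in reads.items() if passes_filters(seq)]
print(kept)  # ['ok']
```

Denoising, chimera removal, and OTU clustering are separate steps and are not covered by this sketch.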
Measures of α-diversity, the diversity within a community, were computed using the QIIME pipeline [22]: (a) the Shannon index, also termed the Shannon-Wiener index ($H$), which evaluates relative abundance and reflects species richness and species evenness [18], and the Simpson index ($1 - D$) [19]; (b) rarefaction, a method for species estimation based on presence-absence data [20]; and (c) Chao 1 [21] plots, based on the number of rare OTUs found in a sample.
The Shannon index is $H = -\sum_i p_i \ln p_i$, where $p_i$ is the proportion of individuals found in species $i$; $p_i = n_i/N$, where $n_i$ is the number of individuals in species $i$ and $N$ is the total number of individuals in the community.
Simpson's index is $1 - D$, where $D$ is the dominance, $D = \sum_i p_i^2$, and $p_i$ is the proportion of individuals found in species $i$. SHE analysis was also done; it is based on the mathematical approach that diversity (Shannon diversity, $H$) is related directly to species richness ($S$) and evenness ($E$) of the distribution [23]:
$H = \ln(S) + \ln(E)$, where $H$ is the Shannon index, $S$ is species richness, and $E$ is the evenness of the distribution.
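As a worked illustration of these definitions, the sketch below computes the Shannon index, the Simpson index, and the SHE decomposition from a list of per-species counts. It assumes natural logarithms and counts of individuals (or reads) per species; the function names and the toy counts are hypothetical and are not taken from the study data.

```python
import math

def shannon(counts):
    """H = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def simpson(counts):
    """1 - D, where D = sum(p_i^2) is the dominance."""
    total = sum(counts)
    return 1.0 - sum((n / total) ** 2 for n in counts)

def she(counts):
    """Return (S, H, E) with E defined so that H = ln(S) + ln(E)."""
    s = sum(1 for n in counts if n > 0)
    h = shannon(counts)
    e = math.exp(h) / s
    return s, h, e

# Toy community, made up for illustration only.
counts = [50, 30, 15, 5]
s, h, e = she(counts)
print(s, round(h, 3), round(e, 3), round(simpson(counts), 3))
```

For a perfectly even community, $E = 1$ and $H = \ln(S)$; as a few species come to dominate, $E$ falls toward $1/S$ and $H$ toward 0.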
Representative sequences were selected from all OTUs, and taxonomy was assigned to them using QIIME and the RDP classifier. Only those phyla that appeared in 75% of the samples were selected. The relative abundance of the genera present in all 10 soil samples was plotted as a heatmap and represented graphically by interactive Krona charts.
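The prevalence filter and the relative abundance calculation can be sketched as follows. The sketch assumes that "appeared in 75% of the samples" means a non-zero count in at least 75% of samples; the function names, sample labels, and counts are hypothetical and serve only as an illustration.

```python
def relative_abundance(counts):
    """Convert {taxon: count} for one sample into {taxon: fraction}."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def prevalent_taxa(samples, min_fraction=0.75):
    """Taxa detected (count > 0) in at least min_fraction of the samples."""
    taxa = {t for counts in samples.values() for t in counts}
    needed = min_fraction * len(samples)
    return sorted(
        t for t in taxa
        if sum(1 for c in samples.values() if c.get(t, 0) > 0) >= needed
    )

# Toy per-sample phylum counts, made up for illustration only.
toy = {
    "S1": {"Firmicutes": 80, "Bacteroidetes": 15, "RarePhylum": 5},
    "S2": {"Firmicutes": 90, "Bacteroidetes": 10},
    "S3": {"Firmicutes": 70, "Bacteroidetes": 30},
    "S4": {"Firmicutes": 100},
}
print(prevalent_taxa(toy))             # ['Bacteroidetes', 'Firmicutes']
print(relative_abundance(toy["S2"]))   # {'Firmicutes': 0.9, 'Bacteroidetes': 0.1}
```

The resulting per-sample fractions are the kind of values a heatmap or Krona chart would display.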
β-Diversity, or community similarity, analysis was done by determining the pairwise distances using the phylogenetic distance metric UniFrac, generated by QIIME [24]. Monte Carlo simulations (weighted UniFrac significance test) were used to test for significant differences in OTUs among the samples in the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) tree generated by UniFrac analysis [25], followed by Bonferroni correction [26] to reduce the type I error rate.
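The Bonferroni step can be illustrated with a short sketch: each raw pairwise p value is multiplied by the number of comparisons and capped at 1. The sketch assumes that every pair of the 10 samples is tested, which the text does not state explicitly; the function name, sample labels, and p values are hypothetical.

```python
from itertools import combinations

def bonferroni(p_values):
    """Multiply each raw p value by the number of tests, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}

samples = [f"S{i}" for i in range(1, 11)]
pairs = list(combinations(samples, 2))
print(len(pairs))  # 45 pairwise comparisons among 10 samples

# Toy raw p values, made up for illustration only.
raw = {pair: 0.001 for pair in pairs}
raw[("S1", "S2")] = 0.03
corrected = bonferroni(raw)
significant = [pair for pair, p in corrected.items() if p < 0.05]
print(len(significant))  # 44: only the S1-S2 pair fails after correction
```

Under this assumption a raw p value must fall below 0.05/45, roughly 0.0011, to remain significant after correction.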

Results

Sample Collection and Preparation.
The soil moisture was determined to be 0.03-0.08 m³/m³, measured using an EM50 Digital data logger, and the pH, measured with a Jenway 3505 pH meter, was 8.5. The viscosity of the heavy crude oil from the well head, which contaminated the soil, was estimated as 4.57 °API using a RheolabQC Rotation Viscometer. The eTPH of the soil samples ranged within 3.2-4.8%.

Genomic DNA Isolation and Identification of Cultivable Spore-Forming Isolates Using 16S rDNA Sequencing.
The gDNA from all the isolates (Figure 1(a)) and the soil samples (Figure 1(b)) was extracted using the PowerSoil DNA Isolation Kit and was successfully amplified using the universal bacterial primers 27F and 1492R. The amplified product was 1400 bp (Figure 2).
The purified PCR product was sequenced using 3130 XL Genetic Analyzer and the sequences were submitted to NCBI under the accession numbers KP119097-KP119115 (Table 1).
The sequences were aligned with the sequences of closely related species retrieved by NCBI nucleotide BLAST. Branches with bootstrap values above 70% are considered reliable. The dendrogram was made using the ML (maximum likelihood) method, which treats every site of the multiple sequence alignment independently, with bootstrapping of 1000 replications. The ln likelihood for the cladogram was −6866.09722; the closer the likelihood value is to zero, the better the dendrogram (Figure 3). Among the isolates listed in Table 1, Paenibacillus ehimensis BS1 showed very good potential for applications in MEOR and heavy crude oil biodegradation [11].

NGS Analysis of Microbial Community in Soil Samples.
The gDNA was amplified using specific primers targeting the V4 region of the bacterial 16S rRNA gene. The α-diversity measures of the samples were computed in the QIIME pipeline and Past3. The Shannon index ($H$) and Simpson index ($1 - D$) were calculated for all the samples (Table 2) and indicated moderate species richness and evenness in all the samples, with soil sample 2 showing the least diversity.
Rarefaction curves with Chao 1 and Shannon diversity were plotted as the average number of OTUs at each interval against the size of the subsample [27] (Figure 4). For all 10 soil samples, the curves reached a plateau at approximately 5000 sequences, indicating that the sequencing depth was sufficient to capture the full scope of microbial diversity.
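The idea behind a rarefaction curve and the Chao 1 estimate can be sketched in a few lines. This is a simplified illustration, not the QIIME implementation: reads are subsampled without replacement at several depths and the mean number of observed OTUs is recorded, and Chao 1 is shown in its commonly used bias-corrected form, which is assumed here because the text does not state which variant was used. All names and counts are hypothetical.

```python
import random

def rarefaction_curve(otu_counts, depths, replicates=10, seed=0):
    """Mean number of observed OTUs at each subsampling depth."""
    rng = random.Random(seed)
    # One list entry per read, labelled by the index of its OTU.
    reads = [otu for otu, n in enumerate(otu_counts) for _ in range(n)]
    curve = []
    for depth in depths:
        observed = [
            len(set(rng.sample(reads, depth))) for _ in range(replicates)
        ]
        curve.append(sum(observed) / replicates)
    return curve

def chao1(otu_counts):
    """Bias-corrected Chao 1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for n in otu_counts if n > 0)
    f1 = sum(1 for n in otu_counts if n == 1)  # singletons
    f2 = sum(1 for n in otu_counts if n == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Toy OTU table for one sample, made up for illustration only.
counts = [500, 300, 120, 50, 20, 5, 2, 1, 1, 1]
print(rarefaction_curve(counts, depths=[10, 100, 500, 1000]))
print(chao1(counts))
```

A curve that flattens as depth increases suggests that additional reads would reveal few new OTUs, which is the reasoning applied to the plateau at about 5000 sequences.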
Taxonomic comparison of the OTUs identified four main phyla, Firmicutes, Bacteroidetes, Proteobacteria, and Actinobacteria, the most abundant being Firmicutes followed by Bacteroidetes (Figure 5). Further comparison of the genera belonging to these phyla showed that Bacillus was the most abundant genus, followed by Paenibacillus, Brevibacillus, Planococcus, Flavobacterium, and so forth (Figure 6). The result was confirmed by heatmap analysis, in which the highest relative abundance of Bacillus, Paenibacillus, and Brevibacillus was found in all 10 soil samples. The genera (consortium) were used for clustering, so samples with more similar consortia of genera cluster closer together, and the length of the connecting lines at the top of the heatmap (Figure 7) reflects the similarity: shorter lines between two samples indicate closely matched microbial consortia. The heatmap represents the relative percentage of each genus, and the predominant genera are listed along the right y-axis. SHE analysis was performed to evaluate whether the species proportions were similar in all 10 soil samples and thereby to assess the microbial diversity. The community structure was determined to follow a log-normal distribution, that is, a few species with high or low abundance and many with intermediate abundance [28] (Figure 8).
The resemblance between the bacterial communities, β-diversity, was measured using UniFrac analysis, which provided a tree-based clustering (Figure 9(a)) and a Principal Coordinate Analysis (PCoA) graph (Figure 9(b)); both grouped the soil samples into 4 main clusters. The eigenvalues for PC1, PC2, and PC3 were 0.014, 0.002, and 0.001, respectively, and accounted for 74%, 12%, and 6% of the total variance. The two approaches revealed the same pattern. UniFrac $p$ values were based on comparisons to 1,000 randomized trees and were considered significant only if they were <0.05 (Table 3).

Discussion
Ion PGM™ was used to delineate the bacterial community structure of 10 heavy crude oil contaminated soil samples collected near the oil wells used in the study. The hypervariable V4 region of the 16S rDNA sequence was the target region. Stringent quality curation of the 825081 raw reads resulted in an 84% reduction of the initial reads; other studies report filtering of 50-80% of initial reads [29][30][31]. The relative abundance of each phylum varied among the 10 soil samples. The predominant phyla were Firmicutes, Bacteroidetes, Proteobacteria, and Actinobacteria, the most abundant being Firmicutes followed by Bacteroidetes. A possible reason for the abundance of Firmicutes (Bacillus sp.) is their ability to form endospores that resist the adverse conditions of the oil fields, as in desert habitats of Oman [32][33][34][35]. In addition, because of their metabolic and physiological adaptability and their ability to produce enzyme inhibitors and antibiotics, Firmicutes are considered to be better competitors in natural environments [36]. The moisture content of the soil samples was very low, about 0.03-0.08 m³/m³, which is another possible reason. The heatmap analysis of genera showed that Bacillus had the highest relative abundance, followed by Paenibacillus and Brevibacillus, in all 10 soil samples.
Measures of α-diversity such as the Shannon index and Simpson index showed that there was diversity of OTUs within the community in all 10 soil samples. Multiple rarefaction curves assembled from each sample's Shannon diversity index reached a plateau at approximately 5000 sequences, suggesting that the sequencing depth was sufficient to capture the full scope of microbial diversity [37]. SHE analysis revealed a constant proportion of species in all 10 samples.
β-Diversity measured using UniFrac analysis provided a tree-based clustering and a Principal Coordinate Analysis (PCoA) graph. The two approaches revealed the same pattern of clustering, and the Monte Carlo simulation test with Bonferroni correction revealed a significant difference among the communities present in the soil samples examined [24].
The microbial community in soils is determined by physicochemical parameters such as moisture content, temperature, salinity, and pH [38,39]. The extreme temperature conditions, together with the low moisture content (0.03-0.08 m³/m³) and a slightly alkaline pH, are expected to affect the diversity of the bacterial community in the soil. Similar results were reported earlier for the Tibetan Plateau [40]. The presence of heavy crude oil can be another limiting factor for the microbial community.
The identification of cultivable spore-forming isolates from the soil samples by 16S rDNA sequencing yielded Paenibacillus and Bacillus sp. The relative abundance of Bacillus sp. in the microbial community of heavy crude oil sludge has been reported in Saudi Arabia, Oman, and Nigeria [41][42][43]. Bacillus and Paenibacillus sp., which were relatively abundant in the oil fields, can be recommended as candidates for hydrocarbon utilization studies. One of our isolates, Paenibacillus ehimensis BS1, showed maximum growth in the presence of heavy oil and biotransformed it to lighter aliphatic and aromatic compounds, demonstrating its potential in EOR and environmental bioremediation under aerobic and anaerobic reservoir conditions [11].