Cloning, Sequencing, and In Silico Analysis of β-Propeller Phytase Bacillus licheniformis Strain PB-13

β-Propeller phytases (BPPhy) are widely distributed in nature and play a major role in phytate-phosphorus cycling. In the present study, a BPPhy gene from Bacillus licheniformis strain was expressed in E. coli with a phytase activity of 1.15 U/mL and specific activity of 0.92 U/mg proteins. The expressed enzyme represented a full length ORF “PhyPB13” of 381 amino acid residues and differs by 3 residues from the closest similar existing BPPhy sequences. The PhyPB13 sequence was characterized in silico using various bioinformatic tools to better understand structural, functional, and evolutionary aspects of BPPhy class by multiple sequence alignment and homology search, phylogenetic tree construction, variation in biochemical features, and distribution of motifs and superfamilies. In all sequences, conserved sites were observed toward their N-terminus and C-terminus. Cysteine was not present in the sequence. Overall, three major clusters were observed in phylogenetic tree with variation in biophysical characteristics. A total of 10 motifs were reported with motif “1” observed in all 44 protein sequences and might be used for diversity and expression analysis of BPPhy enzymes. This study revealed important sequence features of BPPhy and pave a way for determining catalytic mechanism and selection of phytase with desirable characteristics.


Biotechnology Research International
Bioinformatics analysis of genes and genomes from different species makes possible the identification of new genes including orthologs or paralogs [12] and also facilitates the establishment of phylogenetic relationships between genes and evolutionary molecular mechanisms [13]. Large numbers of phytase gene sequences are available in various databases providing further opportunity to study detailed mechanistic and sequential diversity of this class of enzymes. It has been utilized for formation of consensus phytase sequence [14], in silico analysis of HAP sequences [15], and motif analysis of different phytases [16]. However, no such study has been conducted to assess sequence diversity of BPPhy. The sequence information and further analysis of superfamily will help in understanding the underlying mechanisms and also helps to develop and/or implement a range of alternate effectors for enzyme activity. The in silico characterization of protein sequences of other industrially important enzymes has also been reported recently [17][18][19].
In the present study, a phytase producing Bacillus licheniformis strain was used for the isolation, cloning, and sequencing of BPPhy gene in pET32a vector and expression in E. coli BL21. The phytase sequence was characterized in silico. Simultaneously, in order to better understand the structural, functional, and evolutionary aspects of BPPhy, we exploited the reference protein sequences of BPPhy in NCBI and ExPASy databases for in silico study of their biochemical features, multiple sequence alignment and identity search, phylogenetic tree construction, and distribution of motifs and superfamilies using various bioinformatics tools. We provide here information regarding conserved and variable amino acids and protein motifs that might have an impact on function. In addition, we analyzed other structural aspects including the position of conserved residues and the cleavage site of the zymogen and presented a preliminary phylogenetic analysis of selected members of this subfamily.

Chemicals and Bacterial
Strains. All the chemicals, solvents, and antibiotics used in this study were of molecular biology and analytical grade and procured from standard manufacturers as GeNei, Sigma, Merck, and HiMedia Pvt. Ltd. Phytase producing Bacillus licheniformis strain PB-13 (identified using 16S rRNA gene sequencing, GenBank Accession number JX406744.1) isolated in our laboratory was used for isolation of phytase gene [20]. E. coli DH5 and E. coli BL21 (DE3) (Novagen) were used as cloning host and expression host, respectively. Plasmid vector pET32a(+) (Novagen) was used for cloning and expression studies. E. coli strains and plasmid were kindly provided by Dr. S. P. Singh, Department of Veterinary Public Health, College of Veterinary and Animal Sciences, G. B. Pant University of Agriculture and Technology, Pantnagar.

PCR Cloning and Expression of the Phytase Gene.
Phytase gene sequence (GenBank accession number BL018) was retrieved from complete genome sequence of Bacillus licheniformis ATCC 14580 from KEGG genome database (http://www.genome.jp/kegg-bin/show organism?org=bli). Primers were designed from end regions of complete ORF. For the directional cloning, restriction sites for HindIII and XhoI were introduced at 5 ends of forward primer, PhyL F11 "CGAAGCTTATCATATGAACTTTTACAAAACG, " and reverse primer, PhyL R "GTGCTCGAGCCTTAT-TTGGCTCGTTTTTTCA, " respectively. The primers were custom-synthesized by SBS Gentech Co. Ltd. The PCR amplification was carried out using Pfu polymerase (Fermentas) for 30 cycles at 94 ∘ C for 45 sec, 50 ∘ C for 45 sec, and 72 ∘ C for 1 min with genomic DNA of Bacillus licheniformis strain PB-13 as template. For directional cloning of PCR product into pET32a(+), the amplified PCR fragment was restriction-digested with HindIII/XhoI and separated on agarose gel. The separated fragment was cut from the gel and purified with the QIAquick DNA purification kit (Qiagen). Purified HindIII/XhoI fragment was cloned into an HindIII/XhoI-cut pET32a(+) E. coli expression vector harbouring C-terminal His6 tag. The E. coli DH5 cells were transformed with recombinant plasmid. Recombinant plasmid from positive clone for phytase gene was isolated and transformed into expression host E. coli BL21-(DE3-) pLysS as per standardized protocol [21]. A colony was randomly picked from among the colonies observed on ampicillin selection plate. This was tested for presence of recombinant plasmid containing phytase gene using gene specific primers (PhyL F11 and PhyL R). The transformants were grown in LB broth containing ampicillin (100 g/mL), induced with the different amount of IPTG to optimize expression. For production analysis, samples were withdrawn at various times after induction and cells were pelleted, resuspended into 50 mM Tris-HCl buffer (pH 7.0, containing 1 mM CaCl 2 ) were sonicated on ice for 5 min with a pulse rate of 30 sec and a gap of 10 sec. Cell debris were removed by centrifugation at 10000 rpm for 30 min, 4 ∘ C. The recombinant protein from supernatant was assayed for crude phytase activity.

Phytase Assay.
Crude phytase activity was determined using 5 mM sodium phytate as substrate in 0.1 M sodium acetate buffer, with pH 5.5 following the method of Engelen et al. [22]. One unit was defined as the amount of enzyme that released 1 M of inorganic phosphate in 1 min. The amount of phosphate released was calculated based on standard curve of KH 2 PO 4 .   [23,24]. The evolutionary history was inferred using the neigbour-joining method [25]. The evolutionary distances were computed using the maximum composite likelihood method and are in the units of the number of base substitutions per site [26]. Evolutionary analyses were conducted in MEGA5.

In Silico Characterization of -Propeller Phytase Sequence.
PhyPB13 -propeller phytases sequence was used as probe NCBI protein database (http://www.ncbi.nlm.nih.gov/; accessed in June, 2012) to retrieve the 44 reference protein sequences of BPPhy used in this study ( Table 1). The protein sequences in FASTA format from RefSeq entries, which were shown to exhibit phytase activities, were selected for further in silico study. The sequences were characterized for homology, phylogenetic relationship, functional domain, and biophysical characteristics using available bioinformatic tools following methodology as adapted by Kumar et al. [15].

Result and Discussion
3.1. Cloning and Expression of Phytase. E. coli expression system is one of the simplest, cost-effective, and suitable systems for large scale production of recombinant proteins [27]. In the present study, we have used a soluble recombinant proteins expression system to express phytase from B. licheniformis PB-13. PCR amplification for the isolation of phytase gene resulted into an amplified PCR product of ∼1,150 bp as observed after electrophoresis on 1% agarose gel. Appearance of single band on gel revealed specific amplification of phytase gene using end-specific primers. This good quality PCR product was taken for restriction digestion using HindIII and XhoI restriction enzymes. E. coli DH5 was transformed with recombinant vector (pET32a + PhyPB13 phytase gene). E. coli BL21 (DE3) was used as an expression host, as it encodes the T7 RNA polymerase under the control of lacUV5 promoter [28]. Transformation of plasmid from positive clone to E. coli BL21 competent cells followed by induction with IPTG for 4 h resulted in expression of recombinant phytase by SDS-PAGE as an intense band of ∼66 kDa while no such band was observed in uninduced culture. The size of induced protein was consistent with the calculated value for the fusion protein (∼63 kDa), which includes an additional peptide sequence of about 20 kDa (175 amino acids) along with encoded phytase sequence of 381 amino acids (theoretical molecular weight ∼42 kDa). The additional sequence includes Trx-tag (109 amino acids; which increases solubility of expressed protein), S-tag (used in purification of recombinant proteins), His 6 -tag (role in purification), and linker sequence [28]. Despite the presence of this additional amino acid stretch, the recombinant phytase was found to be catalytically active. The recombinant phytase was designated as "rPhyPB13. " Transformed E. coli BL21 cells produced rPhyPB13 with an enzyme activity of 1.15 U/mL and specific activity of 0.92 U/mg proteins. It was comparable to wild type B. licheniformis PB-13 phytase in production media. B. licheniformis PB-13 produced 0.99 U/mL phytase in PSMWB media (phytase screening media supplemented with 10% wheat bran) with a specific activity of 0.70 U/mg proteins [20].

Sequencing and Characterization B. licheniformis PB-13
Phytase Gene Sequence. Sequencing of target insert from positive clone by automated DNA sequencer at Department of Biochemistry, University of Delhi (South Campus), New Delhi, resulted in a nucleotide sequence of 1,149 bp (GenBank accession number JX187608.1). Analysis of sequence using BlastN resulted into 99% identity of sequence with B. licheniformis phytase L precursor gene (GenBank accession number AF469936.1). The phylogenetic tree constructed using neighbor-joining method also showed similar classification. The nucleotide sequence was searched for open reading frame (ORF) using ORF Finder. Ten (10) ORFs of varying length starting from different frames were obtained. The largest sequence was present in frame +1 which corresponded to the true ORF for phytase gene as it was, which started from first nucleotide and ended with a stop codon. Also, it showed 99% similarity to phytase sequences present in GenBank database. This full length ORF designated as "PhyPB13" encoded a protein of 381 amino acid residues with a calculated molecular mass of 42.1 kDa. The nucleotide sequence along with translated protein sequence (GenBank accession number AFQ59979.1) using ExPASy translation tool contained a putative signal peptide of 29 amino acid residues starting from amino acid residue 1 to 29. A cleavage site was present between residues 29 and 30 ( Figure 1). Wang et al. [29] isolated a phyC gene of 1,146 bp from B. licheniformis encoding a peptide of 381 amino acids. The length of signal peptide in phyC was 31 amino acids. A BPPhy gene with an ORF of 1,074 bp (357 amino acid residues) and a signal peptide of 27 amino acid residues was isolated from P. nyakensis [30]. The amino acid composition of PhyPB13 protein sequence determined using ProtParam server revealed that Asp, Gly, Lys, and Ala were major amino acids constituting about 36% of PhyPB13. Cysteine was not observed in the sequence indicating that PhyPB13 did not bear disulfide bonds, which were believed to be essential for conformational stability and catalytic activity in several fungal phytases [29,31,32]. It was consistent with absence of cysteine in phytase from B. licheniformis [29].
Alignment of homologous sequences with Mega5 revealed presence of two conserved motifs, namely, "D-  (Figure 1). Similar motifs were reported in multiple sequence alignments of 66 BPPhy sequences by Huang et al. [30]. Like other Bacillus phytases, PhyPB13 did not show sequence homology with HAPhys. The conserved regions "RHGXRXP" and "HD" of HAPhys [33] were absent in PhyPB13. Functional domain analysis using pfam (http:// www.sanger.ac.uk/resources/software/) showed that the complete sequence (residues 1-381) was encoding a phytase enzyme. The sequence (residues 34-378) belongs to a thermostable phytase (3-phytase) superfamily (ID 50956) as indicated by Superfam (http://supfam.cs.bris.ac.uk/ SUPERFAMILY/hmm.html) analysis. This superfamily includes thermostable phytases such as phytase from B. amyloliquefaciens and the other Bacillus sp. with 6bladed beta-propeller fold structure. A putative conserved domain of phytase superfamily has been detected while performing a BlastP (http://blast.ncbi.nlm.nih.gov/Blast.cgi? PAGE=Proteins) similarity search analysis of PhyPB13 protein sequence. Further, the sequence appeared to be 99% identical to phyL precursor from B. licheniformis (GenBank accession number AAM74021.1). Alignment of PhyPB13 with phyL precursor sequence revealed that the sequences were different at three positions (PhyPB13 contains Leu, Lys, and Asn in place of Lys, Asp, and Asp at 33rd, 67th, and 281st positions, resp.).

Prediction of Three-Dimensional Structure of PhyPB13.
Analysis of suitable template for 3D structure model of PhyPB13 using Phyre2 server (http://www.sbg.bio.ic .ac.uk/phyre2/html/page.cgi?id=index) revealed B. amyloliquefaciens phytase (TS-Phy, PDB ID-1H6L) as the best template for 3D modeling based on number of aligned residues and quality of alignment, with a "confidence" score of 100% which indicated the probability that a match between PhyPB13 and TS-Phy was based on homology. A match with "confidence >90%" represents similar fold and high accuracy in the modeling of core protein. The identity between target sequence and template was ∼68%, atgaacttttacaaaacgctcgctttatcaacactcgcagcatccttatggtctccctca tggagcagtctcccccataacgaagctgcggctcacttagaattcacggtgactgccgat gcagagacagagccggtggatacccctgacgacgcggcagatgacccggcgatttgggtt

A E T E P V D T P D D A A D D P A I W V catccgaagcagcccgaaaaaagcaggctcatcaccacaaacaaaaagtcgggcttaatc
H P K Q P E K S R L I T T N K K S G L I gtctatgatttgaagggaaaacagcttgcggcctatccgtttggcaaattaaacaatgtc

S F K M S S Q T E G L A A D D E Y G K atgtacatcgccgaagaagacgttgcgatttggtctttcagcgccgagccggacggcgga M Y I A E E D V A I W S F S A E P D G G gataaaggaaaaatcgtcgatcgtgccgacggaccgcatctaacttctgatattgaaggg D K G K I V D R A D G P H L T S D I E G ctgacgatttactacggagaagaeggagaagggtatttgatcgcgtccagtcagggcgat L T I Y Y G E D G E G Y L I A S S Q G D aaccgctatgccatctatgaccggcgcgggaaaaacgactacgtcactgctttttcaatt N R Y A I Y D R R G K N D Y V T A F S I gaggacggcaaagaaatcgacgggacaagcgataccgatggaatcgacgtcatcggcttc E D G K E I D G T S D T D G I D V I G F ggcctcggcaaaacatatccatacggcatctttgtcgcccaagacggcgaaaatacggaa G L G K T Y P Y G I F V A Q D G E N T E aatggacaaccggccaatcagaacttcaaaattgtctcctgggaaaaaatcgccgacgcg N G Q P A N Q N F K I V S W E K I A D A ctggacgacaaacctgatatcgatgatcaggtcgatccccgaaaactgaaaaaccgagcc L D D K P D I D D Q V D P R K L K N R A aaataa
K * Figure 1: Translated protein sequence from PhyPB13 nucleotide sequence (1146 bp). Signal peptide sequence is present from amino acid residues 1-29 (sequence underlined); ↓ indicates cleavage site of signal peptide; * asterisk indicates stop codon; conserved sequences are highlighted.

M N F Y K T L A L S T L A A S L W S P S W S S L P H N E A A A H L E F T V T A D ↓
which revealed accuracy of model; as for extremely high accuracy models this number should be above 30-40% (http://www.sbg.bio.ic.ac.uk/phyre2/html/help.cgi?id=help). Tridimensional structure of TS-Phy was downloaded from PDB (PDB ID 1H6L) and its PDB ID was provided as template for 3D structure prediction of PhyPB13 protein sequence using SWISS-Model server. It features automated modeling of homooligomeric assemblies and modeling of essential metal ions and cofactors in protein structures [23,24]. Small E-value in sequence identity indicates that the TS-Phy and rPhyPB13 have a very similar sequence and good reliability of the alignment. The model has a six-bladed-propeller folding architecture [10] and 7 calcium binding sites in protein sequence predicted by 3DLigandSite [34]. Oh et al. [35] reported that an electronegative central channel accessible to solvent binds seven Ca 2+ ions and these Ca 2+ ions have been found important in catalytic activity and substrate binding of BPPhy. Valine at 100th position was found to be a putative ligand binding site with 4 contacts as predicted by 3DLigandSite [34]. It is present inside of the conserved region of BPPhys (residues 98-104) and might play an important role in the binding of substrate for enzyme catalysis.

In Silico Analysis and Characterization of BPPhy.
The accession numbers along with source organisms of 44 reference protein sequences of BPPhy are given in     " with minor differences (sequence information was not given) during analysis of several BPPhy sequences. In the present study, we have also observed the presence of highly conserved sequence "DG" towards its C-terminus. Aspartic acid at conserved Cterminal "DG" sequence in these BPPhy sequences might act as a proton donor to the oxygen atom of the scissile phosphomonoester bond and may play a role in catalytic mechanism of these enzymes. Similar role has been suggested for aspartic acid in conserved "HD" residues towards Cterminal in HAPhy sequences [36,37].
Evolutionary relationship among different sequences was studied using phylogenetic tree constructed by neighborjoining method ( Figure 2). Overall, three major clusters were observed in phylogenetic tree. Cluster "1" represented sequences of Bacillus with Paenibacillus species. The amino acid residues in sequences of this cluster were 380 ± 10 except for three sequences from Paenibacillus sp., that is, Y 003868637.1, YP 004639353.1, and ZP 07387907.1 which have length of 465, 461, and 469 amino acid residues, respectively. Cluster "2" represents BPPhy with the largest protein sequence in the range of 436-769, while cluster "3" had the smallest sequence with 331 to 364 amino acid residues ( Table 2).
Other biophysical features of all protein sequences are also given in Table 2. Molecular weight of sequences varied according to length of protein sequences in the range of 37-82 kDa. Isoelectric point (pI) was found between 4.1 and 6.4 with the majority of sequences having a pI value above 5. The pI values for the sequences were highest in cluster 2, followed by cluster 1 and 3, respectively. The instability index was used to measure in vivo half-life of a protein [38]. Analysis of instability index indicated uniformity among all sequences of BPPhy and was predicted to be below 40 for all sequences except phytase from C. phaeobacteroides (YP 001959943.1). Further, a majority of sequences have instability index less than 30, suggesting that these proteins exhibited good in vivo stability [38]. Aliphatic index of reported protein sequences ranged from 69 to 90, indicating the high thermostability of BPPhy enzymes. Aliphatic index of protein measures the relative volume occupied by aliphatic side chains of the amino acids: alanine, valine, leucine, and isoleucine. Globular proteins with high aliphatic index have high thermostability and an increase in aliphatic index suggests an increase in protein thermostability [39]. Superfam server based analysis of protein sequences revealed their similarity to thermostable phytase (3-phytase) superfamily (Table 3). This family represents phytases which are thermostable at high temperatures and have a distinct catalytic mechanism with removal of initial phosphorus from 3rd carbon of phytate ring. A total of 10 motifs with given parameters were reported by MEME analysis. The 29 amino acid residues long motif "1" "DDPAIWVHPHDPEKSRIIGTNKKSGLAVY" was observed in all 44 protein sequences, with a conserved sequence "DDPAIW[VI][HN]PK[DN]P[ESA]KS. " This sequence might be used for diversity and expression analysis of BPPhy enzymes. Functional domain analysis using BlastP search for this motif revealed that the sequence belongs to phytase superfamily (Table 4).

Conclusion
In conclusion, a -propeller phytase of 3-phytase family from B. licheniformis strain PB-13 was successfully expressed in E. coli BL21. Phylogenetic clustering, conserved motifs sequences, and variation among biochemical features of different BPPhy phytases in this study could be key information for screening of novel phytases and comparison with other classes of phytases, which might contribute in further classification and application of diverse BPPhys. Functional characterization of amino acid residues in conserved regions of BPPhy is required for determining their role in enzyme catalysis. Overall, this in silico analysis will be important for Biotechnology Research International 9

Conflict of Interests
On behalf of all contributing authors, it is declared that there is no conflict of interests regarding the publication of this paper.