In Silico Prediction of T and B Cell Epitopes of Der f 25 in Dermatophagoides farinae

House dust mites are major sources of indoor allergens for humans and induce asthma, rhinitis, dermatitis, and other allergic diseases. Der f 25 is a triosephosphate isomerase, representing the major allergen identified in Dermatophagoides farinae. The objective of this study was to predict the B and T cell epitopes of Der f 25. We analyzed the physicochemical properties, functional motifs and domains, and structure-based features of Der f 25 and predicted its linear B cell epitopes with the DNAStar protean system, BPAP, and the BepiPred 1.0 server and its T cell epitopes with NetMHCIIpan-3.0 and NetMHCII-2.2. Sequence and structure analysis showed that Der f 25 belongs to the triosephosphate isomerase family and exhibits a triosephosphate isomerase pattern (PS00171). Eight B cell epitopes (11–18, 30–35, 71–77, 99–107, 132–138, 173–187, 193–197, and 211–224) and five T cell epitopes (26–34, 38–54, 66–74, 142–151, and 239–247) were predicted in this study. These results can be used to benefit allergen immunotherapies and reduce the frequency of mite allergic reactions.


Introduction
House dust mites (HDM) are major sources of indoor allergens for humans and induce asthma, rhinitis, dermatitis, and other allergic diseases [1]. The allergens of the two major species, Dermatophagoides pteronyssinus (Der p) and Dermatophagoides farinae (Der f), coexist in most geographical regions, and a high proportion (up to 85%) of asthmatics are typically HDM allergic; hence, sensitization is regarded as a risk factor for developing asthma. Recently, a birth cohort study showed that sensitization to HDM at the age of 2 years was associated with current wheeze at the age of 12 years in both monosensitized and polysensitized children [2]. Previous studies reported fourteen D. farinae allergens (Der f 1-3, 6, 7, 10, 11, 13-18, and 22); seventeen further allergens belonging to twelve different groups were later identified from D. farinae extracts by proteomics combined with two-dimensional immunoblotting [3,4]. Among the newly identified D. farinae allergens, Der f 25 is a triosephosphate isomerase (TPI) with a molecular weight of 34 kDa. It gave positive reactions in 75.6% of dust mite allergic patients by immunoblotting and in 60% by skin prick testing, and it represents the major allergen in D. farinae [4].
Currently, specific immunotherapy is the only allergen-specific approach for the treatment of mite allergy. The administration of increasing doses of allergen extracts to patients is the method most commonly applied. However, the use of crude extracts has several disadvantages. It can induce severe anaphylactic side reactions or lead to sensitization towards new allergens present in the mixture [13,14]. Different strategies have been designed to overcome these negative effects, such as the use of allergen-derived B cell peptides, allergen-derived T cell epitope-containing peptides, or vaccination with allergen-encoding DNA [15]. Known epitopes for some of these mite allergens are described in detail in Cui's review [16]. However, there is no report on the epitopes of the Der f 25 allergen. In the present study, we identified the B and T cell epitopes of the Der f 25 allergen for the first time by an in silico approach, which suggests their potential utility in peptide-based vaccine design for mite allergy.
The homologous amino acid sequences were retrieved and aligned using Clustal X 2.1 [17]. A phylogenetic tree was constructed by the maximum-likelihood (ML) method on the basis of the JTT amino acid sequence distance implemented in MEGA 5.1 [18]; its reliability was evaluated by the bootstrap method with 1000 replications.

Physicochemical Analysis and Posttranslational Patterns and Motifs.
Physicochemical analysis of Der f 25, including molecular weight, theoretical pI, amino acid composition, instability index, aliphatic index, and grand average of hydropathicity (GRAVY), was performed with the ProtParam tool (http://web.expasy.org/protparam/). The original sequence was checked for the characteristic Der f 25 pattern, and the presence of functional motifs was examined further by using the Prosite database (http://prosite.expasy.org/) [20]. Biologically meaningful motifs and susceptibility to posttranslational modifications were derived from multiple alignments and the ScanProsite tool. Phosphorylation motifs with a probability of occurrence above 80% were analyzed by using NetPhos v2.0 (http://www.cbs.dtu.dk/services/NetPhos/) and NetPhosK v1.0 (http://www.cbs.dtu.dk/services/NetPhosK/) [22].

Homology Modeling and Validation.
The Der f 25 protein sequence was searched for homologs in the PDB. In addition, homologous templates suitable for Der f 25 were selected with the PSI-BLAST server (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and the SWISS-MODEL server (http://swissmodel.expasy.org/) [25,26]. The best template was taken from the results of these searches and used for homology modeling. The Der f 25 protein structure model was built through the alignment mode in SWISS-MODEL using the complete amino acid sequence. An initial structural model was generated and checked for errors in the 3D structure with the PROCHECK [27], ERRAT [28], and VERIFY 3D [29] programs on the structural analysis and verification server (SAVES) (http://nihserver.mbi.ucla.edu/SAVES/). The quality of the final Der f 25 model structure was assessed by QMEAN [30], by checking protein stereology with the ProSA program [31], and by checking the protein energy with ANOLEA (http://protein.bio.puc.cl/cardex/servers/anolea/) [32]. The Ramachandran plot for all the models was generated, showing the majority of the protein residues in the favored regions.

Conservation Analysis and Poisson-Boltzmann Electrostatic Potential.
The Der f 25 model was submitted to the ConSurf server (http://consurf.tau.ac.il/) to generate evolutionarily related conservation scores, which help to identify functional regions in proteins. Functional and structural key residues in the Der f 25 sequence were confirmed with the ConSeq server [33].
APBS molecular modeling software implemented in PyMOL 0.99 was used to investigate the electrostatic Poisson-Boltzmann (PB) potentials of the Der f 25 model structure. AMBER99 in the PDB2PQR server (http://nbcr-222.ucsd.edu/pdb2pqr_1.8/) was used to assign the charges and radii to all of the atoms (including hydrogens) [34]. Fine grid spaces of 0.35 Å were used to solve the linearized PB equation in sequential focusing multigrid calculations in a mesh of 130 points per dimension at 310.00 K. The dielectric constants were 2.0 for the protein and 80.0 for water. The output mesh was processed in the scalar OpenDX format to render isocontours and maps onto the surfaces with PyMOL 0.99. Potential values are given in units of kT per unit charge (k, Boltzmann's constant; T, temperature).
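As a worked illustration of the potential unit, the following minimal sketch converts one reduced unit into millivolts at the temperature used here (310 K). It assumes that the potentials are expressed in the reduced unit kT/e that is customary for PB calculations; the function name is hypothetical and is not part of APBS or PyMOL.

```python
# Convert the reduced electrostatic potential unit kT/e into millivolts.
# Assumption: potentials are reported in kT/e, the customary reduced unit.

BOLTZMANN_J_PER_K = 1.380649e-23       # Boltzmann constant (exact SI value)
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge (exact SI value)


def kt_per_e_millivolts(temperature_k: float) -> float:
    """Return the value of kT/e in millivolts at the given temperature."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    volts = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return volts * 1000.0


# Temperature of the PB calculation described in the text.
unit_mv = kt_per_e_millivolts(310.0)
print(f"1 kT/e at 310 K = {unit_mv:.2f} mV")
```

Under this assumption, one unit corresponds to roughly 26.7 mV, so an isocontour drawn at a few kT/e represents a potential on the order of 0.1 V.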

In Silico Prediction of B Cell Epitopes.
Three immunoinformatics tools, the DNAStar protean system, the bioinformatics predicted antigenic peptides (BPAP) system (http://imed.med.ucm.es/Tools/antigenic.pl), and the BepiPred 1.0 server (http://www.cbs.dtu.dk/services/BepiPred/), were used to predict the B cell epitopes of Der f 25. The final consensus epitopes were obtained by combining the results of the three tools according to the method published earlier [35]. In the DNAStar protean system, four properties of the amino acid sequence (hydrophilicity, flexibility, accessibility, and antigenicity) were chosen as parameters for epitope prediction. The BPAP system and the BepiPred 1.0 server require only the amino acid sequence and provide more straightforward results, which combine physicochemical properties of amino acids such as hydrophilicity, flexibility, accessibility, turns, and exposed surface [36].
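The idea of merging the output of several predictors into consensus residue ranges can be sketched as follows. This is only an illustration: the combination rule of [35] is not described in this paper, so the threshold of two out of three tools is an assumption, and the input ranges are invented toy values rather than output of DNAStar, BPAP, or BepiPred.

```python
from typing import Dict, Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive, 1-based residue range


def consensus_ranges(tool_ranges: Iterable[Iterable[Range]],
                     min_votes: int = 2) -> List[Range]:
    """Merge per-tool residue ranges into consensus ranges.

    A residue is kept when at least `min_votes` tools cover it
    (hypothetical rule, not the published method of [35]).
    """
    votes: Dict[int, int] = {}
    for ranges in tool_ranges:
        covered = set()
        for start, end in ranges:
            covered.update(range(start, end + 1))
        # Each tool votes at most once per residue.
        for pos in covered:
            votes[pos] = votes.get(pos, 0) + 1

    kept = sorted(pos for pos, n in votes.items() if n >= min_votes)

    # Collapse consecutive residues back into inclusive ranges.
    merged: List[Range] = []
    for pos in kept:
        if merged and pos == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], pos)
        else:
            merged.append((pos, pos))
    return merged


# Toy input only; these ranges are not predictions for Der f 25.
tool_a = [(5, 12), (40, 46)]
tool_b = [(6, 14), (39, 45)]
tool_c = [(6, 12), (60, 65)]
consensus = consensus_ranges([tool_a, tool_b, tool_c])
print(consensus)
```

Raising `min_votes` to 3 would keep only residues on which all three tools agree, which trades sensitivity for specificity.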

In Silico Prediction of T Cell Epitopes.
T cell epitopes are principally predicted indirectly by identifying the binding of peptide fragments to the MHC complexes. The binding significance of each peptide to the given MHC molecule is based on the estimated strength of binding exhibited by a predicted nested core peptide at a set threshold level. For HLA-DR-based T cell epitope prediction, the artificial neural network-based alignment (NN-align) method NetMHCIIpan-3.0 (http://www.cbs.dtu.dk/services/NetMHCIIpan/) [37] was applied. For HLA-DQ alleles, NetMHCII-2.2 (http://www.cbs.dtu.dk/services/NetMHCII/) [38] was used. In this study, HLA-DR 101, HLA-DR 301, HLA-DR 401, and HLA-DR 501 were used for HLA-DR-based T cell epitope prediction. The final HLA-DR-based T cell epitope results were obtained by combining the four results: if three of them indicated an epitope, the consensus result was an epitope. The same method was used for HLA-DQ-based T cell epitope prediction, with HLA-DQA10101-DQB10501, HLA-DQA10501-DQB10201, HLA-DQA10501-DQB10301, and HLA-DQA10102-DQB10602. The final consensus epitope results were obtained by combining the HLA-DR-based and HLA-DQ-based T cell epitope results.
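The allele voting rule described above can be sketched in a few lines. The sketch assumes that "three of them" means at least three of the four alleles; the allele labels and peptide ranges are placeholders, not NetMHCIIpan-3.0 or NetMHCII-2.2 output.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Peptide = Tuple[int, int]  # inclusive, 1-based residue range of a peptide


def consensus_binders(per_allele_hits: Dict[str, Set[Peptide]],
                      min_alleles: int = 3) -> List[Peptide]:
    """Return peptides predicted as epitopes for at least `min_alleles` alleles."""
    counts: Counter = Counter()
    for hits in per_allele_hits.values():
        counts.update(set(hits))  # one vote per allele and peptide
    return sorted(p for p, n in counts.items() if n >= min_alleles)


# Placeholder predictions for four hypothetical alleles (toy data only).
hits = {
    "allele_1": {(3, 11), (20, 28)},
    "allele_2": {(3, 11), (100, 108)},
    "allele_3": {(3, 11), (20, 28)},
    "allele_4": {(20, 28)},
}
selected = consensus_binders(hits)
print(selected)
```

Here (100, 108) is dropped because only one allele supports it, whereas the other two peptides each reach the three-allele threshold.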
B cell and T cell epitopes identified by the computational tools were mapped onto the linear sequence and onto the three-dimensional model of Der f 25 to determine their positions and the secondary structure elements involved.

Sequence Retrieval and Sequence Analysis.
The amino acid sequence of Der f 25 was obtained from the Nucleotide database of NCBI. UniProt and tBLASTn were used to search for homologous sequences of Der f 25, and thirty-six sequences were obtained. To determine the relationships between Der f 25 and its homologous sequences, phylogenetic analysis was performed; the evolutionary tree inferred by the ML method is shown in Figure 1. The phylogenetic analysis showed that Der f 25 clustered with other proteins in the same group, belonging to the TPIs. Moreover, domain analysis showed that Der f 25 belongs to the TIM phosphate binding superfamily (SUPERFAMILY number SSF51351 and InterPro number IPR016040) and the TPI family (SUPERFAMILY number SSF51352 and InterPro number IPR000652).
After searching for characteristic motifs or patterns, we found that Der f 25 exhibited a TPI pattern, PS00171 (162-172, AYEPVWAIGTG) (Figure 2). Phosphorylation sites, including two Ser (95 and 221) and two Thr (146 and 171) residues, were predicted and are shown in Figure 2. Two types of kinases (PKC for 95, 146, and 171 and DNAPK for 221) were predicted to phosphorylate the complete Der f 25 sequence.
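The reported motif coordinates can be checked for internal consistency with a short sketch: an 11-residue motif starting at position 162 must end at position 172. Because the full Der f 25 sequence is not reproduced here, the example embeds the motif in a placeholder sequence of 247 residues; the flanking residues are invented padding, and the helper name is hypothetical.

```python
from typing import Optional, Tuple


def motif_span(sequence: str, motif: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based inclusive span of the first exact motif match."""
    index = sequence.find(motif)
    if index < 0:
        return None
    return index + 1, index + len(motif)


TPI_MOTIF = "AYEPVWAIGTG"  # motif reported in the text

# Placeholder sequence: invented padding around the motif, 247 residues long.
placeholder = "M" * 161 + TPI_MOTIF + "K" * 75

span = motif_span(placeholder, TPI_MOTIF)
print(len(placeholder), span)
```

An exact string search is the simplest possible check; a real ScanProsite query matches the degenerate PROSITE pattern rather than one literal peptide.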
The primary structure of Der f 25 contains 247 amino acids, and the calculated molecular weight is 27134.1 Da. The theoretical pI is 6.24 and the aliphatic index is 95.06. The GRAVY is −0.103, a negative value indicating that Der f 25 has a hydrophilic character. The instability index is 30.57; because values below 40 are classified as stable by ProtParam, the Der f 25 sequence is predicted to be stable.
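The interpretation of these two indices can be made concrete with a small sketch. GRAVY is the mean Kyte-Doolittle hydropathy of all residues, and an instability index below 40 is conventionally read as stable. The function names are illustrative and are not the ProtParam implementation; the GRAVY call below uses the short TPI motif only as a demonstration input, not the full protein.

```python
from typing import Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def interpret(gravy_value: float, instability_index: float) -> Tuple[str, str]:
    """Classify hydropathy by sign and stability by the threshold of 40."""
    hydropathy = "hydrophilic" if gravy_value < 0 else "hydrophobic"
    stability = "stable" if instability_index < 40 else "unstable"
    return hydropathy, stability


# Values reported in the text for the complete Der f 25 sequence.
labels = interpret(-0.103, 30.57)
print(labels)

# Demonstration of the GRAVY calculation on the 11-residue TPI motif.
motif_gravy = gravy("AYEPVWAIGTG")
print(round(motif_gravy, 3))
```

The motif alone has a positive GRAVY, which shows that a locally hydrophobic segment is compatible with a slightly hydrophilic whole protein.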

Homology Modeling and Validation.
A search of the PDB for proteins with known tertiary structure yielded Tenebrio molitor TPI (PDB accession number: 2I9E), which showed the highest sequence identity (74%) with Der f 25. The SWISS-MODEL server was also used to identify the best possible template and returned a high score of 365 and a very low E-value of e-101 for the 2I9E template. Hence, the 2I9E template was used for homology modeling. As indicated by the Ramachandran plot (Figure 3(b)), 93% of the residues in the Der f 25 model were within the most favored regions, 7% in the additional allowed regions, 0% in the generously allowed regions, and 0% in the disallowed regions; for the 2I9E template, 93.4% of the residues were within the most favored regions, 6.6% in the additional allowed regions, 0% in the generously allowed regions, and 0% in the disallowed regions. The goodness factor (G-factor), based on the observed distribution of stereochemical parameters (main chain bond angles, bond lengths, and phi-psi torsion angles), returned values consistent with a reliable model (Table 1). Based on these validations, the homology model was adopted for this study.

Structure Analyses.
Secondary structure prediction of Der f 25 with PSIPRED identified ten α-helices and seven β-sheets (Figure 2). Alternatively, NetSurfP v1.1 predicted nine α-helices and eight β-sheets. These results were predicted by different servers and differ only slightly. The best template, 2I9E, was used for homology modeling; the overall 3D structure of Der f 25 is shown in Figure 3(a). Sequence polymorphism was responsible for the changes in the spatial distribution of the skeleton alpha carbons, which is reflected in differences between the structures of Der f 25 and the 2I9E template; the superimposition is shown in Figure 3(b), and the RMSD value for the superimposed structures is 0.062 Å. As a TPI protein, Der f 25 has two active sites: His at position 94 is an electrophile, while Glu at position 164 is the proton acceptor. It has two substrate binding sites, Asn at position 10 and Lys at position 12. Moreover, the characteristic pattern predicted by the ScanProsite tool is shown in Figure 3(c).
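For reference, the root mean square deviation between two sets of paired alpha carbon coordinates can be computed as in the following minimal sketch. It assumes that the two structures have already been superimposed and that atoms are paired one to one; the coordinates are toy values, not taken from the Der f 25 model or from 2I9E.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD in the input length unit for paired, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms; each atom is displaced by 0.1 along one axis.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
template = [(0.0, 0.0, 0.1), (1.0, 0.0, -0.1), (2.0, 0.1, 0.0)]

value = rmsd(model, template)
print(f"RMSD = {value:.3f} A")
```

A full comparison would first find the optimal rigid-body superposition (for example with the Kabsch algorithm) before applying this formula.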

Conservational Analysis and Electrostatic Potential.
ConSurf conservational analysis of structural and functional key amino acids showed that the Der f 25 protein surfaces were not well conserved, with almost forty high variability residues in different superficial areas. All of the amino acids in the TPI pattern (AYEPVWAIGTG) are conserved. Surface electrostatic potential analysis revealed several prominent charged residues, with half of the side exhibiting large positive values (blue regions) and the other half showing predominantly negative values (red regions) (Figure 3(d)).
The final results of the three immunoinformatics tools predicted eight peptides (11-18, 30-35, 71-77, 99-107, 132-138, 173-187, 193-197, and 211-224) (Table 2), and these peptides are also shown in Figures 2 and 4.
... and the ultimate ... were also shown in Table 3.
Table 1 footnotes. MCBL: distribution of the main chain bond lengths; CBA: distribution of the covalent bond angles. E: residues in favorable regions; F: residues in allowed regions; G: residues in generally allowed regions; H: residues in disallowed regions; I: G-factor score of the dihedral bonds; J: G-factor score of the covalent bonds; K: overall G-factor score; L: root mean square deviation between the Cα atoms of the Der f 25 structure and the 2I9E template.

Discussion
The prevalence of human atopic disorders, including allergic rhinitis, asthma, and atopic dermatitis, has been increasing over the past several decades. House dust mite allergy accounts for more than 50% of allergic patients and often takes severe forms of respiratory allergy, such as asthma [1]. Characterization of mite allergens will be beneficial in the diagnosis and treatment of mite-induced atopic illnesses. Among the identified allergens, Der f 25 is a new protein with a molecular weight of 34 kDa, representing the major allergen in D. farinae [4]. The objective of this study was to predict the B and T cell epitopes of Der f 25. First, to better understand the structure and function of Der f 25, we analyzed its basic sequence properties and studied its 2D and 3D structures. Phylogenetic analysis showed that the Der f 25 protein clustered into the TPI group; domain analysis also provided strong evidence that Der f 25 belongs to the TPI family. The 2D structure analysis showed that Der f 25 is composed of ten α-helices and seven β-sheets. The 3D structure of Der f 25 was built by homology modelling, which is widely used in many areas of structure-based analysis [39]. The PDB was searched for templates of Der f 25, and the structure of Tenebrio molitor TPI (2I9E) was found to be the best template, with the highest identity. Only small differences in RMSD were observed between Der f 25 and the 2I9E template. The built model structure was supported by the Ramachandran plot analysis, the ERRAT program, the VERIFY 3D program, ProSA analysis, the validation score values, and RMSD. All of these validations showed that the homology model was reliable. Similar methods for structure modelling have also been applied successfully to other allergens, including Api SI and Api SII [40], Der f 5 [41], Ole e 2 [42], Ole e 11 [39], and Ole e 12 [43].
The conservational analysis of the primary sequence of Der f 25 showed that almost forty high variability residues lie in different superficial areas of the protein surface. The active sites, His at position 94 and Glu at position 164, as well as the substrate binding sites, Asn at position 10 and Lys at position 12, are completely conserved.
In silico prediction has already become a familiar and useful tool for selecting epitopes from immunologically relevant proteins, which can save the expense of synthetic peptides and working time [44]. Recently, many algorithms have been developed to predict B cell epitopes on a protein sequence based on propensity values of amino acid properties such as hydrophilicity, antigenicity, segmental mobility, flexibility, and accessibility [45]. In the present study, we used three algorithms (DNAStar protean system, BPAP, and BepiPred 1.0 server) to predict the B cell epitopes. A previous study showed that bioinformatics prediction of B cell epitopes correlated well with the experimental approach [46]. An earlier study showed that allergen epitopes comprise a high proportion of hydrophobic amino acids [47]. The amino acids Ala, Ser, Asn, Gly, and particularly Lys play a key role in IgE-binding allergenic epitopes [48]. In our results, nearly half of the residues lying in B cell epitopes were hydrophobic (Table 2). Moreover, each predicted B cell epitope contains one or more of these five amino acids, and the residues common to all B cell epitopes were Gly and Lys (Table 2). Electrostatic interactions are known to determine the orientation of the molecules and stabilize antigen-antibody complexes [49]. The surface electrostatic potential analysis showed that a large part of the Der f 25 side exhibits large positive values (blue regions). Most parts of the B cell epitopes are distributed in the blue regions and showed a strong negative potential. As a result, eight peptides (11-18, 30-35, 71-77, 99-107, 132-138, 173-187, 193-197, and 211-224) were predicted as the B cell epitopes. However, these B cell epitopes need further investigation in clinical samples.
In the last several years, some methods for predicting T cell epitopes, such as NetMHCIIpan-3.0 and NetMHCII-2.2, have substantially improved in accuracy. NetMHCIIpan-3.0 is based on artificial neural networks and is trained on 52,062 quantitative peptide binding data covering all HLA as well as two mouse molecules. In this study, it was used to predict the HLA-DR-based T cell epitopes. For HLA-DQ-based T cell epitope prediction, NetMHCII-2.2 was used. Although limited binding-affinity data are available for HLA-DQ, it was recently reported to provide the best performance in predicting this locus [50]. As a result, NetMHCIIpan-3.0 and NetMHCII-2.2 were used to predict the T cell epitopes in the Der f 25 allergen and predicted five potential T cell epitope sequences: 26-34, 38-54, 66-74, 142-151, and 239-247. Despite the high accuracy of these predictions, this approach has not yet been applied to peptide-based vaccine development for allergic diseases.
Allergen-specific immunotherapy (SIT) represents the only allergen-specific and disease-modifying approach with long-lasting effects for the treatment of allergic patients [51]. However, SIT can induce side effects, ranging from mild and local to severe and life-threatening symptoms, such as anaphylactic shock [52]. Severe side effects are frequently observed in patients with house dust mite (HDM) allergy [53]. The continuous exposure to HDM allergens further complicates the treatment of patients with HDM allergy. Additionally, the quality of natural HDM allergen extracts and of vaccines based on these extracts is often poor. Attenuated allergenic molecules, that is, hypoallergens or synthetic peptide fragments, have been used as high-dose and safer alternatives to conventional extract-based SIT [54]. Vaccination with a combination of small peptides that together extend across the entire native allergenic protein could theoretically preserve T cell activation while avoiding IgE-based immune responses. IgE recognizes conformational epitopes of larger peptides (B cell epitopes) and proteins, while T cell receptors recognize small linear peptides of 8 to 10 amino acids (T cell epitopes). By immunizing with small peptides, T cell activation could occur while IgE binding would be lost [55,56]. Therefore, we predicted the B and T cell epitopes of the Der f 25 allergen, the major allergen in HDM, using an in silico method; these predictions can be used to benefit allergen immunotherapies and reduce the frequency of allergic reactions. However, their accuracy needs to be confirmed in further experiments.

Conclusion
In this study, we have a better understanding of the 2D and 3D structures of Der f 25 and have predicted eight B cell