In Silico Screening and Molecular Dynamics Simulation of Disease-Associated nsSNP in TYRP1 Gene and Its Structural Consequences in OCA3

Oculocutaneous albinism type III (OCA3), caused by mutations of TYRP1 gene, is an autosomal recessive disorder characterized by reduced biosynthesis of melanin pigment in the hair, skin, and eyes. The TYRP1 gene encodes a protein called tyrosinase-related protein-1 (Tyrp1). Tyrp1 is involved in maintaining the stability of tyrosinase protein and modulating its catalytic activity in eumelanin synthesis. Tyrp1 is also involved in maintenance of melanosome structure and affects melanocyte proliferation and cell death. In this work we implemented computational analysis to filter the most probable mutation that might be associated with OCA3. We found R326H and R356Q as most deleterious and disease associated by using PolyPhen 2.0, SIFT, PANTHER, I-mutant 3.0, PhD-SNP, SNP&GO, Pmut, and Mutpred tools. To understand the atomic arrangement in 3D space, the native and mutant (R326H and R356Q) structures were modelled. Finally the structural analyses of native and mutant Tyrp1 proteins were investigated using molecular dynamics simulation (MDS) approach. MDS results showed more flexibility in native Tyrp1 structure. Due to mutation in Tyrp1 protein, it became more rigid and might disturb the structural conformation and catalytic function of the structure and might also play a significant role in inducing OCA3. The results obtained from this study would facilitate wet-lab researches to develop a potent drug therapies against OCA3.


Introduction
Oculocutaneous albinism type 3 (OCA3) is an autosomal recessive disorder characterized by reduced biosynthesis of melanin pigment in the hair, skin, and eyes [MIM 203290]. This disorder is mostly caused by the genetic mutation in TYRP1 gene. OCA3 is also known as Rufous oculocutaneous albinism. The human TYRP1 gene consists of 8 exons and 7 introns, spanning almost 15-18 kb of genomic DNA in the region of 9p23 [1][2][3][4]. This gene encodes a protein called Tyrosinase-related protein 1 (Tyrp1), has a molecular weight of ∼75 kDa, and appears to be the most abundant melanosomal protein of the melanocyte [5,6]. Tyrp1 is comprising of 537 amino acid residues and shares 40-52% of amino acid homology to tyrosinase.
The tyrosinase-related family includes tyrosinase, tyrosinaserelated protein 1 (Tyrp1), and tyrosinase-related protein 2 (Tyrp2) involved in this enzymatic process that converts tyrosine to melanin pigments. Certainly, two types of melanin are produced by melanocytes, which are pheomelanins (red or yellow) and eumelanins (brown or black) [7]. The first two steps of both eumelanin and pheomelanin production involve tyrosinase catalysing the conversion of tyrosine to 3,4-dihydroxy-L-phenylalanine (DOPA) and of DOPA to DOPA quinone [8,9]. Then pheomelanogenesis seems to be the default pathway in the absence of MC1R signalling, with a low tyrosinase activity and a high concentration of thiolic compounds, such as cysteine. In another way, eumelanin synthesis requires -MSH binding to MC1R [10,11], which transcriptionally activates tyrosinase and upregulates Tyrp1 and Tyrp2 [12][13][14]. In addition to their roles in pigmentation, tyrosinase family proteins also influence the biology of melanocyte and melanoma. There is evidence that Tyrp1 is involved in the maintenance of melanosome structure and affects melanocyte proliferation and cell death [15][16][17][18]. Based on its homology to tyrosinase, Tyrp1 has also been speculated to be another tyrosinase or reveal the tyrosinaselike activity. Tyrp1 shows tyrosine hydroxylase activity, albeit under low substrate (L-tyrosine) concentration, but no DOPA oxidase activity [19,20]. Based on that human Tyrp1 is involved in conversion of L-tyrosine to DOPA with low turnover rates, sufficient to prime the system by the generation of low amounts of DOPA, a necessary co-factor for tyrosinase activity [21]. Tyrp1 has also been attributed with various other catalytic functions including dopachrome tautomerase (Dct), dihydroxyindole (DHI) oxidase [22] and 5,6-dihydroxyindole-2-carboxylic acid (DHICA) [23]. Mutagenesis studies have recently confirmed that Tyrp1 is actively involved in inactivation of the catalytic activity of tyrosinase [24]. Observing the more number of pathological genetic variants and their structural and functional aspects of OCA3 will aid in development of personalized medicine.
Several computational algorithms used for the accurate prediction of OCA3 uncharacterized alleles for their disease related property. Mutations involved in OCA3 disorder are hard to scrutinize using in vivo examinations. Hence, an efficient experimental design specific to these diseases are mandatory to observe the disease associated mutation of respective SNPs. Several research articles have stated is effectiveness in identifying the deleterious and diseaseassociated mutations, thus predicting the pathogenic nsSNPs in correlation to their functional and structural damaging properties [25][26][27][28]. Computational studies have previously provided an efficient platform for evaluation and analysis of genetic mutations for their pathological consequences and in determining their underlying molecular mechanism [27][28][29][30][31][32][33]. Moreover the conformational changes in the 3D structure of the protein account for the changes in its time dependent physiological affinities and various biochemical pathway alterations [34][35][36][37]. Here we used set of computational platforms that utilizes sequence-based conservation profile, homology-based structure profile information, and support vector algorithm used to examine the disease associated nsSNPs. In this study we have applied a set of tools like PolyPhen 2.0 [38], SIFT [39], I-mutant 3.0 [40], PAN-THER [41], PhD-SNP [42], SNP&GO [43], Pmut [44], and MutPred [45] to show greater accuracy for the prediction of most disease-associated mutations in OCA3 gene and their structural consequence. Further, we carried out molecular dynamic simulations (MDS) to analyse the molecular and structural basis of predicted disease associated nsSNPs. MDS were applied to observe the motion trajectory and atomic interaction of native and mutant (R326H and R356Q) Tyrp1 protein. The overall strategy implemented in this work is shown in Figure 1.

2.1.
Dataset. The data on human TYRP1 genes were collected from OMIM [46] and Entrez gene on National Center for Biotechnology Information (NCBI) Website. The SNP information of TYRP1 gene was obtained from dbSNP (http://www.ncbi.nlm.nih.gov/snp/) [47] and Swissprot databases [48][49][50]. The amino acid sequence of Tyrp1 protein was retrieved from the Uniprot database (Uniprot ID: P17643). In order to build the mutant structures, we induced the point mutations in the position of 326 and 356 of Tyrp1 protein using SPDB viewer package [51]. These structures were energetically optimized by applying the all atom OPLS force field available in GROMACS package 4.5.3 [52].

Disease Related SNP Prediction.
The single nucleotide polymorphism occurring in the protein coding region may lead to the deleterious consequences and might affect its 3D structure. Here we applied PolyPhen 2.0 [38], SIFT [39], I-Mutant 3.0 [40], PANTHER [41], PhD-SNP [42], SNP&GO [43], Pmut [44], and MutPred [45] tools in order to examine the disease-associated nsSNP occurring in the Tyrp1 protein coding region. PolyPhen 2.0 is based on combination of sequence and structure based attributes and uses naive Bayesian classifier for the identification of an amino acid substitution and the impact of mutation. The output levels of probably damaging and possibly damaging were classified as functionally significant (≤0.5) and the benign level being classified as tolerated (≥0.51) [38]. SIFT prediction is based on the sequence homology and the physicochemical properties of amino acids which are dictated by the substituted amino acid. SIFT score ≥0.05 indicates the amino acid substitution is intolerant or deleterious, whereas the score ≤0.05 predicted it as tolerant [39]. I-Mutant 3.0 is a support vector machine (SVM) based tool. We used the sequence based version of I-Mutant 3.0 that classifies the prediction into three classes: neutral mutation (−0.5 ≤ DDG ≥ 0.5 kcal/moL), large decrease (<−0.5 kcal/moL), and a large increase (>0.5 kcal/moL). The free energy change (DDG) predicted by I-Mutant 3.0 is based on the difference between unfolding Gibbs free energy change of mutant and native protein (kcal/moL) [40]. PANTHER program is a protein family and subfamily database which predicts the frequency of occurrence of amino acid at a particular position in evolutionary related protein sequences. The threshold subPSEC score of −3 has been assigned below which the predictions are considered as deleterious [41]. We filtered the nsSNPs that were combinedly predicted to be deleterious and damaging from these four servers. Further we used PhD-SNP, SNP&GO, Pmut, and MutPred tools to examine the disease-associated nsSNPs. PhD-SNP is SVM based classifier, trained over the million amino acid polymorphism datasets using supervised training algorithm [42]. It predicts whether the given amino acid substitution leads to disease associated or neutral along with the reliability index score [42]. SNP&GO retrieves data from protein sequence, evolutionary information, and functions as encoded in the gene ontology terms [43]. Pmut is a neural network based program which is trained on large database of neutral and pathological mutations [44]. Pmut uses 3 parameters including mutation descriptors, solvent accessibility, and residue and sequence properties to calculate the pathogenicity indexes of given input mutation data ranging from 0 to 1. The mutations with index score greater than 0.5 are predicted to be pathologically significant [44]. MutPred is a web based tool, used to predict the molecular changes associated with amino acid variants [45]. It uses SIFT, PSI-BLAST, and Pfam profiles along with some structural disorder prediction algorithms, including TMHMM, MARCOIL, I-Mutant 2.0, B-factor prediction, and DisProt [45]. Functional analysis includes the prediction of DNA-binding site, catalytic domains, calmodulinbinding targets, and posttranslational modification sites [45]. Combining the scores of all four servers, the accuracy of prediction rises to a greater extent and finally we filtered the most disease-associated mutation.

Modelling of Native and Mutant TYRP1
Proteins. According to the annotated information available in UNIPROT entry-P17643, the predicted deleterious mutation sites of Tyrp1 protein were observed in the topological domain which comprised between the regions 190-385. Hence, we modeled Tyrp1 protein segment which consists of 196 amino acid residues by I-TASSER server [53]. This program works by combining the folds and secondary structure by profileprofile alignment threading techniques for non-aligned regions. For the submitted sequences, five 3D models were obtained and the best model was selected based on the lowest energy. Further the native structure was mutated with the most deleterious substitution predicted in this study. In order to build the mutant structures, we made a point mutation in native Tyrp1 protein at R326H (arginine to histidine) and R356Q (arginine to glutamine) using SPDB viewer [51]. The native and mutant structures were energetically optimized by applying the all atom OPLS force field available under the GROMACS 4.5.3 package [52]. The quality of model structures was verified using the PROCHECK [54] and PROSA [55] programs.

Molecular Dynamics
Simulation. Molecular dynamics simulation was performed by using gromacs 4.5.3 package [52] running on a single Intel Core2Duo machine with 3 GB RAM and running Ubuntu 11.10 Linux package. Structure of native and mutant Tyrp1 protein was used as starting point for MD simulations. Systems were solvated in a cubic box with simple point charge (SPC) water molecules at 10Å marginal radius. At physiological pH the structures were found to be negatively charged; thus in order to make the simulation system electrically neutral, we added 10 sodium ions (Na+) to the simulation box using the "genion" tool that accompanies with gromacs package. Initially the solvent molecules were relaxed while all the solute atoms were harmonically restrained to their original positions with a force constant of 100 kcal/moL for 5000 steps. After this, whole molecular system was subjected to energy minimization for 5000 iterations by steepest descent algorithm implementing GROMOS96 43a1 force field. Berendsen temperature coupling method [56] was used to regulate the temperature inside the box. Electrostatic interactions were computed using the Particle Mesh Ewald method [57]. The ionization states of the residues were set appropriate to pH 7 with all histidines assumed neutral. The pressure was maintained at 1 atm with the allowed compressibility range of 4.5e − 5 atm. SHAKE algorithm was used to constrain bond lengths involving hydrogen, permitting a time step of 2 fs. Van der Waals and coulomb interactions were truncated at 1.0 nm. The nonbonded pair list was updated every 10 steps and conformations were stored every 0.5 ps. Position restraint simulation for 500 ps was implemented to allow solvent molecules to enter the cavity region of structure. Finally, systems were subjected to MD simulation for 20 ns. We then computed the comparative analysis of structural deviations in native and mutant structure. RMSD, RMSF, SASA, Rg, DSSP, and density plot analysis were carried out by using g rms, g rmsf, g sas, g gyrate, do dssp, and g density tool, respectively. Number of distinct hydrogen bonds formed by specific residues to other amino acids within the protein during the simulation (NH bond) was calculated using g hbond. NH bond determined on the basis of donor-acceptor distance smaller than 0.35 nm and of donor-hydrogen-acceptor. All the graphs were plotted using XMGRACE [58] program.

Principal Component Analysis.
The calculation of the eigenvectors and eigenvalues, and their projection along the first two principal components, was carried out using essential dynamics (ED) method according to protocol [59] within the GROMACS software package. The principle component analysis or ED is a technique that reduces the complexity of the data and extracts the concerted motion in simulations that are essentially correlated and presumably meaningful for biological function [59]. In the ED analysis, a variance/covariance matrix was constructed from the trajectories after removal of the rotational and translational movements. A set of eigenvectors and eigenvalues was identified by diagonalizing the matrix. The eigenvalues represents the amplitude of the eigenvector along the multidimensional space, and the displacement of atoms along each eigenvector shows the concerted motions of protein along each direction. The movements of structures in the essential subspace were identified by projecting the Cartesian trajectory coordinates along the most important eigenvectors from the analysis. Backbone C-alpha bonds trajectories were obtained using g covar and g anaeig of gromacs utilities.

Results and Discussion
To determine the deleterious nonsynonymous single nucleotide polymorphisms (nsSNPs), which might be involved in inducing disease associated phenomena, is now among the most important field of computational genomic research. The disease associated mutations can be identified with the help of genome sequencing and its analysis. The advanced method in computational biology has now enabled us to determine the deleterious nsSNPs in the target candidate genes. Computational methods were applied to study the protein structural and functional effect on point mutation at molecular level. In this investigation we implemented multiple computational methods to identify the most likely pathogenic mutations in TYRP1 gene. Our results also revealed that implementations of different algorithms often serve as powerful tools for prioritizing candidate functional nsSNPs. Here we used SIFT, PolyPhen, I-Mutant 3.0, PANTHER, PhD-SNP, SNP&GO, Pmut, and MutPred tools to examine the most deleterious and disease associated nsSNPs from the SNP dataset. The combination of methods based on evolutionary information and protein structure and/or functional parameters were used in order to increase the prediction accuracy.

Prediction of Disease-Associated nsSNPs.
We applied PhD-SNP which is based on support vector machine tool to further classify the predicted deleterious nsSNPs as disease associated. Total 19 nsSNPs which were commonly predicted   Table 2). In SNP&GO, 19 nsSNPs were predicted to be disease associated. To verify this prediction, we further employed artificial neural network (ANN) based Pmut tool. Out of 19 nsSNPs, 10 mutations showed pathogenecity and remaining 9 nsSNP showed as neutral (Table 2). Particularly, R326H showed higher pathogenecity level with pathogenicity index of 0.9314 (Table 2). 8 mutations (R146W, G309E, G309R, D343V, T262M, R93H, R326H, and R356Q) were predicted as most disease associated by PhD-SNP, SNP&GO, and Pmut servers. These 8 mutations were further analysed by MutPred tool to predict the SNP disease-association probability and probable change in the molecular mechanism in the mutant. We found R326H to be highly deleterious with general probability ( ) scores of 0.938 and was predicted to induce the loss of stability with ( ) score of 0.0202, showing confident hypothesis. R356Q was found to be highly deleterious with general probability ( ) scores of 0.801 and was predicted to induce the loss of catalytic residue at R356 with ( ) score of 0.0446, showing confident hypothesis. At the end of so many mutations considered, we screened R326H and R356Q as the most deleterious and disease associated mutation in TYRP1 gene (Table 3). This prediction could be endorsed with the observed experimental data [62].

Modelling of Protein.
The human Tyrp1 protein (between domain regions 190-385) was modelled by automated protein structure prediction program, I-TASSER [53]. The program used more than ten templates to model the protein. The top most template (PDB ID: 3nm8A) has covered 86% of the Tyrp1 protein query sequence. The best structure with high confidence score was collected and used for further investigations. The disease associated mutations of R326H and R356Q can probably alter the native conformation of the Tyrp1 protein structure. Hence we made a point mutation in native Tyrp1 protein at the position of 326 (arginine to histidine) and 356 (arginine to glutamine) to build the mutant structures. The quality of the modeled structure of native and mutant Tyrp1 protein was evaluated independently by the PROCHECK [54] and PROSA [55] programs, which showed good stereochemical properties of the modeled proteins. Native Tyrp1 protein showed 91.7% of residues in most favoured and allowed region and -score value of −3.4. Mutant of R326H showed 92.3% of residues in most favoured and allowed region and -score value of −2.13. Mutant of R356Q showed 92.3% of residues in most favoured and allowed region and -score value of −2.81. Native and mutant (R326H and R356Q) Tyrp1 structures showed the -factor in the ranges of 0.40, 0.45, and 0.46, respectively. The overall -factors of native and mutant Tyrp1 protein structures (acceptable between 0 and 0.5) were produced by PROCHECK in the range of 0.40-0.46. These scores implicate high confidence level and hence the structures were selected for further MD analysis.

Molecular Dynamics Simulation.
To understand the structural and functional behaviour of the prioritized disease associated mutations, we performed molecular dynamics simulation for native and mutant Tyrp1 proteins. The following seven factors, namely, tolerance index, PSIC score, DDG value, subPSEC score, disease-association study, pathogenecity index, general score ( ), and property score ( ),  which correspond to conformational changes in protein residues due to the mutation, may lead to affect the functional behaviour of Tyrp1 protein. The results obtained from above analysis further inspired us to study the dynamic behaviour of native and mutant (R326H and R356Q) Tyrp1 proteins. We studied RMSD, RMSF, Rg, SASA, and NH bond variations, DSSP, density plot, and ED analysis between the native and mutant (R326H and R356Q) Tyrp1 protein structures. Further, the RMSD for all C atoms from the initial structure was examined to study the convergence of the protein system. In Figure 2(a), native and both mutant (R326H and R356Q) structures showed similar way of deviation till 3050 ps from their starting structure, resulting in a backbone RMSD of ∼0.14 to 0.52 nm during the simulations. After this, native structure retained the maximum deviation till the end of the simulation resulting in the backbone RMSD of ∼0.51 to ∼0.66 nm, respectively. R326H and R356Q mutant structures showed the minimum deviation till the end of the simulation, resulting in the backbone RMSD of ∼0.37 to ∼0.51 nm and ∼0.38 to ∼0.54 nm, respectively. This magnitude of fluctuations, together with very small difference between the average RMSD values after the relaxation period (∼0.52 nm), leads to produce stable trajectories in simulation, and it provided an appropriate basis for further analysis. The average value of RMSD during the simulation time period both native and mutant (R326H and R356Q) structures is signified in Table 4. Through the aim of determining RMSF we predicted whether the mutation disturbs the dynamic behaviour of residues. The RMSF values of native and mutant (R325H and R356Q) structures were collected and shown in Figure 2(b). Analysis of fluctuation score depicted that the higher degree of flexibility was observed in native structure than mutant (R326H and R356Q) Tyrp1 protein structures. The radius of gyration (Rg) is defined  as the mass-weight root-mean-square distance of collection of atoms from their common center of mass. Therefore it provides an insight into the overall dimension of the protein.
Radius of gyration plot for C atoms of protein versus time at 300 K is shown in Figure 2(c).
In Figure 2 Table 4. A notable change was observed in both mutant (R326H and R356Q) structures as compared to the native. The change of SASA for native and mutant (R326H and R356Q) proteins with time is shown in Figure 2(d). Solvent accessibility surface area accounts for bimolecular surface area that is assessable to solvent molecules. Decreased value of SASA in mutant structures denotes its relatively shrunken nature as compared to the native structure. The change of SASA of native and mutant proteins with time is shown in Figure 2(d). Native and mutant (R326H and R356Q) structures showed similar fashion of deviation till 12000 ps from the initial structure, but after this native structure showed greater value of SASA than BioMed Research International  Table 4. We also observed notable differences in NH bond pattern during simulation, whereas the native structure showed less participation in NH bonds formation with other amino acid, while in mutant (R326H and R356Q) structures there was greater number of NH bonds (Figure 2(e)). The average value of NH bond was signified in Table 4.
The NH bond results of native and mutant Tyrp1 structure according to the RMSD, RMSF, Rg, and SASA plot results depicted that the mutant (R326H and R356Q) structural conformation became rigid upon mutation which may lead to disturb the functional behaviour of the protein. This was further supported by the atomic density plot and PCA analysis. The consequences of these molecular changes were clearly observed in the atomic density distribution plot.
There was a significant change in density distribution in mutant as compared to the native and it was depicted. Moreover the native structure shows highest atomic density distribution of 41.9 nm −3 but mutant (R326H and R356Q) structures showed 52.2 and 52.3 nm −3 , respectively, ( Table 4) (Figure S1a-c available online at: http://dx .doi.org/10.1155/2013/697051). It was further indicated that native structure has more flexibility than mutant (R326H and R356Q) structures.
The spectrum of the corresponding eigenvalues indicated the level of fluctuation and dynamic behaviour of protein molecule in the system and was basically confined within the first two eigenvectors. Both mutant (R326H and R356Q) structures covered a small region of phase space particularly along PC1 plane than native (Figure 3(a)). Overall flexibility of two proteins was further examined by the trace of the diagonalized covariance matrix of the C atomic positional fluctuations. We obtained the following values for native and  (Table 4), and again it was confirming the overall increased flexibility in native than mutant (R326H and R356Q) structures at 300 K. We applied the DSSP algorithm [63] to monitor changes in secondary structure during the simulations. As shown in Figures 4(a)-4(c), a difference is observed only at the level of Helix from the amino acid residual position of 150 to 175 and the level of sheet from the amino acid residual position of 5 to 20. In DSSP, the most significant structural change was an increase in helical content and absence in -sheet, which was observed in both mutant (R326H and R356Q) structures.
To examine how the structure got damaged and leads to affect the functions upon mutation, we analysed the native and mutant (R326H and R356Q) structures at different time scales ( Figure 5). It was clearly observed that there is continuous loss of helix in native structure than mutant structures till the end of the simulation. This was well supported by the DSSP analysis. It indicates that both mutant (R326H and R356Q) structures showed an increase in helical content and total absence in -sheet in the amino acid residual position from 150 to 170 and 5 to 20, respectively. In general, helices are mostly rigid, whereas spanning loop regions are mostly flexible. [64][65][66]. Based on that here, both mutant structures (R326H and R356Q) showed more helical content which leads to more rigidity in the conformation. On the basis of DSSP analysis, it was confirmed that due to mutation the Tyrp1 structure became rigid. Therefore, it seems evident that both mutations (R326H and R356Q) have cruel damaging impact on protein structural orientation and its function. This prediction is also endorsed with the observed experimental data [62,67]. This study provides an essential insight into the underlying molecular mechanism of Tyrp1 protein upon mutation and in future it may help to develop a personalized medicine for OCA3.

Conclusion
Computational analysis has now become a roadmap to define a standard disease specific SNP at molecular level. In this study we screened two most disease associated mutations (R326H and R356Q) which are related to OCA3. Molecular dynamics simulation approaches have also been extensively used to report the structural consequences of the deleterious predicted point mutations. The flexibility loss is observed in RMSD, RMSF, Rg plot which is further supported by a decrease in SASA value in both mutant structures. This may produce a major impact on the structural conformation of Tyrp1 protein, which also affects the function of Tyrp1 protein. Due to mutation, the structure became more rigid which is also supported by NH bond, density plot, PCA, and DSSP analysis. Our result suggests a significant computational roadmap to detect the OCA3 associated SNPs from the large SNP dataset and reduce the expenses in experimental depiction of pathological nsSNPs. Further the predicted R326H and R356Q mutations can be further studied by wet lab scientist to investigate the evidence of Tyrp1 protein mutation in association to OCA3 and develop a potent drug target for OCA3.