Novel Deleterious nsSNPs within MEFV Gene that Could Be Used as Diagnostic Markers to Predict Hereditary Familial Mediterranean Fever: Using Bioinformatics Analysis

Background. Familial Mediterranean Fever (FMF) is the most common autoinflammatory disease (AID), affecting mainly ethnic groups originating from the Mediterranean basin. We aimed to identify the pathogenic SNPs in MEFV using computational analysis software. Methods. We carried out in silico prediction of the structural effect of each SNP, using different bioinformatics tools to predict the influence of each substitution on protein structure and function. Results. 23 novel mutations out of 857 nsSNPs were found to have a deleterious effect on MEFV structure and function. Conclusion. This is the first in silico analysis of the MEFV gene to prioritize SNPs for further genetic mapping studies. After comparing the predictions of multiple bioinformatics tools, we found 23 novel mutations that may cause FMF and could be used as diagnostic markers for Mediterranean basin populations.


Introduction
Familial Mediterranean Fever is an autosomal recessive inherited inflammatory disease [1][2][3], although a substantial number of patients with clinical FMF have been observed to possess only one demonstrable MEFV mutation [4,5]. It is principally seen in different countries [6][7][8][9][10]; however, patients from other ethnicities (such as Japanese patients) are being increasingly recognized [2,11]. The carrier frequency for MEFV genetic variants in the population of the Mediterranean basin is about 8% [12]. Most cases of FMF present with acute abdominal pain and fever [1,3,7], both of which are also the main causes of referral to the emergency department [13]. All these factors may help in medical treatment. Colchicine is the first-line therapy [14], but resistant cases (<10% of patients) [15] show reduced responsiveness to colchicine [16]; in such cases other anti-inflammatory drugs can be used for additional anti-inflammatory effect [17]. If FMF is not treated, it may be an etiologic factor for colonic lymphoid nodular hyperplasia (LNH) in children [18]. The MEFV gene is located on the short arm of chromosome 16 at position 13.3 (16p13.3) and consists of 10 exons spanning 21,600 bp [3,19]. The disease is characterized by recurrent febrile episodes and inflammation in the form of sterile polyserositis. The amyloid protein involved in inflammatory amyloidosis was named AA (amyloid-associated) protein, and its circulating precursor was named SAA (serum amyloid-associated). Amyloidosis of the AA type is the most severe complication of the disease. The gene responsible for FMF, MEFV, encodes a protein called pyrin or marenostrin and is expressed mainly in neutrophils [3,19].
The definition of the MEFV gene has permitted genetic diagnosis of the disease. Nevertheless, as studies have uncovered molecular data, problems have arisen with the clinical definitions of the disease [20]. FMF is caused by missense SNPs in MEFV [20], the gene coding for pyrin, which is a component of the inflammasome functioning in the inflammatory response and the production of interleukin-1β (IL-1β). We focused on SNPs located in the coding region because they have the greatest disease-causing potential: they are responsible for amino acid residue substitutions, which result in functional diversity of proteins in humans [20]. Recent studies have shown that pyrin recognizes bacterial modifications in Rho GTPases, which results in inflammasome activation and an increase in IL-1β. Pyrin does not directly recognize Rho modification but is probably affected by Rho effector kinase, which is a downstream event in the actin cytoskeleton pathway [19,21,22].
The aim of this study was to identify the pathogenic SNPs in MEFV using in silico prediction software and to determine the structure, function, and regulation of their respective proteins. This is the first in silico analysis of the MEFV gene to prioritize SNPs for further genetic mapping studies. In silico approaches have a strong impact on the identification of candidate SNPs, since they are easy and less costly and can facilitate future genetic studies [23].

Data Mining.
The data on the human MEFV gene was collected from the National Center for Biotechnology Information (NCBI) website [24]. The SNP information (protein accession number and SNP ID) of the MEFV gene was retrieved from NCBI dbSNP (http://www.ncbi.nlm.nih.gov/snp/), and the protein sequence was collected from the Swiss-Prot database (http://expasy.org/) [25].

SIFT.
SIFT is a sequence homology-based tool [26] that sorts intolerant from tolerant amino acid substitutions and predicts whether an amino acid substitution in a protein will have a phenotypic effect. It considers the position at which the change occurred and the type of amino acid change. Given a protein sequence, SIFT chooses related proteins and obtains an alignment of these proteins with the query. Based on the amino acids appearing at each position in the alignment, SIFT calculates the probability that an amino acid at a position is tolerated conditional on the most frequent amino acid being tolerated. If this normalized value is less than a cutoff, the substitution is predicted to be deleterious. SIFT scores <0.05 are predicted by the algorithm to be intolerant or deleterious amino acid substitutions, whereas scores >0.05 are considered tolerant. It is available at (http://sift.bii.astar.edu.sg/).
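
The cutoff rule described above can be sketched in a few lines of Python. This is an illustration only, not the SIFT implementation: the function name `classify_sift` and the example scores are hypothetical, and treating a score of exactly 0.05 as tolerated is an assumption, since the text specifies only the cases <0.05 and >0.05.

```python
SIFT_CUTOFF = 0.05  # cutoff stated in the text


def classify_sift(score: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Label a SIFT score as 'deleterious' or 'tolerated'.

    Assumption: a score exactly equal to the cutoff is treated as tolerated.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie in [0, 1], got {score}")
    return "deleterious" if score < cutoff else "tolerated"


# Hypothetical scores for illustration, not results from this study.
example_scores = {"variant_A": 0.00, "variant_B": 0.03, "variant_C": 0.42}
labels = {name: classify_sift(s) for name, s in example_scores.items()}
print(labels)
```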

PolyPhen-2.
PolyPhen-2 is a software tool [27] that predicts the possible impact of an amino acid substitution on both the structure and function of a human protein by analysis of multiple sequence alignment and protein 3D structure. In addition, it calculates position-specific independent count (PSIC) scores for each of the two variants and then calculates the difference between the two PSIC scores. The higher the PSIC score difference, the higher the functional impact a particular amino acid substitution is likely to have. Prediction outcomes are classified as probably damaging, possibly damaging, or benign according to a score that ranges from 0 to 1: values closer to 0 are considered benign, while values closer to 1 are considered probably damaging. The outcome is also indicated by a vertical black marker inside a color gradient bar, where green is benign and red is damaging. nsSNPs predicted to be intolerant by SIFT were submitted to PolyPhen-2 as protein sequences in FASTA format, obtained from UniProtKB/ExPASy after submitting the relevant Ensembl protein ID (ENSP) there; we then entered the position of the mutation, the native amino acid, and the new substituent for both structural and functional predictions. PolyPhen version 2.2.2 is available at http://genetics.bwh.harvard.edu/pph2/index.shtml.
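
The three values entered for each query (position of the mutation, native amino acid, and new substituent) are encoded in the usual one-letter substitution notation, such as L86P. A minimal Python sketch of splitting that notation into its parts is given below. The names `Substitution`, `parse_substitution`, and `matches_sequence` are hypothetical helpers, not part of PolyPhen-2, and the toy sequence is not the pyrin sequence.

```python
import re
from typing import NamedTuple

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class Substitution(NamedTuple):
    native: str       # wild-type residue
    position: int     # 1-based position in the protein sequence
    substituent: str  # mutant residue


def parse_substitution(code: str) -> Substitution:
    """Split a code such as 'L86P' into native residue, position, substituent."""
    match = _PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {code!r}")
    native, position, substituent = match.group(1), int(match.group(2)), match.group(3)
    if native not in AMINO_ACIDS or substituent not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {code!r}")
    if position < 1:
        raise ValueError(f"positions are 1-based, got {position}")
    return Substitution(native, position, substituent)


def matches_sequence(sub: Substitution, sequence: str) -> bool:
    """Check that the native residue really occurs at the stated position."""
    return sub.position <= len(sequence) and sequence[sub.position - 1] == sub.native


print(parse_substitution("L86P"))
# Toy sequence for illustration only (not the pyrin sequence).
print(matches_sequence(parse_substitution("L4P"), "MAKL"))
```

Checking the native residue against the submitted FASTA sequence before querying a server is a cheap way to catch numbering mismatches between protein isoforms.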

Provean.
Provean is a software tool [28] that predicts whether an amino acid substitution or indel has an impact on the biological function of a protein. It is useful for filtering sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important. It is available at (http://provean.jcvi.org/index.php).

SNAP2.
Functional effects of mutations are predicted with SNAP2 [29]. SNAP2 is a trained classifier based on a machine learning device called a "neural network". It distinguishes between effect and neutral variants/nonsynonymous SNPs by taking a variety of sequence and variant features into account. The most important input signal for the prediction is the evolutionary information taken from an automatically generated multiple sequence alignment. Structural features such as predicted secondary structure and solvent accessibility are also considered. If available, annotation (i.e., known functional residues, patterns, regions) of the sequence or close homologs is also pulled in. In a cross-validation over 100,000 experimentally annotated variants, SNAP2 reached a sustained two-state accuracy (effect/neutral) of 82% (at an AUC of 0.9). In our hands this constitutes an important and significant improvement over other methods. It is available at (https://rostlab.org/services/snap2web/).

PHD-SNP.
PHD-SNP is an online Support Vector Machine (SVM) based classifier optimized to predict whether a given single point protein mutation can be classified as disease related or as a neutral polymorphism. It is available at (http://snps.biofold.org/phd-snp/phd-snp.html).

SNPs&GO.
SNPs&GO is an algorithm developed in the Laboratory of Biocomputing at the University of Bologna, directed by Prof. Rita Casadio. SNPs&GO is an accurate method that, starting from a protein sequence, can predict whether a variation is disease related or not by exploiting the corresponding protein functional annotation. SNPs&GO collects in a unique framework information derived from protein sequence, evolutionary information, and function as encoded in the Gene Ontology terms, and it outperforms other available predictive methods [30]. It is available at (http://snps.biofold.org/snps-and-go/snps-and-go.html).

P-Mut.
P-Mut, a web-based tool [31] for the annotation of pathological variants on proteins, allows the fast and accurate prediction (approximately 80% success rate in humans) of the pathological character of single point amino acid mutations based on the use of neural networks. It is available at (http://mmb.irbbarcelona.org/PMut).

GeneMANIA.
We submitted the gene and selected the data sets to query from a list. GeneMANIA [33] predicts protein function by integrating multiple genomics and proteomics data sources to make inferences about the function of unknown proteins. It is available at (http://www.genemania.org/).

Discussion
Using bioinformatics tools, 23 novel mutations were found (see Table 3) that affect the stability and function of the MEFV gene. The methods used were based on different aspects and parameters describing pathogenicity and provided clues at the molecular level about the effect of the mutations. It was not easy to predict the pathogenic effect of SNPs using a single method; therefore, multiple methods were used and their predictions compared. In this study we used different in silico prediction algorithms: SIFT, PolyPhen-2, Provean, SNAP2, SNPs&GO, PHD-SNP, P-Mut, and I-Mutant 3.0 (see Figure 1). This study identified the total number of nsSNPs in Homo sapiens located in the coding region of the MEFV gene, as recorded in the dbSNP/NCBI database [24]. Out of 2369 SNPs, 856 nsSNPs (missense mutations) were submitted to the SIFT, PolyPhen-2, Provean, and SNAP2 servers, respectively. SIFT predicted 392 SNPs to be deleterious. PolyPhen-2 found 453 to be damaging (147 possibly damaging and 306 probably damaging). Provean predicted 244 SNPs to be deleterious, while SNAP2 predicted 566 SNPs to have an effect. The differences in prediction reflect the fact that every prediction algorithm uses different sets of sequences and alignments. The SNPs with positive results from all four of SIFT, PolyPhen-2, Provean, and SNAP2 (see Table 1) were then submitted to the SNPs&GO, PHD-SNP, and P-Mut servers to identify the disease-causing ones (see Table 2).
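
The idea of keeping only the variants flagged by every first-stage tool can be sketched as a set intersection in Python. This is a minimal illustration under stated assumptions: the function name `consensus_deleterious` is hypothetical, the variant labels are toy placeholders rather than results from this study, and requiring agreement of all four tools is our reading of the filtering step described above.

```python
from typing import Dict, Set


def consensus_deleterious(calls: Dict[str, Set[str]]) -> Set[str]:
    """Return the variants that every tool in `calls` flagged as deleterious.

    `calls` maps a tool name to the set of variant labels it flagged.
    """
    if not calls:
        return set()
    return set.intersection(*calls.values())


# Toy placeholder labels, not variants or predictions from this study.
toy_calls = {
    "SIFT": {"var1", "var2", "var3"},
    "PolyPhen-2": {"var1", "var3", "var4"},
    "Provean": {"var1", "var3"},
    "SNAP2": {"var1", "var2", "var3", "var4"},
}

shortlist = consensus_deleterious(toy_calls)
print(sorted(shortlist))  # ['var1', 'var3']
```

An intersection is the strictest way to combine tools: it lowers the number of false positives at the cost of discarding variants that any single tool misses.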
BioEdit software was used to align 10 amino acid sequences of MEFV, demonstrating that the residues predicted to be mutated (indicated by red arrows) are evolutionarily conserved across species (see Figure 2). Project HOPE software was used to analyze the 23 most deleterious and damaging nsSNPs. For L86P, proline (the mutant residue) is smaller than leucine (the wild-type residue), which might lead to loss of interactions. The mutation is located within a domain annotated in UniProt as Pyrin. The mutation introduces an amino acid with different properties, which can disturb this domain and abolish its function. The wild-type residue is located in a region annotated in UniProt to form an α-helix. Proline disrupts an α-helix when not located at one of the first 3 positions of that helix. In the case of this mutation, the helix will be disturbed, which can have severe effects on the structure of the protein.
GeneMANIA revealed that MEFV has many vital functions: chemokine production, inflammatory response, interleukin-1 beta production, interleukin-1 production, intracellular receptor signaling pathway, nucleotide-binding domain, leucine rich repeat containing receptor signaling pathway, positive regulation of cysteine-type endopeptidase activity, positive regulation of endopeptidase activity, positive regulation of peptidase activity, regulation of chemokine production, regulation of cysteine-type endopeptidase activity, regulation of endopeptidase activity, regulation of interleukin-1 beta production, regulation of interleukin-1 production, and regulation of peptidase activity. The genes that are coexpressed with MEFV, share a similar protein domain, or participate in a similar function are shown in Figure 26 and Tables 4 and 5.
In this study we also retrieved SNPs recorded as untested (V659F, L709R, F743S, and S749Y) and found all of them to be damaging. Our study is the first in silico analysis of the MEFV gene based on functional analysis, whereas all previous studies [34,35] were based on frequency. This study revealed that 23 novel pathological mutations have a potential functional impact and may thus be used as diagnostic markers for Mediterranean basin populations.

Conclusion
In this work the influence of functional SNPs in the MEFV gene was investigated through various computational methods, which determined that S749Y, F743S, Y741C, F731V, I720T, L709R, V691G, W689R, G668R, V659F, F636C, R461W, H407Q, H407R, H404R, C398Y, C395Y, C395F, C395R, H378Q, H378Y, C375R, and L86P are new SNPs that have a potential functional impact and can thus be used as diagnostic markers. They constitute possible candidates for further genetic epidemiological studies, with special consideration of the large heterogeneity of MEFV SNPs among different populations.

Data Availability
The data which support our findings in this study are available from the corresponding author upon reasonable request.