A Simulation Analysis and Screening of Deleterious Nonsynonymous Single Nucleotide Polymorphisms (nsSNPs) in Sheep LEP Gene

Leptin is a polypeptide hormone produced in the adipose tissue and governs many processes in the body. Recently, polymorphisms in the LEP gene revealed a significant change in body weight regulation, energy balance, food intake, and reproductive hormone secretion. This study considers its crucial role in the regulation of the economically important traits of sheep. Several computational tools, including SIFT, Predict SNP2, SNAP2, and PROVEAN, have been used to screen out the deleterious nsSNPs. Following the screening of 11 nsSNPs in the sheep genome, 5 nsSNPs, T86M (C → T), D98N (G → A), N136T (A → C), R142Q (G → A), and P157Q (C → A), were predicted to have a significant deleterious effect on the LEP protein function, leading to phenotypic difference. The analysis of proteins' stability change due to amino acid substitution using the I-stable, SDM, and DynaMut consistently confirmed that three nsSNPs (T86M (C → T), D98N (G → A), and P157Q (C → A)) increased protein stability. It is suggested that these three nsSNPs may enhance the evolvability of LEP protein, which is vital for the evolutionary adaptation of sheep. Our findings demonstrate that the five nsSNPs reported in this study might be responsible for sheep's structural and functional modifications of LEP protein. This is the first comprehensive report on the sheep LEP gene. It narrow downs the candidate nsSNPs for in vitro experiments to facilitate the development of reliable molecular markers for associated traits.


Introductions
Leptin is one of the major hormones secreted by adipocytes. It is the primary function of regulating homeostatic control of energy balance, metabolism, neuroendocrine system, and other functions through its effects on the central nervous system [1,2].
The short negative feedback maintains the body leptin proportion through pituitary cells, or via the long feedback performed by neurosecretory cells, interacting with corresponding receptors in the hypothalamus (paraventricular, lateral, ventromedial, and dorsomedial nuclei), leads to repression of orexigenic peptide production and stimulation of anorexigenic factors [3]. Then, the brain regulates the balance of energy expenditure from the body and the amount of energy stored in the body of the organism [4]. Usually, blood leptin levels are proportionate to the mass of adipose tissue, so the fatter individuals have, the more leptin they have circulating in their blood. Nonetheless, people and animals with a higher level of adipose tissue seldom show leptin synthesis or leptin production [5]. The reduction in the hormone's effectiveness may be associated with a fault in drawing hormonal signals or the ability of leptin to penetrate the blood-brain barrier, resulting in leptin resistance [6]. In turn, leptin deficiency or polymorphisms in the leptin pathways increase appetite and energy intake, ultimately leading to obesity [7].
Single nucleotide polymorphisms are essential and the most common types of mutations associated with complex traits [8]. In particular, the missense mutations located in the coding region, which alter the amino acid configuration, may cause a significant change in the structure and function of the native protein [9]. In vitro, functional comparison of mutated proteins with their corresponding wild-type proteins associated with specific traits of animals is a common practice that enables us to identify the impact of each nsSNP in the corresponding protein [10,11]. However, the experimental design for each mutational change is laborious and time-consuming. Therefore, it is reasonable and economical to carry out the basic work required for mutation data mining and predict its effect on protein properties through computational analysis (Nailwal and Chauhan, 2017). Screening out the deleterious nsSNPs with a significant consequence on phenotype from the tolerant nsSNPs (without phenotypic changes) has a pivotal role in understanding the molecular basis of the polygenetic traits in sheep. Therefore, considering the role played by the LEP gene in regulating body weight, energy balance, and feed intake traits in sheep, we retrieved all of the mutations found in sheep LEP gene deposited in the Ensembl sheep genome browser until March 2022. The objective of this study was to characterize the potential variations in the ovine LEP gene using in silico analysis.

Materials and Methods
We used various computational tools to screen out the functional effects of the nsSNPs in the LEP gene. Details of each bioinformatics tool are presented below and summarized in Figure 1.

Collecting SNPs and
Protein's Sequence from the Databases. Sheep leptin (LEP) gene SNPs were collected from the Ovis aries Ensembl genome browser 106 (https:// uswest.ensembl.org/Ovis_aries/Info/Index) [12]. The transcript sequence and the protein encoded by the sheep LEP gene were retrieved from the Ensemble database (https:// ensembl.org/) [13]. Then, the UniProt ID for the amino acid (UniProtKB -W5NWK7) was obtained from UniProt Protein Database (http://www.uniprot.org).

Nonsynonymous SNP Functional
Analysis for LEP. Four tools, SIFT, PredictSNP2, PROVEAN, and SNAP2, were used to predict the practical context of missense mutations in the LEP gene.
PROVEAN (Protein Variation Effect Analyzer: http:// provean.jcvi.org/index.php) is used to predict the level of impact on protein structure and biological function. A protein FASTA sequence with amino acid alterations is used as the input query. It classifies nsSNPs as deleterious or neutral based on whether the final score falls below a threshold of − 2.5. Scores over this threshold are considered neutral [17].
SNAP2 predicts the functional impact of mutations [18]. SNAP2 predicts the functional impact of mutations (Turkson, 2004). SNAP2 is a taught classifier based on the "neural network" machine learning device. It uses a number of sequence and variant properties to discriminate between effect and neutral variants/nonsynonymous SNPs. The SNAP2 achieved sustained two-state accuracy (effect/neutral) of 82 percent in cross-validation across 100,000 experimentally annotated variations (at an AUC of 0.9). In other words, this represents a major improvement over existing methods [19] (the website https://rostlab.org/services/snap2web/ is where you may find it).

Analysis of Protein's Stability Change upon Amino Acid
Substitution. Protein evolution is mainly governed by protein stability [20]. After folding, we analyzed the relationship between LEP mutations and protein stability based on a smaller free energy change (ΔG or dG). In contrast, the difference in folding free energy change between wild-type and mutant protein (ΔΔG or ddG) is often considered an impact factor on protein stability changes [21]. The protein structure stability and the molecular and structural effects of proteincoding variants were predicted using the integrated predictor for protein stability change upon a single mutation called iStable (http://predictor.nchu.edu.tw/iStable) [22]. Furthermore, the change in protein stability upon mutation was estimated using a site-directed mutator (SDM) (http://marid.bioc .cam.ac.uk/sdm2/prediction) a server [23]. Finally, the Dyna-Mut (http://biosig.unimelb.edu.au/dynamut/) was used to verify the impact of mutations on protein conformation, flexibility, and stability and to visualize the protein dynamics [24].

Prediction of Secondary
Structure. The secondary structure of LEP was predicted using the PSIPRED server available at (http://bioinf.cs.ucl.ac.uk/psipred/) [28]. It is based on a two-stage neural network with the implementation of position-specific scoring matrices constructed from PSI-BLAST to predict the available secondary structures of a protein [29].  [36]. If a good proportion of residues lie in the favored and allowed region, then the model is predicted to be good.

Data
Mining. Data mining of the sheep genome browser from Ensembl database on March 12, 2022, revealed 543 SNPs of the LEP gene in sheep, of which 11 were missense mutations. Detailed information about these SNPs is shown in Table 1.

Nonsynonymous SNPs Functional Analysis for LEP.
Out of the 11 nsSNPs subjected to SIFT tool to predict their impact level on the LEP protein function as deleterious or tolerated, the nine nsSNPs have shown a damaging effect with an average prediction score of 0.00-0.02, and the remaining two nsSNPs were tolerant (T86M and R196Q) Table 2.
PredictSNP2 further evaluated all the 11nsSNPs, including the nine predicted as deleterious by SIFT. The Pre-dictSNP2 is a consensus classifier tool for predicting the effect of amino acid substitutions based on the output of eight computational prediction tools. The PredictSNP2 tool prediction outcomes revealed that eight nsSNPs were predicted as deleterious (Table S1). The SNAP2 was used to confirm further the sequence variants' anticipated effects on the LEP protein function.

Mutant Protein Stability Prediction for LEP. The iStable
Meta server tool demonstrated that the amino acid changes in R2C, L35P, D98N, N136T, R142Q, P157Q, V181L, and R196Q cause loss and decreased stability of the LEP protein.
However, the other three mutations (T75M, T86M, and V94I) increase the protein stability Table 3. These results were cross-checked with a site-directed mutator (SDM), and DynaMut servers found that eight nsSNPs significantly change the stability compared to the wild-type proteins.
Double-checking with site-directed mutator (SDM) and DynaMut servers consistently verified the molecular  3 BioMed Research International consequences of 5 nsSNPs. Out of these, three nsSNPs (T86M, D98N, and P157Q) increased stability by decreasing the molecular flexibility of the wild-type proteins. However, the other two nsSNPs (N136T and R142Q) revealed a decrease in stability and molecular flexibility Table S4.
Furthermore, the interatomic interactions prediction outcomes between wild type (left side) and after single point mutations (right side) are presented in Figures 2(a)-2(e). The wild-type and mutant residues are colored in light green and represented as sticks alongside the surrounding residues that take part in any interactions.

Structural Conformation and Conservation Analysis by
ConSurf Server. Our ConSurf results indicated that nsSNPs at position L35P, T75M, V94I, D98N, and R142Q are in the highly conserved region, with a conservation score 9. Likewise, nsSNPs R2C, N136T, and P157Q have shown a conservation score of 8 ( Figure S1). From this, we can speculate that these nsSNPs have a potential effect on LEP protein.

LEP Protein Secondary Structure Prediction by PSIPRED.
The alpha helix and beta sheet distribution and the coil are exposed by PSIPRED ( Figure S2). Among the secondary structures, the highest in percentage was coils (51.5%) followed by alpha helix (48.5%) and no beta-sheet (0.0%).
3.6. Homology Modelling. Structure refined and energy minimized homology models of LEP by Swiss-Model, Phyre-2, ConSurf, and RaptorX servers are illustrated using computational program QMEAN (Qualitative Model Energy ANalysis) ( Figure S3). These models were checked for validation through Ramachandran plot analysis. Some residues in the model may lie in favorable (blue), allowed (green), or disallowed (red) regions of the Ramachandran plot. This coloring indicates residues that may have problems with the backbone phi/psi angles.

Ramachandran Plot Analysis. Ramachandran plot is an
x-y plot of phi/psi dihedral angles between NC-alpha and Calpha-C bonds to evaluate a protein's backbone conformation. The Ramachandran plot of the wild-type protein in the Swiss-Model showed 132 residues (93.6%) in the favored region, 8 residues in allowing region (5.7%), and one residue (0.7%) in the outer region ( Figure S4A). The LEP hypothetical protein built by Phyre2 revealed 135 residues (95.7%) in  Here, the Ramachandran plot analysis of the model produced with this tool shows that the number of residues in the favored region was 52 (96.3%), the number of residues in allowed region 2 (3.7%), and the number of residues in outlier region was 0 (0.0%). Overall, the homology models evidenced the models

Discussion
Missense mutations in genes have a profound impact on protein function. Extensive computational analysis of the phenotypic features attributed to nsSNPs may reveal the susceptible variants by interrupting the original function [37,38]. In humans, mutations of the LEP gene have been associated with obesity in different populations [39]. The sheep LEP gene, located on chromosome 4, encodes 204 amino acids annotated with 11 domains and is currently associated with 543 variations [12].
Leptin is an endocrine hormone member of the longchain helical cytokine family. It has multiple effects on regulating food intake, energy expenditure, body weight, and immune responses [40]. Polymorphism in the leptin gene has been suggested to be connected to cattle carcass composition differences [38,41,42]. For example, a nucleotide switched from cytosine (C) to thymine (T) causes an amino acid change in the plasma leptin circulation and a higher 12 th rib fat and marbling score [41]. In addition, the Tallele in cattle has been reported to be associated with fatter carcasses, whereas the C-allele is associated with leaner carcasses [43].
Numerous research studies have reported polymorphisms in the sheep leptin (LEP) gene [44,45] and its association with food intake [46], growth traits [47,48], and carcass and mutton quality traits [49,50]. Consequently, amino acid substitutions in the LEP gene have been recommended as predictors of relative differences among individuals for such economic traits [51].
In this study, following the screening of nsSNPs as deleterious or neutral upon their effect on protein function using the SIFT, predictSNP2, PROVEAN, and SNAP2, five nsSNPs T86M (C → T, rs593507294), D98N (G → A, rs426762318), N136T (A → C, rs429690456), R142Q (G → A, rs409584889), and P157Q (C → A, rs1093355763) were predicted to have a significant impact on the protein structure, stability, and function by the majority of the tools. As described [37] in the human CDKN1A gene and ADI-POR2 gene by [52], and the CSN3 gene in cattle by [53], dbSNP-based studies have been reported. Our study employed similar tools to predict the impact of the nsSNPs in the LEP gene and found important results.
The sequence of a protein determines the protein's physicochemical properties, such as protein structure, protein thermodynamic stability, the ability to interact with other molecules, and catalytic capacity [54,55]. Thus, these properties, in turn, determine protein function [56].
Further analysis of these nsSNPs on protein functionality was predicted indirectly based on the effects exerted on protein stability. First, the I-stable server predicted that all the 11 nsSNPs impacted protein stability. Then, the report from I-stable was crosschecked using SDM and DynaMut. The stability analysis verified by SDM and DynaMut servers consistently narrowed to 5 nsSNPs. Three nsSNPs (T86M, D98N, and P157Q) had increased protein stability, whereas the other two nsSNPs (N136T and R142Q) showed a destabilizing effect on the wild-type proteins. DynaMut predicts the molecular consequences of the mutations in-depth through changes in protein dynamics and stability from vibrational entropy change [24]. The DynaMut-prediction outcomes of the Δ vibrational entropy energy between wild-type and mutants (ΔΔSVib ENCoM) for the 3 nsSNPs were − 0.046 kcal.mol −1 .K −1 , − 0.056 kcal.mol −1 .K −1 , and − 0.027 kcal.mol −1 .K −1 , respectively. Our data suggest that the vibrational entropy change of protein upon mutation of the amino acid decreases the molecular flexibility of the protein. Therefore, the three nsSNPs that increased the protein stability may increase evolvability by allowing a protein to accept a wider range of beneficial mutations while still folding to its innate structure [57]. Our findings are compatible with [58], who reported that the marginal protein thermostability is a consequence of the mutation-selection balance. Similarly, [57,59] explained that mutants derived from highly stable variants of a given protein are more likely to retain their fold and are consequently more likely to develop novel or improved protein functions.
In contrast to the other two nsSNPs, N136T (A → C, rs429690456) and R142Q (G → A, rs409584889) were evidenced by reduced stability. Table 3 suggests that these missense mutations might lead to new functions of the leptin gene in sheep. This finding is analogous to [21] stating that the evolution of new enzymatic specificities is accompanied by a loss of the protein's thermodynamic stability (DDG), implying a trade-off between acquiring new function and stability. Most likely, the mutations confer new functions that destabilize [57]. Thus, in this study, the mutations that cause a destabilizing effect on the thermodynamic stability of the protein through increasing the molecular flexibility may constrain the native evolution of leptin protein in sheep. Taken all together, the effect of the deleterious nsSNPs in the LEP gene may have a diverse impact on the performance of the sheep for the particular traits. The results from this study provide promising insight for future in vitro studies to specify the impact of these sorted nsSNPs on sheep performance.

Conclusion
A simulation-based study to detect the nsSNPs using many computational programs employed in this study has revealed 5 amino acid substitutions: T86M (C → T, rs593507294), D98N (G → A, rs426762318), N136T (A → C, rs429690456), R142Q (G → A, rs409584889), and P157Q (C → A, rs1093355763) were found to be functionally deleterious. These nsSNPs in the sheep LEP gene may contribute to the functional discrepancies compared to the wild LEP gene, which significantly regulates energy homeostasis, body weight control, and reproduction traits. Most of the variants identified in this study are located in the conserved domain of leptin. We also calculated the free energy changes for mutants and wild-type LEP proteins to evaluate their stability. Our data provide evidence for the functional role of the 5nsSNPs, which is helpful for further in vitro study to facilitate the 6 BioMed Research International development of reliable molecular markers for future practical application in sheep breeding.

Data Availability
All datasets generated/analyzed for this study are included in the manuscript and submitted as supplementary.

Conflicts of Interest
The authors declare that they have no competing interests.

Authors' Contributions
S. G. and H.I. A conceived and designed the methodology, Q. A. Z. performed data analysis, and S. G. wrote the manuscript.

Acknowledgments
The authors are thankful to Samara University for providing encouragement and facilities and colleagues in the Department of Animal Science for their valuable comments and suggestions. The research was funded by Samara University, Ethiopia.

Supplementary Materials
Supplementary Materials Table S1. Deleterious SNPs predicted by predictSNP2. Table S2. nsSNP analysis by SNAP 2 . Table S 3. nsSNP analysis by PROVEAN. Table S4. SDM and DynaMut tools analysis results of the effect of missense mutations on protein stability. Figure S1. Prediction of evolutionary conserved amino acid residues by ConSurf server. Conservation score is represented as the color coding bars. Figure S2. Protein secondary structure predictions by PSIPRED tool. The graphical output of PSIPRED prediction of secondary structure of the sheep LEP protein shows 6 αhelices extends from 41 th to 55 th , 66 thto 84 th , 110 th to 125 th , 133 th to 150 th , 164 th to 176 th , and 178 th to 198 th residue and no β-strands. Figure S3. Homology models from different servers; (a) homology modelling by Swiss-Model server; (b) homology modelling by Phyre-2 server; and (c) Homology modelling by ConSurf; homology modelling by RaptorX server. Figure S4.