Predicting the Most Deleterious Missense Nonsynonymous Single-Nucleotide Polymorphisms of Hennekam Syndrome-Causing CCBE1 Gene, In Silico Analysis

Hennekam lymphangiectasia-lymphedema syndrome has been linked to single-nucleotide polymorphisms in the CCBE1 (collagen and calcium-binding EGF domains 1) gene. Using several state-of-the-art in silico tools, this study identified the most pathogenic nonsynonymous single-nucleotide polymorphisms (nsSNPs) that could affect CCBE1 structure and function and thereby disrupt extracellular matrix remodeling and migration. Our results indicate that seven nsSNPs, rs115982879, rs149792489, rs374941368, rs121908254, rs149531418, rs121908251, and rs372499913, are deleterious in the CCBE1 gene. Four of them (G330E, C102S, C174R, and G107D) are highly deleterious, and two of these (G330E and G107D) have never been reported in the context of Hennekam syndrome. Twelve missense SNPs, rs199902030, rs267605221, rs37517418, rs80008675, rs116596858, rs116675104, rs121908252, rs147974432, rs147681552, rs192224843, rs139059968, and rs148498685, were found to convert into stop codons. Structural homology-based methods and sequence homology-based tools revealed that 8.8% of the nsSNPs are pathogenic. SIFT, PolyPhen2, M-CAP, CADD, FATHMM-MKL, DANN, PANTHER, Mutation Taster, LRT, and SNAP2 had a significant score for identifying deleterious nsSNPs. The importance of rs374941368 and rs200149541 in the prediction of post-translational modifications was highlighted because they affect a possible phosphorylation site. Gene-gene interaction analysis revealed CCBE1's association with other genes, showing its role in a number of pathways and coexpression networks. The top 16 deleterious nsSNPs found in this research should be investigated further in future studies of diseases caused by the CCBE1 gene, specifically Hennekam syndrome. The FTSite web server predicted the amino acid residues involved in the ligand-binding site of the CCBE1 protein, and two of the substitutions (R167W and T153N) were found to be involved.
These highly deleterious nsSNPs can be used as marker pathogenic variants in the mutational diagnosis of Hennekam syndrome, and this research also offers potential insights that will aid in the development of precision medicines. CCBE1 proteins from Hennekam syndrome patients should be tested in animal models for this purpose.


Introduction
Lymphangiogenesis is the process by which the lymphatic system develops. It includes migration, proliferation, and budding of endothelial lymphatic progenitor cells [1][2][3]. With irregular lymphangiogenesis, the interstitial fluids, which are normally stored in the cardiovascular system, frequently drain away, and this drainage can cause chylothorax, pleural effusion, angiectasias, lymphedema, and chylous ascites of lymph vessels in various organs, including the intestines [4]. Symptoms of lymph vessel dysplasia are usually confined to the limbs [1]. Hennekam syndrome is a genetically heterogeneous condition. Hennekam lymphangiectasia is marked by disorders of the lymphatic system that affect a variety of organs, including the gastrointestinal tract and the pericardium. The lymphedema is accompanied by facial dysmorphism and cognitive dysfunction [5]. To date, approximately 45 people have been diagnosed with Hennekam syndrome (HS) [6]. Almost 25% of cases are attributed to biallelic mutations in CCBE1 (Hennekam lymphangiectasia-lymphedema syndrome 1 (HKLLS1; MIM: 235510)) and FAT4 (Hennekam lymphangiectasia-lymphedema syndrome 2 (HKLLS2; MIM: 616006)) [7]. In two siblings, a biallelic missense mutation in the ADAMTS3 gene was identified [8]. In humans and model organisms, the signaling protein collagen and calcium-binding EGF domains 1 (CCBE1) is required for lymphangiogenesis. A forward genetic screen in zebrafish identified a causative coding mutation in CCBE1 in a mutant known as full of fluid (fof), which lacks the truncal lymphatic vessels of the thoracic duct but retains normal blood vasculature [9]. Missense mutations in the CCBE1 gene, located in the functional EGF domain of the protein or in the upstream cysteine-rich domain, were identified as the cause of HKLLS1 [6]. The CCBE1 gene plays a significant role in the growth of the lymphatic system in model organisms [9,10].
However, the connection between FAT4 and lymphatic development is still not clear. Our understanding of the phenotype associated with CCBE1 mutations has evolved over time. In the original account, Hennekam syndrome subjects showed marked variability in the degree of cognitive impairment, ranging from normal to moderate impairment [11]. The most recent study compared subjects with clinically diagnosed Hennekam syndrome with and without mutations in CCBE1 [6]. The CCBE1 gene product is secreted and interacts with connective tissue in the extracellular matrix [10][11][12]. Mutant zebrafish often lack lymphatic vessels and thoracic ducts and develop edema [9,11], and a mutation in the CCBE1 gene was confirmed as the cause. The same development of edema was shown in mouse models [10]. On this basis, mutation of this gene, which is thought to be conserved between organisms, was linked to dysfunction of the lymphatic vascular system, leading to the conclusion that human CCBE1 mutations are linked to widespread lymphatic dysplasia. Aagenaes syndrome, a rare autosomal recessive (AR) condition, has also been linked to biallelic CCBE1 mutations. This rare condition causes neonatal intrahepatic cholestasis, severe chronic lymphedema without mental retardation, and lymphangiectasia [13]. Aagenaes syndrome was common in untreated children, and fetal hydrops was also found in HS patients [13,14]. The latest evidence on CCBE1 mutations supports the view that the disease is caused by rare mutated alleles. Because of their segregation with the phenotype in an AR inheritance model, their sporadic recurrence in unrelated individuals, and the large number of associated mutation carriers, these mutated alleles may have a harmful impact [15]. Molecular biology, statistics, mathematics, computer science, and genetics all fall under the umbrella of bioinformatics [16].
Single-nucleotide polymorphisms (SNPs) are the most common genetic variation present in the general population. SNPs can affect any nucleotide in the genome. SNPs occur roughly every 200-300 bp, and there are 5000,000 SNPs in the entire human genome. They can result in a variety of sequence changes, which can contribute to abnormal function [17][18][19]. Among SNPs in the exonic regions of the genome, nonsynonymous SNPs (nsSNPs) change the amino acid sequence of gene products. Most SNPs do not have a large biological impact, but some are associated with a variety of disorders, for example by affecting the immunological response to drugs, and in some cases SNPs can be used as biomarkers for disease susceptibility [20]. Changes in amino acid sequence caused by SNPs are responsible for 50% of reported cases of inherited disorders [21]. Gene expression and transcription factor binding are also affected by SNPs in promoter regions and in regions outside of the gene [22,23]. SNPs therefore play a critical role in determining an individual's susceptibility to various diseases and drug reactions. SNPs that cause disorders are discovered biologically through a simple procedure, so it is critical that we find them before they are used as a tool in genetics technologies [24]. The tools rely on alignment methods based on matrix and tree data structure computation. Recent results, such as [25,26], show that hash-based functions can speed up the entire process. The aim of this study is to use a variety of in silico approaches based on different concepts to investigate the potentially harmful effects of nsSNPs in the CCBE1 gene and protein. The study aims to provide a valuable tool for quick and cost-effective screening for pathologic nsSNPs, rather than validation by biological experiments.

SNP Retrieval.
The human CCBE1 gene data were collected from Entrez Gene on the website of the National Center for Biotechnology Information (NCBI). The SNP information (protein accession number and SNP ID) for the CCBE1 gene was obtained from the NCBI dbSNP (http://ncbi.nlm.nih.gov/snp/) and SwissProt (http://expasy.org/) databases. Other databases, namely the Exome Aggregation Consortium, Genome Variation Server, and F-SNP, were also searched to cross-check the nonsynonymous SNP (nsSNP) data for the CCBE1 gene [27]. The databases were accessed on 3 July 2020.

GeneMANIA.
GeneMANIA (https://genemania.org/) and STRING (https://string-db.org/cgi/) were used to examine the interactions of the CCBE1 gene and its association with other genes, in order to predict the effect of nsSNPs on related genes (accessed on 6 July 2020 by manually searching for CCBE1 in the search box) [28]. GeneMANIA predicts gene-gene interactions on the basis of pathways, colocalization, coexpression, protein domain similarity, and genetic and protein interactions. STRING predictions were limited to the top 10 interacting genes, with parameters that included gene fusion, co-occurrence, coexpression, and experimental and biochemical data. These data gave a combined score for each gene's interaction with the target gene in the range from 0 to 1, where 0 is the lowest and 1 the highest interaction. CCBE1 was therefore submitted as the input gene to generate the gene-gene interaction network.
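The ranking step described for STRING (a combined score between 0 and 1, keeping the 10 best partners) can be illustrated with a short sketch. The function name and the example gene labels and scores are hypothetical and are not taken from STRING or from this study.

```python
def top_interactors(combined_scores, n=10):
    """Return the n partners with the highest combined score, best first.

    combined_scores maps a gene name to a score in the range [0, 1].
    """
    for gene, score in combined_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {gene} is outside [0, 1]: {score}")
    # Sort by score, highest first, and keep only the first n entries.
    ranked = sorted(combined_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical scores for illustration only; not values returned by STRING.
example = {"GENE_A": 0.91, "GENE_B": 0.42, "GENE_C": 0.77}
print(top_interactors(example, n=2))
```

With the default `n=10`, the function returns at most ten partners, matching the limit stated above.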

Prediction Tool Used for nsSNP
2.3.1. Sequence Homology Tool (SIFT). For each query, SIFT takes the reference SNP ID and the query sequence and uses alignment information from multiple closely related sequences to predict tolerated and damaging substitutions [29,30]. It reports whether a substitution is tolerated at a given position. The tool was used on 6 July 2020.

PolyPhen.
PolyPhen (http://genetics.bwh.harvard.edu/pph2/) uses specific empirical rules to predict the effect of an amino acid substitution on the protein's structure and function. The inputs for PolyPhen are the protein sequence, amino acid position, database ID/accession number, and amino acid variant details [31], and the score difference between the variant and wild-type amino acids is calculated. The tool was used on 6 July 2020.

Prediction of Disease-Related Amino Acid Substitution and Phenotypes by MutPred.
The online server MutPred (http://mutpred.mutdb.org/) is a tool for predicting the molecular basis of disease related to an amino acid substitution in a mutant protein [37]. It uses several attributes related to protein structure, function, and evolution.
Three servers, PSI-BLAST, SIFT, and Pfam profiles, are used along with the TMHMM, MARCOIL, and DisProt algorithms to predict structural damage. Combining the scores of all three servers gives greater prediction accuracy.

Prediction of Stability of the Mutated Protein due to SNPs by iStable 2.0.
Missense SNPs cause amino acid substitutions that can change the stability of the native protein, which can affect protein function and ultimately lead to disease [38]. We used the metaclassifier iStable 2.0 to predict changes in protein stability due to missense SNPs. This metaclassifier uses machine learning to determine whether an amino acid substitution increases or decreases the stability of the protein. Its prediction is based on 8 structure-based (I-Mutant2.0, CUPSAT, PoPMuSiC, AUTO-MUTE2.0, SDM, DUET, mCSM, MAESTRO, and SDM2) and 3 sequence-based (I-Mutant2.0, MUpro, and iPTREESTAB) stability prediction tools. The input is either a 4-letter PDB code or a protein sequence in FASTA format, but the structure-based predictor achieves better performance than the sequence-based predictor. iStable 2.0 is available at http://ncblab.nchu.edu.tw/iStable2.

Identification of Conserved Residues and Sequence Motifs.
The human CCBE1 protein sequence from UniProt was blasted against the UniProtKB/SwissProt database at NCBI (http://blast.ncbi.nlm.nih.gov/Blast.cgi), retaining up to a maximum of 100 sequences. Sequences showing more than 50% identity and an E-value below 1.00E-20 were then analyzed with Clustal Omega [39]. The identified amino acids were colored according to the Clustal color scheme, and the conservation index at each alignment position was provided by Jalview [40].

Prediction of Amino Acid Conservation by ConSurf (ConSurf.tau.ac.il).
Empirical Bayesian inference is used to calculate the evolutionary conservation of each amino acid within a protein sequence.
This inference yields conservation scores along with a color scheme. The most variable amino acids receive a score of 1, while the most conserved amino acids receive a score of 9. The FASTA sequence of the CCBE1 protein was submitted for ConSurf analysis [41].

Project HOPE.
The Project HOPE website analyzes the structural effects of a given mutation. In cooperation with UniProt and DAS prediction servers, Project HOPE shows the mutated protein in an observable 3D structure. The protein sequence is used as the input, and the mutant structure is then compared with the wild-type amino acid [42].

Secondary Structure Prediction by NetSurfP.
Identifying interaction interfaces or active sites in a fully folded protein requires knowledge of amino acid surface and solvent accessibility. Amino acid substitutions at such sites can disturb binding affinity [43]. When the protein is an enzyme, catalytic activity can be disturbed as well. NetSurfP-2.0 can effectively estimate surface and solvent accessibility, structural disorder, backbone dihedral angles, and secondary structure for amino acid residues. Protein sequences in FASTA format are used as input. The tool employs deep neural networks that were trained on solved protein structures [43]. NetSurfP-2.0 is available at http://www.cbs.dtu.dk/services/NetSurfP/.
TM-score (template modelling score), RMSD (root-mean-square deviation), and structural superposition were predicted. TM-scores range from 0 to 1, where 1 indicates the highest structural similarity. The higher the RMSD value, the greater the variation between the mutant and wild-type structures [45,46]. Three mutants with higher RMSD, along with the wild-type CCBE1, were submitted to I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) for further study of the 3D protein structure [47,48,49]. Chimera v1.11 was used to investigate molecular characteristics and to visualize the resulting protein structure interactively [50].
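As a minimal illustration of the RMSD measure mentioned above, the sketch below computes the root-mean-square deviation between two equally sized sets of 3D coordinates. It assumes that the two structures have already been superposed and that atoms are listed in matching order. It does not perform the superposition itself, and it is not the procedure used by the tools cited in this study. The coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical, already superposed coordinates (in angstroms).
wild_type = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mutant = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(wild_type, mutant))
```

An RMSD of 0 means the matched atoms coincide exactly. Larger values indicate a larger average displacement between the mutant and wild-type models.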

PTM Site Prediction.
Post-translational modifications (PTMs) of a protein are used to predict its function. GPS-MSP v3.0 (http://msp.biocuckoo.org/online.php) was used to predict methylation sites in the CCBE1 protein [51]. Phosphorylation sites at serine, tyrosine, and threonine residues in the CCBE1 protein sequence were predicted using GPS 3.0 (http://gps.biocuckoo.org/online.php) [52] and NetPhos 3.1 (http://www.cbs.dtu.dk/services/NetPhos/). NetPhos 3.1, which uses neural network ensembles, was run with a threshold of 0.5 and gave more specific predictions than GPS 3.0 [53]. Residues with a score higher than the threshold were predicted to be phosphorylated. BDM-PUB (http://bdmpub.biocuckoo.org/prediction.php) and UbPred (http://www.ubpred.org/) were used to predict ubiquitylation sites in the CCBE1 protein. For UbPred, a balanced cutoff was chosen [37], so that lysine residues scoring at or above the 0.62 threshold were predicted to be ubiquitinated [54]. NetOGlyc4.0 (http://www.cbs.dtu.dk/services/NetOGlyc/) was used to predict glycosylation, which is another very important post-translational event [55]. Both the protein sequence carrying the amino acid substitution and the wild-type protein sequence were analyzed with NetOGlyc4.0. A mutation is considered functionally significant when the predicted pattern differs between the mutant and the wild type. Glycosylation sites with a score higher than the 0.5 threshold were predicted to be glycosylated.
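The cutoffs stated above (0.5 for NetPhos 3.1, 0.62 for UbPred, 0.5 for NetOGlyc4.0) and the wild-type versus mutant comparison can be sketched as follows. The cutoffs come from the text. The function names, the way scores are stored, and the example scores are assumptions made for illustration and do not reproduce the servers' output formats.

```python
# (cutoff, inclusive) per tool: UbPred is "at or above", the others "higher than".
CUTOFFS = {
    "NetPhos": (0.5, False),
    "UbPred": (0.62, True),
    "NetOGlyc": (0.5, False),
}


def predicted_sites(tool, scores):
    """Return the set of residue positions whose score passes the tool's cutoff."""
    cutoff, inclusive = CUTOFFS[tool]
    return {
        position
        for position, score in scores.items()
        if score > cutoff or (inclusive and score == cutoff)
    }


def lost_sites(tool, wild_type_scores, mutant_scores):
    """Positions predicted as modified in the wild type but not in the mutant."""
    lost = predicted_sites(tool, wild_type_scores) - predicted_sites(tool, mutant_scores)
    return sorted(lost)


# Hypothetical scores keyed by residue position; not results from this study.
wild_type = {19: 0.80, 44: 0.90}
mutant = {19: 0.20, 44: 0.90}
print(lost_sites("NetOGlyc", wild_type, mutant))
```

Keeping the inclusive or exclusive comparison next to each cutoff makes the difference between "at or above" and "higher than" explicit, which matters only for scores that sit exactly on the threshold.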

Ligand-Binding Site Prediction by FTSite Server.
The FTSite server (http://FTSite.bu.edu/) was used to predict ligand-binding sites in the 3D protein structure. The prediction is energy based and identifies the binding site in over 94% of apoproteins. PDB data were used as input for the prediction of ligand-binding hotspots.

Statistical Analysis.
The predictions of the computational in silico tools were subjected to correlation analysis using SPSS v23 and MS Excel. Differences in significance between the predictions of the various computational tools were compared using Student's t-test. A p value <0.01 was considered significant.
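The two statistics named above can be written out in a few lines. The sketch below computes a Pearson correlation coefficient and an independent two-sample Student's t statistic with pooled variance. Whether the original analysis in SPSS used an independent or a paired test is not stated, so the independent form here is an assumption, and the p value lookup is left out. The input numbers are made up.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def student_t(x, y):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled estimate of the common variance of the two samples.
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df


# Hypothetical prediction scores from two tools.
tool_a = [1.0, 2.0, 3.0]
tool_b = [2.0, 3.0, 4.0]
print(pearson_r(tool_a, tool_b), student_t(tool_a, tool_b))
```

The t statistic and degrees of freedom would then be compared against the t distribution to obtain the p value that is tested against the 0.01 level.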

Exploring the Desired Gene Using dbSNPs/NCBI.
CCBE1 gene SNP data were searched in the NCBI database (http://www.ncbi.nlm.nih.gov/). The database contains a total of 73845 SNPs for CCBE1 in Homo sapiens, of which 407 were nonsynonymous (missense) and 156 were synonymous, as shown in Figure 1.

GeneMANIA.
The CCBE1 gene provides instructions for making a protein that is found in the extracellular matrix, the lattice of proteins and other molecules that surrounds cells. The CCBE1 protein is involved in the formation of the lymphatic system. Specifically, the CCBE1 protein helps guide the maturation (differentiation) and movement (migration) of immature cells called lymphangioblasts, which will eventually form the lining (endothelium) of lymphatic vessels. Our findings revealed that CCBE1 is coexpressed with 17 genes (COL6A6, MXRA8, PLEKHF2, RPRM, CDH4, PLEKHG1, CAND1, MY010, LRRC4C, LRAT, ANK3, OLFM1, DCN, NEURL1B, PLEKHH2, GLTSCR2, and NDRG2), shares a domain with only 2 genes (PLEKHH2 and DCN), physically interacts with 2 genes (SIAH2 and TOX4), and colocalizes with 2 genes (MYRA8 and DCN). The STRING predictions gave a combined score for each gene and showed interaction of CCBE1 with FLT4, VEGFC, ADAMTS3, GJC2, FLGF, FAM43A, SNX29, PKD2L2, and PHF5A. The gene interactions predicted by GeneMANIA (Figures 2(a) and 2(b) and Table 1) and STRING (Figure 2(c)) are given in Figure 2.

Prediction of Deleterious nsSNP by SIFT and PolyPhen in CCBE1.
A total of 407 nsSNPs (missense) were screened to find their effect on protein structure and function. The first step was to predict the effect of the amino acid substitution caused by each nsSNP. SIFT predicts the effect of an nsSNP on protein structure and reports whether the introduced amino acid is tolerated at that position. Out of the 407 nsSNPs, 23 were found to be deleterious with a tolerance index score of 0.00 in SIFT and were also predicted as highly pathogenic, with a PSIC score of >0.5, by the PolyPhen server. Minor allele frequency (MAF) information was available for 11 nsSNPs. Except for T153N, G107D, P249S, S19N, C75S, C102S, G327R, C174R, D397Y, R125W, P87W, and G330E, the MAFs of the other nsSNPs might be lower than 1% (Table 2).
Each of the seventeen prediction tools was used independently or in combination through a tool that reports the results of several prediction tools. Each method predicted a different number of deleterious SNPs. SIFT classified 36 and PolyPhen 23 nsSNPs as damaging or deleterious, but PolyPhen did not report as damaging any of the 13 additional nsSNPs that SIFT classified as deleterious. With a cutoff of >0.5, SNP-GO reported the fewest: 4 SNPs (17.23%) of the 23 SIFT- and PolyPhen-predicted nsSNPs in the CCBE1 gene as damaging or deleterious, and 19 as neutral. SNAP2 predicted 18 (78.26%) nsSNPs to have an effect (9 effective nsSNPs, SNAP2 score 0 to 50; 9 highly effective, SNAP2 score 50 to 100) and 5 to be neutral (SNAP2 score −100). PANTHER predicted deleterious and damaging effects on the CCBE1 protein for 21 (91.23%) nsSNPs, of which 18 were probably damaging and 3 possibly damaging, while 2 (8.6%) were probably benign (time > 450my, "probably damaging"; 450my > time > 200my, "possibly damaging"; and time < 200my, "probably benign") (Figure 1 S4). Furthermore, the analysis was carried out using PROVEAN, which predicts the impact of an SNP on the biological function of a protein. Additional tools were applied with their own cutoffs (for score-based tools, a higher score indicates a more damaging variant), including FATHMM (>0.453), to identify highly deleterious nsSNPs in the CCBE1 gene.
For the 407 nsSNPs of the CCBE1 gene, the predictions of pathogenic nsSNPs by SIFT and PolyPhen were largely similar (87%), while disagreement was 36%. We selected for further study the 23 nsSNPs that were predicted to be deleterious/damaging by both SIFT and PolyPhen. More than 100% overlapping similarity was observed between SIFT, M-CAP, CADD, PolyPhen, and FATHMM-MKL for pathogenic nsSNPs. The similarity between SNP-GO and PhD-SNP was 13% and the disagreement 73%, while the dissimilarity between SIFT and SNP-GO was 82%. The results of all the prediction algorithms were found to be statistically significant and highly correlated. Student's t-test between the tools was significant at p value <0.001. The results are shown in Table 3, and the cumulative score and total significance of all the tools in the study are shown in Figure 1 S4.
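The selection rule used here (a SIFT tolerance index of 0.00 together with a PolyPhen PSIC score above 0.5) and a simple percentage agreement between two tools can be sketched as below. The two cutoffs are taken from the text. The function names, the definition of agreement as the share of variants given the same call by both tools, and the example variants are assumptions for illustration and may differ from how the similarity figures in this study were derived.

```python
def deleterious_by_both(sift_score, psic_score):
    """True if a variant passes both the SIFT and the PolyPhen criteria stated in the text."""
    return sift_score <= 0.0 and psic_score > 0.5


def consensus(calls_by_tool):
    """Variants called deleterious by every tool in the mapping."""
    return set.intersection(*(set(calls) for calls in calls_by_tool.values()))


def percent_agreement(calls_a, calls_b, variants):
    """Percentage of variants for which two tools make the same call."""
    variants = list(variants)
    if not variants:
        raise ValueError("at least one variant is required")
    same = sum((v in calls_a) == (v in calls_b) for v in variants)
    return 100.0 * same / len(variants)


# Hypothetical variant labels and calls; not data from this study.
all_variants = ["V1", "V2", "V3", "V4"]
calls = {"SIFT": {"V1", "V2"}, "PolyPhen": {"V2", "V3"}}
print(consensus(calls))
print(percent_agreement(calls["SIFT"], calls["PolyPhen"], all_variants))
```

Under this definition, agreement and disagreement between two tools always add up to 100% of the variants considered.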

Conservation Analysis.
We analyzed the degree of conservation of CCBE1 residues using the ConSurf web server. The results of the ConSurf analysis indicated that the 23 deleterious missense SNPs are located in highly conserved regions (7-8-9). Among these 23 missense variants, 13 were located at highly conserved positions: 11 (C75S, P87S, P290L, A96G, G107D, R118L, G330E, D336N, R125W, Q353R, and T153N) were predicted as functional and exposed residues, and the other 2 (C102S and C174R) were predicted as buried and structural residues. S19N was predicted as a conserved and buried residue, and the other 8 (T144M, R167W, P249S, R301W, G327R, K355T, D397Y, and D41E) were exposed residues. The results are shown in Figure 3.
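The labels used in this section follow a simple rule: a residue with a conservation grade of 7, 8, or 9 is called functional if it is exposed and structural if it is buried. A minimal sketch of that rule is given below. The function name and the returned strings are illustrative choices, and the burial flag is assumed to come from ConSurf's own exposed/buried prediction.

```python
def classify_residue(grade, buried):
    """Label a residue from its ConSurf grade (1 to 9) and its burial state."""
    if not 1 <= grade <= 9:
        raise ValueError(f"ConSurf grade must be between 1 and 9, got {grade}")
    highly_conserved = grade >= 7
    if highly_conserved and buried:
        return "structural"   # highly conserved and buried
    if highly_conserved:
        return "functional"   # highly conserved and exposed
    return "buried" if buried else "exposed"


# Hypothetical grade and burial combinations.
print(classify_residue(9, buried=False))
print(classify_residue(8, buried=True))
print(classify_residue(4, buried=False))
```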

Project HOPE.
All 23 nonsynonymous SNPs that were predicted to be deleterious and damaging by both SIFT and PolyPhen were submitted to Project HOPE. The findings revealed that rs149531418 resulted in the substitution of glycine (wild type) by glutamic acid (mutant) at position 330. The mutant residue is bigger than the wild-type residue. The wild-type residue charge was neutral, and the mutant residue charge was negative. The wild-type residue is more hydrophobic than the mutant residue. The mutation is located within a domain annotated in UniProt as collagen-like 2, and it introduces an amino acid with different properties, which can disturb this domain and abolish its function. Neither the mutant residue nor another residue type with similar properties was observed at this position in other homologous sequences. Based on conservation scores, this mutation is probably damaging to the protein.
The mutant residue is located near a highly conserved position. The rs121908251 variant resulted in the substitution of cysteine (wild type) by serine (mutant type) at position 102. The wild-type residue is more hydrophobic than the mutant residue. The variant is annotated with severity: disease, and the mutation is located in a region with known splice variants, described as C->S (in HKLLS1; dbSNP: rs121908251). The mutant and wild-type residues are not very similar. Based on this conservation information, this mutation is probably damaging to the protein. This mutant residue is located near a highly conserved position. The rs121908254 variant results in the substitution of cysteine (wild type) by arginine (mutant type) at position 174. The mutant residue is bigger than the wild-type residue. The wild-type residue charge was neutral, and the mutant residue charge was positive. The wild-type residue is more hydrophobic than the mutant residue. The mutation is located within a domain annotated in UniProt as EGF-like, calcium-binding. The mutation introduces an amino acid with different properties, which can disturb this domain and abolish its function. The variant is annotated with severity: disease, and the mutation is located in a region with known splice variants, described as C->R (in HKLLS1; dbSNP: rs121908254). The mutant and wild-type residues are not very similar. Based on this conservation information, this mutation is probably damaging to the protein. The mutant residue is located near a highly conserved position. The rs372499913 variant results in the substitution of glycine (wild type) by aspartic acid (mutant type) at position 107. The mutant residue is bigger than the wild-type residue. The wild-type residue charge was neutral, and the mutant residue charge was negative. The wild-type residue is more hydrophobic than the mutant residue. The mutant and wild-type residues are not very similar. Based on this conservation information, this mutation is probably damaging to the protein.
The mutant residue is located near a highly conserved position. SNP rs147208835 results in the substitution of arginine (wild type) by tryptophan (mutant type) at position 125. The mutant residue is bigger than the wild-type residue. The wild-type residue charge was positive, and the mutant residue charge was neutral. The mutant residue is more hydrophobic than the wild-type residue. The mutant residue was not among the other residue types observed at this position in other homologous proteins. However, residues that have some properties in common with the mutant residue were observed. This means that in some rare cases, the mutation might occur without damaging the protein. The mutant residue is located near a highly conserved position.

Association of SNPs with Highly Conserved Buried (Structural) and Exposed (Functional) Amino Acid Residues in CCBE1 Protein.
Structurally, CCBE1 is expressed as a 406-amino-acid protein; the gene has 11 exons and is located at 18q21.32. A sequence-based structural-functional analysis of CCBE1 was performed using Clustal Omega-based multiple sequence alignment. For this analysis, the CCBE1 protein sequence (UniProt ID: Q6UXH8) was retrieved from the UniProt Knowledgebase. The CCBE1 protein sequence was blasted against the UniProtKB/SwissProt entries and aligned using Clustal Omega with default settings. The results generated by Clustal Omega consist of the CCBE1 protein sequence aligned with phylogenetically close sequences from other organisms. The results contain a colorimetric conservation score in the range of 1-10. Multiple sequence alignment using Clustal Omega revealed that the human CCBE1 protein sequence contains a number of conserved residues and motifs. The highly conserved amino acid residues in human CCBE1 protein were G262, P264, G265, G270, P272, G273, G276, R284, G285, R315, G317, R322, G323, G329, A345, E368, F370, P371, P374, P381, E382, D385, and D391. There are twenty-four different conserved residues (Figure 4).

Prediction of Pathogenic Amino Acid Substitutions by MutPred2.
MutPred2 considers several molecular characteristics of amino acid residues to predict whether an amino acid substitution is disease-related or neutral in humans. The score it provides is the predicted probability that an amino acid substitution affects the function of the respective protein. The threshold score for pathogenicity prediction is 0.5, and a MutPred2 score ≥0.8 can be considered a highly confident prediction. All substitutions have prediction scores ≤0.5. Table 4 provides the MutPred2 outcomes.

Prediction of Stability of the Mutated Protein due to SNPs by iStable 2.0.
The web tool iStable 2.0 was used for protein stability prediction.

Surface and Solvent Accessibility of Residues and CCBE1 Secondary Structure by NetSurfP-2.0.
Surface accessibility (exposed or buried) of amino acids in a given protein was predicted by NetSurfP-2.0, which provides the relative and absolute accessible surface area of each residue. It also predicts the protein secondary structure. Relative surface accessibility: a red upward elevation is an exposed residue, and a sky blue downward elevation is a buried residue; the threshold is at 25%. Secondary structure is as follows: orange spiral = helix, indigo arrow = strand, and pink straight line = coil. Disorder is represented as a black swollen line; the thickness of the line equals the probability of a disordered residue. Figure 5 shows the NetSurfP-2.0 outcomes.
ConSurf residue annotations: exposed residue according to the neural-network algorithm; buried residue according to the neural-network algorithm; predicted functional residue (highly conserved and exposed); predicted structural residue (highly conserved and buried); insufficient data (the calculation for this site was performed on less than 10% of the sequences).
Figure 4: Amino acid alignment of human CCBE1 (UniProt ID: Q6UXH8) along with its homologues in phylogenetically close species in ABWGB and Q3MI99. Solid horizontal bars indicate conserved sequence motifs, and residues marked with an asterisk (*) indicate evolutionarily conserved amino acids. The amino acid identities were colored according to the Clustal color scheme, and the conservation index at each alignment position was provided by Jalview.
Four nsSNPs (R118L, A96G, S19N, and C174R) having the highest RMSD values (Table 6) were selected and submitted to I-TASSER for remodeling. I-TASSER was chosen because it is regarded as one of the most advanced and reliable modelling tools. Each of these mutants was studied and superimposed over the wild-type CCBE1 protein using Chimera 1.11, as shown in Figures 6(a)-6(d).

Predicted PTMs (Post-Translation Modifications).
GPS-MSP 3.0 was used to predict methylation and found no sites in CCBE1 to be methylated. GPS 3.0 and NetPhos 3.1 predicted CCBE1 phosphorylation sites, which are given in Table S1. NetPhos 3.1 predicted 62 residues (Ser: 23, Thr: 22, and Tyr: 17) to have phosphorylation potential. In contrast, GPS 3.0 predicted 18 residues (Ser: 12, Thr: 06, and Tyr: 00) to be capable of being phosphorylated. BDM-PUB and UbPred were used for ubiquitylation prediction. BDM-PUB predicted 11 lysine residues to be ubiquitinated, while UbPred predicted none. Among those predicted by BDM-PUB, none was located at a highly conserved or deleterious nsSNP region. The results obtained are listed in Table S1. NetOGlyc4.0 was used for the prediction of potential glycosylation sites. The output showed all the possible sites for glycosylation, of which positions 19, 144, and 153 were predicted to be glycosylated with scores of 0.34, 0.43, and 0.17 in the wild-type CCBE1 protein. Interestingly, mutant S19N showed loss of the glycosylation site at position 19, while T144M also showed loss of the glycosylation site at position 144. All the scores for the wild-type and mutant proteins are given in Table S2.

Ligand-Binding Site Prediction by FTSite.
Sites for ligand-binding were predicted by FTSite algorithms and visualized and further analyzed using PyMOL. By this tool, 3 ligand-binding sites were identified in human

Discussion
Several studies have linked single-nucleotide polymorphisms in the CCBE1 gene to cases of lymph vessel dysplasia [13,14]. Using state-of-the-art in silico methods, the current research explored the impact of SNPs on the structural and interactive behavior of the CCBE1 protein. These methods have been applied in sequential order to screen the most pathogenic polymorphisms in different genes [42,56]. The current study also applied all these methods sequentially to classify deleterious variants in CCBE1 that may interfere with its role in extracellular matrix remodeling and migration by silencing its function. We screened 73845 SNPs in the CCBE1 gene through multiple dbSNP databases for their effect on the gene's structure and interactions with a variety of protein molecules. Various in silico methods were used to screen the pathogenicity of the 407 retrieved nonsynonymous SNPs. Our study found 23 nsSNPs that were predicted to be deleterious by SIFT and PolyPhen2 and were then verified with other tools (PROVEAN, FATHMM, LRT, M-CAP, VEST3, CADD, MetaLR, Mutation Assessor, Mutation Taster, FATHMM-MKL, SNP-GO, PhD-SNP, PANTHER, SNAP2, and MutPred). Four nsSNPs were classified as highly pathogenic: rs149531418, rs121908251, rs121908254, and rs372499913. This is a lower number than was previously estimated using the same methods in different genes [56,57]. Two of the variants found in our study (C102S and C174R) have already been reported for Hennekam syndrome [11], while the other two (G330E and G107D) have not been reported for Hennekam syndrome until now. Highly pathogenic variants were selected on the basis of the impact of nsSNPs on sequence conservation, sequence attributes, and structural impact [58].
The chosen state-of-the-art tools covered the widest possible range of methods (AS: alignment score; NN: neural networks; HMM: hidden Markov models; SVM: support vector machine; BC: Bayesian classification) for predicting pathogenic nsSNPs [58]. Because essential amino acids that are involved in a wide range of biological processes, particularly protein interactions, are highly conserved, SNPs at conserved loci are more likely to cause damage than SNPs at nonconserved loci [59]. Of the 23 nsSNPs, only 11 are located at evolutionarily conserved, exposed, and functionally important residues: C75S, P87S, P290L, A96G, G107D, R118L, G330E, D336N, R125W, Q353R, and T153N. Two nsSNPs (C102S and C174R) are located at conserved, buried, and structurally important residues. The remaining nsSNPs are located at residues that are either exposed or buried but were not predicted to have any structural or functional importance in the CCBE1 protein.
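The grouping rule used above (conserved and exposed residues treated as functionally important, conserved and buried residues as structurally important) can be written down as a small sketch. The variant names and their conserved/exposed flags are taken from the text; the function name and labels are illustrative choices and do not reproduce the internal logic of any conservation server.

```python
from collections import defaultdict


def classify_residue(conserved: bool, exposed: bool) -> str:
    """Apply the rule from the text: conserved + exposed -> functional,
    conserved + buried -> structural, anything else -> unclassified."""
    if conserved and exposed:
        return "functional"
    if conserved and not exposed:
        return "structural"
    return "unclassified"


# (conserved, exposed) flags as reported in the text for each substitution.
conserved_exposed = [
    "C75S", "P87S", "P290L", "A96G", "G107D", "R118L",
    "G330E", "D336N", "R125W", "Q353R", "T153N",
]
conserved_buried = ["C102S", "C174R"]

reported = {name: (True, True) for name in conserved_exposed}
reported.update({name: (True, False) for name in conserved_buried})

groups = defaultdict(list)
for name, (conserved, exposed) in reported.items():
    groups[classify_residue(conserved, exposed)].append(name)

for label, names in groups.items():
    print(f"{label}: {len(names)} -> {', '.join(names)}")
```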
These 11 nsSNPs in CCBE1 have not yet been reported in patients with Hennekam syndrome; if they are found in such patients in the future, they can be considered pathogenic. Protein stability was predicted with the I-STAB2 web server, which predicted that the nsSNPs rs149531418, rs121908254, rs147681552, rs192224843, rs147974432, rs141125426, rs374941368, and rs149792489 increase stability, while C75S, P87S, R125W, K355T, D336N, T153N, P87S, R118L, R301W, P249S, and R167W decrease protein stability. These nsSNPs could serve as diagnostic markers and help reveal new therapeutic targets for Hennekam syndrome.
RAMPAGE values were used to verify all of the modeled structures. Protein structures with more than 80% of residues in the core region are considered to be of higher quality [60]. For the structure given in Figure 5(a) (CCBE1 wild-type), the RAMPAGE values were 75.5% favored, 19.1% allowed, 4.5% generally allowed, and 0.9% disallowed. The corresponding values for the mutants were as follows: R118L (80.0% favored, 13.6% allowed, 4.5% generally allowed, and 1.8% disallowed), A96G (76.4% favored, 16.4% allowed, 5.5% generally allowed, and 1.8% disallowed), C174R (79.1% favored, 15.5% allowed, 2.7% generally allowed, and 0.9% disallowed), and S19N (78.2% favored, 16.4% allowed, 4.5% generally allowed, and 0.9% disallowed). All the structures were thus validated to some extent.
Post-translational modifications (PTMs) are important in cell signaling and protein-protein interactions, as well as in other biological processes, and they help control protein structure and function [61,62]. In this analysis, we examined whether the chosen nsSNPs modify the PTMs of the CCBE1 protein. A variety of bioinformatics methods were used to predict PTM sites in the protein under study. Methylation is a critical PTM because lysine residues in some proteins are methylated, which influences their binding to DNA and changes gene expression.
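As a quick arithmetic check on the RAMPAGE figures quoted above, the reported percentages can be tabulated and compared with the 80% criterion. The numbers are copied from the text; the strict "greater than" comparison and the variable names are assumptions made for this sketch.

```python
CORE_THRESHOLD = 80.0  # percent; criterion quoted in the text, compared strictly here

# (favored, allowed, generally allowed, disallowed) percentages as reported.
rampage = {
    "wild-type": (75.5, 19.1, 4.5, 0.9),
    "R118L": (80.0, 13.6, 4.5, 1.8),
    "A96G": (76.4, 16.4, 5.5, 1.8),
    "C174R": (79.1, 15.5, 2.7, 0.9),
    "S19N": (78.2, 16.4, 4.5, 0.9),
}

# For each model: total of the four regions and whether the favored share exceeds the threshold.
summaries = {
    name: (round(sum(values), 1), values[0] > CORE_THRESHOLD)
    for name, values in rampage.items()
}

for name, (total, above) in summaries.items():
    favored = rampage[name][0]
    print(f"{name:10s} favored={favored:5.1f}%  total={total:6.1f}%  above_threshold={above}")
```

Summing the four regions is a simple way to spot rounding or transcription slips in reported percentages; totals that differ noticeably from 100% are worth rechecking against the original RAMPAGE output.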
Phosphorylation, another important mechanism of protein regulation, acts as a molecular switch for proteins [63-66]. According to the ConSurf conservation profile, S19 is highly conserved, exposed, and functionally significant. Position S19 has phosphorylation potential and also carries one of the most damaging nsSNPs (rs1376162684); ConSurf predicts the position to be structurally important and highly conserved, which underlines its importance. Ubiquitylation is a protein degradation mechanism that also helps in DNA damage repair [67]. It is crucial to the function and stability of proteins and plays a structural role in protein-protein interactions. These PTM predictions indicate that phosphorylation is the only PTM that can have a major impact on CCBE1 protein structure and function, with residues S19 and T153 being the most significant phosphorylation sites. STRING and GeneMANIA predictions show that ADAMTS3 is the gene that interacts most with CCBE1, followed by VEGFC and FLT4. CCBE1, ADAMTS3, VEGFC, FLT4, and GJC2 are thought to be related either to Hennekam syndrome or to its associated symptoms in many diseases, including rheumatoid arthritis [8,13,68,69]. From their interaction patterns and coexpression profiles, it can be inferred that some of the most harmful nsSNPs in the CCBE1 gene will influence, and possibly disrupt, the normal functioning of other interacting genes. This demonstrates the significance of these interacting and coexpressed genes, which may be important in Hennekam syndrome or other primary immunodeficiency disorders. FTSite was used to examine the impact of substitutions on protein function. The FTSite server predicted three ligand-binding sites, with 14, 7, and 9 residues, respectively. We found that the R167W and T153N substitutions are involved in the ligand-binding site and form the catalytic coordination sphere, which can affect the binding affinity of the CCBE1 protein.
Our analysis was thorough and includes the data and analyses needed to identify the most harmful nsSNPs. Like any research, however, it has limitations. Our work relies on the mathematical and computational algorithms implemented in software tools and web servers; experimental research is therefore needed to confirm these findings. Our findings shed light on the CCBE1 gene's nsSNPs, protein 3D structure, potential PTM sites, and gene-gene interactions, all of which may help researchers better understand the gene's role in autoimmunity and related diseases in the future.

Conclusion
The impact of nsSNPs on the functional and structural deviations in the CCBE1 protein was predicted using a variety of Taster > LRT > DANN > MetaLR > SNAP2 > VEST3 > MutPred > PhD-SNP > Mutation Assessor > PROVEAN > SNP-GO > Cumulative ), prediction matching among the tools. As a consequence, the findings of these tools in other studies may be considered more reliable. The importance of rs374941368 and rs200149541 in the prediction of post-translational modifications was highlighted because they affect a possible phosphorylation site. In the future, the 4 reported nsSNPs that are extremely deleterious, decrease protein stability, and lie in highly conserved positions could be used as marker nsSNPs for Hennekam syndrome. Even though we performed a thorough in silico study, further research is needed to fully understand the impact of these nsSNPs on protein structure and function.

Data Availability
The data used in the article are given with the information from where the data were taken, e.g., (http://www.ncbi.nlm.nih.gov/snp/).

Ethical Approval
The study did not include any living subjects; therefore, no ethical approval was needed.

Conflicts of Interest
The authors declare that they have no conflicts of interest.