Protein posttranslational modifications (PTMs) play key roles in a variety of protein activities and cellular processes. Different PTMs have distinct effects on protein function, and normal protein activity depends on many kinds of PTMs working together. With the development of high-throughput technologies such as tandem mass spectrometry (MS/MS) and next-generation sequencing, a growing number of nonsynonymous single-nucleotide variations (nsSNVs) that change amino acids have been identified, some of which damage PTM sites. Damaged PTMs may contribute to the development of some human diseases. In this study, we examined the proteome-wide relationship of eight types of damaged PTMs to human inherited diseases and cancers. Some human inherited diseases or cancers may result from interactions among damaged PTMs rather than from a single damaged PTM site.
More than 200 different types of protein posttranslational modifications (PTMs) have been detected. PTMs are involved in many protein activities and cellular processes, such as protein folding, stability, conformation, and some significant regulatory mechanisms [
With the development of high-throughput sequencing technology, gene mutation detection has become another important resource to investigate regulatory mechanisms and cellular processes. Some databases such as dbSNP [
A PTM site that bears an nsSNV can be defined as a damaged PTM. Recently, large-scale studies have shown that damaged PTMs caused by numerous inherited and somatic amino acid substitutions [
However, some of these previous studies inferred the relationship between damaged PTMs and human health from predictions, some focused only on cancers, and many considered only a single type of PTM. Although data on both gene mutations and PTMs are growing fast, the proteome-wide relationship between damaged PTMs and human diseases has not been well studied. In this work, we chose eight types of experimentally demonstrated PTMs and examined the association of their damaged sites with human diseases, including inherited diseases and cancers (somatic diseases). The eight types of damaged PTMs comprise amino acid variations at sites of Phosphorylation, Ubiquitylation, Acetylation, Glycosylation, Methylation, SUMOylation, Hydroxylation, and Sulfation. These modifications are well established as key players in important cellular processes and are closely related to human disease development; moreover, cross talk among them has recently been revealed from the perspective of systems biology [
The eight human PTM data sets of Phosphorylation, Ubiquitylation, Acetylation, Glycosylation, Methylation, SUMOylation, Hydroxylation, and Sulfation were obtained from SysPTM 2.0 (released in June, 2013) [
The inherited-diseases-related nsSNVs were obtained from ClinVar (accessed in November, 2013) [
For phosphorylation mapping, we set three criteria: exact match; ±2 sites around the phosphorylated amino acid; ±7 sites around the phosphorylated amino acid [
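The three matching criteria can be illustrated with a minimal Python sketch. It assumes PTM sites and nsSNVs are both available as (protein, residue position) pairs; the function name `map_variants_to_ptm_sites`, the protein identifiers, and the toy positions are hypothetical and are not taken from the study's pipeline.

```python
from collections import defaultdict


def map_variants_to_ptm_sites(ptm_sites, variants, window=0):
    """Return (protein, variant_pos, ptm_pos) triples for which the variant
    lies within `window` residues of a PTM site on the same protein.

    window=0 corresponds to an exact match; window=2 and window=7
    correspond to the +/-2 and +/-7 ranges.
    """
    # Index PTM positions by protein so each variant is only compared
    # against sites on its own protein.
    sites_by_protein = defaultdict(set)
    for protein, pos in ptm_sites:
        sites_by_protein[protein].add(pos)

    hits = []
    for protein, variant_pos in variants:
        for ptm_pos in sorted(sites_by_protein.get(protein, ())):
            if abs(variant_pos - ptm_pos) <= window:
                hits.append((protein, variant_pos, ptm_pos))
    return hits


# Toy data (hypothetical proteins and positions, for illustration only).
ptm_sites = [("P1", 10), ("P1", 50), ("P2", 7)]
variants = [("P1", 10), ("P1", 12), ("P1", 44), ("P2", 20)]

exact = map_variants_to_ptm_sites(ptm_sites, variants, window=0)
near2 = map_variants_to_ptm_sites(ptm_sites, variants, window=2)
near7 = map_variants_to_ptm_sites(ptm_sites, variants, window=7)
```

Widening the window can only add matches, which is consistent with the ±2 and ±7 counts exceeding the exact-match counts.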
nsSNV affected PTM sites are defined as damaged PTMs in this work. Annotations of nsSNVs (deleterious or neutral) were based on the information from the databases mentioned above and on Online Mendelian Inheritance in Man (OMIM;
To further analyze the functions and features of diseases-related damaged PTMs and their proteins, enrichment analyses were performed using DAVID 6.7 (the database for annotation, visualization, and integrated discovery) [
For cross talk between pairs of PTM types, both positive and negative cross talk were considered. Positive cross talk means that one PTM serves as a signal for the addition or removal of a second PTM, or for recognition by a binding protein that carries out a second modification. Negative cross talk can be direct competition for modification of a single residue on a protein, or masking of the recognition site of a second PTM by another modification [
The workflow and protocol of this study are shown in Figure
Numbers of nsSNVs on each PTM category.
PTM | # of exact match | # of ±2 match | # of ±7 match |
---|---|---|---|
Phosphorylation | 7005 | 33119 | 78123 |
Ubiquitylation | 1628 | 10096 | — |
Acetylation | 607 | 9642 | — |
Glycosylation | 385 | 2199 | — |
Methylation | 124 | 427 | — |
SUMOylation | 34 | 231 | — |
Hydroxylation | 29 | 328 | — |
Sulfation | 7 | 26 | — |
Workflow or protocol for identifying damaged PTMs and associated diseases.
Proportions of exact matched nsSNVs on each PTM out of all sites analyzed. Both the exact number of sites affected and the proportion are shown.
We calculated the PTMs affected by inherited disease and cancer-related nsSNVs, respectively, using hypergeometric test and found that phosphorylation affected by nsSNVs was most significantly related to both inherited diseases and cancers. The next is ubiquitylation; however, based on our calculation, it is not significant in inherited diseases, albeit significant in cancers when performing the exact match. The remaining types of PTMs affected by nsSNVs were not significantly associated with inherited diseases. When we expanded to ±2 amino acids around the modified sites, the damaged PTMs significantly associated with inherited diseases included not only ubiquitylation, but also acetylation and glycosylation. Our results implied that most PTMs affected by nsSNVs were cancer-related, rather than inherited-disease-related (see Tables
Numbers and
PTM | Inherited disease | p value | Cancer | p value |
---|---|---|---|---|
Phosphorylation | 313 | 0.0133 | 2684 | 0.0197 |
Ubiquitylation | 59 | 0.0807 | 651 | 0.0172 |
Acetylation | 34 | 0.0701 | 233 | 0.1058 |
Glycosylation | 13 | 0.2062 | 57 | 0.1813 |
Methylation | 15 | 0.1638 | 67 | 0.0912 |
SUMOylation | 1 | 0.7752 | 22 | 0.0152 |
Hydroxylation | 2 | 0.7423 | 14 | 0.3507 |
Sulfation | 4 | 0.5503 | 0 | 0.5503 |
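A hypergeometric enrichment test of the kind named in the text can be sketched in a few lines of Python. The sketch assumes a one-sided (upper-tail) test; the choice of background population, the function name `hypergeom_upper_tail`, and the toy numbers are illustrative assumptions and do not reproduce the values in the table.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric.

    N: population size (e.g. all nsSNVs considered)
    K: number of "successes" in the population (e.g. disease-related nsSNVs)
    n: number of draws (e.g. nsSNVs falling on a given PTM type)
    k: observed successes among the draws
    """
    lower = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible outcomes contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1))
    return favourable / total


# Toy example (hypothetical numbers): 20 variants, 5 disease-related,
# 4 of the 20 fall on PTM sites, 2 of those 4 are disease-related.
p_toy = hypergeom_upper_tail(k=2, N=20, K=5, n=4)
```

A small p value indicates that disease-related variants fall on the PTM type more often than expected by chance, given the chosen background.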
We chose the most frequently modified amino acids, namely Histidine (H), Serine (S), Threonine (T), and Tyrosine (Y) for phosphorylation and Lysine (K) for ubiquitylation, and calculated how often nsSNVs occurred on these modified residues. The frequency of modified amino acids affected by nsSNVs was lower than the frequency of the same amino acids across the whole proteome (data not shown), which suggests that modified amino acids are less affected by mutations. Previous research showed that PTM sites generally play key roles in normal cellular processes such as protein-protein interactions and signal transduction and are therefore more stable [
Phosphorylation is the best studied and also the most prominent PTM, which has the most abundant data as well [
Numbers and
PTM | Genetic disease | p value | Cancer | p value |
---|---|---|---|---|
Phosphorylation | 1422 | | 12826 | 0.0111 |
Ubiquitylation | 439 | | 4074 | |
Acetylation | 552 | | 4019 | |
Glycosylation | 214 | 0.0261 | 795 | |
Methylation | 44 | 0.1036 | 231 | |
SUMOylation | 11 | 0.1526 | 115 | |
Hydroxylation | 22 | 0.1997 | 63 | |
Sulfation | 7 | 0.2446 | 9 | 0.2526 |
In contrast, ubiquitylation shows little selectivity on primary sequence, such as Lysine, which is highly preferred as the target site of most E3 ubiquitin ligases [
For the remaining four types of PTMs, the numbers of both exact matches and ±2 range matches were much smaller than those of the PTMs above, although these four types are involved in many important cellular processes and recent work has also uncovered their related functions and diseases. For instance, SUMOylated proteins are implicated in human diseases including cancers and “Huntington’s, Alzheimer’s, and Parkinson’s diseases”; hydroxylation at Asp110Asn is related to “hemophilia b”; and methylation at Arg75Trp is associated with “deafness” [
Although we found that many damaged PTMs were related to human inherited diseases and cancers, the relationship to human diseases remains to be elucidated for almost half of the data. As more damaged PTMs are annotated and analyzed, their impact on health and disease development may become clearer.
For all eight PTM types studied, we annotated curated disease information based on SwissVar; some annotation was obtained from the source databases. Although the disease information is up-to-date, the limitations of the individual databases make it hard to capture all known diseases. For instance, for inherited-disease-related phosphorylation, “congenital, hereditary, and neonatal diseases and abnormalities” is the most associated disease category based on the SwissVar analysis of exact-matched inherited-disease-related nsSNVs, followed by “skin and connective tissue diseases” and “nervous system diseases.” However, “neoplasms” account for most of the known diseases in ubiquitylation and acetylation.
In order to acquire more information on related diseases, we performed enrichment analysis of diseases using IPA (Figures
Diseases for each type of damaged PTM affected by nsSNVs in IPA. Threshold was chosen as
We then expanded the search range for nsSNVs that could affect PTMs: ±2 and ±7 around phosphorylation sites and ±2 for the remaining PTM types. First, we used the ±2 range for all eight PTM types to analyze the associated diseases. For inherited diseases, “autosomal dominant disease” and “autosomal recessive disease” ranked in the top three for Phosphorylation, Ubiquitylation, Acetylation, Glycosylation, Methylation, Hydroxylation, and Sulfation. This was clearly different from the exact-matched results. Both autosomal diseases and X-linked hereditary diseases became significant when more nsSNVs accumulated around PTM sites. The comparison between exact-matched and ±2 range-matched results indicates that (a) mutations on PTM sites themselves are rare, and only certain kinds of inherited diseases were indicated to be caused by them, whereas more kinds of diseases were indicated to be caused by nsSNVs surrounding PTM sites; and (b) human inherited diseases are closely associated with disturbances on and around PTM sites.
Next, we analyzed the ±2 range-matched sites for cancers; the results changed little relative to the exact-matched results. We also compared the ±2 and ±7 ranges around phosphorylation sites; however, the difference was not significant. The differences between human inherited diseases and cancers could be related to the damage that nsSNVs cause at PTM sites and to phenotype: cancers are mostly caused by somatic mutations and present in the current generation, whereas damaging nsSNVs at PTM sites are not easily passed to the next generation, so the numbers and types of inherited diseases are smaller than those of damaged-PTM-related cancers.
We performed functional enrichment analysis using DAVID. First, we performed keyword and GO association analysis (FDR < 0.01). As before, we divided the data into two parts: exact match and ±2 amino acids (AA) match. “Disease mutation” was the most significant keyword for the inherited-disease-related nsSNVs and appeared in all four PTM types analyzed: Phosphorylation, Ubiquitylation, Acetylation, and Glycosylation. The enrichment analyses showed that the proteins we chose were more likely to be related to diseases when they carried mutations. GO enrichment analysis was also performed for these four PTM types, and the functional differences among the PTM categories are obvious (see Table S3). For example, proteins with phosphorylation are mainly involved in cell activities such as cell death, apoptosis, and signal transduction, whereas coagulation and wound healing were the GO terms for glycosylation. These analyses show that the diseases caused by damaged PTMs are closely associated with the roles these proteins play in regulating normal cellular processes, which indicates that the consequences of damaged PTMs are serious.
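To make the FDR < 0.01 cutoff concrete, the following Python sketch applies the Benjamini-Hochberg procedure to a list of raw enrichment p values. This is a generic illustration only: DAVID reports its own corrected statistics, and its exact computation is not restated here. The function name `benjamini_hochberg`, the term labels, and the p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical enrichment results: (term, raw p value).
terms = ["term_a", "term_b", "term_c", "term_d", "term_e"]
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]

adj_p = benjamini_hochberg(raw_p)
significant = [t for t, q in zip(terms, adj_p) if q < 0.01]
```

Only terms whose adjusted value falls below the threshold would be reported as enriched under this scheme.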
When we turned to cancer-related nsSNVs on PTMs, the keywords carried less information about mutations and pointed instead to the functions of the proteins. Ubiquitylation interested us the most: its keywords said little about ubiquitylation itself and more about other modifications on the same proteins, which indicates that ubiquitylation is likely to coexist with other types of PTMs. We then examined the GO terms for cancers; besides the functions the proteins perform, their chemical characteristics also appeared. For phosphorylation, for example, the most significant GO term was “protein amino acid phosphorylation” for both exact match and ±2 range match. For the remaining PTM types, GO terms mostly revealed the roles of the proteins in different processes; for example, “modification-dependent protein catabolic process” ranked in the top two under both range criteria for ubiquitylation.
Then we examined the damaged PTMs associated domains based on the data from Pfam to analyze the impact of damaged PTMs on protein structures. For damaged phosphorylation, “protein tyrosine kinase” (
In order to investigate the function of damaged PTMs in proteome-wide scale, we performed pathway analysis by IPA (details available in Table S4). In IPA analysis for inherited-disease associated damaged PTMs of the exact matched data, some pathways are significant: “ovarian cancer signaling” in Phosphorylation (corrected
On the proteome-wide range, the associations among these proteins were close, and we illustrated the interactions using networks of protein-protein interactions with STRING (Figure
Cross talk between paired PTMs of different types, such as phosphorylation and ubiquitylation or ubiquitylation and acetylation, has become a research theme in proteomics [
The cross talk of disease-related phosphorylation site Y62 with other PTM sites in protein PTN11_HUMAN. The two “SH2” and one “PTPc” boxed in green and pink are domains in the protein; green lines and yellow lines show the association between PTM sites based on evidence of coevolution and physical distance, respectively. Disease-related PTM sites are boxed in red.
Network of protein-protein interactions among the proteins carrying inherited-disease or cancers related damaged PTMs identified by SwissVar. The proteins were divided into six parts; each category was circled by different colors except for phosphorylation in the center: red represented acetylation, green represented methylation, black represented glycosylation, blue represented hydroxylation and yellow represented ubiquitylation. Stronger associations were represented by thicker lines.
On the proteome-wide range, the associations were more prevalent. Then we took P53_HUMAN and TOP1_HUMAN as examples for the cross talks between different PTM sites on distinct proteins: on P53_HUMAN, we found 21 phosphorylation sites, 14 ubiquitylation sites, and 9 acetylation sites; among them, the associations were prevalent within the protein, and the damaged PTMs mostly resulted in the deficiency in the role it played in significant cellular functions [
The cross talks between the ubiquitylation site K326 of protein TOP1 with other PTM sites on TP53. Green lines show the association of K326 with other PTM sites based on the evidence of coevolution. Some domains on the two proteins are also given, largely boxed in blue and grey. The different PTMs boxed in red show disease-related PTM sites and those with more than one kind of PTM on the same residue were boxed in black.
Negative cross talk, in which more than one kind of PTM can occur on the same residue, may take place at different stages of cellular processes or at different positions. We chose three PTM pairs for the analysis: phosphorylation and ubiquitylation, phosphorylation and acetylation, and ubiquitylation and acetylation. For the first two pairs, the exact-match sites did not overlap, but when we matched damaged ubiquitylation and acetylation sites against the ±7 range around phosphorylation sites, we obtained 12 and 10 overlapping sites for ubiquitylation and acetylation, respectively; of these, 7 and 5 sites, respectively, were on P53_HUMAN. For example, K320 on TP53 could be ubiquitylated or acetylated (Figure
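Residues that carry more than one kind of PTM, the candidates for direct competition, can be found with a simple grouping step. The Python sketch below assumes annotations are available as (protein, position, PTM type) triples; the function name `competing_residues` and the entries for the placeholder protein `PROT_X` are hypothetical, while the TP53 K320 entry follows the example given in the text.

```python
from collections import defaultdict


def competing_residues(annotations):
    """Return {(protein, position): sorted PTM types} for residues that are
    annotated with more than one PTM type."""
    types_by_residue = defaultdict(set)
    for protein, position, ptm_type in annotations:
        types_by_residue[(protein, position)].add(ptm_type)
    return {
        residue: sorted(ptm_types)
        for residue, ptm_types in types_by_residue.items()
        if len(ptm_types) > 1
    }


annotations = [
    ("TP53", 320, "ubiquitylation"),    # K320, as described in the text
    ("TP53", 320, "acetylation"),
    ("PROT_X", 42, "phosphorylation"),  # hypothetical single-PTM residue
    ("PROT_X", 57, "acetylation"),      # hypothetical single-PTM residue
]

shared = competing_residues(annotations)
```

Matching one PTM type against a ±7 window around another, as done for phosphorylation, would additionally require a distance comparison rather than an exact position key.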
The damaged PTMs may cause protein functions to be out of control in canonical pathways [
In summary, we investigated the associations between PTMs affected by nsSNVs and human inherited diseases and cancers from diverse perspectives, including functions, pathways, and cross talk. The results provide a proteome-wide view of how proteins that carry both modifications and nsSNVs contribute to the development of diseases and cancers. PTMs play key roles in almost every important cellular process, and their dysfunction can result in human disease. We provide a practical protocol for analyzing disease-related proteins that carry damaged PTMs, and we list some valuable proteins as candidate biomarkers for potential research and clinical use. However, almost half of the damaged PTMs showed no association with human health in our current analysis, and their functions remain to be revealed. Future work should identify causal relationships between damaged PTMs and human diseases by discovering key nsSNVs on protein modifications.
Protein posttranslational modification
Nonsynonymous single-nucleotide variations
Gene Ontology
The Cancer Genome Atlas
Cyclin D1
Amino acid.
The authors confirm that this paper’s content has no conflict of interests.
This work was funded by National Hi-Tech Program (2012AA020201); Key Infectious Disease Project (2012ZX10002012-014); National Key Basic Research Program (2010CB912702, 2011CB910204).