Pseudogenes, disabled copies of functional genes, play a subtle role in gene expression and genome evolution. The first step in deciphering the RNA-level regulation of pseudogenes is to understand their transcriptional activity. To date, there has been no report on the possible roles of nucleosome organization in pseudogene transcription. In this paper, we investigated the effect of nucleosome positioning on pseudogene transcription. For transcribed pseudogenes, experimental nucleosome occupancy shows a prominent depletion both upstream of pseudogene start positions and downstream of pseudogene end positions. Intriguingly, the same depletion is also observed for nontranscribed pseudogenes. This is unexpected, because nucleosome depletion in these regions would seem unnecessary for pseudogenes that are not transcribed. Sequence-dependent prediction of nucleosome occupancy shows a pattern consistent with the analysis based on experimental data. Our results indicate that nucleosome positioning may play important roles in both the transcription initiation and the termination of pseudogenes.
Pseudogenes are produced from protein-coding genes during evolution. Although they are highly homologous to their parent genes, pseudogenes cannot produce functional proteins because of defects in their sequences. There are two major types of pseudogenes: duplicated pseudogenes and processed pseudogenes (or retropseudogenes). The former are created by genomic duplication and the latter by retrotransposition [
Many unexpected discoveries of biological functions for pseudogenes challenge the popular belief that pseudogenes are merely nonfunctional molecular fossils. For example, a nitric oxide synthase (NOS) pseudogene regulates the paralogous protein-coding neuronal nitric oxide synthase (nNOS) gene by producing an antisense RNA that forms a duplex with some of the gene's mRNA [
The variety of known or suspected pseudogene functions discovered to date suggests that pseudogenes as a whole have a wide range of previously unsuspected functions. Among these, RNA-level functions are of great importance and are the most frequently discussed. A prerequisite for understanding the RNA-level functions of pseudogenes is to explore their transcriptional activity. It has been shown that the nucleosome, the fundamental unit of chromatin structure in eukaryotes, affects gene transcription by modulating the accessibility of the underlying genomic sequence to proteins [
A total of 201 consensus pseudogenes, including 124 processed pseudogenes and 77 duplicated pseudogenes, were identified in ENCODE regions [
The statistics of pseudogenes.
Pseudogene type | Transcribed | Nontranscribed
---|---|---
Processed | 192 | 106
Duplicated | 0 | 57
Total | 192 | 163
The experimental nucleosome occupancy profile mapped to the human genome (hg18) was taken from Schones et al. [
Conformational energy is calculated from the geometric description of the DNA double-helix structure. According to the Cambridge Convention [
Nucleosomal DNA deformation is viewed as forced bending. It is assumed that torque
In (
The empirical parameters of our model for conformational energy calculation consist of force constants (
The dinucleotide-dependent force constants and parameters
Step | | | | |
---|---|---|---|---|
AA/TT | 0.2 | 0.406 | 0.76 | −1.84 |
AT | 0.124 | 0.641 | −1.39 | 0 |
AG/CT | 0.077 | 0.28 | 3.15 | −1.48 |
AC/GT | 0.085 | 0.302 | 0.91 | −0.64 |
TA | 0.064 | 0.365 | 5.25 | 0 |
TG/CA | 0.059 | 0.393 | 5.95 | −0.05 |
TC/GA | 0.097 | 0.408 | 3.87 | −1.52 |
GG/CC | 0.075 | 0.218 | 3.86 | 0.4 |
GC | 0.057 | 0.256 | 0.67 | 0 |
CG | 0.04 | 0.255 | 4.25 | 0 |
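The energy expression and the column headings of the table above were lost in extraction, so the model itself cannot be reproduced exactly. As a minimal sketch only, the Python below assumes a generic harmonic (elastic) form. In this form, each dinucleotide step contributes ½·k·(θ − θ₀)², where k is a step-specific force constant, θ₀ is the step's equilibrium angle, and θ is the angle imposed by the nucleosome path. Complementary steps (for example AA/TT) share parameters, as they do in the table. The names `deformation_energy`, `lookup_step` and `reverse_complement` and the toy parameter values are hypothetical; they are not the authors' parameterization. A fuller model would treat several Cambridge Convention step parameters (such as roll, tilt and twist) per step rather than a single angle.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical parameter table: canonical step -> (force constant k, equilibrium angle theta0).
# Placeholder values only; how the table columns map onto k and theta0 is not recoverable.
StepParams = Dict[str, Tuple[float, float]]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def lookup_step(step: str, params: StepParams) -> Tuple[float, float]:
    """Parameters for a step, falling back to its reverse complement (e.g. TT -> AA)."""
    if step in params:
        return params[step]
    return params[reverse_complement(step)]


def deformation_energy(seq: str, angles: Sequence[float], params: StepParams) -> float:
    """Harmonic deformation energy: sum over steps of 0.5 * k * (theta - theta0) ** 2.

    angles[i] is the (assumed) angle imposed on dinucleotide step i by the nucleosome path.
    """
    seq = seq.upper()
    steps: List[str] = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if len(angles) != len(steps):
        raise ValueError("need exactly one angle per dinucleotide step")
    energy = 0.0
    for step, theta in zip(steps, angles):
        k, theta0 = lookup_step(step, params)
        energy += 0.5 * k * (theta - theta0) ** 2
    return energy
```

For example, with the toy parameters `{"AA": (2.0, 0.0), "AT": (1.0, 1.0)}`, `deformation_energy("AAT", [1.0, 3.0], toy)` evaluates to 1.0 + 2.0 = 3.0.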
According to the Boltzmann distribution, the potential of forming a nucleosome centered at position
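The Boltzmann relation can be sketched as follows. The sketch assumes that the potential of forming a nucleosome centered at position i is proportional to exp(−E_i / kT), where E_i is the deformation energy of the DNA wrapped around that nucleosome. It also assumes that the weights are normalized over all candidate positions. The normalization, the function name `boltzmann_probabilities`, and the default kT = 1 (energies expressed in units of kT) are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def boltzmann_probabilities(energies: Sequence[float], kT: float = 1.0) -> List[float]:
    """Convert per-position nucleosome energies into normalized Boltzmann probabilities.

    p_i = exp(-E_i / kT) / sum_j exp(-E_j / kT). The largest exponent is subtracted
    before exponentiating to avoid overflow; the shift cancels in the ratio.
    """
    if not energies:
        return []
    scaled = [-e / kT for e in energies]
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```

Lower-energy (less deformed) positions receive larger weights, so sequences that bend easily around the histone core are favored.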
The normalized nucleosome occupancy at each base pair is calculated as the log-ratio between the corresponding absolute nucleosome occupancy
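The occupancy step can be sketched as follows. Absolute occupancy at a base pair is taken as the sum of the probabilities of all nucleosomes whose footprint covers it. The normalized value is taken as the log2 ratio of that occupancy to the mean occupancy over the region. The second operand of the log-ratio is cut off in the text, so the use of the regional mean (and of base 2) is an assumption. The function names, the footprint parameter and the small pseudocount that guards against log(0) are also assumptions.

```python
import math
from typing import List, Sequence


def absolute_occupancy(start_probs: Sequence[float], footprint: int) -> List[float]:
    """Per-base occupancy: sum of probabilities of all nucleosomes covering each base.

    start_probs[s] is the (assumed) probability of a nucleosome occupying bases
    s .. s + footprint - 1, so the track has len(start_probs) + footprint - 1 bases.
    A difference array keeps this O(n) instead of O(n * footprint).
    """
    if footprint < 1:
        raise ValueError("footprint must be positive")
    n = len(start_probs) + footprint - 1
    diff = [0.0] * (n + 1)
    for s, p in enumerate(start_probs):
        diff[s] += p
        diff[s + footprint] -= p
    occupancy: List[float] = []
    running = 0.0
    for i in range(n):
        running += diff[i]
        occupancy.append(running)
    return occupancy


def normalized_log_ratio(occupancy: Sequence[float], pseudocount: float = 1e-9) -> List[float]:
    """log2(occupancy / mean occupancy); using the regional mean as reference is an assumption."""
    if not occupancy:
        return []
    mean = sum(occupancy) / len(occupancy)
    return [math.log2((o + pseudocount) / (mean + pseudocount)) for o in occupancy]
```

For instance, `absolute_occupancy([0.5, 0.5], footprint=2)` returns `[0.5, 1.0, 0.5]`. The middle base, which is covered by both candidate nucleosomes, then gets a positive log-ratio, and the flanking bases get negative log-ratios.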
As shown in Figure
Experimental nucleosome occupancy around start positions and end positions of pseudogenes.
An obvious nucleosome depletion is detected upstream of the start positions of transcribed pseudogenes. This suggests that the depletion may promote pseudogene transcription by exposing the underlying sequence as a linker region that is accessible to transcription factor binding. A similar depletion downstream of the end positions of transcribed pseudogenes may imply a role for nucleosome positioning in transcription termination, possibly by allowing the sequence to form a hairpin structure that terminates transcription. Note that the nucleosome-depleted regions detected upstream of the start positions and downstream of the end positions of transcribed pseudogenes match well with the transcription start and end regions of the pseudogenes, respectively.
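Depletion profiles of this kind are typically obtained by aligning all pseudogenes on their start (or end) positions and averaging occupancy at each offset. The surviving text does not state the window size or the strand handling used, so the sketch below rests on assumptions. Offsets are mirrored for minus-strand features so that negative offsets always mean upstream, and offsets that fall outside the occupancy track are skipped. `aggregate_profile` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def aggregate_profile(
    occupancy: Sequence[float],
    anchors: Sequence[Tuple[int, str]],
    flank: int,
) -> List[Optional[float]]:
    """Average occupancy at offsets -flank..+flank around each anchor.

    anchors are hypothetical (position, strand) pairs, e.g. pseudogene start positions.
    For '-' strand anchors the offsets are mirrored, so negative offsets always mean
    upstream. Offsets outside the occupancy track are skipped; an offset with no data
    at all yields None.
    """
    width = 2 * flank + 1
    sums = [0.0] * width
    counts = [0] * width
    for pos, strand in anchors:
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
        for k in range(width):
            offset = k - flank
            genomic = pos + offset if strand == "+" else pos - offset
            if 0 <= genomic < len(occupancy):
                sums[k] += occupancy[genomic]
                counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

The same function can be applied to end positions; in that case, positive offsets correspond to the region downstream of the pseudogene end.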
In contrast to the case of transcribed pseudogenes, the nucleosome depletion both upstream and downstream of nontranscribed pseudogenes is unexpected. Such depletion would seem unnecessary for pseudogenes that are not transcribed.
The overall trend of experimentally determined nucleosome occupancy around both the start and end positions of pseudogenes is successfully reproduced by our computational model (Figure
Calculated nucleosome occupancy around start positions and end positions of pseudogenes. Analysis of variance (ANOVA) shows significant differences of average nucleosome occupancy between transcribed and nontranscribed pseudogenes (
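The group comparison in this caption is a one-way ANOVA. Below is a minimal pure-Python sketch of its F statistic. It assumes each group is a list of per-pseudogene average occupancies (for example, transcribed versus nontranscribed pseudogenes). `one_way_anova_f` is a hypothetical helper. If SciPy is available, `scipy.stats.f_oneway` computes the same statistic together with a p-value.

```python
from typing import Sequence, Tuple


def one_way_anova_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """One-way ANOVA: return (F, df_between, df_within).

    F = [SS_between / (k - 1)] / [SS_within / (N - k)] for k groups and N observations.
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group spread of the group means, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group spread around each group's own mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, `one_way_anova_f([1, 2, 3], [4, 5, 6])` gives F = 13.5 with 1 and 4 degrees of freedom.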
Pseudogenes provide a natural collection of relics with which researchers can explore how chromatin responds to the sequence mutations that accumulate in pseudogenes. Specifically, multiple structurally similar but nonidentical pseudogenes can arise from a single functional gene during evolution. In particular, each highly transcribed ribosomal protein gene tends to have many pseudogenes, in some cases more than 100. A simple way to test for possible changes in nucleosome distribution over pseudogenes is to correlate nucleosome occupancy over the pseudogenes with their evolutionary distances. To do this, we first downloaded the annotation (hg16-based) for 2536 ribosomal protein (RP) pseudogenes [
The proportion of significant Spearman correlations between nucleosome occupancy and pseudogene characteristics with regard to 79 RP pseudogene families.
Nucleosome occupancy | Pseudogene GC content | Identityᵃ | Divergenceᵃ
---|---|---|---
Predicted | 68/77ᵇ (…) | 41/77 (…) | 41/77 (…)
Experimental | 3/77 (…) | 3/77 (…) | 5/77 (…)

ᵇAmong the 79 RP pseudogene families, two are shorter than 129 bp, the minimum size required for nucleosome occupancy prediction.
ᶜThe average of the significant Spearman correlation coefficients and the number of significant positive correlations are given in parentheses.
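Per-family correlations like those summarized in the table can be sketched as follows. GC content is computed for each pseudogene. Spearman's rho is then computed as the Pearson correlation of average ranks, where tied values share the mean of their ranks. This is an illustrative implementation, not the authors' code. The significance test and threshold they used are not stated in the surviving text; if SciPy is available, `scipy.stats.spearmanr` returns both rho and a p-value. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T); NaN if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant input has no defined rank correlation
    return cov / math.sqrt(var_x * var_y)
```

Within one RP family, a call such as `spearman_rho([gc_content(s) for s in seqs], occupancies)` would give the GC-content column. Here `seqs` and `occupancies` are hypothetical per-pseudogene lists. Identity or divergence values can be substituted for GC content to obtain the other columns.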
Our data show that predicted nucleosome occupancy over pseudogenes tends to correlate positively with their DNA identity. This suggests that the ability of pseudogenes to form nucleosomes tends to decline as they evolve. However, we did not detect a positive correlation between experimental nucleosome occupancy and DNA identity. There are three possible reasons for this. First, nonsequence factors, which likely play a larger role in nucleosome positioning in humans than in simpler eukaryotes such as yeast, may outweigh the sequence-induced effect on nucleosome positioning [
We also found a significant correlation between the divergence of pseudogenes and their predicted nucleosome occupancy. This again indicates that the nucleosome-forming ability of pseudogenes decreases as they degrade. Furthermore, the predicted nucleosome occupancy of pseudogenes correlates strongly and positively with their GC content, consistent with the previous finding that GC content dominates intrinsic nucleosome occupancy [
In this report, we analyzed the organization of nucleosomes around pseudogenes and compared transcribed with nontranscribed pseudogenes. Analysis of experimental data shows nucleosome depletion both upstream of the start positions and downstream of the end positions of transcribed pseudogenes. This suggests that nucleosome positioning plays an important role in both the transcription initiation and the transcription termination of pseudogenes. A similar nucleosome depletion is detected for nontranscribed pseudogenes, which is likely caused by a sequence-dependent nucleosome-inhibitory effect. We also applied a sequence-dependent model for calculating nucleosome occupancy to pseudogenes and obtained a pattern consistent with the experimental nucleosome organization.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by Grants from the National Natural Science Foundation (61102162, 61271448, and 61361014) and the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (NJYT-14-B10).