The lengths of intergenic regions between neighboring genes that are convergent, divergent, or unidirectional were calculated for plastids of the rhodophytic branch and for complete archaeal and bacterial genomes. In each genomic group, statistically significant linear relationships were found between every pair of the medians of these three length types. Exponential relationships were also found between the optimal growth temperature and each of the three medians. The leading coefficients of the regression equations relating all pairs of medians, as well as temperature and each median, have the same sign and order of magnitude. The results obtained for plastids, archaea, and bacteria are also similar at the qualitative level. For instance, the medians are always low at high temperatures, whereas at low temperatures they tend toward statistically significantly greater values and greater scatter. An original model was used to test our hypothesis that intergenic distances are optimized, in particular, to reduce competition between RNA polymerases within a locus, which results in the transcription of shortened RNAs. Overall, this points to an effect of temperature on both distantly and closely related genomes.
The dependence of intergenic distances on each other and on the species’ optimal growth temperature was considered in three large groups: plastids of the rhodophytic branch, archaea, and bacteria. For consistency, we use the term
Plastids are semiautonomous organelles that originated from cyanobacteria; in the rhodophytic branch, they occur in red algae (Rhodophyta) as well as in species whose plastids are of secondary or tertiary origin from Rhodophyta plastids. Such species belong to the superphyla Alveolata and Heterokonta (classes Bacillariophyceae, Bolidophyceae, Chrysophyceae, Dictyochophyceae, Eustigmatophyceae, Phaeophyceae, Xanthophyceae, and Raphidophyceae) as well as to the phyla Cryptophyta and Haptophyta. The plastids of brown, diatom, yellow-green, and other related algae grouped into Stramenopiles are also of secondary origin from red algal plastids [
Archaea are prokaryotes distinct from bacteria and are putative ancestors of eukaryotes [
Given the cyanobacterial origin of rhodophytic plastids, the bacteria considered here include no cyanobacteria except
A genome is described here by three medians, one for each type of gene arrangement: convergent, divergent, and unidirectional. These medians are referred to as
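The per-genome medians can be computed from gene coordinates roughly as follows. This is a minimal sketch, not the authors' pipeline: the `Gene` record, `classify`, and `intergenic_medians` are hypothetical names, genes are assumed to lie on a linear sequence (the wrap-around pair of a circular genome is ignored), and the distance is taken as the number of base pairs between the end of one gene and the start of the next, so overlapping genes yield negative values, as in the regression plots.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Gene:
    start: int   # leftmost coordinate, 1-based
    end: int     # rightmost coordinate, 1-based, inclusive
    strand: str  # "+" or "-"


def classify(left: Gene, right: Gene) -> str:
    """Arrangement of two neighboring genes, `left` preceding `right`."""
    if left.strand == right.strand:
        return "unidirectional"  # -> ->  or  <- <-
    if left.strand == "+":
        return "convergent"      # -> <-
    return "divergent"           # <- ->


def intergenic_medians(genes):
    """Median intergenic distance for each arrangement type (None if absent).

    Distance = right.start - left.end - 1, so overlapping genes are negative.
    """
    genes = sorted(genes, key=lambda g: g.start)
    distances = {"convergent": [], "divergent": [], "unidirectional": []}
    for left, right in zip(genes, genes[1:]):
        distances[classify(left, right)].append(right.start - left.end - 1)
    return {kind: (median(d) if d else None) for kind, d in distances.items()}


# Hypothetical toy locus: five genes, one convergent pair overlapping by 11 bp.
genes = [Gene(1, 900, "+"), Gene(951, 1800, "+"), Gene(1790, 2500, "-"),
         Gene(2700, 3300, "+"), Gene(3400, 4000, "+")]
print(intergenic_medians(genes))
# {'convergent': -11, 'divergent': 199, 'unidirectional': 74.5}
```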
The correlation coefficients for
Linear regression plots for pairs of medians in plastids (a), archaea (b), and bacteria (c). Simple regressions are shown in gray and Deming regressions in red. Negative values are due to overlapping genes.
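The figure contrasts simple regressions with Deming regressions. Deming regression allows for measurement error in both variables, which suits a case where both axes are medians estimated from finite samples. The sketch below uses the standard closed-form Deming estimator; the error-variance ratio `delta` (1 by default, i.e., orthogonal regression) is an assumption, since the ratio used for the figure is not stated here.

```python
import numpy as np


def deming(x, y, delta=1.0):
    """Closed-form Deming regression of y on x.

    delta: assumed ratio of the error variance in y to the error variance in x.
    Returns (slope, intercept). Undefined when x and y are uncorrelated (sxy == 0).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    sxx = np.mean((x - xm) ** 2)
    syy = np.mean((y - ym) ** 2)
    sxy = np.mean((x - xm) * (y - ym))
    diff = syy - delta * sxx
    slope = (diff + np.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, ym - slope * xm


# On points lying exactly on y = 2x + 3, the estimator recovers the line.
print(deming([0, 1, 2, 3, 4], [3, 5, 7, 9, 11]))  # ≈ (2.0, 3.0)
```

For positively correlated data, the Deming slope always lies between the ordinary least-squares slope of y on x and the reciprocal of the slope of x on y, which is why the red lines in such plots are typically steeper than the gray ones.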
Genome assignment to one of the thermal ranges is indicated by a number in the Supplementary Spreadsheet (column
The analyzed plastids (59 genomes in total) excluding that of
The archaea considered (123 genomes) can be clustered into three groups by OGT, which is expected because few data are available for archaea living at temperatures below 30°C (in contrast to plastids). These intervals are 20–40°C (47 genomes), 40–65°C (15), and 65–115°C (61). The corresponding Fisher indices
Similarly, the bacteria considered (810 genomes) can be clustered into four groups by OGT, with the following intervals: 5–30°C (305 genomes), 30–40°C (406), 40–65°C (45), and 65–85°C (54). The corresponding Fisher indices
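The assignment of genomes to thermal ranges and a between-range comparison could be sketched as below. We assume here that a Fisher index is the one-way ANOVA F statistic (between-range mean square over within-range mean square) computed for one median across the OGT ranges; the helper names `bin_by_ogt` and `fisher_f`, and the rule that a shared boundary value such as 30°C falls into the lower range, are illustrative assumptions.

```python
import numpy as np


def bin_by_ogt(ogt, edges):
    """Index of the thermal range for each OGT value.

    edges: range boundaries, e.g. [5, 30, 40, 65, 85] for the bacterial ranges.
    Assumption: a value equal to an inner boundary goes to the lower range.
    """
    return np.digitize(ogt, edges[1:-1], right=True)


def fisher_f(values, labels):
    """One-way ANOVA F statistic of `values` grouped by `labels`."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    k, n = len(groups), len(values)
    grand = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


bacterial_edges = [5, 30, 40, 65, 85]
print(bin_by_ogt([10, 30, 35, 50, 70], bacterial_edges).tolist())  # [0, 0, 1, 2, 3]
```

Usage: `fisher_f(medians, bin_by_ogt(ogt_values, bacterial_edges))`, where `medians` and `ogt_values` are hypothetical per-genome arrays such as the columns of the Supplementary Spreadsheet.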
In all groups, the best regressions between each of the medians and OGT (designated as
Scatter plots for temperature versus medians and the corresponding nonlinear regressions for plastids (a), archaea (b), and bacteria (c).
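The exact form of the exponential regressions is not reproduced in this excerpt, so the sketch below assumes the hypothetical form m = a·exp(b·T) + c, with a median m and temperature T. Because medians can be negative (overlapping genes), a log-linear fit does not apply. Instead, for each trial rate b on a grid the model is linear in a and c, which are found by ordinary least squares, and the b with the smallest residual sum of squares is kept. The grid bounds are arbitrary assumptions.

```python
import numpy as np


def fit_exponential(t, m, b_grid=None):
    """Fit m ≈ a * exp(b * t) + c; returns (a, b, c).

    For each fixed rate b, (a, c) follow from linear least squares;
    the b with the smallest residual sum of squares is returned.
    """
    t = np.asarray(t, dtype=float)
    m = np.asarray(m, dtype=float)
    if b_grid is None:
        b_grid = np.linspace(-0.2, 0.2, 801)  # step 0.0005 per degree
    best = None
    for b in b_grid:
        design = np.column_stack([np.exp(b * t), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(design, m, rcond=None)
        sse = float(np.sum((design @ coef - m) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], b, coef[1])
    _, a, b, c = best
    return a, b, c


# Synthetic, noise-free example: medians decaying with temperature.
temps = np.arange(20, 101, 5)
meds = 50 * np.exp(-0.03 * temps) + 10
print(fit_exponential(temps, meds))  # ≈ (50, -0.03, 10)
```

Separating the linear parameters from the single nonlinear one keeps the fit deterministic and free of starting-value choices, at the cost of resolving b only to the grid step.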
We assume that at high temperatures the distances between genes are generally shorter than at low temperatures. As an example, let us consider a pair of species
A locus in
Now let us consider the ratio
The model requires the elongation rate of RNA polymerase to be specified, although this rate is not precisely known from experiments. Ryals et al. [
The growth temperature limits and optimal temperatures, as well as lethal or sublethal (growth-suppressing) temperatures (marked by an asterisk), were considered for both bacterial species:
In the case of
A rather different situation is observed for
Also important are the mechanisms that link intergenic distances and the optimal temperature of the species through the ratio of 1D to 3D diffusion rates of RNA polymerases. Proteins that act at specific DNA sequences initially bind DNA at random sites and then translocate to the target site. They move along DNA through multiple cycles of dissociation and reassociation with the same DNA molecule. Each landing at a new site is followed by a series of one-dimensional diffusion steps covering 50–100 bp around that site [
This suggests that the balance between the contributions of 1D and 3D diffusion to the recognition of specific DNA binding sites by RNA polymerases and transcription termination factors can be important, at least for divergent and unidirectional genes. In this view, 1D diffusion matters at low temperatures, and long intergenic regions serve as an “antenna” that gathers the appropriate RNA polymerase and transcription factors. At high temperatures, 1D diffusion is less significant and this antenna is not needed; moreover, a high elongation rate allows RNA polymerase to transcribe long operons from a single promoter, without additional ones.
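The search mechanism described above, random 3D landings each followed by 1D sliding, can be illustrated with a toy Monte Carlo model. This is not the authors' model: `mean_landings` is a hypothetical helper, the DNA length, target position, and number of sliding steps are arbitrary illustrative values, and sliding is modeled as an unbiased ±1 bp random walk kept within the DNA ends.

```python
import random


def mean_landings(length, target, slide_steps, trials, seed=0):
    """Average number of 3D landings needed to reach `target` (toy model).

    Each landing is at a uniformly random position; it is followed by
    `slide_steps` unbiased 1D steps of ±1 bp before the next dissociation.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        landings = 0
        found = False
        while not found:
            landings += 1
            pos = rng.randrange(length)        # 3D diffusion: land anywhere
            found = pos == target
            for _ in range(slide_steps):       # 1D sliding near the landing site
                if found:
                    break
                pos += rng.choice((-1, 1))
                pos = min(max(pos, 0), length - 1)  # stay within the DNA ends
                found = pos == target
        total += landings
    return total / trials


no_slide = mean_landings(500, 250, slide_steps=0, trials=100)
with_slide = mean_landings(500, 250, slide_steps=400, trials=100)
print(no_slide, with_slide)
```

With these settings, sliding should cut the mean number of landings by roughly an order of magnitude or more, because each landing then scans a few tens of base pairs instead of a single site; this is the sense in which a longer stretch of DNA around a binding site can act as an “antenna”.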
The linear relationship between the medians is robust: the corresponding linear regressions are confirmed by the statistical test for plastids, archaea, and bacteria alike. Analysis of the relationship between the medians and the temperature
Changes in environmental temperature (resulting from a change of habitat or climate) underlie the preferential survival of organisms whose median intergenic distances fit that temperature. Our hypothesis is the following. Evolutionary selection adapts the medians to environmental temperature, so that the combinations of high temperature with small medians and of low temperature with large medians favor efficient survival, at least in the three groups considered. Indeed, the relationship between the medians and temperature is quite similar in the distant groups (plastids, archaea, and bacteria), which points to a macroevolutionary effect of temperature. Adaptation to temperature also proceeds at the microevolutionary level, as exemplified by a locus in the closely related species
Whereas only small medians are observed at high temperatures, species living at relatively low temperatures have more dispersed medians.
A significant deviation from the general pattern is observed in the plastids of phototrophic alveolates:
The plastids of algae of the family Bangiaceae, which have a complex temperature-dependent life cycle, have intergenic distances (for all three types; see the sheet “Bangiaceae” in the Supplementary Spreadsheet) similar to those in other algae living at low temperatures (Figure
Large distances between genes in the plastids of
For all types, the small medians in the apicoplasts of
The medians take on only small values at high temperatures (Figure
The results obtained for plastids of the rhodophytic branch are qualitatively similar to those for archaea. However, there are significant differences: in particular, the minimum standard deviation, the coefficients of the corresponding regressions, and the medians themselves for the three types of gene arrangement differ markedly.
Only minor differences in temperature and the medians are observed within the classes Archaeoglobi, Halobacteria, and Thermococci (Figure
Convergent genes overlap in many hyperthermophiles. At low temperatures, such overlap can cause competition between RNA polymerases during elongation. This effect is supported by modeling in plastids [
In bacteria, overlap of unidirectional or convergent neighboring genes is typical of mesophiles and thermophiles but not of psychrophiles. The largest medians of every type are found in mesophiles (Figure
A large volume of data on plastids of the rhodophytic branch, as well as on complete archaeal and bacterial genomes, was used to demonstrate the following uniform and statistically significant patterns. The median distances between convergent, divergent, and unidirectional neighboring genes are linearly related to each other. The optimal growth temperature and each of the three medians are exponentially related. The equations relating the medians to each other, as well as the optimal growth temperature to each median, have leading coefficients of the same sign and order of magnitude, which may indicate a universal pattern in these relationships. The results for such distant genomes are also similar at the qualitative level. For instance, the medians are low at high temperatures, whereas at low temperatures they tend toward statistically significantly greater values and greater scatter. We propose that changes in environmental temperature optimize intergenic distances, among other things, to reduce competition between RNA polymerases within a locus, which results in the transcription of shortened RNAs. Overall, this points to an effect of temperature on both distantly and closely related genomes.
Plastid, archaeal, and bacterial genomes were extracted from GenBank, and the OGT values were taken from [
All data generated or analyzed during this study are included in this article and Supplementary Materials.
The authors declare no conflicts of interest.
The reported study was funded by RFBR under research project no. 18-29-13037.
Supplementary Spreadsheet.xlsx: this Microsoft Office Spreadsheet (cited as Supplementary Spreadsheet) contains the source data: median intergenic distances and optimal growth temperatures. Each row contains the following: sequence accessions; medians