HIV is one of the most studied viruses in the world. This is especially true in terms of gene sequencing, and to date more than 9 thousand genomic sequences of HIV isolates have been determined and analyzed. In this study, a series of DNA sequences which have the potential to form G-quadruplex structures is analyzed. Several such sequences were found in various coding and noncoding virus domains, including the U3 LTR, tat, rev, env, and vpx regions. Interestingly, a sequence homologous to the already well-known HIV integrase aptamer was identified in the minus-strand. The sequences derived from original isolates were analyzed using standard spectral and electrophoretic methods. In addition, a recently developed methodology was applied which uses induced circular dichroism spectral profiles of G-quadruplex-ligand (Thiazole Orange) complexes to determine whether G-rich sequences can adopt a G-quadruplex structure. The G-quadruplexes, or the peptide domains corresponding to the G-rich coding sequences in HIV, represent attractive therapeutic targets which could be of particular use in the development of novel antiviral therapies. The analysis of G-rich regions can provide researchers with a path to finding targets specific to particular types of virus.
The human immunodeficiency virus (HIV) is an RNA retrovirus in the Retroviridae family which causes HIV infection and over time can lead to acquired immunodeficiency syndrome (AIDS). The virus belongs to the single-stranded positive-sense RNA
Recently it was confirmed that consensus sequences forming stable G-quadruplex structures are responsible for RNA replication and inhibition of protein translation of hepatitis C virus [
Our recently published results have also highlighted the significance of some G-rich regions in regulating areas with the ability to form stable G-quadruplexes in papilloma viruses [
G-quadruplex structures also seem to be critical for HIV-1 infectivity and could represent novel targets for antiviral drug development. For example, it is known that mutations disrupting G-quadruplex formation can enhance HIV promoter activity in cells and that treatment with G-quadruplex ligands decreases promoter activity and displays antiviral effects [
Retroviral RNAs are now known to dimerize via G-rich regions in the cytoplasm of infected cells, allowing two copies of the genome to be encapsidated in the newly produced virion [
G-rich sequences can form bimolecular G-quadruplex structures in the gag region of the HIV-1 genome, in close proximity to the dimer initiation site (DIS) [
G-quadruplexes derived from the sequence of the negative regulatory factor (Nef) of HIV-1 were recently analyzed in vitro [
Therefore, the main goal of this study is to scrutinize HIV provirus genomes in an attempt to find G-rich regions which may be prone to forming G-quadruplex motifs. Several tools and strategies are available to predict G-quadruplex propensity from some sequences, but there are disadvantages and limitations associated with each algorithm [
DNA oligonucleotides used in this study, originating from HIVs and SIVs.
All chemicals and reagents were obtained from commercial sources. DNA oligomers were obtained from Metabion, Germany (Figure
The search criteria for G-quadruplex forming sequences were restricted to sequences which possessed three continuous G-runs containing at least three neighboring Gs and one G-run containing only two neighboring Gs. We aimed to identify sequences with 1–4 nucleotides occurring between two continuous G-runs and with fewer than 9 nucleobases between G-runs in total; thus, the total required number of Gs was set at a minimum of 11 (3 × 3 + 2). Initially, the reading frame of DNA was adjusted to 20 nucleotides (Figure
Strategy and searching criteria of putative G-quadruplex sequences. Randomly selected genomic sequences of HIV-1, HIV-2, and SIV from NCBI Gene database were analyzed (panel (a)). The criteria used to determine putative G-quadruplex sequences are listed in panel (b).
The sequences fulfilling these criteria were considered to be putative G-quadruplex forming sequences. If an additional G-run was located in close proximity (i.e., fewer than 3 nucleotides away) to the putative sequence, it was also judged to be suitable for inclusion. In principle, we applied criteria similar to those utilized by QGRS Mapper, a program which predicts quadruplex forming G-rich sequences (QGRS) in nucleotide sequences, and by the more comprehensive mining tool QuadBase2 [
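The search criteria described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the software used in this study and not a reimplementation of QGRS Mapper or QuadBase2: the function name `find_putative_g4` and its parameters `max_loop` and `max_total_loop` are hypothetical, G-runs are taken as maximal stretches of at least two Gs, and the optional additional G-run in close proximity is not handled.

```python
import re

def find_putative_g4(seq, max_loop=4, max_total_loop=8):
    """Return (start, end, subsequence) for every window of four consecutive
    G-runs in which three runs have >= 3 Gs, one run has exactly 2 Gs,
    each loop is 1..max_loop nt long and all loops together are
    <= max_total_loop nt (i.e., fewer than 9 by default)."""
    seq = seq.upper()
    # Maximal runs of two or more Gs, stored as (start, end) with end exclusive.
    runs = [(m.start(), m.end()) for m in re.finditer(r"G{2,}", seq)]
    hits = []
    for i in range(len(runs) - 3):
        window = runs[i:i + 4]
        lengths = [end - start for start, end in window]
        # Loop length = distance between the end of one run and the start of the next.
        loops = [window[k + 1][0] - window[k][1] for k in range(3)]
        if (lengths.count(2) == 1
                and sum(n >= 3 for n in lengths) == 3
                and all(1 <= gap <= max_loop for gap in loops)
                and sum(loops) <= max_total_loop):
            start, end = window[0][0], window[3][1]
            hits.append((start, end, seq[start:end]))
    return hits

# Example: the minus-strand sequence discussed later in the text.
print(find_putative_g4("CGGGGTTGGGAGGTGGGT"))
# -> [(1, 17, 'GGGGTTGGGAGGTGGG')]
```

Because the runs are taken as maximal, a long stretch such as GGGGG is never split into a shorter run plus a loop; a more permissive scanner would need to enumerate such splits explicitly.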
The randomly selected complete genomic sequences of 20 different HIV-1, 5 HIV-2, and 5 SIV viruses were analyzed, thereby identifying the sequences listed in Figure
CD spectra were recorded on a Jasco J-810 spectropolarimeter equipped with a PTC-423L temperature controller using a quartz cell of 1 mm optical path length in a reaction volume of 150
DNA titration was performed with increasing concentrations of Thiazole Orange (TO). TO was solubilized in DMSO to reach a final concentration of stock solution of 10 mM. The concentrations of DNA and TO in a 1 mm quartz cell were 30
CD melting profiles were collected at ~295 and ~265 nm as a function of temperature, using a procedure which has been published previously [
The conditions and parameters used in the examination of the thermal difference spectra were identical to those used in the CD spectroscopy assay. The spectra analysis performed in this study has been described in an earlier publication [
Native polyacrylamide gel electrophoresis (PAGE) was performed in a temperature controlled vertical electrophoretic apparatus (Z375039-1EA, Sigma-Aldrich, San Francisco, CA). Gel concentration was 12% (19 : 1 monomer to bis ratio, Applichem, Darmstadt). Approximately two micrograms of DNA was loaded onto 14 × 16 × 0.1 cm gels. Prior to loading, each DNA sample was heated to 95°C for 5 min in an appropriate buffer and cooled to room temperature. Electrophoreses were performed at 20°C for 4 hours at 120 V (~8 V·cm−1). DNA oligomers were visualized with Stains-All immediately after electrophoresis, and the electrophoretic record was photographed on a white pad with a Nikon D3100 camera. The gel was also later stained by silver staining procedure in order to improve the sensitivity of the DNA visualization [
Although many different HIV sequence comparisons have been performed to date, this study offers an alternative means of identifying putative G-quadruplex forming sequences. The search criteria were not restricted to the LTR regions of the proviral HIV genome, but were instead applied to the entire proviral HIV genome. In this overview, the occurrence of 16 selected oligonucleotide sequences within more than nine thousand previously sequenced HIV/SIV genomes was examined in an attempt to identify some general relationship between them. The sequence structure consisting of three G-runs containing at least three neighboring Gs and one G-run containing two Gs was found in HIV-1, HIV-2, and SIV provirus DNAs. These sequences and their sources are summarized in Figure
Schematic drawing of the HIV-1 and HIV-2 genome organizations and the locations of the studied sequences.
The sequence H2-U22 was found only in one HIV-2 isolate in the terminal part of the vpx region (ID: U22047.1), but its derivatives containing 1–3 point mutations were found in an additional 9 HIV-2 isolates. H2-U38 is a truncated version of H2-U22, and this sequence was found again in the same region in an additional five HIV-2 genomic sequences (ID: M30502.1, U38293.1, M31113.1, U22047.1, and KU168289.1). The first 20 nucleotides of H2-M15 are identical to those of the H2-U22 and H2-U38 sequences, and this oligomeric sequence occurred very rarely in HIVs, being found in only 2 isolates of HIV-2 in vpx region (ID: X05291.1, M15390.1) and two derivatives containing 1-2 mutations. Interestingly, both sequence derivatives can also be found in other organisms. Considering the extreme rarity of these three sequences in HIV-2 genomes, their significance and biological role are questionable.
The G-rich region located in the env gene of HIV-1 is also a promising potential source of G-quadruplex formation. The env and rev coding sequences overlap, but their reading frames are different. This region contains the H1-JN-A sequence which occurs in only 11 HIV-1 isolates, but a derivative in which the central guanosine is replaced by adenosine (AGGGACTGAG
A number of different research projects have attempted to identify conserved structural motifs in highly variable viruses which can be used as specific targets for the development of efficient antiviral therapies. Interestingly, the H1-JN-A sequence encodes the oligopeptides Gly–Leu–Arg–Leu–Gly–Trp–Glu and Gly–Thr–Glu–Ala/Thr–Gly–Val–Gly, which are integral parts of the Env and Rev proteins, respectively. These oligopeptide motifs are highly abundant in HIV-1 proteins and were found in more than 1140 coding sequences of Env and Rev.
All of the sequences described thus far are found in the plus-DNA/RNA strand. However, the sequence 5′-ACCCACCTCCCAACCCCG-3′ is typically located in the plus-strand of HIV-1 at the beginning of the second exon of the tat/rev and env genes. This motif is complementary to the sequence 5′-d(CGGGGTTGGGAGGTGGGT)-3′ in the minus-strand which, interestingly, is very similar to the well-known HIV-93del aptamer d(GGGGTGGGAGGAGGGT), which forms a very stable interlocked dimeric G-quadruplex [
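The relationship between the two strands, and the similarity to the aptamer, can be checked with a short script. The following is a minimal sketch; the helper names `reverse_complement`, `g_run_lengths`, and `loops` are hypothetical and introduced only for illustration. It confirms that the reverse complement of the plus-strand motif is the quoted minus-strand sequence and compares the G-run layout of that sequence with the 93del aptamer.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def g_run_lengths(seq):
    """Lengths of the maximal G-runs, in 5'->3' order."""
    return [len(run) for run in re.findall(r"G+", seq.upper())]

def loops(seq):
    """Non-G stretches that lie between two G-runs."""
    return re.findall(r"(?<=G)[^G]+(?=G)", seq.upper())

plus_strand = "ACCCACCTCCCAACCCCG"    # plus-strand motif quoted in the text
aptamer_93del = "GGGGTGGGAGGAGGGT"    # 93del aptamer quoted in the text

minus_strand = reverse_complement(plus_strand)
print(minus_strand)                    # CGGGGTTGGGAGGTGGGT

# Both sequences share the same G-run pattern; only the loops differ.
print(g_run_lengths(minus_strand), loops(minus_strand))
print(g_run_lengths(aptamer_93del), loops(aptamer_93del))
```

Both sequences show the run pattern 4-3-2-3, that is, three G-runs of at least three Gs and one GG run, while the loops differ (TT, A, T in the viral sequence against T, A, A in the aptamer).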
Although the homology between H1-K03 and the aptamer sequence is undoubtedly interesting, we are unable to offer a convincing explanation for the phenomenon. Nevertheless, this is the first reported case of a natural coding sequence being homologous to an aptamer which was originally developed against a protein produced by the same organism. Is this merely a coincidence, or is it an exception? If the sequence were located in the coding strand, it would be possible to propose an explanation or a convincing theory about its biological role.
In recent years, a wide range of research and publications has focused on the study of G-rich sequences in LTR. In principle, the results of our research into U3 LTR sequences fully corroborate the earlier findings of other authors [
The H1-JN sequence was found in a significantly higher number of HIV-1 variants, with 3181 hits in HIV-1 genomic sequences and only one in an SIV isolate. This sequence was not found in HIV-2 or in other organisms. Recent studies have identified and described the structure of HIV-1 sequence LTR-IV: d(C
The H1-K02 and H1-M27 sequences are highly homologous. These sequences partially overlap with H1-JN; their occurrence in the HIV-1 genome was again found to be very high, with matches in more than 1700 different isolates in the NCBI database. The first guanosine is highly conserved in both sequences. This guanosine could be essential for the formation of G-tetrads and may also contribute to the stability of G-quadruplexes exhibiting bulge features. The large size of the statistical set would suggest the likelihood of higher numbers of nucleobase variations, and this was confirmed by the BLAST analysis of the region occurring between NF-
The sequence alignment of the region containing the sequences used in this study, located in LTR of HIV-1 and SIV, is summarized in Figure
Sequence alignment of LTR regions containing the sequence listed in Figure
The formula including all possible variants is as follows:
Targeting G-quadruplexes including the possible variations located in the LTR coding sequence of HIVs can therefore offer an attractive therapeutic opportunity for the development of highly efficient inhibitors of processes depending on the secondary motifs in this regulating region.
Another aim of this study was to confirm the ability of the studied oligonucleotides to form stable G-quadruplexes. In order to ascertain this, a series of experiments using circular dichroism analysis was performed (Figure
CD spectra of HIV oligomers in modified 25 mM mBR buffer (pH 7.0) in the presence of 50 mM KCl. The corresponding UV, TDS, and CD melting curves obtained at 265 and 293 nm are shown in Supporting Figures
Information about the molecularity and the presence of multimeric conformers of G-quadruplexes can be obtained by examining samples using electrophoretic separation [
Melting temperatures and molecularities of the studied DNA oligonucleotides in the presence of 50 mM sodium and potassium ions.
Oligo. | Fold (potassium) | Tm, °C (potassium) | Fold (sodium) | Tm, °C (sodium)
---|---|---|---|---
H2-U38 | M, D | 56.8 | M, D | 52.4
H2-U22 | M | 56.6 | | ND
H2-M15 | M | 57.1 | M, D | 46.6
H1-JX | M | 57.5 | M > D | 30.3
H1-JX1 | M | 61.6 | M > D | 35.8
H1-L20 | M | 58.2 | M | 36.4
H2-U38B | M < D | 59.5 | M | 55.3
H2-J0 | | 51.8 | | 50.8
H2-M15B | M | 46.5 | | 44.2
S-M30 | M | 67.8 | M | 49.6
S-JX | M, D | 54.2 | M, D | 35.0
H1-JN | M | 48.5 | M > D | 38.7
H1-K02 | M | 48.4 | M > D | 32.1
H1-M27 | M | 39.6 | M | ND
H1-K03 | D, H | 60.3 | M | 39.7
H1-JN-A | M | 58.6 | | 37.2
Molecular standard S, the mix of d(AC)9, d(AC)14, and d(AC)18, was used. Electrophoretic separation was performed in a 14% polyacrylamide gel at 10°C in 25 mM Britton-Robinson buffer (pH 7.0) and 50 mM KCl at 8°C in (a) and 50 mM NaCl in (b). Prior to being used, the DNA sample was heated in the same buffer for 5 min at ~98°C and slowly cooled to room temperature within 30 min.
Interestingly, the fastest band of H1-K03 represents a dimeric conformer, while the slowest and middle bands correspond to higher-order structures. Preliminary NMR data indicate that H1-K03 forms an interlocked G-quadruplex structure which is analogous to that of the HIV integrase aptamer [
It is important to note here, however, that not every band in a given electrophoretic lane necessarily represents a G-quadruplex structure.
The profiles of thermal difference spectra (TDS) of G-quadruplexes are highly specific, and therefore this analysis was also performed on the studied oligonucleotides, although, as our previous studies have noted, this technique is not wholly reliable and may provide erroneous results [
More reliable results can, however, be obtained from melting curve analysis. The methodology of this technique is based on the fact that all known G-quadruplexes are more stable in the presence of potassium than in the presence of sodium ions [
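As a rough illustration of the potassium/sodium comparison, the sketch below takes the melting temperatures listed in the table above and computes, for each oligonucleotide, the difference between the value measured in potassium and the value measured in sodium. The values were transcribed by hand from that table, rows reported as ND are omitted, and the names `TM` and `delta_tm` are hypothetical; this is not the analysis procedure used in the study.

```python
# Melting temperatures transcribed from the table: (potassium, sodium).
# H2-U22 and H1-M27 are left out because their sodium value is ND.
TM = {
    "H2-U38": (56.8, 52.4),
    "H2-M15": (57.1, 46.6),
    "H1-JX": (57.5, 30.3),
    "H1-JX1": (61.6, 35.8),
    "H1-L20": (58.2, 36.4),
    "H2-U38B": (59.5, 55.3),
    "H2-J0": (51.8, 50.8),
    "H2-M15B": (46.5, 44.2),
    "S-M30": (67.8, 49.6),
    "S-JX": (54.2, 35.0),
    "H1-JN": (48.5, 38.7),
    "H1-K02": (48.4, 32.1),
    "H1-K03": (60.3, 39.7),
    "H1-JN-A": (58.6, 37.2),
}

def delta_tm(tm_table):
    """Return Tm(potassium) - Tm(sodium) for each oligonucleotide."""
    return {name: round(k - na, 1) for name, (k, na) in tm_table.items()}

deltas = delta_tm(TM)
# List the oligonucleotides from the smallest to the largest difference.
for name, diff in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name:8s} {diff:5.1f}")
```

For every row with both values available the difference is positive, ranging from 1.0 (H2-J0) to 27.2 (H1-JX), in line with the higher stability expected in the presence of potassium.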
Recently, a newly developed experimental methodology using the ligand Thiazole Orange (TO) for the identification of G-quadruplex forming sequences has been applied [
The representative CD titration spectra of ~27
The ICD results display the expected positive signals at ~495 and ~510 nm and a negative signal at ~475 nm. These signatures are characteristic of TO-quadruplex complexes [
As expected, each oligonucleotide was also found to have formed G-quadruplexes under the given conditions. Signals corresponding to those of G-quadruplex structures were also clearly detected in the UV region. In the case of antiparallel G-quadruplexes, the signals at 295 and 265 nm decreased and increased, respectively, with increasing concentration of TO, which is indicative of a conversion from antiparallel to parallel folding.
The titration analysis of the HPV25 sequence shows that the ICD was mirrored; the positive peaks became negative and vice versa. It is therefore reasonable to assume that the binding mode of TO with this sequence differs from that typical of G-quadruplex motifs (Figure
CD titration spectra of 27
Our bioinformatic study of HIV genomes partially corresponds with the analyses recently published by many authors in that it too focuses primarily on G-rich regions located in U3 LTRs. However, this study reveals that G-quadruplexes can be formed in HIV provirus DNA by sequences consisting of only three G-runs and one GG run. These domains are located not only in the regulatory LTRs but also in gene coding regions. In addition, a G-rich domain was located in the minus-strand of many HIV-1 isolates, the sequence of which is highly homologous to the well-known sequence forming the interlocked and extremely stable HIV integrase aptamer [
Several unanswered questions require deeper analysis to determine the features that provide specific G-quadruplex motifs with the ability to function as structural elements.
In this study, we used only cost-effective methods to confirm that some oligonucleotides form G-quadruplex motifs. However, we again demonstrate that the ICD signal of the TO-quadruplex complex offers valuable additional information, making it possible to distinguish whether an unknown sequence is able to adopt a G-quadruplex structure.
The authors declare that they have no conflicts of interest.
This work was supported by the Slovak Research and Development Agency under Contract no. APVV-0280-11, European Cooperation in Science and Technology (COST CM1406), Slovak Grant Agency 1/0131/16, and internal university grants (VVGS-PF-2017-251 and VVGS-2016-2596). The authors thank G. Cowper for critical reading and correction of the manuscript.
Table S1: the sequence alignment of the regions determined by NF-