Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.216
Conference Review
Endogenous retroviruses and human

Humans share about 99% of their genomic DNA with chimpanzees and bonobos; thus, the differences between these species are unlikely to be in gene content but could be caused by inherited changes in regulatory systems. Endogenous retroviruses (ERVs) comprise ∼5% of the human genome. The long terminal repeats (LTRs) of ERVs contain many regulatory sequences, such as promoters, enhancers, polyadenylation signals and factor-binding sites. Thus, they can influence the expression of nearby human genes. All known human-specific LTRs belong to the HERV-K (human ERV) family, the most active family in the human genome. It is likely that some of these ERVs integrated into regulatory regions of the human genome and therefore affected the expression of adjacent genes, which could consequently have contributed to human evolution. This review discusses possible functional consequences of ERV integration in active coding regions. Copyright © 2002 John Wiley & Sons, Ltd.


Introduction
Humans tend to think of themselves as something special, truly different from other animals, but at the molecular level it is obvious that we are very similar to chimpanzees. The average DNA sequence difference between human and chimpanzee is only 1.24% [7], and probably only 0.5% in active coding regions [9]. In light of these data, one could ask: what genetic differences separated us from apes and made us human? So far, only a few significant human-specific genomic features have been identified (reviewed in [8]). Eighteen of the 23 pairs of modern human chromosomes are virtually identical to chimpanzee chromosomes; the most significant differences between the human and chimpanzee karyotypes are a telomeric fusion of chimpanzee chromosomes 12 and 13 to form human chromosome 2, and pericentric inversions in several chromosomes [30]. Among known human-specific differences in gene coding regions, the most important is the inactivation of the human CMP-N-acetylneuraminic acid (CMP-Neu5Ac) hydroxylase gene, which leads to an absence of CMP-N-glycolylneuraminic acid (CMP-Neu5Gc) on the surface of all human cells and, consequently, an increased quantity of CMP-Neu5Ac [10]. However, as early as 1975, King and Wilson concluded [15]: 'their (human and chimpanzee) macromolecules are so alike that regulatory mutations may account for their biological differences', and it seems most probable that the differences between human and chimpanzee arise from differences in the regulatory systems of their genomes. To date, there are only a few examples of differences in regulatory regions between these species [8,12]. Among the best candidates for generating such differences in regulatory regions are transposable elements and, of these, especially retroelements (REs).

Retroelements - Endogenous retroviruses
Retroelements are mobile elements that transpose via an RNA intermediate. There are three main groups of REs: long interspersed elements (LINEs), short interspersed elements (SINEs) and long terminal repeat (LTR) elements. Endogenous retroviruses (ERVs), which are members of the LTR elements, are the most complex REs. They are widespread throughout vertebrates, and the number of ERVs in a haploid genome ranges from tens of copies to several tens of thousands of copies [11]. A full-length provirus consists of three major genes, gag, pol and env, and is flanked by LTRs (Figure 1A). LTRs contain many regulatory sequences (Figure 1B), such as promoters, enhancers, polyadenylation signals and factor-binding sites.
REs could influence gene regulation by expressing their retroviral genes, inducing genomic rearrangements, providing new regulatory sequences or simply by disrupting gene functions. The involvement of REs in the regulation of gene expression has been demonstrated in a number of studies [4,18,24,25].
It is universally recognized that ERVs are the remnants of exogenous retroviral germ cell infections [4,18,24,25]. After invasion of the host cell, viral RNA is converted into cDNA, which then integrates into the host genome. Over time, many ERVs have disappeared from host genomes due to homologous recombination between two LTRs (generating solitary LTRs), as demonstrated by the lower number of full-length proviruses than solitary LTRs in modern vertebrate genomes.
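The formation of a solitary LTR can be pictured with a toy model. The sketch below is illustrative only: it treats a locus as a list of labelled segments (the labels `host_5p` and `host_3p` and the function name `recombine_ltrs` are hypothetical), whereas real homologous recombination acts on the DNA sequence itself.

```python
# Toy model of solo LTR formation. A locus is an ordered list of segment
# labels; this is a simplification, not a model of the recombination mechanism.

def recombine_ltrs(locus):
    """Collapse everything between the first and last LTR into one solo LTR.

    Returns an unchanged copy if the locus has fewer than two LTRs.
    """
    ltr_positions = [i for i, segment in enumerate(locus) if segment == "LTR"]
    if len(ltr_positions) < 2:
        return list(locus)
    first, last = ltr_positions[0], ltr_positions[-1]
    # The internal genes and one LTR copy are lost; a single LTR remains.
    return locus[:first] + ["LTR"] + locus[last + 1:]


# Full-length provirus flanked by host sequence (hypothetical labels).
provirus_locus = ["host_5p", "LTR", "gag", "pol", "env", "LTR", "host_3p"]
solo_locus = recombine_ltrs(provirus_locus)
print(solo_locus)  # ['host_5p', 'LTR', 'host_3p']
```

In this picture the gag, pol and env genes are lost together with one LTR copy, which is consistent with solitary LTRs outnumbering full-length proviruses.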

Human ERVs
[Figure 1, caption fragment: ... polyadenylation signal, enhancer core and putative factor-binding sites marked. TBF, TATA-box binding factor; CBF/NF1, CCAAT binding factor/nuclear factor 1; HRE, hormone-response element; C/EBP, CCAAT/enhancer binding protein; NFκB, nuclear factor κB; YY1, yin yang-1]
Over 41% of the human genome is represented by retroelements: 13% LINEs, 20% SINEs and 8% LTR elements [16]. Most of the LTR elements are from ERVs. Human ERVs (HERVs) are specific for primate genomes. Their expression has been found in almost every human tissue and organ, including placenta and embryonic tissues, different tumours, lung and kidney [18,23,25]. HERV transcripts commonly contain many mutations in their ORFs and do not code for any functional proteins. However, some proviruses have intact ORFs, as evidenced by the presence of retroviral proteins or the detection of their enzymatic activities in some human tissues [2,3,18,26]. Furthermore, in addition to retroviral proteins, virus-like particles (VLPs) have been found in human tissues; these were shown to be able to bud from the cell membrane but unable to infect cells [18]. Recently, Mi et al. [22] showed that the HERV-W Env protein (also called syncytin) is expressed in placenta and takes part in syncytiotrophoblast formation. Since syncytin has been detected only in the primate lineage and not in other mammals, it could explain differences in placental biology between primates and other mammals. It is not known whether HERVs retained a transpositional capability after the divergence of the human and chimpanzee evolutionary branches, but the presence of human-specific HERVs suggests that some of them were still active.
All known human-specific HERVs belong to the HML-2 subfamily of the HERV-K family. This is one of the largest ERV groups in the human genome, represented by about 170 full-length proviruses [27] and 2000 solo LTRs [20]. It has been suggested that HERV-K (HML-2) is the most biologically active retroviral group in the human genome and contains many young members. Some of them are human-specific and, because of their regulatory potential, they could have influenced the expression of adjacent genes and thus contributed to human evolution [2,17,20,28].
Recently, two different methods for whole-genome comparison of integrations of interspersed repeats between closely related genomes were developed in our laboratory and applied to the genome-wide identification of human-specific HERV-K LTRs [5,19]. Using one of these methods, targeted genomic differences analysis (TGDA), we found 23 new human-specific LTR members and estimated the lower limit of their total number as 67 [5]. Applying the other method, DiffIR (differences in integration sites of low and medium copy number interspersed repeats), led to the discovery of 11 new human-specific LTR members [19]. On the basis of known human-specific LTR sequences, we created a consensus sequence of the evolutionarily young HERV-K (HML-2) LTRs and searched for similar sequences in the human genome databases. We found ∼140 LTRs with 97-100% identity to the consensus and checked 19 selected LTRs for their presence in human and non-human primate genomes. Seventeen of these 19 LTRs were human-specific. Since only ∼90% of the human genome sequence was available at that time, we concluded that the total number of human-specific HERV-K (HML-2) LTRs could be about 140 [6]. We have also shown that at least three groups of HERVs were active after the divergence of the human and chimpanzee evolutionary branches. These are HERV-K (HML-2) with LTRs of groups II-T, HS-a and HS-b, represented by 1, 89 and 53 copies in the human genome, respectively [6].
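The estimate of about 140 human-specific LTRs can be retraced with simple arithmetic. The calculation below is one plausible reading of the figures quoted above, not a procedure stated in [6]: it assumes that the 19 tested LTRs are representative of all ∼140 candidates and that the unsequenced ∼10% of the genome has the same LTR density. The function and parameter names are hypothetical.

```python
# Back-of-envelope reconstruction of the "about 140" estimate.
# Assumptions (not stated in the source): the tested sample is representative,
# and LTR density is uniform across sequenced and unsequenced regions.

def estimate_human_specific_ltrs(candidates, tested, human_specific, coverage):
    """Scale the candidate count by the validated fraction and genome coverage."""
    validated_fraction = human_specific / tested       # 17 of 19 tested LTRs
    in_sequenced_part = candidates * validated_fraction
    return in_sequenced_part / coverage                # extrapolate to 100%


estimate = estimate_human_specific_ltrs(
    candidates=140,      # LTRs with 97-100% identity to the consensus
    tested=19,           # LTRs checked in human and non-human primates
    human_specific=17,   # of those, found to be human-specific
    coverage=0.90,       # fraction of the genome sequence available
)
print(round(estimate))  # 139
```

Under these assumptions the two corrections nearly cancel: about 10% of the candidates are not human-specific, and about 10% of the genome had not yet been searched.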

The effects of HERVs on gene regulation and their potential contribution to human evolution
Many of the identified human-specific LTRs are located in promoter or enhancer regions, as well as in introns of known or candidate genes [6]. For example, one such LTR is situated in the second intron of the cbf2 gene (CCAAT-binding factor). Cbf takes part in regulating the expression of many genes involved in various cell processes, such as heat-shock activated genes. Another human-specific LTR is located ∼6 kb upstream of the transcription start of the fntb gene (β subunit of CAAX box-farnesyltransferase). Fntb is required for protein farnesylation, which facilitates protein-membrane association and also promotes protein-protein interactions. At least 30 known genes, and several other candidate genes, are co-localized with human-specific LTRs; examples are ppm1G (protein phosphatase 1G), mmp24 (matrix metalloproteinase 24) and il23a (interleukin 23, α subunit) [6]. Changes in the expression of such proteins would become apparent even at the organism level. The identified human-specific LTRs could have influenced the expression of these genes during the divergence of human and chimpanzee and so contributed to human evolution. To date, human-specific LTRs have been detected in the genomes and transcriptomes of various human cancer cells [14,29].
Analyses of individual non-human-specific HERV members have shown their ability to affect the regulation of human genes. Jurka and Kapitonov [13] analysed the leptin receptor (OBR), which is involved in energy expenditure, production of sex hormones and other important biological processes, and found two alternative forms (short and long) of the expressed protein. The short form is generated by alternative splicing within a HERV-K (HML-2) LTR and, moreover, this LTR encodes the 67 terminal amino acids of the leptin receptor. Medstrand and Mager analysed the effect on adjacent gene expression of two HERV-E LTRs, one of which had integrated into the 5′ flanking region of the apoC1 (apolipoprotein C-I) gene and the other into the 5′ flanking region of the ednrb gene (endothelin B receptor) [21]. They showed that apoC1 and ednrb have alternative promoters within the LTRs, and that transcription from the alternative ednrb promoter is much stronger than from the native promoter. Transcription from the apoC1 LTR promoter is equal to that from the native promoter, but the presence of the LTR increases the apoC1 promoter activity in humans and baboons (in this case the LTR plays the role of an enhancer). Recently, we have demonstrated promoter and enhancer activity of human-specific LTRs in reporter gene assays [1]. However, further analysis is needed to verify the hypothesis that ERVs have played a role in human evolution.
To determine the significance of ERVs in human evolution we need to analyse not only young ERV members, but also old ones. Some old members, which existed in all primates and retained transpositional activity after the human-chimpanzee divergence, could have transposed to another locus in the human genome, but not in the chimpanzee genome. Even if such an ERV did transpose in the chimpanzee genome, it is extremely unlikely to have inserted at the same site as in the human genome. Furthermore, as a result of mutations, inactive (or silent) proviruses could become transcriptionally (or even transpositionally) active. Thus, old ERVs could express their proteins, form VLPs or transpose. Old solitary LTRs could acquire a new functional capacity due to specific mutations in the human genome, e.g. they could become new promoters or enhancers, or alternative splice sites or factor-binding sites could appear within their sequence. Recently, Schon et al. found a subgroup of HERV-W LTRs that carry mutations between the CCAAT-box and TATA-box, forming the consensus sequence of the Sp1 binding site [23].
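How a point mutation can create a factor-binding site is easy to show on a toy sequence. The sketch below scans both strands for the GC-box core GGGCGG, the core motif recognized by Sp1. The two sequences are invented for illustration and are not the HERV-W LTR sequences analysed in [23]; the function name `find_gc_boxes` is likewise hypothetical.

```python
# Scan a DNA sequence for the Sp1 GC-box core on either strand.
# The sequences below are made up for illustration; they are not real LTRs.

GC_BOX = "GGGCGG"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_gc_boxes(seq):
    """Return (position, strand) for each GC-box core found in seq."""
    reverse_motif = reverse_complement(GC_BOX)  # "CCGCCC"
    hits = []
    for i in range(len(seq) - len(GC_BOX) + 1):
        window = seq[i:i + len(GC_BOX)]
        if window == GC_BOX:
            hits.append((i, "+"))
        elif window == reverse_motif:
            hits.append((i, "-"))
    return hits


# Hypothetical region between a CCAAT-box and a TATA-box, before and after
# a single A-to-C substitution at position 10.
ancestral = "CCAATCAGGGAGGTTATAAA"
derived = "CCAATCAGGGCGGTTATAAA"

print(find_gc_boxes(ancestral))  # []
print(find_gc_boxes(derived))    # [(7, '+')]
```

A single substitution is enough to turn a non-matching stretch into a candidate Sp1 site in this toy case; whether such a site is functional would still need experimental confirmation.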

Conclusion
Retroviral LTRs contain various regulatory sequences in a compact form. The appearance of such elements in a host genome could dramatically change the expression of adjacent genes or even inactivate them. In some cases, the effects of these integrated elements could benefit the host cell. In this way, HERVs could have been a driving force in the divergence of human and chimpanzee.