Comparative Analysis of Context-Dependent Mutagenesis in Humans and Fruit Flies

In general, mutation frequencies are context-dependent: specific adjacent nucleotides may influence the probability of observing a specific type of mutation in a genome. Recently, several hypermutable motifs were identified in the human genome. Namely, there is an increased frequency of T>C mutations in the second position of the words ATTG and ATAG and an increased frequency of A>C mutations in the first position of the word ACAA. Previous studies have also shown that there is a remarkable difference between the mutagenesis of humans and Drosophila. While C>T mutations are overrepresented in the CG context in humans (and other vertebrates), this mutation regularity is not observed in Drosophila melanogaster. Such differences in the observed regularities of mutagenesis between representatives of different taxa might reflect differences in the mechanisms involved in mutagenesis. We performed a systematic comparison of mutation regularities within 2–4 bp contexts in Homo sapiens and Drosophila melanogaster and found that the aforementioned contexts are not hypermutable in fruit flies. It seems that most mutation contexts affect mutation rates in a similar manner in H. sapiens and D. melanogaster; however, several important exceptions are noted and discussed.


Introduction
The average rates of point mutations in multicellular eukaryotic genomes are usually between 10⁻⁷ and 10⁻¹⁰ mutations per nucleotide per generation [1,2]. However, the rates of point mutations may be dramatically altered by their genomic context. In some cases, this context-dependent change in mutation frequency can be attributed to known molecular mechanisms involved in mutagenesis. For example, the increased frequency of C>T mutations in the word CG in humans (and other vertebrates) is attributed to the methylation of cytosines by context-specific DNA methyltransferases [3]. This mutation regularity is absent in D. melanogaster [4], in which cytosine methylation occurs, but appears to be restricted to early embryonic development and is not specific to cytosines followed by guanines [5]. Many other examples of context-dependent mutagenesis have been reported [4,6–9].
Recently, an increased rate of T>C mutations in the second position of the words ATTG and ATAG and an increased rate of A>C mutations in the first position of the word ACAA were reported in the human genome [10]. This was achieved by calculating two values, "minimal contrast" and "mutation bias," for 2–4 bp mutation contexts to evaluate whether the addition of specific nucleotides to the 5′ or 3′ end of 1–3 bp words increases the probability of observing certain mutations in fixed positions. Mutation bias indicates the total excess (or deficiency) of mutations within a given context. Minimal contrast indicates the excess (or deficiency) of mutations within a given context that cannot be explained by the excess (or deficiency) of mutations in one of its subcontexts.
H. sapiens and D. melanogaster are promising model organisms for this kind of study because of the vast amount of data on genetic variation that is available for them. The goal of our study was to compare the mutation regularities of H. sapiens and D. melanogaster in terms of "minimal contrast" and "mutation bias."

Methods
We searched for single nucleotide variable positions in intergenic sequences of 37 individual D. melanogaster genomes (multiple alignments obtained from http://genome.ucsc.edu/ [11]).

Mutation Data.
We assume that a mutation with a known direction within a known context has occurred in a specific position of the D. melanogaster genome if the following conditions are met.
(1) D. sechellia and D. erecta genomes have the same nucleotide aligned to this position (this nucleotide will be referred to as the "ancestral nucleotide").
(2) Among the 37 D. melanogaster genomes, some contain the ancestral nucleotide in this position, while some other genomes contain a different nucleotide.
(3) Only 2 genetic variants are present in this position for the 37 D. melanogaster genomes.
(4) The 3 bp upstream and the 3 bp downstream of this position in the multiple alignment do not contain any substitutions or gaps.
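The four conditions above can be sketched as a per-column check on the multiple alignment. This is only an illustration: the function names, the representation of an alignment column as a string with one character per genome, and the use of "-" for gaps are assumptions and are not taken from the original pipeline.

```python
NUCLEOTIDES = set("ACGT")


def call_mutation(sechellia, erecta, melanogaster):
    """Return (ancestral, derived) for one alignment column, or None.

    sechellia, erecta: single characters from the two outgroup genomes.
    melanogaster: string with one character per D. melanogaster genome.
    """
    # Condition 1: both outgroups carry the same nucleotide (the ancestral one).
    if sechellia != erecta or sechellia not in NUCLEOTIDES:
        return None
    variants = set(melanogaster)
    # Condition 3: exactly two variants, neither of which is a gap or unread base.
    if len(variants) != 2 or not variants <= NUCLEOTIDES:
        return None
    # Condition 2: one of the two variants is the ancestral nucleotide.
    if sechellia not in variants:
        return None
    (derived,) = variants - {sechellia}
    return sechellia, derived


def flanks_are_clean(columns, i, flank=3):
    """Condition 4: the `flank` columns on each side of column i are invariant.

    columns: list of strings, each holding one alignment column across
    all genomes (outgroups included).
    """
    if i - flank < 0 or i + flank >= len(columns):
        return False  # not enough flanking sequence to evaluate the context
    for j in range(i - flank, i + flank + 1):
        if j == i:
            continue
        column = set(columns[j])
        if len(column) != 1 or not column <= NUCLEOTIDES:
            return False
    return True
```

A position would be accepted as a mutation with a known direction and context only when both functions succeed; for example, `call_mutation("A", "A", "AAG")` yields an A>G mutation, while a column with three variants or with disagreeing outgroups is rejected.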
Mutation bias and minimal contrasts for D. melanogaster were calculated for 2-4 bp mutation contexts using the methods described in [10]. Mutation bias, contrasts, and other data for H. sapiens were taken directly from [10].
Here, P{mut | pos, W} and P{mut | pos′, W′} are the conditional probabilities of observing mutation mut in position pos of the word W and in position pos′ of the word W′, respectively, in a given dataset. Although these probabilities cannot be explicitly calculated without assumptions about the general probability of mutation per nucleotide in the genome, their ratio can be estimated by the following formula:
$$\frac{P\{mut \mid pos, W\}}{P\{mut \mid pos', W'\}} \approx \frac{N\{mut \mid pos, W\} / F_{W}}{N\{mut \mid pos', W'\} / F_{W'}}$$
Here, N{mut | pos, W} and N{mut | pos′, W′} are the numbers of mutations observed in the respective contexts, and F_W and F_W′ are the observed frequencies of words W and W′, respectively, among all words of the same length.
We did not study discontiguous contexts such as CNG and CNNG.
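The ratio described above can be sketched in a few lines. The sketch assumes that the estimate is the ratio of observed mutation counts, each normalised by the observed frequency of its word. The function name, parameter names, and example numbers are hypothetical and do not come from the dataset.

```python
def contrast(n_context, n_subcontext, f_word, f_subword):
    """Estimate the contrast of a context against one of its subcontexts.

    n_context, n_subcontext: numbers of observed mutations in each context.
    f_word, f_subword: observed frequencies of the two words, each taken
    among all words of the same length.
    """
    rate_context = n_context / f_word          # proportional to P{mut | pos, W}
    rate_subcontext = n_subcontext / f_subword  # proportional to P{mut | pos', W'}
    return rate_context / rate_subcontext


# Illustrative numbers only: 50 mutations in a 4-letter word with frequency
# 0.005, against 1000 mutations in a 1-letter subword with frequency 0.25.
example = contrast(50, 1000, 0.005, 0.25)
```

A value above 1 indicates more mutations in the longer context than its subcontext would predict; a value below 1 indicates fewer.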

Mutation Bias.
For any context {mut | pos, W}, there exists only one subcontext {mut | pos′, W′} such that the length of W′ is equal to 1 (i.e., W′ is the one-letter word consisting of the mutated letter). The mutation bias is the contrast of the given context and this subcontext.
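As an illustration of this definition, the sketch below enumerates the contiguous subcontexts of a context and computes the mutation bias against the single one-letter subcontext. It assumes that a subcontext is obtained by trimming nucleotides from the 5′ and/or 3′ end while keeping the mutated position, and that mutation counts and word frequencies are available as dictionaries. All names and numbers are hypothetical.

```python
def subcontexts(pos, word):
    """List all proper contiguous subcontexts (pos', W') of (pos, W).

    pos is 1-based; every subword must still contain the mutated letter.
    """
    i = pos - 1
    result = []
    for start in range(0, i + 1):
        for end in range(i + 1, len(word) + 1):
            if end - start < len(word):  # skip the full word itself
                result.append((i - start + 1, word[start:end]))
    return result


def mutation_bias(mut, pos, word, counts, freqs):
    """Contrast of {mut | pos, W} against its one-letter subcontext.

    counts: dict mapping (mut, pos, word) to the number of observed mutations.
    freqs: dict mapping a word to its frequency among words of equal length.
    """
    letter = word[pos - 1]
    rate_context = counts[(mut, pos, word)] / freqs[word]
    rate_letter = counts[(mut, 1, letter)] / freqs[letter]
    return rate_context / rate_letter


# Illustrative data only.
counts = {("T>C", 2, "ATTG"): 50, ("T>C", 1, "T"): 1000}
freqs = {"ATTG": 0.005, "T": 0.25}
bias = mutation_bias("T>C", 2, "ATTG", counts, freqs)
```

For the context {T>C | 2, ATTG}, `subcontexts(2, "ATTG")` returns five subcontexts, exactly one of which, (1, "T"), has length 1.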

Word Frequencies.
We used two measures of D. melanogaster word frequencies. The first measure was obtained using the complete aligned sequences of the 37 D. melanogaster genomes and the D. sechellia and D. erecta genomes. For the second measure, we used conserved regions in which the ancestral nucleotide matches at least one of the D. melanogaster genetic variants, and no gaps or unread sequences are present in the multiple alignment. Word frequencies from the conserved regions were used for calculating mutation biases and contrasts.
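A minimal sketch of how word frequencies of a given length might be tabulated from a set of sequences is shown below. It assumes overlapping windows on the given strand only and skips any window containing a gap or unread base. Whether the original analysis also counted the reverse strand is not stated, so that choice is an assumption, as are the function and parameter names.

```python
from collections import Counter


def word_frequencies(sequences, k):
    """Return the frequency of each k-letter word among all valid k-letter windows."""
    valid = set("ACGT")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip windows containing gaps, N, or any other non-nucleotide symbol.
            if set(word) <= valid:
                counts[word] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}
```

Frequencies computed this way sum to 1 within each word length, which matches the phrase "among all words of the same length" used for the contrast estimate.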

Results and Discussion
The nucleotide compositions of the complete alignments and the conserved regions (see Methods) of D. melanogaster were similar (Table 1). We therefore used word frequencies within conserved regions of D. melanogaster for calculations of contrast and mutation bias.
Previous studies have shown that the representation of mutation data on a plot of mutation bias versus minimal contrast is useful for identifying important mutation contexts [10]. Mutation bias and minimal contrasts of mutation contexts in D. melanogaster are shown in Figure 1. The {A>C | 2, CACC} and {A>C | 3, CCA} mutation contexts have the highest minimal contrast values in D. melanogaster. Interestingly, the addition of C or G nucleotides to either end of the word CCA increases the mutation bias of the A>C mutation, while the addition of A or T nucleotides to this word decreases the mutation bias.
As shown in Table 2, mutation patterns differ between D. melanogaster and H. sapiens at the single nucleotide scale: D. melanogaster has a lower transition/transversion ratio. Moreover, the G>T (C>A) transversion in D. melanogaster comprises a much larger fraction of mutations than the A>G (T>C) transition, which is consistent with previous findings [4].
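For readers who want to reproduce the single-nucleotide comparison, the transition/transversion ratio can be computed from a table of mutation counts as sketched below. The dictionary format and the example counts are illustrative assumptions and are not the values in Table 2.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(mutation_counts):
    """Transition/transversion ratio from counts keyed by strings such as 'A>G'.

    Assumes every key has the form 'X>Y' with X != Y and that at least one
    transversion is present (otherwise the division fails).
    """
    transitions = 0
    transversions = 0
    for mutation, n in mutation_counts.items():
        ref, alt = mutation.split(">")
        if is_transition(ref, alt):
            transitions += n
        else:
            transversions += n
    return transitions / transversions


# Illustrative counts only.
ratio = ts_tv_ratio({"A>G": 10, "C>T": 20, "G>T": 5, "A>C": 5})
```

Here the two transition classes contribute 30 mutations and the two transversion classes 10, giving a ratio of 3.0.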
One of the mechanisms by which G>T (C>A) transversions occur is through the formation of 8-oxoguanine [12] caused by reactive oxygen species [13] or ultraviolet irradiation [14]. In eukaryotes, the damaged DNA is repaired with the help of DNA glycosylase OGG1. This enzyme removes the 8-oxoguanine, forming a DNA apurinic-apyrimidinic site, which is then recognized by other proteins of the DNA repair system. If further repair does not occur, the apurinic-apyrimidinic site will be complemented with an adenine nucleotide during DNA replication, resulting in a C>A mutation. Another protein with DNA glycosylase activity for 8-hydroxyguanine, called dOgg1, was also described in D. melanogaster [15].
Another factor that might be responsible for increased G>T (C>A) transversion rates in D. melanogaster is aflatoxin B1. Aflatoxin B1 is known to induce base substitutions in DNA [16,17], especially G>T (C>A) transversions. It is a product of a fungus from the Aspergillus genus, which grows on fruits and grains in a humid climate; thus, it is quite possible that D. melanogaster is exposed to this toxin.
The {T>C | 2, ATTG}, {T>C | 2, ATAG}, and {A>C | 1, ACAA} mutation contexts appear to have excessive mutation frequencies in H. sapiens but not in D. melanogaster. Interestingly, the CAATT sequence (which contains the ATTG word on the reverse strand) appears to be a mutation hotspot for the human DNA polymerase eta [18]. Also, the CCAAT motif (which also contains the ATTG word on the reverse strand) is a known target site for enhancer-binding proteins [19]. The increased number of ATTG>ACTG mutations might be partially due to selection against enhancer sequences in nontranscribed regions of the genome.
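The statement that CAATT and CCAAT contain the word ATTG on the reverse strand can be checked with a short reverse-complement helper. The function names below are hypothetical and serve only to make the strand relationship explicit.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the word as read 5' to 3' on the opposite strand."""
    return word.translate(COMPLEMENT)[::-1]


def occurs_on_reverse_strand(word, sequence):
    """True if `word` appears on the strand complementary to `sequence`."""
    return reverse_complement(word) in sequence
```

The reverse complement of ATTG is CAAT, which is a substring of both CAATT and CCAAT, so `occurs_on_reverse_strand("ATTG", "CAATT")` and `occurs_on_reverse_strand("ATTG", "CCAAT")` are both true.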
On the other hand, several mutation contexts seem to have increased mutation bias in D. melanogaster. The differences between mutation contexts in D. melanogaster and H. sapiens are shown in more detail in Figure 3.
In a previous study, we compared the over- and underrepresentation of 1–7 bp nucleotide words in 139 complete eukaryotic genomes, including H. sapiens and D. melanogaster [20]. Table 3 contains a part of this comparison for several words in H. sapiens and D. melanogaster related to the previously discussed mutation contexts. The word CG is strongly underrepresented in H. sapiens (by 76.37% relative to the expected genomic frequency), while in D. melanogaster it is only slightly underrepresented (by 5.93% relative to the expected genomic frequency). The derived word TG is overrepresented by 20.1% and by 10.67% in H. sapiens and D. melanogaster, respectively. The {C>T | 1, CG} mutation context seems to be the only example of a mutation context that has remarkably affected the genomic word composition in H. sapiens compared to D. melanogaster. The absence of such effects for words related to other mutation contexts might be because we did not take into account the rates of other mutations in these words or of mutations that produce these words.
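To make the notion of over- and underrepresentation concrete, the sketch below expresses the deviation of an observed word frequency from an expected one as a percentage. The expected frequency is assumed here to be the product of the single-nucleotide frequencies; the model used in [20] may differ, so this is an illustration of the idea and not a reproduction of Table 3. The function name and example values are hypothetical.

```python
from math import prod


def percent_deviation(observed_freq, word, base_freqs):
    """Percent over- (positive) or underrepresentation (negative) of a word.

    ASSUMPTION: the expected frequency is the product of the frequencies of
    the word's individual nucleotides (an independence model).
    """
    expected = prod(base_freqs[base] for base in word)
    return (observed_freq / expected - 1.0) * 100.0


# Illustrative values only: uniform base composition, CG observed at half
# of its expected frequency of 0.0625.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
deviation = percent_deviation(0.03125, "CG", uniform)
```

With these illustrative inputs the word is underrepresented by 50%; an underrepresentation of 76.37%, as reported for CG in H. sapiens, corresponds to an observed-to-expected ratio of about 0.24.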

Conclusions
The regularities of mutagenesis are different in D. melanogaster and H. sapiens. However, these differences may be attributed to a rather small number of mutation contexts that behave in a different manner in these two species. First, D. melanogaster has a lower transition/transversion ratio than H. sapiens, with G>T (C>A) transversions comprising a larger fraction of mutations. Second, there is an increased frequency of C>T mutations in the word CG in H. sapiens. This is probably explained by the fact that human germline methylation is abundant and CpG-specific, while that of D. melanogaster is not. Third, there is an increased frequency of T>C mutations in the second position of the words ATTG and ATAG and an increased frequency of A>C mutations in the first position of the word ACAA in H. sapiens but not in D. melanogaster. Finally, there is an increased A>C mutation rate in the {A>C | 2, CACC} and {A>C | 3, CCA} mutation contexts in D. melanogaster but not in H. sapiens.