Many commonly used models of molecular evolution assume homogeneous nucleotide frequencies. Deviation from this assumption has been shown to cause problems for phylogenetic inference. However, some authors claim that only extreme heterogeneity affects phylogenetic accuracy and suggest that violations of other model assumptions, such as constant rates among sites, are more problematic. To explore the interaction between compositional heterogeneity and variable rates among sites, I reanalyzed 3 empirical heterogeneous datasets using several models. Bayesian inference recovers accurate topologies under variable rates-among-sites models, but fails under some models that account for compositional heterogeneity. I also ran simulations and found that accounting for rates among sites improves topology accuracy in compositionally heterogeneous data. This indicates that in some cases, models accounting for among-site rate variation can improve outcomes for data that violate the assumption of compositional homogeneity.

Recent phylogenetic studies have explored the effect of compositional heterogeneity on phylogenetic methods. Compositional heterogeneity can arise in a dataset as a result of nonstationary evolution (when the substitution pattern is not uniform across an evolutionary tree). If two nonsister subtrees have similar substitution bias, this can lead to a convergence in nucleotide composition (CNC). The taxa may then look similar due to convergent evolution rather than common ancestry, which can mislead phylogenetic analysis. There are several methods to detect and quantify the level of compositional heterogeneity in a dataset, including chi-squared tests (e.g., [
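As an illustration of the chi-squared approach mentioned above, the following minimal sketch tests whether base counts are homogeneous across taxa using a taxa-by-base contingency table. The function names and the toy alignment are hypothetical and are not taken from any of the cited studies. Note that this simple test treats taxa as independent and ignores their shared ancestry, so it should be read as a rough screen rather than a definitive test.

```python
from collections import Counter

BASES = "ACGT"

def base_counts(seq):
    """Count A, C, G, T in one sequence, ignoring gaps and ambiguity codes."""
    counts = Counter(seq.upper())
    return [counts[b] for b in BASES]

def composition_chi2(alignment):
    """Return (chi-squared statistic, degrees of freedom) for a taxa-by-base table.

    `alignment` maps taxon name -> aligned sequence string.
    """
    table = [base_counts(seq) for seq in alignment.values()]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip bases absent from the whole alignment
                chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(BASES) - 1)
    return chi2, df

# Toy alignment (hypothetical): two AT-rich taxa and two GC-rich taxa.
toy = {
    "taxon_A": "ATATATATGC",
    "taxon_B": "ATATTATAGC",
    "taxon_C": "GCGCGCGCAT",
    "taxon_D": "GCGCCGCGAT",
}
chi2, df = composition_chi2(toy)
print(f"chi2 = {chi2:.2f}, df = {df}")
```

The statistic would then be compared against a chi-squared distribution with the returned degrees of freedom; a large value relative to that distribution suggests compositional heterogeneity.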

Another commonly studied modeling question is the variation of substitution rates among sites. It has been established that accounting for among-site rate variation is important in phylogenetics [

Several studies, including those mentioned above, have suggested that violations of the assumption of constant rates among sites are more problematic than violations of the assumption of compositional homogeneity [

To further explore the interaction between base compositional heterogeneity and among-site rate variation, I selected 3 recent empirical datasets that exhibit base compositional heterogeneity and reanalyzed them using both models that account for among-site rate variation (gamma-distributed rates, +G, and invariable sites, +I) and constant-rate models. I also ran a simulation study to test whether accounting for variable rates among sites helps infer accurate topologies from compositionally heterogeneous data.

Gruber et al. [

In another study, Ho and Jermiin [

Mallatt et al. [

In each of these 3 studies, the authors found significant compositional heterogeneity in a dataset. They ascribed failure of many phylogenetic methods to the confounding signal of the convergent nucleotide composition, but the first 2 reports did not thoroughly explore models that account for among-site rate variation. In the third study, Mallatt et al. [

To elucidate the effects of among-site rate heterogeneity in datasets with compositional heterogeneity, I reanalyzed these 3 datasets using the Bayesian phylogenetics software MrBayes version 3.1.2 under several different evolutionary models. My results show that accounting for among-site rate heterogeneity improves Bayesian inference in each study; in each case, I find that the GTR+I+G, GTR+I, and GTR+G models outperform the GTR model alone.
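For readers who want to set up a comparable comparison, the sketch below maps each of the four model names to the corresponding MrBayes `lset` settings (`nst=6` selects GTR; `rates` selects the among-site rate model). The helper name `lset_command` is hypothetical, and the text above does not list the exact commands, priors, or chain lengths used, so this should be read as one plausible configuration rather than the analysis as run.

```python
# Substitution-model settings for a MrBayes `lset` command.
# nst=6 selects the GTR rate matrix; `rates` selects the among-site rate model.
LSET_OPTIONS = {
    "GTR":     {"nst": 6, "rates": "equal"},     # constant rates among sites
    "GTR+I":   {"nst": 6, "rates": "propinv"},   # proportion of invariable sites
    "GTR+G":   {"nst": 6, "rates": "gamma"},     # gamma-distributed rates
    "GTR+I+G": {"nst": 6, "rates": "invgamma"},  # invariable sites plus gamma
}

def lset_command(model):
    """Return the `lset` line for one of the models compared in the text."""
    opts = LSET_OPTIONS[model]
    return f"lset nst={opts['nst']} rates={opts['rates']};"

for model in LSET_OPTIONS:
    print(f"{model:8s} {lset_command(model)}")
```

Each line would be placed in a MrBayes block ahead of the `mcmc` command, with the run settings chosen separately.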

I used the data matrix of Gruber et al. [

In order to more thoroughly explore this interaction, I constructed a simulation test using p4 [

In all 3 cases, I find that accounting for among-site rate variation with the GTR+I+G model in Bayesian inference (as implemented in MrBayes) recovers the expected relationships, despite the violation of the compositional homogeneity assumption. Assuming the “expected” relationships are actually the correct relationships (which seems reasonable for these datasets), the GTR+I+G model is robust to this assumption violation. This is not to say that the GTR+I+G model is the “correct” model for the data; on the contrary, the demonstrated heterogeneity violates the assumptions of the model. However, it does indicate that the model is robust to this violation, at least to the extent that it yields the accepted relationships. I have not considered the branch lengths, which are more difficult to assess. In conclusion, for these 3 datasets, it is not necessary to account for nonstationary evolution to recover the established relationships; it is necessary, instead, to account for among-site rate variation.

For the Gruber et al. dataset, the Bayesian analysis under models GTR+I+G, GTR+I, and GTR+G all converged on an identical topology with the expected clade grouping (Figure

Dataset of Gruber et al. [

Dataset of Gruber et al. [

For the dataset of Ho and Jermiin [

Dataset of Ho and Jermiin [

Panels: HKY, GTR, GTR+I+G.

According to the results of Mallatt et al. [

Dataset of Mallatt et al. [

Panels: GTR+G+I, GTR.

In all of these datasets, the authors demonstrated significant deviation in base frequency; however, the results of the Bayesian analysis indicate that nonstationarity is not the most consequential model violation in these datasets. Rather, using the GTR+I+G model in MrBayes to account for among-site rate variation is sufficient to recover the expected relationships. In fact, accounting for either invariable sites (+I) or gamma-distributed rate variation (+G) alone was sufficient to recover the correct topology (in the dataset of Ho and Jermiin, each increased the posterior support of the correct grouping). However, using both invariable sites and gamma-distributed rates consistently gave the best results (as reflected in higher posterior probabilities for “correct” clades). These two models account for positional rate variation in a similar manner [

In each case, there is compelling evidence for nonstationary evolution, but correct relationships are recovered with high support only under models that account for among-site rate variation. These examples appear to support the conclusion of Conant and Lewis [

The results of my simulations are consistent with the results from the empirical data. I find that the GTR+G model outperforms the GTR model only on the data with significant compositional heterogeneity (Figure

Simulation model and results. (a) Tree from which the sequences were simulated. The colored branches show the different composition vectors, which drove the simulated sequences toward a convergence in nucleotide composition. I ran this simulation with 3 different levels of bias; each level had 100 simulation replicates. (b) Histograms showing the distribution of Robinson-Foulds distances (RF distance) between the maximum likelihood tree and the true tree. The maximum likelihood tree was generated by PhyML for 100 simulated replicates for each of the 3 parameter settings (high, low, and no CNC). The difference between the models (GTR versus GTR+G) grows as the level of CNC increases (**: highly significant; *: marginally significant; NS: not significant). In the simulation with no CNC, the GTR model performs as well as the GTR+G model. (c) Boxplot showing the distribution of bias introduced by the different composition vectors, under each of the 3 simulation parameter settings.
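The Robinson-Foulds distance used in panel (b) counts the bipartitions (splits) that are present in one tree but not the other. The following minimal sketch computes it for unrooted trees written as nested tuples of taxon names; the tree encoding, function names, and example trees are hypothetical and are independent of the software used in the simulations.

```python
def leaf_set(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Return the nontrivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed anchor
    taxon, so the same split is encoded identically whatever the rooting.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only internal edges: both sides must hold at least 2 taxa.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return below

    visit(tree)
    return found

def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))

# Hypothetical six-taxon example: the estimate misplaces taxa C and F.
true_tree = ((("A", "B"), "C"), (("D", "E"), "F"))
estimate = ((("A", "B"), "F"), (("D", "E"), "C"))
print(rf_distance(true_tree, estimate))
```

A distance of 0 means the estimated topology matches the true tree; for fully resolved trees with n taxa the maximum is 2(n - 3).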

How could accounting for among-site rate variation contribute to correct inferences in heterogeneous data? If the model is able to assign the sites with convergent signal to the highest rate categories, these sites would be downweighted in the likelihood and contribute less to the tree inference. In my simulation study, there was a clear trend: the most convergent sites were those assigned to the highest rate category. The +G model improves the results by assigning these sites a higher substitution rate. In this way, accounting for among-site rate variation helps reduce the effect of compositional heterogeneity.
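To make the idea of assigning sites to rate categories concrete, the sketch below computes the posterior probability of each rate category for a single site in a toy two-sequence comparison under the Jukes-Cantor model. The category rates are hypothetical illustrative values (not quantiles of a fitted gamma distribution), the function names are assumptions, and this is not the full tree likelihood evaluated by MrBayes or PhyML. It only shows the mechanism: with equally weighted categories, the site likelihood is the average over categories, and a site's posterior weight shifts toward the categories that explain it best.

```python
import math

# Hypothetical relative rates for four equally weighted categories (mean 1.0).
# Illustrative values only, not quantiles of a fitted gamma distribution.
CATEGORY_RATES = [0.1, 0.5, 1.0, 2.4]

def jc69_pair_likelihood(same, rate, branch_length):
    """Likelihood of one aligned site in a two-sequence Jukes-Cantor comparison."""
    decay = math.exp(-4.0 * rate * branch_length / 3.0)
    p = 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay
    return 0.25 * p  # 0.25 is the stationary frequency of the first base

def category_posteriors(same, branch_length, rates=CATEGORY_RATES):
    """Posterior probability of each rate category for one site."""
    per_category = [jc69_pair_likelihood(same, r, branch_length) for r in rates]
    site_likelihood = sum(per_category) / len(rates)  # equal category weights
    return [lk / (len(rates) * site_likelihood) for lk in per_category]

for label, same in [("identical bases", True), ("different bases", False)]:
    post = category_posteriors(same, branch_length=0.2)
    print(label, [round(p, 3) for p in post])
```

In this toy setting, a site where the two sequences differ places most of its posterior weight on the fastest category, whereas a conserved site favors the slowest one.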

The importance of accounting for among-site rate variation shown here is not new; however, it has perhaps been overlooked as recent studies focus on other issues. There has been growing interest in nonstationary evolution and how it affects phylogenetic methods [

I thank the authors of the original studies for providing their data matrices. I also thank Hojun Song and Michael Whiting for their guidance and advice, and the BYU Department of Biology for computational resources.