Structure of Low-Lying Excited States of Guanine in DNA and Solution: Combined Molecular Mechanics and High-Level Coupled Cluster Studies

High-level ab-initio equation-of-motion coupled-cluster methods with singles, doubles, and noniterative triples are used, in conjunction with the combined quantum mechanical molecular mechanics approach, to investigate the structure of low-lying excited states of the guanine base in DNA and solvated environments. Our results indicate that while the excitation energy of the first excited state is barely changed compared to its gas-phase counterpart, the excitation energy of the second excited state is blue-shifted by 0.24 eV.


INTRODUCTION
Despite their high numerical overhead, the high-level abinitio methods continue to provide strong evidence for a proper inclusion of the excited-state correlation effects in biomolecular systems. The methods such as the CASPT2 [1], EOMCCSD [2][3][4], CC2 [5], or CR-EOMCCSD(T) [6][7][8] slowly find their way into the area of large scale molecular simulations, which until recently have been reserved exclusively for less accurate ab-initio approaches. In our opinion there are several reasons for this important progress as follows. (1) The level of sophistication reached by current ab-initio methods provides a reliable description of complex electronic structures of excited states. (2) The use of massively parallel computers and scalable ab-initio codes such as NWChem [9] significantly reduces the time to solution.
(3) Multiscale approaches that integrate ab-initio methods with coarse-grained representations such as molecular mechanics enable a realistic description of large scale systems at finite temperature and pressure conditions. All these factors play an equally important role in providing a reliable understanding of the excited-state processes in biologically relevant molecular systems.
In our previous work [10], we have shown that integration of high-level ab-initio methods with molecular dynamics simulations can provide an efficient and accurate way to study the thermodynamics of excited states. The success of this methodology ultimately depends on the efficiency of the ab-initio treatment of the excited state surfaces. This is the subject addressed in this work. This article reports the results of our studies on the guanine in the native DNA environment and in aqueous solution. The photochemistry of this system, as well as that of other nucleic acids, plays a critical role in understanding the photostability of DNA; the process that prevents all living organism from possibly lethal mutations caused by the UV irradiation (for more details see [11][12][13][14][15][16][17]). First, let us focus on the highly correlated component of QM/MM formalism, which embraces not only the well-established completely renormalized equation-of-motion coupled cluster approach with singles, doubles, and noniterative triples EOM-CCSD(T) (CR-EOMCCSD(T)) approach [7], but also its recent modification: the reduced CR-EOMCCSD(T) approach (r-CR-EOMCCSD(T)) [18]. The r-CR-EOMCCSD(T) approaches put emphasis on the noniterative coupling of triply excited components of cluster/excitation operators with the equations for singles and doubles, which leads to the structural similarities of these corrections to the ground-state CCSD(T) approach [19]. At the same time, the reduced variants provide a lot of flexibility in approximating triply excited effects. The standard completely renormalized approach [7,8] can be derived from the excited-state extension of method of moments of coupled-cluster equations (MMCC) [6]. The form of the CR-EOMCCSD(T) correction to the equation-of-motion with singles and doubles (EOM-CCSD) energies takes a simple form where T CCSD = T 1 + T 2 is the CCSD cluster operator and R CCSD K = R K,0 + R K,1 + R K,2 is an EOMCCSD excitation operator. Usually, the trial wavefunction |Ψ K (3) is defined as where the T 1 (singly-) and T 2 (doubly-excited) cluster operators are obtained in the CC with singles and doubles calculations whereas components of excitation operator R CCSD K for the Kth state, R K,0 , R K,1 , and R K,2 , are obtained from the EOMCCSD calculations. The P and Q i (i = 1, 2, 3) operators refer to the projection operators onto the reference function |Φ and spaces spanned by i-tuply (i = 1, 2, 3) excited configurations, respectively. As always, triply excited moments of the EOMCCSD equations are denoted as M EOMCCSD 3 is an approximation to the exact EOMCC with singles, doubles, and triples (EOMCCSDT) R K,3 operator.
The reduced variant of the CR-EOMCCSD(T) correction can be derived form the MMCC-type functional: where T 3 is an approximation to the CCSDT triply excited cluster operator T 3 . The values of the Γ functional with |Ψ K ≈ |Ψ K (3) approximating the exact Kth state can be interpreted as a correction (δ K (2, 3)) to the EOMCCSD energies due to triples. Once the |Ψ K (3) is replaced by the exact wavefunction of the Kth state, this functional gives us a difference between the exact (E K ) and EOMCCSD (E EOMCCSD K ) energies. From now on the denominator on the right-hand side of (3) will be denoted as D K . The above expression can be decomposed into four components: where The χ (3) operator is defined as e T3 |Φ = (1 + χ (3) )|Φ . There are a few legitimate steps that can simplify the form of (4) as follows. (1) The Γ IV (Ψ K (3)) is simply equal zero. (2) Since the T CCSD and R CCSD K operators satisfy the CCSD/EOMCCSD equations, the Γ I (Ψ K (3)) term can be written as (3) Let us consider Γ I (Ψ K (3)) term and components of the Γ II (Ψ K (3)) and Γ III (Ψ K (3)) corresponding to the terms Adding all these terms (i.e., Γ I (Ψ K (3)), Γ II,T (Ψ K (3)), and Γ III,T (Ψ K (3))) leads to the expression whose numerator takes the form of triply excited part of the EOM- 2 , and perturbatively approximated R K,3 , projected on the triply excited component of Ψ K (3)|. In contrast to the standard excited-state MMCC, we will assume that with approximate cluster/excitation amplitudes these equations are equal to zero, which is equivalent to assuming that the leading effect of triples is accumulated in terms that involve triply excited amplitudes in the equations for singles and doubles. Thus the two surviving terms are In the following, we will use its simplified form obtained by retaining only 1 in the factors standing next to Ψ K (3)| in the numerators of (9) and (10), and by simplifying the form of H CCSD , The form of r-CR-EOMCCSD(T) shown in the above expression reveals structural similarities with the popular CCSD(T) correction for the ground state. It also shares the same N 7 scaling of the CCSD(T) approach. The CR-EOMCCSD(T) approaches should be considered as a formalism for calculating entire excited-state potential energy surfaces (PESs). Several benchmarks clearly show that the topology of the CR-EOMCCSD(T) excited-state PESs is superior with respect to the EOMCCSD results. In calculating the vertical excitation energies, we use the simplified approach in which the r-CR-EOMCCSD(T) correction is directly added to the EOMCCSD excitation energy (see [10,20]), which is equivalent to neglecting the ground-state correlation effects due to triples. In some situations, this procedure may lead to overcorrect the exact excitation energies, however, our benchmark calculations for the ozone and Cl 2 O molecule clearly indicate that the accuracies offered by some variants of the r-CR-EOMCCSD(T) approach are within 0.3 eV of error for doubly excited states and 0.1 eV of error for singly excited states with respect to the EOMCCSDT vertical excitation energies [18]. As we have already shown for the cytosine molecule [10], the effect of surrounding environment resulted in significant blue shifts of the two lowest lying valence excited states (ππ and nπ ), thereby increasing the energy gap between these states. Since the doubly excited component of the second excited state in the thermally averaged calculations turned out to be more pronounced than in the gas-phase calculations, the use of high-level CC methods accounting for triply excited configurations was imperative (for the original development of the combined CC/MM formalism see [21][22][23][24]) . For this reason, in this review we apply the CR-and r-CR-EOMCCSD(T) methodologies in the context of our combined QM/MM formalism [10] to another DNA base, the guanine molecule, whose valence excited states (nπ and ππ ) were the subject of several theoretical studies involving high-level approaches (see, e.g., [25][26][27] and references therein). The QM/MM Hamiltonian used in our calculations, is optimized including static charges, but the complete linear response function of Christiansen and coworkers [28] has not been used because the QM charge density response is not included. In the absence of this term, a simple way to increase the accuracy of this approach is to include the first solvation shell within the QM part of the calculation. While this is computationally expensive, it is likely to be as accurate, if not more so, than treating the first solvation shell using polarizable force fields with more terms in the response function. The QM/MM variant based on the CCSD charge densities is under intensive development [29].
The first system considered in this work (see Figure 1) consists of a 12-mer fragment of B-DNA (3 -TCGCGTTGCGCT-5 ) solvated in a rectangular box (51 × 51 × 69Å) of SPC/E [30] water. The charge was neutralized by adding 22 sodium ions resulting in a total system size of 18060 atoms. For the initial optimization and equilibration steps the entire system was treated at the molecular mechanics level utilizing the Amber-type force field representation. The equilibration protocol consisted of warming up in stages (50 K increments) over the course of 60 ps of the classical molecular dynamics simulation. In the next step, the entire system was optimized within the QM/MM representation of the system where the guanine base capped with a hydrogen link atom was described by density-functional theory (DFT). The DFT representation in this step was based on the B3LYP approximation for the exchange-correlation functional utilizing the cc-pVDZ basis set [31]. In addition to the solvated DNA fragment, we have also considered a system where guanine residue was embedded in water solution represented by 30 A cubic box of 878 SPC/E water molecules. The geometry of guanine residue was maintained at the same configuration as that obtained from DNA optimizations. The surrounding waters were equilibrated to room temperature and then optimized utilizing the same protocol used for a DNA simulation. The two structures formed the basis for our excited state calculations using time-dependent density-functional theory (TDDFT), EOMCCSD, CR-EOMCCSD(T), and r-CR-EOMCCSD(T) approaches. These calculations utilized the quantum representation of guanine base using cc-pVDZ basis set [31]. The gas-phase calculation was also performed by removing all the classical particles from the system. The results of these calculations are presented in Table 1. Regardless of the environment, the addition of the CR-or r-CR-EOMCCSD(T) corrections to EOMCCSD excitation energies results in a lowering of the EOMCCSD vertical excitation energies (VEE) by around 0.3 eV. In all cases studied in this paper, the discrepancies between CR-and r-CR-EOMCCSD(T) do not exceed 0.02 eV. While the TDDFT method for the ππ state is in good agreement with the CR-and r-CR-EOMCCSD(T) results for all environments, the TDDFT and CR-EOMCCSD(T) results for the nπ state in water solution differ by as much as 0.51 eV. This reflects the general fact that TDDFT gap between ππ and nπ is much smaller than that of CR-EOMCCSD(T) or r-CR-EOMCCSD(T) ones. An analogous effect was found for the cytosine molecule [10]. It is also interesting to analyze the changes in excitation energies caused by the surrounding environment. One can easily see that the excitation energy of the ππ is practically insensitive to the environment. For example, the gas-phase and aqueous solution VEE difference for the r-CR-EOMCCSD(T) methods is as small as 0.06 eV (analogous difference for DNA environment equals 0.01 eV). The situation is completely different for the nπ VEE, where the gas-phase r-CR-EOMCCSD(T) (CR-EOMCCSD(T)) VEE is blue-shifted by 0.24 (0.23) and 0.37 (0.37) eV in DNA environment and aqueous solution, respectively. These changes are smaller when TDDFT approach is employed (0.20 and 0.19 eV, resp.). The CR-and r-CR-EOMCCSD(T) blue-shifts for water solution are in good agreement with results reported by Shukla et al. [26] although different basis set was used.
In the future studies, we are planning to use bigger basis sets for the quantum region as well as to use the CCSD/EOMCCSD charge densities in order to describe the interaction of the quantum part with the solution more adequately. We are also planning to study the impact of the solution on the excited-state potential energies. In particular, the location of conical intersections is of paramount importance.