Description of Weak Halogen Bonding Using Various Levels of Symmetry-Adapted Perturbation Theory Combined with Effective Core Potentials

The present work starts with providing a description of the halogen bonding (XB) interaction between the halogen atom of MH3X (where M = C–Pb and X = I, At) and the N atom of HCN. This interaction leads to the formation of stable yet very weakly bound MH3X⋅ ⋅ ⋅NCH complexes for which the interaction energy (Eint) between MH3X and HCN is calculated using various symmetryadapted perturbation theory (SAPT) methods combined with the def2-QZVPP basis set and midbond functions. This basis set assigns effective core potentials (ECPs) not only to the I or At atom directly participating in the XB interaction with HCN but also to the M atom when substituted with Sn or Pb. Twelve SAPT methods (or levels) are taken into consideration. According to the SAPT analysis of Eint, the XB interaction in the complexes showsmixed electrostatic-dispersion nature. Next, the accuracy of SAPT Eint is evaluated by comparing with CCSD(T) reference data. This comparison reveals that high-order SAPT2+(3)method and the much less computationally demanding SAPT(DFT) method perform very well in describing Eint of the complexes. However, the accuracy of these methods decreases dramatically if they are combined with the so-called Hartree-Fock correction.


Introduction
Weak intermolecular interactions play an important role in establishing the structure and stability of a broad range of chemical systems, from simple molecular complexes to macromolecular assemblies [1][2][3].Computational methods based on electronic structure calculations are useful in determining the magnitude of weak intermolecular interactions as well as in understanding their nature [4][5][6].However, the ability to describe weak intermolecular interactions accurately and efficiently differs significantly between various families of computational electronic structure methods.Among these families, the symmetry-adapted perturbation theory (SAPT) [7][8][9][10][11] is capable of providing weak interaction energies accurate even up to 2-4% for benchmark systems of rare-gas dimers [9] but the computational cost of the corresponding calculations in principle scales like  7 with the system size.This makes such calculations very time-consuming, especially if they are carried out for large chemical systems.The high computational cost of SAPT calculations is associated, to a great extent, with the inclusion of intramolecular electron correlation effects.It is particularly true for the traditional formulation of SAPT which uses many-body electron correlation wave functions for molecules forming a complex [7].An alternative approach to handling the intramolecular electron correlation within SAPT is to utilize the density functional theory (DFT) for the molecular wave functions [12][13][14][15][16][17].The resulting SAPT(DFT) formulation scales better than the traditional formulation, and, therefore, the former can be applied to systems with hundreds of atoms [18].In order to adjust the computational demands of SAPT calculations to the size of a given system, various methods (or levels) have been introduced for the traditional SAPT formulation [10,19].In general, these levels differ in their scaling behavior through the omission of some energy correction terms from the SAPT expansion of intermolecular interaction energy.
Weak intermolecular interactions sometimes occur between molecular sites, of which one contains an atom of element with high nuclear charge.Halogen bonding (XB) 2 Journal of Chemistry [20] involving iodine constitutes a good example of such interactions (historically speaking, the XB formation in H 3 N⋅ ⋅ ⋅ I 2 was probably the very first observation of XB interaction [21]).From the computational point of view, describing weak intermolecular interactions involving heavier elements is a real challenge for electronic structure methods for two main reasons.Firstly, the large number of electrons in atoms of heavier elements makes electronic structure calculations very costly.Secondly, the incorporation of relativistic effects in the calculations is essential to obtain accurate energies of weak interactions involving heavier elements.These difficulties can be largely overcome by the use of effective core potentials (ECPs) for atoms of heavier elements (usually with the nuclear charge  > 36) [22].Within the ECP approximation the explicit treatment of core electrons of heavier atoms is replaced by introducing potential operators fitted to the results of relativistic allelectron calculations for such atoms.Therefore, the ECP approximation is able to implicitly account for leading relativistic effects at low computational cost.The use of ECPs has also been implemented for SAPT calculations of interaction energies [23].
In this work a systematic examination of the accuracy of a combination of SAPT and ECP for the description of weak XB is provided relative to the level of SAPT.The accuracy of a variety of SAPT methods is established across a series of 10 heterodimers containing MH 3 X (where M = C-Pb and X = I, At) complexed with HCN.These complexes exemplify the X⋅ ⋅ ⋅ N type of XB in which the halogen atom possesses a very high nuclear charge (53 and 85 for iodine and astatine, resp.).Thus, the ECP treatment of I-and At-atom core electrons is employed here for the SAPT description of the XB interaction in the MH 3 X⋅ ⋅ ⋅ NCH complexes.

Methods
The geometries of the MH 3 X⋅ ⋅ ⋅ NCH complexes have been optimized at the MP2/def2-QZVPP level of theory [24][25][26][27].These geometries are characterized by the absence of imaginary vibrational frequencies, and, therefore, they correspond to minima on the potential energy surfaces of the complexes.These geometries will also be kept fixed in all subsequent calculations of the interaction energy ( int ) between the MH 3 X and HCN parts of the complexes.The MP2 method has been combined with the resolution-ofthe-identity approximation [28].The def2-QZVPP basis set makes use of Stuttgart relativistic ECPs [26] to describe the core electrons of Sn, Pb, I, and At.These calculations have been carried out with the TURBOMOLE 6.6 program [29].
The term  HF int in (2) denotes the supermolecular Hartree-Fock interaction energy in the complex for which the SAPT interaction energy is calculated.The HF correction is a way to capture some higher-order terms not explicitly evaluated by SAPT.All SAPT results have been produced using the dimer-centered plus basis set (DC + BS) scheme, with the def2-QZVPP basis set extended with a set of midbond (mb) functions [33].The resulting basis set will be denoted by def2-QZVPP+mb further in the text.The set of mb functions is of the 3s3p2d2f form, with the exponents 0.9, 0.3, and 0.1 for  and p and 0.6 and 0.2 for  and .These functions are centered midway between the X and N atoms of the complexes.All SAPT calculations have been done using the SAPT 2012.2 [34] program interfaced with DALTON 2.0 [35].The SAPT(DFT) calculations have additionally been carried out with the SAPT(DFT) implementation available in the MOLPRO 2012.1 program [36], but these calculations do not end with successful reproduction of the results obtained with SAPT 2012.2.
Apart from the SAPT description of the XB interaction in MH 3 X⋅ ⋅ ⋅ NCH, additional supermolecular estimations of  int are provided in this work for comparison purposes.A number of popular wave function theory methods, mainly those belonging to the family of Møller-Plesset methods [24], will be taken into account.MP2 [37], SCS-MP2 [38], SCS(MI)-MP2 [39], MP2C [40], MP2.5 [41], MP3 [37], MP4(SDQ) [37], MP4 [37], and CCSD [42] are selected.The resolution-of-the-identity approximation is used wherever possible.The aforementioned methods are combined with the def2-QZVPP+mb basis set in order to calculate the magnitude of the XB interaction in the MH 3 X⋅ ⋅ ⋅ NCH complexes.The resulting  int values are corrected for the basis set superposition error by utilizing a counterpoise correction [43].All the supermolecular calculations of  int have been performed using TURBOMOLE 6.6.
The accuracy of SAPT methods is evaluated by comparing their  int values calculated for MH 3 X⋅ ⋅ ⋅ NCH with the corresponding reference data.The reference values of  int have been obtained from the CCSD(T) method [42] combined with the def2-QZVPP+mb basis set.The CCSD(T)/def2-QZVPP+mb values of  int are corrected for the basis set superposition error [43].Additionally, the CCSD(T)  int values extrapolated to the complete basis set (CBS) limit [44][45][46][47] will also serve as reference values.The CCSD(T)/CBS level of theory is commonly perceived as the "gold standard" for predicting reliable energies of weak intermolecular interactions [48].This level of theory was previously used to provide benchmark  int values in a set of 40 complexes that were stabilized by a variety of weak intermolecular interactions in which halogens participated [49].Details of the CCSD(T)/CBS calculations performed for MH 3 X⋅ ⋅ ⋅ NCH are given in Supplementary Section 3.  is oriented precisely along the M-X bond axis (see Figure 1).The values of the distance between the X and N atoms ( X⋅⋅⋅N ) for the complexes in their optimized geometries are listed in Table 1.These values are rather large ( X⋅⋅⋅N > 3 Å) but in the case of the complexes with X = I they are still shorter than the sum of the I-and N-atom van der Waals radii.This sum is equal to 3.70 Å if the van der Waals radii proposed by Alvarez [50] are taken.The fulfillment of the geometrical criterion formulated as  I⋅⋅⋅N < 3.70 Å suggests the existence of XB in the MH 3 I⋅ ⋅ ⋅ NCH complexes.Such a geometrical criterion cannot be applied to the complexes with X = At because the van der Waals radius of astatine is unknown [50].

Results and Discussion
The structure of the MH 3 X⋅ ⋅ ⋅ NCH complexes results from the highly directional ability of the X atom, covalently bonded to the M atom, to attract the N-atom lone electron pair of HCN, acting as a nucleophilic species in this case.A widely accepted mechanism of XB formation postulates that XB originates from the interaction between two centers possessing opposite electrostatic potentials [51].For the MH 3 X⋅ ⋅ ⋅ NCH complexes the center exhibiting a positive electrostatic potential is located on the X atom in the region on the extension of the M-X bond (i.e., the -hole), whereas the negative electrostatic potential characterizes the region around the lone electron pair of the N atom.The maximal value of electrostatic potential on the molecular surface delineated by the electron density contour of 0.001 au ( ,max ) in the -hole region on the X atom is a convenient tool for quantifying the -hole [52].The calculated values of  ,max are presented in Table 1.An increase in  ,max values is observed if the X atom changes from I to At, and thus it becomes heavier and more polarizable [52].Moreover,  ,max values increase with the growing electron-attracting character of the MH 3 group [53].The electron-attracting character can be represented by the so-called group electronegativity [54] which is calculated for the MH 3 groups (for details, see Supplementary Section 4).The electronegativity of the MH 3 groups increases gradually while going up group 14 from M = Pb to M = C, and, therefore, the most positive  ,max values are achieved for CH 3 X in either of the sequences of the MH 3 X⋅ ⋅ ⋅ NCH complexes with the X atom kept fixed.
The CCSD(T)/CBS  int values shown in Table 1 are negative for all MH 3 X⋅ ⋅ ⋅ NCH complexes.This implies their energetic stabilization and this also constitutes an energetic criterion for XB occurrence in the complexes.The values of  int fall in a narrow range from −0.55 to −2.36 kcal/mol.The XB interaction in MH 3 X⋅ ⋅ ⋅ NCH can be classified as weak in view of the fact that the strongest XB interactions reported in Table 1: Distance between the X and N atoms ( X⋅⋅⋅N ; in Å), maximal value of electrostatic potential in the -hole region on the X atom ( ,max ; in kcal/mol), CCSD(T)/CBS and SAPT(DFT) interaction energies between MH 3 X and HCN ( int ; in kcal/mol), and SAPT(DFT) principal interaction energy components ( elst ,  ind ,  disp , and  exch ; in kcal/mol) for the investigated MH 3 X⋅ ⋅ ⋅ NCH complexes.The percentage of each attractive SAPT(DFT) component relative to the total attraction is given in parentheses.The XB interaction becomes weaker and weaker when the M atom traverses down group 14 for two sequences of the complexes with the X atom kept fixed and varying M. The clear trend in  int is only roughly followed by a trend in  X⋅⋅⋅N for these sequences.These trends do not correlate with each other ideally but the monotonous weakening of the XB interaction is often accompanied by the gradual elongation of  X⋅⋅⋅N .It is noteworthy that  ,max correlates well with  int for all MH 3 X⋅ ⋅ ⋅ NCH complexes: the greater  ,max is developed in the -hole region, the stronger the XB occurs.

Complex
In order to establish the nature of the XB interaction in MH 3 X⋅ ⋅ ⋅ NCH, the energy correction terms from the SAPT(DFT) expansion of  int have been grouped into four principal components corresponding to electrostatics, induction, dispersion, and exchange.Details of the grouping scheme can be found in Supplementary Section 5.The values of the electrostatic ( elst ), induction ( ind ), dispersion ( disp ), and exchange ( exch ) components of  int obtained from SAPT(DFT) are listed in Table 1.The most important aspect of these results is the fact that  elst and  disp play the central role in the attraction between MH 3 X and HCN.These two components are responsible for at least 88% of the total attraction occurring in the complexes.Thus, the XB interaction in MH 3 X⋅ ⋅ ⋅ NCH shows mixed electrostatic-dispersion nature.This finding is in agreement with the analogical conclusion drawn in an earlier work taking various types of XB into account [58].It is interesting to examine trends in the attractive components for the sequences of the complexes with the X atom kept fixed.For such sequences the percentage of electrostatics decreases (and simultaneously the percentage of dispersion increases) while descending group 14 from M = C to M = Pb.This leads to the conclusion that the role of dispersion in the stabilization of the complexes is most significant for the weakest XB interaction. ind component does not change monotonously if M descends group 14.Moreover, the percentage share of  ind in the total attraction between MH 3 X and HCN remains practically constant for all 10 complexes.

Assessment of Various SAPT Methods.
In order to evaluate the performance of various SAPT methods in describing the XB interaction in the MH 3 X⋅ ⋅ ⋅ NCH complexes,  int values predicted by these methods are compared with CCSD(T) reference results in a statistical manner.This statistical comparison is based on three metrics of errors in the SAPT  int values.The root mean-squared error (RMSE), the percentage relative root mean-squared error (rRMSE (%)), and the mean signed error (MSE) are calculated using the following formulas: where  is the number of the complexes studied (M = 10).RMSE is used to characterize the overall performance of each SAPT method in predicting  int .Because  int values are small themselves and the same applies to the RMSE values, then rRMSE (%) should more conveniently and reliably illustrate the accuracy of the SAPT  int values relative to the corresponding CCSD(T) results.Finally, MSE provides information about systematic errors occurring in  int values obtained from each SAPT method.If MSE is negative (positive) for a given SAPT method, then this method tends to overestimate (underestimate) the strength of the XB interaction in the MH 3 X⋅ ⋅ ⋅ NCH complexes.The evaluation of SAPT accuracy has been performed for 12 SAPT methods.Table 2 lists these methods, together with the three mean errors determined for their  int values.Based on the RMSE values one can think that SAPT generally performs very well because RMSE is less than 1 kcal/mol for all methods.This is not necessarily true if rRMSE (%) is analyzed as well.rRMSE (%) is high for several SAPT methods, which indicates that the RMSE values are in fact quite large when compared to both the CCSD(T)/def2-QZVPP+mb and CCSD(T)/CBS values.It is justified to rank SAPT methods' performances relative to the rRMSE (%) values.From the magnitude of rRMSE (%) one can conclude that the SAPT2+(3) method provides the best approximation of the CCSD(T) reference results.As with SAPT2+(3), the SAPT(DFT) method reproduces perfectly the CCSD(T)/def2-QZVPP+mb  int energies.A slightly larger rRMSE (%) value is observed for SAPT2+3 but this value still does not exceed 10%.The values of MSE for the three methods indicate that SAPT2+(3) tends to slightly overestimate the strength of the XB interaction in MH 3 X⋅ ⋅ ⋅ NCH, while SAPT(DFT) and SAPT2+3 show the opposite tendency (the mean underestimation of  int reaches a maximum of 0.14 kcal/mol for SAPT(DFT)).Unsurprisingly, two regular high-order SAPT methods SAPT2+(3) and SAPT2+3 yield  int with very good accuracy.It is rather puzzling, however, that SAPT2+3 performs slightly worse than SAPT2+(3) while the reverse ordering of these two methods could actually be expected.This means that the inclusion of all third-order interaction energy correction terms does not lead to any improvement in  int for the MH 3 X⋅ ⋅ ⋅ NCH complexes.The SAPT2+(3) accuracy superior to that of SAPT2+3 is mostly due to the fact that the third-order induction and exchangeinduction terms are poorly recovered by SAPT combined with ECPs [23] (see also Supplementary Section 2).
From the results in Table 2 the effect of the HF correction on  int can be established.This correction collects mainly higher-order induction and exchange-induction terms but it may also contain some spurious, unphysical terms.It is evident that the HF correction adversely affects the accuracies of all SAPT methods in describing the weak XB interaction in MH 3 X⋅ ⋅ ⋅ NCH.Six SAPT methods utilizing this correction (the so-called hybrid approach) show many times larger RMSE values than the same methods without  (2)  HF int or  (3)  HF int .As illustrated by the negative values of MSE, the majority of SAPT methods tend to predict an overbinding between MH 3 X and HCN.Adding the HF correction results in further overestimation of the XB strength in the complexes.A similar overestimation of I⋅ ⋅ ⋅ O XB strength was also detected in a previous SAPT study of the CH 3 I⋅ ⋅ ⋅ OCH 2 complex [57].
The deterioration of SAPT accuracy due to the incorporation of  (2)  HF int or  (3)  HF int into  int is likely caused by a spurious, unphysical part of the HF correction.It is known that spurious, unphysical effects contribute considerably to the HF correction if a molecular complex is bound primarily by dispersion [59].This seems to be the case for the MH 3 X⋅ ⋅ ⋅ NCH complexes for which the dispersion component of their  int is always significant or even dominant (the percentage of  disp ≥ 40%; see Table 1).The simplest SAPT method, that is, SAPT0, outperforms both SAPT2 and SAPT2+, although the latter methods incorporate more energy correction terms, and, therefore, they are computationally more demanding.This superior accuracy of SAPT0 seems to be due to a fairly good compensation of energy correction terms appearing in the SAPT0 expansion of  int .It has been demonstrated in another recent work [19] that SAPT0 yields in many cases not much worse interaction energies compared to more advanced SAPT methods which account for intramolecular electron correlation effects.The combination of SAPT0, SAPT2, or SAPT2+ with  (2)  HF int fails badly in describing the XB interaction in MH 3 X⋅ ⋅ ⋅ NCH; rRMSE (%) of the resulting hybrid levels exceeds 50% relative to the CCSD(T)/def2-QZVPP+mb reference results.
It is instructive to compare the performance of the SAPT methods with the performance of a series of popular wave function theory methods that compute  int within the supermolecular approach.Statistical metrics measuring the accuracy of supermolecular  int values obtained from 10 wave function theory methods against the CCSD(T) reference data are presented in Table 3.Of the 10 methods, SCS(MI)-MP2, MP2C, and MP2.5 perform best.They afford very small RMSE and rRMSE (%) and these errors are even slightly smaller than those made by SAPT2+(3) and SAPT2+3.What is particularly important is that the SCS(MI)-MP2 calculations entail much lower computational cost (MP2 formally scales as  5 with the system size).The good performance of SCS(MI)-MP2 has actually been expected because this method has been specially designed for weak intermolecular interactions.Similarly, the MP2C and MP2.5 methods were also developed for providing accurate interaction energies in various molecular complexes.For the MH 3 X⋅ ⋅ ⋅ NCH complexes, these modifications of the original MP2 method do show a great improvement in the reproduction of  int over their ancestor.The performances of MP4 and CCSD are rather disappointing and these advanced methods yield less accurate values of  int than the regular high-order SAPT and SAPT(DFT) methods.According to the HF method, the interaction between MH 3 X and HCN destabilizes the MH 3 X⋅ ⋅ ⋅ NCH complexes, which disqualifies this method as a computational tool for describing weak XB interactions reliably.

Conclusions
A description of the X⋅ ⋅ ⋅ N XB interaction in a series of 10 MH 3 X⋅ ⋅ ⋅ NCH complexes (where M = C-Pb and X = I, At) has been obtained from various SAPT methods combined with the def2-QZVPP basis set and midbond functions.This basis set assigns ECPs not only to the I or At atom directly participating in the XB interaction with HCN but also to the M atom when substituted with Sn or Pb.The accuracy of SAPT  int has been evaluated by comparing with CCSD(T) reference data.The main findings made in this work can be summarized as follows: (1) The CCSD(T)/CBS calculations of  int reveal that the complexes are stabilized by a weak X⋅ ⋅ ⋅ N XB interaction whose values fall in the range between −0.55 and −2.36 kcal/mol.Astatine forms stronger XB interaction with HCN than iodine, by no more than 1 kcal/mol in each pair of the complexes with a given M. A relationship between  int and the kind of group 14 element substituting the M atom has been identified for two sequences of the complexes with the X atom kept fixed.The strength of the XB interaction decreases gradually when the M atom varies down group 14 from C to Pb.This is rationalized in terms of  ,max and MH 3 -group electronegativity.
(2) Grouping the terms from the SAPT(DFT) expansion of  int into four principal components indicates the mixed electrostatic-dispersion nature of the XB interaction in MH 3 X⋅ ⋅ ⋅ NCH.The XB weakening caused by the substitution of the M and X atoms clearly affects the percentage shares of  elst and  disp .This weakening is associated with the decrease of  elst percentage share and the simultaneous increase of  disp percentage share.
(3) Regular high-order SAPT methods such as SAPT2+(3) and SAPT2+3 provide XB description very close to that obtained at the CCSD(T)/def2-QZVPP+mb and CCSD(T)/CBS levels of theory.The accuracy of  int obtained from the computationally favorable SAPT(DFT) method is comparable to that found for SAPT2+(3) interaction energies.The HF correction present in the hybrid approach leads to a significant deterioration in the accuracy of the SAPT description of the XB interaction in the MH 3 X⋅ ⋅ ⋅ NCH complexes.

3. 1 .
Fundamental Properties of the Complexes.The geometry optimization of the complexes with the assumed X⋅ ⋅ ⋅ N XB interaction leads to structures in which the HCN molecule

Figure 1 :
Figure 1: Structure of the investigated MH 3 X⋅ ⋅ ⋅ NCH complexes.The optimized geometry of CH 3 I⋅ ⋅ ⋅ NCH is shown as an example.The XB interaction is marked by a dotted line.

Table 2 :
Errors (RMSE, rRMSE (%), and MSE; in kcal/mol or %) in  int obtained from various SAPT methods.The errors without parentheses have been calculated relative to the CCSD(T)/def2-QZVPP+mb results, while those put in parentheses are based on the CCSD(T)/CBS reference values of  int .

Table 3 :
Errors (RMSE, rRMSE (%), and MSE; in kcal/mol or %) in  int obtained from various wave function theory methods within the supermolecular approach.The errors without parentheses have been calculated relative to the CCSD(T)/def2-QZVPP+mb results, while those put in parentheses are based on the CCSD(T)/CBS reference values of  int .