Relationships between Kováts Retention Indices and Molecular Descriptors of 1-(2-Hydroxyphenyl)-3-Arylpropane-1,3-Diones

Experimental and theoretical retention indices are reported for a set of 20 beta-diketones. Quantitative structure-chromatographic retention relationship (QSRR) theory is employed, and six molecular descriptors are chosen to compute the fitting polynomials. Multiple regression analysis yields satisfactory results when equations in several variables are used instead of one-variable formulae. Average absolute deviations from the experimental results are rather low, which suggests that the present approach is suitable.

The purpose of this work is to extend the set of compounds, including other aryl and alkyl groups at the 3-position, as part of a more general study that evaluates the ability of the molecular-descriptor method to predict Kováts retention indices (RIs).
Many quantitative structure-chromatographic retention relationship (QSRR) studies have shown that chromatographic behavior can be predicted from molecular structure, which gives information about the different molecular properties that may participate in the interaction between molecules and the stationary phase in gas chromatography (GC) [30,31,32,33,34]. The RI concept, first proposed by E. Kováts [35], has turned out to be a very useful tool for the presentation and interpretation of chromatographic data. The main advantage of RIs is that they can be precalculated by numerous methods [36] and compared with experimental data from chromatographic and/or chromato-spectral methods of organic compound identification.
The theoretical calculation of RIs is very important for building GC databases [37] because far fewer experimentally measured RI values are available than standard mass spectra (MS) [38]. The key to realizing this possibility is the ability to obtain precise RI values, so the search for new methods that predict RIs with high accuracy and suitable data quality control is highly topical at present.
The aim of this work is to report new experimental values and theoretical calculations of Kováts RIs for a set of 20 1-(2-hydroxyphenyl)-3-arylpropane-1,3-diones, in order to test whether gas chromatographic data can be obtained in an efficient and reliable way.
The paper is organized as follows. The next section gives the experimental details, followed by the theoretical framework of this study. We then present the calculation procedure and the numerical data. Finally, we discuss the results, analyze some of their implications, and outline possible extensions.

EXPERIMENTAL DETAILS
The molecules chosen for the present study have the general form given in Fig. 1.
The complete set of 20 molecules is given in Table 1.
The synthesis of all 1-(2-hydroxyphenyl)-3-α- and -β-naphthylpropane-1,3-diones has been reported elsewhere [27,28]. The other compounds were obtained following the same preparation procedure. Retention times were obtained with a Hewlett-Packard 5890 Series II Plus gas chromatograph equipped with an HP 5MS fused-silica capillary column (5% cross-linked phenylmethylsilicone; 30 m length, 0.25 mm i.d., 0.25 µm film thickness) and coupled to an HP 5972A mass selective detector. Samples were dissolved in chloroform, and 2 µl of 0.01 M solutions were introduced into the injection port at 250°C (split mode) with N2 as carrier gas (constant pressure: 12.0 psi). The column oven was held isothermally at 250°C. To obtain the adjusted retention indices, the dead time was determined by injection of nitrogen gas.
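As an illustration of how an isothermal Kováts index follows from adjusted retention times, a minimal Python sketch is given here. It assumes the standard isothermal definition with two bracketing n-alkane reference peaks, which the text above does not describe explicitly; the function name, argument names, and retention times are hypothetical and are not measured values from this work.

```python
import math


def kovats_isothermal(t_r, t_n, t_n1, n_carbons, t_dead):
    """Isothermal Kovats index from raw retention times (all in the same units).

    t_r       -- retention time of the analyte
    t_n, t_n1 -- retention times of the n-alkanes with n and n + 1 carbons
                 that bracket the analyte (assumed reference standards)
    n_carbons -- carbon number n of the earlier-eluting alkane
    t_dead    -- dead (hold-up) time of the column
    """
    # Adjusted retention times: subtract the dead time from each raw time.
    adj_x = t_r - t_dead
    adj_n = t_n - t_dead
    adj_n1 = t_n1 - t_dead
    if not (0.0 < adj_n <= adj_x <= adj_n1) or adj_n == adj_n1:
        raise ValueError("analyte must elute between the two reference alkanes")
    # Logarithmic interpolation between the two bracketing alkanes.
    frac = (math.log10(adj_x) - math.log10(adj_n)) / (
        math.log10(adj_n1) - math.log10(adj_n)
    )
    return 100.0 * (n_carbons + frac)


# Hypothetical times in minutes, chosen only to exercise the formula.
ri = kovats_isothermal(t_r=5.0, t_n=3.0, t_n1=9.0, n_carbons=20, t_dead=1.0)
```

By construction, an analyte that co-elutes with the n-carbon alkane receives the index 100n, and the logarithmic interpolation reflects the roughly linear dependence of log adjusted retention time on carbon number under isothermal conditions.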

THEORETICAL FRAMEWORK
QSRR are among the most extensively studied expressions of linear free-energy relationships (LFER). They are statistically derived relationships between the structure of solutes and their chromatographic retention [36]. Within QSRR, the chromatographic column can be considered a "free-energy transducer" that translates differences in the chemical potentials of solutes, arising from differences in their structures, into chromatographic RIs. If statistically significant QSRR are derived, and if these equations approximate the experimental retention data for a structurally representative set of model solutes, it is possible to identify the dominant factors that determine the interactions of solute molecules with the chemical entities forming the chromatographic system [39]. QSRR analysis can therefore provide insight into the molecular mechanism of chromatographic retention in a given chromatographic system [40]. One can distinguish two central approaches to QSRR analysis. The first employs, as independent variables in the QSRR equations, structural descriptors provided solely by computational chemistry. With good QSRR equations based on such descriptors, one can predict retention for any given structural formula. It is also possible to assign physical meaning to the more commonly used theoretical descriptors, which in turn facilitates interpretation of the separation mechanism operating in a given chromatographic system [41].
The second approach to QSRR employs LFER-based empirical solute parameters derived from spectroscopic, complexation, and dissolution scales.
In this work we have adopted the first approach, choosing a set of well-known molecular descriptors to predict RIs. Quantum-chemical methods and molecular modeling techniques enable the definition of a large number of molecular and local quantities characterizing the reactivity, shape, and binding properties of complete molecules as well as of molecular fragments and substituents. Because of the large, well-defined physical information content encoded in many theoretical descriptors, their use in the design of a training set in QSRR studies presents two main advantages: (1) the compounds and their various fragments and substituents can be characterized on the basis of their molecular structure only, and (2) the proposed mechanism of action can be directly accounted for in terms of the chemical reactivity of the compounds under study [42]. Consequently, the derived QSRR model will include information regarding the nature of the intermolecular forces involved in determining the physical property of the compounds in question.
The molecular descriptors chosen for computing RIs of the beta-diketones through QSRR relate as directly as possible to the key physical chemistry property studied here. They are the van der Waals surface-bounded molecular volume (V), the logarithm of the octanol-water partition coefficient (log p), the molecular polarizability (α), the solvent-accessible surface-bounded molecular volume (SAG), the molar refractivity (RM), and the molecular mass (M). Katritzky et al. [43] pointed out that charged partial surface area descriptors have been successfully combined with topological and geometrical descriptors to predict GC RIs of substituted pyrazines, polycyclic aromatic compounds, stimulants, and narcotic and anabolic steroids. The satisfactory results found in this and other similar studies encouraged us to continue using such descriptors within the realm of QSRR to analyze this GC system.

CALCULATION METHOD
Log p was calculated using atomic parameters derived by Ghose and coworkers [44]. Molar refractivity was computed by the same method, using the atomic contributions to refractivity presented by Ghose and Crippen [45]. The SAG and V calculations are based on a grid method derived by Bodor and coworkers [46], using the atomic radii of Gavezzotti [47]. The polarizability was estimated from an additivity scheme given by Miller [48], in which different increments are associated with different atom types. The six quantum-chemical descriptors were computed with the ChemPlus software [49], and the calculations were run on a 1-GHz Pentium PC. We carried out a complete regression analysis using linear, quadratic, and cubic relationships in several variables; these calculations were performed with the Mathematica software [50,51].
When predicting physical chemistry properties, it is important to make full use of the intrinsic advantages of the regression formulae. In fact, we previously verified that these relationships can be improved satisfactorily simply by employing higher-order fitting polynomials and by choosing noninteger powers for the independent variables [52,53,54,55,56]. Nonlinear models may be fitted to data sets by including functions of physical chemistry parameters in a linear regression model or by using nonlinear fitting models.
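The idea of including functions of a parameter in a linear regression model can be sketched as follows. This is only an illustration, assuming ordinary least squares with NumPy rather than the Mathematica routines used in this work; the function name, the chosen powers, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_power_terms(x, y, powers):
    """Least-squares fit of y = c0 + sum_k c_k * x**p_k.

    The model is nonlinear in x but linear in the coefficients, so ordinary
    linear least squares applies. Noninteger powers require x > 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones plus one column per power of x.
    design = np.column_stack([np.ones_like(x)] + [x**p for p in powers])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic, noise-free data (not from Table 2): y = 2 + 3*sqrt(x) - 0.5*x.
x = np.linspace(1.0, 10.0, 20)
y = 2.0 + 3.0 * x**0.5 - 0.5 * x
coef = fit_power_terms(x, y, powers=(0.5, 1.0))
```

Because the coefficients enter linearly, a noninteger power only changes a column of the design matrix; no iterative nonlinear optimizer is needed.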
Construction of linear regression models containing nonlinear terms is most often prompted when the data are clearly not well fitted by a linear model. A very common example in the field of QSRR involves nonlinear relationships with the hydrophobic descriptors, such as log p [57].
The nonlinear dependence of molecular properties on this parameter became apparent early in the development of the QSRR model, and a first approach to this problem involved fitting a parabola in log p [58]. Whatever the cause of such relationships, it is clear that nonlinear functions are required to model the physical chemistry data. An interesting feature of nonlinear functions is that they make it possible to calculate an optimum value for the physical chemistry property under consideration [59,60].
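For a parabolic model y = c0 + c1 x + c2 x^2, the optimum lies at the vertex x* = -c1/(2 c2). The following minimal sketch, assuming a NumPy least-squares fit and purely synthetic data, shows how such an optimum could be located; the function name is hypothetical.

```python
import numpy as np


def parabola_optimum(x, y):
    """Fit y = c0 + c1*x + c2*x**2 and return the vertex (x_opt, y_opt)."""
    # np.polyfit returns coefficients from the highest degree downwards.
    c2, c1, c0 = np.polyfit(np.asarray(x, float), np.asarray(y, float), 2)
    if abs(c2) < 1e-12:
        raise ValueError("no curvature: the fitted model has no optimum")
    x_opt = -c1 / (2.0 * c2)  # where the first derivative vanishes
    y_opt = c0 + c1 * x_opt + c2 * x_opt**2
    return x_opt, y_opt


# Synthetic data (not from this work): y = 5 + 4x - x^2, maximum at x = 2.
x = np.linspace(0.0, 4.0, 9)
y = 5.0 + 4.0 * x - x**2
x_opt, y_opt = parabola_optimum(x, y)
```

The vertex is a maximum when c2 < 0 and a minimum when c2 > 0, and it is meaningful only if it falls within the range of descriptor values covered by the data.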

RESULTS AND DISCUSSION
The results for the calculation of the 6 molecular descriptors corresponding to the 20 molecules given in Table 1 are displayed in Table 2.
The first set of fitting equations predicts RIs from a single independent variable. The results are not really significant, as can be seen in Table 3, where we present the correlation coefficients for the linear, quadratic, and cubic equations, respectively.
The correlation coefficients are rather poor, and in some cases (e.g., log p) they are definitely unacceptable. Higher-order one-variable fitting polynomials do not markedly improve on the results of the linear equations, except for the variables α and M.
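A comparison of linear, quadratic, and cubic one-variable fits of the kind summarized in Table 3 could be set up along the following lines. This is a sketch under stated assumptions: r is taken as the correlation between observed and fitted values, the data are synthetic, and the function name is hypothetical.

```python
import numpy as np


def one_variable_fits(x, y, degrees=(1, 2, 3)):
    """Correlation coefficient r between y and its polynomial fit, per degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for degree in degrees:
        coef = np.polyfit(x, y, degree)   # least-squares polynomial fit
        fitted = np.polyval(coef, x)      # model values at the data points
        results[degree] = float(np.corrcoef(y, fitted)[0, 1])
    return results


# Synthetic descriptor and response (not from Tables 2 or 3): y = x^3.
x = np.linspace(0.0, 3.0, 20)
y = x**3
r = one_variable_fits(x, y)
```

Since each polynomial contains the lower-degree ones as special cases, r cannot decrease with degree on the fitting set; a small gain from the quadratic or cubic terms therefore indicates that the descriptor itself, not the functional form, limits the fit.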
When equations in several variables are used, the results are quite satisfactory. In fact, as can be seen in the following typical equations, the statistical data are acceptable. Here we report just some representative equations; complete results are available upon request to E.A.C. at the above address.
where n is the number of molecules, r is the correlation coefficient, and EV is the estimated variance.
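The statistics n, r, and EV for an equation in several variables can be obtained as in the minimal sketch below. It assumes that r is the multiple correlation coefficient and that EV is the residual sum of squares divided by the residual degrees of freedom; the source does not state these definitions explicitly. The descriptor values and coefficients are synthetic and noise-free.

```python
import numpy as np


def multiple_regression_stats(X, y):
    """Ordinary least squares with intercept; returns (coef, n, r, ev)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    design = np.column_stack([np.ones(n), X])        # intercept + descriptors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = float(resid @ resid)                       # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())         # total sum of squares
    r = float(np.sqrt(max(0.0, 1.0 - sse / sst)))    # multiple correlation
    ev = sse / (n - design.shape[1])                 # assumed definition of EV
    return coef, n, r, ev


# Synthetic example: 20 "molecules", two hypothetical descriptors.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))
y = 1000.0 + 50.0 * X[:, 0] + 200.0 * X[:, 1]
coef, n, r, ev = multiple_regression_stats(X, y)
```

With only 20 molecules, each additional term consumes a residual degree of freedom, so EV is a more conservative guide than r when comparing equations with different numbers of terms.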
In Table 4 we display some theoretical results obtained with equations in several variables, together with the experimentally determined RIs. The statistical results associated with Eqs. 1 through 6 and the agreement between theoretical and experimental results show that satisfactory predictions can be obtained with fitting polynomials in several variables. Once again we verify that higher-order polynomials give quite satisfactory results and thus offer a valid way to take full advantage of the fitting equations. The low average absolute deviations from experimental values derived from the above predictions are particularly noteworthy. In fact, they represent around 2 and 1.2%, respectively, of the average experimental RIs, which are rather small overall variations. Moreover, there are no "deviant" behaviors within the chosen molecular set, so the present molecular descriptors seem suitable for predicting RIs.
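The average absolute deviation and its size relative to the mean experimental RI can be computed as in the following sketch. The RI values are placeholders chosen for illustration only; they are not entries from Table 4, and the function name is hypothetical.

```python
import numpy as np


def average_absolute_deviation(ri_exp, ri_calc):
    """Return (AAD, AAD as a percentage of the mean experimental RI)."""
    ri_exp = np.asarray(ri_exp, dtype=float)
    ri_calc = np.asarray(ri_calc, dtype=float)
    aad = float(np.mean(np.abs(ri_calc - ri_exp)))
    percent = 100.0 * aad / float(np.mean(ri_exp))
    return aad, percent


# Placeholder values for illustration only.
ri_exp = [2000.0, 2100.0, 2200.0]
ri_calc = [2020.0, 2090.0, 2230.0]
aad, percent = average_absolute_deviation(ri_exp, ri_calc)
```

Expressing the deviation relative to the mean experimental RI makes results from different equations directly comparable.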
Finally, Table 5 presents the correlation matrix of the chosen molecular descriptors, which shows the degree of collinearity among them.
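A descriptor correlation matrix of this kind can be generated, and strongly collinear pairs flagged, as in the sketch below. The threshold of 0.9 and the descriptor values are assumptions made for illustration; they are not taken from Table 2 or Table 5.

```python
import numpy as np


def collinear_pairs(descriptors, names, threshold=0.9):
    """Pearson correlation matrix plus the pairs with |r| >= threshold.

    descriptors -- 2-D array, one row per molecule, one column per descriptor
    names       -- descriptor labels, in column order
    """
    corr = np.corrcoef(np.asarray(descriptors, dtype=float), rowvar=False)
    pairs = [
        (names[i], names[j], float(corr[i, j]))
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if abs(corr[i, j]) >= threshold
    ]
    return corr, pairs


# Hypothetical values: V is built as a linear function of M, log p is not.
M = np.array([100.0, 120.0, 140.0, 160.0, 180.0])
V = 2.0 * M + 1.0
log_p = np.array([1.0, 3.0, 2.0, 3.5, 1.5])
corr, pairs = collinear_pairs(np.column_stack([M, V, log_p]), ["M", "V", "log p"])
```

Pairs flagged in this way carry largely redundant information, so including both in one equation tends to destabilize the fitted coefficients without improving the prediction.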

CONCLUSIONS
Multiple regression analysis, one of the most widely used data reduction techniques, often gives valuable insights into structure-property relationships. However, a direct interpretation of the results emerging from such analysis is usually rather difficult. It is generally understood that QSRR correlations at best suggest a parallel between the quantities involved (evaluators and responses) and do not necessarily reflect a cause-effect relationship [61]. The physical chemistry property studied in this work via the aforesaid quantum-mechanical descriptors (M, SAG, V, log p, RM, and α) depends on the structure in general terms and also on more subtle quantities, some of which are directly related to these descriptors. This study is a first attempt to thoroughly analyze the influence of the intimate molecular structure on the RIs.
We have presented experimental GC and theoretical RIs for a set of 20 1-(2-hydroxyphenyl)-3-arylpropane-1,3-diones. The calculations were made within the framework of QSRR theory, using six molecular descriptors closely associated with the molecular structure. The resulting fitting equations are quite satisfactory when polynomials in several variables and higher-order formulae are used. These results are in line with others previously determined for a set of organic bromo and nitrile derivatives, which suggests that the present molecular descriptors are suitable for computing RIs.