Journal of Chemistry (JCHEM), ISSN 2090-9071, 2090-9063
Hindawi Publishing Corporation
DOI: 10.1155/2016/1791756, Article ID 1791756
Research Article
Nanoquantitative Structure-Property Relationship Modeling on C_{42} Fullerene Isomers
Sorana D. Bolboacă^{1} and Lorentz Jäntschi^{2,3}
Academic Editor: Teik-Cheng Lim
^{1}Department of Medical Informatics and Biostatistics, Iuliu Haţieganu University of Medicine and Pharmacy, 6 Louis Pasteur Street, 400349 Cluj-Napoca, Romania (umfcluj.ro)
^{2}Department of Physics and Chemistry, Technical University of Cluj-Napoca, 103-105 Muncii Bulevardul, 400641 Cluj-Napoca, Romania (utcluj.ro)
^{3}Doctoral Studies-Chemistry, Babeş-Bolyai University, 11 Arany Janos Street, 400028 Cluj-Napoca, Romania (ubbcluj.ro)
This paper is dedicated to Professor Mircea V. Diudea on the occasion of his 65th birthday.
The interest of scientists in nanostructures has increased in recent years, and proper methods for their assessment are needed. In silico methods have proved useful as replacements for experimental evaluation and are successfully used as efficient alternatives for the estimation and prediction of the properties or activities of compounds. This paper shows that a Quantitative Structure-Property Relationship method can also be applied to nanostructures. Based on a computational experiment, several models describing the total strain energy of C_{42} fullerene isomers were obtained and their characteristics are presented. Furthermore, the best performing model obtained on C_{42} fullerene isomers was validated on C_{40} fullerene isomers.
1. Introduction
Since their discovery in 1985 [1], fullerenes have attracted interest in different fields of science, including the medical field (e.g., for potential use as antibiotics [2–4], as inhibitors of erythroid cells (fullerenol) [5], as drug delivery systems [6], or as inhibitors of inflammatory mediators [7]). Fullerene molecules are constructed from carbon atoms and take the shape of a sphere (also known as buckyballs), an ellipsoid, or a tube [8]. The first spherical fullerene, C_{60}, was discovered in 1985 [1]. Fullerenes differ in their properties and in the number of associated isomers (Table 1) [9]. The smallest fullerene (C_{28}) was stabilized by metal encapsulation (with Ti, Zr, and U) by Dunk et al. [10]. Chen et al. showed that the C_{32} fullerene has stronger aromaticity than C_{30} and C_{34} [11]. Fifteen distinct isomers with different energies were reported by Manna and Ghanty, who encapsulated U into various C_{36} cages [12]. Muhammad et al. showed that C_{20} is a closed-shell fullerene and that fullerenes C_{26} and C_{30} are pure open-shell compounds, whereas C_{36}, C_{40}, and C_{42} are intermediate open-shell compounds [13].
Table 1: Several small fullerenes and their number of isomers.

| Number | Fullerene | Number of isomers |
|---|---|---|
| 1 | C_{28} | 2 |
| 2 | C_{30} | 3 |
| 3 | C_{32} | 6 |
| 4 | C_{34} | 6 |
| 5 | C_{36} | 15 |
| 6 | C_{38} | 17 |
| 7 | C_{40} | 40 |
| 8 | C_{42} | 45 |
| 9 | C_{44} | 89 |
| 10 | C_{46} | 116 |

Source: http://www.nanotube.msu.edu/fullerene/fullerene-isomers.html [accessed June 7, 2015].
The C_{42} fullerenes are small, not necessarily spherical cages with a high pentagon/hexagon ratio [14]. Together with C_{60}, fullerene C_{42} showed the highest values of the main peak in Matrix-Assisted Laser Desorption Ionization Time-of-Flight (MALDI-TOF) mass spectrometric measurements [15].
Some activities of fullerenes have been modeled using quantitative structure-activity relationship (QSAR) approaches (such as anti-HIV protease inhibition activity [16], antiviral activity [17], and drug delivery systems [18]). However, C_{60} has received most of the attention, while other fullerenes have been neglected with regard to QSAR/QSPR (Quantitative Structure-Property Relationship) modeling. The aim of our research was to model the total strain energy of the C_{42} fullerene isomers using structural information.
2. Materials and Methods
All C_{42} fullerene isomers were included in the analysis. Data related to continuum elasticity, expressed as total strain energy (TSE, in eV), and the structures of the C_{42} fullerene isomers as *.xyz files were taken from [19] (Table 2).
Table 2: C_{42} fullerene isomers: identification number (IsoID) and total strain energy (TSE).

| IsoID | TSE (eV) |
|---|---|
| #01 | 31.060 |
| #02 | 30.537 |
| #03 | 29.791 |
| #04 | 29.805 |
| #05 | 30.618 |
| #06 | 29.850 |
| #07 | 30.608 |
| #08 | 29.782 |
| #09 | 28.527 |
| #10 | 29.393 |
| #11 | 29.475 |
| #12 | 28.340 |
| #13 | 28.157 |
| #14 | 27.147 |
| #15 | 29.955 |
| #16 | 28.175 |
| #17 | 28.276 |
| #18 | 29.474 |
| #19 | 27.408 |
| #20 | 28.175 |
| #21 | 27.283 |
| #22 | 29.140 |
| #23 | 28.765 |
| #24 | 27.743 |
| #25 | 27.487 |
| #26 | 28.353 |
| #27 | 28.014 |
| #28 | 29.051 |
| #29 | 27.489 |
| #30 | 28.972 |
| #31 | 27.484 |
| #32 | 26.657 |
| #33 | 26.639 |
| #34 | 27.371 |
| #35 | 26.554 |
| #36 | 27.973 |
| #37 | 29.764 |
| #38 | 31.101 |
| #39 | 26.639 |
| #40 | 27.501 |
| #41 | 26.672 |
| #42 | 28.665 |
| #43 | 28.284 |
| #44 | 26.737 |
| #45 | 25.661 |
The analysis was conducted on the downloaded files of the C_{42} isomers without any modification of the available geometry. According to [19], the fullerene geometries were based on the structures in Yoshida’s Fullerene Library (UNIX files) and reoptimized using a Dreiding-like force field [20]. The geometry obtained in this way is used here.
The steps applied in the analysis are depicted in Scheme 1.
Scheme 1: Flowchart of the applied methods. The pool of filtered SMPI (Szeged Matrix Property Indices) descriptors contains those descriptors with absolute values between 10^{−7} and 10^{7}.
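The filtering rule quoted in the caption can be sketched as follows. This is only an illustration under stated assumptions: the bounds are treated as inclusive, a descriptor is kept only if all of its values across the isomers fall inside the range, and the descriptor names and values (`DESC_A`, `DESC_B`, `DESC_C`) are hypothetical, not taken from the SMPI pool.

```python
from typing import Dict, List

LOWER, UPPER = 1e-7, 1e7  # bounds quoted in the Scheme 1 caption


def filter_descriptors(pool: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep descriptors whose absolute values all lie within [LOWER, UPPER].

    Assumption: inclusive bounds, applied to every value of a descriptor.
    """
    return {
        name: values
        for name, values in pool.items()
        if all(LOWER <= abs(v) <= UPPER for v in values)
    }


# Hypothetical descriptor names and values, for illustration only.
pool = {
    "DESC_A": [612.4, 615.1, 609.8],  # every value inside the range
    "DESC_B": [0.0, 3.2, 1.1],        # contains 0.0, below the lower bound
    "DESC_C": [2.5e8, 1.0e3, 4.0e2],  # contains 2.5e8, above the upper bound
}
kept = filter_descriptors(pool)
print(sorted(kept))
```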
In the first step of the analysis, the downloaded files were converted into *.mol files with Spartan software (https://www.wavefun.com/products/spartan.html). In the second step, the *.mol files were converted into *.hin files using Babel software (http://openbabel.org). In the third step, the partial charges were calculated with HyperChem software (http://www.hyper.com/) by applying PM3 (Parameterized Model number 3 [21]) single point (energy) semiempirical calculations. In the fourth step, the structural features of the investigated nanoclass of compounds were extracted using the unsymmetrical Szeged set, an extension of the corresponding Szeged Matrix [22]. In the fifth step, the calculated values of the structural descriptors and the collected values of total strain energy were included in nano-QSPR modeling, and the models with the highest goodness-of-fit (defined as the highest correlation coefficients) were analyzed and validated in leave-one-out and leave-many-out analyses [23, 24].
A model is considered valid in leave-one-out analysis if the determination coefficient (Q^{2}) is higher than 0.5. Leave-many-out analysis was conducted for the models with the highest estimation abilities, expressed as the highest value of the correlation coefficient. The set was split into training and test sets using a simple random technique [25], with 2/3 of the compounds in the training set. The models obtained on the training sets were used to predict the TSE in the test sets. The leave-many-out analysis was run five times for the equations with the highest estimation and internal prediction abilities in order to assess their prediction abilities.
The prediction ability was assessed on an external dataset represented by the C_{40} isomers, considering the same property. The TSE values and the structures for external validation were taken from the same source as for the C_{42} isomers: http://nanotube.msu.edu/fullerene/fullerene.php?C=40 (accessed December 20, 2015). Several metrics were used to assess the prediction ability of the model [23, 24]: determination coefficient on the external set (R^{2}_{ext}), predictive square correlation coefficient on the external set (Q^{2}_{F2}, [26]), external prediction ability (Q^{2}_{F3}), root mean square error of prediction (RMSEP), mean absolute error of prediction (MAEP), percentage predictive error (%PredErr), and concordance correlation coefficient (CCC [27]).
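The two internal validation schemes can be illustrated with a minimal sketch. It assumes the common PRESS-based definition Q^{2} = 1 − PRESS/SS_{tot}, uses a single-descriptor linear model for brevity, and runs on toy data; the function names (`fit_line`, `q2_loo`, `split_train_test`) and the seed are hypothetical choices, not part of the original analysis.

```python
import random
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def q2_loo(x: List[float], y: List[float]) -> float:
    """Q^2 = 1 - PRESS / SS_tot, refitting the line with each point left out."""
    n = len(y)
    my = sum(y) / n
    press = 0.0
    for i in range(n):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot


def split_train_test(n: int, train_fraction: float = 2 / 3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Simple random split of indices 0..n-1 into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return sorted(idx[:cut]), sorted(idx[cut:])


# Toy data (not the fullerene descriptors): a noisy linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
print("Q2 (leave-one-out):", q2_loo(x, y))

train, test = split_train_test(45)  # 45 isomers, 2/3 in the training set
print(len(train), "training /", len(test), "test")
```

Repeating `split_train_test` with five different seeds would mirror the five leave-many-out runs described above.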
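For readers who want to reproduce these metrics, the sketch below implements the usual textbook definitions of RMSEP, MAEP, Q^{2}_{F2}, Q^{2}_{F3}, and Lin's concordance correlation coefficient. It is an illustration under assumptions: the formulas are the commonly published ones and may differ in detail from the software used by the authors, %PredErr is omitted because its definition is not given in the text, and the numbers in the example are toy values.

```python
from math import sqrt
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def press(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of squared prediction errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred))


def rmsep(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error of prediction."""
    return sqrt(press(y, y_pred) / len(y))


def maep(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error of prediction."""
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)


def q2_f2(y_ext: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - PRESS / sum of squares around the mean of the external set."""
    m = mean(y_ext)
    return 1.0 - press(y_ext, y_pred) / sum((a - m) ** 2 for a in y_ext)


def q2_f3(y_ext: Sequence[float], y_pred: Sequence[float],
          y_train: Sequence[float]) -> float:
    """1 - (PRESS / n_ext) / (training sum of squares / n_train)."""
    m = mean(y_train)
    tss_train = sum((a - m) ** 2 for a in y_train)
    return 1.0 - (press(y_ext, y_pred) / len(y_ext)) / (tss_train / len(y_train))


def ccc(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient."""
    my, mp = mean(y), mean(y_pred)
    sxy = sum((a - my) * (b - mp) for a, b in zip(y, y_pred))
    sxx = sum((a - my) ** 2 for a in y)
    syy = sum((b - mp) ** 2 for b in y_pred)
    return 2.0 * sxy / (sxx + syy + len(y) * (my - mp) ** 2)


# Toy values: every prediction is off by exactly one unit.
y_ext, y_hat, y_train = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 2.0, 4.0]
print(rmsep(y_ext, y_hat), maep(y_ext, y_hat), q2_f2(y_ext, y_hat),
      q2_f3(y_ext, y_hat, y_train), ccc(y_ext, y_hat))
```

In the toy example the predictions are perfectly correlated with the observations but shifted, which is why the CCC and Q^{2}_{F2} penalize them although a plain correlation coefficient would not.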
3. Results and Discussion
Structural information of the investigated C_{42} isomers was obtained by calculating the pool of descriptors given by the Szeged Matrix Property Indices (SMPI) method [28]. The best-performing models in terms of goodness-of-fit (highest correlation coefficient) with 1, 2, 3, and 4 SMPI descriptors are given in (1)–(4):

$$\hat{Y}_{TSE1} = -1176.25 + \mathrm{IJUGE} \times 1.96 \tag{1}$$

$$\hat{Y}_{TSE2} = -542.87 - \mathrm{IIUGF} \times 1.93 \times 10^{-3} + \mathrm{IJUGE} \times 1.81 \tag{2}$$

$$\hat{Y}_{TSE3} = 838.80 - \mathrm{IFEGE} \times 1.41 - \mathrm{IIUGF} \times 3.66 \times 10^{-3} + \mathrm{IJUGE} \times 2.16 \tag{3}$$

$$\hat{Y}_{TSE4} = -199.61 - \mathrm{IFETB} \times 21.63 + \mathrm{IFUGB} \times 40.90 - \mathrm{IIUGF} \times 2.62 \times 10^{-3} + \mathrm{IJUGE} \times 1.56 \tag{4}$$

where \hat{Y}_{TSE} is the total strain energy estimated by the model; IJUGE, IIUGF, IFEGE, IFETB, and IFUGB are SMPI descriptors. Two descriptors (IFETB and IFUGB) account for the atomic number as atomic property; two other descriptors account for electronegativity (IJUGE and IFEGE), while one accounts for the first ionization energy (IIUGF). The investigated property is related to the geometry of compounds (fourth letter “G” in the name of descriptors) with one exception, where it is related to topology (fourth letter “T” in the IFETB descriptor). The other letters reflect the linearization operator (first letter), matrix operation (second letter), and interaction descriptor (third letter).
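As an illustration of how the four models are applied, the sketch below stores the intercepts and coefficients transcribed from (1)–(4) and evaluates a model for a given set of descriptor values. The function name `estimate_tse` and the descriptor value used in the example are hypothetical; real SMPI descriptor values would have to be computed with the SMPI software [28].

```python
from typing import Dict

# Intercepts and coefficients transcribed from equations (1)-(4).
MODELS: Dict[int, Dict[str, float]] = {
    1: {"intercept": -1176.25, "IJUGE": 1.96},
    2: {"intercept": -542.87, "IIUGF": -1.93e-3, "IJUGE": 1.81},
    3: {"intercept": 838.80, "IFEGE": -1.41, "IIUGF": -3.66e-3, "IJUGE": 2.16},
    4: {"intercept": -199.61, "IFETB": -21.63, "IFUGB": 40.90,
        "IIUGF": -2.62e-3, "IJUGE": 1.56},
}


def estimate_tse(model_id: int, descriptors: Dict[str, float]) -> float:
    """Estimated total strain energy (eV) for one isomer under one model."""
    coeffs = MODELS[model_id]
    return coeffs["intercept"] + sum(
        value * descriptors[name]
        for name, value in coeffs.items()
        if name != "intercept"
    )


# Hypothetical descriptor value, chosen only to give a TSE of plausible size.
example = {"IJUGE": 615.0}
print(estimate_tse(1, example))
```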
As expected, the determination coefficient increases as the number of descriptors in the models increases, while the standard error of the estimate decreases (Table 3).
Table 3: Characteristics of nano-QSPR models obtained on C_{42} isomers.

| Equation | R^{2} | R^{2}_{adj} | se | F(p) | \|t_{min}\|(p) | %PredErr | Q^{2} | se_{loo} | F_{loo}(p_{loo}) |
|---|---|---|---|---|---|---|---|---|---|
| (1) | 0.8883 | 0.8857 | 0.4577 | 342 (4.38 × 10^{−22}) | 18.05 (1.09 × 10^{−21}) | 46.77 | 0.8656 | 0.5039 | 275 (1.56 × 10^{−20}) |
| (2) | 0.9612 | 0.9593 | 0.2729 | 520 (2.32 × 10^{−30}) | 6.69 (4.10 × 10^{−8}) | 31.92 | 0.9545 | 0.2960 | 439 (8.48 × 10^{−30}) |
| (3) | 0.9836 | 0.9824 | 0.1796 | 820 (1.30 × 10^{−36}) | 4.37 (8.40 × 10^{−5}) | 19.76 | 0.9809 | 0.1939 | 701 (3.80 × 10^{−37}) |
| (4) | 0.9898 | 0.9888 | 0.1431 | 974 (2.87 × 10^{−39}) | 3.55 (1.01 × 10^{−3}) | 15.95 | 0.9768 | 0.2171 | 418 (2.28 × 10^{−34}) |

R^{2}: determination coefficient; R^{2}_{adj}: adjusted determination coefficient; se: standard error of estimate; F(p): Fisher’s statistic (p-value); |t_{min}|: the minimum of the absolute t-statistic associated with the intercept and coefficients of the model; %PredErr: percentage prediction error; Q^{2}: determination coefficient in leave-one-out analysis; loo: leave-one-out analysis.
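The adjusted determination coefficient and the F statistic in Table 3 can be cross-checked from R^{2}, the number of isomers (n = 45), and the number of descriptors k, using the standard formulas for a linear model with an intercept. The sketch below is such a check, not the authors' code; small deviations (for example, about 970 instead of 974 for (4)) are expected because the tabulated R^{2} values are rounded to four decimals.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a linear model with intercept and k descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Fisher's F for the overall regression (k and n - k - 1 degrees of freedom)."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


N_ISOMERS = 45
# Equation number -> R^2 from Table 3; equation i uses i descriptors.
TABLE3_R2 = {1: 0.8883, 2: 0.9612, 3: 0.9836, 4: 0.9898}

for k, r2 in TABLE3_R2.items():
    print(k, round(adjusted_r2(r2, N_ISOMERS, k), 4),
          round(f_statistic(r2, N_ISOMERS, k)))
```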
The difference between the determination coefficient of the model and the determination coefficient obtained in leave-one-out analysis varied from 0.0027 to 0.0227, the smallest difference being obtained by (3) (Table 3). Likewise, the smallest difference between the standard errors (estimation model and leave-one-out model) is obtained by the same model (3).
The analysis of the results presented in Table 3 showed that the model with four descriptors has the smallest percentage of prediction error. Furthermore, the points in the scatter plot lie closest to the straight line for the model given by (4) (Figure 1). Figure 1 shows little visible difference between the models given by (3) and (4), with the points only slightly less dispersed around the line for the model given by (4).
Figure 1: Observed versus estimated TSE by (1)–(4).
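The differences quoted above follow directly from Table 3, as the following short computation shows. The values are transcribed from the table; the variable names are arbitrary.

```python
# Equation -> (R^2, Q^2, se, se_loo), transcribed from Table 3.
TABLE3 = {
    1: (0.8883, 0.8656, 0.4577, 0.5039),
    2: (0.9612, 0.9545, 0.2729, 0.2960),
    3: (0.9836, 0.9809, 0.1796, 0.1939),
    4: (0.9898, 0.9768, 0.1431, 0.2171),
}

# Gap between fitted and leave-one-out determination coefficients.
r2_gap = {eq: round(r2 - q2, 4) for eq, (r2, q2, _, _) in TABLE3.items()}
# Gap between leave-one-out and fitted standard errors.
se_gap = {eq: round(se_loo - se, 4) for eq, (_, _, se, se_loo) in TABLE3.items()}

print("R2 - Q2:", r2_gap)
print("se_loo - se:", se_gap)
print("smallest R2 gap: equation", min(r2_gap, key=r2_gap.get))
print("smallest se gap: equation", min(se_gap, key=se_gap.get))
```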
The main characteristics of the models given by (3) and (4) obtained in leave-many-out analysis (training versus test analysis; 2/3 of compounds in training set run 5 times) are presented in Table 4.
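Table 4: Characteristics of nano-QSPR models in leave-many-out analysis: C_{42} isomers.

Equation (3)

| Id | Intercept | IFEGE | IIUGF × 10^{−3} | IJUGE | R^{2} (training) | F-stat (training) | R^{2} (test) | F-stat (test) |
|---|---|---|---|---|---|---|---|---|
| 1 | 744.37 | −1.27 | −3.57 | 2.09 | 0.9797 | 360 | 0.9877 | 264 |
| 2 | 682.4 | −1.21 | −3.46 | 2.07 | 0.9788 | 369 | 0.9935 | 361 |
| 3 | 902.68 | −1.48 | −3.79 | 2.20 | 0.9794 | 376 | 0.9894 | 358 |
| 4 | 678.39 | −1.22 | −3.53 | 2.12 | 0.9853 | 534 | 0.9851 | 171 |
| 5 | 854.05 | −1.49 | −3.51 | 2.17 | 0.9828 | 458 | 0.9835 | 219 |

Equation (4)

| Id | Intercept | IFETB | IFUGB | IIUGF × 10^{−3} | IJUGE | R^{2} (training) | F-stat (training) | R^{2} (test) | F-stat (test) |
|---|---|---|---|---|---|---|---|---|---|
| 6 | −120.59 | −19.88 | 37.19 | −2.87 | 1.55 | 0.9901 | 568 | 0.9637 | 72 |
| 7 | −277.84 | −20.87 | 40.14 | −2.30 | 1.54 | 0.9819 | 310 | 0.9814 | 154 |
| 8 | −91.46 | −20.86 | 39.01 | −3.00 | 1.56 | 0.9878 | 459 | 0.9636 | 64 |
| 9 | −225.97 | −21.69 | 42.02 | −2.35 | 1.48 | 0.9830 | 331 | 0.9794 | 139 |
| 10 | −223.28 | −18.71 | 36.37 | −2.35 | 1.47 | 0.9887 | 497 | 0.9701 | 85 |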
The results presented in Table 4 show the stability of the models, with internal prediction power (defined as the determination coefficient in the test sets) close to the estimation power (determination coefficient in the training sets) for both investigated models. For (3), the results obtained in the training sets closely follow the results on the whole sample, with R^{2} values that agree to two decimals in four of the five runs. The R^{2} obtained in the test sets rounded to 0.99 in four of the five runs (0.98 in run 5) and was slightly higher than the R^{2} obtained in the corresponding training set in all runs except run 4. In three cases out of five, the R^{2} in the training sets for (4) agreed to two decimals with the R^{2} value given in Table 3. However, without exception, the R^{2} in the test sets was smaller than the R^{2} in the training sets for (4), with differences that varied from 0.0005 (id 7 in Table 4) to 0.0264 (id 6 in Table 4). These results show that (3) performs slightly better in terms of determination coefficients in leave-many-out analysis.
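The training-versus-test comparison can be reproduced from the R^{2} columns of Table 4. The sketch below only restates values from the table; a positive gap means that the test-set R^{2} is lower than the training-set R^{2}.

```python
# Run id -> (equation, R^2 in training set, R^2 in test set), from Table 4.
TABLE4 = {
    1: (3, 0.9797, 0.9877), 2: (3, 0.9788, 0.9935), 3: (3, 0.9794, 0.9894),
    4: (3, 0.9853, 0.9851), 5: (3, 0.9828, 0.9835),
    6: (4, 0.9901, 0.9637), 7: (4, 0.9819, 0.9814), 8: (4, 0.9878, 0.9636),
    9: (4, 0.9830, 0.9794), 10: (4, 0.9887, 0.9701),
}

# Training minus test R^2 for every run.
gaps = {run: round(train - test, 4) for run, (_, train, test) in TABLE4.items()}

for equation in (3, 4):
    eq_gaps = {run: gap for run, gap in gaps.items() if TABLE4[run][0] == equation}
    print(f"Equation ({equation}):", eq_gaps)
```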
The plots of the models obtained in the fourth run for (3) and fifth run for (4), as examples, are given in Figure 2.
Figure 2: Internal prediction versus estimation power in training and test analysis for (3) and (4).
The equations with the best estimation power and internal prediction abilities, namely, (3) and (4), were further applied to the C_{40} isomers to test their external prediction abilities. The prediction power of (4) proved to be better than that of (3) (see Figure 3 and Table 5).
Table 5: Prediction power of nano-QSPR given by (3) and (4) on C_{40} isomers.

| Equation | R^{2}_{ext} | Q^{2}_{F2} | Q^{2}_{F3} | RMSEP | MAEP | \|t(Y-Ypred)\|(p) | %PredErr |
|---|---|---|---|---|---|---|---|
| (3) | 0.6183 | 0.9501 | NR | 1.60 | 51.28 | 324 (8.37 × 10^{−69}) | 63.19 |
| (4) | 0.8462 | 0.5144 | NR | 1.60 | 5.27 | 52 (4.96 × 10^{−38}) | 6.49 |

R^{2}_{ext}: determination coefficient on the external set; Q^{2}_{F2}: predictive square correlation coefficient on external set; Q^{2}_{F3}: external prediction ability; RMSEP: root mean square error of prediction; MAEP: mean absolute error of prediction; %PredErr: percentage predictive error; NR: not reliable value.
Figure 3: Analysis of (3) and (4) on external dataset represented by C_{40} isomers.
Despite the fact that the predictive square correlation coefficient on the external set is higher for (3) than for (4), all other calculated metrics indicate that the model given by (4) has better prediction abilities (highest determination coefficient on the external set, lowest mean absolute error of prediction, and lowest percentage of predictive error; see Table 5). Furthermore, the overall spread of the points in the scatter plot leads to the conclusion that (4) has better prediction abilities than (3). Nevertheless, the mean of the residuals proved to be significantly different from the expected value (zero). It can be concluded that the model given by (4) fits the data on which it was constructed better than all other models. This raises a question: are the structural features extracted by SMPI descriptors on C_{42} isomers able to predict the TSE of C_{40} isomers?
The SMPI descriptors used by (3) and (4), respectively, were used to predict the TSE of the C_{40} isomers. One of the three descriptors from (3) proved to have a slope not significantly different from zero and was not included in further analysis. The models obtained on C_{40} isomers are given in (5) and (6):

$$\hat{Y}_{TSE5} = -328.66 - \mathrm{IIUGF} \times 2.43 \times 10^{-3} + \mathrm{IJUGE} \times 1.70 \tag{5}$$

R^{2} = 0.8483, R^{2}_{adj} = 0.8401, se = 0.65, F(p) = 103 (7.07 × 10^{−16}), t_{min}(p) = 2.88 (0.0066), n = 40, Q^{2}_{F3} = 0.7834, RMSEP = 1.60, MAEP = 0.52, %PredErr = 0.64, CCC = 0.9179.

$$\hat{Y}_{TSE6} = -\mathrm{IFETB} \times 15.05 + \mathrm{IFUGB} \times 31.49 - \mathrm{IIUGF} \times 2.64 \times 10^{-3} + \mathrm{IJUGE} \times 1.21 \tag{6}$$

R^{2} = 0.8853, R^{2}_{adj} = 0.8479, se = 0.57, F(p) = 69 (3.76 × 10^{−16}), t_{min}(p) = 3.10 (0.0038), n = 40, Q^{2}_{F3} = 0.8362, RMSEP = 1.60, MAEP = 0.43, %PredErr = 0.52, CCC = 0.9390.

Here \hat{Y}_{TSE} is the total strain energy estimated by the model; IJUGE, IIUGF, IFETB, and IFUGB are SMPI descriptors. Two descriptors (IFETB and IFUGB) account for the atomic number as atomic property, one descriptor accounts for electronegativity (IJUGE), and one accounts for the first ionization energy (IIUGF). The investigated property is related to the geometry of compounds (fourth letter “G” in the name of descriptors) with one exception, which is related to the topology of compounds (IFETB descriptor). The other letters reflect the linearization operator (first letter), matrix operation (second letter), and interaction descriptor (third letter). Note that for both models the mean of the residuals is not significantly different from zero (p > 0.49).
The analysis of the metrics associated with (5) and (6) leads to the conclusion that the model given by (6) performs better than the model given by (5). The same conclusion is obtained by analyzing the plots of observed versus predicted TSE (Figure 4).
Figure 4: Analysis of (5) and (6) on external dataset represented by C_{40} isomers.
The results of our study showed that the identified nano-QSPR models fit the data on which they were identified (C_{42} isomers) and could also be used to select structural descriptors with fair prediction abilities on an external dataset (C_{40} isomers). To sum up, equations relating electronegativities, ionization potential, and energy have been identified on C_{42} isomers and proved to work also on C_{40} isomers. Note that electronegativity and ionization potential are atomic properties; since the investigated set contains only carbon atoms, the identified relation between the three properties can also be attributed to the topology and geometry of the investigated compounds.
To the best of our knowledge, structure-property relationship approaches have not previously been applied to C_{42} or C_{40} fullerene isomers. The small-diameter fullerenes (C_{20}, C_{34}, C_{42}, and C_{60}) have mainly been investigated with regard to their properties (such as adsorption [29], distribution of CC distance [14], and Schlegel diagrams of molecular structures [30]). Therefore, this is the first report of a quantitative relationship between structure and property for the C_{42} fullerene. Undoubtedly, the advancement from theoretical to experimental studies is desired.
4. Conclusions
The C_{42} fullerene isomers were successfully modeled, and the total strain energy was characterized as a function of information extracted from the structure of the compounds. The model with the best goodness-of-fit, which also showed prediction power in leave-one-out (Q^{2} = 0.9768) and leave-many-out analyses, is the one with four descriptors. The total strain energy proved to be a function of electronegativity and first ionization energy, in relation to the geometry of the compounds. The structural descriptors able to fairly explain the total strain energy of C_{42} isomers also proved able to explain the same property for C_{40} fullerene isomers.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] H. W. Kroto, J. R. Heath, S. C. O'Brien, R. F. Curl, and R. E. Smalley, "C_{60}: Buckminsterfullerene."
[2] R. Dinesh, M. Anandaraj, V. Srinivasan, and S. Hamza, "Engineered nanoparticles in the soil and their potential implications to microbial activity."
[3] A. J. Huh and Y. J. Kwon, "Nanoantibiotics: a new paradigm for treating infectious diseases using nanomaterials in the antibiotics resistant era."
[4] Y. S. Zhang, T. H. Dai, M. Wang, D. Vecchio, L. Y. Chiang, and M. R. Hamblin, "Potentiation of antimicrobial photodynamic inactivation mediated by a cationic fullerene by added iodide: in vitro and in vivo studies."
[5] N. V. Tishevskaya, Yu. M. Zakharov, E. V. Golubotovskii, O. L. Kolesnikov, N. V. Trofimova, Yu. V. Arkhipenko, and T. G. Sazontova, "Effects of fullerenol C_{60}(OH)_{24} on erythropoiesis in vitro."
[6] S. Pacor, A. Grillo, L. Đorđević, S. Zorzet, M. Lucafò, T. Da Ros, M. Prato, and G. Sava, "Effects of two fullerene derivatives on monocytes and macrophages."
[7] A. L. Dellinger, Z. Zhou, and C. L. Kepley, "A steroid-mimicking nanomaterial that mediates inhibition of human lung mast cell responses."
[8] A. Hirsch and M. Brettreich.
[9] D. Tománek.
[10] P. W. Dunk, N. K. Kaiser, M. Mulet-Gas, A. Rodríguez-Fortea, J. M. Poblet, H. Shinohara, C. L. Hendrickson, A. G. Marshall, and H. W. Kroto, "The smallest stable fullerene, M@C_{28} (M = Ti, Zr, U): stabilization and growth from carbon vapor."
[11] Y.-M. Chen, J. Shi, L. Rui, and Q.-X. Guo, "Theoretical study on C_{32} fullerenes and their endohedral complexes with noble gas atoms."
[12] D. Manna and T. K. Ghanty, "Enhancement in the stability of 36-atom fullerene through encapsulation of a uranium atom."
[13] S. Muhammad, K. Fukuda, T. Minami, R. Kishi, Y. Shigeta, and M. Nakano, "Interplay between the diradical character and third-order nonlinear optical properties in fullerene systems."
[14] E. Małolepsza, Y.-P. Lee, H. A. Witek, S. Irle, C.-F. Lin, and H.-M. Hsieh, "Comparison of geometric, electronic, and vibrational properties for all pentagon/hexagon-bearing isomers of fullerenes C_{38}, C_{40}, and C_{42}."
[15] E. I. Kauppinen, "Carbon Nanotubes and NanoBuds - Synthesis, Structure, Functionalisation and Dry Deposition for TCE and TFT Applications," July 2015, http://www.jst.go.jp/sicp/ws2009_finland/abstract/wg2_02kau.pdf
[16] M. Ibrahim, N. A. Saleh, W. M. Elshemey, and A. A. Elsayed, "Fullerene derivative as anti-HIV protease inhibitor: molecular modeling and QSAR approaches."
[17] L. Ahmed, B. Rasulev, M. Turabekova, D. Leszczynska, and J. Leszczynski, "Receptor- and ligand-based study of fullerene analogues: comprehensive computational approach including quantum-chemical, QSAR and molecular docking simulations."
[18] A. Trpkovic, B. Todorovic-Markovic, and V. Trajkovic, "Toxicity of pristine versus functionalized fullerenes: mechanisms of cell damage and the role of oxidative stress."
[19] D. Tománek, "C42 Isomers," in: Guide through the Nanocarbon Jungle: Buckyballs, Nanotubes, Graphene, and Beyond, 2015, http://www.nanotube.msu.edu/fullerene/fullerene.php?C=42
[20] S. L. Mayo, B. D. Olafson, and W. A. Goddard, "DREIDING: a generic force field for molecular simulations."
[21] J. J. P. Stewart, "PM3," in: P. von R. Schleyer (Ed.).
[22] M. V. Diudea, O. M. Minailiuc, G. Katona, and I. Gutman, "Szeged matrices and related numbers."
[23] S. D. Bolboacă and L. Jäntschi, "Quantitative structure-activity relationships: linear regression modelling and validation strategies by example."
[24] S. D. Bolboacă, L. Jäntschi, and M. V. Diudea, "Molecular design and QSARs/QSPRs with molecular descriptors family."
[25] S. D. Bolboacă, "Assessment of random assignment in training and test sets using generalized cluster analysis technique."
[26] N. Chirico and P. Gramatica, "Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection."
[27] Lin's Concordance, December 2015, http://services.niwa.co.nz/services/statistical/concordance
[28] L. Jäntschi, "Szeged Matrix Property Indices," 2014, http://l.academicdirect.org/Chemistry/SARs/SMPI
[29] X. Liu, Y. Wen, Z. Chen, H. Lin, R. Chen, K. Cho, and B. Shan, "Modulation of Dirac points and band-gaps in graphene via periodic fullerene adsorption."
[30] Y.-N. Chiu, J. Xiao, C. D. Merritt, K. Liu, W.-X. Huang, P. V. Ganelin, and N. N. Li, "Special geminals and Schlegel diagrams of molecular structures of fullerenes and metallofullerenes."