Space Group Approximation of a Molecular Crystal by Classifying Molecules for Their Electric Potentials and Roughness on Their Inertial Ellipsoid Surface

To predict the most probable space group in which a molecule crystallizes, it is assumed that molecular shape and the electric potential distribution on the molecular surface are the main predictors. However, comparing and classifying molecules by these two factors is difficult because molecules are, in general, very different objects. To make them comparable, each molecule is reduced to its inertial ellipsoid, on whose surface 26 approximately equally spaced points are chosen; at each point a roughness factor and the electric potential due to all atomic charges of the molecule are calculated. Molecules encoded by these two predictor vectors can then be compared and classified, and the comparison shows that molecules crystallizing in the same space group have more similar predictor vectors. This result opens the possibility of predicting the most probable space group associated with a molecule.


Introduction
The first hypothesis considered for crystal packing prediction (CPP) of organic molecules is that the isolated molecule contains the information of its future crystal [1]. The same holds for the so-called "blind tests," the last of which was published in 2011 [2], in which several laboratories compete to find the crystal structure of a molecule by diverse calculations, and where it is assumed that 95% of the molecules present no polymorph and prefer a cell with a given space group (SG). Some earlier approaches obtained molecular crystal structure information by data mining [3,4]. In the present study it is further assumed that the space group of a molecular crystal is mainly predetermined by the molecular form, or roughness, and by the electric potential distribution on the molecular surface, both factors being the best predictors for crystal packing, including the formation of hydrogen bonds. In fact, electrostatic forces determine molecular reactions, as can be observed by X-ray in the electron density distributions of crystalline molecules [5,6], where interactions by electrostatic forces (including H-bonds) between equal molecules in a crystal would determine its crystal packing.
The purpose of this work is to classify the molecules into groups by similarity of the above predictors and to check whether these assumed types of packing are correlated with their space group SG or, conversely, to verify that molecules crystallizing in the same SG have similar aggregation predictors. However, comparing these two predictors between usually dissimilar molecules is not simple unless the molecules are first reduced to more comparable objects. In a previous work [7], some molecular crystal descriptors, such as cell axes and the presence of some symmetry elements, but not the SG, were predicted by reducing each molecule to its inertial ellipsoid (adding to each axis the hydrogen VDW radius of 1.17 Å).
The same reduction of the molecule to its inertial ellipsoid is used here. The ellipsoid is defined by its three axes, Large, Medium, and Small (L, M, S), and 26 points are placed on its surface: 2 × 3 at the ends of the ellipsoid axes, 3 × 4 at the edge centers, and 8 at the face centers, all points being approximately equidistant from each other. Figure 1(a) shows the 26 numbered points with their sequence of coordinates on the ellipsoid surface; for clarity, the ellipsoid has been deformed to a cube in this figure. The classical charges $q_j$ of the atoms in every molecule were calculated by Chem3D Pro [8]; then the electrostatic potential $V_i$ at those 26 points of every ellipsoid surface, due to the charges $q_j$ of all $j$ atoms in the molecule, was calculated by $V_i = \sum_j (q_j / r_{ij})$, where $r_{ij}$ are the distances from point $i$ to the $j$ atoms. In addition, a roughness factor at those 26 points of every molecular ellipsoid surface, hereafter called the concavity factor C, was calculated as the average of the distances from each point to its four closest atoms in the molecule. In total, two vectors per molecule, 26V for potentials and 26C for concavities, each of 26 components, are the space group predictors in this work. The main differences from the previous work [7] are the addition of this concavity vector, the absence of scaling of the molecular potential vectors, and a simpler space group classification procedure.
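The construction described above (26 surface points, a potential summed over atomic charges, and a concavity factor from the four closest atoms) can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the way a cube direction such as (1, 1, 0) is mapped onto the ellipsoid surface (normalizing the direction and stretching it by the semi-axes L, M, S) is an assumption not specified in the text, and the potential is left in arbitrary charge/distance units.

```python
import itertools
import math


def ellipsoid_points(L, M, S):
    """Return the 26 surface points as {(a, b, c): (x, y, z)}.

    (a, b, c) runs over {-1, 0, 1}^3 without (0, 0, 0): 6 axis ends,
    12 edge centers and 8 face (octant) centers.
    """
    points = {}
    for a, b, c in itertools.product((1, 0, -1), repeat=3):
        if (a, b, c) == (0, 0, 0):
            continue
        norm = math.sqrt(a * a + b * b + c * c)
        # Assumption: unit direction stretched by the semi-axes, so that
        # (x/L)^2 + (y/M)^2 + (z/S)^2 = 1 holds for every point.
        points[(a, b, c)] = (L * a / norm, M * b / norm, S * c / norm)
    return points


def potential(point, atoms):
    """V_i = sum_j q_j / r_ij, with atoms given as (x, y, z, q) tuples."""
    return sum(q / math.dist(point, (x, y, z)) for x, y, z, q in atoms)


def concavity(point, atoms, n_closest=4):
    """Average distance from the point to its n_closest nearest atoms."""
    dists = sorted(math.dist(point, (x, y, z)) for x, y, z, _ in atoms)
    nearest = dists[:n_closest]
    return sum(nearest) / len(nearest)


def predictor_vectors(L, M, S, atoms):
    """Return the two 26-component predictors, keyed by (a, b, c)."""
    pts = ellipsoid_points(L, M, S)
    V = {key: potential(p, atoms) for key, p in pts.items()}
    C = {key: concavity(p, atoms) for key, p in pts.items()}
    return V, C
```

Keying the vectors by the index triple (a, b, c) rather than by a position number avoids having to guess the point numbering of Figure 1(a), which is not recoverable from the text.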
Table 1 shows molecular global form parameters such as the average molecular concavity ⟨conc⟩ and its standard deviation ⟨desv⟩, the relationship between the principal inertial axes (M/L, S/L, S/M), and other parameters of internal symmetry of the molecule, some of them used in the previous work [7].
Table 2 shows the averages of those parameters per space group SG. Although all these form parameters condition the SG of the crystal to some extent, it is not easy to find in these tables any similarity between the global form parameters of molecules with the same SG, which justifies spreading that global form information over 26 points (26P) on the inertial ellipsoid.

Molecular Space Group Predictors
Because the present work builds on the previous one [7], the same 31 molecules are used here. Table 3 lists these molecules, all of which contain an azole group with three different substituents: a group of 17 molecules (17P21/c) with space group P21/c (also P21/n or P21/a), six of which (6FAQP21/c) also share a COOEt group in the same substituent; eleven molecules distributed among several SGs (3 in Pbca, 3 in P212121, 3 in Pna21, and 2 in P-1); and three other molecules crystallizing in different SGs.
To compare both molecular vectors, 26V for potentials and 26C for concavities, properly between different molecules, each molecule was reoriented within its inertial ellipsoid (IE) by using the symmetry planes L = 0, M = 0, or S = 0, so that the octant with the largest weighted potential (V weighted) among the eight octants lies in the IE octant (+L+M+S). Here V weighted is calculated from the potentials at the seven points of the octant in the proportion $V_{\mathrm{weighted}} = 3V(111) + 2[V(110) + V(101) + V(011)] + [V(100) + V(010) + V(001)]$. The Supplementary Table (available online at http://dx.doi.org/10.1155/2014/737480) shows the final comparable values of 26V and 26C for the 31 reoriented molecules.
It is important to note in Tables 1 and 3 that two molecules can share the same space group SG not only when their structures differ slightly, as for FAQROE and FAQSAR, but also when they differ greatly, as for FAQROE and KOSFUT. This suggests that the sequences of 26 potentials and 26 concavities around the inertial ellipsoid are better determinants of the space group.
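The reorientation step can be illustrated with a short Python sketch. It assumes, hypothetically, that each predictor vector is stored as a dictionary keyed by the index triple (a, b, c) of its surface point, with a, b, c in {-1, 0, 1}, and it reads the octant weights as 3 for the face-center point, 2 for the three edge-center points, and 1 for the three axis-end points. The function names are illustrative, not the authors'.

```python
import itertools


def weighted_octant(V, sl, sm, ss):
    """Weighted potential of the octant with signs (sl, sm, ss).

    Weights 3 : 2 : 1 for the face-center point, the three edge-center
    points and the three axis-end points of the octant (assumed reading
    of the proportion given in the text).
    """
    center = V[(sl, sm, ss)]
    edges = V[(sl, sm, 0)] + V[(sl, 0, ss)] + V[(0, sm, ss)]
    axes = V[(sl, 0, 0)] + V[(0, sm, 0)] + V[(0, 0, ss)]
    return 3 * center + 2 * edges + axes


def reorient(vec_V, vec_C):
    """Reflect through L = 0, M = 0 and/or S = 0 so that the octant with
    the largest weighted potential ends up in (+L+M+S).

    Returns the reflected V and C dictionaries and the signs used.
    """
    signs = max(
        itertools.product((1, -1), repeat=3),
        key=lambda s: weighted_octant(vec_V, *s),
    )

    def reflect(vec):
        # Multiplying each index by the octant sign maps that octant
        # onto (+1, +1, +1); zero indices stay in the mirror planes.
        return {
            (a * signs[0], b * signs[1], c * signs[2]): value
            for (a, b, c), value in vec.items()
        }

    return reflect(vec_V), reflect(vec_C), signs
```

The same reflections are applied to both vectors so that potentials and concavities stay attached to the same physical points. Reorienting to the lowest weighted octant instead would only require replacing `max` with `min`.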
It is worth noting that all the molecules could also be reoriented to have the lowest, instead of the largest, V weighted octant in (+L+M+S). Although the relative orientation between the largest and the lowest V weighted octants is not the same for all molecules, the average distributions of 26V and −26V, respectively, on the inertial ellipsoid IE surface are similar for both orientations of the 31 molecules, which reinforces this analysis. In the end, the largest V weighted octant was taken for molecular reorientation. Figure 2(a) shows first the distribution along the IE of the 26 potentials V averaged over the 31 azole molecules, 26⟨V31⟩, together with the 26 standard deviations of these averages, 26std⟨V31⟩. The distribution shows two saw-tooth peaks, around (110) (111) and (010) (011), as expected (see Figure 1(a)) for molecules oriented with the largest V weighted octant in (+L+M+S).
The centrosymmetric molecule HEPJER, almost planar in (L, M, 0), is shown in Figure 1(b) inside a schematic inertial ellipsoid IE because its simplicity makes it easier to follow the sequence of concavities and potentials along the 26 points on its IE, shown in Figure 1(c). The concavities C are maximal at the 8 points of the molecular contour LM0, the remaining concavities on either side of the plane S = 0 being lower. The symmetrical potentials V have lower values at the 0MS points, away from the molecule, and higher peaks especially at the LM0 points.

Analysis of Averaged Potentials and Concavities by Space Group
Figure 2(a) shows the distribution from (1−1−1) to (−111) along the inertial ellipsoid of the 26 potentials averaged over the n molecules, 26⟨V⟩, belonging to different groups: the total of 31az, the 14NotP21/c, the 17P21/c, and the 6FAQP21/c. It also shows the distribution of the 26 standard deviations of those averages, des(26⟨V⟩), per group, which indicate the similarity between the potentials of the n molecules at each of the 26 points of their inertial ellipsoid. In general, des(26⟨V⟩) is largest at singular points with maximum or minimum values of ⟨V⟩; in particular, it is clearly higher for the first two groups, which mix space groups, than for the last two P21/c groups, as also shown in Table 4 by the total averages under the column ⟨26(des⟨V⟩)⟩. Figure 2(a) also shows that the distributions of 26⟨V⟩ for 31az and 14NotP21/c are similar, with two saw-tooth peaks, although the second peak is less pronounced for 31az. The distributions of 26⟨V⟩ for the single-space-group sets 17P21/c and 6FAQP21/c differ from the previous ones by the loss of the second saw-tooth peak and an increased negative potential V at (0−10).
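The per-group statistics discussed here (a mean and a standard deviation at each of the 26 points, followed by the average of those 26 standard deviations) can be sketched as below. This is an illustrative Python outline, not the authors' procedure: the function name and dictionary keys are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which form was used.

```python
import statistics


def group_statistics(vectors):
    """Summarize one group of molecules.

    vectors: list of equal-length predictor vectors (26 components in
    the paper), one per molecule, all in the same point order.
    Requires at least two molecules and two points.
    """
    n_points = len(vectors[0])
    # Point-wise average over the molecules of the group.
    means = [statistics.mean(v[i] for v in vectors) for i in range(n_points)]
    # Point-wise spread over the molecules; sample form is an assumption.
    stds = [statistics.stdev(v[i] for v in vectors) for i in range(n_points)]
    return {
        "mean_per_point": means,                  # 26 point-wise averages
        "std_per_point": stds,                    # 26 point-wise deviations
        "avg_of_means": statistics.mean(means),   # total average
        "avg_of_stds": statistics.mean(stds),     # average deviation
        "std_of_stds": statistics.stdev(stds),    # spread of the deviations
    }
```

A lower `avg_of_stds` for a group means that its molecules have more similar values point by point, which is the sense in which the single-space-group sets are compared with the mixed ones. The same function applies unchanged to concavity vectors.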
Therefore, although the overall molecular shape factors of Tables 1 and 2 and the contents of the molecules in Table 3 do not seem to indicate similarity between molecules within each space group SG, the distributions of the 26⟨V⟩ potentials along the 26 points of the inertial ellipsoid do appear to have some similarity between molecules of the same SG. In fact, the analysis of Figure 2(a) and the values of ⟨26(des⟨V⟩)⟩ in Table 4 suggest that the potential predictor vector 26V of a molecule is associated with the space group SG of its molecular crystal.
Figure 2(b) shows the average distribution of the 26 concavities, 26⟨C⟩, and of the corresponding standard deviations, des(26⟨C⟩), for the four groups of n molecules: 31az, 14NotP21/c, 17P21/c, and 6FAQP21/c. While for 14NotP21/c the distribution of 26⟨C⟩ is pseudosymmetrical around the center (00 ± 1), that symmetry tends to break for the 17P21/c molecules and even more for the 6FAQP21/c molecules, especially in the region where their 26⟨V⟩ potentials lose the second saw-tooth peak. The distribution of 26⟨C⟩ for 31az, which includes molecules of equal and different space groups, is intermediate, as might be expected. Table 4 also shows the quantitative differences between the average distributions of the 26⟨C⟩ concavities for the different groups of molecules. The average values under ⟨26(des⟨C⟩)⟩ show that the distributions of the 26 concavities are more similar among the molecules of the groups 6FAQP21/c and 17P21/c than among the molecules of the group 14NotP21/c, with an intermediate similarity for the total of 31az. This shows the association of the predictor vector 26C of molecular concavities with the space group SG of the molecular crystal, in parallel with the association observed above between the predictor vector 26V of potentials and the space group SG.
Figure 3 shows the molecular distributions of the 26 potentials V and 26 concavities C for the four minority space groups: Pbca, P212121, Pna21, and P-1. Although this analysis is less significant, with only 3, 3, 3, and 2 molecules, respectively, it also reveals some similarity between the 26V distributions, and between the 26C distributions, of molecules with the same space group SG, except for P-1 and BIWWEJ (the only molecule in Table 2 whose "calculated" 26V is positive). Furthermore, Table 4 shows that the average standard deviation values ⟨26(des⟨V⟩)⟩ and ⟨26(des⟨C⟩)⟩ for these four space groups do not deviate much from those of the groups 6FAQP21/c and 17P21/c, with the exceptions described above.
Finally, Figure 2(b) also shows the differences between the distributions of the average potentials ⟨V⟩ and of the average concavities ⟨C⟩ along the inertial ellipsoid surface for the 17P21/c and 14NotP21/c molecular groups. For L = 1, the V distributions of both groups are quite similar, as are the C distributions. For L = 0, the disappearance of the second saw-tooth peak in the V distribution is accompanied by some variations in the C distribution between the two groups. For L = −1, both the V and the C distributions differ between the two groups, with differences ranging from notable to drastic.

Conclusion
Assuming that the molecular form and the potential distribution on the molecular surface are the major predictors for crystal packing, it is not easy to compare these properties between different molecules in order to classify them by their space group. To enable this comparison, molecules are reduced to their inertial ellipsoids IE with 26 singular points equally spaced on the surface, at which 26 potentials and 26 concavity factors are calculated. These two molecular vectors, 26V and 26C, are taken as molecular packing or space group predictors for the molecular crystal (assuming no polymorphism). Comparing both predictors among 31 molecules shows more similarity between molecules crystallizing in the same space group SG than between molecules with different SGs. This suggests that each space group would have its own mean distribution of the 26 potentials and 26 concavities on a virtual inertial ellipsoid, which would

Table 1: Global form descriptors for each molecule.

Table 3: The 31 molecules selected for the present work, with the CSD codes.

Table 4: Total averages ⟨26⟨V⟩⟩ and ⟨26⟨C⟩⟩ for potentials V and concavities C on the inertial ellipsoid IE, and the average of the 26 standard deviations of 26⟨V⟩ and 26⟨C⟩ for each space group SG. ⟨26⟨V⟩⟩ is the average of the 26 V values averaged (vertical sense in Supplementary Table) over the n molecules of the SG. ⟨26(des⟨V⟩)⟩ is the average of the 26 standard deviations of the 26⟨V⟩; it shows the V similarity of the n molecules at the same point of the IE. ⟨26⟨C⟩⟩ is the average of the 26 C values averaged (vertical sense in Supplementary Table) over the n molecules of the SG. ⟨26(des⟨C⟩)⟩ is the average of the 26 standard deviations of the 26⟨C⟩; it shows the C similarity of the n molecules at the same point of the IE. des⟨26des⟩ is the standard deviation of the 26 values whose average is ⟨26(des⟨V⟩)⟩ or ⟨26(des⟨C⟩)⟩; it shows the similarity between those 26 standard deviations.