Computer-Aided Classification of New Psychoactive Substances

)e appearance on the free market of synthetic cannabinoids raised the researchers’ interest in establishing their molecular similarity by QSAR analysis. A rigorous criterion for classifying drugs is their chemical structure. )erefore, this article presents the structural similarity of two groups of drugs: benzoylindoles and phenylacetylindoles. Statistical analysis and clustering of the molecules are performed based on their numerical characteristics extracted using Cheminformatics methods. )eir similarities/ dissimilarities are emphasized using the dendrograms and heat map. )e highest discrepancies are found in the phenylacetylindoles group.


Introduction
e consumption of psychoactive substances is among the principal causes of young people's mortality and health issues. A significant proportion of adults consumed some drugs at a certain period in their lives or presented with drug addiction. Synthetic cannabinoids are found in "herbal smoking mixes" and mixtures with other psychoactive substances such as sedatives/hypnotics, hallucinogens, and stimulants [1]. Manufactured in clandestine laboratories and widely sold, the class of new psychoactive substances known as different spices poses a growing threat to public health, such as Spice, Spice Gold, and Silver. In some countries, acute poisonings of users' groups [2][3][4] have been recorded. Synthetic cannabinoids have been missold as THC (Δ9-tetrahydrocannabinol) or CBD (cannabidiol) and are packed in tablets containing powder or utilized in devices as liquids [5].
Another recent development is the popularity of "vaping" among young people, synthetic cannabinoids being discovered in cartridges filled with liquid for use in e-cigarettes [6].
Psychoactive drugs increase the dopamine available in the brain (called the reward circuit) with participation in pleasure modulation. Synthetic cannabinoids (SCs) are much stronger and more toxic than users can often think.
eir increasing prevalence has led to severe health problems among young people and adults. Consumption of these new psychoactive substances poses a health risk and generates severe issues in adapting to reality, education, work, family life, incidents, and road accidents [7,8]. Research has been performed on the cannabinoids detected in different products to determine how cannabinoids affect the human body [9]. ey also aim to analyze the possibilities of using these substances to treat certain neurodegenerative diseases, drug addiction, emotional disorders, or cancer [10]. However, separating the psychoactive effects and the medicinal properties of these drugs is challenging [11].
In about 8-10 years, researchers have tried different dynamic simulation models on rapid identification and establishing the relationships of addiction between the chemical structure and effects on consumers to detect possible ways for curing addiction and overdose [12]. Functionally similar to THC (the cannabis active principle), SCs bind to the same receptors in the human body [6]. e group of synthetic cannabinoids or synthetic cannabimimetic is the largest monitored by the EU Early Warning System. In December 2016, 169 synthetic cannabinoids were reported to the EMCDDA and increased to 190 substances reported in 2018 [8]. About 280 synthetic cannabinoids were known at the end of 2019 [6].
In the investigation of the various chemical synthesis compounds or any chemical synthesis preparation, the chemical analyst establishes the relationships between physicochemical parameters and biological (pharmacological) properties, the degree of structural superposability of the chemical products, using filters based on computational structure and quantitative relationship models of structureactivity (QSARs models) [11]. e search for similarity using molecular fingerprints are among the most common approaches in identifying active chemical compounds proposed for analysis in this study. Many structural descriptors or 2D and 3D ownership are encoded in different representations of fingerprints. All fingerprints eventually transform molecular likeness analysis into a paired comparison of template compound patterns and the entire database [17].
Several fingerprint types are known, each representing a difference in the molecule. Artificial intelligence techniques have been employed in the last period for data analysis in various domains [18][19][20][21]. Cheminformatics is a tool used to analyze statistical data based on the digital footprint, which is the dynamic field of modern design for pharmaceuticals, in particular. Cheminformatics plays an essential role in accumulating, grouping, examining chemical data, and finding new entities based on which new chemical structures can build active molecules [22].
Swandana et al. [23] showed that the pharmacokinetic parameters could be predicted using the in silico method. e authors emphasize some software (admetSAR, Chemicalize, Molinspiration, QikProp, SwissADME, and pkCSM software) to predict oral-systemic drugs' pharmacokinetics characteristics. Other scientists [24,25] emphasized different computational methods utilized for discovering drugs and their limitations.
Finding similarities of different molecules has been of interest since the 90s [26] that can be easily done using the new capabilities of the new software. e rcdk, ChemmineR, and rpubchem packages of R are powerful tools in the field of cheminformatics [27][28][29][30][31][32].
eir utilization helps the cheminformatics-oriented R user to organize chemical information efficiently. Guha et al. [27] showed that one of the main tasks is to include further coverage of the CDK API by developing the CDK itself, such as extending the 3D builder model. e authors noted that the CDK API could be accessed directly via rJava, so the functions provided by the rcdk package make cheminformatics functionality readily available for the end-users.
In this study, we used the R software capacity to characterize 14 new cannabinoids from the benzoylindoles and phenylacetylindoles groups and determine the similarities among their molecules. Clustering techniques are applied for grouping the studied molecules based on the determined characteristics to cross-validate similar attributes in the same group of drugs.

Materials and Methods
e structures subject to the study have been downloaded as .sdf files from PubChem [33]. ey belong to the class aminoalkylindoles, subclasses benzoylindoles, and phenylacetylindoles.
e R software has been used for performing the study. For this aim, the following libraries have been installed: ChemmineR, chemometrics, cluster, factoextra, fingerprint, fmcsR, ggplot2, gridExtra, iqspr, NbClust, rcdk, rJava, rgl, and vegan. ey are necessary for drawing the structures, finding the characteristics of the chemical structures, or determining the appurtenance of the chemical structures to some clusters. e stages of this study are presented in Figure 2. ey are (i) Import the molecule from ChemPub, as SDF files, and visualize them (ii) Visualize the molecule structure and molecular formula (MF), and determine the molar weight filter (MW), the types and numbers of atoms, and the functional groups together with the corresponding frequencies (iii) Compute different descriptors utilizing the library ChemmineOB: (i) HBA1: number of hydrogen bond acceptors 1 (ii) HBA2: number of hydrogen bond acceptors 2 (iii) HBD: number of hydrogen bond donors (iv) logP: logarithmic of partition coefficients (log P) (v) MR: molar refractivity (vi) TPSA: topological polar surface area e hydrogen bonds are among the most significant interactions between the solute and solvent. In binding the ligands, the hydrogen bonds have important contributions to the substrates, antagonists, and agonist recognition [34]. e octanol/water partition coefficient is used to characterize the lipophilicity of the compounds and is defined by the ratio [solute] octanol /[solute] water . Generally, the logarithm of this ratio, logP, is reported [35]. e molar refractivity (MR) measures the steric factor, which is the volume covered by an atom or a group of atoms. It is correlated to the London dispersive forces acting during the drug-receptor interaction [36]. e hydrophilic interaction and the hydrogen bond formation are reflected by the polar surface (PSA) [37], computed by eliminating the area of nonpolar hydrogen, carbon, and halogen atoms from the molecular surface. Determined by the method of Ertl et al. [38], it is referred to as topological PSA (TPSA) [39]. According to the available literature, Turner and Agatonovic-Kustrin [40] showed that most drugs have a molar weight lower than 500 and PSA less than 120Å.
(i) Search the atom pair similarity by cmp.similarity command from ChemmineR. For this aim, the Tanimoto index was utilized [41] due to its performance [26]. For dichotomous variables, it is computed by .
(1) e following relationship exists between the distance and similarity metrics: (ii) Determine the intermolecular distances, using the fmcs function from the fmscR library and the compound similarities. (iii) Clustering the molecules' sets: for this aim, the binning clustering (using cmp.function that implements the Tanimoto and Tversky methods) [41,42], the Jarvis-Patrick algorithm [43], the k-means, and the hierarchical clustering (with the NbClust package) have been utilized [44]. Comparisons are provided.

Results and Discussion
e structures of the molecules from the benzoylindoles group (Group 1) are represented in Figure 3, and those from the phenylacetylindoles (Group 2) are represented in Figure 4, together with the identification number from PubChem.
A similar representation, but showing the number of atoms, is presented in Figure 5, for the molecule 9889172. It contains an automatic numbering added by R software when extracting the information from PubChem: the atoms are numbered consecutively (from 1 to the total number of atoms), based on the following order: elements in the seventh group, from the highest to the lowest atomic number, elements in the sixth group, from the highest to the lowest atomic number, and so on, from left to right in the Mendeleev table. Table 1 contains the molecules formula, the types and numbers of atoms in each molecule from both groups, the molecular weight filters (MW), the number of rings, and the number of aromatic rings retrieved using ChemmineR. e molar weight filters vary in a larger range in the first group by comparison to the second one.
All but the first molecule do not contain F; all the molecules in Group 2 do not contain I. Only one molecule in each group contains Cl. e number of N and O is one or two. e number of C and H atoms is lower in the first group than the second one. e analysis shows that no molecule contains groups RNH2, R2NH, ROPO3, ROH, RCHO, RCOOH, RCOOR, RCCH, and RCN, but all contain rings, most of them aromatic. Table 2 contains the most important descriptors that characterize the molecules. e values of HBA1 are lower for the first group, HBD is absent for both groups, logP is in the range of 3.4594-5.8860 for Group 1 and 4.4975-6.0457 for Group 2 (showing higher hydrophilicity for the second group compared to the first one). TPSA varies in larger limits for Group 1 than for Group 2.
e higher the TPSA is, the higher the drug transport is. In the studied case, the molecule with CID � 56463 has the highest TPSA.   Journal of Chemistry e similarity among the pairs of atoms in different molecules is given in Table 3.
Most similarity coefficients are above 0.5, the best ones corresponding to the pair of CIDs (57507911, 57507905), (988917, 0.8462) in Group 1 and (44397540, 11616723), (44397500, 11616723) in Group 2. e compounds' similarity has been studied using the fmcs function. e output shows the size of the query molecule and that of the target molecule (Query_Size and Target_Size, respectively), the Tanimoto coefficient, and the overlap coefficient, measuring the structures' superposition degree.
For example, in Table 4, the query molecule has CID 9889172, and the target molecule in row three has CID 117587582. For both, the query and target sizes are equal (24 atoms, the number of hydrogen atoms is not counted), the Tanimoto coefficient is 0.92, and the overlap coefficient is 0.9583. e higher the Tanimoto and the overlap coefficients are, the higher the similarity of the structures is. e highest dissimilarity is that between molecule CID 56463 and the other molecules in Group 1. e similarity coefficients in Group 2 are higher than in the first group, as shown in Table 5. Hence, the compounds in the second group have more similar structural characteristics than those in the first group.
e computed similarity coefficients between the pairs of atoms in different molecules from Group 1 and those from Group 2 are shown in Table 6. e similarities of atom pairs from 9889172 (Group 1) with those from 44397500 and 1161723 (Group 2) are higher than those from 57507905 and 56463 (Group 1). ere are high similarities of atoms pairs from 57507911 (Group 1), on the one hand, and 44397540 and 11616723 (Group 2), on the other hand (higher than those with other molecules from Group 1). Similarities above 0.60, which are significant, are noticed for other pairs of molecules belonging to different groups.
A similar result has been obtained using the Tanimoto coefficient for molecules' similarities. Table 7 displays the Tanimoto coefficients and their ranked similarities (from 1 to 7) for pairs of molecules: one from the first group and another from the second group. Blank means a similarity rank over 8.
ere is only one molecule not listed in Table 7(ID 53394099, Group 2), whose similarity with all the molecules in Group 1 is very low.
Based on the structural characteristics, the molecules' clustering has been performed. e results for Group 1 are presented in the following. e binning clustering, with cutoffs (lower similarity bounds) of 0.3, 0.6, and 0.9 resulted in 1, 2, or 7 clusters, respectively (for both groups). e Jarvis-Patrick algorithm provided 2 clusters when considering four neighbors, and a single cluster when considering 5 and 6 neighbors. e optimal number of clusters (3) in the k-means algorithm has been selected by the elbow method.
Running the k-means algorithm resulted in three clusters with the sizes 1, 3, and 3 and between the sum of squares/total sum of squares � 77.0%. e first clusters contain the first three molecules; the second one, the next three molecules; and the last cluster contains molecule CID 56463.
For the second group, the algorithms binning and Jarvis-Patrick algorithms provided a single cluster. e heat map and the dendrogram resulted from the hierarchical clustering are presented in Figure 6 for Group 1 and in Figure 7 for Group 2.
In the heat maps, the darker the color is, the lower the distance between the compounds is, so the higher the similarity is. Hence, the pair of compounds whose corresponding squares are dark green are the most similar. A very low similarity is noticed for the pairs of compounds situated in the bottom right-hand side of both heat maps. e dendrograms (right-hand side of Figures 6 and 7) confirm the similarity strength. e higher the branch between two compounds is, the lower the similarity is.
In Figure 6, the highest similarity is found between the groups of molecules (117587582 and 9889172) and (79507911 and 56841530). ey both have a Tanimoto coefficient of 0.900 and an overlap coefficient of 0.9583 (listed in Table 4 for the first pair, not listed for the second one for the lack of space). From the viewpoint of similarity intensity, the second place is occupied by the pairs of molecules (10226340, 117587582) and (10226340, 9889172), with a Tanimoto coefficient of 0.8519 and the same overlap coefficient.
e first cluster contains all the molecules in Group 2, a result consistent with the previous findings. e first group is divided into two clusters: the first one containing the first three molecules and the second one, the rest. e result is consistent with the dendrogram in Figure 6. Cluster 2 (Figure 9) contains the same elements as the top subtree in Figure 7.

Conclusions
Cheminformatics is a study field that offers many facilities in collecting, organizing, and studying chemical information [45]. Its contribution is essential to drug design [46,47]. In this view, our article focused on some applications of Cheminformatics to analyze the benzoylindoles and phenylacetylindoles groups of drugs. e structures and molecules functional groups have been emphasized using the facilities provided by rcdk and ChemmineR packages. Structural similarities of different molecules have been determined using the Tanimoto index and clustering techniques. A higher similarity in the second group compared to the first one is observed. Further practical studies should confirm the hypothesis of the similar actions and effects of these drugs and the possible cure using the same inhibitors.

Data Availability
Data can be freely downloaded from https://pubchem.ncbi. nlm.nih.gov/ by introducing the compound name in the search window.

Conflicts of Interest
e authors declare that they have no conflicts of interest.