K Nearest Neighbor Algorithm Coupled with Metabonomics to Study the Therapeutic Mechanism of Sendeng-4 in Adjuvant-Induced Rheumatoid Arthritis Rat

As a traditional Mongolian medicine, Sendeng-4 (SD) has been widely used to treat rheumatoid arthritis (RA) in Inner Mongolia and exhibits a good curative effect. Unfortunately, due to geographical factors, it is difficult to popularize this drug throughout the whole country, and the mechanism of action of SD has been unclear. In this study, a serum metabolite profile analysis was performed to identify potential biomarkers associated with adjuvant-induced RA and investigate the mechanism of action of SD. Ultraperformance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UPLC-Q-TOF-MS) was performed for the metabonomics analysis. K nearest neighbor (KNN) models were established in both positive and negative spectra for classifying data from the control, model, and SD administration groups. Accuracy rate for classification was 95.8% in positive ion mode and 91.7% in negative ion mode. Orthogonal partial least squares discriminant analysis (OPLS-DA) enabled the identification of 12 metabolites as potential biomarkers of adjuvant-induced RA. After treatment with SD, the levels of uridine triphosphate, calcitroic acid, dynorphin B (6-9), and docosahexaenoic acid were restored to normal, indicating that SD likely ameliorated RA by regulating the levels of these biomarkers. This study identified early biomarkers of RA and elucidated the underlying mechanism of action of SD, which is worth further investigation for development as a clinical therapy.


Introduction
Rheumatoid arthritis (RA) is an autoimmune inflammatory disease. The main clinical manifestations are chronic, symmetrical, and synovial arthritis and extra-articular disease. RA, a progressive disease, occurs in small joints such as the hand, wrist, and foot and causes joint deformities and loss of joint function [1]. Currently, the treatment of RA mainly relies on Western medicine such as the nonsteroidal drug, diclofenac [2]; antirheumatic drug, methotrexate [3]; and glucocorticoid drug, dexamethasone [4]. These Western medicines are highly efficacious but are accompanied by toxic side effects. In addition, Western medicine can only temporarily relieve or eliminate the pain and cannot cure RA fundamentally.
Mongolian medicine has its own unique theory and method in the treatment of RA and can regulate the physiological activity of the human body from the perspective of allometric function. Mongolian medicine Sendeng-4 (SD) is comprised of Xanthoceras sorbifolia, Toosendan fructus, Gardeniae fructus, and Chebulae fructus at a ratio of 5 : 3 : 1 : 1. SD is mainly used in the treatment of gout, rheumatism, joint grasserie, and edema [5,6]. Although SD has a good curative effect on RA which is not prone to relapse after recovery, the popularization of SD is challenging due to geographical factors and incomplete understanding about its mechanism of action.
Metabonomics is the study of how endogenous metabolites change in organisms, and it has been widely used to investigate the mechanism of action of drugs [7][8][9]. As an emerging approach in metabonomics, the K nearest neighbor (KNN) classification algorithm is one of the simplest methods in data mining classification. "K nearest neighbors" means that each sample can be represented by its K nearest samples. The KNN classification model is simple and effective: given a test sample to be classified, the most similar known samples are found by comparison, and the category of the test sample is judged from the categories of those known samples [10,11]. The traditional metabonomics method is principal component analysis (PCA). PCA discards a considerable amount of raw data: the cumulative contribution rate of the first few principal components must be high, or else the model is not qualified. KNN does not discard any raw data, which gives it an advantage over traditional statistical methods in the classification of multidimensional data. At present, KNN classification has been widely used in molecular biology research on RA. Inhibition of TNF-alpha converting enzyme (TACE) is one of the most direct and effective therapies for RA; therefore, the screening of TACE inhibitors has become a very important task. KNN was used to classify TACE inhibitors and noninhibitors, and the KNN model gave a classification accuracy of 98.32% [12]. Peripheral blood gene expression (PBGE) profiles were used to predict disease severity in early RA patients based on KNN classification, and the results showed that KNN effectively predicted RA severity [13]. The bone lesion volume (ΔBLV) in RA patients' hands can be used to evaluate RA progression. One study demonstrated that the combination of a multispectral (MS) MRI analysis method and KNN classification provided a quantitative tool for ΔBLV [14]. So far, work on RA metabonomics coupled with KNN classification has not been reported. In this study, the KNN algorithm combined with the metabonomics method was applied to classify data from different groups. Potential biomarkers were identified using the OPLS-DA model. Our study investigated the mechanism of SD in RA treatment by studying the metabolic pathways of the potential biomarkers.
Evidence-Based Complementary and Alternative Medicine

Adjuvant-Induced Arthritis Model Establishment and Treatment.
The study was approved by the ethics committee of the Affiliated Hospital of Inner Mongolia University for the Nationalities (NMMZDX2017[K]0013). Male Wistar rats (200 ± 10 g) were provided by YiSi Laboratory Animal Technology Co., Ltd. (Changchun, China). All animals were reared under standard conditions (21 ± 2 °C, 14 hours of light daily) with free access to rodent chow and water in the Affiliated Hospital and were allowed to acclimatize in metabolism cages for 1 week prior to the experiment. The rats were divided into three groups: control, model, and SD administration groups (CG, MG, and SG, respectively), with eight rats in each group. On Day 1, the rats in MG and SG were intradermally injected with 0.1 mL of complete Freund's adjuvant (CFA) in the right posterior toe, and the rats in CG were injected with 0.1 mL of saline. After 7 days, the rats in MG and SG were injected with 0.1 mL of CFA again. From Day 14, the rats in SG were administered SD at a dose of 0.43 g/kg/day for 35 consecutive days, and on Day 49, all the rats were euthanized. Blood was collected from the hepatic portal vein and centrifuged at 3500 rpm for 10 min at 4 °C. The supernatants were immediately frozen, stored at −20 °C, and thawed before analysis. Arthrodial cartilage was fixed in 10% formaldehyde for paraffin embedding.

Biochemical and Histological Analysis.
Arthrodial cartilage was cut and processed for hematoxylin and eosin (H&E) staining. H&E staining was performed to examine the pathological changes of arthrodial cartilage, and the specimens were observed by microscopy. The levels of SOD, OH•, TNF-α, and IL-6 were measured using a Multiskan FC Microplate Reader (Fisher Scientific, UK).

Serum Sample Preparation.
The serum samples were thawed before analysis, and 100-μL aliquots were added to 400 μL of acetonitrile, followed by vortexing for 30 s and centrifugation at 12000 rpm for 10 min at 4 °C. The supernatant was subsequently filtered through a 0.22-μm filter membrane.

UPLC-MS Conditions.
A Waters Acquity UPLC system coupled with a Q-TOF Xevo G2-S high-definition mass spectrometer (Waters, UK) was used for the metabonomics analysis. The Waters Acquity UPLC BEH C18 column (1.7 μm, 2.1 mm × 50 mm, Waters, USA) was maintained at 40 °C with a flow rate of 0.4 mL·min⁻¹ for the separation. The mobile phases were 0.1% formic acid in deionized water (A) and methanol (B). The gradient elution was performed according to the following schedule: 8-80% B for 0-3 min, 80-100% B for 3-6 min, 100% B for 6-8 min, and 100-8% B for 8-9 min, followed by maintaining at 8% B for 2 min. The sample injection volume was 10 μL.
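The gradient schedule above can be written as a small lookup, which is handy when checking the solvent composition at a given retention time. This is a minimal sketch: it assumes linear ramps between the stated breakpoints (the text does not specify the curve shape), and the names `GRADIENT_TIMES`, `GRADIENT_PERCENT_B`, `percent_b`, and `percent_a` are illustrative, not taken from the instrument method.

```python
import numpy as np

# Breakpoints taken from the schedule in the text: time (min) and % methanol (B).
# 8-80% B (0-3 min), 80-100% B (3-6 min), 100% B (6-8 min),
# 100-8% B (8-9 min), then 8% B held for 2 min (9-11 min).
GRADIENT_TIMES = np.array([0.0, 3.0, 6.0, 8.0, 9.0, 11.0])
GRADIENT_PERCENT_B = np.array([8.0, 80.0, 100.0, 100.0, 8.0, 8.0])


def percent_b(t_min):
    """Return % B at time t_min, ASSUMING linear ramps between breakpoints."""
    return float(np.interp(t_min, GRADIENT_TIMES, GRADIENT_PERCENT_B))


def percent_a(t_min):
    """Return % A (0.1% formic acid in water); A and B sum to 100%."""
    return 100.0 - percent_b(t_min)


# Composition halfway through the first ramp and during the final hold.
mid_first_ramp = percent_b(1.5)
final_hold = percent_b(10.0)
```

Under the linear-ramp assumption, the whole run lasts 11 min, and times outside the 0-11 min window are clamped to the first or last composition.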
For the UPLC-high-definition MS (HDMS) analysis, the optimal conditions in the positive ion mode were source and desolvation gas temperatures of 100 °C and 400 °C, respectively. Nitrogen was used as the cone and desolvation gas at flow rates of 50 and 800 L/h, respectively. The capillary, cone, and offset voltages were set at 3.0 kV, 40 V, and 80 V, respectively. In the negative ion mode, the source and desolvation gas temperatures were 80 °C and 150 °C, respectively. Nitrogen was the cone and desolvation gas at flow rates of 0 L/h and ion modes with a lock spray interface. The data were collected in the continuum mode, and the lock spray frequency was set at 10 s.

Data Analysis.
A pooled quality control (QC) sample was prepared by mixing aliquots (20 μL) of each sample to monitor instrument stability. Every day, after the instrument was calibrated, five QC samples were analyzed to test the stability of the instrument. The MassLynx V4.1 software was used for peak detection and alignment. After recognition and alignment, all the data acquired were normalized to the total ion intensity of each chromatogram.
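The normalization step can be illustrated as follows. This is a sketch only: it assumes the data are arranged as a samples × peaks matrix of peak areas and that "total ion intensity" means the sum of all peak areas in a sample. The function name `normalize_to_total_intensity` and the example values are hypothetical and are not the MassLynx implementation.

```python
import numpy as np


def normalize_to_total_intensity(peak_areas):
    """Scale each sample (row) so that its peak areas sum to 1.

    peak_areas: 2-D array-like, shape (n_samples, n_peaks).
    """
    peak_areas = np.asarray(peak_areas, dtype=float)
    totals = peak_areas.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("each sample needs a positive total ion intensity")
    return peak_areas / totals


# Hypothetical matrix: 2 samples x 3 peaks (values are illustrative only).
raw = np.array([[10.0, 30.0, 60.0],
                [5.0, 5.0, 10.0]])
normalized = normalize_to_total_intensity(raw)
```

After this step, each row expresses a peak as a fraction of that sample's total signal, so samples with different overall intensities become comparable.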
The data matrix was established by aligning the peaks with the exact mass/retention time pair from each data file in the data set with their associated normalized peak areas. Then, the data matrix was analyzed using pattern recognition. Matlab 2012 software was applied for classification of data based on the KNN algorithm. EZinfo 2.0 software was used for OPLS-DA, while an independent-samples t-test was performed using the Statistical Package for the Social Sciences (SPSS) version 17.0. HemI software was used to analyze the heatmap of the metabolites [15]. The Human Metabolome Database (HMDB), METLIN, and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabonomics databases were used to identify potential biomarkers. The tandem MS (MS/MS) spectra were compared with MS/MS information from these databases to verify the structure of the putative metabolites.

Biochemistry and Histology.
Figure 1 shows the serum biochemical parameters derived from each group. SOD is an important antioxidant enzyme that can scavenge free radicals in organisms. As indicated in Figure 1(a), the content of SOD decreased and the content of OH• increased in MG compared to those in CG, meaning that the antioxidant system in RA rats was compromised. After SD administration, the contents of SOD and OH• were restored, indicating effective antioxidant activity for SD.
Inflammatory mediators TNF-α and IL-6 play an important role in the pathogenesis of RA. They are produced by infiltration of immune cells and excessive secretion of matrix metalloproteinases (MMPs). These mediators have a strong erosive effect on cartilage and bone joints [16]. As shown in Figure 1(b), the contents of TNF-α and IL-6 increased significantly in MG compared to those in CG. However, in SG, the contents of TNF-α and IL-6 decreased compared to those in MG, indicating that SD has an immunosuppressive effect. Figure 2(a) shows the arthrodial cartilage in CG, and a large number of pannus (yellow circle in Figure 2(b)) can be seen in MG. Pannus is formed by plasma cells, macrophages, and lymphocytes and can release immunoglobulins and rheumatoid factor (RF). Pannus not only hinders the bone's ability to obtain nourishment through the synovial membrane but also releases a variety of inflammatory mediators and proteolytic enzymes that erode the articular cartilage, subchondral bone, ligament, and tendon tissue, causing destruction of articular cartilage, subchondral osteolysis, joint capsule relaxation damage, joint dislocation, joint fusion, and ossification [17]. After administration of SD (Figure 2(c)), the number and volume of panni decreased in SG, indicating that SD alleviated the symptoms of RA in rats.

Application of KNN Algorithm for Classification.
The metabolite characteristics were investigated using positive and negative ion modes. The serum base peak intensity (BPI) chromatograms of the CG, MG, and SG in the positive and negative ion modes are shown in Figure 3. Despite the obvious differences among the chromatograms, the multivariate analysis distinguished the three groups more accurately than other analyses.
In this experiment, the KNN algorithm was used for classification of metabonomics data. Each group of data in the vector space model was expressed as a vector. As shown in Figure 4, the calculation of the similarity (distance) between two sets of data was converted into a calculation on the corresponding vectors.
Methods for calculating similarity include the Euclidean distance, the vector inner product, and the included angle cosine. In the formulas below, $d_1$ and $d_2$ represent two different omics data vectors, $w_{ij}$ represents the weight value of the $j$th feature of the $i$th omics data vector, $n$ is the number of features, and $\theta$ represents the angle between the two omics data vectors ($i$ can be seen as one sample, and $j$ can be seen as one biomarker in this sample).

Euclidean Distance
$$D(d_1, d_2) = \sqrt{\sum_{j=1}^{n} \left(w_{1j} - w_{2j}\right)^2}$$
When the calculated value is smaller, the distance between the two omics data vectors is smaller, and the similarity of the two omics data is greater.

Vector Inner Product
$$\mathrm{Sim}(d_1, d_2) = d_1 \cdot d_2 = \sum_{j=1}^{n} w_{1j}\, w_{2j}$$
where $w_{1j}$ and $w_{2j}$ are weights representing the corresponding feature in the omics data vectors $d_1$ and $d_2$. The inner product represents the projection of one vector onto another. When the inner product is used as the similarity formula, a larger inner product means a greater similarity between the two omics data.

Included Angle Cosine
$$\mathrm{Sim}(d_1, d_2) = \cos\theta = \frac{\sum_{j=1}^{n} w_{1j}\, w_{2j}}{\sqrt{\sum_{j=1}^{n} w_{1j}^2}\,\sqrt{\sum_{j=1}^{n} w_{2j}^2}}$$
where $w_{1j}$ and $w_{2j}$ are weights representing the corresponding feature in the omics data vectors $d_1$ and $d_2$. If the angle between the two vectors is smaller and the cosine value is larger, it is likely that the two groups of data belong to the same category. Conversely, if the cosine value is smaller, the two groups of data are not likely to belong to the same category. In this paper, we use the angle cosine algorithm to calculate the similarity between two sets of metabonomics data.
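The three similarity measures named above (Euclidean distance, vector inner product, and included angle cosine) can be sketched in a few lines. This is an illustrative Python version using the standard textbook definitions, not the authors' Matlab code; the function names and the two example vectors are assumptions made for the demonstration.

```python
import numpy as np


def euclidean_distance(d1, d2):
    """Smaller value means the two feature vectors are more similar."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    return float(np.sqrt(np.sum((d1 - d2) ** 2)))


def inner_product(d1, d2):
    """Larger value means the two feature vectors are more similar."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    return float(np.dot(d1, d2))


def cosine_similarity(d1, d2):
    """Cosine of the angle between the vectors; 1.0 means same direction."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    norm_product = np.linalg.norm(d1) * np.linalg.norm(d2)
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(d1, d2) / norm_product)


# Hypothetical feature vectors: b has the same profile as a at twice the intensity.
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]

distance_ab = euclidean_distance(a, b)   # sqrt(1 + 4 + 4) = 3
inner_ab = inner_product(a, b)           # 2 + 8 + 8 = 18
cosine_ab = cosine_similarity(a, b)      # 18 / (3 * 6) = 1
```

The example shows one practical difference between the measures: the two vectors share the same relative profile, so the cosine is 1 even though the Euclidean distance is nonzero. The cosine compares the shape of a metabolite profile rather than its overall magnitude.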
In the KNN-based metabolomics data classification study, cross-validation was applied to evaluate the KNN classification model. Part of the samples are used as training data, and the rest are used as testing data to calculate the classification accuracy rate. This process is repeated until every sample has been predicted once and only once. The purpose of cross-validation is to obtain a reliable and stable model. Four-fold cross-validation was used in this study. The data set was divided into four subsets, and the verification experiment was performed four times. In each experiment, one subset was used to test the model, while the remaining three subsets were used to train the model.
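The procedure described above, KNN with cosine similarity evaluated by 4-fold cross-validation, can be sketched as follows. This is a Python illustration rather than the Matlab 2012 code used in the study. The value of K (`k=3`), the random fold assignment, the seed, and the synthetic 24-sample data set are all assumptions, because the text does not report these details.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def knn_predict(train_x, train_y, test_x, k=3):
    """Assign each test sample the majority label of its k most similar
    training samples (largest cosine similarity)."""
    sims = cosine_similarity_matrix(test_x, train_x)
    predictions = []
    for row in sims:
        nearest = np.argsort(row)[::-1][:k]          # indices of top-k similarities
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


def cross_validated_accuracy(x, y, n_folds=4, k=3, seed=0):
    """Each sample is held out exactly once; returns overall accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = knn_predict(x[train_idx], y[train_idx], x[test_idx], k)
        correct += int(np.sum(pred == y[test_idx]))
    return correct / len(y)


# Synthetic stand-in for the study design: 3 groups x 8 samples, 3 features.
# Labels 0, 1, 2 play the role of CG, MG, SG; the values are NOT study data.
rng = np.random.default_rng(0)
centers = np.array([[10.0, 1.0, 1.0], [1.0, 10.0, 1.0], [1.0, 1.0, 10.0]])
y = np.repeat([0, 1, 2], 8)
x = centers[y] + rng.normal(0.0, 0.1, size=(24, 3))

accuracy = cross_validated_accuracy(x, y, n_folds=4, k=3)
```

With 24 samples and four folds, each fold tests 6 samples against a model built from the other 18, so the overall accuracy is a count out of 24, as with the accuracy rates reported for this study.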
Ultimately, two KNN models were established with Matlab 2012 software (one model for positive ion mode, the other for negative ion mode). The data matrix contained 3253 dimensional data points in positive ion mode and 1045 dimensional data points in negative ion mode. Twenty-four samples were assigned to three categories. The classification accuracy rates were 95.8% (23/24) in positive ion mode and 91.7% (22/24) in negative ion mode.
In Figures 5(c) and 5(d), the variables that contributed highly to the differentiation were situated at the edges of the plots. The spots that showed VIP values > 1.0 and p < 0.05 were considered to represent potential biomarkers. We identified 12 metabolites that showed significant differences between the CG and MG as potential biomarkers. These metabolites consisted of four and eight points selected based on the data of the positive and negative mode analyses, respectively. Among these metabolites, the levels of docosahexaenoic acid, stearic acid, eicosenoic acid, MG (0:0/14:0/0:0), calcitroic acid, uridine triphosphate, and guanosine diphosphate in the MG were higher than the levels in the CG. In addition, the levels of dynorphin B (6-9), guanosine pentaphosphate adenosine, bilirubin glucuronide, LysoPE (0:0/24:0), and alpha-tocotrienol in the MG were lower than those in the CG. However, after treatment with SD, uridine triphosphate, calcitroic acid, dynorphin B (6-9), and docosahexaenoic acid returned to normal levels. This observation indicates that SD may prevent the pathological process in RA rats by regulating the disturbed metabolic pathways of these four potential biomarkers. The heatmap of the potential biomarkers is shown in Figure 6. All the biomarker information is summarized in Table 1.
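The selection rule for potential biomarkers (VIP values > 1.0 combined with a significance threshold of 0.05, read here as the p-value of the independent-samples t-test) amounts to a two-condition filter. The sketch below assumes the VIP scores and p-values have already been exported from the OPLS-DA and t-test software. The function name, feature names, and numbers are hypothetical and are not data from the study.

```python
import numpy as np


def select_candidate_biomarkers(names, vip, p_values, vip_cutoff=1.0, alpha=0.05):
    """Keep features whose VIP exceeds vip_cutoff AND whose p-value is below alpha."""
    vip = np.asarray(vip, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    keep = (vip > vip_cutoff) & (p_values < alpha)
    return [name for name, flag in zip(names, keep) if flag]


# Hypothetical values for illustration only.
features = ["feature_A", "feature_B", "feature_C", "feature_D"]
vip_scores = [1.8, 0.6, 1.2, 2.4]
p_vals = [0.01, 0.001, 0.20, 0.04]

selected = select_candidate_biomarkers(features, vip_scores, p_vals)
```

Requiring both conditions means that a feature with a strong multivariate contribution but no significant univariate difference (`feature_C`), or the reverse (`feature_B`), is not retained.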

Biological Relevance.
Uridine triphosphate (UTP) is the raw material for RNA synthesis. The combination of UTP and vitamin B12 has been reported to have a significant effect on the treatment of compressive neuralgias. Compared with vitamin B12 administration alone, the combination of UTP and vitamin B12 had more statistically significant advantages in alleviating pain in patients with no serious adverse events during the study period [18]. Dysregulation of adipogenesis of bone marrow-derived stromal cells (BMSCs) and osteogenesis can lead to osteoporosis. After activating the P2Y2 receptor, UTP can retard the progression of osteoporosis through regulating the osteogenic and adipogenic differentiation of BMSCs [19]. Therefore, the dysregulation of UTP in the MG is likely to result in osteoarthropathy.
Calcitroic acid is the metabolite of 1,25(OH)₂D₃, which is the active form of vitamin D. 1,25(OH)₂D₃ has many physiological functions such as increasing the absorption of calcium and phosphorus, promoting growth and bone calcification, maintaining normal levels of citrate in blood, and preventing the loss of amino acid in renal metabolism [20]. The immune response to antigens derived from Aspergillus fumigatus can cause the allergic disease, allergic bronchopulmonary aspergillosis (ABPA), which is accompanied by an increased interleukin-13 response in blood CD4+ T cells. 1,25(OH)₂D₃ can inhibit this allergic response and improve the ABPA patient's condition [21]. In addition, 1,25(OH)₂D₃ has immunomodulatory and anti-inflammatory activity. Research shows that 1,25(OH)₂D₃ plays a crucial role in the progression of many autoimmune diseases such as rheumatoid arthritis, multiple sclerosis (MS), and osteoporosis [22]. Numerous studies have shown that the lack of sunlight and 1,25(OH)₂D₃ will lead to MS; thus, 1,25(OH)₂D₃ supplementation is a very effective treatment for MS patients [23]. Osteoporosis is highly correlated with atherosclerosis. The parallel progression of the two diseases increases coronary and fracture risks. 1,25(OH)₂D₃ deficiency can greatly increase the risk of fracture and result in secondary hyperparathyroidism and coronary artery calcification [24]. The content of calcitroic acid increased significantly in the MG, indicating that the content of 1,25(OH)₂D₃ decreased, which reduces the regulatory ability of the immune system.
Dynorphin B (DYN) is an endogenous opioid peptide (EOP) widely distributed in the central nervous system and peripheral nervous system. EOPs are selective for different receptors, and DYN is a κ-type receptor ligand. Peripheral inflammation of RA causes an increase in the number of immune cells (T cells, B lymphocytes, macrophages, monocytes). The immune cells that migrate to the inflammatory region can synthesize large amounts of EOP. Meanwhile, inflammatory responses promote the synthesis of EOP receptors in the dorsal root ganglion (DRG) and transport these receptors to the peripheral nerve tip of the inflammatory region. After EOP activates the receptors, they can act as the analgesic for RA [25,26]. Studies have shown that in synovial tissue and immune cells of RA rats, EOPs were released in large quantities, and the expression of their receptors increased in peripheral nerve tips. EOPs and their receptors also participate in the regulation of chronic inflammation [27]. Studies have found that injecting EOP receptor agonists in the knee reduces pain in patients with joint pain, while injecting EOP receptor antagonists in the knee after surgery increases the pain of the joints [28,29]. EOP may act on the EOP receptors in immune cells and regulate the function of immune cells, which eventually participate in the regulation of inflammatory factors. Levels of EOP are associated with the concentrations and mRNA expression of inflammatory factors such as interleukin-1 (IL-1), interleukin-6 (IL-6), and tumor necrosis factor-α (TNF-α). These inflammatory factors can cause pain in the joints, fever, and edema. By utilizing the negative feedback regulation of inflammatory factors and decreasing the excitability of the sensory nerve, EOP regulates inflammatory pain [30]. The content of DYN significantly decreased in the MG, suggesting that it was complexed with the EOP receptor to inhibit inflammation caused by RA.
Docosahexaenoic acid (DHA) is a necessary polyunsaturated fatty acid for the human body. DHA has a variety of biological activities such as assisting brain cell development, slowing aging, improving blood circulation, and reducing blood lipids. In addition, DHA may protect against RA. It was reported that DHA could inhibit the proliferation and differentiation of bone marrow-derived macrophages (BMMs) and induce the apoptosis of mature osteoclasts. Eventually, DHA led to a reduction in the number of bone-resorptive cells [31]. DHA can generate a new type of bioactive lipid mediator through biological derivatization. These endogenous mediators can act on specific G protein-coupled receptors (GPCRs) to inhibit inflammation [32]. The content of DHA significantly increased in the MG, indicating that the body showed a stress response to the inflammation caused by RA.
Dysregulation of UTP, calcitroic acid, DYN, and DHA levels could lead to a series of changes in related metabolic pathways in the body. However, after administration of SD, their levels were close to normal, indicating that SD successfully ameliorated RA by regulating the levels of these four potential biomarkers.

Conclusions
In summary, SD showed a good therapeutic effect on RA rats induced by adjuvant. The KNN algorithm was used to classify metabonomics data and achieved a high classification accuracy in both positive and negative spectra. The results illustrated that the KNN model in this study was reliable. In total, 12 potential biomarkers were identified, of which UTP, calcitroic acid, DYN, and DHA were considered correlated with the therapeutic effect of SD in RA rats. Further investigation is required to determine the relationship between these four biomarkers and to infer the complete metabolic pathways of SD for RA treatment.

Conflicts of Interest
The authors declare that they have no conflicts of interest.