A Urine Metabonomics Study of Rat Bladder Cancer by Combining Gas Chromatography-Mass Spectrometry with Random Forest Algorithm

A urine metabolomics study based on gas chromatography-mass spectrometry (GC-MS) and multivariate statistical analysis was carried out to distinguish the stages of rat bladder cancer. Urine samples were collected from animal models at different stages, i.e., the early, medium, and advanced stages of the bladder cancer model group, and from a healthy group. After urea was decomposed with urease, the urine samples were extracted with methanol and then derivatized with N,O-bis(trimethylsilyl)trifluoroacetamide and trimethylchlorosilane (BSTFA + TMCS, 99 : 1, v/v) before being analyzed by GC-MS. Three classification models, i.e., healthy control vs. early- and middle-stage groups, healthy control vs. advanced-stage group, and early- and middle-stage groups vs. advanced-stage group, were established with the random forests (RF) algorithm to analyze the experimental data. The classification results showed that combining the random forest algorithm with metabolite features effectively revealed the differences caused by disease progression. Glyceric acid, 2,3-dihydroxybutanoic acid, N-(oxohexyl)-glycine, and D-turanose contributed most to the classification of the different groups. Pathway analysis showed that these metabolites are related to starch and sucrose metabolism; glycine, serine, and threonine metabolism; and galactose metabolism. These results suggest that urine metabolomics is an effective approach for disease diagnosis.


Introduction
Bladder cancer (BC) is a common malignant tumor of the urinary tract, and its incidence and mortality rank first among tumors of the urogenital system. Because BC relapses easily, researchers have focused on finding tumor markers for early diagnosis and postoperative evaluation to improve the survival rate of bladder cancer patients [1].
Metabonomics has been widely used in research on disease diagnosis [2], pharmacological [3] and toxicological mechanisms [4], plant and microorganism metabolism [5,6], and so on [7]. Two analytical approaches are widely used in metabonomics. One is based on nuclear magnetic resonance (NMR) [8][9][10], and the other is based on chromatography-mass spectrometry [11][12][13]. NMR offers fast analysis and simple sample preparation, but it suffers from low sensitivity [14]. Chromatography-mass spectrometry mainly comprises gas chromatography-mass spectrometry (GC-MS) [15,16], liquid chromatography-mass spectrometry (LC-MS) [17][18][19], and capillary electrophoresis-mass spectrometry (CE-MS) [20][21][22]. Among these technologies, GC-MS has been widely used in metabonomics studies owing to its high sensitivity, strong analytical capability, and more mature commercial mass spectral libraries [23]. The samples used in metabonomics are commonly biological fluids, such as urine [24,25], serum [26,27], interstitial fluid [28], and cerebrospinal fluid [29]. Because of their weak volatility and poor thermal stability, many analytes in biological fluids, such as amino acids and organic acids, must be derivatized before GC-MS analysis. The development of modern derivatization technology has greatly promoted the application of GC-MS in metabonomics [30,31].
Several studies have sought biomarkers for BC in recent years [32][33][34][35][36]. Pasikanti et al. [37] developed a noninvasive method for the diagnosis and surveillance of BC progression using GC × GC-MS. Peng et al. [38] reported a chemical isotope labeling LC-MS metabolomics method based on the universal metabolome-standard method, with low CV for all quantified metabolites. This method was used to screen potential biomarkers in urine samples for bladder cancer diagnosis. However, current research on BC diagnosis focuses mostly on high-grade tumors. Discovering biomarkers that characterize the different stages of BC would be more useful for the diagnosis and prognosis of bladder cancer [39]. Alberice et al. [40] followed up 48 urothelial bladder cancer patients using urine metabolomics and highlighted 27 metabolites that differed between BC stages and recurrence. However, following up BC patients is difficult, and collecting early-stage BC samples is especially hard because most BC patients are already at a high stage when diagnosed. Therefore, in this study, a rat bladder cancer model was established, and urine metabonomics was studied with GC-MS. Rat urine samples from the early, medium, and advanced stages of the bladder cancer model group and from the healthy group were analyzed to establish the urine metabolic fingerprint of bladder cancer. The experimental data were then analyzed with the random forests algorithm [41,42] and used for a preliminary exploration of tumor markers of BC.

Rat Bladder Cancer Model.
All animal experiments were conducted according to the institutional guidelines of Medical College of Nanchang University. One hundred and eighty male SD rats (6 weeks old) weighing 150-160 g were purchased from Hunan Experimental Animal Co. LTD. (Hunan, China), with the license number SCXK (Hunan) 2009-0004 and the qualified number HNASLKJ20102985. The rats were randomly assigned to two groups: forty-five rats to the control group and one hundred and thirty-five rats to the model group. After a one-week adaptation period in the experimental animal room, bladder tumors were induced by adding 0.05% BBN (N-butyl-N-(4-hydroxybutyl) nitrosamine) (Tokyo, Japan) to freely available drinking water. BBN was administered continuously for 35 weeks.

Histologic Examination.
After BBN induction began, three rats of the BC model group were randomly selected for histological examination every five weeks. The rats were anesthetized by intraperitoneal injection of ketamine (0.6 mL/50 g) and killed under deep anesthesia. After death, the rats were catheterized, and 0.2 mL of buffered formalin was instilled into the bladder. The urethras were then ligated, and the bladders were removed. Fixed specimens were embedded in paraffin, and 4-5 μm thick horizontal slices from each rat were prepared at 2 mm intervals, followed by routine hematoxylin and eosin staining. Then, the prepared pathological sections were examined by optical microscopy under ordinary white light by the same pathologists.

Sample Preparation.
The rat urine samples of the control group and model group were collected with a metabolism cage for 24 h every five weeks. The urine samples were centrifuged immediately for 20 min at 4000 r/min to remove protein with a TDL-5-A low-speed tabletop centrifuge (Shanghai Anting Scientific Instrument Factory, China). Then, 150 μL of supernatant was mixed with 100 μL of urease solution (2 mg/mL) in a 1 mL centrifuge tube and heated at 37°C in a DGG-9140BD constant temperature oven (Shanghai Senxin Experimental Instrument Co., Ltd., China) for 30 min to decompose urea. After 800 μL of methanol was added, the mixture was homogenized for 1 min at 1800 r/min with a MS2 mini shaker (Guangzhou Yike Lab Technology LTM Co., China), sonicated in an ice bath for 10 min with a KQ-100DE ultrasonic cleaner (Kunshan Ultrasonic Instruments Co., Ltd., China), and centrifuged for 10 min at 12000 r/min with a TDL-16G high-speed tabletop centrifuge (Shanghai Anting Scientific Instrument Factory, China). Then, 500 μL of supernatant was transferred into a 2 mL centrifuge tube and evaporated to dryness under a gentle nitrogen stream, and 75 μL of methoxyamine hydrochloride solution (20 mg/mL in pyridine) was added to the tube to react for 1 h at 70°C. After the reaction, the mixture was cooled to room temperature and reacted with 75 μL of BSTFA + TMCS (99 : 1, v/v) for 1 h at room temperature to form trimethylsilyl (TMS) derivatives. Finally, the reaction was terminated by adding 150 μL of n-heptane (containing 0.1 g/L n-docosane as the internal standard), and the derivatization products were analyzed by GC-MS.

GC-MS Analysis.
GC-MS analysis was carried out using an Agilent 6890N Gas Chromatograph (Agilent Technologies, Palo Alto, CA, USA) integrated with an Agilent 7683 series autosampler and a 5973 I mass selective detector (MSD). The analytes were separated on a 30 m × 0.25 mm i.d. × 0.25 μm film thickness DB-5MS fused-silica capillary column (Agilent Technologies). The injector was set at 270°C, and the carrier gas was UHP helium at a flow rate of 1.0 mL/min. The samples were injected in splitless mode. The oven temperature was held initially at 70°C for 5 min, increased at 20°C/min to 160°C and held for 4 min, and then increased at 10°C/min to 300°C and held for 1.5 min. The ion source, quadrupole, and transfer line temperatures were set at 230°C, 150°C, and 280°C, respectively. The detector was operated at 70 eV in electron impact (EI) mode with full scan (60-600 amu). The solvent delay and injection volume were set at 5 min and 2 μL, respectively. All data were collected and analyzed with MSD ChemStation D.01.02 software (Agilent Technologies) and the NIST02 mass database.

Selection of Extraction Solvent.
Rat urine contains many kinds of endogenous metabolites that differ widely in polarity and span a wide concentration range. To obtain more information on endogenous metabolites, the extraction efficiencies of different organic solvents were investigated. Methanol and acetonitrile performed better than the others. Methanol was finally chosen as the extraction solvent in this study because it is less toxic than acetonitrile.

Optimization of the Urea Decomposition Conditions.
The urine should be treated with urease to decompose urea before GC-MS analysis, since the high concentration of urea in urine may interfere with the analysis of other compounds. The effects of several factors, including the dosage of urease, the decomposition temperature, and the decomposition time, were investigated. Urease solution (2 mg/mL; 50 μL, 100 μL, or 200 μL) was added to a 150 μL urine sample and reacted at 60°C, 37°C, or 20°C for 15 min, 30 min, or 60 min, and the decomposition effect was then examined. The results are listed in Supplementary Materials. Finally, the optimum conditions for decomposing urea in a 150 μL urine sample were set as follows: dosage of urease solution (2 mg/mL), 100 μL; decomposition temperature, 37°C; and decomposition time, 30 min.
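The condition screen described above can be enumerated in a few lines. This is only a sketch: it assumes a full factorial design (every dosage at every temperature and time), which the text does not state, and the variable names are illustrative.

```python
from itertools import product

# Levels reported in the text for each factor.
urease_volumes_uL = [50, 100, 200]
temperatures_C = [60, 37, 20]
times_min = [15, 30, 60]

# Assumption: a full factorial screen; the study may have varied one factor at a time.
conditions = list(product(urease_volumes_uL, temperatures_C, times_min))

# Optimum reported in the text: 100 uL urease solution, 37 C, 30 min.
optimum = (100, 37, 30)

print(len(conditions))
print(optimum in conditions)
```

Under the full factorial assumption, the reported optimum is one point in a grid of 3 × 3 × 3 combinations.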

Optimization of the Multistep Temperature Program.
To obtain better separation and more information on endogenous metabolites, the four multistep temperature programs listed below were tested on a DB-5MS capillary column in this study. Considering the number of chromatographic peaks and their separation efficiency, the third temperature program was selected as the experimental condition. The total ion chromatogram (TIC) of an actual sample is shown in Figure 1; more than 40 peaks were obtained in 29 min with good separation efficiency.
(1) The temperature was initially at 85°C for 5 min, increased at a rate of 10°C/min up to 300°C, and held for 10 min
(2) The temperature was initially at 100°C for 3 min, increased at a rate of 8°C/min, and held for 2 min
(3) The temperature was initially at 70°C for 5 min, increased at a rate of 20°C/min up to 160°C, held for 4 min, increased at a rate of 10°C/min up to 300°C, and held for 1.5 min
(4) The temperature was initially at 85°C for 5 min, increased at a rate of 8°C/min up to 205°C, held for 5 min, increased at a rate of 8°C/min up to 300°C, and held for 5 min
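The total run time of each oven program follows from its hold times and ramp rates. The sketch below computes it for programs (1), (3), and (4); program (2) is left out because its final temperature is not given in the list. The function name and tuple layout are illustrative choices, not part of the original method.

```python
def program_runtime(initial_temp, initial_hold, ramps):
    """Total run time (min) of a multistep oven temperature program.

    ramps: list of (rate in C/min, target temperature in C, hold in min).
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp time plus hold time
        temp = target
    return total

# Programs taken from the list above: (initial temp, initial hold, ramps).
programs = {
    1: (85, 5, [(10, 300, 10)]),
    3: (70, 5, [(20, 160, 4), (10, 300, 1.5)]),
    4: (85, 5, [(8, 205, 5), (8, 300, 5)]),
}

for number, (t0, hold0, ramps) in programs.items():
    print(number, program_runtime(t0, hold0, ramps))
```

For program (3) the calculation gives 5 + 4.5 + 4 + 14 + 1.5 = 29 min, which agrees with the 29 min chromatogram reported for the selected program.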

Urine Sample Analysis.
The different stages of bladder cancer in the rats were confirmed by histopathological analysis. Because the rat bladders at the fifteenth, twenty-fifth, and thirty-fifth weeks after the start of BBN induction displayed the typical characteristics of early-, medium-, and advanced-stage tumors (shown in Figure 2), the urine samples collected in these three weeks were chosen as the early-, medium-, and advanced-stage samples, respectively. The early-, medium-, and advanced-stage bladder cancer model groups and the healthy group included 45 samples each, and all samples were analyzed by GC-MS under the optimized conditions, yielding 41 common peaks. To identify the complex metabolites in urine, the metabolites were divided into three categories: amino acids, carbohydrates, and fatty acids. Because the NIST database is rich in information on fatty acid derivatives but poor in information on amino acid and carbohydrate derivatives, the fatty acid derivatives were identified directly with the NIST database, whereas the amino acid and carbohydrate derivatives were identified with standard compounds. The quantitative results were expressed as the ratio of the peak area of the analyte to that of the internal standard, n-docosane. Finally, all 41 common peaks were identified successfully, and the qualitative and quantitative results are listed in Table 1.
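The internal-standard normalization described above amounts to dividing each analyte peak area by the n-docosane peak area from the same run. A minimal sketch follows; the peak areas are hypothetical values chosen for illustration and are not taken from Table 1.

```python
def relative_response(peak_areas, internal_standard="n-docosane"):
    """Return the analyte-to-internal-standard peak area ratio for each analyte."""
    is_area = peak_areas[internal_standard]
    if is_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return {
        name: area / is_area
        for name, area in peak_areas.items()
        if name != internal_standard
    }

# Hypothetical integrated peak areas for one sample (not measured data).
areas = {"glyceric acid": 1.2e6, "D-turanose": 3.0e5, "n-docosane": 2.4e6}

ratios = relative_response(areas)
for name, ratio in ratios.items():
    print(name, ratio)
```

Because the internal standard is added at a fixed concentration to every sample, these ratios are comparable across runs even when the absolute detector response varies.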

Experimental Data Analysis.
The classifiers commonly used in metabolomics include partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), and random forest (RF) [43]. PLS-DA can reduce the impact of multiple correlations between variables, but it easily overfits the data, and the selected biomarkers are not robust enough. SVM can handle small-sample classification, high-dimensional data classification, and nonlinear problems; however, it is more difficult to train on large-scale samples and to apply to multiclass problems. The random forests (RF) algorithm was first proposed by Breiman in 2001 [44] and has been widely used [45,46] because it can effectively distinguish the differences between sample groups. RF is a supervised machine learning classifier comprising a collection of classification and regression trees. It consists of many different decision trees, each grown on a different bootstrap sample. Each tree votes on the class of a sample, and RF takes the majority vote as the final classification result. RF performs well and offers several advantages over other algorithms: it can handle high-dimensional data (data with many features) without prior feature selection, and after training, it can screen out the more important features. Compared with other classification models, its classification results show low bias, which makes random forests applicable to many research fields.
RF was adopted to classify the urine samples of the four groups of rats according to their metabolites, and the resulting multidimensional scaling (MDS) plot is shown in Figure 3. As shown in Figure 3, the differences among the groups were obvious in the classification plot. The cancer groups could be effectively distinguished from the healthy group; moreover, the advanced-stage group could also be distinguished from the early- and middle-stage groups. However, the sample points of the early stage and middle stage overlapped somewhat. The results suggested that the metabolic pathways of rats with bladder cancer differed clearly from those of healthy rats; the pathways were similar in the early and middle stages, but in the advanced stage, they changed significantly owing to tumor deterioration and excessive nutrient consumption.
Note. (a) The epithelial tissue was composed of 2-3 layers, and the cells were extremely obvious, without abnormality, and arranged in an orderly manner. (b) There was papillary hyperplasia in a partial region, the epithelium had 4-6 cell layers, the polarity was slightly disordered, and the cell morphology and size showed a certain atypia. (c) The number of tumor cell layers increased significantly with a ball-shaped distribution, the tumor cells differed in size, the nuclei were deeply stained and showed polymorphism, the atypia was obvious, and some tumor cells showed the characteristics of squamous cell tumor differentiation. (d) The nuclei were deeply stained, the nuclear membrane was thickened, the nucleoli were obvious and showed pathological karyokinesis, and the muscularis was deeply infiltrated.
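The bootstrap-and-vote idea behind RF can be illustrated with a deliberately small sketch. It is not the implementation used in this study: each tree is reduced to a single-split stump, the split criterion is a plain misclassification count, and the feature values and labels are hypothetical toy data. All function names are illustrative.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Fit a one-split tree: best threshold (fewest errors) among the given features."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_label, right_label)
    if best is None:  # no split possible: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Grow each stump on a bootstrap sample with a random feature subset."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # sample with replacement
        features = rng.sample(range(len(X[0])), n_features)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict(forest, row):
    """Each tree votes; the majority class is the forest's prediction."""
    votes = Counter(left if row[f] <= thr else right for f, thr, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical normalized peak-area ratios for two metabolites (toy data).
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.3],
     [0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.7, 0.75]]
y = ["healthy"] * 4 + ["cancer"] * 4

forest = fit_forest(X, y)
print(predict(forest, [0.2, 0.15]))
print(predict(forest, [0.8, 0.85]))
```

A full random forest grows deep trees and draws a new feature subset at every split; the stump version keeps only the two ingredients named in the text, namely bootstrap samples and majority voting.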
Thus, we further established three classification models, i.e., healthy control vs. early- and middle-stage groups, healthy control vs. advanced-stage group, and early- and middle-stage groups vs. advanced-stage group. The classification results for the three models are listed in Table 2. The healthy control vs. advanced-stage group model had the best classification accuracy, indicating significant differences in metabolic features between the healthy control and the advanced-stage group; the healthy control vs. early- and middle-stage groups model also reached a classification accuracy of 96.06%. These results suggested that the proposed metabolomics approach can reflect the differences among groups as the disease progresses.
Note (Table 1). No.: the serial number of the common peak; tR: retention time; Ax/Ai: the ratio of the peak area of the analyte to that of the internal standard.

International Journal of Analytical Chemistry
While the classification models were being established, the importance of each metabolite was calculated. Metabolites with higher importance values contribute more to the classification, which means that they can serve as potential biomarkers for disease diagnosis, especially early diagnosis. The variable importance of the metabolites for each classification model is shown in Figure 4. Glyceric acid, 2,3-dihydroxybutanoic acid, N-(oxohexyl)-glycine, and D-turanose showed the highest variable importance (higher than 0.45, Figure 4) and are therefore more likely to be useful markers for BC diagnosis.
Glyceric acid is an important intermediate in lipid metabolism, which can be produced by the oxidation of fatty acids and the hydrolysis of phosphoglyceric acids (such as 2-phosphoglyceric acid, 3-phosphoglyceric acid, and 1,3-bisphosphoglyceric acid). Phosphoglyceric acids are important intermediate products of the tricarboxylic acid cycle in organisms and are directly involved in the metabolism and transformation of energy. For example, 1,3-bisphosphoglyceric acid is a high-energy phosphate compound in vivo and can yield one molecule of ATP under the catalysis of phosphoglycerate kinase.
2,3-Dihydroxybutanoic acid is related to the metabolic pathway of L-threonine and is generated from L-threonine metabolites. L-Threonine is a ketogenic amino acid, and its metabolites can directly enter energy metabolism.
N-(oxohexyl)-glycine is an acyl amino acid, more precisely an acyl glycine, which is usually produced in very small quantities during fatty acid metabolism. Acyl glycines are usually produced under the catalysis of an acyltransferase. The reaction is as follows: glycine + acyl-coenzyme A ⟶ acyl glycine + coenzyme A. Acetyl coenzyme A can directly provide a two-carbon unit to the tricarboxylic acid cycle. Furthermore, the combination of oxaloacetate and acetyl coenzyme A is regarded as the initial step of the citric acid cycle. As a result, abnormalities in acyl glycine metabolism may affect the energy metabolism of cells and form the specific metabolic pathways of the tumor.
All the abovementioned results indicated that the metabolic pathways of lipids and some amino acids changed significantly as the bladder tumor grew. Thus, based on these metabolites, pathway analysis was carried out with MetaboAnalyst software. The pathway analysis results showed that starch and sucrose metabolism; glycine, serine, and threonine metabolism; and galactose metabolism are strongly related to the selected metabolites (shown in Figure 5).
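Selecting candidate markers from variable importance values reduces to ranking and thresholding. The sketch below uses the 0.45 cutoff mentioned in the text; the importance values themselves are hypothetical placeholders, not the values plotted in Figure 4, and "metabolite A" and "metabolite B" are invented names for low-importance variables.

```python
def select_markers(importance, threshold=0.45):
    """Return metabolite names with importance above the threshold, highest first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score > threshold]

# Hypothetical importance scores for illustration only.
importance = {
    "D-turanose": 0.47,
    "glyceric acid": 0.62,
    "metabolite A": 0.30,
    "N-(oxohexyl)-glycine": 0.51,
    "2,3-dihydroxybutanoic acid": 0.55,
    "metabolite B": 0.12,
}

markers = select_markers(importance)
print(markers)
```

The cutoff is a reporting convention rather than a statistical test, so candidates selected this way still need independent validation.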

Conclusions
Metabolomics is an effective approach for discovering biomarkers by analyzing global changes in metabolic profiles. To collect early-stage bladder cancer (BC) samples and follow the progress of BC, a rat bladder cancer model was established by BBN induction in this study. The metabolites in rat urine were detected with GC-MS and analyzed with the random forests algorithm to distinguish the early-, middle-, and advanced-stage bladder cancer groups and the healthy group. The results showed that the urinary levels of some metabolites differed significantly between the cancer groups and the healthy group and between the advanced-stage group and the other two stage groups, which suggested that the growth of bladder tumors might cause abnormalities in the metabolism of lipids and some amino acids. Furthermore, glyceric acid, 2,3-dihydroxybutanoic acid, N-(oxohexyl)-glycine, and D-turanose, which had the highest variable importance, might be potential markers of bladder cancer, and their metabolic pathways were studied. However, the data reported here are preliminary and need to be confirmed with a larger number of samples. Further studies are required to evaluate the significance of the four compounds as potential markers of bladder cancer in human urine.
Data Availability
The figure data and related data used to support the findings of this study are included within the article.
Supplementary Materials
Figure S1: the total ion chromatograms (TIC) of samples with different extraction solvents. Figure S2: the total ion chromatograms (TIC) of samples with different dosages of urease. Figure S3: the total ion chromatograms (TIC) of samples with different decomposition temperature. Figure S4