Identification of Metabolite Markers Associated with Kidney Function

Background. Chronic kidney disease (CKD) is a global public health problem. Identifying new biomarkers that can be used to calculate the glomerular filtration rate (GFR) would greatly improve the diagnosis and understanding of CKD at the molecular level. A metabolomics study of blood samples from patients with widely divergent glomerular filtration rates could discover small-molecule metabolites associated with varying kidney function.
Methods. Using ultrahigh-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS), serum was analyzed from 53 participants with a spectrum of measured GFR (mGFR, by iohexol plasma clearance) ranging from normal to severe renal insufficiency. An untargeted metabolomics assay (N = 214) was conducted at the Calibra-Metabolon Joint Laboratory.
Results. From the large number of metabolites detected, the top 30 metabolites correlated with increasing renal insufficiency according to mGFR were selected by the random forest method. Significant differences in metabolite profiles with increasing stages of CKD were observed. By combining candidate lists from six other statistical analyses, six novel potential metabolites that were reproducibly and strongly associated with mGFR were selected: erythronate, gulonate, C-glycosyltryptophan, N-acetylserine, N6-carbamoylthreonyladenosine, and pseudouridine. In addition, hydroxyasparagine was strongly associated with mGFR and CKD, a finding unique to this study.
Conclusions. Global metabolite profiling of serum yielded potentially valuable biomarkers of different stages of CKD. Additionally, these potential biomarkers might provide insight into the underlying pathophysiologic processes that contribute to the progression of CKD as well as improve GFR estimation.


Introduction
With an increasing elderly population and prevalence of obesity and diabetes, chronic kidney disease (CKD) has become a major public health concern, affecting approximately 10% of the population, posing a massive financial burden on health-care systems, and substantially increasing the risk of cardiovascular morbidity and mortality by at least 8-10 times compared to the general population [1][2][3]. Biomarkers offer the potential to distinguish etiologies of CKD, uncover the diagnosis at an earlier stage, and discern patients who respond to treatments from nonresponders.
Creatinine is a well-established biomarker to assess kidney function [4]. However, it has limited sensitivity in the early detection of CKD [5,6], and its use to estimate the glomerular filtration rate (GFR) [7] can be influenced by sex, age, and muscle mass. Because GFR is fundamental in assessing kidney function, and blood metabolite concentrations are known to be dependent on kidney function, a metabolomic approach to identify a metabolite signature could potentially provide remarkable insight into CKD pathogenesis and management. From a procedural perspective, a well-accepted current reference standard of measured glomerular filtration rate (mGFR) would be critical for comparison and validation. Ideally suited for this is measuring the clearance rate of the exogenous filtration marker iohexol; it is safe, straightforward, reliable, and inexpensive [8,9].
Using a panel of filtration markers can improve precision, reduce errors caused by variation in each marker's non-GFR determinants, and decrease the need to use race and clinical characteristics as surrogates for the non-GFR determinants [10,11]. Our study aimed to identify new metabolite biomarkers, performing as well as or better than creatinine, to optimize the measurement of GFR.
Metabolomics is an omics technology that identifies and quantifies small-molecule metabolites in different types of biological samples such as serum, tissue, and urine. This technology is well suited to discovering novel glomerular filtration-related blood metabolite biomarkers that can be used in calculating the glomerular filtration rate (GFR). In the present study, GFR was measured with the plasma clearance rate of iohexol [12][13][14], and concurrently, estimated GFR (eGFR) was assessed based on serum creatinine and cystatin C levels [7]; cystatin C is a biomarker that accurately estimates GFR and reportedly can predict future risk of end-stage renal disease and death [15]. Our metabolomic analysis of a wide range of metabolites could therefore be correlated with mGFR to focus on potential novel filtration biomarkers, with the aim of improving the estimation of GFR.
Blood metabolite levels are altered in CKD progression, prompting investigation utilizing metabolomics technologies that have led to the identification of new biomarkers [16][17][18][19][20]. The goal of the present study was to identify and replicate novel and known metabolites that have been reproducibly associated with mGFR and to characterize the metabolome associated with kidney function.

Study Participants.
The study comprised 53 participants (19 females, 35.8%) with varying degrees of renal dysfunction. The CKD diagnosis was based on the NKF-K/DOQI guideline. The inclusion criteria were as follows: (1) age > 18 years and (2) voluntary CKD patient participation. Exclusion criteria were as follows: (1) acute kidney injury; (2) dehydration, congestive heart failure, obvious peripheral edema, and other severe fluid balance disorders; (3) physical disability and skeletal muscle atrophy; (4) urinary tract obstruction; (5) recent use of any of the following drugs that could not be suspended: aspirin, nonsteroidal anti-inflammatory drugs, cimetidine, or ranitidine; (6) allergy to iodine contrast agents; (7) thyroid disease; (8) pregnancy or breastfeeding; (9) cancer; and (10) dialysis. The participants were tested in a nonfasting state and received a single 5 mL infusion of iohexol (300 mg/mL, GE Healthcare, Shanghai, China), and its plasma clearance was calculated to measure GFR (mGFR) [14]. Blood samples were drawn from the contralateral upper extremity at specific time points to perform untargeted metabolomics assays (N = 214) using ultrahigh-performance liquid chromatography-mass spectrometry, conducted at the Calibra-Metabolon Joint Laboratory (Hangzhou, China) using Metabolon's HD4 Discovery untargeted metabolomics platform in March 2021. The local ethics committee approved the protocol (KWH 2018-001).

Metabolomic Analysis.
The untargeted metabolomics analysis was carried out at the Dian Calibra-Metabolon Joint Metabolomics Laboratory (Hangzhou, China). Each sample was assayed using four different UPLC-MS/MS methods. Liquid transfer was processed using a Hamilton automated MicroLab STAR® system (Hamilton, Switzerland) whenever possible. After adding a methanol-based metabolite extraction solution to each sample, the mixture was shaken vigorously for two minutes on a GenoGrinder 2010 (Spex SamplePrep, USA) shaker. Denatured proteins and other debris were removed by centrifugation. The resulting supernatant containing the extracted metabolites was aliquoted into four fractions corresponding to the four UPLC-MS/MS assays: two fractions were used for reversed-phase (RP) UPLC-MS/MS analyses in positive ion electrospray ionization (ESI) mode; one fraction was used for RP/UPLC-MS/MS analysis in negative ion ESI mode; and one fraction was used for hydrophilic interaction chromatography (HILIC)/UPLC-MS/MS in negative ion ESI mode. Each fraction was dried under nitrogen gas flow and then reconstituted in a solution suited for each UPLC-MS/MS method. The raw mass spectrometry data were processed using in-house developed software. Metabolites were identified by matching experimental ion features to in-house library entries obtained from reference standard compounds. The three matching criteria were retention time index (RI), molecular ion mass-to-charge ratio (m/z), and MS/MS spectral data. For identification with high confidence, strict matching windows were applied to RI and m/z, and both the MS/MS forward and reverse matching scores between experimental data and standard compound entries were considered.

Statistical Methods.
Values are expressed as the mean ± standard deviation (SD). Mean values and proportions were compared using one-way ANOVA and chi-square tests, respectively. A significance level of p < 0.05 was utilized in all tests, and SPSS-22 was used for these analyses. Principal component analysis (PCA) was conducted using R software. PCA is a dimension reduction technique that allows differences between many variables to be represented by a smaller number of variables.
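The dimension reduction described above can be illustrated with a minimal sketch. The study ran PCA in R and its code is not given in the source; the Python function below, its name `pca_first_component`, and the toy data are assumptions made purely for illustration. It extracts only the first principal component by power iteration on the sample covariance matrix.

```python
import math

def pca_first_component(rows, n_iter=200):
    """Return (direction, explained_ratio, scores) for the first principal component.

    Illustrative sketch only: assumes `rows` is a list of equal-length numeric
    lists with non-zero variance. The slightly asymmetric start vector makes an
    exactly orthogonal start unlikely, but does not rule it out.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 + 0.01 * j for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, eigenvalue / total_variance, scores

# Toy example: two perfectly collinear variables collapse onto one component.
direction, ratio, scores = pca_first_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(direction, ratio, scores)
```

In this toy case a single component carries all of the variance; with real metabolite data the per-sample scores on the leading components are what a PCA plot displays.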
A random forest was used to select metabolites that contributed the most to the group distinction. In addition, we used six well-known feature selection statistical methods, namely, least absolute shrinkage and selection operator (LASSO), optimal least absolute shrinkage and selection operator (Opt-LASSO), smoothly clipped absolute deviations (SCAD), iterative sure independent screening (ISIS), robust rank correlation-based screening (RRCS), and partial least squares (PLS), to further select the 30 most important metabolites in explaining the mGFR; all methods were implemented in RStudio [20].
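As an illustration of how one of the listed selectors works, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent. It is not the authors' implementation (the study used R, and no code is given in the source); the function names, the penalty value, and the toy data are hypothetical. The objective minimized is (1/2n)·||y − Xβ||² + λ·||β||₁, and predictors whose coefficient is shrunk exactly to zero are dropped.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Fit LASSO coefficients without an intercept.

    Assumes columns of X and y are already centered and that no column of X
    is all zeros (otherwise the update divides by zero).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                partial = y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (zj / n)
    return beta

# Toy data: the response depends only on the first predictor.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

The second coefficient is set exactly to zero, which is the property that makes LASSO usable as a variable selection method rather than only as a regression model.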

Demographic and Clinical Characteristics of the Study Population.
Fifty-three samples were divided into four groups. The control group (normal renal function) contained 5 samples with mGFR > 90 mL/min/1.73 m²; the mild kidney dysfunction group contained 15 samples with 60 ≤ mGFR < 90 mL/min/1.73 m²; the moderate nephropathy group contained 11 samples with 30 ≤ mGFR < 60 mL/min/1.73 m²; and the severe nephropathy group contained 22 samples with mGFR < 30 mL/min/1.73 m².
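The grouping rule above can be written as a short function. This is a sketch under one labeled assumption: the text defines the control group as mGFR > 90 and the mild group as 60 ≤ mGFR < 90, so a value of exactly 90 is not covered, and the code assigns it to the control group. The function and constant names are hypothetical.

```python
from collections import Counter

# Group sizes as reported in the text (they sum to the 53 participants).
REPORTED_GROUP_SIZES = {"control": 5, "mild": 15, "moderate": 11, "severe": 22}

def mgfr_group(mgfr):
    """Map an mGFR value (mL/min/1.73 m^2) to one of the four study groups."""
    if mgfr < 0:
        raise ValueError("mGFR cannot be negative")
    if mgfr >= 90:  # assumption: exactly 90 is treated as control
        return "control"
    if mgfr >= 60:
        return "mild"
    if mgfr >= 30:
        return "moderate"
    return "severe"

def group_sizes(values):
    """Count how many mGFR values fall into each group."""
    return Counter(mgfr_group(v) for v in values)

# Hypothetical mGFR values, not study data.
print(group_sizes([95.0, 75.0, 45.0, 12.0, 28.5]))
print(sum(REPORTED_GROUP_SIZES.values()))
```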

Global Metabolite Determination and Significantly Altered Biochemicals.
The study dataset comprised 1094 compounds with known biochemical properties. A subset of these metabolites correlated with renal function and differed significantly across CKD stages (p < 0.05). A large number of metabolites changed significantly as mGFR decreased. For example, when comparing the severe nephropathy group with the normal control group, 51.7% of the detected metabolites (566 out of 1094) changed significantly (p < 0.05) (Table 2). The Venn diagrams (Figure 1) also help to visualize the metabolites that differed significantly between the groups defined by degree of renal function.

High Level of Metabolite Overview.
With the detected metabolites as the variables, PCA permitted visualization of how individuals within a group cluster with respect to their data-compressed principal components. Figure 2 shows the PCA of serum samples color-coded according to renal function grouping. There was a clear separation between the groups, best displayed along the PC1 axis, with the normal kidney function group on the left, severe nephropathy on the right, and the mild and moderate nephropathy groups in the middle; these results indicated a significantly different phenotype between the groups (Figure 2).

Identification of Top-Ranking Metabolite Changes.
In addition to producing a metric of predictive accuracy, random forest analysis also produced an associated list of biochemical rankings in order of their importance to the classification scheme. Therefore, random forest analysis was used to identify metabolites that differentiated samples from the four groups, and a predictive accuracy of 80.8% was obtained in the serum dataset (Figure 3) [21], compared to 25% by random chance alone. These results suggest that significant metabolic differences could be used to discriminate the four groups, with metabolites in the amino acid and nucleotide super pathways being of most importance for the three models.
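The idea behind an importance ranking based on mean decrease in accuracy can be sketched by permutation: shuffle one variable at a time and record how much the classification accuracy falls. The sketch below is not a random forest and is not the authors' analysis. It swaps in a simple nearest-centroid classifier and scores on the training data rather than on out-of-bag samples, and all names and data are hypothetical. For reference, the chance level for k equally likely groups is 1/k, which is the 25% quoted above for four groups.

```python
import random

def fit_centroids(X, y):
    """Compute the per-class mean of each feature."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(row, model[label])))

def accuracy(model, X, y):
    return sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)

def mean_decrease_accuracy(model, X, y, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - accuracy(model, permuted, y)
        drops.append(total / n_repeats)
    return baseline, drops

# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X = ([[0.1 * i, (i * 7 % 10) / 10] for i in range(10)]
     + [[10 + 0.1 * i, (i * 3 % 10) / 10] for i in range(10)])
y = ["control"] * 10 + ["severe"] * 10
model = fit_centroids(X, y)
baseline, drops = mean_decrease_accuracy(model, X, y)
print(baseline, drops)
```

Shuffling the informative feature lowers accuracy, while shuffling the uninformative one leaves it unchanged, so the first feature ranks higher in importance.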

Six Methods for Model Selection.
In addition to the random forest analysis, we used six screening methods to rank the importance of the 1094 metabolites. For each of the six statistical screening methods, we selected the top twenty markers that were most strongly correlated with mGFR. The top markers identified by the different methods differed considerably, but the first ten markers were each identified by at least two methods. To further rank these top markers, we suggest two possibilities. First, we could use an integrated variable selection method, in which the variables selected by at least k methods (with k determined by cross-validation) are considered the key variables. Second, we could rank the 60 variables by their Pearson correlation with mGFR, which seems reasonable because only a limited number of variables (60 of 1094) remain. Both are reasonable ways to rank the important markers.
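The two ranking strategies described above can be sketched as follows. This is an illustration, not the study's code: the metabolite names (`met_A` and so on), the per-method selections, and the numeric values are hypothetical placeholders, and k is fixed by hand here rather than chosen by cross-validation.

```python
import math
from collections import Counter

def consensus_selection(selections, k):
    """Keep variables chosen by at least k methods, ordered by vote count."""
    votes = Counter(name for chosen in selections.values() for name in set(chosen))
    kept = [name for name, count in votes.items() if count >= k]
    return sorted(kept, key=lambda name: (-votes[name], name)), votes

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_abs_correlation(levels, mgfr):
    """Order variables by the absolute value of their correlation with mGFR."""
    return sorted(levels, key=lambda name: -abs(pearson(levels[name], mgfr)))

# Hypothetical selections from three of the screening methods.
selections = {
    "LASSO": ["met_A", "met_B"],
    "SCAD": ["met_A", "met_C"],
    "PLS": ["met_A", "met_B", "met_D"],
}
kept, votes = consensus_selection(selections, k=2)
print(kept, dict(votes))

# Hypothetical metabolite levels for four participants and their mGFR.
mgfr = [100, 80, 50, 20]
levels = {"met_A": [1, 2, 3.5, 5], "met_B": [3, 1, 4, 2]}
print(rank_by_abs_correlation(levels, mgfr))
```

Ranking by absolute correlation treats strongly negative associations, which are expected for filtration markers that accumulate as mGFR falls, the same as strongly positive ones.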

Discussion
Fundamental to the evaluation of renal function is an accurate, reliable, straightforward, relatively inexpensive method of assessing GFR. The most common laboratory tests are serum creatinine and blood urea nitrogen, from which GFR is estimated. However, more accurate estimation of GFR is needed and optimally would help differentiate pathogenesis and rate of progression of CKD. In the present study, metabolomic analysis of patients with CKD revealed many metabolites linked to changes in carbohydrate, amino acid, nucleotide, and lipid metabolism. Because the course of CKD is linked to changes in metabolism, these metabolites were investigated as possible biomarkers. The identification of these potential biomarkers could aid in analyzing the various pathophysiological changes that occur in CKD, as they could indicate early abnormalities in specific pathways. Metabolomics analyses can yield hundreds to thousands of metabolites from a single sample, necessitating high throughput, sensitivity, and resolution. Recently, advances in mass spectrometry methodology have allowed comprehensive studies of metabolomics and its relationship with kidney function [22][23][24][25][26]. Thus, the present study applied heatmaps and plots, principal component analysis, and random forest analysis to the entire dataset. Significant differences in metabolite profiles were demonstrated in the subgroups of patients representing increasing severity of CKD. Most of the altered biochemicals increased with CKD progression, whereas only a small number decreased, which might indicate stage-specific biomarkers of CKD. Additionally, we found many metabolites associated with mGFR, and we analyzed the metabolites that were most strongly related to mGFR by the random forest method (Figure 3).

Journal of Immunology Research
Sekula et al. [27] reported 56 metabolites that were associated with eGFRcr, including six that consistently showed strong correlation with eGFRcr (pseudouridine, C-mannosyltryptophan, N-acetylalanine, erythronate, myo-inositol, and N-acetylcarnosine). Moreover, Coresh et al. [16] reported a list of metabolites that could serve as a panel of filtration markers, including pseudouridine, acetylthreonine, myo-inositol, phenylacetylglutamine, and tryptophan, and that showed a high correlation with mGFR (including all of the above metabolites except N-acetylcarnosine).
Our study identified pseudouridine and erythronate as highly correlated with mGFR, consistent with previously reported results [16,27]. Both metabolites could be indicators of protein turnover as N-acetylation of amino acids. Pseudouridine is a derivative of uridine and is a modified nucleoside found in RNA. Importantly, pseudouridine might function as an ideal biomarker, being cited in the top 5 metabolites of the above studies; it is a stable indicator and not dependent on race.
Hydroxyasparagine was unique to the present study as a biomarker. Hydroxyasparagine, also known as β-hydroxyasparagine, is a modified asparagine amino acid that is associated with mGFR and CKD. However, little is known about this metabolite. It appears as a posttranslational modification of EGF-like domains that can occur in humans and other eukaryotes. The modified amino acid residue is found in fibrillin-1 [28].
In addition to searching for potential markers associated with GFR, we also investigated metabolites whose changes could be correlated with different levels of kidney function that would lend credibility to the results of our metabolomics study. Creatine kinase catalyzes the transfer of high-energy phosphate from ATP to creatine and the regeneration of ATP from creatine phosphate and ADP. In solution, creatine slowly and spontaneously cyclizes to creatinine, which is eliminated in the urine and can be used as a marker of kidney function. Creatinine has been commonly accepted as a marker of kidney filtration function. From our data, creatinine levels increased from the control group to the mild nephropathy group, the moderate nephropathy group, and the severe nephropathy group. A critical function of the kidney is to regulate electrolyte balance and fluid volume. Thus, with nephropathy, derangements in molecules necessary for osmotic regulation would be expected. As seen in Figure 3, increases in small molecules involved in osmotic regulation, such as erythronate, were observed with increasing nephropathy.
To discover and validate novel metabolite markers related to glomerular filtration that could be used for improving eGFR, we combined the data analysis results from six different statistical screening methods (SIS, LASSO, Optimal LASSO, SCAD, RRCS, and PLS) to determine any link between the metabolites and mGFR (Figure 4). The metabolites most frequently identified across the six methods also included several identified by the random forest (Figure 3). From our analyses, corroborated by the results reported by Coresh et al. [16] and Sekula et al. [27], the metabolites with the highest potential to measure eGFR would be erythronate, gulonate, C-glycosyltryptophan, N-acetylserine, N6-carbamoylthreonyladenosine, and pseudouridine; hydroxyasparagine was unique to the present study as a biomarker. It is worth emphasizing, however, that creatinine was indeed among the top 10 most important metabolites, ranking number 1 overall by the six methods we utilized and number 9 by random forest. In our study, the metabolomics results could be influenced by the type of kidney disease (e.g., inflammatory vs. noninflammatory), which makes it difficult to determine the precise cause of the differential regulation of biochemicals. However, the metabolomics study could quantitatively compare all small-molecule metabolite concentrations (including the well-known creatinine) based on the mGFR values and discover novel glomerular filtration-related blood biomarkers.

Figure legend: The 30 top-ranking biochemicals in the importance plot suggest key differences in amino acids (13), nucleotides (7), carbohydrates (3), lipids (3), and cofactors and vitamins (2). Axis: mean decrease accuracy (increasing importance to group separation).

Conclusion
Initially, random forest analysis and six statistical models were used to identify potential glomerular filtration-related biomarkers that demonstrated a strong correlation with mGFR. Six novel potential metabolites that were reproducibly and strongly associated with mGFR were selected: erythronate, gulonate, C-glycosyltryptophan, N-acetylserine, N6-carbamoylthreonyladenosine, and pseudouridine. In addition, hydroxyasparagine was strongly associated with mGFR and CKD, a finding unique to this study. We confirmed that creatinine remained an irreplaceable biomarker of kidney function. Future studies will need to increase the number of participants to validate the biomarkers identified in this study and investigate whether our 3-5 novel biomarkers could be used individually or in combination to more accurately measure GFR.

Data Availability
The datasets used to support the findings of this study are available from the corresponding author upon request.

Ethical Approval
This study was approved by the Institutional Review Board of Ethical Commission of Kiang Wu Hospital (KWH 2018-001) and conducted following the principles of the Declaration of Helsinki.

Disclosure
Part of the manuscript (Figure 3) was posted as a preprint at the following link: https://www.researchsquare.com/article/rs-407149/v1.