The Effect of Multiple Single Nucleotide Polymorphisms in the Folic Acid Pathway Genes on Homocysteine Metabolism

Objective. To investigate the joint effects of the single nucleotide polymorphisms (SNPs) of genes in the folic acid pathway on homocysteine (Hcy) metabolism. Methods. Four hundred women with normal pregnancies were enrolled in this study. SNPs were identified by MassARRAY. Serum folic acid and Hcy concentrations were measured. Analysis of variance (ANOVA) and support vector machine (SVM) regression were used to analyze the joint effects of SNPs on the Hcy level. Results. SNPs of MTHFR (rs1801133 and rs3737965) were significantly associated with maternal serum Hcy level. Within the different genotypes of MTHFR (rs1801133), SNPs of RFC1 (rs1051266), TCN2 (rs9606756), BHMT (rs3733890), and CBS (rs234713 and rs2851391) were linked with the Hcy level adjusted for folic acid concentration. The integrated SNP scores were significantly associated with the residual Hcy concentration (RHC) (r = 0.247). The Hcy level was significantly higher in the group with high SNP scores than in the other groups with SNP scores of less than 0.2 (P < 0.001). Moreover, this difference was even more significant at moderate and high levels of folic acid. Conclusion. SNPs of genes in the folic acid pathway possibly affect Hcy metabolism in the presence of moderate and high levels of folic acid.


Introduction
The folic acid pathway is essential for hundreds of intracellular transmethylation reactions, including DNA methylation and DNA synthesis, processes that are closely related to homocysteine (Hcy) metabolism [1,2]. Folic acid deficiency and abnormal metabolism of folic acid and Hcy not only play an important role in neural tube defects (NTDs) [3] but are also key factors in congenital heart disease, cleft lip and palate, late pregnancy complications, premature labor, different kinds of neurodegenerative and psychiatric diseases, and cancer [3-10]. Preconceptional folic acid supplementation has been reported to be effective for NTD prevention. However, it remains unclear why 30-50% of cases are still unpreventable with folic acid supplementation [2].
Variations in genes that play key roles in the folic acid cycle have been widely investigated, and single nucleotide polymorphisms (SNPs) have been found to be associated with folic acid and Hcy metabolism. The variants of MTHFR and RFC1 were found to interact with Hcy levels [11,12], while the combined effects of the 2 MTHFR polymorphisms (rs1801133 and rs1801131) were found to be associated with Hcy concentration [13]. However, studies of joint effects have been limited to one or a few SNPs [11,14-16].
With the rapid development of ongoing high-throughput human gene sequencing and bioinformatics, abundant SNPs can be identified. Support vector machines (SVMs) are a classic supervised machine learning algorithm typically used for classification and regression analysis. SVM has been widely used in solving biological problems, including gene function prediction, gene expression data analysis, and even cancer diagnosis [12,17,18]. It has also been applied to the analysis of SNPs.
The aim of the study is to identify the genotypes of 18 SNPs using MassARRAY and to investigate an association between the cumulative effects of 18 SNPs in the 9 genes of the folic acid pathway and Hcy metabolism.

Study Population.
Four hundred pregnant women at 11-25 gestational weeks were enrolled in the study, conducted at the Obstetrics and Gynecology Hospital of Fudan University, China, from April to May 2011. Women included in the study were not smokers, did not drink alcohol, had no chronic diseases, and were not taking prescription medications. Blood samples were taken from fasting subjects. The serum was separated for the measurement of folic acid and Hcy concentrations, and the remaining blood clots were used for DNA extraction and genotyping. All individuals enrolled in this study signed informed consent, and the study was approved by the Ethics Committee of the Obstetrics and Gynecology Hospital of Fudan University, China.

Biochemical Measurement of Serum Folic Acid and Hcy Concentrations.
Serum folic acid concentration was measured by a chemiluminescent microparticle immunoassay (Architect Folic acid Reagent; Abbott, Lisnamuck, Longford) using the ARCHITECT I Systems following the manufacturer's recommended protocols.
Serum Hcy measurements were carried out by liquid chromatography coupled to tandem mass spectrometry (LC/MS/MS) using an API 3000 LC/MS/MS system (Applied Biosystems) equipped with an electrospray ionization interface operated in the positive ion mode ([M+H]+) according to the manufacturer's instructions.

Identification of SNPs Using MassARRAY.
Eighteen SNPs in 9 folic acid pathway-related genes (Figure 1) that have been documented in the literature to be associated with Hcy-related disease [19-22] were selected for our population based on the CHB data, using the criterion of a minor allele frequency (MAF) > 0.1 as determined with the Haploview program (version 4.0) (Table 1).
Genomic DNA was prepared from peripheral leukocytes using the Relax Gene blood DNA System (Relax Gene; TIANGEN, Beijing, China). The genotypes were determined by the Sequenom MassARRAY MALDI-TOF system. Primer sequences of the 18 SNPs are shown in Supplementary

SVM Regression Model.
An SVM regression model was used to analyze the relationship between the SNPs of genes in the folic acid pathway and the changes in serum Hcy concentration. To minimize the effects of folic acid concentration on Hcy concentration, the residual homocysteine concentration (RHC; RHC = actual Hcy concentration − value predicted by a linear regression on folic acid) was used as the dependent variable. The predicted RHC was defined as the SNP score. For the independent variables (features) of the SVM regression model, each SNP was coded as two independent variables. For example, if a SNP has three genotypes, AA, Aa, and aa, it is coded with only two independent variables, feature AA and feature Aa, following Table 2. Thus, the input space initially has 30 independent variables for 15 SNPs. The SVM regression model was implemented using the "SMOReg" algorithm of the Weka software package with default parameters: C equals 1.0 and the kernel is a polynomial kernel with an exponent of 1.0. Using all 15 SNPs to construct the model would result in suboptimal accuracy because different variables may contain overlapping information that disturbs the model-constructing process; thus, variable selection was needed. Variable selection among the 15 SNP variables was conducted using recursive, stepwise addition of the input variables. Beginning with two features, we added to the SVM regression model the feature that improved the correlation coefficient most and then added another, until none of the remaining features could improve the model or the improvement in the correlation coefficient was less than 0.01 (see van Looy et al.'s paper [23] for more details).
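The two preprocessing steps described above, computing the RHC and coding each SNP as two features, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the data values are made up, and the 0/1 indicator coding is assumed rather than taken from Table 2. The SMOReg fit itself is not reproduced here.

```python
# Illustration only: the values below are made up, not study data.

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_hcy(folate, hcy):
    """RHC = actual Hcy minus the Hcy predicted from folic acid."""
    intercept, slope = fit_line(folate, hcy)
    return [h - (intercept + slope * f) for f, h in zip(folate, hcy)]

def code_genotype(genotype, major, minor):
    """Code one SNP as [feature AA, feature Aa]; aa is the reference [0, 0].

    Assumption: 0/1 indicators; the paper's Table 2 may use other values.
    """
    is_hom_major = int(genotype == major + major)
    is_het = int(sorted(genotype) == sorted(major + minor))
    return [is_hom_major, is_het]

def build_features(genotypes, alleles):
    """Concatenate the two features of every SNP into one input row."""
    row = []
    for genotype, (major, minor) in zip(genotypes, alleles):
        row.extend(code_genotype(genotype, major, minor))
    return row

folate = [5.0, 10.0, 15.0, 20.0]   # hypothetical folic acid values
hcy = [9.0, 8.0, 6.5, 6.0]         # hypothetical Hcy values
rhc = residual_hcy(folate, hcy)
features = build_features(["CT", "AA"], [("C", "T"), ("A", "C")])
```

With 15 SNPs, `build_features` would return the 30 input variables mentioned in the text; the RHC values would then serve as the regression target.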
After feature selection, tenfold cross-validation was used to assess the SVM regression model. In this method, we divided the dataset into 10 subsets of approximately equal size and built the model ten times, each time leaving out one of the subsets as the testing set and using the others as the training set.
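The tenfold split can be sketched as follows. This is a minimal sketch, not the Weka procedure: the function name is hypothetical, and the seeded shuffle is an assumption, since the text does not say how subjects were assigned to subsets.

```python
import random

def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Assumption: subjects are shuffled with a fixed seed before being
    dealt into folds of approximately equal size.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# 400 subjects, as in the study population
splits = list(ten_fold_splits(400))
```

Each subject appears in exactly one testing set, so predictions from the ten held-out subsets can be pooled into a single combined test set for evaluation.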

Statistical Analysis.
Statistical analyses were performed using SPSS (SPSS Inc., Chicago, IL, USA), version 16.0 for Windows. A P value of <0.05 was considered statistically significant. Hardy-Weinberg equilibrium was assessed using the chi-squared (χ²) test. Pairwise linkage disequilibrium of SNPs was estimated using Haploview. The square of the correlation coefficient (r²) between markers was used to define linkage using the data from the study population. Linear regression was used to detect the association between folic acid and Hcy concentration, while analysis of variance (ANOVA) was used for the association analysis between SNPs and Hcy concentration, with serum Hcy concentration as the dependent variable, SNP genotypes as the fixed factor, and folic acid concentration as the covariate.

Table 1. The genotyping call rate for each SNP ranged from 96% to 100%. There was no evidence for linkage disequilibrium in our database (P > 0.05, Supplementary Figure 2). There was no significant difference in age, gestational weeks, parities, and pregnancies among the genotypes of each of the 15 SNPs (P > 0.05, Supplementary Tables 2-5).
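The Hardy-Weinberg check named in the statistical analysis can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name and genotype counts are hypothetical, the SNP is biallelic with both alleles present, and the test uses 1 degree of freedom, for which the chi-squared tail probability has the closed form erfc(sqrt(χ²/2)).

```python
import math

def hardy_weinberg_chi2(n_AA, n_Aa, n_aa):
    """Chi-squared test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi2, p_value). Assumes both alleles are observed, so no
    expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # frequency of allele A
    q = 1.0 - p                          # frequency of allele a
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_AA, n_Aa, n_aa]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts, not study data
chi2_eq, p_eq = hardy_weinberg_chi2(100, 200, 100)   # exactly in equilibrium
chi2_dev, p_dev = hardy_weinberg_chi2(200, 0, 200)   # no heterozygotes
```

A P value above 0.05 gives no evidence of departure from equilibrium; the second call shows an extreme departure.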

The Effects of SNPs on Serum Hcy Concentration Adjusted for Folic Acid Concentration.
MTHFR SNPs (rs1801133 and rs3737965) were associated with serum Hcy concentrations adjusted for folic acid concentration (Table 3). There were only two cases homozygous for the MTHFR SNP (rs3737965) variant. Because of the low frequency of this variant, it was not included in the data analysis. Figure 2 shows the effects of SNPs on Hcy concentration within the different genotypes of MTHFR (rs1801133) after the Hcy concentration was adjusted for folic acid concentration. In MTHFR (rs1801133) CC, the SNPs RFC1 (rs1051266) and TCN2 (rs9606756) were significantly associated with Hcy concentration (Figures 2(c) and 2(k)). A similar association was observed for CBS (rs2851391) in MTHFR (rs1801133) CT, and for BHMT (rs3733890) and CBS (rs234713) in MTHFR (rs1801133) TT (Figures 2(m), 2(i), and 2(n)).

SVM Model of Multiple SNPs and the Residual Hcy Concentration.
In the SVM regression, five SNPs were selected: MTHFR (rs1801133, rs1801131, and rs3737965), CBS (rs234713), and BHMT (rs3733890). The weights of the five SNP variables are shown in Table 4, and the relationship between SNP scores and residual Hcy concentration is shown in Figure 3. The correlation coefficient between RHC and SNP scores was 0.275 in the training sets and 0.247 in the combined cross-validation test sets (Supplementary Table 6). All subjects were divided into four groups according to the 25th, 50th, and 75th percentiles of the SNP scores (−0.26, 0, and 0.2, respectively). The Hcy concentration was significantly higher in the group with SNP scores of more than 0.2 than in the groups with SNP scores of less than 0.2 (P < 0.01, Figure 4(a)).
For subjects with folic acid levels above the 25th percentile (13.1 ng/mL), a possible interaction between Hcy concentration and SNP scores was detected (P < 0.05) (Figures 4(c) and 4(d)). However, for subjects with folic acid levels below the 25th percentile, SNP scores appeared not to be associated with Hcy concentration (Figure 4(b)).
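The grouping by SNP score cut points and the correlation between RHC and SNP scores can be sketched as below. The cut points (−0.26, 0, and 0.2) come from the text; the function names are hypothetical, and placing a score that equals a cut point in the lower group is an assumption.

```python
import math
from bisect import bisect_left

CUT_POINTS = [-0.26, 0.0, 0.2]   # SNP score cut points reported in the text

def score_group(score, cuts=CUT_POINTS):
    """Return 0-3 for the four SNP score groups; 3 means score > 0.2.

    Assumption: a score equal to a cut point falls in the lower group.
    """
    return bisect_left(cuts, score)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical SNP scores, not study data
groups = [score_group(s) for s in [-0.5, -0.1, 0.1, 0.2, 0.35]]
```

Comparing mean Hcy concentration across the four groups, overall and within folic acid strata, corresponds to the comparisons summarized in Figure 4.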

Discussion
We first used SVM regression to predict Hcy concentration from the SNPs of genes in the folic acid pathway after analyzing the joint effect between MTHFR (rs1801133) and other genes related to Hcy metabolism. The results revealed that the integrated SNP scores from the SVM were significantly associated with Hcy concentration, especially at moderate and high levels of folic acid. This finding suggests that variations in the genes of the folic acid pathway may be an important contributor to Hcy-related diseases in women with moderate and high folic acid levels from folic acid supplementation.
High-throughput techniques have spawned a mass of complex biological data. However, analysis of these data creates a bottleneck seen in current studies [30].
In our study, the joint effects of SNPs generated nonlinear, noisy, and complex data sets that also contained a great deal of irrelevant information. Despite the effects of serum folic acid concentration, we have presented an SVM regression model that can estimate the RHC using five SNPs of genes in the folic acid pathway as inputs. This model translated the complex SNP patterns into a simple output, the SNP score, which was significantly related to the changes in Hcy concentration adjusted for serum folic acid concentration; 5 of the 15 SNPs were found to be useful as inputs. The RHC was constructed to eliminate the effects of folic acid, because folic acid itself can affect the Hcy level.
These results suggest that the SVM model could be a potential algorithm for predicting Hcy-related diseases.
Furthermore, we found that the Hcy concentration was significantly higher in the group with SNP scores of more than 0.2 than in the groups with SNP scores of less than 0.2, especially for subjects with folic acid levels above the 25th percentile. However, no changes in Hcy concentration were detected for subjects with folic acid levels below the 25th percentile. Abnormal metabolism of Hcy is related to many diseases, such as congenital heart disease [5], cleft lip with or without cleft palate [4], and NTDs [6]. The causes of these diseases under normal concentrations of folic acid have not been identified, although periconceptional folic acid supplementation can effectively prevent many diseases related to folic acid deficiency. Our study provides evidence that the joint effects of SNPs in the folic acid pathway may play an important role in Hcy-related diseases, especially when folic acid supply is sufficient.
In conclusion, the joint effects of SNPs in genes that belong to the folic acid pathway can affect Hcy metabolism especially under normal and high levels of folic acid. Further research that includes a bigger sample size is needed to test this SVM model.