Establishment of Relational Model of Congenital Heart Disease Markers and GO Functional Analysis of the Association between Its Serum Markers and Susceptibility Genes

Purpose. The purpose of the present study was to construct the best screening model of serum markers for congenital heart disease and to provide a reference for further prevention and treatment of the disease. Methods. Documents from 2006 to 2014 were collected, and meta-analysis was used to screen susceptibility genes and serum markers closely related to the diagnosis of congenital heart disease. Serum marker data were obtained from 80 congenital heart disease patients and 80 healthy controls, and logistic regression analysis and support vector machine were then used to establish prediction models of the serum markers; Gene Ontology (GO) functional annotation was also performed. Results. NKX2.5, GATA4, and FOG2 were susceptibility genes of congenital heart disease. CRP, BNP, and cTnI were risk factors of congenital heart disease (p < 0.05); cTnI, hs-CRP, BNP, and Lp(a) were significantly associated with congenital heart disease (p < 0.01). ROC curves indicated that the accuracy rates of joint prediction with Lp(a) and cTnI, Lp(a) and BNP, and BNP and cTnI were 93.4%, 87.1%, and 97.2%, respectively, whereas the detection accuracy of the marker relational model established by support vector machine was only 85%. GO analysis suggested that NKX2.5, GATA4, and FOG2 were functionally related to Lp(a) and BNP. Conclusions. The combined marker model of BNP and cTnI had the highest accuracy rate, providing a theoretical basis for the diagnosis of congenital heart disease.


Introduction
Congenital heart disease (CHD) refers to abnormalities of heart and vascular structure and function that are present at birth. Its pathogenesis is complex and results from the interaction of multiple factors such as heredity and environment. Known risk factors include mental stimulation during pregnancy [1], exposure to harmful substances [2], smoking and drinking [3], viral infections in early pregnancy [4], diabetes mellitus [5], a history of unhealthy pregnancy [6], and advanced maternal age [7]. Its clinical consequences are extremely serious: it is an important cause of miscarriage, stillbirth, neonatal death, and disability in children, adolescents, and adults. The incidence of fetal CHD reaches as much as 6% to 10% [8] and continues to show a significant upward trend in China [9].
Currently, CHD is still treated by surgery. Many scholars believe that indicators such as the serum levels of C-reactive protein (CRP), brain natriuretic peptide (BNP), cardiac troponin I (cTnI), and lipoprotein(a) (Lp(a)) reflect the functional status of the heart in patients with CHD and have good potential in clinical analysis. These proteins may also serve as indicators in prognosis evaluation.
Since the United States announced its precision medicine plan, countries around the world have increased their support for precision medicine. With the enrichment and improvement of clinical big data and biological networks, interdisciplinary collaboration in disease prediction, diagnosis, and etiology analysis has become a general trend. In routine practice, clinicians commonly use logistic regression analysis to analyze the prognostic factors of a disease and to estimate the probability of an outcome [10]. Support vector machine (SVM) is a machine learning method based on statistical learning theory. SVM copes well with linearly nonseparable sample data, mainly through slack variables (also called penalty variables) and the kernel technique. It provides a unified framework for solving learning problems with finite samples [11].
A growing number of studies show that the pathogenesis of congenital heart disease is related to certain transcription factors, but the relationship between the susceptibility genes and the serological markers of congenital heart disease has not yet been reported. With the rapid spread of bioinformatics, Gene Ontology (GO) has become an important tool in the field and plays a major role in gene function annotation. It describes the cellular location of a gene or protein, its molecular functions, and the biological processes in which it is involved, and thus simplifies the annotation of genes and their products with a standardized vocabulary.
In this study, meta-analysis was performed on the literature on susceptibility genes and clinical serological risk factors of CHD in order to evaluate them systematically. After measuring the levels of serum markers in patients with CHD, logistic regression analysis, receiver operating characteristic (ROC) curves, and SVM were used to evaluate the value of each serum marker in the clinical diagnosis of CHD, and a detection model of the serum markers of this disease was then established. The functional relationship between susceptibility genes and serum markers was established by GO analysis. This study thereby provides a theoretical basis for clinical practice and personalized treatment of cardiovascular disease.

Document Retrieval.
Google Scholar was the major source of Chinese documents; PubMed, EMBASE, MEDLINE, and MD Consult were the main sources of English documents. The Chinese or English key words were "congenital heart disease", "gene", and "mutation" as well as "congenital heart disease", "serum markers", and "diagnosis". The years of publication were from January 1, 2000, to October 31, 2014.

Statistical Analysis.
RevMan 5.1 was used for meta-analysis of the included literature. p ≥ 0.05 indicated that the pooled statistic of the multiple studies was not statistically significant; p < 0.05 indicated that the pooled statistic was statistically significant.

Establishing Relational Model of CHD Markers Group
In this study, 80 CHD patients (33 with atrial septal defect, 36 with ventricular septal defect, 3 with patent ductus arteriosus, and 8 with tetralogy of Fallot) who received treatment in the Department of Cardiac Surgery at our hospital from December 2009 to September 2014 (54 males and 26 females, aged 7 days to 59 years) and 80 healthy outpatients, as determined by a physical examination given at the hospital (38 males and 42 females, aged 3.6 months to 51 years), were selected as the subjects. Patients in the case group were confirmed by echocardiography and/or surgery, and cases with the following conditions were excluded: (1) renal insufficiency, chronic liver disease, and acute or chronic infectious diseases; (2) systemic lupus erythematosus, rheumatoid disease, and other immune system diseases; and (3) infectious endocarditis, rheumatic heart disease, cardiac tumors, myocarditis, and other types of heart disease. Subjects in the healthy control group denied a family history of CHD and were confirmed by physical examination and echocardiography to have no cardiac dysfunction or organic disease. Infection, trauma, autoimmune diseases, cancer, and similar conditions were also excluded.
From each study subject, 10 mL of venous blood was collected in the morning after a 12 h overnight fast and placed in an EDTA anticoagulant tube. Samples were centrifuged within 2 h at 3000 r/min for 10 min, and the supernatants were collected.

Sample Testing.
Serum BNP level was measured by enzyme-linked immunosorbent assay (ELISA). Serum hs-CRP was measured by immune rate nephelometry. Serum cTnI level was determined by an immunofluorescence method. Serum Lp(a) level was measured by double-antibody sandwich ELISA. All assays were carried out in strict accordance with the kit instructions. Each sample was tested twice in parallel, and the average value was taken as the final result.

Establishing Relational Model of CHD Markers Group Based on Logistic Regression Analysis.
The serum levels of BNP, hs-CRP, cTnI, and Lp(a) in CHD patients and healthy controls underwent logistic regression analysis. With the new variables of the logistic regression model as test variables and the pathological diagnosis results as state variables, the ROC curve was drawn. The application value in the early diagnosis of CHD was evaluated according to the area under the ROC curve (AUC) and the diagnostic accuracy.
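The idea of turning several marker levels into one "new variable" (the predicted probability) can be sketched as follows. This is a minimal illustration in Python, not the SPSS procedure used in the study; the function names, the gradient-descent fit, and the standardized toy values are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(w, b, x):
    """Combined predictor: probability of being a case given the marker values."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Made-up, already standardized values for two markers (not study data).
X = [[-1.2, -0.8], [-0.9, -1.1], [-0.7, -0.5],   # controls
     [0.8, 1.0], [1.1, 0.7], [0.9, 1.2]]         # cases
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
probs = [predict_proba(w, b, x) for x in X]
```

The values in `probs` play the role of the test variable when an ROC curve is drawn against the diagnosis.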

Establishing Relational Model of CHD Markers Group Based on SVM.
Data of the 80 CHD patients were normalized. The establishment, training, and validation of the SVM model were carried out through MATLAB programming.
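As a rough illustration of normalization followed by SVM training, the sketch below uses Python rather than the MATLAB code of the study, which is not given in the paper. It assumes min-max scaling (the normalization method is not specified) and a linear soft-margin SVM trained by batch subgradient descent, without a kernel; all names and the toy values are hypothetical.

```python
def min_max_scale(rows):
    """Rescale every column of a list of rows to [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Minimise lam/2*||w||^2 + mean hinge loss; labels y must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample inside the margin: hinge loss is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Made-up raw values for two markers (not study data): three controls, three cases.
raw = [[1.0, 10.0], [2.0, 12.0], [1.5, 11.0], [8.0, 40.0], [9.0, 45.0], [10.0, 50.0]]
labels = [-1, -1, -1, 1, 1, 1]
scaled = min_max_scale(raw)
w, b = train_linear_svm(scaled, labels)
predicted = [predict(w, b, x) for x in scaled]
```

Scaling first keeps markers measured in different units from dominating the margin; a kernel SVM would replace the inner product with a kernel function.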

Computational and Mathematical Methods in Medicine

Statistical Analysis.
The data were analyzed for significance of differences using the statistical software SPSS 19.0 and are expressed as mean ± standard deviation. p < 0.05 indicated that the difference was statistically significant.

Bioinformatics Functional Analysis of Serum Markers
Lp (

Meta-Analysis of Susceptibility Genes and Serum Markers.
A total of 176 documents about susceptibility genes and 216 documents about serum markers were identified in the initial survey. After screening, 19 documents about susceptibility genes [12-31] and 20 documents about serum markers [32-51] were eventually included in the meta-analysis. Meta-analysis results of susceptibility genes and serum markers are shown in Tables 2 and 3. The heterogeneity test for the susceptibility genes NKX2.5 and FOG2 gave p > 0.05, indicating good consistency among the studies, so a fixed effect model was used to pool the data. The heterogeneity test for GATA4 gave p < 0.05, suggesting heterogeneity among the studies, so a random effect model was adopted. The pooled SMD and the upper and lower limits of its 95% CI were greater than 1, indicating that the association between mutations of the three genes and congenital heart disease was statistically significant. The heterogeneity test for the three serum markers gave p < 0.05, indicating heterogeneity among the studies, so a random effect model was adopted. The pooled WMD and the upper and lower limits of its 95% CI were all greater than 0. Additionally, the 95% CI horizontal lines of the three serum markers fell to the left side of the vertical line of no effect, suggesting that the incidence rate of the experimental group was higher than that of the control group. Specific meta-analysis results are shown in Additional Files 1-6 (see Supplementary Material available online at http://dx.doi.org/10.1155/2016/9506829).
As can be seen from Figure 1, the levels of cTnI, hs-CRP, BNP, and Lp(a) in the case group were significantly higher than those in the controls (p < 0.05).
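The model-selection rule used in the meta-analysis (fixed effect when the heterogeneity test gives p ≥ 0.05, random effect otherwise) can be sketched as below. This is an illustrative Python version, not RevMan's implementation; it assumes inverse-variance weighting, Cochran's Q for heterogeneity, and the DerSimonian-Laird estimate of between-study variance, and the effect sizes are made up.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    z = x / 2.0
    if df % 2:                       # odd df: start from Q(1/2, z) = erfc(sqrt(z))
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:                            # even df: start from Q(1, z) = exp(-z)
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:              # Q(a+1, z) = Q(a, z) + z^a e^-z / Gamma(a+1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def pool(effects, std_errs, alpha=0.05):
    """Return (model, pooled estimate, 95% CI, heterogeneity p-value)."""
    w = [1.0 / se ** 2 for se in std_errs]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))   # Cochran's Q
    df = len(effects) - 1
    p_het = chi2_sf(q, df)
    if p_het >= alpha:               # studies consistent: fixed effect model
        model, est, se = "fixed", fixed, math.sqrt(1.0 / sum(w))
    else:                            # heterogeneity: random effect model
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w_r = [1.0 / (s ** 2 + tau2) for s in std_errs]
        est = sum(wi * e for wi, e in zip(w_r, effects)) / sum(w_r)
        model, se = "random", math.sqrt(1.0 / sum(w_r))
    return model, est, (est - 1.96 * se, est + 1.96 * se), p_het

# Made-up effect sizes and standard errors (not taken from Tables 2 and 3).
homogeneous = pool([1.0, 1.0, 1.0], [0.1, 0.1, 0.1])
heterogeneous = pool([0.0, 1.0, 2.0], [0.1, 0.1, 0.1])
```

With identical study effects the rule selects the fixed effect model; with widely scattered effects it switches to the random effect model and the confidence interval widens.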

Logistic Regression Analysis Results.
With cTnI, hs-CRP, BNP, and Lp(a) as independent variables and disease status (CHD or not) as the dependent variable, SPSS 19.0 was used for binary logistic regression analysis. Univariate regression results are presented in Table 4 and suggested that the associations of Lp(a), BNP, and cTnI with CHD were statistically significant (p < 0.05). These three factors were then entered into multivariate logistic regression analysis. The results showed that the combination of all three factors was unfavorable for accurate diagnosis of CHD (p > 0.05, Table 5). Pairwise combinations of the three factors were then analyzed by multivariate logistic regression, and the results are presented in Table 6. They indicated that the associations of Lp(a), BNP, and cTnI with CHD were statistically significant (p < 0.05). The accuracy rates of combined prediction with Lp(a) and cTnI, Lp(a) and BNP, and BNP and cTnI were 93.4%, 87.1%, and 97.2%, respectively.

Application Value Evaluation of Serum Markers on the Detection of CHD.
SPSS 19.0 software was used to evaluate the application value of combined detection of CHD with Lp(a), BNP, and cTnI. ROC curves are shown in Figure 2. The AUCs of joint detection with Lp(a) and cTnI, Lp(a) and BNP, and BNP and cTnI were 0.994, 0.981, and 0.999, respectively, showing a high application value.
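For readers unfamiliar with the AUC, it equals the probability that a randomly chosen case receives a higher score than a randomly chosen control. A minimal Python sketch of that definition follows; the function name and the scores are illustrative and are not the SPSS output of the study.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Made-up predicted probabilities, not study data.
perfect = auc([0.9, 0.8], [0.1, 0.2])    # every case outranks every control
partial = auc([0.9, 0.4], [0.5, 0.1])    # one of four pairs is ranked wrongly
```

An AUC close to 1, as reported for the pairwise marker combinations, means that almost every case is scored above almost every control.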

Establishing Relational Model of CHD Serum Markers Group Based on SVM.
The serum levels of cTnI, hs-CRP, BNP, and Lp(a) of 80 CHD patients and 80 healthy controls underwent attribute analysis. The analysis showed that the attributes separated the two classes clearly and that the data met the basic calculation requirements of SVM (Figure 3).
The relational model of the CHD serum markers group based on SVM was established. Then the test data of 20 CHD patients and 20 healthy controls were input into it. The test results are shown in Figure 4. The hollow circles represent the target output; "*" is the actual simulation output of the SVM. As can be seen from the figure, the diagnostic accuracy of the model was 34/40 = 85%.
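The accuracy figure is simply the share of test subjects whose SVM output matches the target output. A small Python sketch of that comparison is given below; which six subjects were misclassified is not stated in the paper, so the positions of the errors here are arbitrary.

```python
def accuracy(target, output):
    """Fraction of positions where the model output equals the target label."""
    if len(target) != len(output):
        raise ValueError("target and output must have the same length")
    hits = sum(1 for t, o in zip(target, output) if t == o)
    return hits / len(target)

target = [1] * 20 + [0] * 20          # 20 CHD patients, 20 healthy controls
output = list(target)
for i in (0, 1, 2, 20, 21, 22):       # six arbitrary misclassifications
    output[i] = 1 - output[i]
acc = accuracy(target, output)        # 34 of 40 correct
```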

GO Functional Annotation Results Comparison between Susceptibility Genes and Serum Markers of CHD.
After comparing the GO functional annotation results of the susceptibility genes NKX2.5, GATA4, and FOG2 with those of the serological indicators hs-CRP, Lp(a), BNP, and cTnI, it was found that NKX2.5, GATA4, and FOG2 shared GO functional annotations with Lp(a) and BNP. The functional relations between the three susceptibility genes and BNP were mainly in gene expression and metabolic process. The connections between Lp(a) and NKX2.5, GATA4, and FOG2 were mainly in lipoprotein transmembrane transport and blood circulation. The shared GO functional annotations are shown in Tables 7-9.
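Finding shared annotations amounts to intersecting the sets of GO terms attached to a gene and to a marker. The Python sketch below shows the operation only; the gene, marker, and term labels are placeholders and do not reproduce Tables 7-9.

```python
def shared_annotations(terms_a, terms_b):
    """Return the GO terms present in both annotation sets, sorted for stable output."""
    return sorted(set(terms_a) & set(terms_b))

# Placeholder annotation sets; real term identifiers would come from a GO annotation source.
gene_terms = {"gene_X": {"term_1", "term_2", "term_3"}}
marker_terms = {"marker_Y": {"term_2", "term_3", "term_4"}}
shared = shared_annotations(gene_terms["gene_X"], marker_terms["marker_Y"])
```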

Relative Expression Contents of Susceptibility Genes in mRNA Level.
Real-time fluorogenic quantitative PCR was used to detect the mRNA expression levels of the susceptibility genes NKX2.5, GATA4, and FOG2, and the 2^(−ΔΔCt) method was used to calculate the relative expression levels. With the level in the control group set to 1, the relative expression levels of NKX2.5, GATA4, and FOG2 in the case group were 0.59 ± 0.18, 0.47 ± 0.14, and 0.33 ± 0.09, respectively (Figure 5). The expression levels of the susceptibility genes NKX2.5, GATA4, and FOG2 in the case group were clearly lower than those in the controls. The serum index results showed that Lp(a) and BNP levels in the case group were significantly higher than those in the controls (Figure 1). It can thus be inferred that the unusual increase of serum Lp(a) and BNP levels may be related to the abnormal expression of the NKX2.5, GATA4, and FOG2 genes.
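The 2^(−ΔΔCt) calculation can be written out step by step. In the Python sketch below, ΔCt is the target-gene Ct minus the reference-gene Ct within a group, and ΔΔCt is the case ΔCt minus the control ΔCt. The reference gene is not named in the paper, and the Ct values are made up.

```python
def relative_expression(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene in cases, with the control group set to 1."""
    delta_case = ct_target_case - ct_ref_case     # normalise to the reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta = delta_case - delta_ctrl
    return 2.0 ** (-delta_delta)

# Made-up Ct values: the target amplifies one cycle later in cases than in controls.
fold = relative_expression(26.0, 18.0, 25.0, 18.0)
```

A value below 1, such as the 0.33 to 0.59 range reported here, means the gene is expressed at a lower level in the case group than in the controls.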

Discussion
CHD is currently the most common congenital malformation and also the leading cause of infant death. Many factors interact temporally and spatially during heart development, and the combined action of hereditary and environmental factors in the embryonic phase can lead to dysplasia of the heart. Because the genetic mechanism of CHD is complex, the cause of the cardiac malformation is still unclear. The types of CHD are diverse, which is a major problem in its treatment and prevention. In this study, meta-analysis found that mutations of the NKX2.5, GATA4, and FOG2 genes played an important role in the development of CHD. McElhinney et al. [52] reported that mutations in exon 1 of the NKX2.5 gene existed in various types of CHD. The pathological and physiological effects of the GATA4 gene in heart development have been extensively researched. Garg et al. [53] verified for the first time, through molecular genetic research on two independent families with simple CHD, that GATA4 mutation is one of the causes of CHD. FOG2 is a transcription factor expressed early in heart development. Its interaction with GATA4 runs through the entire process of heart development, and FOG2 plays an essential role in that process [3]. Both Tan and De Luca found a mutation in a FOG2 exon in patients with double-outlet right ventricle combined with ventricular septal defect [30,31]. This paper found that the serum markers cTnI, hs-CRP, and BNP were related to CHD and can predict the occurrence of the disease. Guo [32] considered that changes in serum cTnI levels were of great value in understanding the state and prognosis of CHD. However, studies on the relationship between Lp(a) and CHD were rare, and Lp(a) did not meet the conditions for meta-analysis, so this factor could not be analyzed.
By examining the levels of cTnI, hs-CRP, BNP, and Lp(a) of 80 CHD patients and 80 healthy control subjects, this study showed that the levels of cTnI, hs-CRP, BNP, and Lp(a) in the case group were significantly higher than those in the controls. Geiger et al. [54] found that the BNP level of children with CHD was clearly increased compared with non-CHD subjects. Similarly, Akhabue et al. [55] reported that the difference in BNP concentration between children with and without CHD was significant. A number of studies show that Lp(a) is closely related to atherosclerotic disease and that increased Lp(a) is an independent risk factor for cardiovascular events [56-59]. Guo [32] showed that the serum cTnI level in patients with CHD was significantly higher than that in normal people. Logistic regression analysis showed significant correlations between cTnI, BNP, and Lp(a) and CHD. In combined diagnosis, the pairwise combinations of cTnI, BNP, and Lp(a) were associated with CHD. According to the joint detection ROC curves, the AUCs of the pairwise combinations of cTnI, BNP, and Lp(a) were all greater than 0.9, and the accuracy rates were higher than 87%. The logistic regression model performs better the larger the data set is; SVM, in contrast, has a higher accuracy rate with small sample sizes.
Recent studies showed that GATA4 and GATA6 can cooperate to regulate the expression of brain natriuretic peptide (BNP), and that deletion of either GATA factor leads to downregulation of BNP [60]. Other studies indicated that NKX2.5 and FOG2 can cooperate with GATA4 and that all of them play an important role in normal heart development [61,62]. As an independent protein molecule with specific antigenicity, Lp(a) has a metabolic pathway completely different from that of other apolipoproteins. It can interfere with lipid metabolism and the fibrinolytic system and thereby play an important role in cardiovascular diseases such as thrombosis and atherosclerosis [63,64]. Studies have shown that Lp(a) is an independent risk factor for myocardial infarction, coronary heart disease, and other cardiovascular diseases [65-68], but little research has been conducted on the relationship between Lp(a) and CHD. At present, it has not been reported which transcription factor regulates Lp(a). Through bioinformatics analysis, this study showed that the susceptibility genes NKX2.5, GATA4, and FOG2 share GO functional annotations with Lp(a) and BNP. The links between the susceptibility genes and BNP were mainly in gene expression and metabolism, while Lp(a) was linked to NKX2.5, GATA4, and FOG2 especially in lipoprotein membrane transport and blood circulation. This paper also studied the relative mRNA expression levels of the susceptibility genes and the levels of Lp(a) and BNP. The expression levels of NKX2.5, GATA4, and FOG2 in the case group were significantly lower than those in the controls, whereas the levels of Lp(a) and BNP in the case group were significantly higher, suggesting that abnormal expression of the susceptibility genes may lead to the increase of BNP level. However, the mechanism that causes the abnormal expression of Lp(a) is still not clear, and further study is required. This also gives a direction for in-depth study of CHD.

Conclusions
In conclusion, cTnI, CRP, BNP, and Lp(a), as risk factors associated with CHD, also have functional relations with the susceptibility genes and may therefore provide a basis for the clinical detection of CHD. However, the specific application of the model still requires data from many more clinical cases for training and optimization in order to make it more accurate. The auxiliary testing model is only an auxiliary tool at this early stage and cannot completely replace the diagnosis of an experienced clinician. The clinical diagnosis of CHD still needs to integrate all aspects of judgment.

Ethical Approval
We certify that this study has followed the Declaration of Helsinki (1964).

Consent
All subjects have given their written informed consent.