Machine Learning-Based Prediction Model of Preterm Birth Using Electronic Health Record

Objective. Preterm birth (PTB) is one of the leading causes of neonatal death. Predicting PTB in the first and second trimesters could help improve pregnancy outcomes. The aim of this study was to propose a prediction model for PTB based on machine learning algorithms. Method. Data from 2008 to 2018 were reviewed, and all participants were selected from a hospital in China. Six algorithms were used to predict PTB: Naive Bayesian (NBM), support vector machine (SVM), random forest tree (RF), artificial neural networks (ANN), K-means, and logistic regression. The receiver operating characteristic (ROC) curve, accuracy, sensitivity, and specificity were used to assess model performance. Results. A total of 9550 pregnant women were included: 4775 women with PTB and 4775 randomly selected controls. Using data up to 27 weeks of gestation, the RF model had the highest area under the curve (AUC) and accuracy of all algorithms (accuracy: 0.816; AUC = 0.885, 95% confidence interval (CI): 0.873–0.897). The accuracy and AUC of the RF model were positively associated with gestational age. Age, magnesium, fundal height, serum inorganic phosphorus, mean platelet volume, waist size, total cholesterol, triglycerides, globulins, and total bilirubin were the main influencing factors of PTB. Conclusion. The results indicate that a prediction model based on the RF algorithm has potential value for predicting PTB in the early stage of pregnancy. The importance analysis of the RF model suggests that intervening on the main PTB-related factors early in pregnancy might reduce the risk of PTB.


Introduction
Preterm birth (PTB) is defined as birth before 37 completed weeks of gestation [1]. This study focused on PTB between 28 and 37 weeks of gestational age. Based on gestational age at delivery, PTB can be subdivided into very early preterm (<28 weeks), early preterm (28-31 weeks), moderate preterm (31-33 weeks), and late preterm (33-37 weeks) [2]. The estimated global prevalence of PTB is 11.1% (95% confidence interval [CI]: 9.1%-13.4%) [3]. Most PTBs occur in low- and middle-income countries [2], and the incidence of PTB in China was 6.9% in 2014 [4]. Although the incidence of PTB is relatively low in China, PTB has a considerable impact on the health of pregnant women and children. Evidence shows that PTB is the most common cause of neonatal death and the second most common cause of death in children younger than 5 years [5]. Further studies found that gestational age at delivery is inversely associated with the risk of neonatal morbidity and mortality [6] and that about 35% of neonatal deaths are caused by complications of PTB [7]. Preterm neonates who survive are vulnerable to diseases, including pulmonary hypertension [8], retinopathy [9], visual and hearing impairments [10], and mental health problems [11]. Beyond death and disease in newborns, PTB can also lead to anxiety and depression in postpartum women [12]. A previous study showed that early screening of pregnant women for preterm birth could reduce its incidence [13]. Therefore, a model for predicting PTB is needed.
Numerous studies have attempted to predict preterm birth in pregnant women. Several studies supported the use of sonographic measurement of cervical length (CL) for predicting PTB in the first trimester [14,15], but other studies did not demonstrate the value of CL in screening for PTB [16,17]. Fetal fibronectin has been used extensively to predict PTB, but its sensitivity and positive predictive value are low [18,19]. In recent years, machine learning algorithms have been widely used in medicine with good performance [20]. Compared with logistic regression, machine learning algorithms can process higher-dimensional data and have a capacity for self-learning [21]. Studies have shown that machine learning algorithms improved the predictive accuracy of PTB prediction models [22,23].
However, some prediction models based on machine learning algorithms have poor prediction accuracy. Weber et al. built a machine learning model for predicting preterm birth using demographic, maternal, and residential characteristics, but its predictive performance was poor [24], possibly because of inaccurate geographic information.
Thus, the predictive power of machine learning for preterm birth has been inconsistent. In this study, we used a new method to preprocess predictors and compared the predictive power of six machine learning algorithms for PTB.

Participants.
Data from 2008 to 2018 were reviewed. All participants were selected from Haidian Maternal & Child Health Hospital. The inclusion criteria for the PTB group were as follows: (1) signed informed consent; (2) gestational age between 28 and 37 weeks; and (3) maternal age older than 18 years. The exclusion criteria for the PTB group were as follows: (1) missing maternal age; (2) missing gestational age; and (3) chronic diseases such as diabetes, hypertension, and heart disease. Controls were selected from hospitals during the same period at a 1:1 ratio. The inclusion criteria for controls were as follows: (1) signed informed consent; (2) gestational age ≥37 weeks; and (3) maternal age ≥18 years. The exclusion criteria for controls were as follows: (1) missing maternal age; (2) missing gestational age; and (3) chronic diseases such as diabetes, hypertension, and heart disease. The flowchart of the study is shown in Figure 1.

Feature Processing.
Demographic factors (i.e., age), physical examination findings, blood tests (red blood cells (RBC), white blood cell count (WBC), and plateletcrit (PCT)), urine test strips (urine pH, urine WBC, and glycosuria), and gynecological examinations (bacterial vaginosis (BV), vaginal cleanliness degree (CDV), and vaginal yeast infection (VYI)) were collected in our study. All participants had at least five antenatal check-ups before 27 weeks of gestation. To avoid overfitting, variables that were measured multiple times were summarized by their mean (continuous variables) or mode (categorical variables). Because measurements taken later in gestation were expected to have a greater influence on the outcome, later data were given more weight:

$$\mathrm{var}^{20}_{\mathrm{mean}} = \operatorname{average}\left(\mathrm{var}_{\mathrm{week}\,1}, \mathrm{var}_{\mathrm{week}\,2}, \ldots, \mathrm{var}_{\mathrm{week}\,20}\right), \tag{1}$$

$$\mathrm{var}^{i}_{\mathrm{mean}} = \operatorname{average}\left(\mathrm{var}^{i-2}_{\mathrm{mean}}, \mathrm{var}_{\mathrm{week}\,i-1}, \mathrm{var}_{\mathrm{week}\,i}\right), \quad i = 22, 24, 26, 27 \text{ weeks of gestation}. \tag{2}$$

As shown in Figure 2, the processed value of a variable at each time point is determined by its value at the previous time point and the measurements at the current time point. According to the time of prenatal examination, five datasets were constructed (20, 22, 24, 26, and 27 weeks of gestation).
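The weighting scheme described above (a plain mean up to 20 weeks, then a running average of the previous time point's value with the newest weekly measurements) can be sketched in Python for one variable of one participant. Several details are assumptions, not from the text: weeks without a visit are simply skipped, and for the 27-week value, where the stated formula would need an undefined 25-week mean, the sketch averages the 26-week value with the week-27 measurement. The function name `weighted_means` is hypothetical.

```python
from statistics import mean

# Time points of the five datasets described in the text.
CHECKPOINTS = [20, 22, 24, 26, 27]


def weighted_means(weekly):
    """Recursively weighted summary of one variable for one participant.

    weekly: dict mapping gestational week -> measured value; weeks without
    a visit are simply absent (assumption: missing weeks are skipped).
    Returns a dict mapping each checkpoint week -> processed value (or None).
    """
    result = {}
    # 20 weeks: plain mean of every measurement from week 1 to week 20.
    early = [v for week, v in weekly.items() if 1 <= week <= 20]
    result[20] = mean(early) if early else None

    for prev, cur in zip(CHECKPOINTS, CHECKPOINTS[1:]):
        # Measurements taken after the previous checkpoint, up to this one.
        recent = [weekly[w] for w in range(prev + 1, cur + 1) if w in weekly]
        # Average the previous processed value with the new measurements,
        # so older visits are progressively down-weighted.
        # ASSUMPTION: at 27 weeks this averages the 26-week value with week 27.
        terms = ([result[prev]] if result[prev] is not None else []) + recent
        result[cur] = mean(terms) if terms else None
    return result


# Example: a participant seen at weeks 8, 16, and 22 only.
print(weighted_means({8: 1.0, 16: 3.0, 22: 5.0}))
# -> {20: 2.0, 22: 3.5, 24: 3.5, 26: 3.5, 27: 3.5}
```

Because each step averages one carried-over value with at most two new measurements, the contribution of early-pregnancy visits shrinks at every time point, which is the intent of giving later data more weight.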

Outcome Measure.
In this study, four metrics were used to measure the predictive performance of the models: accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Accuracy is the proportion of correct predictions among all cases examined; sensitivity is the proportion of true PTB cases that the model correctly identifies (true positive rate); and specificity is the proportion of non-PTB cases that the model correctly identifies (true negative rate):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

where TP = true positive, FP = false positive, TN = true negative, and FN = false negative. AUC summarizes the trade-off between sensitivity and specificity across all classification thresholds.
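The four metrics can be computed from a model's predicted PTB probabilities as in the following standard-library Python sketch. The 0.5 classification threshold is an assumption, since the text does not state which threshold was used, and the function names are hypothetical. AUC is computed in its rank (Mann-Whitney) form, which equals the area under the empirical ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, 1 = PTB, 0 = term birth."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true, scores):
    """Probability that a random PTB case scores higher than a random control
    (ties count one half): the Mann-Whitney form of the ROC AUC."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity at `threshold` (assumed 0.5), plus AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": auc(y_true, scores),     # threshold-free
    }


print(evaluate([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]))
```

In this toy example, one PTB case scores below the threshold and one control above it, giving accuracy, sensitivity, and specificity of 2/3 each, while the AUC is 8/9 because only one of the nine case-control pairs is ranked the wrong way.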

Statistical Analysis.
The Kolmogorov-Smirnov test was used to test the normality of continuous variables. Normally distributed continuous variables were described as mean ± standard deviation, and categorical variables as numbers and percentages. Because the data were collected from electronic medical records, the dataset contained missing values. Cases and variables with more than 10% missing values were therefore excluded; remaining missing values were filled with the mode for categorical variables and the mean for continuous variables. The outcome groups were compared using the chi-square test or Fisher's exact test for categorical variables and the t-test or Wilcoxon test for continuous variables. The dataset was randomly divided into a training set (70%), used to train the models, and a test set (30%), used to evaluate them. Four indicators (AUC, accuracy, sensitivity, and specificity) were used to measure model performance. The importance of a variable was assessed by the decrease in model accuracy after removing the variable: the greater the decrease, the more important the variable. All statistical analyses were performed in R software (version 3.5.1) using the "e1071" (Naive Bayesian algorithm and support vector machine), "randomForest" (random forest tree), and "kknn" (K-means) packages. A two-tailed P value < 0.05 was considered statistically significant.
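The missing-data handling and the 70/30 split described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: each record is a dict with `None` for a missing value, variables are screened before cases, and the function names and random seed are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

MISSING_LIMIT = 0.10  # cases/variables missing more than 10% are excluded


def drop_sparse(rows, limit=MISSING_LIMIT):
    """Drop variables, then cases, whose share of missing (None) values exceeds `limit`.

    rows: list of dicts, one per pregnant woman, all with the same keys.
    ASSUMPTION: variables are screened first, then cases on the kept variables.
    """
    n = len(rows)
    keep_vars = [v for v in rows[0] if sum(r[v] is None for r in rows) / n <= limit]
    if not keep_vars:
        return []
    kept = []
    for r in rows:
        share_missing = sum(r[v] is None for v in keep_vars) / len(keep_vars)
        if share_missing <= limit:
            kept.append({v: r[v] for v in keep_vars})
    return kept


def impute(rows, categorical):
    """Fill remaining gaps: mode for categorical variables, mean for continuous ones."""
    fill = {}
    for v in rows[0]:
        observed = [r[v] for r in rows if r[v] is not None]
        if v in categorical:
            fill[v] = Counter(observed).most_common(1)[0][0]
        else:
            fill[v] = mean(observed)
    return [{v: fill[v] if x is None else x for v, x in r.items()} for r in rows]


def split(rows, train_frac=0.7, seed=2022):
    """Random split into a training set (70%) and a test set (30%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

As in the text, the fill values in this sketch are computed before the split; estimating them on the training set only would keep test-set information out of model training.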

Characteristics of Pregnant Women and Newborns.
A total of 9550 pregnant women (PTB: 4775; control: 4775) were included in our study. The mean age of the PTB group was lower than that of the control group (PTB: 29.94 ± 5.39; control: 30.72 ± 4.00; P < 0.001). Gestational age at delivery was 251.19 ± 11.51 days in the case group and 274.66 ± 7.15 days in the control group (P < 0.001). The gravidity and parity of pregnant women in the PTB group were lower than those in the control group (both P < 0.001). The birth weight and length of newborns were higher in the control group than in the PTB group (both P < 0.001). The Apgar scores at 1, 5, and 10 minutes were higher in the control group than in the case group (all P < 0.001). The characteristics of pregnant women and newborns are summarized in Table 1.

Prenatal Testing of Pregnant Women before 27 Weeks of Gestation.
In the biochemical analysis, albumin, aspartate transaminase (AST), total serum iron (TSI), magnesium (Mg), and triglyceride (TG) levels were higher in the PTB group than in the control group (all P < 0.05), whereas fasting plasma glucose was lower in the PTB group (P < 0.05). Total bile acid (TBA) and urea levels were also higher in the PTB group than in the control group (both P < 0.05). Platelet, intermediate cell, lymphocyte (LY), monocyte (MO), neutrophil granulocyte (NE), red blood cell distribution width-SD (RDW-SD), and WBC levels were higher in the PTB group than in the control group.
Mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), and platelet distribution width (PDW) were lower in the PTB group than in the control group. Waist size, fundal height, systolic blood pressure (SBP), and diastolic blood pressure (DBP) were higher in the PTB group than in the control group. Fetal heart rate (FHR) was slower, and urine pH was higher, in the PTB group than in the control group. Blood type B was more common in the case group than in the control group (Table 2). The results of prenatal testing at the other time points (20, 22, 24, and 26 weeks of gestation) are described in Supplementary Tables S1-S4.

Performance of Prediction Models.
Six algorithms (NBM, SVM, RF, ANN, K-means, and logistic regression) were used to build models on each of the five datasets (20, 22, 24, 26, and 27 weeks of gestation). Table 3 shows the performance of the six types of models. The RF model based on 27 weeks of gestation had the highest AUC and accuracy of all algorithms (accuracy: 0.816; AUC = 0.885, 95% CI: 0.873-0.897), with a sensitivity of 0.751 and a specificity of 0.882. The accuracy and AUC of the RF model were positively associated with gestational age (Figure 4). The NBM model showed unbalanced performance: based on 24 weeks of gestation, its sensitivity was 0.837 but its specificity was only 0.515, and based on 26 weeks of gestation, its specificity was 0.946 but its sensitivity was only 0.328. The receiver operating characteristic (ROC) curves of the models are shown in Figure 5. The importance analysis of the RF model found that the 10 most important variables were age, magnesium, fundal height, serum inorganic phosphorus, mean platelet volume, waist size, total cholesterol (TC), TG, globulins, and total bilirubin (TB) (Table 4). When predictors were added one at a time in order of importance, the AUC of the model increased gradually and became stable once 15 predictors were included (Figure 6).
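The importance measure defined in the Statistical Analysis section (the drop in accuracy when a variable is removed) underlies both the ranking in Table 4 and the stepwise addition of predictors. If, as seems likely, the authors used the randomForest package's mean decrease in accuracy (an assumption), "removing" a variable is approximated by randomly permuting its values on out-of-bag samples rather than by refitting the model without it. The model-agnostic Python sketch below applies that permutation idea to whatever data it is given (not to out-of-bag samples); `predict` stands for any fitted classifier, and all names, including the toy model, are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when each feature column is randomly shuffled.

    predict: fitted classifier as a callable, list of rows -> list of 0/1 labels.
    X: list of feature rows (lists); y: list of 0/1 outcomes (1 = PTB).
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            total += baseline - accuracy(y, predict(X_perm))
        drops.append(total / n_repeats)
    return drops


def rank_features(names, drops):
    """Feature names sorted from most to least important."""
    ranked = sorted(zip(names, drops), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]


def toy_model(rows):
    """Hypothetical classifier that only looks at the first feature."""
    return [1 if row[0] >= 10 else 0 for row in rows]


X = [[i, (7 * i) % 5] for i in range(20)]
y = toy_model(X)
drops = permutation_importance(toy_model, X, y)
print(rank_features(["informative", "noise"], drops))
```

Here the "noise" column has an importance of exactly zero because the toy model ignores it, so it ranks below "informative". Adding predictors in this ranked order and re-evaluating the AUC after each addition reproduces the kind of curve summarized in Figure 6.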

Discussion
In this study, six algorithms were used to build prediction models for preterm birth in the early stage of gestation. The overall predictive performance of the RF model was better than that of the other models, and its predictive power increased with gestational age. Age, magnesium, fundal height, serum inorganic phosphorus, mean platelet volume, waist size, TC, TG, globulins, and TB were found to be the main influencing factors of preterm birth.
In our study, we built the model with machine learning algorithms using routine prenatal examination data. The model performed relatively well at a low cost. Ramkumar et al. used multivariate adaptive regression splines to build a prediction model based on biomarkers (including IL-1RA, TNF-α, angiopoietin 2, TNFRI, IL-5, MIP1α, IL-1β, and TGF-α) and obtained high AUCs (training set: 0.82-0.98; test set: 0.66-0.86) [25]. Teresa et al. used cervical length at admission, gestational age, amniotic fluid glucose, and interleukin-6 to build a prediction model with a high AUC (0.86, 95% CI: 0.77-0.95) [26]. Thuy et al. found that nine cell-free RNA transcripts could be used to predict gestational age and preterm delivery, with AUCs for preterm delivery of 0.86 in the discovery cohort and 0.81 in the validation cohort [27]. Although these models performed well, they required additional and expensive clinical tests. Kamala et al. combined neighborhood socioeconomic status and individual status to predict preterm birth, but the AUC of their model (0.75) was relatively low [28]. Liu et al. found that cervical elastography could be used as a predictive indicator, with an AUC of 0.73 [29]. These latter studies used traditional statistical methods, such as logistic regression, and their models had relatively low predictive power.
In this study, the numerical experiments showed that the AUCs of the SVM, RF, and ANN models were higher than those of the logistic regression, NBM, and K-means models. A possible reason for the low AUC of the NBM model is that it assumes features are independent of each other, which is often not true in practice. Logistic regression and K-means are susceptible to outliers and noise, which reduces prediction accuracy. Among the other three algorithms, the RF model had the highest AUC. RF is an ensemble learning method that constructs a multitude of decision trees during training and outputs the class chosen by the majority of the trees [30]. This ensemble strategy combines several weak classifiers into a strong classifier, improving the predictive ability of the model. In a recent study, the RF algorithm also achieved good predictive performance for fatty liver disease [31], suggesting that it is well suited to processing clinical electronic medical records. Moreover, the RF model performed best at 27 weeks of gestation, possibly because biochemical indices in pregnant women change as delivery approaches. The AUC of the RF model at 20 weeks of gestation was 0.855 (95% CI: 0.841-0.869), suggesting that interventions could be performed before these biochemical indicators change.
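The idea that voting combines weak classifiers into a stronger one can be made concrete with an idealized calculation (essentially Condorcet's jury theorem). Assume, unrealistically, that each of n trees classifies a woman correctly with the same probability p > 0.5, independently of the others; the majority vote is then correct with probability $\sum_{k > n/2} \binom{n}{k} p^k (1-p)^{n-k}$. Trees in a real random forest are grown on overlapping bootstrap samples and their errors are correlated, so the actual gain is typically smaller. The function name and the value p = 0.6 are illustrative assumptions.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent voters is correct,
    when each voter is correct with probability p (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n // 2 + 1  # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


# Accuracy of a majority vote of n weak voters, each 60% accurate.
for n in (1, 3, 25, 101):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

With p = 0.6, three independent voters already reach 0.648, and accuracy keeps rising as voters are added. The same formula also shows the limit of the argument: if each voter is worse than chance (p < 0.5), adding voters makes the majority worse.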
In the importance analysis of the RF model, age had the greatest effect on preterm birth. A case-control study showed that premature delivery was associated with greater maternal age [32]. Serum magnesium also had a strong influence on the model's output; a double-blind study suggested that magnesium supplementation during pregnancy is associated with a reduction in preterm delivery [33]. Maternal fundal height was a valuable predictor of PTB in our study. A previous study used maternal fundal height to predict fetal weight [34], suggesting that fundal height may also be informative for PTB. However, fundal height measurements vary between examiners, which may limit its clinical use. Della Rosa et al. used the 9 most informative predictors to build a preterm birth prediction model, and its AUC reached 0.812 [35]. Our results show that a model using only 15 predictors can achieve better predictive performance. Considering cost-effectiveness, this result has important implications for clinical practice.
There were some limitations in our study. First, our dataset was collected from electronic medical records and lacked some data, such as smoking, drinking, family income, method of conception, medication, and fetal fibronectin. The absence of these factors may have reduced the performance of our model. Second, previous studies found that the method of conception has an important effect on preterm birth [36,37], but it was not included in our model, which may affect its prediction accuracy.
Third, controls were matched 1:1 from contemporaneous hospitals, which may overestimate the performance of the model and limit its use in populations with a typical proportion of PTB.
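To illustrate why a 1:1 case-control design can overstate clinical usefulness, the sketch below converts the 27-week RF model's reported sensitivity (0.751) and specificity (0.882) into positive and negative predictive values at the 50% case proportion of this study and at the 6.9% PTB incidence in China cited in the Introduction. It assumes, as a simplification, that sensitivity and specificity stay the same across populations; the helper names are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: share of positive predictions that are true PTB."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: share of negative predictions that are true term births."""
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tn / (tn + fn)


SENS, SPEC = 0.751, 0.882  # RF model at 27 weeks (reported in the Results)
# 1:1 study design vs. the 2014 PTB incidence in China cited in the Introduction.
for prevalence in (0.5, 0.069):
    print(prevalence, round(ppv(SENS, SPEC, prevalence), 3), round(npv(SENS, SPEC, prevalence), 3))
```

Under these assumptions, roughly 86% of positive predictions would be true PTB cases at the study's 50% case proportion, but only about 32% at a prevalence of 6.9%, while the negative predictive value rises to about 98%.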

Conclusions
Our results indicate that a prediction model based on the RF algorithm has potential value for predicting preterm birth in the early stage of pregnancy. The RF model also identified the main influencing factors of PTB, suggesting that intervention in the early stages of pregnancy could decrease the risk of preterm birth.
Data Availability
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare that they have no competing interests.