Intelligent Ensemble Deep Learning System for Blood Glucose Prediction Using Genetic Algorithms

Forecasting blood glucose (BG) values for patients can help prevent hypoglycemia and hyperglycemia events in advance. To this end, this study proposes an intelligent ensemble deep learning system to predict BG values in 15, 30, and 60 min prediction horizons (PHs) based on historical BG values collected via continuous glucose monitoring devices as an endogenous factor and carbohydrate intake and insulin administration information (times) as exogenous factors. Although there are numerous deep learning algorithms available, this study applied five algorithms, namely, recurrent neural network (RNN), which is optimized for sequence data (e.g., time-series), and RNN-based algorithms (e.g., long short-term memory (LSTM), stacked LSTM, bidirectional LSTM, and gated recurrent unit). Then, a genetic algorithm (GA) was applied to the five prediction models to optimize their weights through ensemble techniques and to yield the final predicted BG values. The performance of the proposed model was compared to that of the autoregressive integrated moving average (ARIMA) model as a baseline. The results show that the proposed model significantly outperforms the baseline in terms of the root mean square error (RMSE) and continuous glucose error grid analysis. For the valid 29 diabetic patients for the multivariate models, the RMSE was 11.08 (±3.19), 19.25 (±5.28), and 31.30 (±8.81) mg/dL for 15, 30, and 60 min PH, respectively. When the same data were applied to univariate models, the RMSE was 11.28 (±3.34), 19.99 (±5.59), and 33.13 (±9.27) mg/dL for 15, 30, and 60 min PH, respectively. Both the univariate and multivariate models showed a statistically significant difference compared with the baseline at a 5% statistical significance level. Instead of using a model with a single algorithm, applying a GA based on each output of a model with multiple algorithms was found to play a significant role in improving model performance.


Introduction
Diabetes mellitus (DM) is a metabolic disease that affects a large number of people and causes huge socioeconomic losses [1].
There are two types of diabetes: Type-1 Diabetes mellitus (T1DM) and Type-2 Diabetes mellitus (T2DM). T1DM is also known as insulin-dependent diabetes [2] and is an autoimmune disease in which the β cells of the pancreas are destroyed, and insulin is not produced dependably. T2DM is also known as noninsulin-dependent diabetes: the β cells of the pancreas can produce insulin; however, the body's cells are insulin-resistant, so the insulin cannot function normally. If hyperglycemia continues, patients with diabetes are at risk of acute complications, such as insufficient nutrition to the brain, leading to headaches, lethargy, and consequently coma.
Thus, external insulin administration is a prerequisite for diabetic patients to control their blood glucose (BG) concentration over time. Patients with T2DM who are resistant to insulin also require self-monitoring of BG (SMBG). In the past, it was necessary to manage BG concentration by measuring its values through a finger-stick test, a self-monitored method [3]. However, with the advancement in technology, continuous glucose monitoring (CGM) has been introduced to measure BG values every few minutes using devices attached to the skin. CGM not only facilitates BG concentration control in diabetic patients but also promotes research into forecasting future BG values.
The BG values are not arbitrary, and there is an observable trend. Thus, given that BG history has a feasible architecture, it is possible to predict future BG values based on past BG values collected through CGM [4]. Most previous studies have been conducted on BG prediction models using traditional statistical methods. However, because such methods do not account for the nonlinear relationship of BG values, research applying machine learning models has been predominantly conducted [5][6][7][8].
Many machine learning-based BG prediction studies have been conducted, and most used only CGM readings as input. It has been demonstrated that models that add exogenous factors, such as insulin and carbohydrate (CHO) information, as input perform better than models that use only CGM readings as input [9]. Furthermore, most studies were conducted on T1DM, even though T2DM accounts for more than 90% of all diabetes patients. T2DM and T1DM have different causes; therefore, different methodologies should be applied to forecast BG values.
In particular, factors affecting BG vary, including stress, emotion, physical activity, insulin, and CHO. Therefore, owing to this fluctuating variability, it cannot be known in advance which of the recurrent neural network (RNN)-based algorithms has the best model performance [10]. In certain cases, the long short-term memory (LSTM) model shows the highest accuracy, whereas, in others, its accuracy may be the lowest. To reasonably solve this problem, an ensemble weight optimization technique using a genetic algorithm (GA) is applied to derive models that are robust under each of these conditions. To prevent hypoglycemic and hyperglycemic events, five RNN algorithms are leveraged, and their ensemble weights are optimized using the GA to predict future BG values of T2DM patients at 15, 30, and 60 min prediction horizons (PHs) from time-series data and exogenous factors. The reasons for selecting those PH settings are as follows: (1) the rates of insulin and CHO absorption in the body match clinical thresholds; (2) these commonly used settings make it easier to compare results with those of other models [10][11][12][13][14][15].

The remainder of this paper is organized as follows. Section 2 introduces previous studies and provides an overview of the background knowledge related to the proposed model. Section 3 describes the framework of the proposed model and its process. Section 4 discusses the experiments conducted to evaluate the performance of the proposed model. Finally, Section 5 presents the conclusions of the study and recommendations for future research.

Related Work
In the past, SMBG finger-stick blood tests were required every few hours to monitor BG concentrations in diabetes patients. However, with the introduction of CGM devices that measure and record estimated glucose values (EGV) every few minutes, researchers can now collect highly utilizable data, which has led to the development of BG prediction models and hypoglycemic early warning systems. Glucose values are estimated because CGM devices measure BG indirectly from interstitial fluids. Therefore, SMBG is still required every few hours, and the EGVs must be optimized through calibration. Hence, the accuracy of these devices depends on the calibration algorithm. Concerns about whether EGVs can clinically replace absolute BG values were addressed by Rebrin and Steil [16].

Background Work.
Many studies have been conducted to predict future BG values using data collected through CGM devices to monitor and regulate BG concentrations to prevent hypoglycemia in diabetic patients using alarms. However, most studies were conducted in silico or with patients with T1DM [12,13,17]. For a brief overview of the literature, see Oviedo et al. [5].
Initial attempts to predict future BG values based on past BG data collected from CGMs were performed by Bremer and Gough [4], who showed that past BG values can be used to predict future BG values. Since then, various studies have applied traditional statistical and machine learning techniques to in silico and clinical data. Sparacino et al. [18] compared a first-order polynomial model and a first-order autoregressive (AR) model in 28 patients with T1DM. They predicted BG values at 30 or 45 min PH and, considering the delay, mean-square prediction error, and energy of the second-order differences as evaluation metrics, selected the AR model. Sun et al. [19] applied LSTM and a bidirectional LSTM-based neural network to the data of 26 T1DM patients and measured the root mean square error (RMSE) as 21.07 mg/dL at 30 min PH and 33.27 mg/dL at 60 min PH. Pérez-Gandía et al. [20] used an artificial neural network for six T1DM patients. Patients carried a CGM device intermittently for 72 h/week over 4 weeks, and the RMSE was 10.38 mg/dL, 19.51 mg/dL, and 29.07 mg/dL for 15, 30, and 45 min PH, respectively. They showed that a neural network prediction model is a reliable solution to the problem of predicting BG values from CGM systems. Rabby et al. [21] proposed a stacked LSTM with a Kalman smoothing model for six T1DM patients using a dataset that included 8 weeks of data for each patient; they achieved an RMSE of 6.45 and 17.24 mg/dL for 30 and 60 min PH, respectively. Li et al. [22] proposed a multilayer convolutional RNN (CRNN) model, and the performance was evaluated using 10 in silico and clinical data points. The proposed CRNN model's RMSE in the in silico case was 9.38 and 18.87 mg/dL for 30 and 60 min PH; in the clinical data case, the RMSE was 21.07 and 33.27 mg/dL for 30 and 60 min PH.

Preliminaries.
As mentioned, five RNN-based algorithms are applied in the first stage of this study to build BG value prediction models. An RNN is an artificial neural network (ANN) with hidden nodes connected by directed edges to form a cyclic structure [23]. The strength of an RNN is its network-type structure, which can accept input and produce output regardless of sequence length; hence, it can be constructed flexibly. However, an RNN suffers from the long-term dependency problem: the larger the gap between relevant data, the more learning declines.
The LSTM algorithm addresses this long-term dependency problem by adding cell states to the RNN structure [24]. A stacked LSTM extends the LSTM's single hidden layer to multiple hidden layers. It increases the depth of the neural network so that it has the potential to achieve more accurate results. Additionally, the typical RNN structure uses only historical data as a forward state, whereas the bidirectional LSTM can also use future information as a backward state, while still addressing the long-term dependency problem through the cell state. The gated recurrent unit (GRU) algorithm has two gates: an update gate and a reset gate. It has similar performance to and faster computation than the ordinary LSTM [25].
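For reference, one common textbook formulation of the two GRU gates mentioned above is written out below. The notation is an assumption introduced here and does not come from this study: x_t is the input at time t, h_t is the hidden state, W, U, and b are learned parameters, σ is the logistic sigmoid, and ⊙ is the elementwise product. Some references swap the roles of z_t and 1 − z_t in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)}\\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)}\\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

Because the GRU keeps a single hidden state and no separate cell state, it has fewer parameters per unit than an LSTM, which is consistent with the faster computation noted above.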
In this study, GAs are used for the ensemble weight optimization process. A GA is a search method that finds optimal solutions by imitating the evolution of living things as they adapt to their environment; it is an effective method because it can theoretically find global optima and can handle problems that are not clearly defined mathematically [26]. The components of the GA are as follows:
Chromosome: biologically, it denotes a set of genetic materials, and in the GA, it represents a solution.
Gene: this represents a single piece of genetic information as a building block of chromosomes. If a chromosome is [X Y Z], there are three genes inside, with values of X, Y, and Z, respectively.
Offspring: this represents chromosomes derived from those that existed at a certain time t. The offspring have genetic information similar to that of the previous generation.
Fitness: this is a value assigned to each chromosome, indicating the suitability of the solution expressed by the chromosome for the problem.

Method
This study proposes individual deep learning BG prediction models classified into two types according to the input variables. One is a univariate model that uses only past BG values as input, and the other is a multivariate model that uses past BG values in addition to CHO intake and insulin administration time information. Both models have the same architecture, apart from the input. The overall research framework of this study is shown in Figure 1.
The proposed procedure consists of several segments: data collection, preprocessing, time-series prediction using five RNN-based algorithms, and weight ensemble optimization with GAs.

Stage 1: Data Setting.
First, web pages and databases were built to facilitate remote data entry. The demographic information of diabetic patients, CGM readings, and CHO and insulin information were entered through the web page and stored in the database. The collected data were then converted into suitable forms to fit the neural networks via preprocessing.

Stage 2: Prediction Algorithms.
The input data were split according to the lookback period, which determines the number of previous values to be used, and the time step (i.e., sequence size). All samples were set up with sliding windows with a step size of one. A global description of how data are framed to fit into neural networks is shown in Figure 2.
The terms in Figure 2 are as follows: lookback is the duration, in minutes, of the past BG values used as inputs; the prediction point is the BG value one PH after the last input, which serves as the target; the sampling rate reflects that the patient's BG value is measured every 5 min.
To provide inputs for the Stage 3 optimization, five RNN-based algorithms were used to yield the predicted BG values. The models differ slightly in their hyperparameters, as described in Table 1. All hyperparameters were determined experimentally through trial and error for good performance.
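The framing described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `make_windows`, the 30 min lookback, and the synthetic readings are hypothetical and are not taken from this study; only the 5 min sampling interval, the step size of one, and the PH values come from the text.

```python
import numpy as np

SAMPLING_MIN = 5  # CGM sampling interval stated in the text


def make_windows(series, lookback_min, ph_min, sampling_min=SAMPLING_MIN):
    """Frame a series into (inputs, targets) using a sliding window with step one."""
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]            # univariate case: a single BG column
    n_in = lookback_min // sampling_min     # samples per input window
    n_ahead = ph_min // sampling_min        # samples from the last input to the target
    X, y = [], []
    for start in range(len(series) - n_in - n_ahead + 1):
        end = start + n_in
        X.append(series[start:end])
        y.append(series[end + n_ahead - 1, 0])  # BG is assumed to be column 0
    return np.array(X), np.array(y)


# Synthetic example: 20 readings (100, 105, ..., 195); the lookback is an assumed value.
bg = np.arange(100, 200, 5)
X, y = make_windows(bg, lookback_min=30, ph_min=15)
print(X.shape, y[0])
```

For a multivariate model, the same function accepts an array with one row per time step and one column per variable (BG first, followed by the CHO and insulin indicators), so that both model types share the same framing.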

Stage 3: Optimization-GAs.
Stage 3 describes the optimization process, in which the GA is applied to the predicted BG values obtained from the RNN-based models in Stage 2. The main purpose of using the GA is to combine the RNN models by optimizing their ensemble weights. The outputs of the RNN models are used to compute the fitness, and the optimal weights are obtained through an objective function that minimizes the RMSE. The objective function is defined as follows:

$$\mathrm{Obj}(x) = \sqrt{\frac{1}{n}\sum_{x=1}^{n}\left(y_x - \hat{y}_x\right)^2}, \tag{1}$$

$$\hat{y}_x = \sum_{i=1}^{m} w_i \, P(x)_i, \tag{2}$$

where n is the number of data points, m is the number of models, y_x is the actual BG value at point x, and \hat{y}_x is the ensemble prediction at point x. The structure of equation (1) is similar to that of the RMSE, but the predicted value is calculated as in equation (2). In equation (2), P(x)_i is the value predicted by model i at point x, and w_i is the weight applied to that model. The set of weights w is optimized toward minimizing Obj(x), and w is derived through a genetic algorithm. Additionally, two constraints were established for the GA in this work. First, the sum of all weights is greater than 0.99 and less than one. This is because the predicted BG values for each model must be multiplied by the weights and summed to be compared with the actual BG values. Second, each weight is greater than 0.05 and less than 0.5. The purpose of setting these constraints is to prevent overbiased weights.
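The objective and the two constraints described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function names and the synthetic values are hypothetical, and the weighted sum follows the description of equation (2) in the text.

```python
import numpy as np


def ensemble_prediction(preds, w):
    """Weighted sum of model outputs; preds has shape (m models, n points)."""
    return np.asarray(w, dtype=float) @ np.asarray(preds, dtype=float)


def objective(w, preds, actual):
    """RMSE between the actual BG values and the weighted ensemble prediction."""
    err = np.asarray(actual, dtype=float) - ensemble_prediction(preds, w)
    return float(np.sqrt(np.mean(err ** 2)))


def is_feasible(w, lo=0.05, hi=0.5, sum_lo=0.99, sum_hi=1.0):
    """Check both constraints: every weight in (lo, hi), weight sum in (sum_lo, sum_hi)."""
    w = np.asarray(w, dtype=float)
    return bool(np.all((w > lo) & (w < hi)) and sum_lo < w.sum() < sum_hi)


# Synthetic example: five models that all predict the actual values exactly.
actual = np.array([100.0, 100.0, 100.0])
preds = np.tile(actual, (5, 1))
w_equal = [0.199] * 5          # sums to 0.995, inside the allowed range
print(is_feasible(w_equal), objective(w_equal, preds, actual))
```

In this toy case the ensemble predicts 99.5 at every point because the weights sum to 0.995, so the objective equals 0.5 even though each individual model is exact. The example therefore also shows the small offset that a weight sum slightly below one introduces.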

Empirical Studies
In this section, the experiments conducted to evaluate the performance of the proposed model are discussed. First, the dataset was obtained. Then, the data were filtered and cleaned through preprocessing to fit the neural network. In the next step, the effectiveness of the proposed models was assessed in two setups according to the input data. Then, the best and worst cases based on the RMSE were analyzed. Finally, residual analysis and continuous glucose error grid analysis (CG-EGA) were performed to interpret the causes of the predictive performance difference between the best and worst cases.

The CGM data were collected using the Dexcom G5 from 51 hospitalized T2DM patients. The Dexcom G5 consists of a sensor, a transmitter, and a mobile app. Patients were encouraged to enter their own CHO and insulin amounts and times using the mobile app. The device measures the patient's glucose reading every 5 min for 4-7 days. The glucose is measured through the sensor attached to the Dexcom G5, and the measured glucose is transmitted to the receiver (mobile app) through the transmitter at regular intervals. Patient data were collected from Soonchunhyang University Cheonan Hospital between July 2019 and March 2021. The collected data are stored in a database built on APM (Apache-PHP-MySQL) and include the demographic information of the type 2 diabetes patients, CGM data, and insulin and CHO information. Data stored in the database were exported in CSV format and used for the experiments. Information on the patients registered in our study is shown in Table 2. The study was approved by SoonChunHyang University Hospital Cheonan Institutional Review Board (SCHCA IRB Protocol Number: SCHCA 2019-11-048). Of the 51 patients with diabetes, 29 were available for multivariate models based on valid CHO and insulin time information.
Therefore, the experiment proceeded with the following scenario: (1) results of the univariate model using 51 diabetic patients' data; (2) a comparison of the performance of a multivariate model and a univariate model using the same 29 diabetic patients' data; (3) RMSE was used as a model performance evaluation metric, and the best and worst cases were compared and analyzed to interpret the causes of the differences between samples.

Data Preprocessing.
The primary purpose of preprocessing is to clean the data and transform them into a suitable form for fitting the neural networks. The data used in this study were not in silico data generated through simulation, but rather clinical data collected under real-life conditions. Therefore, outliers may have existed, owing to errors in wireless communication devices. The exogenous factors are also likely to be inconsistent because they were entered directly by patients, so appropriate preprocessing was required. First, the CGM devices attached to patients during the data collection stage of this study recorded some readings as "low" or "high" instead of numeric values. As shown in Algorithm 1, these entries were replaced with 60 and 400, respectively, and the CHO and insulin entries were converted to binary indicators.

procedure PREPROCESSING()
    while countPoint < numberOfPointInFile
        if glucose == "low"
            glucose = 60
        if glucose == "high"
            glucose = 400
        if CHO > 0
            CHO = 1
        else
            CHO = 0
        if insulin > 0
            insulin = 1
        else
            insulin = 0
        countPoint++
ALGORITHM 1: Preprocessing Method.
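A minimal Python rendering of Algorithm 1 is given below as a sketch. The function name `preprocess_record`, the case-insensitive matching of the "low" and "high" labels, and the sample records are assumptions made for illustration; the replacement values 60 and 400 and the binary encoding follow Algorithm 1.

```python
def preprocess_record(glucose, cho, insulin, low_value=60, high_value=400):
    """Clean one CGM record following the steps of Algorithm 1."""
    if isinstance(glucose, str):
        label = glucose.strip().lower()   # assumption: labels may vary in case
        if label == "low":
            glucose = low_value
        elif label == "high":
            glucose = high_value
    glucose = float(glucose)
    cho_flag = 1 if cho > 0 else 0            # keep only whether CHO was taken
    insulin_flag = 1 if insulin > 0 else 0    # keep only whether insulin was given
    return glucose, cho_flag, insulin_flag


# Synthetic records: (glucose reading, CHO amount, insulin amount)
records = [("low", 0, 0), (152, 45, 0), ("high", 0, 6)]
cleaned = [preprocess_record(*r) for r in records]
print(cleaned)
```

Reducing CHO and insulin to 0/1 flags discards the recorded amounts, which matches the stated goal of unifying entries across patients who logged them with different levels of detail.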
Insulin and CHO information were entered by patients directly into the application, resulting in differences in the information entered by each patient. For example, some patients accurately recorded their intake of CHO in grams, while others simply indicated that they had taken CHO. Therefore, the data were converted to the lowest dimension to unify the insulin and CHO information recorded for each patient. The time-series data (i.e., the time information of insulin and CHO) were added to the past BG values via preprocessing.

parameter(s): w - modelWeight
require: 0.05 < w < 0.5, 0.99 < sum(wList) < 1.00
procedure GA()
    generate wLists // initialChromosomes
    calculateFitness objectiveFunction
    while termination condition not met
        selectIndividuals wLists // selection
        recombine individuals // crossover
        mutate individuals // mutation
        calculateFitness objectiveFunction
        wLists = individuals
    return wList // optimalWeights
ALGORITHM 3: Genetic Algorithm Method.
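The loop in Algorithm 3 can be sketched with NumPy as shown below. This is an illustrative sketch, not the configuration used in this study: the truncation selection, uniform crossover, Gaussian mutation, population size, number of generations, and the `repair` helper that rescales and clips candidates are all assumptions, and the data are synthetic. Only the weight constraints are taken from Algorithm 3.

```python
import numpy as np

LO, HI, SUM_LO, SUM_HI = 0.05, 0.5, 0.99, 1.0  # constraints from Algorithm 3


def rmse(w, preds, actual):
    """Objective: RMSE of the weighted ensemble; preds has shape (m, n)."""
    return float(np.sqrt(np.mean((actual - w @ preds) ** 2)))


def feasible(w):
    return bool(np.all((w > LO) & (w < HI)) and SUM_LO < w.sum() < SUM_HI)


def repair(w, target=0.995, eps=1e-3, iters=50):
    """Hypothetical helper: clip and rescale until the constraints hold."""
    for _ in range(iters):
        w = np.clip(w, LO + eps, HI - eps)
        w = w * target / w.sum()
    return w


def fitness(pop, preds, actual):
    # Infeasible chromosomes get infinite cost so they are never selected.
    return np.array([rmse(w, preds, actual) if feasible(w) else np.inf for w in pop])


def genetic_algorithm(preds, actual, pop_size=40, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    m = preds.shape[0]
    pop = np.array([repair(rng.random(m)) for _ in range(pop_size)])
    pop[0] = repair(np.full(m, 1.0 / m))           # include equal weights as a reference
    for _ in range(generations):
        order = np.argsort(fitness(pop, preds, actual))
        parents = pop[order[: pop_size // 2]]      # selection: keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(m) < 0.5, a, b)   # uniform crossover
            child = child + rng.normal(0.0, 0.02, m)      # Gaussian mutation
            children.append(repair(child))
        pop = np.vstack([parents, np.array(children)])
    return pop[np.argmin(fitness(pop, preds, actual))]


# Synthetic data: five "models" that add noise of different sizes to the truth.
data_rng = np.random.default_rng(1)
actual = 120 + 30 * np.sin(np.linspace(0, 6, 200))
preds = np.array([actual + data_rng.normal(0, s, actual.size) for s in (5, 8, 12, 15, 20)])
best = genetic_algorithm(preds, actual)
equal = repair(np.full(5, 0.2))
print(best.round(3), rmse(best, preds, actual), rmse(equal, preds, actual))
```

Because the better half of each generation is carried over unchanged and the equal-weight chromosome is part of the initial population, the returned weights can never score worse than equal weighting on the data used for optimization.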

Results.
This section details the experiments conducted in two categories according to input: univariate models using only CGM readings and multivariate models adding CHO and insulin information (time) to CGM readings. The predictive performance of the models was evaluated by considering 15, 30, and 60 min PHs. The performance of the proposed models was compared to that of the ARIMA model baseline.
In this study, two criteria were used to evaluate the performance of the proposed model: RMSE and CG-EGA. RMSE is one of the most frequently used evaluation indicators in regression problems and has been adopted to facilitate performance comparisons in many studies. CG-EGA assesses the clinical accuracy of BG prediction systems [27].
This error analysis method quantifies the clinical accuracy of predicted BG values against measured BG values. The grid divides a scatterplot of the reference glucose values and predicted glucose values into five regions: A to E. The more points that fall in Zone A, the more clinically accurate the results are.

Results for the Dataset with CGM Readings.
First, because data were available for all 51 diabetic patients, the results of applying a univariate model to them are described. Table 3 presents the model performance (RMSE) of the univariate model trained with CGM reading data as input without exogenous factors. Table 3 shows that the proposed model exhibits the best performance and the lowest standard deviation. In addition, at 15 and 30 min PH, there is a statistically significant difference compared with the baseline at a 5% statistical significance level.

Results for the Dataset with CGM Readings and Exogenous Factors.
For the next scenario, a comparison of the performance of univariate and multivariate models for 29 diabetic patients is shown in Table 4. The results in Table 4 show that, apart from the RNN model among the multivariate models, the proposed model has the best performance at all PHs and the lowest standard deviation. For the valid 29 diabetic patients of the original 51 for the multivariate models, the RMSE was 11.08 (3.19), 19.25 (5.28), and 31.30 (8.81) mg/dL for 15, 30, and 60 min PH, respectively. Both the univariate and multivariate models showed a statistically significant difference compared with the baseline in t-tests at a statistical significance level of 5%. Additionally, the performance of multivariate models was slightly superior to that of univariate models. The cause of the slight difference in accuracy is explained in Section 5.

Best/Worst Case Analysis.
Based on the performance evaluation metric RMSE, the best and worst cases were analyzed to determine the differences that result in performance gaps between the best- and worst-performing models. Figures 3 and 4 show that the best and worst cases differ in the variability of BG values in the test set. The CG-EGA results are shown in Figures 5 and 6, indicating that the best case has all points in Zone A, and the worst case is distributed across several zones, including A, B, and D.

Conclusion
This work investigated highly accurate individualized BG prediction models with weight ensemble optimization using a GA on the output of RNN-based algorithms for hospitalized T2DM patients. Instead of using a model with a single algorithm, applying a GA based on each output of a model with multiple algorithms was found to play a significant role in improving model performance and making the model more robust toward variations.
However, a limitation of this study is that some inaccurate data regarding exogenous factors, such as CHO and insulin information, were used; this is because the data were measured and recorded under real-life conditions. Therefore, in future studies, multivariate models may perform significantly better than univariate models if more strictly recorded data are used, even if the amount of data is limited.
Further study includes a more exhaustive test of the proposed model and its comparison with other state-of-the-art methods (e.g., Rough Autoencoder (RAE), Deep Belief Network (DBN), interval probability distribution learning (IPDL), and Deep Temporal Dictionary Learning (DTDL)) on a significantly larger number of real datasets [28][29][30][31]. Moreover, complexity analysis may be required to apply the system to a real-time environment.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors' Contributions
Dae-Yeon Kim and Dong-Sik Choi contributed equally to this paper.