Risk Stratification with Extreme Learning Machine: A Retrospective Study on Emergency Department Patients

This paper presents a novel risk stratification method using the extreme learning machine (ELM). ELM was integrated into a scoring system to identify the risk of cardiac arrest in emergency department (ED) patients. The experiments were conducted on a cohort of 1025 critically ill patients presenting to the ED of a tertiary hospital. ELM and voting based ELM (V-ELM) were evaluated. To enhance the prediction performance, we proposed a selective V-ELM (SV-ELM) algorithm. The results showed that ELM based scoring methods outperformed the support vector machine (SVM) based scoring method in the receiver operating characteristic analysis.


Introduction
In the emergency department (ED), triage enables rapid screening of patients to determine severity and assign proper treatment. Most risk stratification systems are based on clinical judgment and traditional vital signs such as blood pressure and heart rate. However, vital signs alone may not be sufficient for accurate risk assessment. Machine learning has been used to design a Euclidean distance based scoring system (DIST) [1], which showed an advantage over statistical risk scores [2,3]. Motivated by this encouraging result, we aim to derive the DIST score using the extreme learning machine (ELM) [4], one of the recent advances in the machine learning community.
Since ELM was proposed for single-layer feedforward neural networks (SLFNs) [4], it has attracted considerable interest in algorithm-level improvements and enhancements [5][6][7][8][9]. With the development of ELM based methodologies, ELM and its variants have been used in a variety of applications [10][11][12][13]. ELM has been widely applied to classification problems in biomedical signal processing, such as electroencephalography (EEG) [14,15] and electrocardiogram (ECG) [16,17]. Bioinformatics is another popular application area [18,19]. Both theoretical and experimental studies have shown that ELM methods perform comparably to SVM while benefiting from low computational complexity [20].
To our knowledge, ELM has not yet been applied to risk stratification in biomedical applications. In this paper, ELM is employed to enhance the DIST scoring system for effective patient outcome prediction, where heart rate variability (HRV) parameters and vital signs are used as predictors. The rest of the paper is organized as follows. Section 2 elaborates on the DIST risk stratification system for patient outcome prediction. Section 3 introduces the basic ELM algorithm and then presents the voting based ELM (V-ELM) and our proposed selective voting based ELM (SV-ELM). Section 4 describes the experimental setting and data collection, as well as the experimental results. Section 5 concludes the study.

Risk Stratification System
We previously developed a Euclidean distance based scoring system (DIST) [1] for risk stratifying critically ill patients presenting to the ED. The purpose of the scoring method is to stratify patients into different risk levels so that proper triage can be performed. Triage is the process of determining the priority of patients' treatments based on the severity of their condition. The DIST risk stratification system is illustrated in Figure 1. It is designed to be applied to an incoming patient presenting to the ED. The DIST system is intended as a clinical tool that assesses incoming patients and outputs a risk score. A low score indicates that the patient is not in critical condition, while a high score indicates the imminent possibility of cardiac arrest.
As an overview, the DIST system takes physiological and cardiac measurements, compiled with medical status information, and processes these inputs with a machine learning scoring algorithm that compares the present input with correlated past patient diagnoses in order to produce a score for the patient's risk of cardiac arrest. A computer interface allows an ED nurse to register the incoming patient and to enter pertinent information on the patient's medical status. The medical status is then transmitted and logged in the triage system under an identifier for the patient. The system can also poll the relevant patient data, retrieve the required information, and propagate it into the present triage assessment.
The DIST triage system uses two types of features: heart rate variability (HRV) measures and vital signs. HRV is defined as the variation in the time interval between successive heart beats. Following the widely used HRV analysis standard [21], two categories of HRV measures (time domain and frequency domain) are calculated. Vital signs are clinical measurements that indicate the state of a patient's essential body functions, for example, heart rate, respiratory rate, and blood pressure.

Risk Stratification with ELM
The DIST risk stratification system employs a support vector machine (SVM) [22] as the core of its prediction module. Compared with SVM, ELM offers several advantages that have been discussed elsewhere [20,23]. In this study, we apply ELM and the voting based ELM (V-ELM) [24] to evaluate the performance of ELM for risk stratification. Furthermore, we propose a selective V-ELM (SV-ELM) algorithm to enhance the prediction performance.

Basic ELM.
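As a fast learning algorithm for the single-layer feedforward network (SLFN), ELM [4] randomly selects weights and biases for the hidden nodes and analytically determines the output weights by finding the least-squares solution. Suppose that there are $N$ samples $(\mathbf{x}_i, \mathbf{t}_i)$ in the training set, where $\mathbf{x}_i$ is an $n \times 1$ input vector and $\mathbf{t}_i$ is an $m \times 1$ target vector. Given that $\mathbf{w}_j$ is the $n$-dimensional weight vector connecting the $j$th hidden node and the input neurons, $b_j$ is the bias of the $j$th hidden node, and $g(\cdot)$ is the activation function, an SLFN with $\tilde{N}$ hidden nodes is formulated as

$$\sum_{j=1}^{\tilde{N}} \boldsymbol{\beta}_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) = \mathbf{t}_i, \quad i = 1, 2, \ldots, N. \tag{2}$$

Then a compact format of (2) can be written as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \tag{3}$$

where $\mathbf{H}(\mathbf{w}_1, \ldots, \mathbf{w}_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, \mathbf{x}_1, \ldots, \mathbf{x}_N)$ is the $N \times \tilde{N}$ hidden layer output matrix of the network; its entry $h_{ij} = g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j)$ is the output of the $j$th hidden neuron with respect to $\mathbf{x}_i$, $j = 1, 2, \ldots, \tilde{N}$ and $i = 1, 2, \ldots, N$; $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{\tilde{N}}]^T$ and $\mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^T$ are the output weight matrix and the target matrix, respectively. To obtain small nonzero training error, Huang et al. [4] proposed randomly assigning values to the parameters $\mathbf{w}_j$ and $b_j$, so that the system becomes linear and the output weights can be estimated as $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse [25] of the hidden layer output matrix $\mathbf{H}$.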
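The training procedure described above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the MATLAB code used in the study. The function names, the uniform sampling range for the hidden parameters, and the sigmoid activation are all assumptions made for the example.

```python
import numpy as np

def hidden_output(X, W, b):
    # Hidden layer output matrix H (N x n_hidden).
    # The sigmoid activation is an assumption made for this sketch.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def train_elm(X, T, n_hidden=20, seed=0):
    """Basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Hidden node weights and biases are drawn once and never updated.
    # The sampling range [-1, 1] is an assumption.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = hidden_output(X, W, b)
    # beta = pinv(H) @ T is the minimum-norm least-squares solution of H beta = T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_elm(X, W, b, beta):
    return hidden_output(X, W, b) @ beta
```

When the number of hidden nodes is at least the number of distinct training samples, the least-squares fit typically reproduces the training targets almost exactly, which is why the hidden layer size acts as the main complexity control.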

V-ELM and SV-ELM.
The V-ELM [24] method was proposed to improve the classification performance of ELM. Its premise is that the hidden node parameters in basic ELM are randomly generated and remain unchanged during the training phase, which increases the possibility of misclassifying samples near the decision boundary. V-ELM uses a majority voting mechanism to combine the decisions of an ensemble of individual basic ELMs. This strategy is reported to address the misclassification of borderline samples well [19,24]. Suppose that $K$ independent networks are trained in V-ELM. For each testing sample $\mathbf{x}^{\text{test}}$, $K$ prediction results are obtained from these independent ELMs. A vector $\mathbf{S}_{K,\mathbf{x}^{\text{test}}}$, with dimension equal to the number of class labels, stores all $K$ results for $\mathbf{x}^{\text{test}}$: if the class label predicted by the $k$th ($k \in [1, \ldots, K]$) ELM is $i$, the value of entry $i$ in $\mathbf{S}_{K,\mathbf{x}^{\text{test}}}$ is increased by one. After all $K$ results are assigned to $\mathbf{S}_{K,\mathbf{x}^{\text{test}}}$, the final class label of $\mathbf{x}^{\text{test}}$ is determined by majority voting:

$$c^{\text{test}} = \arg\max_{i \in [1, \ldots, C]} \left\{ \mathbf{S}_{K,\mathbf{x}^{\text{test}}}(i) \right\}, \tag{4}$$

where $C$ is the total number of classes in the database.
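The voting step can be illustrated independently of how the individual ELMs are trained. The sketch below assumes the predicted labels of the K networks are already available as an integer array. The function name `majority_vote` and the tie-breaking rule (lowest class index wins) are assumptions of this example.

```python
import numpy as np

def majority_vote(predicted_labels, n_classes):
    """Combine labels from K classifiers by majority voting.

    predicted_labels: integer array of shape (K, n_samples).
    Returns the final labels and the vote-count matrix S (n_samples x n_classes).
    """
    predicted_labels = np.asarray(predicted_labels)
    K, n_samples = predicted_labels.shape
    S = np.zeros((n_samples, n_classes), dtype=int)
    rows = np.arange(n_samples)
    for k in range(K):
        # Each classifier adds one vote to the entry of the class it predicted.
        S[rows, predicted_labels[k]] += 1
    # argmax returns the lowest class index on ties (an assumption of this sketch).
    return S.argmax(axis=1), S

# Three classifiers voting on three samples in a two-class problem.
labels, S = majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]], n_classes=2)
print(labels)  # [0 1 0]
```

With two classes and an odd ensemble size such as 15, ties cannot occur, so the tie-breaking rule does not affect the result.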
The V-ELM algorithm is simple yet effective, and its structure leaves considerable room for further development. One possibility is to increase the ensemble size but select only a few individual ELMs for decision making. In detail, we create a larger initial ensemble of ELMs and select $K$ of them to combine the outputs. The selection is based on the mean value of the norms of the output weights $\|\boldsymbol{\beta}\|$. A smaller $\|\boldsymbol{\beta}\|$ could lead to better generalization performance [4], and this characteristic has been exploited in several ELM based methods [8,26]. The SV-ELM method is briefly described as follows.
(1) Randomly generate a set of hidden node parameters for each ELM in the initial ensemble, train each ELM, and obtain the corresponding output weight matrix.
(2) Select the $K$ individual ELMs with the smallest $\|\boldsymbol{\beta}\|$ values for the final decision ensemble.
(3) Apply the $K$ selected ELM models to the testing sample $\mathbf{x}^{\text{test}}$ to get the predicted labels.
(4) Combine the $K$ predicted labels for the testing sample to reach the final decision.
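The four steps above can be sketched as follows. This is an illustrative Python/NumPy outline under stated assumptions, not the implementation used in the study. The sigmoid activation, one-hot target coding, use of the Frobenius norm of the output weight matrix as the selection criterion, and all function and parameter names (`sv_elm_predict`, `n_initial`, and so on) are choices made for this example.

```python
import numpy as np

def _hidden(X, W, b):
    # Sigmoid hidden layer: an assumption made for this sketch.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def _train_elm(X, T, n_hidden, rng):
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    beta = np.linalg.pinv(_hidden(X, W, b)) @ T
    return W, b, beta

def sv_elm_predict(X_train, y_train, X_test, n_classes=2,
                   n_initial=25, K=15, n_hidden=20, seed=0):
    rng = np.random.default_rng(seed)
    T = np.eye(n_classes)[y_train]  # one-hot targets (assumption)
    # Step 1: train every ELM in the initial ensemble.
    models = [_train_elm(X_train, T, n_hidden, rng) for _ in range(n_initial)]
    # Step 2: keep the K models with the smallest output weight norm
    # (Frobenius norm is used here as an assumption).
    norms = [np.linalg.norm(beta) for _, _, beta in models]
    selected = [models[i] for i in np.argsort(norms)[:K]]
    # Step 3: each selected ELM predicts a label for every test sample.
    votes = np.zeros((X_test.shape[0], n_classes), dtype=int)
    rows = np.arange(X_test.shape[0])
    for W, b, beta in selected:
        labels = (_hidden(X_test, W, b) @ beta).argmax(axis=1)
        votes[rows, labels] += 1
    # Step 4: majority voting over the K predicted labels.
    return votes.argmax(axis=1), votes
```

The default values of 25, 15, and 20 mirror the initial ensemble size, decision ensemble size, and number of hidden nodes reported in the experiments.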

Experiments
Data Collection and Processing.
This was a retrospective observational study on emergency department (ED) patients. Patients were recruited at the ED of Singapore General Hospital (SGH). Eight vital signs and raw electrocardiography (ECG) data were acquired. The vital signs were temperature, respiration rate, pulse rate, systolic blood pressure (SBP), diastolic blood pressure (DBP), oxygen saturation (SpO2), Glasgow Coma Scale (GCS), and pain score. ECG signals were acquired using a LIFEPAK 12 defibrillator/monitor and downloaded using the CODE-STAT Suite. To ensure qualified RR intervals for calculating HRV measures, only cases containing more than 70% sinus rhythm were included in the database. Each patient was represented as a 24-dimensional feature vector (16 HRV measures and 8 vital signs), and the corresponding outcome was coded as either 0 (no cardiac arrest within 72 hours) or 1 (cardiac arrest within 72 hours). Prior to implementing ELM based risk stratification, min-max normalization [27] was performed to transform the feature set into the interval [−1, 1].
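As an illustration of the normalization step, the sketch below rescales each feature column linearly into [−1, 1]. The function name and the handling of constant columns (mapped to the lower bound) are assumptions of this example. The sample values are made up for demonstration and are not study data.

```python
import numpy as np

def minmax_scale(X, lo=-1.0, hi=1.0):
    """Rescale each column of X linearly so its minimum maps to lo and its maximum to hi."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # Guard against division by zero for constant columns (they map to lo).
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return lo + (X - col_min) * (hi - lo) / span

# Hypothetical temperature and pulse rate readings for three patients.
X = [[36.5, 60.0], [38.5, 120.0], [37.5, 90.0]]
print(minmax_scale(X))
```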

Experiment Setting and Performance Evaluation.
Experiments were carried out in MATLAB R2009a (MathWorks, Natick, MA) on a desktop computer with an Intel 3.2 GHz CPU and 4 GB of RAM. The LIBSVM library [28] was used to implement the linear SVM algorithm for the DIST system. The Gaussian radial basis function (RBF) activation function $g(\mathbf{w}, b, \mathbf{x}) = \exp(-\|\mathbf{x} - \mathbf{w}\|^2 / b)$ was adopted for all ELM algorithms. In this study, $b$ was set to 1, the default value in the MATLAB setting. We evaluated the scoring systems within the leave-one-out cross validation (LOOCV) framework. Given a dataset of $N$ samples, one sample was selected to validate a scoring model trained with the remaining $N - 1$ samples. To complete the LOOCV, all $N$ samples were tested individually through $N$ iterations. Having derived the risk scores for all samples in the dataset, we conducted the receiver operating characteristic (ROC) analysis, from which the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were derived for performance evaluation.
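The LOOCV procedure can be sketched generically. In the example below, `fit` and `score` are hypothetical callables standing in for any scoring model (SVM, ELM, V-ELM, or SV-ELM); they are not part of the original system. The stand-in model used for demonstration simply returns the outcome rate of its training fold.

```python
import numpy as np

def loocv_scores(X, y, fit, score):
    """Leave-one-out cross validation: one held-out risk score per sample."""
    X = np.asarray(X)
    y = np.asarray(y)
    N = len(y)
    scores = np.empty(N)
    for i in range(N):
        train = np.arange(N) != i            # all samples except the i-th
        model = fit(X[train], y[train])       # train on the remaining N - 1 samples
        scores[i] = score(model, X[i:i + 1])[0]  # score the held-out sample
    return scores

# Placeholder model for demonstration only: the "risk score" is the
# outcome rate observed in the training fold.
fit = lambda X_tr, y_tr: y_tr.mean()
score = lambda model, X_val: np.full(len(X_val), model)

print(loocv_scores([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1], fit, score))
```

The resulting vector of held-out scores is what the ROC analysis is then computed on.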

Baseline Characteristics.
During the recruitment period (November 2006 to December 2007), 1025 patients presenting to the ED of SGH were recruited by convenience sampling. Of these, 52 (5.1%) patients met the primary outcome (cardiac arrest within 72 hours), while 973 (94.9%) did not. Table 1 shows the characteristics of all recruited patients. The diagnosis group was based on physician clinical judgment. Of the patients, 425 (41.5%) were diagnosed in the cardiovascular group, followed by 159 (15.5%) in the respiratory group. In the primary outcome group, patients had a median age of 71 (interquartile range or IQR: 61-81), and 37 (71.2%) were male.
Table 2 shows the comparison of prediction results in terms of AUC, sensitivity, specificity, PPV, and NPV. The cutoff value for each scoring method was chosen as the optimal point closest to the upper left corner of the ROC curve. The 95% confidence intervals (CI) were also reported. As observed in Table 2, ELM based scoring methods outperformed SVM in risk prediction, and the proposed SV-ELM algorithm achieved the best performance. Because the database was fairly small and the feature dimension was not high, training time is unlikely to be a concern, especially when training is done offline. Therefore, we briefly describe training time here instead of providing a detailed report. Both the SVM and ELM based scoring methods completed their training in 0.3 s, while V-ELM required more than 1 s and SV-ELM more than 2 s. These figures are averages over the leave-one-out cross validation. SVM ran fast because a linear kernel was used, so no grid search [28] was needed to fine-tune parameters. As reported in [24], SVM with an RBF kernel takes much longer to train than ELM methods. In the next section, we investigate the effects of parameter settings in ELM based scoring methods.
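The cutoff rule mentioned above, choosing the ROC point closest to the upper left corner, can be sketched as follows. The sketch assumes that higher scores indicate higher risk, that every observed score is a candidate cutoff, and that both outcome classes are present. The function names and the toy scores are hypothetical.

```python
import math

def confusion(scores, labels, cutoff):
    """Counts when a score >= cutoff is called positive."""
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    tn = sum(s < cutoff and y == 0 for s, y in zip(scores, labels))
    return tp, fp, tn, fn

def best_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) for the ROC point nearest (0, 1)."""
    best = None
    for c in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, c)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        # Distance from (1 - specificity, sensitivity) to the corner (0, 1).
        dist = math.hypot(1 - sens, 1 - spec)
        if best is None or dist < best[0]:
            best = (dist, c, sens, spec)
    return best[1:]

# Toy example with made-up scores, not study data.
scores = [10, 20, 30, 40, 50, 60, 70, 80]
labels = [0, 0, 0, 1, 0, 0, 1, 1]
print(best_cutoff(scores, labels))
```

Other cutoff criteria exist (for example, maximizing the sum of sensitivity and specificity) and can select a different point on the same curve.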

Performance with Different ELM Parameters.
In practice, the number of hidden nodes and the ensemble size usually control the network complexity and the learning performance. The V-ELM algorithm was used to illustrate the impact of parameter selection. Figures 2(a) and 2(b) depict the performance of V-ELM with different ensemble sizes and different numbers of hidden nodes, respectively. Good prediction results were obtained when the number of hidden nodes was 20 and the ensemble size was 15. This pair of parameters was observed to offer a good trade-off between prediction performance and system complexity. It is worth noting that the performance was more sensitive to the ensemble size than to the number of hidden nodes.
A further investigation of the initial ensemble size in the SV-ELM algorithm was conducted. With the base ensemble size $K$ fixed at 15, we gradually increased the initial ensemble size to 40 and selected only 15 of the individual ELMs for the decision ensemble. Table 3 presents the prediction performance. An initial ensemble size of 25 produced the best performance. With a cutoff score of 64, it achieved an AUC of 0.754, 78.8% sensitivity, 64.7% specificity, 10.7% PPV, and 98.3% NPV. We noted that a large initial ensemble size dramatically reduced the prediction performance; for example, an initial ensemble size of 40 achieved an AUC of only 0.735. In this study, the initial ensemble size and $K$ were selected empirically and were far from optimal. Therefore, deriving a general guideline for parameter selection is worthy of further investigation.
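The low PPV alongside the high NPV follows from the low outcome rate (52 of 1025 patients). As a consistency check, the sketch below recomputes both values from the reported sensitivity, specificity, and group sizes. The function name is hypothetical, and the calculation uses expected counts rather than the study's actual confusion matrix, which is not given.

```python
def predictive_values(sensitivity, specificity, n_pos, n_neg):
    """PPV and NPV implied by sensitivity, specificity, and class sizes."""
    tp = sensitivity * n_pos   # expected true positives
    fn = n_pos - tp            # expected false negatives
    tn = specificity * n_neg   # expected true negatives
    fp = n_neg - tn            # expected false positives
    return tp / (tp + fp), tn / (tn + fn)

ppv, npv = predictive_values(0.788, 0.647, n_pos=52, n_neg=973)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 10.7%, NPV = 98.3%
```

Both values agree with the reported 10.7% PPV and 98.3% NPV.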

Conclusions
In this retrospective observational study of 1025 critically ill patients presenting to the ED, we found that ELM based methods outperformed the original SVM based risk scoring method. Furthermore, our proposed SV-ELM method achieved the best performance in the ROC analysis. Based on these findings, we foresee the potential use of ELM methods for risk modeling in biomedical applications. ELM methods provide an alternative to traditional classification tools such as SVM by offering increased predictive ability.

Figure 2: Prediction results in terms of AUC values using V-ELM with (a) different ensemble sizes, where the number of hidden nodes was held constant at 20, and (b) different numbers of hidden nodes, where the ensemble size was held constant at 15. The 95% CIs are shown in both figures.

Table 1: Baseline characteristics of study patients.
a P value from either the chi-square test or the Mann-Whitney test as appropriate.
b Based on admitting emergency physician clinical diagnosis.
c Medical history at presentation to the emergency department.
d Prior outpatient medical therapy at presentation to the emergency department.

Table 2: Prediction results using various scoring methods. The number of hidden nodes for ELM algorithms was 20. The ensemble size $K$ for both V-ELM and SV-ELM algorithms was 15. The initial ensemble size for SV-ELM was 25.

Table 3: Prediction results using SV-ELM with different initial ensemble sizes.