Hemodynamic Instability Prediction Through Continuous Multiparameter Monitoring in ICU

Current algorithms identifying hemodynamically unstable intensive care unit patients typically are limited to detecting existing dangerous conditions and suffer from high false alert rates. Our objective was to predict hemodynamic instability at least two hours before patient deterioration while maintaining a low false alert rate, using minute-by-minute heart rate (HR) and blood pressure (BP) data. We identified 66 stable and 104 unstable patients meeting our stability/instability criteria from the MIMIC II database, and developed multi-parameter measures using HR and BP. An instability index combining measures of BP, shock index, rate pressure product, and HR variation was developed from a multivariate regression model to predict hemodynamic instability (ROC area of 0.82±0.03, sensitivity of 0.57±0.07 when the specificity was targeted at 0.90; the alert rate ratio of unstable to stable patients was 7.62). We conclude that these algorithms could form the basis for reliable predictive clinical alerts which identify patients likely to become hemodynamically unstable within the next few hours so that the clinicians can proactively manage these patients and provide necessary care.


INTRODUCTION
Hemodynamic instability is considered one of the most critical events that require effective and prompt intervention in the intensive care unit (ICU). It is one of the major reasons for ICU recidivism, which causes longer length of ICU stay [1]. It is most commonly associated with an abnormal or unstable blood pressure (BP), especially hypotension, or more broadly associated with inadequate global or regional perfusion. Inadequate perfusion may result in damage of vital organs. The heart and brain are particularly sensitive to perfusion compromise due to their high metabolic requirements and their limited ability to compensate for decreased oxygen and metabolic substrate delivery. Compromised perfusion of these vital organs may quickly lead to life-threatening organ failure and death. Under normal circumstances, the autonomic nervous system provides fine control of cardiac output that is adequate to maintain body function and end organ perfusion in the face of normal stresses. However, in the ICU, clinicians often need to proactively intervene to augment the body's own feedback systems, which may be inadequate to maintain perfusion of vital organs in the face of extreme stressors such as sepsis, hemorrhage or severe cardiac injury. In these instances, perfusion is often maintained through the judicious and timely use of intravenous (IV) fluids and blood transfusion for vascular filling and/or vasoactive and inotropic agents. The onset of significant hemodynamic instability is often an event that occurs at the end of a long series of subtle physiologic derangements in patients who are becoming unstable. These subtle derangements may not be readily noticeable within the torrent of physiologic data presented regularly to clinicians caring for patients in a modern ICU.
Current automated alert mechanisms typically generate an alert if one or more physiological parameters (e.g., mean BP) cross predetermined "normal" thresholds. Unfortunately, often by the time these alerts are triggered, the patient is already in an unstable state, making intervention much more difficult. Furthermore, these alerts are often subject to high false alert rates, adding to the information overload [2][3][4]. Early interventions in disease processes which often result in hemodynamic instability and end organ perfusion compromise, such as sepsis, have been widely advocated as a means to limit the morbidity and mortality [5,6]. Thus, reliable methods to predict which patients are most likely to become unable to maintain organ perfusion through their own autonomic control should help ICU caregivers focus on earlier interventions that would limit the impact of physiologic deterioration in the sickest patients.
We have previously reported a method to reduce the false alert rate by identifying artifacts using multiple channels of monitoring data [7]. We have also developed a rule-based algorithm to predict hemodynamic instability using hourly electronic charting data [8] and logistic regression models using minute-by-minute vital signs from electronic patient monitors [9].
In this work, we report on hemodynamic instability prediction using segments (for example, 2 hours long) of minute-by-minute continuous monitoring data. The purpose of the algorithms we developed based on trend (minute-by-minute) data of vital signs was not to replace the existing alerting systems in the current ICUs, which catch hemodynamic instability events when they occur. Rather, the purpose of these algorithms was to predict these events ahead of time so that the clinicians can proactively manage these patients and reduce the number of hemodynamic instability events, leading to improved patient care and outcome. It is important to develop algorithms that use features that are robust to missing data, a common reality of ICU data. For example, since spectral analysis using Fourier Transformation (FT) is not robust to missing data, it cannot be used to extract features in this case, although it has been shown to be very important in heart rate variability analysis [11]. Using a window (2-hour segment) makes the algorithm less sensitive to natural variations of the measured parameters and noisy artifacts. In addition, it becomes possible to extract useful patterns identifying instability from multiple data points. Therefore, a better classification performance might be achieved using monitoring data of higher resolution.
In this work, hemodynamic instability was based on clinical interventions of vasopressors (more details in METHODS Section 2.1). In other words, prediction of hemodynamic instability was equivalent to prediction of clinical interventions presumably due to hemodynamic instability. Our hypothesis was that by combining information from several continuously monitored vital signs (i.e., HR from body surface ECG signals and arterial BP), more timely and accurate alerts can be generated than traditional alerts that are triggered when a single data stream crosses a threshold. In order to ensure that the alerts are accurate (low false alert rate) and timely, they were developed using a high targeted specificity and using data two hours before the patient's deteriorating condition was diagnosed and an intervention was executed (i.e., the intervention occurs two hours after the end of the 2-hour window of data, illustrated in the top panel of Figure 4).
The study was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA) during collaboration with Philips Healthcare (Andover, MA). Requirement for individual patient consent was waived because the study did not impact clinical care and all protected health information (PHI) was de-identified.

Data collection and reference data sets
The reference data sets were based on retrospective data from MIMIC (Multiparameter Intelligent Monitoring in Intensive Care) II, which is a database of ICU patients created by a partnership between Philips Healthcare (Andover, MA), Massachusetts Institute of Technology (Cambridge, MA) and Beth Israel Deaconess Medical Center (Boston, MA) [10]. The version of the MIMIC II database that we used contained data of 12,695 medical, surgical, post-cardiac surgery, and cardiac ICU adult patients recorded between 2001 and 2005. A clinical information system, CareVue 9000 (Philips Healthcare, Andover, MA), was utilized to acquire laboratory results, IV medications, nurses' notes, and nurse-validated monitoring data such as HR and BP on an hourly basis. Besides the CareVue data, the MIMIC II database also includes waveform data and minute-by-minute trend data saved directly from the patient monitors (IntelliVue MP-70 Patient Monitor, Philips Healthcare, Andover, MA) for 1875 patients, from which our database was derived. The trend data include physiological data such as HR, various invasive BP measurements, oxygen saturation (SpO2), and respiration rate (RR). Patient records with trend data for HR from ECG and arterial BP (systolic, diastolic and mean) were chosen for further analysis.
We developed our reference data sets based on whether a patient had therapeutic interventions indicative of hemodynamic instability. In particular, those patients who received vasopressors (dopamine, levophed, neosynephrine, epinephrine) were defined as unstable. In the patient population that did not qualify as unstable, patients were defined as non-stable if they received at least one of the following: (1) medications such as lidocaine, nitroglycerine, labetalol, esmolol, nitroprusside, amiodarone, lasix, milrinone, (2) at least 1500 cc of IV fluids in one hour, (3) at least 750 cc of packed red blood cells in 24 hours. The patients who were neither unstable nor non-stable were defined as stable. However, patients who had data gaps of more than 2 hours immediately after receiving vasopressors were excluded because they might be out of the ICU for surgery, and the vasopressor might have been administered for reasons other than hemodynamic instability. We utilized the occurrence of a "qualifying" clinical intervention to define hemodynamic instability as described above, rather than the simple occurrence of a change in vital signs, so that the physicians' clinical judgment on the ICU patients at the bedside could be incorporated in the definition. Although the treatment end-points might differ for each disease-state group, we aimed at identifying those with physiologic evidence of deterioration without regard to the treatment goals normally prescribed for a particular subset of patients in our population. Since hypotension requiring the use of vasopressor medication is a common final signal of hemodynamic instability among all of the potential subsets of patients included in our very heterogeneous population, we have utilized this as the standard for evidence of hemodynamic instability.
There were 66 stable patients and 104 unstable patients that met our criteria, and 83 non-stable patients that did not meet our instability criteria. There were 128 points in each two-hour segment, and we required no more than 5 points missing in each segment. For each stable patient, up to 10 two-hour segments of minute-by-minute trend data since admission were selected for a total of 505 segments. For each of the 104 unstable patients, a two-hour segment 2 hours before the first onset of a vasopressor medication was selected. Twelve additional unstable segments were identified for which the vasopressor therapy was reinitiated after a period of at least 24 hours without vasopressors (i.e., the segment began at least 24 hours after therapy stopped and ended 2 hours before another therapy was initiated). These 116 segments constituted the reference data sets for unstable patients. No "stable" segments were extracted from unstable patients in order to establish a clear-cut case scenario, since the underlying physiological state might have been changed well in advance when the clinicians initiated critical intervention for hemodynamic instability, and it would be unclear when an unstable patient was in a true stable physiological state.

Feature creation
Based on the original set of four physiological parameters - ECG heart rate (HR), systolic arterial BP (SAP), diastolic arterial BP (DAP), and mean arterial BP (MAP) - we developed several derived parameters. The intra-arterial BP data were collected directly from an arterial line. The derived parameters include:
• abs_dHR: the absolute successive difference of the minute-by-minute HR data, a measure of physiological HR variation, which is an indicator of many cardiovascular diseases [11].
• HR_div_BP: HR to BP ratio, including HR_div_SAP and HR_div_MAP, also known as the shock index, which may be an early indicator of cardiogenic, hypovolemic, and septic shock [12]
[14,15]. This modified model was found to demonstrate the best performance [14]. In this paper, we set k_co as 1, and therefore, ECO is a signal that is proportional to the actual cardiac output, which quantifies the mechanical work of the heart [14,15].
To eliminate noise/outliers of each initial physiological parameter, we used a modified statistical outlier detector based on the deviation from the median and inter-quartile range values in a local moving window [16]. We then collected simple statistical measures based on filtered parameters. The measures included extremes (i.e., minimum and maximum), moments (mean, standard deviation, and skewness), percentiles (5th, 10th, 50th, 90th, and 95th percentiles), and inter-percentile ranges (IPR, defined as the difference between two percentile values such as the 95th and 5th percentiles). These measures are simple and easy to implement, and provide a snapshot of how each parameter differs for stable vs. unstable patients. We refer to these measures as "features". Overall, we developed 220 features.
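The derived parameters and percentile features described above can be illustrated with a minimal sketch. It assumes that missing minutes are stored as None, that the outlier filter has already been applied, and that the rate pressure product is HR × SAP (inferred from the feature name p95_5_HR_prod_SAP). The function names and the linear-interpolation percentile are illustrative choices, not the authors' implementation.

```python
import math


def percentile(values, q):
    """Percentile q (0-100) of a non-empty list, using linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def _pairwise(func, a, b):
    """Apply func to aligned samples; the result is None if either is missing."""
    return [func(x, y) if x is not None and y is not None else None
            for x, y in zip(a, b)]


def derived_parameters(hr, sap, mean_ap):
    """Per-minute derived series (pressures are assumed to be non-zero)."""
    return {
        # absolute successive difference of HR
        "abs_dHR": _pairwise(lambda x, y: abs(y - x), hr[:-1], hr[1:]),
        # shock index variants
        "HR_div_SAP": _pairwise(lambda x, y: x / y, hr, sap),
        "HR_div_MAP": _pairwise(lambda x, y: x / y, hr, mean_ap),
        # rate pressure product (assumed here to be HR * SAP)
        "HR_prod_SAP": _pairwise(lambda x, y: x * y, hr, sap),
    }


def summarize(series):
    """Percentile features of one series, ignoring missing samples."""
    xs = [v for v in series if v is not None]
    p5, p50, p95 = (percentile(xs, q) for q in (5, 50, 95))
    return {"p5": p5, "p50": p50, "p95": p95, "p95_5": p95 - p5}
```

Because every statistic is computed only over the samples that are present, a few missing minutes do not prevent a feature from being evaluated.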

Feature selection and classification
We began feature selection by developing a univariable logistic regression model for each of the 220 features serving as predictors. Logistic regression models are one class of generalized linear models that assume $\mathrm{Prob}\{Y=1 \mid X\} = 1/[1+\exp(-B_0 - XB)]$, where $Y$ is a binary outcome variable (0 or 1), $X$ is a vector of the predictor variables, $B_0$ is the intercept, $B$ is a vector of regression coefficients obtained from maximum likelihood estimation, and the left side of the equation gives the probability of $Y=1$ for a given $X$.
In this paper, Y was interpreted as a diagnosis of instability. The probability calculated from the logistic regression model was called the instability index. Given a set of predictors (features), the instability index could be calculated. By sweeping the threshold of the instability index over all possible values, a series of sensitivity and specificity pairs, and therefore the area under the receiver operating characteristic (ROC) curve, could then be calculated. If the instability index was greater than the threshold, the 2-hour segment was classified as "unstable".
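As a minimal sketch of how the instability index and the threshold rule fit together, the logistic model can be evaluated as follows. The function names are hypothetical, and any intercept or coefficients passed in are placeholders rather than the fitted values of this study.

```python
import math


def instability_index(features, intercept, coefs):
    """Logistic model: Prob{Y=1|X} = 1 / (1 + exp(-B0 - X.B))."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(index, threshold):
    """A segment is 'unstable' when its index exceeds the threshold."""
    return "unstable" if index > threshold else "stable"
```

For example, `classify(instability_index(x, b0, b), 0.29)` would label a segment with feature vector `x` given a chosen threshold of 0.29.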
The study population consisted of 505 two-hour segments from 66 stable patients and 116 two-hour segments from 104 unstable patients. The outcome variable, the probability of becoming unstable two hours later, is interpreted as an instability index. Wald's χ² test was used to test the significance of each coefficient in the model. Because repeated measures from the same patient artificially inflate the sample size, and therefore the power, of logistic regression models, we used the Huber-White method to adjust the variance-covariance matrix of a fit from maximum likelihood estimation. The p-values reported in the paper were from this robust variance-covariance matrix estimation. Only those features that were statistically significant (defined as p < 0.05) were considered. Only one feature from each physiological parameter (original or derived) was selected, unless there were two features that were both highly significant and weakly correlated (correlation coefficient < 0.2). There were 13 features that met our first group of criteria (shown in Table 2). Matlab 7.8 (The Mathworks, Inc., Natick, MA) was used for statistical analysis.

In order to assess the correlation structure of these selected features, a hierarchical tree was constructed (Figure 1) using the R function varclus from package DESIGN to draw the dendrogram depicting the clusters, with the matrix of squared Spearman rank correlation coefficients as the similarity matrix (R, http://www.r-project.org/). The Spearman correlation matrix was chosen due to some obviously non-Gaussian variables such as p50_abs_dHR. One feature was selected from each uncorrelated sub-tree as a candidate feature based on its ROC area value. The best combination of four features based on ROC area, after testing all possible combinations of the features, was p90HR_div_MAP, p50_abs_dHR, p95_5_HR_prod_SAP, and p5SAP, where p90HR_div_MAP was from the sub-tree containing p50HR, p90HR_div_SAP and p90HR_div_MAP; p50_abs_dHR was from the sub-tree containing p50_abs_dHR and p10HR_slope_4min; p95_5_HR_prod_SAP was from the sub-tree containing p95_5_HR_prod_SAP, p90_10_ECO, and p10ECO_slope_16min; and p5SAP was from the sub-tree containing stdHR_div_SAP, stdHR_div_MAP, p5MAP, p5SAP, and p10SAP_slope_proj_8min.
We also compared performance on various subsets of the selected features by multivariable logistic regression models. By setting different instability index thresholds, one can adjust the tradeoff between sensitivity and specificity and maximize their sum. We define ss1 = sensitivity + specificity - 1, also known as Youden's index [17], as an indicator of classification accuracy. In addition, in order to maintain a low false alert rate, we set a threshold for feature values so that the targeted specificity was 0.90, and computed the corresponding sensitivity.
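The two operating points described above (maximum ss1 and a targeted specificity) can be sketched as a simple threshold search over observed instability index values. This is an illustration under the assumption that a segment is flagged when its index is strictly greater than the threshold; the function names are hypothetical.

```python
def sens_spec(unstable_scores, stable_scores, threshold):
    """Sensitivity and specificity when 'index > threshold' means unstable."""
    sens = sum(s > threshold for s in unstable_scores) / len(unstable_scores)
    spec = sum(s <= threshold for s in stable_scores) / len(stable_scores)
    return sens, spec


def best_youden(unstable_scores, stable_scores):
    """Threshold maximizing ss1 = sensitivity + specificity - 1."""
    best_t, best_ss1 = None, -1.0
    for t in sorted(set(unstable_scores) | set(stable_scores)):
        sens, spec = sens_spec(unstable_scores, stable_scores, t)
        ss1 = sens + spec - 1.0
        if ss1 > best_ss1:
            best_t, best_ss1 = t, ss1
    return best_t, best_ss1


def threshold_for_specificity(stable_scores, target=0.90):
    """Smallest observed stable score whose specificity reaches the target."""
    n = len(stable_scores)
    for t in sorted(stable_scores):
        if sum(s <= t for s in stable_scores) / n >= target:
            return t
    return max(stable_scores)
```

Only the stable segments determine the threshold for a targeted specificity; the sensitivity at that threshold is then read off from the unstable segments.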
In order to show the advantages of our predictive algorithms, we compared the classification results using conventional alerts for physiological parameters outside of the normal ranges. The current clinical practice uses single systolic or mean blood pressure to alarm for potential hemodynamic instability. However, there are no universal thresholds for blood pressure levels. According to the guideline for septic patients [6], for example, the rule is systolic pressure < 90 mmHg or mean pressure < 65 mmHg. Therefore, we tested the classification performance using these two rules.

Figure 1. Spearman correlation ρ² of the selected features. The features can be partitioned into four uncorrelated groups. One feature is selected from each group as a candidate feature.

Impact of prediction period on sensitivity
We defined "prediction period" as the length of time from the end of an unstable segment to the start of the critical intervention. In the reference data, we used an arbitrarily chosen prediction period of 2 hours; i.e., the end of each 2-hour segment was 2 hours before intervention (and the beginning of each 2-hour segment was 4 hours before intervention). A prediction period of more than one hour was chosen because the intervention time recorded by the clinicians was often quite inaccurate, e.g., off by 30 minutes. On the other hand, if a prediction period was too long, the underlying physiological state might change. Therefore, a prediction period of two hours was chosen as falling in between these two extremes. Nevertheless, in order to assess how well our algorithms predicted deterioration as the prediction period was lengthened, we did an analysis for various lengths of prediction periods ranging from 0 to 12 hours (see RESULTS Section 3.2).

Impact of completeness of record on sensitivity and specificity
In real monitoring data, missing data is a common issue. It is important to develop algorithms that are robust enough to tolerate missing data when using a segment of data rather than single data points. The duration of missing data could range from one or a few consecutive points (a few minutes) to any extended length (e.g., hours, days). We assessed the impact of completeness of record on classification performance for both stable and unstable patients. We compared the sensitivity and specificity when the required record completeness was varied over a range of 25%, 30%, 33.3%, 40%, 50%, 60%, 75%, 80%, 90%, and 95%. As a special case, the data were downsampled from minute-by-minute to 5-minute-by-5-minute using the median value of each 5-minute block (typically the median value of every 5 points if there was no missing data), and the impact on performance was also assessed.
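A short sketch of the two operations described above, record completeness and median downsampling, is given here under the assumption that missing minutes are stored as None. The helper names are illustrative.

```python
import statistics


def completeness(segment):
    """Fraction of samples in the segment that are present."""
    return sum(v is not None for v in segment) / len(segment)


def downsample_median(segment, block=5):
    """Median of each block of `block` minutes, ignoring missing samples.

    A block with no data at all stays missing (None).
    """
    out = []
    for i in range(0, len(segment), block):
        present = [v for v in segment[i:i + block] if v is not None]
        out.append(statistics.median(present) if present else None)
    return out
```

A segment would then be accepted for analysis only if `completeness(segment)` meets the required level, for example 0.5.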

Validation
We validated our regression models using bootstrapping. Bootstrapping is a name generically applied to statistical resampling schemes that allow uncertainty in the data to be assessed from the data themselves. The basic idea is, given n samples (y1, y2, …, yn) of a random variable Y, which has an unknown cumulative distribution F(y) = Prob{Y ≤ y}, to compute the statistic of interest and to assess how the statistic behaves over B repetitions of sampling with replacement. As an estimate of F(y), the empirical cumulative distribution Fn(y) can be estimated from repetitive sampling with replacement from the n observed data when the number of repetitions, B, is large enough. Therefore, the uncertainty or accuracy of the statistic of interest can be estimated empirically using confidence intervals, standard errors (SE), etc. We used B = 100 repetitions of the bootstrap validation procedure for each model. The classification performance was assessed by both ROC and contingency analysis. The validation results were reported as mean ± standard error (SE).
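The resampling idea can be sketched generically as follows. This is a simplified illustration that resamples a list of observations and recomputes an arbitrary statistic; it does not refit a regression model in each repetition as a full bootstrap validation would. The function name and the fixed seed are assumptions.

```python
import random
import statistics


def bootstrap(data, statistic, B=100, seed=0):
    """Return (mean, standard error) of `statistic` over B bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(B):
        # Draw n observations with replacement from the observed data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # The bootstrap standard error is the standard deviation of the replicates.
    return statistics.mean(replicates), statistics.stdev(replicates)
```

For instance, passing a list of 0/1 outcomes (1 for a correctly classified unstable segment) with `statistics.mean` as the statistic gives a mean ± SE estimate of sensitivity.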
In addition, we validated our algorithm by relaxing the data selection criteria to include segments: (1) with up to 50% of missing data instead of 4%; (2) for the whole length of stay, instead of the first 10 segments only for stable patients, in the reference dataset; (3) with 120 points in each two-hour segment in the validation dataset, instead of 128 points in the reference dataset, in order to reflect the 2-hour window used in real monitoring practice. However, since only a few unstable segments were gained after relaxing the completeness from 96% to 50%, the data expansion was not sufficient to serve as a validation dataset for unstable patients. Therefore, we only validated our algorithms on the expanded dataset for stable patients.

Alert rate
High false alarm rates are a major concern in the current ICU clinical settings. In order to assess the additional workload our alert algorithms might add to clinicians in the clinical settings, as well as to assess the effectiveness of the prediction of deterioration in real practice, we calculated and compared the alert rates for both stable and unstable patients.
In many cases for classification algorithms, a positive predictive value (PPV) is reported to assess the percentage or probability of true positives among all positives. However, in real-time monitoring cases, unlike diagnostic screening tests, PPV is dependent upon the relative length of time the stable patients are exposed to the alerting algorithms compared to the unstable patients. It should also be stressed that the relative exposure times are not only dependent upon the size of the two populations, but also upon the average length of time the two groups are exposed to the algorithm. For instance, an exposure ratio of 5 could be due to a situation where there are an equal number of stable and unstable patients, but the algorithm runs 5 times longer on the stable patients (their complete stay) than on the unstable patients (episodes prior to interventions). Therefore, in this section we present the relative alert rates, which are not as much affected by the ratio of the exposure time of stable and unstable patients.
Rule firings were defined as the time when the instability index of a segment exceeded the preset threshold. Because the data were largely correlated, the firing tended to occur consecutively and unnecessarily frequently. In order to address this problem, we defined an alert based on a refractory period as follows: after the first rule firing, new rule firings were suppressed until there were no more rule firings for a certain period of time (e.g., 2 hours). In other words, consecutive firings were counted as one alert, and a new alert was counted only when there was a gap of at least a period of time (e.g., 2 hours) between two firings. This specified period of silent time was defined as the refractory period. A two-hour refractory period was selected to ensure no consecutive firings from overlapping moving windows.
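The refractory-period rule can be sketched as a single pass over the firing times. The sketch assumes firing times are given in minutes and that a gap exactly equal to the refractory period starts a new alert; the function name is hypothetical.

```python
def count_alerts(firing_times_min, refractory_min=120):
    """Count alerts: firings closer together than the refractory period merge."""
    alerts = 0
    last_firing = None
    for t in sorted(firing_times_min):
        # A new alert starts only after a silent gap of at least the
        # refractory period since the previous firing.
        if last_firing is None or t - last_firing >= refractory_min:
            alerts += 1
        last_firing = t
    return alerts
```

With firings at 0, 15, 30, 150, 165 and 400 minutes, the gaps of 120 and 235 minutes start new alerts, so three alerts are counted.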
We calculated alert rates in three different ways. One way was to calculate an aggregate alert rate by dividing the total number of alerts of all patients by the total monitoring time (Alert rate 1). Suppose $F_i$ is the number of alerts for patient $i$, and $T_i$ is the monitoring time for the same patient. Alert rate 1 (AR1) is defined as:

$$\mathrm{AR1} = \frac{\sum_{i=1}^{N} F_i}{\sum_{i=1}^{N} T_i}$$

where $N$ is the number of patients. The second way (Alert rate 2, AR2) was to average the individual alert rates for each patient:

$$\mathrm{AR2} = \frac{1}{N} \sum_{i=1}^{N} \frac{F_i}{T_i}$$

The third way is to use a Poisson regression method described below. Since the events occurred at a particular rate within a particular amount of time, an appropriate way to obtain a mean alert rate is to use Poisson regression models for predicting the expected value of the count given a time frame (e.g., one day); i.e., $\log(E[Y]) = a + bX$, where $Y$ is the event count, $E[Y]$ is the expected value of $Y$, and $X$ is a vector of predictor variables. Therefore, we developed a third way of calculating alert rate using a Poisson regression model (using length of monitoring time as a predictor variable and number of alerts as dependent variable). For stable patients, the distribution of monitoring time was skewed to the right, resulting from long ICU stays of a few patients, so we took the logarithm of monitoring time as a predictor variable. For unstable patients, however, since the monitoring time was restricted to one to six hours only, the distribution is less skewed, and taking the logarithm of monitoring time as a predictor variable was not necessary. By setting the monitoring time as one day, the expected count of alerts per patient day can be determined. Alert rate 3 (AR3) is thus defined as:

$$\mathrm{AR3} = E[Y] = \exp(a + bX)$$

where $X$ is monitoring time, $a$ is the intercept, and $b$ is the coefficient of $X$.
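To make the difference between the three estimators concrete, a minimal sketch follows. It assumes monitoring times are given in hours and rates are reported per patient day. The Poisson model is only evaluated, not fitted: the intercept `a` and coefficient `b` would come from a separate regression fit, and all function names are illustrative.

```python
import math


def aggregate_alert_rate(alerts, hours):
    """AR1: total alerts divided by total monitoring time (per patient day)."""
    return 24.0 * sum(alerts) / sum(hours)


def mean_individual_alert_rate(alerts, hours):
    """AR2: average of each patient's own alert rate (per patient day)."""
    rates = [24.0 * f / t for f, t in zip(alerts, hours)]
    return sum(rates) / len(rates)


def poisson_expected_count(a, b, x):
    """AR3: expected count exp(a + b*x) from a fitted Poisson regression.

    x is the predictor value corresponding to one day of monitoring
    (monitoring time, or its logarithm if the model was fitted on logs).
    """
    return math.exp(a + b * x)
```

A patient with one alert in 1.25 hours contributes 19.2 alerts per patient day to AR2 but only a small amount to AR1, which is why short stays can dominate the mean individual rate.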

Journal of Healthcare Engineering
When calculating alert rates, instead of using non-overlapping windows, we used moving 2-hour windows shifted in 15-minute increments to mimic a possible real-time clinical implementation of our algorithms. Before calculating the alert rates, the monitoring time for each patient needs to be computed. In general, the time when there was no data should be excluded from the calculation of monitoring time in order to avoid an artificial inflation of monitoring time and therefore an artificially low alert rate. This is not straightforward because of the presence of missing data and the nature of overlapping moving windows. To accurately compute monitoring time, we set the center of the 2-hour operating window as the working point, which starts at the beginning of each patient's HR and arterial blood pressure (ABP) data. As the working point moves ahead in 15-minute increments, if the 2-hour segment meets a minimum requirement of completeness (e.g., 50%), 15 minutes are added to the monitoring time. If the 2-hour segment has less than 50% of data, the 15-minute time is not added. For example, if we have 60 minutes of continuous data, ideally, the monitoring time is 1 hour. If we need to move 4 times for the working point to be out of the 1-hour region, and each time 15 minutes are counted, that makes exactly 60 minutes (4 × 15 minutes).
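One possible reading of this monitoring-time procedure is sketched below. It assumes per-minute samples with None for missing data, a working point that steps through the record in 15-minute increments starting at the first sample, and a window centered on the working point in which positions outside the record count as missing. These boundary choices are assumptions made to reproduce the 60-minute example, not a description of the authors' code.

```python
def monitoring_time_min(samples, window=120, step=15, min_completeness=0.5):
    """Monitoring time in minutes, credited in `step`-minute increments."""
    half = window // 2
    total = 0
    # The working point (window center) visits 0, step, 2*step, ...
    for center in range(0, len(samples), step):
        lo = max(center - half, 0)
        hi = min(center + half, len(samples))
        present = sum(1 for i in range(lo, hi) if samples[i] is not None)
        # Completeness is measured against the full window length.
        if present / window >= min_completeness:
            total += step
    return total
```

With 60 minutes of continuous data, the working point takes four positions (0, 15, 30 and 45 minutes), each window is exactly 50% complete, and the credited monitoring time is 60 minutes.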

Classification
Clinicians consider increased HR and decreased BP to be early signs of patient deterioration. Thus we developed statistical HR and BP features including the slope and hypothesized that they would be useful in predicting hemodynamic instability. Figure 2 shows the boxplots of the four sample features calculated for stable and unstable patients. Not surprisingly, unstable patients had lower BP and higher shock index. Also, unstable patients had lower physiological HR variation, which was consistent with other studies demonstrating that low beat-to-beat HR variability was usually a sign of physiological derangement [11]. Interestingly, we also found that unstable patients had slightly, but significantly, higher rate pressure product, which might represent attempted physiological compensation - as BP decreases, HR often increases to compensate [18].
We also investigated the performance for multiple features. The results for the strongest performers for 2, 3 and 4 features are shown in Table 3. Using more than 4 features did not improve classification significantly (data not shown). Figure 3 shows the ROC curve when using the 4 features in Table 3 as predictors. Figure 4 shows an example of how the instability index changes before and after initiation of the intervention for an unstable patient.

Table 3. Classification performance results using multivariable logistic regression modeling (p < 0.05)
Note: Keys are the same as Table 2.

Figure 3. ROC curve based on the same four features as in Figure 2 for classification when the reference dataset was used. The upper grey circle marks the optimal trade-off of sensitivity (0.74) and specificity (0.80) with ss1=0.54. The lower dark circle marks the sensitivity (0.55) when the specificity was targeted at 0.90.

We compared the performance of our algorithm with conventional alerts for physiological parameters outside the normal ranges. Since there are no universal thresholds for blood pressure levels for defining hemodynamic instability, we used as an example the guideline for septic patients [6], in which the rule is systolic pressure < 90 mmHg or mean pressure < 65 mmHg. The classification results using these two thresholds on our reference dataset are shown in Table 4. The classification performance of our predictive model was better than that of these conventional alerts.

Table 4. Comparison of conventional alerts and the present predictive algorithms
95% confidence interval data are parenthesized "( )".

Impact of prediction period on sensitivity
Figure 5 shows that as the moving window approaches the time of deterioration (time 0), the instability index (top panel) increases, and so does the proportion of unstable segments (bottom panel) that are correctly classified as unstable (i.e., sensitivity) by our algorithm (when specificity was targeted at 90%). Sensitivity drops quickly when the prediction period is longer than three hours. For example, sensitivity drops from 0.58 for one hour before intervention to 0.46 for five hours before intervention. Such a decrease in predictive power was consistent with the finding reported for the hemodynamic instability advisory using hourly clinical electronic charting data [8].

Impact of completeness of record on sensitivity and specificity
The impact of completeness of record on the sensitivity and specificity was assessed using the following two different instability index thresholds obtained from the logistic regression model and the reference dataset: (1) 0.20, when the maximum ss1 was reached, and (2) 0.29, when the specificity was targeted at 90% (i.e., when a threshold of instability index of more than 0.29 was chosen, at least 90% of stable segments were classified as "stable" by the model). The data for the whole length of stay for stable patients and 2 to 8 hours before intervention for unstable patients were used for analysis of the impact of missing data. As shown in Table 5, when the required record completeness varied from 95% to 25%, the changes in sensitivity and specificity were small (≤ 2%), and there was a gain of 3449 (36.1%) 2-hour overlapping segments (shifted in 15-minute increments) for stable patients, and a gain of 548 (24.2%) for unstable patients. Note that the baseline specificity (95% completeness) was lower than shown earlier (Table 3) as a much earlier prediction period (2 to 8 hours, vs. 2 hours, before intervention) was used.

Figure 5.
Instability index (top panel, mean±std) and probability density function of unstable segments correctly classified (viz., sensitivity) (bottom panel) as a function of time to intervention (i.e., prediction period). The specificity was targeted at 90%; i.e., 90% of the stable segments had an instability index below the threshold of 0.29 when the sensitivity was calculated. It shows that both the instability index and the proportion of unstable segments increase as the prediction period approaches the time to intervention. The mean instability index is above the threshold of 0.29 when the prediction period is less than 5 hours, but not all unstable segments are classified as unstable because of the large standard deviation. Note that the "-n" labels on the x-axis, representing the time to intervention, are actually 1-hour bins of n to n+1 hours before intervention. The two data points that fall between hour -1 and 0 on the x-axis are 30 minutes and 15 minutes before intervention, respectively.

Table 5. Impact of completeness of record on classification indexes
Table 6 shows the impact of downsampling from minute-by-minute data to 5-minute-by-5-minute data on the classification performance, based on logistic regression modeling using the same 4 features described in Table 2. The first two rows are results from logistic regression models developed using 1-minute and 5-minute data, respectively, with the 4 selected features. The third row represents the performance on 5-minute data when the model was developed from 1-minute data and the instability index thresholds were 0.20 (when the maximum ss1 was reached) and 0.29 (when the specificity was targeted at 90%), respectively. The performance on the less complete dataset and on the 5-minute data was similar to that on the 1-minute data, indicating that our algorithm may not be sensitive to data completeness or sampling rate.
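The following sketch shows why a percentile-based feature can be fairly insensitive to sampling rate. It assumes downsampling by simple decimation (keeping every fifth sample) and a nearest-rank percentile; the text does not state which downsampling or percentile method was used, so both are assumptions, and the SAP values are synthetic.

```python
import math
from typing import List, Sequence


def decimate(samples: Sequence[float], factor: int = 5) -> List[float]:
    """Keep every `factor`-th sample (assumed downsampling scheme)."""
    return list(samples[::factor])


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: value at rank ceil(pct/100 * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic 2-hour SAP segment (mmHg), one sample per minute.
sap_1min = [100 + (i % 20) for i in range(120)]
sap_5min = decimate(sap_1min, factor=5)

# 5th percentile of SAP, in the spirit of the p5SAP feature.
p5_1min = percentile_nearest_rank(sap_1min, 5)
p5_5min = percentile_nearest_rank(sap_5min, 5)
print(len(sap_1min), len(sap_5min), p5_1min, p5_5min)
```

On real data the two estimates would differ slightly; the comparison in Table 6 quantifies the effect on classification performance.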

Table 6. Impact of downsampling on performance
Note: Keys are the same as in Table 2.

Validation
We validated the logistic regression model via 100 repetitions of the bootstrap validation procedure using p90HR_div_MAP, p50_abs_dHR, p95_5_HR_prod_SAP, and p5SAP as predictors. The classification performance was assessed by both ROC and contingency analysis. The validation results were reported as mean ± standard error in Table 7. The tightness of the variance limit for these performance indexes indicates that they are reliable. The results of 10-fold cross validation were similar (not shown).
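A simplified sketch of the bootstrap idea is given below. It resamples already-scored segments with replacement and reports the mean and standard deviation of a performance index across 100 replicates (the bootstrap standard error). Unlike the procedure described above, it does not refit the logistic regression on each resample; the helper names, the fixed threshold, and the scores are all hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Scored = Tuple[float, int]  # (instability index, label: 1 = unstable, 0 = stable)


def bootstrap_metric(
    data: Sequence[Scored],
    metric: Callable[[List[Scored]], float],
    n_reps: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean, standard error) of `metric` over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_reps):
        # Draw n items with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(metric(resample))
    return statistics.mean(estimates), statistics.stdev(estimates)


def sensitivity_at(threshold: float) -> Callable[[List[Scored]], float]:
    """Build a metric: fraction of unstable segments scoring above `threshold`."""
    def metric(pairs: List[Scored]) -> float:
        unstable = [s for s, label in pairs if label == 1]
        if not unstable:  # degenerate resample with no unstable segments
            return 0.0
        return sum(s > threshold for s in unstable) / len(unstable)
    return metric


# Made-up scored segments: 20 unstable and 20 stable.
segments = [(0.20 + 0.03 * i, 1) for i in range(20)] + [(0.02 * i, 0) for i in range(20)]

mean_sens, se_sens = bootstrap_metric(segments, sensitivity_at(0.29), n_reps=100, seed=0)
print(f"sensitivity = {mean_sens:.2f} ± {se_sens:.2f}")
```

A full validation would fit the model inside the loop and evaluate it on held-out or original data, which this sketch omits for brevity.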
Table 9. Alert rates for stable and unstable patients
Note that the mean individual alert rate ratio was considerably lower (3.66) than that of the aggregate (5.08). This was largely due to the difference in alert rates for stable patients (0.95 vs. 0.63). We found that the distribution of the alerts for stable patients was nonsymmetrical, with a high density at 0 alerts (no alerts in 62.0% of all 15-minute moving windows) and a long tail toward a large number of alerts (with a maximum of 10 alerts). The spectrum of the number of alerts for stable patients is shown in Table 10. In addition, 1 of the 79 stable patients had a single alert but a length of stay of only 1.25 hours, so the resulting alert rate for this patient was 19.2 per patient day. Excluding this patient alone reduced the mean individual alert rate from 0.95 to 0.72 per patient day. The mean individual alert rate therefore appeared to be biased by both a non-Gaussian distribution and outliers (viz., patients with short lengths of stay) among the stable patients. Furthermore, in the mean individual alert rate calculation, a patient with no alerts over a long period of time (e.g., 5 days) counts equally with a patient with no alerts over a short period (e.g., 5 hours). Both have an alert rate of 0 and are therefore weighted the same, unlike in the aggregate alert rate calculation. The alert rates from the Poisson regression models, however, were not biased by the non-Gaussian distribution of the number of alerts. In addition, in order to suppress the bias caused by outliers in individual length of stay, we took the logarithm of monitoring time for stable patients. Therefore, the alert rate from the Poisson regression is the best of the three estimators of alert rate.
Given relative alert rates and exposure times, one can estimate the PPV as TP/(TP+FP), where TP is the number of true positives and FP is the number of false positives. For example, using the aggregate alert rates presented in this paper (Table 9), with an unstable-to-stable alert ratio of about 5:1, and assuming that the stable patients are exposed to the alerts 5 times longer than the unstable patients (in the aggregate), the number of TP will equal that of FP, and the PPV would therefore be 0.5.
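The difference between the aggregate and mean individual alert rates, and the PPV arithmetic above, can be made concrete with a short sketch. The cohort below is invented for illustration, except that it includes one patient with a single alert in 1.25 hours, as in the text; the function names are hypothetical.

```python
from typing import List, Tuple

Patient = Tuple[int, float]  # (number of alerts, monitored time in days)


def aggregate_rate(patients: List[Patient]) -> float:
    """Total alerts divided by total patient days (long stays weigh more)."""
    return sum(a for a, _ in patients) / sum(d for _, d in patients)


def mean_individual_rate(patients: List[Patient]) -> float:
    """Mean of per-patient alert rates (every patient weighs the same)."""
    return sum(a / d for a, d in patients) / len(patients)


def ppv_from_rates(rate_unstable: float, rate_stable: float,
                   days_unstable: float, days_stable: float) -> float:
    """PPV = TP / (TP + FP), with expected counts = alert rate x exposure time."""
    tp = rate_unstable * days_unstable
    fp = rate_stable * days_stable
    return tp / (tp + fp)


# Invented stable cohort: two alert-free patients (5 days and 5 hours),
# one patient with 1 alert in 1.25 hours, and one with 2 alerts in 4 days.
stable_cohort = [(0, 5.0), (0, 5.0 / 24), (1, 1.25 / 24), (2, 4.0)]

short_stay_rate = 1 / (1.25 / 24)          # alerts per patient day for the outlier
agg = aggregate_rate(stable_cohort)
ind = mean_individual_rate(stable_cohort)

# Rate ratio of 5:1 and stable exposure 5 times longer than unstable exposure.
ppv = ppv_from_rates(rate_unstable=5.0, rate_stable=1.0,
                     days_unstable=1.0, days_stable=5.0)
print(round(short_stay_rate, 1), round(agg, 2), round(ind, 2), ppv)
```

The short-stay patient dominates the mean individual rate but barely moves the aggregate rate, which is the bias discussed above.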

DISCUSSION
ICU clinicians often base clinical care decisions on multiple physiological parameters and their trends. Traditional alerting technologies use a single parameter and a single threshold to alert clinicians. This work suggests that there may be physiologic features that are predictive of impending hemodynamic deterioration (and the subsequent need for aggressive therapy) at least two hours prior to the onset of therapy. We found that hemodynamically unstable patients had lower BP, higher shock index, lower HR variability, and higher rate pressure product than stable patients. The latter three features are associated with physiological compensation mechanisms for low BP. For an optimal tradeoff of specificity and sensitivity, logistic regression achieved a specificity and sensitivity of 0.80±0.07 and 0.75±0.06, respectively. When the specificity was targeted at 0.90, the sensitivity was 0.57±0.07. Targeting a high specificity decreased the false alert rate. The present algorithm appeared robust to missing data. The aggregate alert rate was 0.63 and 3.20 per patient day for stable and unstable patients, respectively, and the alert rate ratio of unstable vs. stable patients was 7.62.

Feature extraction
When the same single features were used, the ROC results of the present algorithms were similar to the results of one of the winners of the Computers in Cardiology Challenge 2009 [19], even though there were a number of differences between the two (for example, the Challenge aimed to predict well-defined and hand-picked cases of acute hypotensive events rather than clinical intervention with pressors, and used a prediction window of one hour rather than two hours). One of the strengths of our algorithms is that they enable the use of multiple features to predict patient deterioration, and the classification performance with multiple features was better than with a single feature, e.g., a sensitivity of 0.55 (4 features extracted from HR and BP, Table 3) compared to 0.43 (using the best single feature, p90HR_div_MAP, Table 2) when specificity was targeted at 0.90.
In addition to the features described above, we explored a number of simple and complex features (e.g., from wavelet transforms of different scales), but there was no significant improvement in performance. Saeed et al. introduced a predictive algorithm using symbolic representations of wavelet representations of hemodynamic time series and the MIMIC II database [14,20]. The performance results of the present study were similar to theirs. However, given the limitations of our reference data set, it is premature to conclude that complex features such as ECG signal and beat-to-beat HRV have nothing to contribute. In particular, the features utilized in the present study were chosen a priori as likely to be physiologically meaningful. Robust "unsupervised" data mining techniques with MIMIC II may reveal hitherto undiscovered features that may be predictive of one or more classes of hemodynamic instability.
Journal of Healthcare Engineering • Vol. 1 • No. 4 • 2010

Clinical relevance of hemodynamic control and reasons for false negatives and positives
Reduced BP and its downward trend are the main indicators of hemodynamic instability in clinical practice, consistent with our findings. Shock index, defined as the ratio of HR to BP, was the primary predictor of hemodynamic instability in our findings, but it is not widely used in clinical practice. Shock index was not correlated with SAP (correlation coefficient < 0.1). Low HR variability has been associated with various disease states [11], and we found that unstable patients tended to have lower physiological HR variation than stable patients, a result consistent with the findings of studies that used HR and BP spectra to assess autonomic cardiovascular regulation [21]. Rate pressure product was reported to be associated with mortality in trauma ICU patients [22]. This combination of features yielded a better classification performance than a single feature such as BP (ROC 0.82 vs. 0.75).
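For readers unfamiliar with the two derived parameters, a minimal sketch follows. It takes shock index as HR divided by a blood pressure value (MAP here, by analogy with the feature name p90HR_div_MAP) and rate pressure product as HR multiplied by SAP; the choice of MAP as the denominator and all sample values are assumptions made for illustration.

```python
from typing import List


def shock_index(hr: float, bp: float) -> float:
    """Ratio of heart rate (bpm) to blood pressure (mmHg)."""
    return hr / bp


def rate_pressure_product(hr: float, sap: float) -> float:
    """Heart rate (bpm) multiplied by systolic arterial pressure (mmHg)."""
    return hr * sap


# Made-up minute-by-minute samples for a short stretch of monitoring.
hr_series: List[float] = [84.0, 90.0, 96.0, 110.0]
map_series: List[float] = [84.0, 80.0, 75.0, 55.0]
sap_series: List[float] = [120.0, 112.0, 100.0, 88.0]

si_series = [shock_index(h, m) for h, m in zip(hr_series, map_series)]
rpp_series = [rate_pressure_product(h, s) for h, s in zip(hr_series, sap_series)]
print(si_series, rpp_series)
```

In this toy stretch the shock index rises as HR climbs and pressure falls, which is the compensation pattern the paragraph above describes.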
In order to improve our prediction algorithms, we attempted to determine the reasons for false negatives and false positives by examining the nurses' notes for additional information about patients, such as diagnosis, surgery, and medication.The following summarizes the reasons.
We identified two categories of reasons for false negatives, where our algorithms failed to classify unstable segments as unstable. First, some patients simply deteriorated too fast (within the 2-hour window) to be caught by our algorithms. Second, there was background information available to the clinician but not to the algorithms, and the clinicians intervened due to patients' conditions other than hemodynamic instability. These conditions were confirmed by reviewing the individual patient's medical records, and included the following: (a) Intra-aortic balloon pump (IABP) patients, who were hemodynamically unstable by definition but whose physiological parameters of HR and BP may appear stable, and whom our data selection did not exclude. Thus a number of false negatives were due to our inability to exclude IABP patients in our data selection. (b) Patients receiving a vasopressor after cardiac surgery to manage mild to severe hypotension, or to overcome the excessive effect of nitroprusside during high blood pressure control, rather than for hemodynamic deterioration. (c) Head injury and spinal cord injury patients who received neosynephrine and labetalol simultaneously to control BP and to ensure sufficient cerebral perfusion pressure (CPP). (d) Head injury and spinal cord injury patients given vasopressors as a therapeutic measure in order to maintain CPP above 70 mmHg in the phase of elevated intracranial pressure (ICP) by keeping the BP at elevated levels (e.g., mean BP > 80 mmHg or systolic BP > 95 mmHg) [23,24]. Thus, a number of false negatives (cases (b) to (d) above) were noted because patients received vasopressors for therapeutic purposes not related to hemodynamic instability. These patients were not hemodynamically unstable from a clinical perspective. For instance, the head injury patients in our study often received a vasopressor as a therapeutic measure to maintain CPP in the setting of increased ICP. In these cases, the administration of vasopressor therapy did not coincide with true hemodynamic instability. However, since we used administration of vasopressors as the criterion of instability (the rationale for using this criterion is given in METHODS Section 2.1), these patients were labeled as unstable in our dataset, which confounded the present results.
The main reason for false positives, where our algorithm failed to classify stable patients as stable, was a low BP (e.g., SAP < 90 mmHg) accompanied by a consistently high HR (e.g., HR > 100 bpm), resulting in a high instability index, since one of the major predictors was the ratio of HR to BP (viz., p90_HR_div_BP). This happened in 7 out of 10 patients whose instability index well exceeded the threshold when targeting a specificity of 90% (a threshold of 0.50 vs. 0.29, see RESULTS Section 3.3) using our predictive algorithm. However, physicians sometimes missed these events, either because they failed to notice this pattern indicating a need for intervention and the patients recovered later on, or because they did not consider it serious enough to initiate an intervention that met our definition (viz., major vasopressor administration). Instead, the clinicians may have chosen a less aggressive treatment for the episode of hemodynamic instability that did not meet our absolute definitions for an intervention (e.g., an infusion of 100 cc/hour of normal saline for several consecutive hours). Other reasons for false positives included positional or dampened waveform data arising from the arterial line collecting BP data [25]. In this case, the monitor continued to report artificially low BP values, but the clinicians at the bedside recognized the condition from the resulting waveform and chose not to initiate therapy, knowing that the arterial line was falsely reporting low BP. Ideally, the BP data from the overdampened waveform should be removed from the database using waveform data. However, it has also been demonstrated that the dampened data can be removed using the arterial BP time series, potentially improving the usefulness of the data [26]. This will be one area for our future algorithm development. Our next steps include improving instability prediction by combining clinical lab data with vital sign data, as well as validating the algorithms on new datasets.

CONCLUSIONS
Timely identification of patients who are likely to become hemodynamically unstable would enable earlier intervention, which will limit organ damage associated with low perfusion events in the ICU. The algorithms presented in this work, based on trend (minute-by-minute) data of vital signs, could form the basis for reliable predictive clinical alerts that identify patients likely to become hemodynamically unstable, so that clinicians can proactively manage these patients and reduce the number of hemodynamic instability events, leading to improved patient care and outcomes.

Table 1. Original and derived physiological parameters
t: time index.