Sentiment Analysis Based on the Nursing Notes on In-Hospital 28-Day Mortality of Sepsis Patients Utilizing the MIMIC-III Database

Nursing notes contain rich information about a patient's pathological condition, but they are not widely used to predict clinical outcomes. With advances in natural language processing, information can now be extracted from large-scale unstructured data such as nursing notes. This study extracted sentiment information from nursing notes and explored its association with in-hospital 28-day mortality in sepsis patients. Patient data and nursing notes were extracted from the MIMIC-III database. A COX proportional hazards model was used to analyze the relationship between sentiment scores in nursing notes and in-hospital 28-day mortality. Based on the COX model, an individual prognostic index (PI) was calculated, and survival was then analyzed. Among 1851 eligible sepsis patients, 580 died in hospital within 28 days (dead group), while 1271 survived (survived group). The two groups differed significantly in sentiment polarity, Simplified Acute Physiology Score II (SAPS-II) score, age, and intensive care unit (ICU) type (all P < 0.001). Multivariate COX analysis showed that sentiment polarity (HR: 0.499, 95% CI: 0.409-0.610, P < 0.001) and sentiment subjectivity (HR: 0.710, 95% CI: 0.559-0.902, P = 0.005) were inversely associated with in-hospital 28-day mortality, while the SAPS-II score (HR: 1.034, 95% CI: 1.029-1.040, P < 0.001) was positively associated with it. The median time to death of patients with PI ≥ 0.561 was significantly shorter than that of patients with PI < 0.561 (13.5 vs. 49.8 days, P < 0.001). In conclusion, sentiments in nursing notes are associated with the in-hospital 28-day mortality and survival of sepsis patients.


Introduction
Sepsis, a syndrome of life-threatening physiologic, pathologic, and biochemical dysfunction caused by an uncontrolled response to infection, is one of the leading causes of death in intensive care units (ICUs) [1]. Despite advances in care, sepsis remains among the costliest diseases, accounting for over US$20 billion (5.2%) of total United States (US) hospital costs [2]. In the US, admissions for sepsis now outnumber those for stroke and myocardial infarction [3]. The incidence of sepsis has been reported to be as high as 535 cases per 100,000 person-years and is rising [4]. Population-level epidemiological data show that there are an estimated 31.5 million cases of sepsis and 19.4 million cases of severe sepsis worldwide each year, with 5.3 million potential deaths [5], and in-hospital mortality reaches 25%-30% [6].
Currently, severity of illness (SOI) scores are commonly used to predict mortality in ICUs. SOI systems are built from coded data on patients' demographics, vital signs, and laboratory results, usually obtained from electronic health records. However, electronic health records also contain unstructured data, such as clinical notes written by clinicians, which are rarely used to predict mortality [7]. Studies have demonstrated that clinicians can accurately predict mortality in ICUs [8,9], so their notes may provide important information for assessing patients' health status. A previous study showed that clinicians' sentiment towards patients can be evaluated by sentiment analysis, a method for classifying the subjective properties of written text [10]. Sentiments measured in clinical notes differ according to demographic features and clinical outcomes [10], and studies have suggested that they are associated with hospital readmission and mortality [11,12].
In this study, we investigated the association of sentiments in nursing notes with the in-hospital 28-day mortality of sepsis patients using the Medical Information Mart for Intensive Care (MIMIC-III) database, a freely accessible critical care database, with the aim of providing evidence that may help improve patient outcomes in ICUs.

Study Population.
Patient data and nursing notes were obtained from the MIMIC-III database, developed by the MIT Laboratory for Computational Physiology. MIMIC-III is an openly available dataset containing deidentified health data from approximately 60,000 ICU admissions, including demographics, laboratory tests, medications, vital signs, transcribed nursing notes, diagnostic and procedure codes, fluid balance, length of stay, survival data, and other information [13]. The inclusion criteria were as follows: (1) a diagnosis of sepsis, severe sepsis, or septic shock (International Classification of Diseases, Ninth Revision (ICD-9) codes 99591, 99592, and 78552) in the MIMIC-III database and (2) age 15 years or older at hospital admission. The exclusion criteria were as follows: (1) notes identified by physicians as errors, (2) notes written less than 12 hours before the time of death, and (3) patients without any nursing notes.
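The inclusion and exclusion criteria can be sketched as a filter over in-memory admission records. This is a minimal illustration under stated assumptions, not the authors' extraction code: the `Admission` and `Note` record types and their field names are hypothetical, age is assumed to be precomputed in years, and in MIMIC-III the underlying fields would have to be assembled from tables such as DIAGNOSES_ICD (ICD-9 codes), ADMISSIONS (DEATHTIME), and NOTEEVENTS (CHARTTIME, ISERROR).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# ICD-9 codes from the inclusion criteria, written without dots as in MIMIC-III
SEPSIS_CODES = {"99591", "99592", "78552"}
MIN_AGE_YEARS = 15
DEATH_WINDOW = timedelta(hours=12)


@dataclass
class Note:
    chart_time: datetime
    is_error: bool  # hypothetical field: note flagged by a physician as an error
    text: str


@dataclass
class Admission:
    subject_id: int
    age_years: float  # assumed precomputed age at hospital admission
    icd9_codes: set
    death_time: Optional[datetime]  # None if the patient did not die in hospital
    notes: list


def eligible_notes(adm: Admission) -> list:
    """Drop notes flagged as errors and notes written < 12 h before death."""
    kept = []
    for note in adm.notes:
        if note.is_error:
            continue
        if adm.death_time is not None and adm.death_time - note.chart_time < DEATH_WINDOW:
            continue
        kept.append(note)
    return kept


def select_cohort(admissions: list) -> list:
    """Return (admission, eligible notes) pairs that satisfy all criteria."""
    cohort = []
    for adm in admissions:
        if not (adm.icd9_codes & SEPSIS_CODES):  # no sepsis diagnosis code
            continue
        if adm.age_years < MIN_AGE_YEARS:
            continue
        notes = eligible_notes(adm)
        if not notes:  # exclusion (3): no usable nursing notes remain
            continue
        cohort.append((adm, notes))
    return cohort
```

Note-level exclusions are applied before the final check, so a patient whose only notes were flagged as errors or written within 12 hours of death falls under exclusion criterion (3).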
The data used in this study were obtained from the MIMIC-III database (https://mimic.physionet.org/), an openly available dataset. Data collection for MIMIC-III was approved by the Ethics Review Board of the Beth Israel Deaconess Medical Center, and all private information has been deidentified.

Sentiment Analysis.
Two techniques, syntactic and semantic, are mainly used to classify and compute sentiment polarity in text [14]. In a semantic approach, sentiment is extracted from the meaning of the text, commonly using a classifier [14]. To make inferences from the structural features of text, this study employed a syntactic technique to extract sentiments.
The Python programming language and the TextBlob natural language processing library were used to compute sentiment scores for the nursing notes [15]. The sentiment of text strings was computed with the pattern module in TextBlob.
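In TextBlob, note-level scores come from `TextBlob(text).sentiment`, which returns a named tuple with `polarity` and `subjectivity` fields. To show the lexicon-averaging idea without depending on that library, the sketch below scores notes against a tiny hypothetical lexicon whose words and values are invented for illustration; it is not TextBlob's lexicon, and the negation rule is a simplifying assumption. Each note's score is the mean over the sentiment-bearing words it contains, and an admission's score is the mean over its notes.

```python
import re
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
# Values are invented for illustration and are NOT TextBlob's lexicon.
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "good": (0.7, 0.6),
    "comfortable": (0.4, 0.7),
    "calm": (0.3, 0.75),
    "poor": (-0.4, 0.6),
    "agitated": (-0.5, 0.8),
    "worse": (-0.4, 0.6),
}
NEGATIONS = {"not", "no", "never"}


def score_note(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) averaged over lexicon words found in the note."""
    tokens = re.findall(r"[a-z']+", text.lower())
    polarities, subjectivities = [], []
    for i, tok in enumerate(tokens):
        if tok not in TOY_LEXICON:
            continue  # words outside the lexicon carry no sentiment here
        pol, subj = TOY_LEXICON[tok]
        if i > 0 and tokens[i - 1] in NEGATIONS:
            pol *= -0.5  # assumed negation rule: flip and dampen polarity
        polarities.append(pol)
        subjectivities.append(subj)
    if not polarities:
        return 0.0, 0.0  # no sentiment-bearing words: neutral and objective
    return mean(polarities), mean(subjectivities)


def patient_means(notes: List[str]) -> Tuple[float, float]:
    """Average the note-level scores over all eligible notes of one admission."""
    scores = [score_note(n) for n in notes]
    return mean(s[0] for s in scores), mean(s[1] for s in scores)
```

Replacing `score_note` with a call to `TextBlob(note).sentiment` would give the library's own scores; in this toy version, a purely factual note containing no lexicon words scores (0.0, 0.0).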
The pattern module contains a lexicon of English adjectives and adverbs that are mapped to three dimensions of sentiment: polarity, subjectivity, and intensity [16]. TextBlob returns a sentiment polarity score from -1 to 1 and a sentiment subjectivity score from 0 to 1; higher scores indicate more positive and more subjective sentiment, respectively. In this study, a polarity score and a subjectivity score were assigned to each nursing note by creating a TextBlob object initialized with the note string and extracting its sentiment attributes [7]. For each patient's first hospital admission, the mean sentiment polarity and subjectivity scores of the nursing notes written during hospitalization were calculated and used as predictors in the model. For an example of sentiment polarity scores using TextBlob, see Table 1.

Mortality and Survival Assessment.
As a common predictor of ICU mortality, the Simplified Acute Physiology Score II (SAPS-II) is a composite score comprising 17 variables (age, 12 physiology variables, type of admission, and 3 underlying disease variables) [17]. In this study, the SAPS-II score was calculated from MIMIC-III data using the SQL scripts in the MIT Laboratory for Computational Physiology git repository. Gender and ICU type were also included as variables because they are readily available in the MIMIC-III database but are not part of SAPS-II. Survival was defined as the number of days from hospital admission to death or right-censoring.

Statistical Analysis.
… and Python text analysis (version 3.7). Normally distributed data were compared with the t-test and presented as mean ± standard deviation (x̄ ± s); non-normally distributed data were compared with the Mann-Whitney U rank-sum test and presented as median and quartiles (M (Q1, Q3)).
Categorical data were compared with the χ² test and presented as n (%). The COX proportional hazards model was used to analyze the relationship between sentiment scores in nursing notes and the in-hospital 28-day mortality of sepsis patients. The statistical power of the study, given its sample size, was 0.858.
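The group comparisons can be illustrated with a small standard-library sketch: a median (Q1, Q3) summary using linear interpolation between order statistics, a two-sided Mann-Whitney U test with a large-sample normal approximation (no tie or continuity correction), and a Pearson χ² test for a 2 × 2 table with 1 degree of freedom, such as gender by outcome. These are textbook versions written for illustration; the study does not report which implementations or corrections it used, so statistical packages may give slightly different P values.

```python
import math
from typing import List, Sequence, Tuple


def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear interpolation between order statistics (NumPy's default rule)."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def median_q1_q3(values: Sequence[float]) -> Tuple[float, float, float]:
    """Summarize a non-normally distributed variable as M (Q1, Q3)."""
    s = sorted(values)
    return _quantile(s, 0.5), _quantile(s, 0.25), _quantile(s, 0.75)


def _average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = _average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square for [[a, b], [c, d]] (1 df, no Yates correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
```

Variables with more than two categories, such as ICU type, need an r × c χ² test whose P value comes from the general χ² distribution, which this sketch does not cover.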
The general form of the COX model is

$$h(t) = h_0(t)\exp(X'\beta)$$

where $h_0(t)$ is the baseline hazard function, $h(t)$ is the hazard function at time $t$, $X$ is the covariate vector, and $\beta$ is the vector of unknown regression coefficients. The individual prognostic index (PI) is

$$\mathrm{PI} = X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k$$

Based on the COX model, the individual PI was calculated for each patient; the greater the PI, the worse the prognosis. Survival curves were compared using the log-rank test. Box plots, histograms, and forest plots were produced with Python. Power analysis was carried out to assess the statistical power (1 − β) using PASS 15.0 software (NCSS, LLC). The power values for the sentiment polarity score and the sentiment subjectivity score were both 1.000, indicating that the study was adequately powered to detect these associations. P < 0.05 was considered statistically significant.

Computational and Mathematical Methods in Medicine

Results
A total of 1851 patients were eligible for the study, among whom 580 died in hospital within 28 days of ICU admission (dead group), while 1271 survived (survived group). The baseline characteristics of the two groups are compared in Table 2, and the flowchart is presented in Figure 1.

The sentiment polarity score of patients in the survived group was significantly higher than that in the dead group (P < 0.001), while their SAPS-II score was significantly lower (P < 0.001) (Table 2, Figure 2). The two groups also differed significantly in age (P < 0.001) and ICU type (P < 0.001), but not in the sentiment subjectivity score (P = 0.340) or gender (P = 0.757) (Table 2, Figure 3).

Survival Analysis.
According to the individual PI, patients were assigned to the high-risk group (PI ≥ 0.561) or the low- and middle-risk group (PI < 0.561); the survival curves are shown in Figure 5. The median time to death was significantly shorter in the high-risk group than in the low- and middle-risk group (13.5 vs. 49.8 days, P < 0.001).
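The PI grouping and survival comparison can be sketched as follows. Each patient's PI is the linear predictor PI = X1β1 + ⋯ + Xkβk, where each coefficient would come from the fitted COX model (β = ln HR); any covariate names and coefficients passed in are hypothetical. The 0.561 cut-off is taken from the text but is only meaningful on the PI scale of the study's own fitted model (it depends, for example, on whether covariates were centered). Median time to death is read here from the Kaplan-Meier estimate, which is an assumption about how the reported medians were obtained, and the groups are compared with the standard two-group log-rank test. This is an illustration, not the authors' code.

```python
import math
from typing import Dict, List, Optional, Tuple

# (follow-up time in days, True if the patient died, False if right-censored)
Record = Tuple[float, bool]


def prognostic_index(x: Dict[str, float], beta: Dict[str, float]) -> float:
    """Linear predictor PI = X1*b1 + X2*b2 + ... + Xk*bk."""
    return sum(x[name] * b for name, b in beta.items())


def split_by_pi(patients, beta, cutoff=0.561):
    """Split (covariates, record) pairs into high-risk (PI >= cutoff) and low/middle groups."""
    high, low = [], []
    for x, record in patients:
        (high if prognostic_index(x, beta) >= cutoff else low).append(record)
    return high, low


def km_median(records: List[Record]) -> Optional[float]:
    """First time the Kaplan-Meier survival estimate falls to 0.5 or below; None if never."""
    surv = 1.0
    for t in sorted({u for u, died in records if died}):
        at_risk = sum(1 for u, _ in records if u >= t)
        deaths = sum(1 for u, died in records if u == t and died)
        surv *= 1 - deaths / at_risk
        if surv <= 0.5:
            return t
    return None


def log_rank(g1: List[Record], g2: List[Record]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P value) with 1 df."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({u for u, died in g1 + g2 if died}):
        n1 = sum(1 for u, _ in g1 if u >= t)
        n2 = sum(1 for u, _ in g2 if u >= t)
        d1 = sum(1 for u, died in g1 if u == t and died)
        d = d1 + sum(1 for u, died in g2 if u == t and died)
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)  # hypergeometric variance
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
```

In practice a survival-analysis library would fit the COX model and run the log-rank test; the hand-written version only makes each step of the comparison explicit.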

Discussion
In the present study, 1851 sepsis patients met the inclusion and exclusion criteria, of whom 580 died in hospital within 28 days and 1271 survived. Multivariate COX analysis showed that sentiment polarity and sentiment subjectivity were inversely associated with in-hospital 28-day mortality. Based on the quartiles of the individual PI, patients were assigned to a high-risk group and a low- and middle-risk group. Survival analysis indicated that the high-risk group had a shorter median time to death than the low- and middle-risk group. Together, these findings suggest that quantitative measures of sentiment in nursing notes are associated with the in-hospital 28-day mortality and survival of sepsis patients, and that the rich information in nursing notes may serve as a potential predictor of clinical outcomes in the ICU.
Even brief fragments of text can reflect the author's feelings about a given topic. Recently developed language processing tools allow such feelings, including sentiment, to be characterized in text documents [18]. Sentiment is usually described as the relative positivity, or polarity, of a text string and is measured by a number from -1 (very negative) to 1 (very positive) [14]. It can also be expressed as a classifier's estimated probability that a text is "positive" or "negative." Sentiment analysis provides insight into clinicians' emotions and attitudes towards patients through the subjective expressions they use in clinical notes, which may help predict patients' outcomes [19][20][21][22]. In health-related fields, sentiment analysis has been widely applied to breast and colorectal cancer discussion posts on the Cancer Survivors Network (CSN) [23], discussions of health reform on Twitter [24], encounter notes of critically ill patients [25], and other texts. This study aimed to identify the association between sentiments in nursing notes and the in-hospital 28-day mortality of sepsis patients. The results showed that both sentiment polarity and sentiment subjectivity were inversely associated with in-hospital 28-day mortality … [11]. Based on the COX model, patients with PI ≥ 0.561 had a higher risk of death than those with PI < 0.561, highlighting the potential value of sentiment in survival analysis. A previous study showed a strong association between sentiment and the risk of death even after adjustment for severity of illness and baseline characteristics [25].
A strength of the present study is that, to our knowledge, it is the first to investigate the association between sentiments in nursing notes and the in-hospital 28-day mortality of sepsis patients. Nursing notes written less than 12 hours before death were excluded, which reduces the influence of notes written when death was already imminent and so makes the results more reliable. However, the study also has several limitations, and its results should be interpreted with caution. First, MIMIC-III is a single-center database, and nursing notes elsewhere may show different characteristics because of variations in clinicians, experience, training, or working environment, which may limit the generalizability of the results. Second, the approach used to measure sentiment in the present study is not the only one available; other techniques, such as machine learning models that make semantic inferences, could produce different results. Third, the mean sentiment scores characterize variation only at the patient level, not at the level of sentences, paragraphs, or documents. Fourth, the nursing notes were recorded by various caregivers, such as research nurses and medical doctors (available at https://mimic.mit.edu/docs/iii/tables/caregivers/), and it cannot be determined whether the sentiments expressed in the notes reflect the writers' past or personal experiences. Moreover, subtle changes in sentiment over time were not captured. In the future, temporal patterns in nursing notes will be examined to gain further insight.

Conclusions
Sentiments in nursing notes are associated with the in-hospital 28-day mortality and survival of sepsis patients, suggesting that they may be important for predicting clinical outcomes in the ICU. Although predicting clinical outcomes remains a complex problem, information extracted from unstructured data such as nursing notes may help further improve prediction performance.

Abbreviations
PI: Prognostic index
SAPS-II: Simplified Acute Physiology Score II
ICU: Intensive care unit
SOI: Severity of illness scores
MIMIC-III: Medical Information Mart for Intensive Care III
ICD-9: International Classification of Diseases, Ninth Revision
HR: Hazard ratio
TSICU: Trauma/surgical intensive care unit
CSN: Cancer Survivors Network

Data Availability
The data supporting the findings of this study are available from the corresponding authors upon request. The data used in this study were obtained from the MIMIC-III database (https://mimic.physionet.org/), a freely accessible database.

Ethical Approval
Data collection for MIMIC-III was approved by the Ethics Review Board of the Beth Israel Deaconess Medical Center, and all private information has been deidentified.