Risk of Acute Respiratory Distress Syndrome in Community-Acquired Pneumonia Patients: Use of an Artificial Neural Network Model

This study aimed to explore the independent risk factors for community-acquired pneumonia (CAP) complicated with acute respiratory distress syndrome (ARDS) and to predict and evaluate the risk of ARDS in CAP patients using an artificial neural network (ANN) model. We retrospectively analyzed 989 eligible CAP patients (632 men and 357 women) from the comprehensive intensive care unit (ICU) and the respiratory and critical care medicine departments of Changzhou Second People's Hospital, Jiangsu Provincial People's Hospital, Nanjing Military Region General Hospital, and Wuxi Fifth People's Hospital between February 2018 and February 2021. The best predictors for the ANN model were selected from 51 variables measured within 24 h after admission. Patients were divided into a training group (n = 701) and a testing group (n = 288). In the 989 CAP patients, 22 important variables were identified as risk factors. The sensitivity, specificity, and accuracy of the ANN model in the training group were 88.9%, 90.1%, and 89.7%, respectively. In the testing group, the sensitivity, specificity, and accuracy were 85.0%, 87.3%, and 86.5%, respectively; when the ANN model was used to predict ARDS, the area under the receiver operating characteristic (ROC) curve was 0.943 (95% confidence interval: 0.918–0.968). The nine most important independent variables in the ANN model were lactate dehydrogenase (100%), activated partial thromboplastin time (84.6%), procalcitonin (83.8%), age (77.9%), maximum respiratory rate (76.0%), neutrophil count (75.9%), source of admission (68.9%), total serum kalium concentration (61.3%), and total serum bilirubin concentration (50.4%) (all with a normalized importance >50%). The ANN model and the logistic regression models were significantly different in predicting and evaluating ARDS in CAP patients.
Thus, the ANN model has good predictive value for predicting and evaluating ARDS in CAP patients, and its performance is better than that of the logistic regression model in predicting the incidence of ARDS.


Introduction
Community-acquired pneumonia (CAP) is one of the most common infectious diseases in the world, and the nosocomial mortality of CAP patients is about 13% [1]. Studies have indicated that about 21% of patients will develop severe CAP and need treatment in the intensive care unit (ICU), 26% of patients in the ICU need mechanical ventilation, and 29% of CAP patients will develop acute respiratory distress syndrome (ARDS) [2]. The mortality in severe CAP patients with concurrent ARDS is up to 30%, which may be related to the poorly recognized pathophysiology of ARDS [3]. Furthermore, the mortality of patients with ARDS associated with CAP is independently associated with delayed admission to the ICU, an increase in medical costs, and a decrease in long-term quality of life [4].
Prediction of ARDS in CAP patients is mainly based on their clinical symptoms, degree of hypoxia (arterial blood gas analysis), lung imaging findings, and reliable biomarkers [5]. With the development of techniques for detecting biomarkers that reflect the pathophysiological mechanisms of disease and the introduction of the US-European Consensus Standard [6], some feasible biomarkers have been identified for predicting concurrent ARDS in CAP patients, such as plasma endocrine proteins [7], T lymphocytes [8], interleukin-8 [4], neutrophil traps [9], and angiogenesis-2 [10]. However, the use of biomarkers remains controversial. An artificial neural network (ANN) model is a nonlinear mathematical model whose working principle places almost no restrictions on the characteristics of the data being analyzed, which helps it to fit complex multifactorial diseases with good sensitivity and specificity. Thus, ANNs have been used in the diagnosis and prognostic analysis of clinical diseases [11].
No risk prediction model has yet been proposed for clinical ARDS in CAP patients. Therefore, it is particularly important to assess the risk of concurrent ARDS in CAP patients. In this retrospective study, a predictive model was constructed to predict concurrent ARDS in CAP patients, which may provide information for the prevention of ARDS in these patients.

Ethics Statement.
This retrospective case-control observational study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Ethics Committee of Changzhou Second People's Hospital, which is affiliated with Nanjing Medical University (IRB: 2020YLJSE086). Patient-specific informed consent was not obtained for this retrospective study. Furthermore, all the data were provided only to investigators with privacy protection. All the raw data were collected according to the procedures outlined in the epidemiological guidelines.

Diagnostic criteria for CAP in China [1] are as follows: (a) the disease was acquired in the community. (b) There were pneumonia-related clinical manifestations such as (1) recent onset of cough or sputum, or aggravation of an existing respiratory disease, with or without purulent sputum, chest pain, dyspnea, or hemoptysis; (2) fever; (3) signs of pulmonary consolidation and/or wet rales; (4) peripheral white blood cell count >10 × 10^9/L or <4 × 10^9/L, with or without a left shift. (c) Chest imaging revealed new patchy infiltrates, lobar/segmental consolidation, ground-glass opacity, or interstitial changes, with or without pleural effusion. A clinical diagnosis was established when criteria (a) and (c) and any one item of (b) were met, and pulmonary tuberculosis, pulmonary tumors, noninfectious pulmonary interstitial disease, pulmonary edema, atelectasis, pulmonary embolism, pulmonary eosinophil infiltration, and pulmonary vasculitis were excluded.

Berlin 2012 diagnostic criteria for adult ARDS [12]: (1) timing: within 1 week of a known clinical insult or of new or worsening respiratory symptoms; (2) chest imaging findings: bilateral opacities on X-ray or CT not fully explained by pleural effusion, lobar/lung collapse, or nodules; (3) origin of pulmonary edema: respiratory failure not fully explained by heart failure or fluid overload; (4) oxygenation: PaO2/FiO2 ≤ 300 mmHg with a positive end-expiratory pressure or continuous positive airway pressure ≥ 5 cmH2O.

The following 51 clinical risk factors of CAP patients were recorded: age, gender, source of admission (emergency, outpatient), maximum temperature (MT), maximum heart rate (MHR), maximum systolic blood pressure (MSBP), maximum respiratory rate (MRR), urine volume within 24 h, complement C4 (C4), hypertension, diabetes, C-reactive protein (CRP), procalcitonin (PCT), erythrocyte sedimentation rate (ESR), white blood cell count (WBC), neutrophil count (NEUT), lymphocyte count (LYM), eosinophil count (EO), fibrinogen equivalent unit (FEU), fibrinogen (FBG), activated partial thromboplastin time (APTT), alkaline phosphatase (ALP), albumin (ALB), total protein (TP), total bilirubin (TBIL), prealbumin (PA), alanine aminotransferase (ALT), aspartate aminotransferase (AST), lactate dehydrogenase (LDH), creatine kinase isoenzyme (CK-MB), troponin I (TNI), B-type natriuretic peptide (BNP), creatinine (CREA), blood urea nitrogen (BUN), uric acid (UA), red blood cell count (RBC), hemoglobin (HGB), platelet count (PLT), glucose (GLU), total serum kalium (K+) level, total serum natrium (Na+) level, total serum magnesium (Mg2+) level, fraction of inspired oxygen (FiO2), potential of hydrogen (pH), oxygen partial pressure (PaO2), partial pressure of carbon dioxide (PaCO2), lactic acid (LAC), Glasgow Coma Scale (GCS) score, nutritional risk score, lung injury score, and Acute Physiology and Chronic Health Evaluation (APACHE) score.
Except for gender, age, and source of admission (emergency department or outpatient department), the worst result of each examination within 24 h of admission was recorded for each patient.

Inclusion and Exclusion Criteria.
Inclusion criteria were as follows: patients with an initial diagnosis of CAP within 24 h were enrolled as CAP patients, and CAP was diagnosed based on the criteria of the Respiratory Society of the Chinese Medical Association and the Guidelines for the Diagnosis and Treatment of Adult Community-Acquired Pneumonia in China (2020 edition). Exclusion criteria were as follows: (1) confirmed severe respiratory dysfunction before the onset of CAP, such as acute respiratory distress syndrome, acute respiratory failure, severe pulmonary edema, or an acute exacerbation of chronic obstructive pulmonary disease; (2) admission to hospital for pneumonia more than 2 times, or a requirement for long-term oxygen therapy after tracheostomy; (3) transfer from other departments to the general ICU or the Department of Respiratory and Critical Care Medicine during the hospitalization; (4) hospital-acquired pneumonia during hospitalization; (5) cardiogenic pulmonary edema during hospitalization; (6) other risk factors on admission, such as cancer, heart failure or kidney failure, blood disease, and tuberculosis; (7) a stable or normal disease condition within 48 h after admission; (8) missing values in >30% of the clinical risk variables; (9) missing data in the identified clinical variables; (10) hospital stay <24 h; (11) incomplete clinical information.

Artificial Neural Networks Model.
A 3-layer network model, consisting of an input layer, a hidden layer, and an output layer, was used to analyze the data. The independent variables $X_i$ ($i = 1, 2, 3, \ldots, n$) are the input neurons; the dependent variable $Y_j$ ($j = 0, 1$) is the output neuron, and the output layer is ARDS (no ARDS = 0; ARDS = 1); its transfer parameters are expressed by the identity activation function. The ANN was built on the building block of a single hidden layer, with the classes separated through the following equation:

$$y = \sum_{i=1}^{n} w_i x_i + b$$

where $x_i$ represents the input, $w_i$ represents the weights, $b$ represents the bias, and $y$ represents the output.
With $K$ as the number of hidden layers, all data are normalized by $x = (X - X_{\min})/(X_{\max} - X_{\min})$. The number of neurons in the hidden layer was gradually increased and decreased, and the number that gave the network sufficient generalization and output accuracy was selected. Finally, $K$ was set to 1 hidden layer containing five neuronal units. The hyperbolic tangent function was used as the transfer (activation) function of the hidden layer. All the data were divided into a training dataset and a validation dataset at a 7 : 3 ratio. The training dataset is used for network learning to build the prediction model, and the validation dataset is used to evaluate the performance of the model.
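As an illustration only, the normalization and the forward pass of such a network can be sketched in a few lines of Python. This is not the SPSS implementation used in the study: the function names, the example values, and the random, untrained weights are all hypothetical, and the sketch assumes one tanh hidden layer with five neurons followed by an identity output, as described above.

```python
import math
import random


def min_max_normalize(values):
    """Rescale one variable to [0, 1] using x = (X - X_min) / (X_max - X_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Input layer -> tanh hidden layer -> identity output (a single score)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical raw values of one variable across three patients.
raw_values = [180.0, 420.0, 950.0]
scaled_values = min_max_normalize(raw_values)

# Hypothetical, untrained weights: 3 inputs, 5 hidden neurons, 1 output.
rng = random.Random(0)
n_inputs, n_hidden = 3, 5
hidden_weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)
]
hidden_biases = [0.0] * n_hidden
output_weights = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

# One hypothetical patient whose three inputs are already normalized.
patient = [0.2, 0.7, 0.5]
score = forward(patient, hidden_weights, hidden_biases, output_weights, 0.0)
```

In a trained network the weights would be learned from the training dataset by back-propagation, and the output score would be compared with a cutoff to assign "no ARDS = 0" or "ARDS = 1".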

Logistic Regression Model.
The logistic regression (LR) model is a generalized linear regression model, similar to the ANN model. In this model, the dependent variable serves as the output, which is a binary variable ("no ARDS = 0," "ARDS = 1"). The independent variables are the clinical risk factors used as the initial inputs, such as age, sex, heart rate, and hypertension. Independent variables can be continuous or categorical. In the logistic regression analysis, the weight of each independent variable can be obtained, and the risk factors for developing ARDS are determined. Meanwhile, the weights can be used to predict the likelihood of developing ARDS in a specific person based on the risk factors. The predictors were combined through a logistic link function to predict ARDS. The dataset was randomly divided into training and validation groups at a 1 : 1 ratio, and the dataset in the training group was used to construct the LR model.
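To make the role of the weights concrete, the following minimal Python sketch shows how a fitted LR model would turn a patient's risk factors into a probability of ARDS through the logistic link. The intercept, the weights, and the patient values are hypothetical placeholders chosen for illustration; they are not the coefficients estimated in this study.

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_ards_probability(features, weights, intercept):
    """Combine the predictors linearly, then map the sum to a probability."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return logistic(z)


# Hypothetical coefficients and patient values (not taken from the study).
example_weights = {"age": 0.03, "heart_rate": 0.01, "hypertension": 0.4}
example_intercept = -4.0
example_patient = {"age": 70, "heart_rate": 110, "hypertension": 1}

probability = predict_ards_probability(
    example_patient, example_weights, example_intercept
)
predicted_class = 1 if probability >= 0.5 else 0  # 1 = ARDS, 0 = no ARDS
```

The 0.5 cutoff in the last line is an assumption of this sketch; in practice the cutoff is chosen to balance sensitivity and specificity.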

Statistical Analysis.
All data were analyzed using SPSS version 26.0 statistical software. Data with normal distribution are expressed as mean ± standard deviation (X ± SD) and were compared between two groups with an independent-sample t-test. Data without normal distribution are expressed as medians (P25-P75) and were compared between groups with a nonparametric Kruskal-Wallis rank-sum test. Categorical data are expressed as frequency and rate and were compared between groups by the chi-square test. A value of P < 0.05 was considered statistically significant. ANN analysis was performed using SPSS Clementine 11.1. The LR and ANN models were established to predict the risk of developing ARDS in CAP patients. Predictive performance was evaluated by sensitivity (SEN) and specificity (SPE). Dichotomous variables were created from continuous variables according to clinically important cutoff values. Software from MathWorks (USA) was used to plot the receiver operating characteristic (ROC) curves, and the area under the ROC curve (AUC) was calculated.
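For readers who want to reproduce these performance measures outside SPSS, the sketch below computes sensitivity, specificity, accuracy, and AUC in plain Python. The labels and scores are made-up toy data, the function names are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by tracing the ROC curve; the sketch assumes that both classes are present.

```python
def sensitivity_specificity_accuracy(y_true, y_pred):
    """Return (SEN, SPE, accuracy) for binary labels where 1 = ARDS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


def auc(y_true, scores):
    """Probability that a random ARDS case scores above a random non-ARDS
    case, counting ties as one half (Mann-Whitney formulation of the AUC)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 3 ARDS and 4 non-ARDS patients with hypothetical model scores.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

sen, spe, acc = sensitivity_specificity_accuracy(toy_labels, toy_predictions)
toy_auc = auc(toy_labels, toy_scores)
error_rate = 1.0 - acc  # the prediction error rate is the complement of accuracy
```

On the toy data, two of the three ARDS cases and three of the four non-ARDS cases are classified correctly at the 0.5 cutoff, which is itself an assumption of this example.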

Patients' Characteristics.
A total of 2228 patients who were admitted with an initial diagnosis of CAP were initially included in this study. Of these, 989 patients (632 men and 357 women) with a mean age of 68.48 ± 29.49 years were diagnosed with CAP alone, and 323 (32.7%) of these CAP patients developed ARDS (Figure 1). Clinical risk variables were excluded according to the following criteria: (1) more than 15% of the observed values of the variable were missing; (2) among variables representing the same functional index, only the most representative one was retained; (3) the distribution of the variable was seriously skewed; (4) blood gas analysis variables were excluded because ventilator use affects the accuracy of arterial blood gas values. Finally, 25 clinical risk factors were collected for each patient: gender, age, MHR, MT, MSBP, MRR, source of admission (emergency, outpatient), hypertension, diabetes, CRP, PCT, ESR, NEUT, EO, FEU, APTT, TBIL, ALB, LDH, CREA, HGB, PLT, GLU, K+, and Na+. Results of the univariate analysis are shown in Table 1, and P < 0.05 indicates a significant difference between the ARDS group and the non-ARDS group.

The input layer comprised the factors $X_1$-$X_4$ (gender $X_1$, admission source $X_2$, hypertension $X_3$, and diabetes $X_4$) and the covariates $X_1$-$X_{19}$, which were entered sequentially: age $X_1$, heart rate $X_2$, MRR $X_3$, CRP $X_4$, PCT $X_5$, ESR $X_6$, NEUT $X_7$, EO $X_8$, FEU $X_9$, APTT $X_{10}$, TBIL $X_{11}$, ALB $X_{12}$, LDH $X_{13}$, CREA $X_{14}$, HGB $X_{15}$, GLU $X_{16}$, PLT $X_{17}$, K+ $X_{18}$, and Na+ $X_{19}$. The output layer was ARDS (no ARDS = 0, ARDS = 1). The number of neurons in the hidden layer was set to 5. The topological structure of the neural network model is plotted in Figure 2. The back-propagation (BP) neural network model was built using the training dataset. The sensitivity, specificity, and accuracy of the ANN model were 88.9%, 90.1%, and 89.7%, respectively, in the training group. This indicates that the ANN model has good recognition ability.
In the test group, the sensitivity, specificity, and accuracy of the ANN model were 85.0%, 87.3%, and 86.5%, respectively (Table 2). When the ANN model was used to predict ARDS, the AUC was 0.943, and the 95% confidence interval (CI) was 0.918-0.968 (Figure 3). The prediction error rates of the ANN model were 10.3% in the training group and 13.5% in the test group, indicating that the model predicts with good accuracy in both datasets. In our study, the fit of the ARDS and non-ARDS predictions was compared with the Hosmer-Lemeshow test, and the results showed that the fit for ARDS was better than that for non-ARDS, indicating that the ANN model is more suitable for predicting the occurrence of ARDS.

Comparison of ANN Model with LR Model.
The evaluation metrics of the BP-ANN and LR models were compared. The results showed no significant differences in SEN, SPE, accuracy, or AUC between them (P > 0.05). The AUC was calculated for the LR and ANN models using the validation dataset and was used to assess their ability to identify ARDS. The AUC of the ANN model was 0.943 (95% CI: 0.918-0.968), and the AUC of the LR model was 0.942 (95% CI: 0.923-0.961) (Table 4).

Discussion
ARDS is the most common complication in CAP patients who develop severe CAP and need intensive care. ARDS is a heterogeneous syndrome with direct and indirect causes of lung injury, and pulmonary ARDS in CAP involves a sepsis-like inflammatory reaction and alveolar endothelial injury [13]. Some studies have found that pulmonary edema in ARDS is nonspecific and may further increase the mortality of ARDS patients [14,15]. Thus, the development of a model for the prediction of ARDS in CAP patients may be helpful for the early monitoring and management of severe disease and the reduction of the risk of ARDS in CAP patients.
This study is the first to investigate a predictive model of ARDS in CAP patients based on conventional variables. In this model, all objective and commonly used clinical variables collected within 24 h after admission were included. The predictive model of ARDS in CAP patients constructed with the ANN has good predictive and calibration power. In our study, LDH, APTT, PCT, age, MRR, NEUT, admission source, K+, and TBIL played important roles in predicting the occurrence of ARDS in CAP patients, and some previous studies have investigated the specific risk factors of ARDS in CAP patients. As a key enzyme in the glycolytic pathway, LDH is a cytoplasmic enzyme present in most organs and is associated with the inflammatory response and cellular damage. Zhou et al. found that bacterial or viral mRNA clearance is highly correlated with LDH level, and CAP patients infected by bacteria or viruses may have inflammasome activation, induction of apoptosis, and invasive symptoms, which can partly explain the association of LDH with CAP and ARDS [16]. Zhou et al. also found that an elevated LDH level in admitted patients was strongly associated with the risk of developing ARDS [16]. Pathophysiologically, ARDS is mainly characterized by inflammatory cell migration, fiber proliferation, and apoptosis, and the imbalance between hypercoagulability and inflammation may lead to excessive inflammation and accelerate fibrin deposition in the alveoli [17]. When CAP patients develop severe pneumonia, neutrophils gradually form neutrophil extracellular traps, which further increase lung endothelial and epithelial cell damage and lead to the occurrence of ARDS and acute respiratory failure. Immune thrombosis is a key manifestation of ARDS. Grasselli et al. found that fibrin-rich exudates due to coagulation activation and inhibition were the core event in the pathophysiology of ARDS [18], and that coagulation function (fibrinolysis) was related to the development of ARDS.
Studies have confirmed that ARDS patients have severe coagulation dysfunction, and the liver peak test has demonstrated a strong association of TBIL with ARDS in patients receiving mechanical ventilation in the ICU [19]. It has been shown that pneumonia is unlikely to be responsible for the elevation of procalcitonin. However, elevated PCT may be related to the longer duration of mechanical ventilation in patients with severe pneumonia in the ICU [20]. Tang et al. found that procalcitonin was related to the acute exacerbation of inflammation and could be used to assess the severity of CAP as a risk factor for ARDS [21]. Immunity may gradually decline with age, which facilitates bacterial and viral invasion [22]; this is also supported by the significantly older age of CAP patients with ARDS in this study as compared to those without ARDS. Pensier et al. employed protective ventilation to improve lung tension, which is conducive to the further improvement of ARDS [23]. The source of admission is also a risk factor for ARDS in CAP patients, and the incidence of ARDS is significantly higher in patients admitted through the emergency department than in those admitted from other sources [24]. In patients with CAP secondary to sepsis or ARDS, impaired hypoxic pulmonary vasoconstriction (HPV) may lead to fluid perfusion mismatching and hypoxia. Voltage-gated potassium channels have been shown to be one of the key regulators of HPV. ATP-sensitive potassium channels increase in endotoxemia and are also involved in the pathogenesis of alveolar epithelial barrier failure, which explains the importance of potassium in ARDS [25,26]. Fu et al. reported that high body temperature, high systolic blood pressure, and diabetes were not associated with the development of ARDS [27]. Our results were consistent with those reported in available studies. In addition, some studies have mentioned that low levels of albumin, hemoglobin, and fibrinogen are risk predictors of ARDS [28].
These metrics were also used as potential predictors, and new predictors were added to the aforementioned factors. Furthermore, the study by Dzierba et al. concluded that a decreased platelet count was associated with ARDS in patients with septic shock but was unrelated to the occurrence of ARDS in the nonseptic shock subgroup [29], which was inconsistent with our findings. The role of platelets in the pathogenesis of ARDS is probably mediated by platelet-related inflammatory responses and disseminated intravascular coagulation.

Figure 4: Normalized importance distribution of the influence of input layer independent variables on the output layer in a concurrent ARDS prediction model for CAP patients based on an artificial neural network model. Note. CAP, community-acquired pneumonia; ARDS, acute respiratory distress syndrome.

Emergency Medicine International
Thrombocytopenia is a key feature of the systemic inflammatory response [30,31], which was confirmed in ARDS patients as compared to non-ARDS patients in this study. CAP patients have a high risk for ARDS. In the present study, a predictive model was established to predict ARDS in CAP patients. The pre-established ARDS model based on different risk factors on admission has good predictive ability and focuses on the prediction of ARDS in CAP patients. The clinical outcomes and biomarker characteristics of CAP patients with ARDS differ from those of patients with ARDS related to non-CAP risk factors, which reflects the unique potential clinical factors and the special pathogenic mechanism of CAP [22]. In the present study, the BP-ANN model was compared with the LR model, and the results showed no significant differences between the two models in SEN, SPE, accuracy, and AUC (P > 0.05). Compared with the traditional LR, ANN is a nonlinear mathematical model, and its unique working principle places almost no restrictions on the characteristics of the data used for analysis, which helps it fit complex multifactorial diseases with good sensitivity and specificity [11]. Therefore, ANN was employed to construct a model for the prediction of ARDS in CAP patients, and its AUC of 0.943 indicated good predictive performance. Based on the ANN method, the predictive power reached 89.7% in the training dataset and 86.5% in the validation dataset. The decrease in predictive power from the training dataset to the validation dataset may be related to the small sample size. In this study, the number of predictors was limited relative to the number of ARDS cases, aiming to avoid this bias. Although this only slightly affected the predictive power of the model, it had a large impact on the model calibration. Overall, the predictive model of this study systematically overestimated the risk of developing ARDS due to the relatively small sample size of the cohort.
There were still limitations in this study. First, the number of patients with ARDS was small, and thus more clinical studies with a large sample size are needed to improve the accuracy of the prediction model and confirm our findings. Second, some predictors that have been repeatedly mentioned in previous studies, such as smoking, body mass index, the Acute Physiology and Chronic Health Evaluation II score, and complement C3, were not included as potential predictors in this study. Third, there was an overlap between some predictors in this study (FEU, APTT). However, the aim of this study was not to establish an independent association between risk factors and ARDS. Instead, our study aimed to determine the combination of variables that can achieve the best predictive performance for CAP-related ARDS. Fourth, the case mix in the present study was not well defined because of the heterogeneity of ARDS, which compromises the ability of these predictors to identify ARDS. Fifth, CAP patients were not grouped early by the confusion, urea, respiratory rate, blood pressure, and age ≥ 65 years (CURB-65) score, which would have allowed a further prediction model for the high-risk group (CURB-65 score of 3 points). Sixth, causal inference is an important aspect of machine learning [32], but the current predictors in this study are not necessarily causal factors for ARDS. Therefore, more prospective multicenter randomized controlled studies with a large sample size are warranted to confirm our findings in the future.

Conclusions
In conclusion, the predictive model constructed in this study, based on an ANN using indicators collected early after admission, can be used to calculate and stratify the risk of ARDS in CAP patients. Specifically, the model can be used to calculate risk and to guide early intervention targeting the meaningful markers identified in this study. The model may provide a reference for the early allocation of medical resources and help to guide the clinical management of CAP patients.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.