Development and Validation of a Prediction Model of the Risk of Pneumonia in Patients with SARS-CoV-2 Infection

Objective To establish a prediction model of pneumonia risk in SARS-CoV-2-infected patients to reduce unnecessary chest CT scans. Materials and Methods The model was constructed based on a retrospective cohort study. We selected SARS-CoV-2 test-positive patients and collected their clinical data and chest CT images from the outpatient and emergency departments of Hunan Provincial People's Hospital, China. Univariate and multivariate logistic regression and least absolute shrinkage and selection operator (LASSO) regression were used to identify predictors of pneumonia risk in patients infected with SARS-CoV-2. These predictors were then incorporated into a nomogram to establish the model. The model was evaluated in terms of discrimination, calibration, and clinical validity. In addition, a smoothed curve was fitted using a generalized additive model (GAM) to explore the association between the pneumonia grade and the model's predicted probability of pneumonia. Results We selected 299 SARS-CoV-2 test-positive patients, of whom 205 were in the training cohort and 94 were in the validation cohort. Age, the natural log-transformed CRP value (InCRP), and monocyte percentage (%Mon) were found to be valid predictors of pneumonia risk. The model achieved good discrimination, with an AUC of 0.7820 (95% CI: 0.7254–0.8439) in the training cohort and 0.8432 (95% CI: 0.7588–0.9151) in the validation cohort. At a cut-off value of 0.5, its sensitivity and specificity were 70.75% and 66.33% in the training cohort and 76.09% and 73.91% in the validation cohort, respectively. Calibration curves showed suitable calibration accuracy, and decision curve analysis indicated high clinical value in predicting pneumonia probability in SARS-CoV-2-infected patients. The probability of pneumonia predicted by the model was positively correlated with the actual pneumonia classification.
Conclusion This study has developed a pneumonia risk prediction model that can be utilized for diagnostic purposes in predicting the probability of pneumonia in patients infected with SARS-CoV-2.


Introduction
SARS-CoV-2 infection has prevailed globally since 2020, accounting for recurring quarantines in many countries. It has significantly impacted public health and the global economy [1,2]. As of 10 February 2023, there have been 755,385,709 confirmed cases of COVID-19 reported to WHO globally, including 6,833,388 deaths. Omicron, the mutant strain, entered the community in November 2021 and is far more contagious and more capable of immune escape than the previous variants of concern (VOC), like Delta [3][4][5][6][7][8]. At the beginning of 2022, the Omicron variant quickly surpassed the Delta variant as the prevalent strain worldwide [9].
However, recent studies have demonstrated that the most recent VOC, Omicron, is much less likely to cause pulmonary infections [3-5, 15, 16], suggesting potential implications for adapting management strategies for these infections.
In clinical practice, we found that, out of fear of contracting severe pneumonia from SARS-CoV-2, many people with mild symptoms choose to receive CT scans. This leads to excessive CT scanning and strains the availability of healthcare resources, particularly during localized SARS-CoV-2 outbreaks. Therefore, a strategy to evaluate the risk of pneumonia among recently infected people is essential to ensure the efficient use of medical resources and to decrease unnecessary exposure to ionizing radiation.
This study aims to improve the classification of pneumonia risk in individuals infected with the most recent VOC of SARS-CoV-2. In this way, it can not only reduce the overuse of CT scans and nonessential ionizing radiation in individuals but also reduce the associated financial burden on patients and optimize the allocation of medical resources. To this end, we have developed and externally validated a pneumonia risk prediction model based on general patient data and blood routine tests, which meets the needs of the new phase of COVID-19 epidemic control.

Materials.
A retrospective analysis was performed on the clinical data of SARS-CoV-2 test-positive patients who visited the outpatient and emergency departments and underwent chest CT scans at the Mawangdui Branch of Hunan Provincial People's Hospital from 20 December 2022 to 23 December 2022 and at the Tianxinge Branch of Hunan Provincial People's Hospital from 1 January 2023 to 4 January 2023. The inclusion criteria were as follows: (a) attendance as an outpatient or emergency patient (inpatients were not included); (b) a completed chest CT scan with available CT imaging data; (c) SARS-CoV-2 infection confirmed by a positive antigen or nucleic acid test within 3 days before the current chest CT; (d) complete blood routine examination results. The exclusion criteria were as follows: (a) inflammation of a body part other than the lungs had been diagnosed at the time of the current blood routine tests; (b) the patient was already on antiviral medication at the time of the visit. The patient recruitment pathway is detailed in Figure 1.
The study was conducted in accordance with the Declaration of Helsinki. It was approved by the Medical Ethics Committee of Hunan Provincial People's Hospital (The First Affiliated Hospital of Hunan Normal University), and patient informed consent was waived for this retrospective analysis.

Device Parameters and Image Analysis.
At the Mawangdui Branch (training cohort) of Hunan Provincial People's Hospital, CT scans were performed with a United Imaging uCT 760GE 128-slice CT using the following parameters: field of view (FOV), 230 mm × 230 mm; layer thickness, 5 mm; and layer spacing, 5 mm. At the Tianxinge Branch (validation cohort) of Hunan Provincial People's Hospital, CT scans were performed with a United Imaging uCT 860 160-slice CT or a United Imaging uCT 960+ 640-slice CT using the same parameters: field of view (FOV), 230 mm × 230 mm; layer thickness, 5 mm; and layer spacing, 5 mm. Two attending radiologists analyzed the images independently, and any disagreement was resolved by consultation between the two physicians. CT diagnosis of COVID-19 followed the report published by the RSNA [17]. Typical findings were as follows: peripheral distribution, ground-glass opacity, fine reticular opacity, vascular thickening, and reverse halo sign. Patients were also classified into pneumonia grades 0, 1, 2, 3, and 4 according to the extent and distribution of lung involvement (no lung involvement was categorized as grade 0).

Statistical Analysis and Construction and Evaluation of Predictive Models.
Statistical analysis was performed using Empower Stats, version 5.0 (https://www.empowerstats.com, X&Y Solutions, Inc., Boston, MA, USA), R statistical software, version 4.2.0 (https://www.R-project.org, The R Foundation), and the SPSS statistical software, version 27.0 (SPSS Inc., Chicago, IL, USA). Continuous variables were expressed as medians (min, max) and categorical variables as frequencies (percentages). The Kruskal-Wallis rank sum test or Fisher's exact probability test was used to compare differences between groups of continuous variables. The Chi-square test was used for comparisons of categorical variables. After natural log transformation of some continuous variables, and to reduce irrelevant and redundant information, the predictor variables of the training cohort were filtered using two methods: univariate followed by multivariate logistic regression, and least absolute shrinkage and selection operator (LASSO) regression. The variables selected by both screening methods were used as the final predictor variables. The prediction model was constructed based on multivariate logistic regression and was presented as a nomogram. ROC curves were used, with 500 internal bootstrap resamples, to evaluate the discrimination of the pneumonia risk model in the training and validation cohorts. The DeLong test and the integrated discrimination improvement index (IDI) were used to compare the AUC of the pneumonia risk model with the AUCs of the individual predictors incorporated in the model. Calibration curves were plotted to assess the calibration of the model. The clinical validity of the model was evaluated by the net benefit in decision curve analysis (DCA) at different threshold probabilities. In addition, a smoothed curve was fitted using a generalized additive model (GAM) to explore the relationship between the pneumonia grade and the model's predicted probability of pneumonia. P < 0.05 was considered statistically significant.
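The bootstrap evaluation of discrimination described above can be illustrated with a minimal sketch. This is not the code used in the study (which relied on Empower Stats, R, and SPSS); it is a plain-Python illustration that assumes a percentile bootstrap, and the function names, the toy data, and the seed are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the Mann-Whitney probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Each positive/negative pair scores 1 if correctly ordered, 0.5 if tied.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain only one class
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data only: 1 = pneumonia on CT, 0 = no pneumonia; scores are predicted probabilities.
    toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
    toy_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.55, 0.70, 0.80, 0.90]
    print(auc(toy_labels, toy_scores), bootstrap_auc_ci(toy_labels, toy_scores))
```

With 500 resamples, the 2.5th and 97.5th percentiles of the resampled AUCs give an approximate 95% interval; other bootstrap variants (for example, optimism correction) would need a different procedure.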

General Information.
A total of 205 patients were enrolled in the training cohort, of whom 105 (51.22%) were female and 100 (48.78%) were male; 99 (48.29%) had no pneumonia and 106 (51.71%) had pneumonia. The median age of the training cohort was 47 years (range, 14 to 97). A total of 94 patients were enrolled in the validation cohort, of whom 60 (63.83%) were female and 34 (36.17%) were male; 47 (50.00%) had no pneumonia and 47 (50.00%) had pneumonia. The median age of the validation cohort was 56 years (range, 2 to 89). The distribution of the remaining baseline indicators is shown in Table 1.
To lessen irrelevant and redundant information, the variables age, InCRP, and %Mon selected by both screening methods were taken as the final predictor variables. Table 3. A comparison of the AUC and DCA for the pneumonia risk model, with predictors incorporated in the model alone in the whole study cohort, is illustrated in Figure 4, which shows that the pneumonia risk model combining multiple predictors has better diagnostic performance than a single predictor.
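The net benefit that underlies decision curve analysis can be sketched as follows. This is a minimal Python illustration of the standard net benefit formula, not the software used in the study; the function names and the four-patient toy data are assumptions made for the example.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of referring to CT every patient whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_scan_all(labels, threshold):
    """Reference strategy: every patient receives a chest CT."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy data only: 1 = pneumonia on CT, 0 = no pneumonia.
    toy_labels = [1, 1, 0, 0]
    toy_probs = [0.9, 0.6, 0.7, 0.2]
    for t in (0.2, 0.5, 0.8):
        print(t, net_benefit(toy_labels, toy_probs, t), net_benefit_scan_all(toy_labels, t))
```

A decision curve plots these quantities across a range of threshold probabilities; a model is clinically useful at a given threshold when its net benefit exceeds both the scan-all strategy and the scan-none strategy, whose net benefit is zero.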

Correlation between the Predicted Probability of Pneumonia Risk and Pneumonia Grade.
We further explored the correlation between the predictive values of the pneumonia risk prediction model constructed in this study and the actual pneumonia severity rating. As described in the Methods, patients were also classified into pneumonia grades 0, 1, 2, 3, and 4 according to the extent and distribution of lung involvement (no lung involvement was categorized as grade 0). The actual pneumonia rating results are shown in Table 4. A positive linear correlation was found between the predicted pneumonia probability of the pneumonia risk model and actual pneumonia grade using GAM (Figure 5); see Figure 6 for examples.

Figure 1 (patient recruitment pathway). Patients excluded due to: no nucleic acid or antigen test result for SARS-CoV-2 within 3 days before the current chest CT (n=69); negative nucleic acid or antigen test result for SARS-CoV-2 within 3 days before the current chest CT (n=29); no or incomplete blood routine test results (n=16); inflammation of a body part other than the lungs diagnosed at the time of the current chest CT and routine blood tests (n=3); antiviral medication already taken within 3 days at the time of the visit (n=2).

Discussion
In this study, we constructed a pneumonia risk prediction model based on common, easily obtainable, and inexpensive clinical indicators such as "age," "InCRP," and "%Mon" to classify the pneumonia risk of patients infected with SARS-CoV-2. It provides an appropriate reference for clinicians in selecting chest CT examinations to reduce unnecessary medical ionizing radiation and alleviate patients' economic burden. The model performs well in discrimination, calibration, and clinical effectiveness and can be widely applied for clinical use.
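How a logistic model of this kind turns the three indicators into a probability can be sketched as follows. The coefficients below are HYPOTHETICAL placeholders chosen only for illustration (the fitted values are not reported in this text), and the function names are assumptions, so the printed probabilities will not match the nomogram. Only the natural log transformation of CRP is taken from the study.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not the fitted values of the study.
# The negative sign for %Mon is an assumption as well.
HYPOTHETICAL_COEF = {"intercept": -4.0, "age": 0.03, "ln_crp": 0.6, "pct_mon": -0.05}


def ln_crp(crp):
    """Natural log-transformed CRP (InCRP in the text)."""
    return math.log(crp)


def predicted_probability(age, crp, pct_mon, coef=HYPOTHETICAL_COEF):
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    z = (coef["intercept"]
         + coef["age"] * age
         + coef["ln_crp"] * ln_crp(crp)
         + coef["pct_mon"] * pct_mon)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # CRP values from the example cases reported in the paper: 82.45 and 259.68.
    print(round(ln_crp(82.45), 2), round(ln_crp(259.68), 2))
    # Probabilities under the hypothetical coefficients (illustrative only).
    print(predicted_probability(17, 82.45, 8.30), predicted_probability(63, 259.68, 5.00))
```

The log transformation reproduces the InCRP values quoted for the example cases (4.41 and 5.56); a nomogram is a graphical way of evaluating the same linear predictor by assigning points to each variable.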

Analysis of the Rationality of Including "Age" in the Pneumonia Risk Prediction Model in This Study.
The severity and fatality rates of COVID-19 vary significantly with age group, and they rise sharply in older people [18][19][20]. According to recent studies, activation of the nucleotide-binding domain, leucine-rich repeat-containing family, pyrin domain-containing 3 (NLRP3) inflammasome plays a role in lung inflammation and fibrosis induced by SARS-CoV-2 infections [21]; the NLRP3 inflammasome is excessively activated in older individuals due to impaired mitochondrial function, elevated levels of mitochondrial reactive oxygen species (mtROS), and/or mitochondrial DNA. This results in an exaggerated response from classically activated macrophages and subsequent increases in IL-1β [22]. This explains, to some extent, why elderly patients are more likely to have pneumonia after being infected with SARS-CoV-2 and also provides evidence for the rationality of including age as a predictive factor in our prediction model.

Analysis of the Rationality of Including "InCRP" in the Pneumonia Risk Prediction Model in This Study.
As a general indicator of inflammation, CRP is associated with the clinical severity of COVID-19 [20,23,24]. CRP is an inflammatory biomarker synthesized by the liver. Our results show that CRP levels are significantly elevated in

Analysis of the Rationality of Including "%Mon" in the Pneumonia Risk Prediction Model in This Study.
In our study, %Mon was partially associated with the risk of pneumonia, which is in accord with recent studies [27]. Monocytes are innate immune system cells that participate in several immune function events, including phagocytosis, antigen presentation, and inflammatory responses [28]; circulating monocytes extravasate into peripheral tissues during sterile and nonsterile inflammation and undergo differentiation into macrophages or dendritic cells. A previous review article discussed the buildup of monocyte/macrophage cells in the lungs. These cells are likely sources of the proinflammatory cytokines and chemokines linked to deadly diseases brought on by human coronavirus infections, such as COVID-19 [29]. This suggests that the migration of monocytes into lung tissue may be the cause of the monocyte reduction in peripheral blood.
In previous relevant studies, additional factors, such as cardiovascular disease, hypertension, chronic respiratory disease, diabetes, obesity, and high serum ferritin levels, were found to be associated with the progression of COVID-19 [30][31][32]. Angiotensin-converting enzyme 2 (ACE2) has been found to be a pathway by which SARS-CoV-2 enters cells, and angiotensin-converting enzyme inhibitors (ACEI) and angiotensin II receptor antagonists (ARB), which are mainly used to treat cardiovascular disease and hypertension, may lead to increased ACE2 expression and promote SARS-CoV-2 infection in hypertensive patients [33]. Moreover, smokers and COPD patients have higher levels of ACE2 expression in their lungs [34,35]. This may go some way towards explaining why patients with chronic respiratory disease are more likely to progress after SARS-CoV-2 infection. Diabetes patients are more likely to develop COVID-19 at a severe stage. This might be brought on by hyperglycemic circumstances that affect neutrophil activity, antioxidant system function, and humoral immunity, all contributing to immunological dysfunction [36]. Obesity affects lung function by influencing lung volume and compliance, as well as narrowing peripheral airways [37].
Additionally, due to the high expression of angiotensin-converting enzyme type 2 in adipose tissue compared to the lungs, there is a hypothesis that SARS-CoV-2 may be capable of entering adipocytes and causing infection. This could contribute to the spread of the virus to other organs or serve as a reservoir that prolongs viral clearance [38]. Clinically applicable inflammatory marker panels now contain ferritin. Inflammation can cause the release of ferritin from macrophages or from cells damaged by tissue injury, which explains the abnormal levels of ferritin in inflammation. Since our study is based on a retrospective analysis, it is limited by missing information, so some of the valuable indicators reported by relevant studies are not included in this study. In addition, some of the indicators were not included in our study because they were derived from patients' complaints rather than standard medical diagnoses and thus had low credibility.

Table abbreviations: MCV = mean corpuscular volume; MCHC = mean corpuscular hemoglobin concentration; MCH = mean corpuscular hemoglobin; RDW-SD = red cell distribution width-standard deviation; RDW-CV = red cell distribution width-coefficient of variation; PDW = platelet distribution width; MPV = mean platelet volume; PCT = plateletcrit; P-LCR = platelet large cell ratio; InCRP = natural log-transformed value of CRP; %Eos = eosinophils (percentage); #Eos = eosinophils (number); %Bas = basophils (percentage); #Bas = basophils (number). Significance in bold in Table 2 indicates indicators with p values less than 0.05 in multivariate analysis.
From the standpoint of model promotion, the more streamlined a prediction model is, the less expensive, easier to use, and more suited to wide application it is. However, streamlining can also reduce prediction performance.
This is a matter of balance: whether the model should be applied mainly for primary screening of high-risk cases or whether it should favor higher predictive accuracy depends on the application scenario of the constructed model.
In this study, the pneumonia risk prediction model we constructed was mainly applied to the primary screening of people at high risk of pneumonia among SARS-CoV-2-infected individuals, so we chose a more streamlined modeling strategy.
One unexpected finding was that the model performed better in the validation cohort than in the training cohort. This result may be explained by the relatively small sample size of the validation cohort and a certain degree of homology with the training cohort.

Conclusion
In this study, a pneumonia risk prediction model was developed and externally validated based on simple clinical and blood test indicators. The model was used to diagnostically predict the likelihood of pneumonia in patients infected with SARS-CoV-2 and performed well on dimensions of discrimination, calibration, and clinical validity. It can be used as a reference for the management of pneumonia risk classification in SARS-CoV-2-infected patients.

Limitations of This Study
Our study has several limitations. First, despite applying the inclusion criteria strictly, we could not completely rule out cases in which potential lesions in body parts other than the lungs influenced the predictors at study entry. This introduced some confounding in constructing the model and made its predictive performance harder to evaluate.
Second, even though external validation was carried out, it was a single-center retrospective study, and the sample size was relatively small. In later research, larger-sample and multicenter studies would be required to calibrate and validate the model.

Figure 6 (caption, partially recovered): His routine blood test showed a CRP of 82.45 (InCRP = 4.41) with a %Mon of 8.30. Considering his age of 17, the patient had total points of 152 according to our pneumonia risk prediction model, with a pneumonia risk prediction probability of 0.68. The patient underwent a chest CT, which showed multiple lamellar ground-glass opacities in the lower lobe of the left lung, with a peripheral distribution and thickened blood vessels within the lesion. (e and f): A 63-year-old male with a 1-week history of malaise was confirmed to be nucleic acid test positive for SARS-CoV-2 on presentation. His routine blood test showed a CRP of 259.68 (InCRP = 5.56) with a %Mon of 5.00. Considering his age of 63, this patient had total points of 192 according to our pneumonia risk prediction model, with a pneumonia risk prediction probability of >0.9. The patient underwent a chest CT, which showed multiple lamellar hyperintensities in multiple lobes of both lungs with solid lesion density, bronchial air sign within, and halo sign at the edges of some lesions.

Data Availability
The data used to support the findings of this study are restricted by the Medical Ethics Committee of Hunan Provincial People's Hospital (The First Affiliated Hospital of Hunan Normal University) to protect the patient. Data are available from Xi Yi, beimingyi0322@163.com, for researchers who meet the criteria for access to confidential data.

Disclosure
This is a preprint paper [39].

Conflicts of Interest
The authors declare that there are no conflicts of interest.

Authors' Contributions
Xi Yi and Jirong Li conceived and designed the study, had full access to all the data in the study, and took responsibility for the integrity of the data and the accuracy of the analysis. Daiyan Fu and Lile Wang contributed to the modification and revision of the manuscript; Guiliang Wang evaluated the quality of the literature. Xi Yi wrote the manuscript. All listed authors reviewed and approved the final manuscript.