An Adaptive Combined Learning of Grading System for Early Stage Emerging Diseases



Introduction
Emerging and reemerging infectious diseases continuously pose a serious threat to human health [1,2]. In particular, the outbreak and spread of coronavirus disease 2019 (COVID-19) not only caused profound loss of life but also triggered a severe socioeconomic crisis [3]. If an efficient and precise system for healthcare identification, diagnosis, and treatment can be established early in the development of a disease, it has the potential to minimize the scope of the outbreak, reduce individual health damage, and optimize resource utilization [4,5].
Using early-stage COVID-19 as a case study, data from the epidemiological investigation of the Chinese Center for Disease Control and Prevention revealed that, out of 44,415 confirmed cases, 81% were categorized as mild, 14% as severe, and 5% as critical. COVID-19 patients of different severity levels exhibited significant differences in prognosis upon hospital admission. Most mild or suspected cases were allocated to Fangcang shelter hospitals or other public facilities for centralized isolation, where they received primary medical care and showed better prognosis [6,7]. Severe or critical patients, especially the elderly or those with preexisting comorbidities, were prone to develop severe pneumonia, acute respiratory distress syndrome (ARDS), and multiple organ failure, and therefore faced a higher risk of mortality [8].
Artificial intelligence (AI) has made significant advances in guiding disease diagnosis and prognosis management, particularly in combating the COVID-19 pandemic [9]. AI algorithms have been proposed for the identification, diagnosis, and prognosis prediction of hospitalized COVID-19 patients. Existing approaches use machine learning and deep learning algorithms to analyze multisource data, including clinical examinations and imaging scans [10]. However, existing AI methods have not yet been able to accurately analyze disease features and make predictions in the early stages of emerging diseases. Below, we first review typical applications of current AI algorithms to common diseases and then focus on AI in the diagnosis of emerging diseases, especially COVID-19.

AI for Diagnosis and Prognosis of Common Disease.
Significant progress has been made in AI-based disease diagnosis and in the prediction of disease risks, progression, and treatment response. In particular, image recognition and natural language processing, which are based on big data, have been widely applied in recent years. To classify skin lesions, a convolutional neural network model was trained using a dataset of 129,450 clinical images [11,12], and a specialized neural network tailored for image classification was trained on a retrospective development dataset consisting of 128,175 retinal images. The focus of that training was on detecting diabetic retinopathy, specifically in primary care offices [13,14]. In addition, for the development of an artificial intelligence algorithm for the diagnosis and Gleason grading of prostate cancer, a retrospective collection of 12,625 whole-slide images (WSIs) from six different sites was undertaken. These WSIs comprised prostate biopsies and were used to train and refine the algorithm [15]. Other applications include colorectal cancer stratification and atrial fibrillation identification [16,17]. Another application used automated natural language processing systems and deep learning techniques to analyze electronic health records from 1,362,559 pediatric patients and guide the classification diagnosis of common childhood diseases [18]. Machine learning algorithms were applied to explore the key features influencing the treatment of infertility and to grade the outcomes in 78,826 treatment cycles [19,20]. Recently, the Human Lung Cell Atlas (HLCA) has been developed, which integrates large-scale, cross-dataset organ maps within the Human Cell Atlas [21]. Furthermore, new research from a preventive perspective is noteworthy, such as training machine learning models on data involving 22 common cancers and predicting the origins of cancer and treatment responses in 36,445 cases. This research assists doctors in formulating personalized treatment strategies [22].

AI for Prediction of Diagnosis and Prognosis of COVID-19.
AI techniques have been applied extensively across epidemiology, therapeutics, clinical research, and social studies to combat the COVID-19 pandemic [23,24]. First, in terms of diagnosis, deep learning methods greatly help the rapid and accurate detection of COVID-19 from chest X-ray and Computed Tomography (CT) images [25-29]. In addition, some studies have trained a series of deep learning algorithms on cohorts of thousands of patients to localize the pleural/parenchymal walls and classify COVID-19 pneumonia [30,31]. Second, some studies have focused on the prediction of prognosis in COVID-19. One study employed machine learning tools to identify three biomarkers from blood samples of COVID-19 patients, achieving an accuracy of over 90% in forecasting patient mortality ten days in advance [32]. A high-resolution COVID-19 mortality prediction model has been developed to identify future mortality risk two weeks prior to clinical outcomes [33]. Another method used Shift3D and random weighted loss for multitask learning in COVID-19 diagnosis and severity assessment [34]. Some interesting studies have taken into consideration the issue of multiple sources. An open-source deep learning approach has been proposed for diagnosing COVID-19 using chest CT images [35], and an approach based on a regularized cost-sensitive capsule network was proposed for early detection of COVID-19 using imbalanced or limited data [36]. In addition, the integration of deep learning CT scan models with biological and clinical variables was proposed to predict the severity of COVID-19 in patients [37], and an integrated CT image and resource library for COVID-19 with deep learning algorithms was developed [38]. Recently, some studies have focused on selecting key features that influence the outcome of COVID-19 for prediction [39]. For example, one study used feature selection methods to reduce the clinical features to 13 key features and predicted COVID-19 severity based on personalized diagnostic models [40]. The significance of known risk factors for the in-hospital mortality rate of COVID-19 was evaluated, and the predictive utility and grading diagnosis of radiological texture features were investigated using various machine learning methods [41].

Motivations and Contributions.
This review of the literature shows that current AI methods rely heavily on vast amounts of data for the diagnosis of various diseases, including COVID-19. However, there is almost no AI application research for early stage emerging diseases. For newly emerging diseases like COVID-19, timely and accurate identification, diagnosis, and treatment in the early stages are crucial. This often becomes a race against time and a matter of life and death, and waiting until a large number of cases accumulate for analysis can prove to be too late [42].

International Journal of Intelligent Systems

In addition, an examination of 37,421 COVID-19-related studies in the British Medical Journal revealed that nearly 87% of the studies exhibited bias, primarily due to inadequate sample sizes [43]. Moreover, the widespread use of a single model introduces uncertainty in model selection, potentially leading to biased model estimates and less reliable results [44]. This study proposes an adaptive combined learning framework for the diagnosis and outcome prediction of newly emerging major diseases from small sample data. By taking advantage of combined computation, we can alleviate the underfitting issues arising from insufficient training data and reduce the biases associated with the selection of a single AI method. The combination places more weight on the method that better fits the real data, which reflects its adaptability and scalability across different data. The proposed framework is applied to two early COVID-19 cohorts, demonstrating the adaptability and reliability of this approach. The main contributions of this study include the following:
(1) To provide targeted guidance to doctors for a rapid and accurate understanding of the clinical characteristics and examination of newly emerging diseases, we propose the Adaptive Combination Importance Measure (ACIM) for binary responses. This method combines the importance assigned by various AI algorithms to the impact of clinical features on disease outcomes. This provides a basis for the swift formulation of public health emergency policies in response to emerging diseases with limited sample data.
(2) To provide a precise prediction of newly emerging disease outcomes based on a key clinical understanding, we design an Adaptive Combination Prediction Algorithm (ACPA) for binary responses. This method combines the serious disease outcome predictions from different AI algorithms. This serves as a reliable algorithmic foundation to assist doctors in faster and more accurate assessments of disease occurrence, progression, and outcomes within limited time and medical data. It also supports medical decision making and resource allocation with a flexible AI framework.
(3) To provide graded treatment, with a focus on predicting outcomes and assigning corresponding therapies upon patients' admission, we propose a disease severity grading system based on the adaptive prediction of patients' probability of death. This offers a carefully designed treatment approach that aligns with key features and early diagnosis, potentially improving actual treatment outcomes. This will support the optimization of medical interventions in the event of severe disease outbreaks and minimize the waste of medical resources.
The rest of this paper is organized as follows. Section 3 introduces the adaptive combined feature screening and combined prediction algorithms for emerging diseases. Section 4 presents two cohorts of COVID-19 in Wuhan, along with the relevant data analysis. Section 5 elaborates on the results for both cohorts. Section 6 contains the discussion, and Section 7 addresses future work.

The Combined Feature Screening and Prediction for an Emerging Disease
In this section, we propose a comprehensive framework for screening features and predicting outcomes in the context of an emerging disease. First, leveraging existing clinical data, we design an algorithm that integrates multiple feature screening methods to mitigate the instability across different approaches. Through weighted calculations, we aim to align the combined feature assessment with the inherent patterns in the data. Subsequently, a combined prediction based on the screened variables is used to forecast disease outcomes. We anticipate that this integrated approach will deliver more stable and accurate predictions.
We use the following notation to represent the dataset and model parameters. Let the dataset contain $n$ samples, each sample consisting of $p$ features, represented by $D = \{(X_i, Y_i)\}_{i=1}^{n}$, where $X_i$ represents the feature vector of the $i$-th sample and $Y_i \in \{0, 1\}$ is its binary outcome. Let $f(x_i) = P(Y_i = 1 \mid X = x_i)$ denote the probability of $Y_i$ occurring given the feature vector $X = x_i$. The sample size $n$ could be smaller than the number of features $p$.

Adaptive Combined Importance Measure for Binary Response.
Feature importance is the study of the contribution of each feature to the outcome and the selection of the features considered significant. Random Forest (RF) and XGBoost, as representatives of model-free methods, are widely used in importance learning [45,46]. In recent years, other combined methods based on parametric models have been proposed for feature importance learning, such as Sparsity Oriented Importance Learning (SOIL), which presents feature importance through a weighted linear model [47]. A combined feature learning method based on these three algorithms has been proposed to comprehensively and objectively evaluate the importance of features influencing the continuity of health [48]. However, there has been no such research for binary responses.
We introduce the general form of combined feature screening for binary disease data. The calculation has three steps based on $K$ screening methods. First, calculate the feature importance sequence for each screening method under the binary scenario and normalize the values, denoted as $IM_1, IM_2, \ldots, IM_K$, where $IM_k = (I_{k,1}, \ldots, I_{k,p})$ represents the importance values assigned by the $k$-th method to the $p$ features. Second, weights are calculated based on the features recommended by each algorithm, denoted as $w_1, w_2, \ldots, w_K$. Finally, the adaptive combined importance measure (ACIM) for binary response is

$$\mathrm{ACIM} = \sum_{k=1}^{K} w_k \, IM_k, \qquad (1)$$

where the computation of the weights relies on data-splitting, and the weights are derived by a procedure analogous to that of Algorithm 1 [49].
A larger weight for a method indicates a better model fit. Because the data-splitting is random and repeated, the weights are well placed to distinguish the performance of different methods on different datasets.
Combined feature screening brings several advantages. First, it consolidates information from various feature screening methods, achieving an effect akin to a majority vote. Second, with adaptive weight calculations, the magnitude of the weights reflects which method fits the real data better. This grants more influence to the method with the superior fit, strengthening its say in the vote. Consequently, the features identified by the combined screening more closely approximate the true ranking of feature importance.
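The weighted combination described above can be illustrated with a minimal sketch. It assumes that normalization means dividing each method's scores by their maximum, which the text does not specify, and it uses hypothetical scores and weights; the function names `normalize` and `acim` are illustrative only and are not taken from any package.

```python
def normalize(scores):
    """Scale non-negative scores into [0, 1] by dividing by the maximum.

    Max-scaling is an assumed choice; the paper does not state the rule.
    """
    top = max(scores)
    if top <= 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def acim(importances, weights):
    """Combine K importance vectors (each of length p) into one weighted vector.

    importances: list of K lists with the raw importance of p features.
    weights: list of K non-negative method weights, rescaled to sum to 1.
    """
    if len(importances) != len(weights):
        raise ValueError("need one weight per screening method")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w = [wk / total for wk in weights]
    normalized = [normalize(im) for im in importances]
    p = len(normalized[0])
    return [sum(w[k] * normalized[k][j] for k in range(len(w))) for j in range(p)]


# Hypothetical scores for three features from three screening methods.
raw = [
    [10.0, 5.0, 0.0],  # e.g. a random-forest style score
    [0.8, 0.2, 0.4],   # e.g. a gain-based boosting score
    [1.0, 0.0, 0.5],   # e.g. an importance already in [0, 1]
]
combined = acim(raw, weights=[0.5, 0.3, 0.2])
print(combined)
```

A feature ranked highly by the most heavily weighted method dominates the combined score, while a feature supported by only one low-weight method is pulled down.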

Adaptive Combined Prediction Algorithm for Binary Response.
We provide a computational framework for the adaptive combined prediction algorithm (ACPA) for binary response. Assume there are $M$ methods available to provide probability predictions for the binary disease outcome being 1, denoted as $\hat{f}_1(x), \ldots, \hat{f}_M(x)$. The ACPA prediction is

$$\hat{f}_{\mathrm{ACPA}}(x) = \sum_{m=1}^{M} w_m \, \hat{f}_m(x). \qquad (2)$$

The detailed weighting calculation in (2) is similar to that in Algorithm 1.
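As an illustration of the combined prediction, the following sketch averages the probability predictions of several methods with given weights. The predictions and weights are hypothetical values, and `acpa_predict` is an illustrative name rather than part of the authors' implementation.

```python
def acpa_predict(probabilities, weights):
    """Weighted average of M probability predictions for each of n patients.

    probabilities: list of M lists, each holding n predicted probabilities.
    weights: list of M non-negative method weights, rescaled to sum to 1.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need one weight per prediction method")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w = [wm / total for wm in weights]
    n = len(probabilities[0])
    for preds in probabilities:
        if len(preds) != n or any(not 0.0 <= q <= 1.0 for q in preds):
            raise ValueError("each method must give n probabilities in [0, 1]")
    return [sum(w[m] * probabilities[m][i] for m in range(len(w))) for i in range(n)]


# Hypothetical predicted probabilities of death for two patients.
preds = [
    [0.9, 0.2],  # method 1
    [0.8, 0.1],  # method 2
    [0.6, 0.4],  # method 3
]
combined_prob = acpa_predict(preds, weights=[0.5, 0.3, 0.2])
print(combined_prob)
```

Because the weights are non-negative and sum to one, the combined value always remains a valid probability.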
Combined prediction has several advantages. First, a single prediction method may be biased in predicting disease outcomes, whereas a combination of multiple prediction methods gives more stable results, especially in scenarios where highly accurate disease prediction is needed. Second, the weight calculation assesses the performance of the different prediction methods, so that the best-performing method receives the greatest weight in the combined prediction. When one method significantly outperforms the others, its weight in ACPA is likely to approach 1, so that the ACPA results closely track the performance of the best method; the relevant theory is established in the literature [50]. Conversely, if all methods perform similarly and none is particularly effective, ACPA's results may surpass those of the individual methods.

Cohort 1.
After applying the inclusion criteria, this cohort comprised 711 confirmed COVID-19 patients, of whom 654 were cured and 57 died. Among the 311 clinical examination features, a considerable proportion exhibited significant missing data. We opted to include 62 features with a missing proportion below 40% for further investigation. These selected features encompassed all the aforementioned diagnostic procedures, and detailed feature information is available in Table S1 of the supplementary material. Basic information, such as mortality outcomes, SARS-CoV-2 RNA testing, age, gender, body temperature (°C), and the presence of underlying diseases, was derived from patients' medical records, and none of these variables had missing values.

Cohort 2.
The study involves a substantial analysis of COVID-19 and serves as a focal point for the COVID-19 pandemic [32,51-54]. We collected and compiled data from early consecutive COVID-19 patients admitted to Tongji Hospital in Wuhan, Hubei Province, China, from January 2020 to April 2020. A total of 3286 medical records were extracted from electronic health records. Of these, 63 records had missing data or did not meet the composite endpoint, so 3223 patients were included in this study, as detailed in Table S2 of the supplementary material.
After applying the inclusion criteria, this cohort includes 3223 confirmed COVID-19 patients, of whom 2920 were cured and 303 died. Medical records were extracted from electronic health records by experienced clinicians using a standardized data collection form and were independently reviewed by two researchers. This cohort provides 32 clinical examination features. This study was approved by the Medical Ethical Committee of Tongji Hospital, Tongji Medical College of Huazhong University of Science and Technology. Written informed consent was waived in light of the use of deidentified retrospective data.
All methods were performed in accordance with the relevant guidelines and regulations. The study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline.
The processing and analysis of the two datasets are conducted in parallel, involving data preparation, feature selection, and outcome prediction. The detailed analysis process is shown in Figure 1. [55,56]. For cohort 1, apart from four basic items (gender, underlying diseases, age, and body temperature at admission), the other 126 clinical examination features all have missing data. We selected 58 features with missing proportions of less than 40% from cohort 1 and used Multiple Imputation by Chained Equations (MICE) to impute the missing data, calculated using the R package 'mice'. To ensure data balance, the Synthetic Minority Oversampling Technique for Nominal and Continuous features (SMOTENC) was applied to both cohorts, using Python 3.10 with Imbalanced-learn version 0.10.0.
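The missing-data filter described above can be sketched as follows. This is a minimal illustration using only the Python standard library and hypothetical records; the field names are invented for the example, and the imputation and oversampling steps (performed in the paper with 'mice' and Imbalanced-learn) are not reproduced here.

```python
def missing_proportion(records, feature):
    """Fraction of records in which `feature` is absent or None."""
    missing = sum(1 for r in records if r.get(feature) is None)
    return missing / len(records)


def select_features(records, features, max_missing=0.4):
    """Keep features whose missing proportion is strictly below `max_missing`."""
    return [f for f in features if missing_proportion(records, f) < max_missing]


# Hypothetical records; None marks a test that was not performed.
records = [
    {"age": 61, "il6": 7.2, "urine_ph": None},
    {"age": 45, "il6": None, "urine_ph": None},
    {"age": 70, "il6": 30.1, "urine_ph": 6.0},
    {"age": 38, "il6": 4.4, "urine_ph": None},
    {"age": 52, "il6": 12.9, "urine_ph": 5.5},
]
kept = select_features(records, ["age", "il6", "urine_ph"])
print(kept)
```

Here `urine_ph` is missing in three of five records and is therefore dropped, while `il6`, missing in one of five, is retained for imputation.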

ACIM Calculation for Feature Screening.
We selected RF, XGBoost, and SOIL for ACIM, and the calculations for these three methods use the R packages "randomForest", "xgboost", and "SOIL". The parameter settings for these three methods were primarily based on their defaults, with the exception of setting ntree = 200 to control the model complexity in RF. For XGBoost, we used the "trainControl" function from the "caret" package in R to optimize parameters, setting max_depth = 3 to control model complexity for both cohorts.
Weight calculations were derived from Algorithm 1, where we chose the features with an ACIM value greater than 0.2 in each method for model evaluation. The specific results of the ACIM calculations for the two cohorts are illustrated in Figure 2.

ACPA Calculation for Disease Outcomes.
We used the combined prediction of RF, XGBoost, and Logistic regression based on the important features recommended by ACIM, specifically those with ACIM greater than 0.2. The parameter settings for RF and XGBoost are similar to those used for ACIM. Weight calculations are also similar to those for ACIM.
We randomly split the data for both cohorts into training and testing sets using an 8/2 split ratio. This ensures homogeneity in both the training and testing data, reducing the impact of human selection. Additionally, to obtain stable prediction results and evaluate model performance, we repeated the data-splitting process 100 times. If the estimated probability of death calculated by ACPA exceeds 0.5, the outcome is considered as death; otherwise, it is considered as cured. The results are presented in Table 1.

Algorithm 1:
Input: $D = (X, Y)$, $N$ (number of repetitions); Output: weight of each screening method.
(i) Randomly split the data $D$ into a training set $D_1$ and a test set $D_2$, with sample sizes $n_1 = n_2 = n/2$, respectively.
(ii) For each method, fit an estimator $\hat{f}_k(X_i)$ using the training set $D_1$, where $k = 1, \ldots, K$ indexes the screening methods to be computed.
(iii) For each method, compute the prediction $\hat{f}_k(X_i)$ on the test set $D_2$ using the model trained on $D_1$.
(iv) Using the observations $Y_i$ in $D_2$, compute the weight $w_k^s$ of each method at the $s$-th repetition.
Repeat the above steps $N$ times to get $w_k^s$, $s = 1, \ldots, N$, and then obtain the weight $w_k = \sum_{s=1}^{N} w_k^s / N$.
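The data-splitting weight procedure of Algorithm 1 can be sketched as follows. The per-split weight formula is not recoverable from the text, so this sketch assumes a likelihood-based rule: each method's weight on a split is proportional to the Bernoulli likelihood of the held-out outcomes under its predictions. The toy estimators `fit_base_rate` and `fit_threshold` and the toy data are hypothetical stand-ins for RF, XGBoost, and the other methods.

```python
import math
import random


def split_weights(data, fitters, n_repeats=100, seed=0, eps=1e-12):
    """Average data-splitting weights for K methods.

    data: list of (x, y) pairs with y in {0, 1}.
    fitters: list of K functions; each takes training pairs and returns a
             function x -> estimated P(y = 1 | x).
    """
    rng = random.Random(seed)
    totals = [0.0] * len(fitters)
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        log_liks = []
        for fit in fitters:
            predict = fit(train)
            ll = 0.0
            for x, y in test:
                q = min(max(predict(x), eps), 1.0 - eps)
                ll += math.log(q) if y == 1 else math.log(1.0 - q)
            log_liks.append(ll)
        # Assumed rule: weights proportional to held-out likelihoods,
        # computed stably by subtracting the largest log-likelihood.
        top = max(log_liks)
        unnorm = [math.exp(ll - top) for ll in log_liks]
        norm = sum(unnorm)
        for j, u in enumerate(unnorm):
            totals[j] += u / norm
    return [t / n_repeats for t in totals]


def fit_base_rate(train):
    """Toy method: ignore x and predict the training event rate."""
    rate = sum(y for _, y in train) / len(train)
    return lambda x: rate


def fit_threshold(train):
    """Toy method: high probability when x exceeds the midpoint of class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return fit_base_rate(train)
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: 0.9 if x > cut else 0.1


# Toy data in which the outcome is 1 exactly when x >= 10.
toy = [(x, 1 if x >= 10 else 0) for x in range(20)]
weights = split_weights(toy, [fit_base_rate, fit_threshold], n_repeats=100, seed=1)
print(weights)
```

On this toy data the threshold method explains the held-out outcomes far better than the base-rate method, so it receives most of the averaged weight, which mirrors the statement that a larger weight indicates a better fit.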

Model Evaluation.
We use common machine learning classification metrics to assess the effectiveness of the method. These metrics include Accuracy, Precision, Recall, and F1 score. Furthermore, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are used to demonstrate the model's predictive performance. In addition, we performed a calibration evaluation of our model [57], with the specific calculations provided by the R package 'PresenceAbsence'.
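For reference, the metrics named above can be computed as in the following sketch. It uses only the Python standard library and hypothetical labels and scores; the AUC is computed through its pairwise-ranking interpretation, and the 0.5 cut-off matches the classification rule stated earlier.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = deceased)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities of death.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.6, 0.1, 0.8]
y_pred = [1 if s > 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```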

Result
5.1. Rankings of Feature Importance.
We measure the importance of clinical features for treatment outcomes (Cured and Deceased) in two COVID-19 cohorts. The two feature importance rankings are shown in Figure 2. Three popular feature importance algorithms are adaptively combined to measure the impact of features on the death of a patient, with top-ranked features having a more significant impact and requiring more attention. There are differences in the number of clinical features between the two cohorts, but the two rankings share similarities in the important features. We set the threshold according to the degree of attenuation of the feature importance curve and select the features with importance greater than 0.2 as important features. Figure 2(a) shows the ranking of the importance of 62 clinical features for whether a patient died in cohort 1. We find [...] as test in each calculation). The detailed results are provided in Table 1.
In the results for cohort 1, XGBoost exhibits superior training performance, but its testing performance is inferior to RF, and Logistic shows the poorest results.However, overall, ACPA demonstrates the best comprehensive performance.It closely rivals XGBoost in training and RF in testing, surpassing RF in F1 score and Accuracy.In the results for cohort 2, RF maintains the best performance in testing, and XGBoost continues to excel in training.ACPA leverages the strengths of both, demonstrating stable advantages.
For both cohorts, the results highlight the advantages of ACPA combination.In practical terms, as the true best method or the one most suitable for uncovering the inherent nature of the data is often unknown, if ACPA achieves results comparable to or even slightly surpassing the best method after computation, it indicates the method's versatility, ensuring the quality of computed results in most scenarios.
Furthermore, we evaluated the predictive performance of the models, including model calibration and discrimination, as illustrated in Figure 3. The results of model calibration at a confidence level of 0.05 are shown in Figures 3(a) and 3(b), with five probability bins displayed. The observed and predicted probabilities of the bins lie close to the diagonal line, indicating that the models for both cohorts are well calibrated. In addition, we calculated the discriminative performance of the two models for the two cohorts. Across different threshold choices, the ROC curves show excellent performance, with AUC values of 0.983 and 0.988, respectively.
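The calibration check with five probability bins can be illustrated with the sketch below. It assumes equal-width bins on [0, 1], which may differ from the binning used by the 'PresenceAbsence' package, and it runs on hypothetical predictions.

```python
def calibration_bins(y_true, probs, n_bins=5):
    """Summarize calibration with equal-width probability bins (an assumed scheme).

    Returns one (mean predicted probability, observed event rate, count)
    tuple per non-empty bin, ordered from the lowest bin to the highest.
    """
    bins = [[] for _ in range(n_bins)]
    for y, q in zip(y_true, probs):
        idx = min(int(q * n_bins), n_bins - 1)  # q == 1.0 falls in the last bin
        bins[idx].append((y, q))
    summary = []
    for members in bins:
        if members:
            mean_pred = sum(q for _, q in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            summary.append((mean_pred, observed, len(members)))
    return summary


# Hypothetical outcomes and predicted probabilities of death.
outcomes = [0, 0, 0, 1, 1, 1, 1]
predicted = [0.05, 0.15, 0.3, 0.5, 0.7, 0.9, 0.95]
calibration = calibration_bins(outcomes, predicted)
for mean_pred, observed, count in calibration:
    print(round(mean_pred, 3), round(observed, 3), count)
```

A well-calibrated model yields pairs of mean predicted probability and observed rate that lie close to the diagonal, which is the pattern described for Figures 3(a) and 3(b).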

Severity Grading of COVID-19.
According to the COVID-19 grading system [58], we divided the degree of severity into four groups by the probability of death (POD), namely, Mild (T4): POD < 0.25, Moderate (T3): 0.25 ≤ POD < 0.5, Severe (T2): 0.5 ≤ POD < 0.75, and Critical (T1): POD ≥ 0.75. Figures 4(a) and 4(b) show that the green and yellow parts account for the largest proportion; that is, most patients have a probability of death below 0.25 or 0.5 and are mild or moderate cases, which do not require high treatment costs. In contrast, the proportion of the red part, where the probability of death is greater than 0.75, is small. These patients have a high probability of death without timely assistance and need focused attention. In the orange part, the probability of death is between 0.5 and 0.75, which requires doctors' attention and more resources. In the absence of timely diagnosis and treatment, these patients will be at great risk, whereas with timely diagnosis and treatment, they are likely to recover.
Table 2 shows the patient classification under the original grading and under the new grading with the classification threshold equal to 0.5. Under the original classification, there were significantly fewer mild (T4) and moderate (T3) cases than severe (T2) and critical (T1) cases, which is clearly unreasonable. Compared with the original grading structure, the new grading system for COVID-19 patients established in this paper is more reasonable and scientific, and it can effectively and accurately distinguish mild patients from severe patients. Accurate classification is conducive to the effective use of medical resources and helps establish the most appropriate treatment path for patients in the early stage of admission.
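The grading thresholds above translate directly into a small function. The sketch below is only an illustration of the stated cut-offs; the function name `severity_grade` is hypothetical.

```python
def severity_grade(pod):
    """Map a predicted probability of death (POD) to the four severity grades."""
    if not 0.0 <= pod <= 1.0:
        raise ValueError("POD must lie in [0, 1]")
    if pod < 0.25:
        return "Mild (T4)"
    if pod < 0.5:
        return "Moderate (T3)"
    if pod < 0.75:
        return "Severe (T2)"
    return "Critical (T1)"


# Hypothetical predicted probabilities for four patients.
grades = [severity_grade(q) for q in (0.10, 0.25, 0.60, 0.75)]
print(grades)
```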

Discussion
Though there are studies that assess the risk of progression, prognosis, and mortality of patients with COVID-19, few studies focus on developing a disease grading system based on various characteristics through combined AI methods [59-61]. In this study, we develop and validate a disease grading system for patients with COVID-19 based on the prediction of the probability of death under a combination algorithm, which can be used to identify and predict the prognosis of hospitalized patients on admission. Reliable and feasible early identification of patients is essential for timely triaging in clinical practice, especially when medical resources are under heavy burden. The application of combined AI methods to the diagnosis of COVID-19 can improve diagnostic efficiency and optimize the allocation of medical resources, which is of great significance for curbing the pandemic.
The combined framework we offer includes calculations for three feature screening methods and three prediction methods. Within this framework, additional methods, including deep learning algorithms, can be integrated to enhance the overall effectiveness. Furthermore, the calculations for feature screening and disease prediction can be conducted independently, based on the specific requirements of the task. However, if a substantial number of features are to be used for prediction, it is recommended to perform combined feature selection before prediction. Importantly, our framework does not mandate extensive parameter tuning for each combinable method to seek optimality, although such tuning may well be beneficial. We recommend initially attempting the combination of candidate methods to see whether the desired effects are achieved; otherwise, one can incorporate better methods or optimize existing methods based on the task requirements.
In terms of the risk factors of COVID-19, a total of 23 indicators were chosen as prediction markers, including demographic characteristics (age and gender), blood routine (Lymphocyte, Neutrophil, Eosinophil, Basophil, and Monocyte), coagulation function (PT, Thrombin, and D-Dimer), LDH, cytokine profiles (IL2, IL6, IL8, and IL10), and CRP. These features can be used as elements of clinical tests or early warning systems to optimize the treatment process of COVID-19. In particular, these characteristics have been verified by previous studies. Regarding the severity grading of COVID-19, the current official disease grading is based on symptom observations, which rely on historical and subjective judgment and have a certain lag. In contrast, our grading system is based on the predicted final outcome, which has an early warning effect and can significantly reduce the irreparable outcomes caused by historical judgment bias. Doctors can decide the treatment sequence of patients by predicting the outcome; at the same time, all patients are managed hierarchically.

Future Work
The current study has several limitations. First, our findings might be limited by the quality of the data: the samples for the disease grading system are entirely from Wuhan, China, and more data from other areas of the world may be required to increase generalizability and applicability. Second, the hospitals contributing to our current research cohort tend to admit severe and critical COVID-19 patients. Therefore, this subset of patients may be disproportionately represented in the study, potentially leading to some bias in the grading system. Finally, clinical experiments to validate the practicality of the algorithmic procedures are still pending.
Future research can focus on addressing data issues in more depth. For example, when dealing with a large number of features that require selection, designing penalties for the combination weights can provide feedback on the impact of model complexity. Exploring how to combine methods on imbalanced data and mitigating potential effects of the SMOTE algorithm could be another area of investigation. Additionally, retrospective studies can contribute to the establishment of a comprehensive compendium for COVID-19, providing more guidance for uncertain future pandemics.

Figure 3: The evaluation of the prediction performance. (a) and (b) represent the performances of model calibration of two cohorts. (c) shows the ROC curves and AUC values of the two models. Cohort 1 is in red, and cohort 2 is in green.

Figure 4: Distribution of predicted probability of death in two cohorts.
4.1. Data Source.
In this retrospective study, we collected clinical records from two groups of COVID-19 patients admitted early and with prolonged hospital stays. Both datasets comprise extensive clinical examinations conducted upon patient admission, categorizing patients into four different disease severity levels for treatment (Mild (T4), Moderate (T3), Severe (T2), and Critical (T1)) based on the Diagnosis and Treatment Protocol for COVID-19 issued by the National Health Commission of China (Trial Version 5), and the composite endpoint was discharge from the hospital or death (cured or deceased).
These data related to early stage COVID-19 are presented on a public website (https://ngdc.cncb.ac.cn/ictcf/HUST-19.php). The dataset enrolled 1,126 patients from Union Hospital (HUST-UH) and 395 patients from Liyuan Hospital (HUST-LH) in Wuhan, Hubei Province, China, during the period from January 2020 to February 2020. These data encompass rich clinical features of early confirmed COVID-19 cases. Among these patients, 130 clinical tests spanning nine categories were conducted, including basic information, routine blood tests, inflammation tests, blood coagulation tests, biochemical tests, immune cell typing, cytokine profile tests, autoimmune tests, and routine urine tests.

Table 1: Prediction performances of two cohorts under four evaluations. Bold font is for train and italic font is for test. The standard deviation of the results is in parentheses.

Table 2: Comparison between the original grading and the new grading of COVID-19 for two cohorts.