Predicting Liver Disease Risk Using a Combination of Common Clinical Markers: A Screening Model from Routine Health Check-Up

Background Early detection is crucial for the prognosis of patients with autoimmune liver disease (AILD). Due to the relatively low incidence, developing screening tools for AILD remains a challenge. Aims To analyze clinical characteristics of AILD patients at initial presentation and identify clinical markers that could be useful for disease screening and early detection. Methods We performed an observational retrospective study and analyzed 581 AILD patients who were hospitalized in the gastroenterology department and 1000 healthy controls recruited from the health management center. Baseline characteristics at initial presentation were used to build regression models. The model was validated on an independent cohort of 56 patients with AILD and 100 patients with other liver disorders. Results The proportion of asymptomatic AILD individuals identified by health check-up increased over the study period (from 31.6% to 68.0%, p < 0.001). The rate of cirrhosis at initial presentation decreased over the past 18 years (from 52.6% to 20.0%, p < 0.001). Eight indicators, which are common in the health check-up, are independent risk factors of AILD. Among them, a positive finding of abdominal lymph node enlargement (LN) shows the strongest association (OR 8.85, 95% CI 2.73-28.69, p < 0.001). The combination of these indicators shows high predictive power (AUC = 0.98, sensitivity 89.0% and specificity 96.4%) for disease screening. After excluding the two markers of liver or biliary injury, the combination of AGE, GENDER, GLB, LN, concomitant extrahepatic autoimmune diseases, and familial history also shows a high predictive power for distinguishing AILD from other liver disorders (AUC = 0.91). Conclusion Screening with the described parameters can detect AILD in routine health check-up early, effectively, and economically. Eight variables in routine health check-up are associated with AILD, and their combination shows good ability to identify high-risk individuals.


Introduction
Autoimmune liver disease (AILD) is the second most common cause of chronic liver disease in teenagers. It takes several forms, including autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), primary sclerosing cholangitis (PSC), and the PBC-AIH and PSC-AIH overlap syndromes (OS), which share immunological characteristics and are diagnosed based on immunological markers and histology [1][2][3][4]. AILD differs significantly in presentation and course depending on the patient's age at manifestation. Previous studies demonstrated that more than one-third of AILD patients had liver cirrhosis at the initial presentation, with the rate being even higher in PBC-AIH OS [5][6][7][8]. Therefore, it is necessary to develop a simple and reliable method for early identification of patients at high risk for AILD, and to help clinicians identify potential AILD patients cost-effectively in primary and secondary healthcare systems.
It has been reported that AILD patients with cirrhosis at initial presentation have a substantially lower 10-year survival rate than patients without cirrhosis (61.9% vs. 94.0%) [9]. The prognosis and survival time of AILD patients largely depend on the development of liver cirrhosis and complications [10,11]. Establishing practical methods for identifying high-risk individuals of AILD prior to the development of cirrhosis is crucial for improving the prognosis of patients with AILD.
In our previous study, we observed that abnormalities of several markers from routine health check-up, including serum biochemistry tests, family history of autoimmune diseases, and abdominal lymph node enlargement (LN) [12], might be helpful for predicting individuals at high risk. Other studies demonstrated that serum γ-globulins and abnormal LN ultrasound results were associated with AILD [13,14]. Moreover, about 20-50% of AILD patients have a history of other autoimmune diseases [15,16], and 10-40% of first-degree relatives of patients have autoimmune disorders [17,18]. Nevertheless, common clinical variables have not been evaluated as a primary screening tool in clinical practice. With the aim of detecting AILD risk from routine health check-up, we analyzed the clinical characteristics at initial presentation and selected indicators that are available in health check-up. Using these common indicators, we built computational models for the prediction of AILD risk at the early clinical stage.

Study Design and Participants.
This study was a retrospective long-term cohort study of 602 patients admitted to a single center from January 2001 to December 2017, including 173 patients with AIH, 330 with PBC, 78 with PBC-AIH OS, 19 with PSC, and 2 with IgG4-related liver disease. Informed written consent was obtained from all the study participants. All admitted patients fulfilled the diagnostic criteria of AILD, as defined by the diagnostic criteria for AIH (v.1999), PBC (v.2009), and the "Paris Criteria" (v.1998) (Supplementary Materials-Participants). Additionally, 1000 individuals from the health management center were included as a healthy control group.
We recruited a cohort of individuals with abnormal liver function tests (LFTs) as the validation cohort, comprising 56 AILD patients and 100 non-AILD liver disease cases with abnormal LFTs, including viral hepatitis, alcoholic liver disease, drug-induced liver injury, nonalcoholic fatty liver disease, and liver injury of unknown etiology (Supplementary Table 1 Table 2). The data derived from the medical records of patients with AILD and healthy controls included age, gender, serum biochemical parameters (TP, ALB, GLB, ALT, AST, ALP, GGT, TBIL, and DBIL), LN, concomitant extrahepatic autoimmune diseases (CEAID), and familial history of autoimmune disease (FA). FA and CEAID were recorded via telephone follow-up interview, cirrhosis was defined by CT imaging or liver biopsy, and LN was diagnosed by abdominal ultrasound [19]. FA was defined as at least one first-degree relative with at least one autoimmune disease, including AILD, autoimmune thyroid disease, Sjögren's syndrome, and rheumatoid arthritis. CEAID was defined as a diagnosis of both AILD and an extrahepatic autoimmune disease; details are shown in Supplementary Table 3. To identify LN, the following criteria according to Soresi et al. were applied: one or more masses with an ovoid shape and less echogenic than the liver parenchyma, separated from adjacent organs and vessels by a clear-cut cleavage on repeated transverse, sagittal, and oblique scans [19]. Investigation sites included the area of the trunk of the portal vein, hepatic artery, celiac axis, superior mesenteric vein, and pancreas head. The ultrasound was performed by the same digestive specialist operator, who was unaware of the clinical, biochemical, and histologic data. The study protocol adhered to the Declaration of Helsinki and was approved by the Institutional Ethics Committee of Tianjin Medical University General Hospital.

Predictor Variables Selection.
In order to select the AILD-associated variables for further analysis, we performed correlation analysis among the 14 indicators in the AILD cohort and retained the noncorrelated variables (age, gender, GLB, ALT, GGT, LN, CEAID, and FA) for further analysis (Supplementary Table 4). We tested these variables for potential batch effects caused by the year of initial diagnosis. Univariate logistic regression analysis was used to confirm the association between each variable and AILD; all 8 variables were found to be significant and were selected for the construction of AILD-risk models (Supplementary Materials-Choosing Variables). A comparison of variables between AILD patients and healthy controls is shown in Table 1.
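The correlation-based filtering described above can be illustrated with a minimal sketch. It assumes Spearman rank correlation, a cutoff of 0.7, and a greedy keep-in-priority-order rule; none of these details are stated in the text, and the values below are toy numbers, not study data. The function names `ranks`, `spearman`, and `drop_correlated` are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(data, priority, threshold=0.7):
    """Keep each variable (in priority order) unless it is strongly
    correlated with a variable that has already been kept.
    The threshold of 0.7 is an assumption for illustration."""
    kept = []
    for name in priority:
        if all(abs(spearman(data[name], data[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values (hypothetical): AST tracks ALT closely, GLB does not.
toy = {
    "ALT": [20, 35, 50, 80, 120, 200],
    "AST": [22, 30, 55, 70, 130, 180],
    "GLB": [30, 28, 36, 27, 33, 29],
}
selected = drop_correlated(toy, ["ALT", "AST", "GLB"])
```

With this toy input, AST is dropped because it is perfectly rank-correlated with ALT, while GLB is retained. The priority order decides which member of a correlated pair survives.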

Construction and Model Validation.
After filtering out records with incomplete data, we included 438 patients with AILD and 782 controls for model construction. All patients and controls were randomly split into a training group (75% of data) and a test group (the remaining 25% of data). Models were trained using logistic regression and classification and regression trees (CART), with optimization performed by 3 repeats of 10-fold cross-validation on the training set. Model convergence and training were assessed using learning curves (Supplementary Figure 2). After establishing the first logistic regression model (Model 1) with 8 covariates, the two markers of liver and biliary injury (ALT and GGT) were subsequently excluded to better separate AILD patients from other cases with abnormal LFTs. We trained a logistic regression model (Model 2) and a CART model with the remaining six variables (AGE, GEN, GLB, LN, CEAID, and FA). Details of the parameters of the CART model are provided in Supplementary Materials-Classification and Regression Tree [20]. The predictive power of the models was calculated in the test group and the external validation group (56 cases with AILD and 100 controls with abnormal LFTs), and was evaluated by receiver operating characteristic (ROC) analysis, area under the curve (AUC), accuracy, sensitivity, and specificity.
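As an illustration of the evaluation measures named above, the following sketch computes sensitivity, specificity, accuracy, and AUC from labels and predicted scores. It is not the authors' code (the study used R with the caret package); the function names, the 0.5 decision threshold, and the toy scores are assumptions for illustration only.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen case receives a
    higher score than a randomly chosen control (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise definition of AUC used here is quadratic in the sample size, which is acceptable for cohorts of the size described but would be replaced by a rank-based formula for large data sets.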

Statistical Analysis.
We reported frequency (percentage) for categorical variables and median (range) for continuous variables. We used the Chi-squared test and the Mann-Whitney U test for comparisons of categorical and continuous variables, respectively. Further details of the statistical methods are described in Supplementary Materials-Descriptive analyses.
Correlation analyses and univariate logistic analyses were performed with SPSS (version 23.0, IBM, USA). Establishment and validation of the multivariate logistic regression model and the CART model were performed in R (version 3.4.3), using the caret package [21,22]. Statistical tests were considered significant at p < 0.05.
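For readers who want to see what the two comparison tests compute, here is a minimal standard-library sketch. It assumes a 2x2 table with no continuity correction for the Chi-squared test, and a normal approximation without tie correction for the Mann-Whitney U test; the study itself used SPSS, whose exact settings are not given in the text. The function names and numbers are illustrative assumptions.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson Chi-squared test for the 2x2 table [[a, b], [c, d]].
    Returns (statistic, p-value) with 1 degree of freedom and no
    continuity correction (an assumption of this sketch)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x versus y with a two-sided p-value
    from the normal approximation (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Toy inputs (hypothetical, not study data).
stat, p_cat = chi2_2x2(30, 10, 10, 30)
u, p_cont = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

For very small samples such as the second toy example, an exact test would normally be preferred over the normal approximation; the approximation is shown only to keep the sketch short.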

Changing of Detection Ways and Cirrhosis Rate at Diagnosis in AILD.
AILD patients were classified into two groups according to the reason for admission: the health check-up group comprised patients with abnormal LFTs or incidental findings detected in health check-up, and the symptomatic group comprised patients who had clinical symptoms, such as jaundice, gastrointestinal bleeding, and abdominal pain.
We found that the proportion of patients in the health check-up group increased from 31.6% before the year 2006 to 68.0% in the year 2017. This increase is statistically significant over the last 18 years (χ² = 44.32, p < 0.001, Figure 1) and demonstrates that regular health check-up has become the key method to identify the individuals at high risk for AILD.

Risk Factors of AILD in the Health Check-Up.
Compared with healthy controls, 14 parameters measured during the routine health check-up were significantly associated with AILD (Table 1). After pairwise correlation analysis, we excluded 6 parameters (TP, ALB, AST, ALP, TBIL, and DBIL) that were strongly correlated with others. Consequently, age, gender, GLB, ALT, GGT, LN, CEAID, and FA were treated as independent variables and used to construct prediction models. These eight variables were also found to be associated with AILD in univariate analysis (Table 2). Among them, a positive finding of abdominal lymph node enlargement showed the strongest association (OR 19.46, 95% CI 10.91-34.69, p < 0.001). We constructed a logistic regression model without the variables ALT and GGT, designed to separate AILD cases from patients with other hepatic or biliary diseases (Model 2, Table 3). This model showed high performance in the cross-validation set (Kappa = 0.75, accuracy = 0.89, Supplementary Figure 2B) and the test set (AUC of 0.94; 95% CI 0.92-0.96, sensitivity of 79.8%, specificity of 93.3%, and accuracy of 88.5%, Figures 3(b) and 3(c)). A positive finding of abdominal lymph node enlargement (OR 17.24, 95% CI 7.18-41.41) was again the most influential variable (Table 3).
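The relationship between a logistic regression coefficient and the reported odds ratios can be sketched as follows. The sketch assumes Wald-type intervals, that is, OR = exp(β) with 95% CI exp(β ± 1.96·SE); the text does not state how the intervals were computed, although the reported univariate interval for LN (OR 19.46, 95% CI 10.91-34.69) is symmetric on the log scale, which is consistent with that assumption. The function names are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic
    regression coefficient (log-odds) and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_or(odds_ratio, lower, upper, z=1.96):
    """Recover the coefficient and standard error implied by a reported
    odds ratio and Wald interval (assumes the interval is Wald-type)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported univariate result for abdominal lymph node enlargement.
beta, se = beta_se_from_or(19.46, 10.91, 34.69)

# Round trip: the implied coefficient reproduces the reported interval
# up to rounding in the published figures.
or_hat, lo_hat, hi_hat = odds_ratio_ci(beta, se)
```

Under these assumptions the implied coefficient is roughly 2.97 on the log-odds scale with a standard error of roughly 0.30.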
Next, we tested these two models in a newly collected cohort of 56 AILD patients and 100 individuals with other liver diseases. Here, the model without liver biomarkers (Model 2) showed higher performance (AUC 0.97, 95% CI 0.96 to 0.98) when compared to Model 1 (AUC 0.94, 95% CI 0.92 to 0.96). The exclusion of the two liver biomarkers, which are not specific for AILD, increased both the sensitivity and specificity of AILD prediction (87.5% and 95.0%; Figure 4).

Decision Tree Model Simplifies Prediction of AILD with Health Check-Up Predictors.
In order to find the best combination of predictors and their exact cutoff values, as well as establish a visualization prediction model, we fitted a CART model with six variables used for the training of Model 2. The fitted decision tree is shown in Figure 5(a), and the results of the evaluation on the external validation set are shown in Figure 5(b). The model demonstrated good predictive power for the identification of AILD cases (AUC, 0.91, 95% CI 0.89-0.93, sensitivity of 85.7%, specificity of 92.0%). Consistent with the logistic regression model, elevated GLB (≥34 g/L) was the most important discriminating factor between high and low-risk AILD, while increased age (>45 years), familial history of autoimmune disease and positive ultrasound finding of abdominal lymph node enlargement were also found to be important risk factors for AILD (Figure 5(a)).
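To show how a CART model arrives at a cutoff value for a continuous predictor, the sketch below searches for the threshold that minimizes the weighted Gini impurity of a single split. This is the generic splitting criterion for classification trees, not the authors' fitted model; the function names are hypothetical, and the GLB values are toy numbers constructed so that the best split falls at 34, purely for illustration.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = case, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) for the single split
    'value >= threshold' that minimizes the weighted Gini impurity.
    Candidate thresholds are midpoints between adjacent distinct values."""
    best_t, best_score = None, float("inf")
    candidates = sorted(set(values))
    for low, high in zip(candidates, candidates[1:]):
        t = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v < t]
        right = [y for v, y in zip(values, labels) if v >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy GLB values in g/L (hypothetical, not study data).
glb = [25, 27, 29, 31, 33, 35, 37, 40]
is_case = [0, 0, 0, 0, 0, 1, 1, 1]
cutoff, impurity = best_threshold(glb, is_case)
```

A full tree repeats this search over every predictor at every node and then recurses into the two resulting subsets, usually with pruning or a complexity penalty to limit depth.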

Discussion
AILD is often asymptomatic at the early stage. Approximately 30% of patients have already developed cirrhosis by the time the disease is diagnosed, and such patients have a poor prognosis (e.g., lower survival rates). However, if patients with AILD can be identified and diagnosed prior to the onset of cirrhosis, treatment with immunosuppressive agents could significantly improve the survival rates (from 62% to 94%) [10,23]. While the management of AILD is crucial, the early identification of the disease remains a challenge; currently no screening methods are available for identifying individuals at risk of AILD [2]. To the best of our knowledge, this is the first study to identify predictors measured in routine health check-up for the early detection of AILD.
In this study, we found that the proportion of cirrhosis in AILD patients gradually decreased over the past 20-year period (Figure 2). This is potentially because the increase in regular health check-up attendance allowed the identification of AILD patients who had no clinical symptoms but presented abnormal LFTs in the health check-up and were referred to a hepatologist for further diagnostic tests. This is in line with a study that investigated diagnostic rates of autoimmune hepatitis in Singapore, which concluded that the lack of awareness among primary health care professionals and the public led to delayed diagnosis and therapy of AIH [24]. Our study further suggests that regular health check-up may help improve early detection of individuals at high risk for AILD.
We found that 14 parameters measured in routine health check-up might contribute to the prediction models for AILD (Table 1). Of these, we chose 8 uncorrelated factors (AGE, GENDER, GLB, ALT, GGT, LN, CEAID, and FA) to build predictive models for identifying high-risk AILD patients. Among the transaminases and biliary enzymes, AST and ALT are highly correlated, as are ALP and GGT. Previous research showed that ALT and GGT are more "early" and "sensitive" indicators, which are more suitable for early screening than AST and ALP [25,26]. Therefore, we chose ALT and GGT as the representatives to enter the model (Supplementary Table 4). Using these variables, we developed two prediction models for the identification of high AILD risk: Model 1 is intended to be used in general health check-up for the identification of AILD risk with clinical variables, whereas Model 2 excludes LFTs to enable estimation of AILD risk in individuals with abnormal LFTs, with the aim of distinguishing AILD from other liver diseases. While detection of abnormal LFTs in health check-up has the potential to identify AILD, it is not a specific marker because LFTs are elevated in different liver diseases [27,28]. Thus, we used other parameters measured in the health check-up to design a model for the specific identification of AILD.
Model 1 is built for general health check-up to identify individuals at high risk of AILD. Combining the above clinical variables, we found high predictive power in the internal cross-validation (sensitivity 89.0%, specificity 96.4%, Figure 3). Model 2 showed higher specificity and higher sensitivity when tested using the validation cohort of patients with AILD and other liver diseases (Figure 4). This implies that Model 2, without LFTs, is better suited to distinguishing AILD from other liver disorders that manifest with abnormal LFTs. It is known that a family history of AILD and a history of other autoimmune diseases are risk factors for this disease [29], that AILD is found mainly in middle-aged women, and that serum γ-globulins and abnormal LN are associated with autoimmune hepatitis [2]. Among them, enlarged abdominal lymph nodes are a typical ultrasound feature, which is consistent with our results [30].
To demonstrate the possible implementation of our model in clinical practice, we constructed a decision-tree-based schematic for identification of AILD risk (CART). This allowed us to quantify the cutoffs for selected variables and to assess the risk for subgroups (Figure 5). For example, the model predicts that individuals with GLB ≥ 34 g/L, older than 45 years, and with a family history of AILD are at a very high risk of AILD (risk > 90%) and should undergo further clinical tests for AILD diagnosis. While AILD is a female-dominant disease, gender was not identified as a critical variable in our decision tree model (Figure 5(a)), possibly because it is mildly correlated with GLB in our data (Spearman correlation 0.33). In clinical practice, when an individual with abnormal LFTs is judged to be "high risk", immunological testing or liver biopsy is necessary to confirm the diagnosis of AILD, and virology, blood lipid, B-ultrasound, and other tests are also necessary to characterize specific liver damage [31].
Since AILD is a rare disease (prevalence of 1-2 per 100,000 worldwide [32]), our models were, by necessity, designed using relatively small samples and an unbalanced ratio of cases and controls. Furthermore, while our model did show high performance in the external validation cohort, it might require further validation in cohorts from other medical centers. Finally, the predictive model was designed to supplement, rather than replace, the physician's clinical judgment and existing diagnostic criteria.
In summary, we demonstrate that models trained using limited sociodemographic and clinical parameters measured during a routine health check-up enable reliable identification of individuals at high risk for AILD. This approach could be implemented in both primary and secondary health-care settings to facilitate identification of noncirrhotic AILD patients at the early stage, and thus help improve the prognosis of patients with AILD.