Consistency Test between Scoring Systems for Predicting Outcomes of Chronic Myeloid Leukemia in a Saudi Population Treated with Imatinib

Inconsistency in prognostic scores occurs when two different risk categories are assigned to the same chronic myeloid leukemia (CML) patient. This study evaluated common scoring systems for identifying risk groups based on patients' molecular responses, in order to select the best prognostic score when conflicting prognoses are obtained from a patient's profile. We analyzed 104 patients diagnosed with CML and treated at King Abdulaziz Medical City, Saudi Arabia, who were monitored for major molecular response (a BCR-ABL1 transcript level of 0.1% or less) by real-time quantitative polymerase chain reaction (RQ-PCR). Their risk profiles were determined using the Sokal, Hasford, EUTOS, and ELTS scores, based on the patients' clinical and hematological parameters at diagnosis. In patients with conflicting risk categories, the Hasford score outperformed the other scores, with an accuracy of 63%.


Introduction
The Australian Institute of Health and Welfare (AIHW) ranked myeloid cancers as the ninth most commonly diagnosed cancer in Australia in 2016, with more than 3,600 cases [1]. Chronic myeloid leukemia (CML), also known as chronic myelogenous leukemia or chronic granulocytic leukemia, is a myeloid cancer in which the bone marrow produces an abnormal, excessive number of white blood cells, many of them immature, leading to progressive disease. As a result, the bone marrow cannot make enough red cells, normal white cells, and platelets [2].
Prognostic scores are used to stratify CML patients by risk profile so that they receive appropriate treatment. The science of prognostication has evolved rapidly, and various scoring systems have been developed to make better use of clinical experience in CML treatment. These scores were developed using logistic regression on the patients' clinical and hematological parameters at diagnosis. The common prognostic scores have shown variable correlation with complete cytogenetic response (CCyR) [3][4][5][6][7][8] and major molecular response (MMR) [9][10][11][12]. Although previous investigations have compared the prognostic value of the validated scoring systems for overall survival (OS), event-free survival (EFS), or optimal response in CML patients receiving frontline imatinib, no previous study has applied the established prognostic scores in a comparative fashion to question their value with regard to inconsistency in risk category.
The current European LeukemiaNet (ELN) recommendations for the management of CML are primarily directed at achieving at least an MMR [13]. Because newly diagnosed CML patients should be stratified using the available prognostic scoring systems, we considered that risk groups could be studied on the basis of MMR outcomes. This is needed to evaluate the clinical impact of the existing prognostic scores by comparing prognostic risk groups, with a primary focus on the consistency of the scores' outcomes. Inconsistency occurs when two different risk categories are assigned to the same CML patient; that is, one prognostic score places the patient in one group and another score contradicts that classification. Consistency among the prognostic scores used to estimate a CML patient's risk group before therapy begins can increase clinicians' trust in treatment decisions and plays an important role as CML treatment modalities change [14,15]. However, conflicts between prognostic scores are observed in some CML patients. It is therefore important to study the consistency of the risk categories to which prognostic scores allocate CML patients, in order to support clinical decision-making. Our analysis compared the outcomes of the different scores with the long-term molecular response of patients treated with imatinib, to determine which score is best to apply when the prognostic scores generate conflicting prognoses.

Study Population.
Participants in this study were Saudi patients diagnosed with CML and treated at King Abdulaziz Medical City, Jeddah [16]. A total of 104 CML patients received 400 mg imatinib as initial therapy. Patient characteristics are described in Table 1. MMR was monitored in all patients at the time points defined by the ELN [13], with MMR defined as a BCR-ABL1 transcript level of 0.1% or less at 12 months, measured by RQ-PCR.

Scoring Systems in CML.
Four common prognostic scoring systems are available for CML patients before therapy begins: (1) the Sokal score [17], (2) the Hasford score [14], (3) the European Treatment and Outcome Study (EUTOS) score [15], and (4) the EUTOS long-term survival (ELTS) score [18]. Each score assigns a patient to a risk level using a formula derived from multivariable regression analysis. The scores were calculated with the formulas in Table 2, from the patients' clinical and hematological parameters at diagnosis.
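To make the calculation concrete, the following is a minimal Python sketch of the four scores, assuming the coefficients and cut-offs as commonly reported in the original Sokal, Hasford, EUTOS, and ELTS publications; it assumes these match the formulas in Table 2, which should be treated as authoritative. Inputs are assumed to be age in years, spleen size in cm below the costal margin, platelets in 10^9/L, and blasts, eosinophils, and basophils as percentages in peripheral blood. The function names (`sokal`, `hasford`, `eutos`, `elts`, `classify_all`) are hypothetical.

```python
import math

# Sketch only: coefficients and cut-offs as commonly reported for each score;
# assumed to match Table 2. Units: age (years), spleen (cm below costal
# margin), platelets (10^9/L), blasts/eosinophils/basophils (% in blood).


def sokal(age, spleen_cm, platelets, blasts_pct):
    """Sokal relative risk: <0.8 low, 0.8-1.2 intermediate, >1.2 high."""
    score = math.exp(0.0116 * (age - 43.4)
                     + 0.0345 * (spleen_cm - 7.51)
                     + 0.188 * ((platelets / 700) ** 2 - 0.563)
                     + 0.0887 * (blasts_pct - 2.10))
    if score < 0.8:
        return score, "low"
    return score, ("intermediate" if score <= 1.2 else "high")


def hasford(age, spleen_cm, platelets, blasts_pct, eos_pct, baso_pct):
    """Hasford score: <=780 low, >780 to <=1480 intermediate, >1480 high."""
    score = 1000 * (0.6666 * int(age >= 50)
                    + 0.0420 * spleen_cm
                    + 0.0584 * blasts_pct
                    + 0.0413 * eos_pct
                    + 0.2039 * int(baso_pct >= 3)
                    + 1.0956 * int(platelets >= 1500))
    if score <= 780:
        return score, "low"
    return score, ("intermediate" if score <= 1480 else "high")


def eutos(spleen_cm, baso_pct):
    """EUTOS score: two categories only, >87 high, otherwise low."""
    score = 7 * baso_pct + 4 * spleen_cm
    return score, ("high" if score > 87 else "low")


def elts(age, spleen_cm, platelets, blasts_pct):
    """ELTS: <=1.5680 low, >1.5680 to <=2.2185 intermediate, >2.2185 high."""
    score = (0.0025 * (age / 10) ** 3
             + 0.0615 * spleen_cm
             + 0.1052 * blasts_pct
             + 0.4104 * (platelets / 1000) ** -0.5)
    if score <= 1.5680:
        return score, "low"
    return score, ("intermediate" if score <= 2.2185 else "high")


def classify_all(age, spleen_cm, platelets, blasts_pct, eos_pct, baso_pct):
    """Risk category assigned to one patient by each of the four scores."""
    return {
        "Sokal": sokal(age, spleen_cm, platelets, blasts_pct)[1],
        "Hasford": hasford(age, spleen_cm, platelets, blasts_pct,
                           eos_pct, baso_pct)[1],
        "EUTOS": eutos(spleen_cm, baso_pct)[1],
        "ELTS": elts(age, spleen_cm, platelets, blasts_pct)[1],
    }
```

For a hypothetical patient aged 40 with no palpable spleen, platelets of 300, 1% blasts, 2% eosinophils, and 2% basophils, `classify_all(40, 0, 300, 1, 2, 2)` returns "low" for all four scores. Note that only EUTOS has two categories, which is why the combined groups described next are needed.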
The analysis was conducted in two steps: (1) evaluating the prognostic index using combined groups and (2) analyzing the consistency between the risk categories obtained from the scoring systems. In the first step, Table 2 shows that EUTOS is the only score that classifies CML patients into just two categories, low risk and high risk; the Sokal, Hasford, and ELTS scores each have three categories. To compare all four scores on the same two-category basis, accuracy was measured under two alternative combined groupings: (1) low and intermediate risk in the Sokal, Hasford, and ELTS scores merged as low risk, and (2) intermediate and high risk in the Sokal, Hasford, and ELTS scores merged as high risk.
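The two combined groupings and the accuracy measure can be sketched in a few lines of Python. This sketch assumes, following the results reported later in this paper, that a prediction counts as correct when a patient classified as low risk achieved MMR or a patient classified as high risk did not. The function names and the toy data are hypothetical and are not taken from the study.

```python
def dichotomize(category, method):
    """Collapse a three-level category into low/high.

    Method 1 merges "intermediate" into "low"; method 2 merges it into
    "high". EUTOS categories ("low"/"high") pass through unchanged.
    """
    if category not in ("low", "intermediate", "high"):
        raise ValueError(f"unknown category: {category}")
    if category != "intermediate":
        return category
    return "low" if method == 1 else "high"


def is_correct(binary_risk, achieved_mmr):
    # Assumed rule: low risk should achieve MMR, high risk should not.
    return (binary_risk == "low") == achieved_mmr


def accuracy(categories, mmr, method):
    """Fraction of patients whose collapsed category matches their MMR outcome."""
    hits = sum(is_correct(dichotomize(c, method), m)
               for c, m in zip(categories, mmr))
    return hits / len(categories)


# Hypothetical toy data (not study data): one score's categories and MMR outcomes.
cats = ["low", "intermediate", "high", "intermediate", "low"]
mmr = [True, True, False, True, False]
print(accuracy(cats, mmr, method=1))  # 0.8
print(accuracy(cats, mmr, method=2))  # 0.4
```

Running the same accuracy calculation under both methods for every score is what allows the better-performing grouping to be carried forward into the consistency analysis.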
In the second step, the consistency analysis, the combined grouping that gave the higher accuracy in the first step was used to study inconsistency between scoring systems. Here four models advise on the same patient, and each score may provide an index that conflicts with the others. Patients were classified into a consistency group or an inconsistency group: the consistency group included patients whose risk categories agreed across all scoring systems, while the inconsistency group included patients who received conflicting risk categories. The number of possible combinations of risk categories equals the number of risk categories raised to the power of the number of scoring systems. The number of patients in each molecular response group was then used to calculate accuracy and to determine which scoring system is the most accurate in the conflict group.
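The consistency analysis can be illustrated with a short Python sketch, assuming the binary (low/high) categories produced by the chosen combined grouping. It enumerates every possible row of the consistency table (two categories raised to the power of four scores), separates consistent from conflicting patients, and computes each score's accuracy within the conflict group using the same correctness rule as above. All names and the three toy patients are hypothetical.

```python
from collections import Counter
from itertools import product

SCORES = ("Sokal", "Hasford", "EUTOS", "ELTS")
LEVELS = ("low", "high")  # binary categories after the combined grouping

# Every possible row of the consistency table: 2 levels ** 4 scores = 16.
PATTERNS = list(product(LEVELS, repeat=len(SCORES)))


def is_consistent(risks):
    """True when all scores place the patient in the same risk category."""
    return len(set(risks)) == 1


def tabulate(patients):
    """Count patients per (risk pattern, achieved MMR) row."""
    return Counter((tuple(r[s] for s in SCORES), mmr) for r, mmr in patients)


def conflict_accuracy(patients):
    """Per-score accuracy restricted to the inconsistency (conflict) group."""
    conflict = [(r, mmr) for r, mmr in patients
                if not is_consistent(r.values())]
    if not conflict:
        return {}
    return {s: sum((r[s] == "low") == mmr for r, mmr in conflict) / len(conflict)
            for s in SCORES}


# Hypothetical toy patients: (risk category per score, achieved MMR).
patients = [
    ({"Sokal": "low", "Hasford": "low", "EUTOS": "low", "ELTS": "low"}, True),
    ({"Sokal": "high", "Hasford": "low", "EUTOS": "low", "ELTS": "high"}, True),
    ({"Sokal": "high", "Hasford": "low", "EUTOS": "high", "ELTS": "high"}, False),
]
print(len(PATTERNS))                # 16
print(conflict_accuracy(patients))  # per-score accuracy in the conflict group
```

Only two of the sixteen patterns (all low or all high) are consistent; the remaining fourteen make up the conflict group.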

Results and Discussion
This section presents our analysis of how well each scoring system distinguishes between patients. We evaluated the CML scoring systems for identifying risk categories based on patients' molecular responses, to determine which prognostic score is best to apply when the prognostic scores generate conflicting prognoses.
Of the 104 CML patients included in this study, 9 were excluded because of incomplete MMR data. Of the remaining 95 patients, 33 (34.7%) did not achieve MMR and 62 (65.3%) did. Table 3 shows the number of CML patients per prognostic score under each of the two combined methods.
It is clearly observed that the combined method of low and intermediate risk in Sokal [...] EUTOS to predict optimal response [12]. Therefore, we used the first combined method in the consistency analysis. With two categories for each of four scores, Table 4 has sixteen rows (2^4 = 16). The consensus group comprised 65 (68.42%) patients, and the conflict group 30 (31.58%). To identify the most appropriate prognostic score to use when prognostic scores conflict, we compared the number of patients in each group. Table 4 shows that, in the consensus group, all four prognostic scores incorrectly predicted the CML risk group in 21% of cases (20 of 95 patients: 19 did not achieve MMR although all scores classified them as low risk, and 1 achieved MMR although all scores classified this patient as high risk). In the conflict group, the Sokal and ELTS scores predicted MMR accurately in 46.67% (14 of 30) of patients, and the EUTOS score in 50% (15 of 30). The highest accuracy, 63.33% (19 of 30), was obtained by the Hasford score. However, the accuracy of the Hasford score across both groups (consensus and conflict) was the lowest of the four scores (58.95%, compared with 62.11% for Sokal, 63.16% for EUTOS, and 62.11% for ELTS).
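The percentages in this paragraph use two different denominators: the 95 patients with complete data for the consensus/conflict split and the 21% misclassification figure, and the 30 conflict-group patients for the per-score accuracies. The short Python check below, which uses only counts stated in this section, reproduces the reported values and makes the denominators explicit; the helper name `pct` is hypothetical.

```python
def pct(count, total):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * count / total, 2)


N_TOTAL, N_CONFLICT = 95, 30  # patients with complete data; conflict group

reported = {
    "consensus group share": pct(65, N_TOTAL),             # 68.42
    "conflict group share": pct(N_CONFLICT, N_TOTAL),      # 31.58
    "misclassified by all scores": pct(19 + 1, N_TOTAL),   # 21.05
    "Sokal/ELTS in conflict group": pct(14, N_CONFLICT),   # 46.67
    "EUTOS in conflict group": pct(15, N_CONFLICT),        # 50.0
    "Hasford in conflict group": pct(19, N_CONFLICT),      # 63.33
}
for label, value in reported.items():
    print(f"{label}: {value}%")
```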
Although the Hasford score had the lowest accuracy across the consensus and conflict groups combined, its accuracy of 63% in the conflict group suggests that it may be useful for identifying the risk group of CML patients with conflicting classifications. In the conflict group, the Hasford score assigned more patients to the low-risk category and few to the high-risk category, while the Sokal score assigned more patients to the high-risk category and few to the low-risk category. Only one previous study [3] has reported conflicts, in 22 CML patients; consistent with our findings, it found that most of these patients corresponded better with the Hasford score [14] than with the Sokal and EUTOS scores. Previous studies compared the Sokal, Hasford, and EUTOS scores, but not ELTS, when investigating consistency between scoring systems. Our study is the first to investigate conflicts among, and compare, all four validated scoring systems. The comparison shows that the scores diverge; in future work, we intend to apply advanced methods from computer science to resolve such conflicts. A new scoring system that combines the strengths of the currently available prognostic scores may further increase the accuracy of risk-group identification.
International Scholarly Research Notices

Table 4: The consistency/inconsistency of prognostic scores for predicting major molecular response.