Application of Bayesian Decision Tree in Hematology Research: Differential Diagnosis of β-Thalassemia Trait from Iron Deficiency Anemia

Objective. Several techniques have been proposed to discriminate between β-thalassemia trait (βTT) and iron deficiency anemia (IDA). This discrimination is clinically essential but challenging. This study is the first application of a Bayesian tree-based method to the differential diagnosis of βTT from IDA. Method. This cross-sectional study included 907 patients over 18 years of age (mean ± SD age, 25 ± 16.1 years) with either βTT or IDA. Hematological parameters were measured using a Sysmex KX-21 automated hematology analyzer. Bayesian Logit Treed (BLTREED) and Classification and Regression Trees (CART) models were used to discriminate βTT from IDA based on the hematological parameters. Results. This study proposes an automatic detection model for β-thalassemia carriers based on a Bayesian tree-based method. Both the BLTREED model and CART identified mean corpuscular volume (MCV) as the main predictor in diagnostic discrimination. On the test dataset, CART showed higher sensitivity and negative predictive value than BLTREED for the differential diagnosis of βTT from IDA. However, the CART algorithm had a high false-positive rate. Overall, the BLTREED model performed better in terms of the area under the curve (AUC). Conclusions. The BLTREED model showed excellent diagnostic accuracy for differentiating βTT from IDA. In addition, tree-based methods are easy to understand and do not require statistical experience, so they can help physicians make the right clinical decision. The proposed model could therefore support medical decisions in the differential diagnosis of βTT from IDA and help avoid much more expensive, time-consuming laboratory tests, especially in countries with limited resources or poor health services.


Introduction
Iron deficiency anemia (IDA) and β-thalassemia trait (βTT) are the two most common hypochromic microcytic anemias. βTT is more prevalent in the Mediterranean region and in specific geographical areas, including the Caspian Sea and Persian Gulf regions; a prevalence of 10% has been reported [1]. Differentiating βTT from IDA is crucial for preventing iron overload and related complications caused by misdiagnosis and inappropriate treatment [2].
Differentiation of β-thalassemia trait from iron deficiency anemia is also essential for premarital counseling in developed countries; for patients with microcytic anemia, complete blood count (CBC), in conjunction with hemoglobin variant analysis by high-performance liquid chromatography (HPLC), is interpreted to differentiate iron deficiency from thalassemia traits. Then, iron studies and molecular testing are also performed. Hemoglobin electrophoresis, serum iron, and ferritin levels are considered to make a definitive differential diagnosis between βTT and IDA [3][4][5].
Jahangiri et al. [32] used classic decision-tree-based methods to construct a differential diagnosis scheme and investigated the performance of several tree-based methods for the differential diagnosis of βTT from IDA. Decision trees have advantages over traditional statistical methods such as discriminant analysis and generalized linear models (GLMs). Their main advantage is the tree structure, which makes clinical data easy to interpret and is readily accepted by medical researchers and clinicians. CART is one of the best-known classic tree algorithms. However, it suffers from problems such as greediness, instability, and bias in split rule selection. Bayesian tree approaches were proposed to address the greediness of the CART algorithm. A greedy search limits the exploration of tree space, makes future splits depend on previous splits, generates optimistic error rates, and may fail to find a global optimum [40]. Bayesian approaches can also quantify uncertainty and explore the tree space more thoroughly than classic tree approaches. Unlike classic tree methods, which use only the observations for data analysis, Bayesian approaches combine prior information with the observations. They define prior distributions on the components of classic tree methods and then explore tree space with stochastic search based on Markov Chain Monte Carlo (MCMC) algorithms [41][42][43][44][45][46][47]. Accordingly, many studies over the last two decades have developed Bayesian Treed Generalized Linear Models. These models fit a parametric model such as a GLM in each tree node instead of a constant model. As a result, treed algorithms create smaller trees than conventional tree models and improve the interpretability of the tree [43].
This paper aims to compare Bayesian Treed Generalized Linear Models and CART for the differential diagnosis of βTT from IDA based on simple laboratory test results. The outcome variable of the present study is qualitative, so we used the Bayesian Logit Treed (BLTREED) algorithm to discriminate between these two disorders. This Bayesian treed model fits a logistic regression model in each tree node for prediction and uses the Metropolis-Hastings algorithm to explore tree space.

Criteria for Selecting Patient Groups.
In this study, a total of 907 patients aged over 18 years diagnosed with IDA (n = 370) or βTT (n = 537) were selected. The mean (±SD) age of the patients was 25 ± 16.1 years. Most of the patients (n = 592 (65%)) were women, and 315 (35%) were men.
CBC analysis of EDTA-K2 anticoagulated blood samples was performed using the Sysmex KX-21 automated hematology analyzer (Japan) to measure differential parameters. Hematological parameters like hemoglobin (Hb), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), Red Blood Cell Distribution Width (RDW), Mean Corpuscular Hemoglobin Concentration (MCHC), and Red Blood Cell count (RBC) were measured for all patients.

Inclusion Criteria.
In the IDA group, patients had hemoglobin (Hb) levels below 12 g/dl for women and 13 g/dl for men. Mean corpuscular volume (MCV) and mean corpuscular hemoglobin (MCH) were below 80 fl and 27 pg, respectively, for both sexes, and for men, a ferritin level of <28 ng/ml was considered as IDA. In the βTT group, patients had an MCV below 80 fl, and patients with HbA2 levels of >3.5% were considered βTT carriers.

Exclusion Criteria.
In the IDA group, the patients who had mutations associated with αTT (3.7, 4.2, 20.5, MED, SEA, THAI, FIL, and Hph) were excluded. For the βTT group, patients with αTT confirmed by mutations in the molecular analysis were excluded. All patients with malignancies or inflammatory/infectious diseases were also excluded.

Ethical Consideration.
This study was approved and supported by the ethical committee affiliated with the Ahvaz Jundishapur University of Medical Sciences (AJUMS), Ahvaz, Iran. Written informed consent was obtained before enrollment.

Machine Learning Analysis.
Tree-based machine-learning methods are valuable tools in data mining. These methods yield powerful predictive models and can provide a solution for constructing diagnostic tests with high accuracy [48,49]. Tree-based models do not require any assumptions about the functional form of the data. One of their advantages is the graphical presentation of results, which makes them easy to interpret without statistical experience [50][51][52][53]. Tree-based models have also been constructed based on Bayesian algorithms.

Bayesian Logit Treed (BLTREED) Model.
The Bayesian approach (BCART) was implemented by placing a prior distribution on the two components $(\Theta, T)$ of the CART model: $T$ is a binary tree with $K$ terminal nodes (a tree of size $K$), and $\Theta = (\theta_1, \theta_2, \ldots, \theta_K)$ is the parameter set in the terminal nodes, where $\theta_i = (p_{i1}, \ldots, p_{iN})$ for $i = 1, \ldots, K$, $N$ is the number of distinct classes of the response variable, and $p_{ij}$ is the probability of the $j$th class of the response variable in the $i$th terminal node. The joint prior distribution of the parameters and tree structure was specified as follows:
$$p(\Theta, T) = p(\Theta \mid T)\, p(T), \quad (1)$$
where $p(T)$ and $p(\Theta \mid T)$ are the prior distributions for the tree and for the parameters in the terminal nodes, respectively.
Usually, the Bayesian approach places prior distributions on unknown quantities; here, the tree structure and the parameters in the terminal nodes were treated as unknown [42]. BCART was extended by fitting a parametric model, such as a logistic regression model, to predict the data and describe the conditional distribution of $Y \mid X$ in each terminal node [43,54]. In the BLTREED model, unlike the BCART model, the conditional distribution of $Y \mid X$ depends on $X$, that is, $Y \mid X \sim f(Y \mid X, \theta_i)$. Because a more sophisticated model (a logistic regression) is fitted in each terminal node, smaller and more interpretable trees are generated. In the BLTREED model, one subset of $X$ can be used to grow the tree, and other subsets can be used to fit the models in the terminal nodes (these subsets may overlap or be disjoint). Here, $\theta_i = B_i$ denotes the regression coefficients of the logistic model fitted in the $i$th terminal node.
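The structure described above, a tree that routes an observation to a terminal node and a logistic regression that predicts within that node, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the single split on standardized MCV, the choice of Hb and RDW as node-level predictors, and all coefficient values are made up for illustration and are not the fitted model of this study.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical treed logit model (illustrative numbers only):
# the tree splits on one variable (MCV), while each terminal node
# holds its own logistic regression on other variables (Hb, RDW).
TREE = {
    "split_var": "MCV",
    "threshold": 0.0,  # assumed cut point on standardized MCV
    "left": {"coef": {"intercept": 1.5, "Hb": 0.8, "RDW": -0.4}},
    "right": {"coef": {"intercept": -1.0, "Hb": 0.3, "RDW": -0.9}},
}

def predict_proba(tree, x):
    """Route x down the tree, then apply the terminal node's logistic model."""
    node = tree
    while "coef" not in node:  # internal node: follow the splitting rule
        go_left = x[node["split_var"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    coef = node["coef"]
    z = coef["intercept"] + sum(
        b * x[name] for name, b in coef.items() if name != "intercept"
    )
    return sigmoid(z)

# Example: one standardized observation (hypothetical values).
patient = {"MCV": -1.2, "Hb": 0.5, "RDW": -0.3}
probability = predict_proba(TREE, patient)
```

Using different variables for the splitting rule and for the node-level models mirrors the statement that the two subsets of $X$ need not coincide.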
The tree prior $p(T)$ is specified by a recursive tree-generating stochastic process for tree growing, as follows [42,43]:
(1) Start from a tree $T$ that has only a root node (terminal node $\eta$).
(2) Calculate the probability of splitting node $\eta$ as follows:
$$p_{\mathrm{SPLIT}}(\eta, T) = \alpha (1 + d_\eta)^{-\beta}, \quad (2)$$
where $d_\eta$ is the depth of node $\eta$, $\alpha$ is the base probability of splitting a node, and $\beta$ is the rate at which the propensity to split decreases as the tree grows. The parameters $\alpha$ and $\beta$ thus control the shape and size of the trees and act as a penalty against overfitting.
(3) If node $\eta$ splits, assign it a splitting rule $\rho$ according to the distribution $p_{\mathrm{RULE}}(\rho \mid \eta, T)$ and create the left and right child nodes; then let $T$ denote the newly created tree and reapply steps 2 and 3 to the new child nodes.
The BLTREED model was fitted to standardized data, so the same prior distribution could be used independently for the parameters in each terminal node: a multivariate normal distribution with zero mean and a covariance matrix proportional to the identity [43,54].
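The tree-growing prior can be sketched in a few lines of Python, assuming the split probability takes the form $\alpha(1 + d_\eta)^{-\beta}$ used by Chipman et al. The sketch draws only the tree shape: the choice of splitting rule through p_RULE is omitted, and the `max_depth` cap is a safeguard added here, not part of the described process.

```python
import random

def p_split(depth, alpha, beta):
    """Prior probability that a node at the given depth splits."""
    return alpha * (1.0 + depth) ** (-beta)

def sample_tree_depths(alpha, beta, rng, depth=0, max_depth=20):
    """Draw one tree shape from the prior.

    Returns the list of depths of its terminal nodes. Splitting rules
    are not drawn; max_depth is an assumed safety cap on recursion.
    """
    if depth < max_depth and rng.random() < p_split(depth, alpha, beta):
        left = sample_tree_depths(alpha, beta, rng, depth + 1, max_depth)
        right = sample_tree_depths(alpha, beta, rng, depth + 1, max_depth)
        return left + right
    return [depth]  # the node stays terminal

# Example draw: the number of terminal nodes is the tree size K.
rng = random.Random(0)
leaves = sample_tree_depths(alpha=0.95, beta=1.0, rng=rng)
tree_size = len(leaves)
```

Larger values of β shrink the split probability faster with depth, which is how the prior penalizes large trees.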
The posterior distribution $p(T \mid X, Y)$ was computed by combining the marginal likelihood $p(Y \mid X, T)$ and the tree prior $p(T)$ as follows:
$$p(T \mid X, Y) \propto p(Y \mid X, T)\, p(T). \quad (3)$$
In this study, no informative priors were used: the priors were uniform over the variables available at a particular node and over all possible splits for each variable.
The marginal likelihood $p(Y \mid X, T)$ is as follows:
$$p(Y \mid X, T) = \int p(Y \mid X, \Theta, T)\, p(\Theta \mid T)\, d\Theta = \prod_{i=1}^{K} \int \prod_{h=1}^{n_i} f(y_{ih} \mid x_{ih}, \theta_i)\, p(\theta_i)\, d\theta_i, \quad (4)$$
where $p(Y \mid X, \Theta, T)$ is the data likelihood, $(y_{ih}, x_{ih})$ are the observed values for the $h$th observation in the $i$th node, and $n_i$ is the number of observations in the $i$th node. The integral in equation (4) has no closed form, so the Laplace approximation was used to evaluate it [43,54].
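For reference, the generic form of the Laplace approximation for the integral of a single terminal node can be written as below. This is the standard textbook form, given as a sketch; the exact variant used in [43,54] is not stated here. The symbols $g_i$, $\hat{\theta}_i$, $H_i$, and $d$ (the dimension of $\theta_i$) are introduced only for this illustration.

```latex
% Log of the integrand for terminal node i: log-likelihood plus log-prior
g_i(\theta_i) = \sum_{h=1}^{n_i} \log f(y_{ih} \mid x_{ih}, \theta_i) + \log p(\theta_i)

% Posterior mode and negative Hessian evaluated at the mode
\hat{\theta}_i = \arg\max_{\theta_i} g_i(\theta_i), \qquad
H_i = -\nabla^2 g_i(\theta_i) \big|_{\theta_i = \hat{\theta}_i}

% Laplace approximation for a d-dimensional parameter vector
\int \exp\{ g_i(\theta_i) \}\, d\theta_i \;\approx\;
  \exp\{ g_i(\hat{\theta}_i) \}\, (2\pi)^{d/2}\, |H_i|^{-1/2}
```

The approximation replaces $g_i$ by its second-order Taylor expansion around the mode, which turns the integrand into an unnormalized Gaussian whose integral is available in closed form.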
Chipman et al. [42,43] used a Metropolis-Hastings algorithm to sample from the posterior in equation (3) and find trees with high posterior probability. The Metropolis-Hastings algorithm simulates a Markov chain of trees, namely, $T^0, T^1, T^2, \ldots$. The simulation algorithm was implemented with multiple restarts for the reasons given in Chipman et al. [42,43].
The CART algorithm [55] generates a tree using binary recursive partitioning, and the tree-generating process contains four steps: (1) tree growing: tree growth is based on a greedy search algorithm that generates a tree by sequentially choosing splitting rules. The CART algorithm uses traditional split- [...] independent test dataset or cross-validation to estimate the prediction error of each tree and then selects the tree with the lowest estimated prediction error.
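The greedy tree-growing step can be illustrated with a small Python sketch that searches the thresholds of one predictor for the split with the lowest weighted impurity, using the Gini index or entropy as the splitting function. This is a simplified illustration of greedy split selection, not the full CART procedure (no pruning or tree selection), and the MCV values in the example are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels, impurity=gini):
    """Greedy search over thresholds of a single predictor.

    Returns (threshold, weighted child impurity); the threshold is None
    when no split improves on the impurity of the unsplit node.
    """
    n = len(labels)
    best = (None, impurity(labels))
    for t in sorted(set(values))[:-1]:  # candidate rule: value <= t
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical MCV values (fl) and diagnoses, for illustration only.
mcv = [60, 65, 78, 79]
diagnosis = ["bTT", "bTT", "IDA", "IDA"]
threshold, score = best_split(mcv, diagnosis, impurity=gini)
```

Because each split is chosen only by its immediate impurity reduction, later splits depend on earlier ones, which is the greediness that the Bayesian search is meant to relax.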

Data Analysis.
The BLTREED model and the classic CART algorithm with two splitting functions, the Gini index and entropy (hereafter, CART based on the Gini index is called CART1 and CART based on entropy is called CART2), were fitted using hemoglobin (Hb), mean cell volume (MCV), mean cell hemoglobin (MCH), and red cell distribution width (RDW) as predictor variables for the differential diagnosis of βTT from IDA. The BLTREED model was fitted using eight restarts with 6000 iterations per restart and a prior standard deviation of 20 for the logit coefficients [54]. To determine the pair $(\alpha, \beta)$, the BLTREED model was fitted with two choices for $\alpha$ (0.5 and 0.95) and four choices for $\beta$ (from 0.5 to 2 in steps of 0.5), and the pair $(\alpha, \beta)$ that generated the best tree with the smallest FNR was selected.
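The candidate grid for the tree-prior parameters can be enumerated as in the following sketch. The grid values come from the text; the helper `select_pair` and the FNR values passed to it in the example are hypothetical placeholders, since fitting the BLTREED model itself is outside the scope of this illustration.

```python
from itertools import product

alphas = [0.5, 0.95]
betas = [0.5 + 0.5 * k for k in range(4)]  # 0.5, 1.0, 1.5, 2.0
grid = list(product(alphas, betas))        # 2 x 4 = 8 candidate pairs

def select_pair(fnr_by_pair):
    """Return the (alpha, beta) pair whose best tree has the smallest FNR.

    fnr_by_pair is assumed to map each pair to the FNR obtained after
    fitting the model with that pair (the fitting is not shown here).
    """
    return min(fnr_by_pair, key=fnr_by_pair.get)

# Placeholder FNR values, for illustration only.
example_fnr = {pair: 0.10 for pair in grid}
example_fnr[(0.95, 1.0)] = 0.04
chosen = select_pair(example_fnr)
```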
Following accepted validation practice in machine learning studies, to assess the performance of the three models, the dataset was randomly split in a 2 : 1 ratio into a training and a test dataset, using stratified random sampling to ensure equal allocation of the two classes (presences and absences) between the datasets. The model was then fitted to the training dataset, and the set of best trees was determined. For each tree, the posterior predictive distribution was computed for both the training and the test dataset; this was done at each iteration of the BLTREED algorithm, thus incorporating the uncertainty of the model parameters and the data into the evaluation of the models. Finally, predictive performance was calculated from the confusion matrix of the posterior predictive distribution for both the training and the test dataset [43,47,54,56,57].
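A stratified 2 : 1 split can be sketched in plain Python as follows. This is one possible implementation under stated assumptions: the function name, the seed, and the rounding rule for the cut point are choices made here for illustration, and the software and exact procedure used in the study are not specified in the text.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split indices into training and test sets, class by class.

    Each class contributes about train_fraction of its members to the
    training set, so class proportions are preserved in both sets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                        # random order within class
        cut = round(len(idxs) * train_fraction)  # assumed rounding rule
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)

# Group sizes as reported for this study: 537 bTT and 370 IDA patients.
labels = ["bTT"] * 537 + ["IDA"] * 370
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class, rather than over the pooled sample, keeps the βTT to IDA ratio nearly identical in the training and test datasets.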
The discriminative performance of the Bayesian classification tree and CART was evaluated using sensitivity (TPR), specificity (TNR), false-negative rate (FNR), false-positive rate (FPR), positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), negative likelihood ratio (NLR), accuracy, Youden's index, and the area under the ROC curve (AUCROC). AUCROC represents the degree of separability, showing how well the machine learning model can distinguish between the classes (IDA and βTT); it is a global measure of diagnostic accuracy. A perfect classification algorithm has AUCROC = 1. The AUCROC is interpreted as follows: AUCROC > 0.9, excellent differentiation; AUCROC > 0.8, very good differentiation; AUCROC > 0.7, good differentiation; AUCROC > 0.6, sufficient differentiation; AUCROC > 0.5, bad differentiation; and AUCROC < 0.5, the classification method is not useful for discriminating between IDA and βTT [58,59]. Criteria such as Youden's index, accuracy, PLR, NLR (an excellent diagnostic test has NLR < 0.1 and PLR > 10), and AUC take both sensitivity and specificity into account, so they can describe the performance of a model more accurately than the other criteria. In addition, AUC values were compared using the method of DeLong et al. [60]. A P value < 0.05 was considered statistically significant.
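The criteria listed above follow directly from the four cells of a confusion matrix, as the Python sketch below shows. The counts in the example are hypothetical, and which class counts as positive is an assumption to be fixed by the analyst. Treating an AUC of exactly 0.5 as not useful is also an assumption, since the interpretation scale above leaves that boundary open.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute diagnostic criteria from a 2x2 confusion matrix."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    fnr = fn / (tp + fn)
    fpr = fp / (tn + fp)
    return {
        "TPR": tpr,
        "TNR": tnr,
        "FNR": fnr,
        "FPR": fpr,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "Youden": tpr + tnr - 1.0,
        "PLR": tpr / fpr if fpr > 0 else float("inf"),
        "NLR": fnr / tnr if tnr > 0 else float("inf"),
    }

def interpret_auc(auc):
    """Map an AUC value to the verbal scale given in the text."""
    scale = [(0.9, "excellent"), (0.8, "very good"), (0.7, "good"),
             (0.6, "sufficient"), (0.5, "bad")]
    for cutoff, label in scale:
        if auc > cutoff:
            return label
    return "not useful"  # assumption: AUC of exactly 0.5 falls here too

# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fn=10, fp=5, tn=95)
```

Youden's index, the likelihood ratios, and accuracy each combine sensitivity and specificity, which is why they summarize a model better than either rate alone.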

Results
A total of 537 patients were diagnosed with βTT (mean ± SD age, 22 ± 16.4 years), including 299 (56%) women and 238 (44%) men, while 370 patients (mean ± SD age, 29 ± 14.6 years) were diagnosed with IDA, including 293 (79%) women and 77 (21%) men. Table 1 shows the median and interquartile range (IQR) of the laboratory parameters used as predictor variables by type of hypochromic microcytic anemia (βTT and IDA).
The tree structures of the CART1, CART2, and BLTREED models are shown in Figures 1-3, respectively. In all three classification trees, the first split was based on MCV, which shows that MCV has higher importance than the other predictors in differentiating between βTT and IDA. Hb was used as the second splitting variable in the tree structure. According to the presented trees, the BLTREED model produced a smaller tree and was more interpretable than the CART algorithms (Figures 1 and 2). This model showed that MCV ≤ 72.6 screens for βTT patients. The BLTREED model extracted four homogeneous subgroups for differentiating between βTT and IDA (Figure 3).
The predictive performance of the models in differentiating between βTT and IDA was calculated based on the confusion matrix (Table 2). The BLTREED, CART1, and CART2 trees all showed high TPR, TNR, PPV, NPV, Youden's index, and accuracy in differentiating between βTT and IDA (Table 3). However, the BLTREED model had higher accuracy and Youden's index than CART1 and CART2.
In addition, all the models had NLR < 0.1, indicating that the three classification tree algorithms have good diagnostic accuracy for discriminating the patients. Table 4 shows the AUCs of the three tree models from ROC analysis; all were statistically significant (P < 0.001) and revealed that all three classification methods had excellent diagnostic accuracy (AUC > 0.9) in differentiating between βTT and IDA. In addition, Figure 4 displays the receiver operating characteristic curves of the BLTREED, CART1, and CART2 models for the test dataset, together with the comparisons of AUC values between the models. According to this figure, there was no significant difference between the methods (P > 0.05).

Discussion
In this paper, we used the BLTREED model as a differential diagnostic tool for thalassemia diagnosis. In addition, we compared the predictive performance of the BLTREED model with that of CART. The Bayesian decision tree was used to address the uncertainty problems of conventional tree-based methods [43,54,61]. This model was implemented using Hb, MCV, MCH, and RDW as independent variables.
Table 3: Sensitivity (TPR), specificity (TNR), false-positive rate (FPR), false-negative rate (FNR), positive predictive value (PPV), negative predictive value (NPV), accuracy, Youden's index, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) of the BLTREED model in prediction of IDA and βTT groups and their 95% exact confidence interval for training and test dataset.
Based on our results, MCV and Hb were the main predictors in the differential diagnosis, and patients with βTT had lower MCV values.
In previous studies that used different conventional decision trees for the differential diagnosis of βTT from IDA, the first split of all algorithms was based on MCV, and the authors likewise concluded that MCV was a significant predictor in the discrimination of IDA and βTT [32,36]. The performance of the BLTREED model, evaluated using sensitivity, specificity, false-negative and false-positive rates, and positive and negative predictive values, was high for the differential diagnosis of βTT from IDA. In addition, the positive likelihood ratio, negative likelihood ratio, accuracy, and Youden's index showed that BLTREED has good diagnostic accuracy for discriminating the patients. Indeed, it correctly classified 96% of βTT patients. Furthermore, the AUC, as an overall performance index, showed excellent and significant accuracy (99% and 98% in the training and test data, respectively) in the differential diagnosis of βTT and IDA. BLTREED also generated a smaller tree that is more interpretable than those of the CART algorithms and indicated better diagnostic performance.
Our study has a limitation that should be considered. The investigated patients included only IDA and βTT cases; patients with concomitant diseases and αTT were excluded. Therefore, including αTT patients in the study could affect the performance of the presented models and change the interpretation of the results. Particularly when only simple hematologic parameters are used, as in the present study, it may be difficult to distinguish αTT from βTT.
Other studies that used different data mining techniques and decision trees fitted with the frequentist approach reported high performance and accuracy, but lower than in our results [32,34-36,38]. In many studies with imbalanced datasets, the Synthetic Minority Oversampling Technique (SMOTE) was applied to handle this problem [34,64].
The BLTREED model improves classification performance by addressing the uncertainty of previous models [43,54]. The diagnostic performance of BLTREED was better than that of other discrimination methods (classification trees or hematological discrimination indices) used in past studies for differentiating βTT from IDA. These studies are as follows. Setsirichok et al. used a C4.5 decision tree, a naïve Bayes (NB) classifier, and a multilayer perceptron (MLP) to classify eighteen classes of thalassemia abnormality [38]. Bellinger et al. used classification algorithms such as the J48 decision tree, support vector machines (SVM), k-nearest neighbors (k-NN), MLP, and NB to differentiate between βTT, IDA, and the co-occurrence of these disorders; in that study, the imbalanced dataset was a cause of weaker performance [34]. AlAgha et al. compared the diagnostic performance of different classification algorithms such as J48, k-NN, artificial neural networks (ANN), and NB for classifying β-thalassemia carriers. They showed that SMOTE mitigated the problem of highly imbalanced class distribution and consequently improved predictive performance [64]. Jahangiri et al. [32] used classification tree algorithms such as CHAID, E-CHAID, CART, QUEST, GUIDE, and CRUISE for the differential diagnosis of βTT from IDA. They found that the CRUISE algorithm had the best diagnostic performance, similar to that in the present study, but this classic algorithm uses a greedy algorithm for tree generation and cannot explore the tree space as thoroughly as the Bayesian tree approaches. Also, many studies have compared the diagnostic performance of hematological discrimination indices, and BLTREED showed better performance than these indices [16-19, 23, 25-30, 65-80].

Conclusion
In the present study, the BLTREED model showed excellent diagnostic accuracy for differentiating βTT from IDA. Given the advantages of Bayesian tree-based methods, such as generating small and more interpretable trees and addressing the uncertainty of conventional decision trees, this method can be helpful, along with other laboratory parameters, for discriminating between these two anemia disorders. Also, tree-based methods are easy to understand and do not require statistical experience, so they can help physicians make the right clinical decision.