Clairvoyant: AdaBoost with Cost-Enabled Cost-Sensitive Classifier for Customer Churn Prediction

Customer churn prediction is one of the challenging problems and paramount concerns for telecommunication industries. With the increasing number of mobile operators, users can switch from one mobile operator to another if they are unsatisfied with the service. Marketing literature states that it costs 5–10 times more to acquire a new customer than retain an existing one. Hence, effective customer churn management has become a crucial demand for mobile communication operators. Researchers have proposed several classifiers and boosting methods to control customer churn rate, including deep learning (DL) algorithms. However, conventional classification algorithms follow an error-based framework that focuses on improving the classifier's accuracy over cost sensitization. Typical classification algorithms treat misclassification errors equally, which is not applicable in practice. On the contrary, DL algorithms are computationally expensive as well as time-consuming. In this paper, a novel class-dependent cost-sensitive boosting algorithm called AdaBoostWithCost is proposed to reduce the churn cost. This study demonstrates the empirical evaluation of the proposed AdaBoostWithCost algorithm, which consistently outperforms the discrete AdaBoost algorithm concerning telecom churn prediction. The key focus of the AdaBoostWithCost classifier is to reduce false-negative error and the misclassification cost more significantly than the AdaBoost.


Introduction
In developing countries, smartphones play a significant role in daily life, and the number of mobile operators is increasing rapidly in every technologically advanced country. By the end of 2019, several billion people subscribed to mobile services, accounting for nearly two-thirds of the global population [1]. These incessantly growing telecom operators are coming up with various value-added subscriptions to retain their loyal customers. Hence, whether a customer stays with the same service provider has become uncertain. In the fiercely competitive wireless telecommunication industry, customers have unlimited freedom to migrate from one service provider to another. This phenomenon is known as churn. A few reasons for churn are dissatisfaction with services, such as unattractive recharge plans, frequent call drops, insufficient bandwidth, frequent customer care calls, unreachable networks, and slow Internet speed. In general, several techniques are used to address customer churn prediction, such as statistical learning [2], machine learning [3], evolutionary optimization techniques [4], and deep learning [5]. Boosting is an ensemble technique that attempts to create a robust classifier from several weak classifiers. AdaBoost (adaptive boosting) is the first successful boosting algorithm developed for binary classification to improve accuracy. It has now become a feasible method for different kinds of boosting in machine learning paradigms. However, AdaBoost is inherently a cost-insensitive boosting algorithm; therefore, it has limited use in applications where costs need to be treated differently for different misclassification errors. This study attempts to mitigate that limitation.
In many real-world applications, including anomaly detection scenarios such as bank loan default prediction, telecom churn prediction, fraudulent transactions in banks, domain feature retrieval [6], and rare disease identification, the problem of cost-sensitive classification is predominant. The critical reasons for rising telecom churn are technological development in telecommunications, liberalization, and aggressive competition. In a highly competitive market, mobile operators mainly rely on incessant profits from existing loyal customers. In practice, the cost of acquiring a new customer is five to ten times higher than the cost of retaining an existing customer [7]. An increased churn rate is considered a plague on revenue generation because losing a loyal customer means losing revenue. Therefore, the leitmotiv of marketing strategy in the telecom industry is now loyal customer retention. In many real-world applications with imbalanced datasets, the misclassification costs of rare or minority classes are usually higher than those of the majority classes, especially in telecom churn, medical diagnosis, and prognosis [8]. For effective customer churn management, it is essential to build an accurate churn prediction model.
Recently, cost-sensitive learning [9][10][11][12][13][14] has gained considerable interest. With the rapid use of ensemble classifiers to improve accuracy, this paper proposes a misclassification cost-sensitive boosting algorithm as an extension of the widely favoured boosting method AdaBoost. The clairvoyant study empirically evaluates the AdaBoostWithCost cost-sensitive boosting method to predict customer churn rate with higher accuracy than the fundamental AdaBoost classifier. In general, boosting is an ensemble technique that attempts to create a robust classifier from several weak classifiers. AdaBoost (adaptive boosting) is the first successful boosting algorithm developed for binary classification using this concept to achieve more accuracy. It has now become somewhat of a go-to method for different kinds of boosting in machine learning paradigms. However, AdaBoost is fundamentally not a cost-sensitive boosting algorithm; therefore, it has inherent limitations for applications where costs need to be treated differently for different misclassification errors. This study attempts to mitigate this limitation. Most classification algorithms treat all kinds of misclassification errors equally, which may not be appropriate in all real-world applications. In telecom churn prediction, mispredicting a customer who will churn has a severe impact on revenue. Therefore, model accuracy may not be the correct measure for real-world cost-sensitive applications. Instead of optimizing the accuracy, the classification algorithm should minimize the total misclassification cost. Therefore, the paper's key focus is on the empirical evaluation and the theoretical aspects of the proposed AdaBoostWithCost algorithm, which aims to reduce the cumulative misclassification cost considerably more than AdaBoost.

Cost-Sensitive Learning.
Cost-sensitive learning is a type of learning that considers the misclassification costs [15]. The primary objective of this type of learning is to minimize the cumulative misclassification cost.
The key difference between cost-sensitive learning and cost-insensitive learning is that cost-sensitive learning treats different misclassification errors differently. The cost of labelling a positive example as negative can be different from that of labelling a negative example as positive. Cost-insensitive learning does not consider misclassification costs. When researchers first confronted the variable cost issue, they entertained cost-sensitive adjustments in binary classification settings [16]. Cost-sensitive learning is a distinct subfield of machine learning that takes the costs of prediction errors into account while training a machine learning model. One extra input, namely, the cost matrix, is supplied in the model-building phase of the classification process used to construct cost-sensitive models. When the cost matrix is used in association with boosting, it is said to be cost-sensitive boosting.
1.1.1. The Problem of Class Imbalance. Today's classification algorithms assume a proportionate distribution of examples across class labels, which is not always valid in practice. The data are said to suffer from a class imbalance problem when the class distributions are highly imbalanced. These datasets have a skewed class distribution, and such tasks are also known as imbalanced classification problems. In this context, many classification learning algorithms have low predictive accuracy for the infrequent class [17]. In addition to assuming that the class distribution is balanced, most classifiers also assume that the costs of all types of misclassification error are equal. This assumption does not hold in many real-world applications. In this situation, the predictive model developed using conventional machine learning algorithms could be biased and inaccurate. Researchers have devoted significant attention to minimizing the misclassification cost instead of minimizing the errors. Therefore, in recent years, cost-sensitive learning has been a common approach to solving this class imbalance problem.

Issue of Cost Sensitivity.
Over the past few years, it has been observed that most classification algorithms assume that the costs of all types of misclassification errors made by a model are equal [36], which is often not the case for imbalanced classification problems. In class imbalance problems, the wrong prediction of a positive or minority class case is worse than incorrectly classifying an example from the negative or majority class. In recent years, cost-sensitive learning has drawn significant interest because of the increasing number of applications that involve costs, such as customer churn prediction [18], fraud detection, and bank loan default prediction.
Computational Intelligence and Neuroscience
In Section 1, the problem of mobile operators along with the boosting algorithm AdaBoost is discussed. In Section 1.1, cost-sensitive learning is discussed along with its problems and issues. Various classification algorithms and popular cost-sensitive boosting algorithms are discussed in Section 2. Then, in Section 3, AdaBoostWithCost is proposed with a detailed algorithm, equations, and explanation. An empirical evaluation is performed in Section 4, where a dataset is used to investigate the algorithm on synthesized data and the results are generated. An evaluation of the AdaBoostWithCost algorithm, together with empirical results and visualizations, is presented in Section 5.

Related Works
In recent years, there have been countless applications of machine learning [19] and reinforcement learning [20] in diverse areas such as healthcare predictions [21], cloud resource management [22], and mobile robot navigation [23]. Moreover, a significant surge has also been observed in cyber fraud and in the corresponding models to counter it, such as credit card fraud detection, as well as in telecom churn prediction [2][3][4][5] and the detection of rare medical diseases. In the models mentioned above, classifiers are trained to give the most costly errors more attention than the others. Many ensemble-based classifiers have been proposed to introduce the misclassification cost into cost-sensitive classification. In the literature, various algorithms have been proposed over the past decades for cost-sensitive classification. Various authors have modified decision trees in different ways to account for class-dependent costs. In [24], the authors proposed a cost-sensitive boosting framework intended to optimize the loss function by applying cost-sensitive decision rules. An adaptive cost bagging method was proposed in [25]. In the doctoral dissertation [21], cost-sensitive tree stacking was proposed, in which different decision trees are learned and then merged in such a way that the cost function is minimized. In [26], a survey of cost-sensitive learning applications with decision trees as base classifiers is presented. The survey covers several types of cost-sensitive ensemble methods. The outline of the literature survey is described in Section 2.1.

Comparison and Discussion.
This paper surveys the cost-sensitive boosting classifiers mentioned below. There are various popular cost-sensitive boosting algorithms [27], such as Boosting [28], Uboost, Cost-Uboost [29], AdaCost [30], and CostBoost [31], in addition to the recently emerged algorithms CSE1, CSE2, CSE3, CSE4, and CSE5 [32], where CSE stands for Cost-Sensitive Extension. All ten algorithms are compared and summarized in Table 1. CostBoost [31] extends Boosting [28], and Cost-Uboost [29] is a modified version of Uboost. CSE1, CSE2, and CSE3 are extensions of discrete AdaBoost, whereas CSE4 and CSE5 are extensions of AdaCost. All of these algorithms modify the weights in a different way in each iteration. As regards AdaCost [17] (AdaBoost with Cost-Sensitive Adaptation), it builds on Freund and Schapire's AdaBoost and is the first attempt towards the study of a cost-sensitive boosting algorithm. AdaCost is a misclassification cost-sensitive boosting classifier, a variant of AdaBoost, that applies the misclassification cost in each round of boosting to update the training distribution. The central idea of AdaCost is to incorporate the cost and produce more advanced classifiers that reduce the misclassification cost better than AdaBoost. The following are the weight update equations for the cost-sensitive boosting classifiers [33].
Separate weight update equations are given for discrete AdaBoost (equation (1)), for CSE1, for CSE2, for CSE3, and, jointly, for AdaCost, CSE4, and CSE5. In AdaBoost, no misclassification cost is included in the reweighting step. However, the misclassification cost is incorporated in the weight update equation of some cost-sensitive classifiers such as AdaCost, CSE4, and CSE5. Among the symbols used in weight update equations (1)-(4), $C_\delta$ denotes the cost of classification. Here, $\beta_j$ is identical in CSE1, CSE2, CSE3, and AdaBoost, whereas AdaCost and CSE4 use a different expression, and $\tau_{+} = -0.5\,C_n + 0.5$ and $\tau_{-} = 0.5\,C_n + 0.5$ for AdaCost and CSE5. Furthermore, CSE5 does not include $\varrho_c$ in the calculation of $z_j$ [33]. From the weight update equations, it can be seen that the cost parameter is applied directly and equally to all kinds of misclassification error (false-positive and false-negative) in each boosting round. They all give equal weight to both error types when reducing the cumulative misclassification cost. Table 1 summarizes the survey of the ten cost-sensitive boosting algorithms.

Proposed Clairvoyant Method
Different methodologies have been studied, and the most appropriate one is selected for this paper. In practice, there have been two schools of thought on dealing with misclassification costs. The first addresses cost sensitization by preprocessing the data, implementing various sampling techniques to increase the influence of the desired samples. These preprocessing techniques rely on the examples in the training dataset to minimize cost. The second school of thought is to handle the problem more directly by building cost-sensitive adjustments into the algorithmic step. In this approach, the wealth of existing machine learning algorithms is modified to use the cost matrix. This mechanism has gained significant popularity and is in growing demand in practice. In the second methodology, for example, in AdaBoost and AdaCost, the metaclassifiers are extended to incorporate the cost of misclassification in the weight update method [34]. AdaBoost is a statistical classification meta-algorithm known for adaptive boosting, and it tweaks the learners in favour of instances misclassified by the previous classifiers. AdaCost, in turn, is a misclassification cost-sensitive boosting method, a variant of AdaBoost. AdaBoostWithCost is an ensemble of AdaBoost and AdaCost designed to improve performance. The algorithm proposed in this paper belongs to the second methodology described above.
3.1. AdaBoostWithCost. Misclassification cost is not used in AdaBoost's weight update rule. In many other methods, the weight-updating rule increases the weights of wrong classifications more aggressively by applying a constant misclassification cost directly and equally to all misclassification errors (both false-positive and false-negative) in each boosting round. Such a traditional framework assumes that all misclassification errors carry the same cost. The proposed AdaBoostWithCost method applies the misclassification cost specifically to the costly high-risk errors (false negatives in the telecom churn study) instead of applying a constant cost to all misclassification errors directly in each iteration of boosting. The algorithm focuses on class-dependent cost sensitivity. The cumulative misclassification cost is reduced by assigning higher weights to costly high-risk errors than to low-risk errors. The proposed new algorithm AdaBoostWithCost is illustrated in Algorithm 1.

Definitions of Symbols.
All mathematical symbols and parameters used in the equations of the proposed AdaBoostWithCost algorithm (described above) and in the flowchart shown in Figure 1 are described in Table 2. The description of the inventive steps is as follows. The central idea of the proposed AdaBoostWithCost algorithm is to increase the weight of the costly misclassified data points more aggressively than that of the correctly classified data points. Hence, the weight-updating rule increases the weights of the false negatives more than those of the false positives, since the false-negative error is more significant in telecom churn prediction. In the above AdaBoostWithCost algorithm, steps 7 and 12 constitute the inventive steps. The weight update in each boosting round of AdaBoostWithCost is as follows:

$$P_{D_{t+1}}(x_i) \propto P_{D_t}(x_i)\,\exp\big(-\alpha_t\, y_i h_t(x_i) + (-c_t)\, C_{fn}\, y_i h_t(x_i)\big)$$

In the above equation, $P_{D_{t+1}}(x_i)$ denotes the new probability assigned to the $i$th data point $x_i$ at the $(t+1)$th iteration, and $P_{D_t}(x_i)$ represents the distribution of the $i$th data point $x_i$ at iteration $t$. The exponent of the exponential loss function in the weight update equation consists of two components or subexpressions:
(1) The first subexpression is $-\alpha_t\, y_i h_t(x_i)$.
(2) The second subexpression, which involves the cost and the false-negative misclassification error, is $(-c_t)\, C_{fn}\, y_i h_t(x_i)$.
It is worth mentioning that the value of the expression $-\alpha_t\, y_i h_t(x_i)$ is positive if $y_i h_t(x_i)$ is negative, because the negative sign at the beginning changes a negative $y_i h_t(x_i)$ to positive (since $\alpha_t$ is always positive). To elaborate, in case of any misclassification by the model, the expression $-\alpha_t\, y_i h_t(x_i)$ becomes positive, whereas in case of a correct classification it becomes negative, according to the logic described above.
Therefore, the first subexpression $-\alpha_t\, y_i h_t(x_i)$ is exactly the same as in AdaBoost's weight update equation. It follows from the above logic that AdaBoost boosts the weights of the data points that have been misclassified by earlier models and brings down the weights of the data points that have been classified correctly, so that the algorithm can focus more on the misclassified samples in its subsequent iterations. The second subexpression $(-c_t)\, C_{fn}\, y_i h_t(x_i)$, in contrast, incorporates the cost $C_{fn}$ derived from the supplied cost matrix (described in Section 3.1) and the parameter $c_t$, which represents the false-negative error at the $t$th iteration (whereas $\alpha_t$ corresponds to the total misclassification error used in the first subexpression). In the subexpression $(-c_t)\, C_{fn}\, y_i h_t(x_i)$, the cost computation component is $(-c_t)\, C_{fn}$. The other component, $y_i h_t(x_i)$, is evaluated in the same way as described for the first subexpression. Hence, the subexpression $(-c_t)\, C_{fn}\, y_i h_t(x_i)$ is positive if $y_i h_t(x_i)$ is negative, because the negative sign at the beginning changes a negative $y_i h_t(x_i)$ to positive, and it is multiplied by the cost $C_{fn}$ for the false-negative error (denoted by $c_t$). Since both $c_t$ and $C_{fn}$ are always positive, the sign of the entire expression $(-c_t)\, C_{fn}\, y_i h_t(x_i)$ depends only on the sign of $y_i h_t(x_i)$. Therefore, in the second subexpression, the multiplication of the cost $C_{fn}$ by $y_i h_t(x_i)$ specifically for the false-negative error (denoted by $c_t$) is the nucleus of the inventive step. The central idea of AdaBoostWithCost is to incorporate the extra cost specifically for the false-negative error to enhance the boosting of the weight, in addition to the normal weight update performed by AdaBoost.
This second subexpression underlines the fact that, to reduce the misclassification cost, costly and high-risk errors are given higher weights than low-risk errors. In short, in the AdaBoostWithCost algorithm, the weight-updating rule increases the weights of costly misclassified samples more aggressively than those of the correctly classified samples. The flowchart for AdaBoostWithCost is depicted in Figure 1. In the flowchart, the inventive step of AdaBoostWithCost is specifically highlighted to demonstrate how AdaBoostWithCost incorporates the cost into the reweighting equation. Table 3 shows the key difference between the weight update equations of AdaBoost and AdaBoostWithCost.
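The reweighting logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `adaboost_with_cost_update`, the label encoding in {-1, +1} with +1 as the churn class, the restriction of the cost term to false negatives, and the final division by the sum of the weights are all assumptions made for illustration.

```python
import numpy as np

def adaboost_with_cost_update(weights, y_true, y_pred, alpha_t, c_t, cost_fn):
    """One reweighting round: AdaBoost term plus an extra cost term.

    Hypothetical helper. Labels are assumed to be in {-1, +1},
    with +1 standing for the churn (positive) class.
    """
    weights = np.asarray(weights, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # y_i * h_t(x_i): +1 when the prediction is correct, -1 when it is wrong.
    margin = y_true * y_pred

    # First subexpression: the plain AdaBoost exponent.
    exponent = -alpha_t * margin

    # Second subexpression: extra cost term.
    # Assumption: it is applied only to false negatives (actual +1, predicted -1).
    false_negative = (y_true == 1) & (y_pred == -1)
    exponent = exponent + np.where(false_negative, -c_t * cost_fn * margin, 0.0)

    new_weights = weights * np.exp(exponent)
    # Assumption: renormalise so that the weights form a distribution.
    return new_weights / new_weights.sum()


# Four samples: correct positive, false negative, false positive, correct negative.
w = adaboost_with_cost_update(
    weights=[0.25, 0.25, 0.25, 0.25],
    y_true=[1, 1, -1, -1],
    y_pred=[1, -1, 1, -1],
    alpha_t=0.5, c_t=0.3, cost_fn=0.2,  # illustrative values only
)
print(w)
```

With these illustrative values, the false negative ends up with the largest weight, the false positive with the next largest, and the two correctly classified samples share the smallest weight, which mirrors the ordering the text describes.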

Empirical Evaluation Parameters.
The choice of measurement indices is of paramount importance when evaluating a classifier's performance. Different performance metrics are used to evaluate different classification algorithms. In the context of the current study, the false-negative classification error plays a pivotal role in telecom churn prediction. Thus, the study focuses on the false-negative error counts for the empirical evaluation.
The study also evaluates two other parameters, the misclassification cost and the mean misclassification cost, which also carry great weight in the context of this study. These performance metrics are used to evaluate the proposed cost-sensitive boosting algorithm AdaBoostWithCost and AdaBoost. The cost of each class error is shown in Table 4, which is supplied as an input to measure the total misclassification cost. The normalized weight distribution with respect to cost is shown in Table 5. More details about the cost matrix and the weight normalization method are given in Section 3.1.

Data Selection.
The telecom dataset used in the investigations has been taken from Kaggle [35]. The dataset contains 3335 rows (Call Data Records) and 21 columns (attributes). The data describe various behaviours of customers, and the last column states whether the customer is still with the existing telecom company. However, the study requires synthetic data (over 100,000 samples) to carry out its objective.

Generating Synthesized Data.
The objective of the study's experiment is to empirically evaluate the performance of the proposed classifier AdaBoostWithCost on a large volume of data. Therefore, the study must generate synthesized data to fulfil the requirement of the investigation.
The idea is to generate enough synthesized data points (nearly 100,000 samples), that is, Call Data Records (CDR), to compare the robustness of the AdaBoostWithCost method against discrete AdaBoost. The Kaggle dataset has 21 features and only 3335 Call Data Records (CDR), which is not sufficient to satisfy the study's objective. Hence, it is essential to generate synthetic data from the collected source data.

Table 2: Definitions of symbols.
$P_{D_{t+1}}(x_i)$: The new probability of the $i$th data point $x_i$ at the $(t+1)$th iteration
$\alpha_t$: Hypothesis's weight for gross misclassification error at the $t$th iteration
$c_t$: Hypothesis's weight for high-risk (false-negative) error at the $t$th iteration
$C_{fn}$: Cost of misclassification for false-negative error specified in the input cost matrix

e Input Cost Matrix and Weight Normalizations.
Cost-sensitive machine learning methods explicitly use the cost matrix as an input while building cost-sensitive classifiers. Fundamentally, the cost matrix is a matrix that assigns a cost to each cell in the confusion matrix. The effectiveness of cost-sensitive learning relies strongly on the supplied cost matrix. The parameters provided in the cost matrix are of the utmost importance in both the training and prediction steps [36] of cost-sensitive learning. In most cost-sensitive boosting algorithms, the cost matrix is supplied in the model-building phase. Cost-sensitive boosting classifiers modify the weight update equation to incorporate the misclassification cost derived from the cost matrix. Defining the cost matrix can be challenging because it is domain-specific. In telecom churn prediction modeling, a model is used to predict which customers are more likely to abandon a service provider. In this context, failing to detect an actual churning customer (false-negative case) has a more serious economic impact than failing to identify a nonchurning customer accurately (false-positive case). Hence, in this study, the proposed cost-sensitive boosting algorithm specifically focuses on reducing the cumulative high-risk misclassification error (false negatives), and the cost matrix parameters are defined accordingly.
Ideally, an accurate cost matrix would be defined by a domain expert or economist. In this study, since the incorrect prediction of a churning customer (false negative) has the bigger influence, the proposed AdaBoostWithCost algorithm focuses specifically on reducing high-risk costly errors. The allocation of the cost for each class in the cost matrix is shown in Table 6. According to the literature, most telecom experts observe that a false-negative classification error is 5 to 10 times more expensive than a false-positive error. Considering a worst-case scenario in the telecom industry, this study assigns the false-negative cost ten times (the extreme case) the false-positive cost. Hence, the cost ratio of false-positive errors to false-negative errors used in this study is 1 : 10, which means that false-negative errors are ten times costlier than false-positive classification errors. The study runs three different sets of iterations for the empirical evaluation of AdaBoostWithCost and AdaBoost. It is important to note that Table 4 depicts a hypothetical cost matrix supplied as an input to the AdaBoostWithCost algorithm and used in the weight update equation to calculate the misclassification cost. In the cost matrix in Table 4, the notation C() indicates the cost. In C(x, y), the first parameter x is the predicted class, and the second parameter y is the actual class. Table 4 also lists the name of each cell of the confusion matrix as an acronym; for example, false positive is FP. Table 4 shows the cost-matrix structure, where the cost of a false positive is denoted by C(1, 0) and the cost of a false negative is denoted by C(0, 1). Table 6 depicts the cost matrix that is supplied as input to the AdaBoostWithCost algorithm and used in the weight update equation.
The assignment of a cost to each cell in the confusion matrix is defined below and is referred to as the cost matrix. It is noteworthy that cell C(0, 1) represents the cost of a false-negative error, whereas the cost of a false-positive error is given by cell C(1, 0). Consequently, cell C(0, 1) is assigned a cost of 10, and cell C(1, 0) is assigned a cost of 1, in line with the above discussion (the study considers the false-negative error to be 10 times more costly than the false-positive error). Table 6 shows the value of each cell of the cost matrix.
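As a small illustration of the C(predicted, actual) convention, the following Python sketch stores the two costs stated in the text and looks up the cost of individual predictions. The names `COST` and `per_sample_cost` are hypothetical, and the zero cost for correct predictions is an assumption, since the values of those cells are not stated in the text.

```python
import numpy as np

# COST[predicted, actual], following the C(x, y) convention in the text:
# x is the predicted class and y is the actual class (1 = churn, 0 = no churn).
COST = np.array([
    [0.0, 10.0],  # predicted 0: C(0, 0) = 0 (assumed), C(0, 1) = 10 (false negative)
    [1.0, 0.0],   # predicted 1: C(1, 0) = 1 (false positive), C(1, 1) = 0 (assumed)
])

def per_sample_cost(y_pred, y_true):
    """Look up the cost of each prediction in the cost matrix."""
    y_pred = np.asarray(y_pred, dtype=int)
    y_true = np.asarray(y_true, dtype=int)
    return COST[y_pred, y_true]


# One false negative, one false positive, and two correct predictions.
costs = per_sample_cost(y_pred=[0, 1, 0, 1], y_true=[1, 0, 0, 1])
print(costs, costs.sum())
```

Indexing the array by the predicted class first keeps the code aligned with the notation of Table 4, so C(0, 1) in the text corresponds to `COST[0, 1]` in the sketch.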
Although the confusion matrix consists of four cells, the true positives and true negatives do not play an important role in the context of telecom churn prediction. Moreover, false-positive classification also has an insignificant impact in the context of the study. The only significant parameter is false-negative classification, which has a serious impact in telecom churn modeling, hence the high value of 10 assigned to cell C(0, 1).

In the weight update equation, $P_{D_{t+1}}(x_i)$ is the new probability assigned to the $i$th data point $x_i$ at the $(t+1)$th iteration; all other parameters on the right side of the equation are described as follows: $\alpha_t$ = hypothesis's weight for gross misclassification error at the $t$th iteration, $c_t$ = false-negative error at the $t$th iteration, and $C_{fn}$ = misclassification cost for false-negative error specified in the input cost matrix, where $y_i h_t(x_i) = -1$ if actual ≠ predicted and $y_i h_t(x_i) = 1$ if actual = predicted.

The calibration of the weight distribution with respect to cost is essential for carrying out the weight update step in AdaBoostWithCost. The normalization (rescaling) method used to transform the false-negative cost into a weight distribution is given in Table 5. To use the cost matrix in the proposed classifier, its cell values must be rescaled to the range 0 to 1.
This normalization or calibration [37] is an essential step for performing the weight update operation in the reweighting equation of the AdaBoostWithCost algorithm. The normalization technique ensures that the weight or probability distribution of each training data point stays between 0 and 1.
The investigation of this study centres on the false-negative cost of 10 and the corresponding weight distribution of 0.2, highlighted in Table 5.

Experimental Method.
The investigations of the study estimate three measures of the utmost importance for telecom churn prediction, namely, the false-negative errors, the misclassification cost, and the mean misclassification cost, to assess the performance of the proposed AdaBoostWithCost classifier. The empirical evaluation covers two aspects of benchmarking the AdaBoostWithCost algorithm against AdaBoost. First, the study measures the performance metrics: the false-negative errors, the misclassification cost, and the mean misclassification cost (the average misclassification cost across all sets of iterations). Second, it plots the misclassification error rate (both training and test error rates) against the number of boosting rounds. For the second criterion, the study computes the training and test misclassification error rates for each boosting round of the proposed AdaBoostWithCost classifier and plots them to show the performance curves of the AdaBoostWithCost classifier and the basic AdaBoost classifier. The input cost matrix for each category of errors is defined in Table 4. It is important to mention that the false-negative error is of foremost interest in this study, since it significantly impacts revenue generation in telecom churn prediction. The false-positive errors are given little weight in the experiment, since they are insignificant compared to false-negative errors in this context.
Literature states that false-negative classification error is generally 5-10 times more costly than the false-positive classification error in telecom churn modeling. is study considered the worst-case scenario of the telecom industry, that is, presumed the most severe impact on the revenue generation for service providers due to the incorrect falsenegative classification. Given this worst-case scenario, the experiment assigns the false-negative cost ten times (highest possible impact on business) more than the falsepositive cost. It is to be noted that cell C (0, 1) of the confusion matrix represents the cost of false-negative errors, whereas false-positive error is designated by cell C (1, 0). Consequently, cell C (0, 1) is assigned to cost 10, and cell C (1, 0) is assigned to 1. While estimating the three critical performance metrics, the cost matrix must be rescaled or normalized to a range of 0 to 1.
This normalization, as in probability calibration [37], is required to execute the weight update operation in the reweighting equation of the AdaBoostWithCost algorithm, because the weight (probability) of each data point lies between 0 and 1. The normalization method for transforming the confusion matrix's false-negative value into a weight distribution is given in Table 5.
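The cost assignment and rescaling described above can be sketched as follows. This is only a minimal illustration: the zero cost for correct predictions and the max-scaling rule are assumptions made here, and the study's actual normalization method is the one given in Table 5. The name `normalize_cost_matrix` is hypothetical.

```python
# Cost matrix following the text's convention: COST_MATRIX[0][1] is the
# false-negative cost and COST_MATRIX[1][0] is the false-positive cost.
# Zero cost for correct predictions is an assumption (Table 4 is not shown here).
COST_MATRIX = [
    [0.0, 10.0],
    [1.0, 0.0],
]


def normalize_cost_matrix(cost):
    """Rescale every cell into [0, 1] by dividing by the largest cost.

    Max-scaling is an illustrative assumption, not necessarily the
    method of Table 5.
    """
    largest = max(max(row) for row in cost)
    if largest <= 0:
        raise ValueError("cost matrix needs at least one positive cost")
    return [[cell / largest for cell in row] for row in cost]


normalized = normalize_cost_matrix(COST_MATRIX)
print(normalized)  # [[0.0, 1.0], [0.1, 0.0]]
```

Under this assumed scaling, the 10:1 ratio between the false-negative and false-positive costs is preserved while both values fall inside the 0 to 1 range used by the weight distribution.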
The first aspect of the empirical evaluation uses three sets of iterations (10, 20, and 40) to measure the performance metrics: the false-negative errors, the misclassification cost, and the mean misclassification cost. The misclassification cost for each set of iterations of the AdaBoostWithCost algorithm is computed from the following formula:
misclassification cost = CM[C(0, 1)] × false negatives,
where CM is the confusion matrix and C(row_index, col_index) is the cost of the cell. The study computes the cumulative misclassification cost iteration-wise:
(a) Cumulative misclassification cost at the end of the 10th iteration
(b) Cumulative misclassification cost at the end of the 20th iteration
(c) Cumulative misclassification cost at the end of the 40th iteration
The mean misclassification cost is the cumulative misclassification cost of all sets of iterations divided by the number of sets of iterations:
mean misclassification cost = (a + b + c)/3,
where a, b, and c are the misclassification costs obtained in the steps above for each of the three sets of iterations (10, 20, and 40) used in the experiment. The second aspect of the empirical evaluation is to represent the misclassification error rate visually by plotting both training and test errors. A salient feature of the investigation is to show how the training and test error rates change over each set of boosting rounds.
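The two cost formulas above can be sketched in a few lines. The false-negative counts below are made-up placeholders chosen only to exercise the arithmetic; they are not the results reported in this study. The function names are hypothetical.

```python
FN_COST = 10  # cost in cell C(0, 1), as assigned in the experiment


def misclassification_cost(false_negatives, fn_cost=FN_COST):
    """Cost of one set of iterations: C(0, 1) times the false-negative count."""
    return fn_cost * false_negatives


def mean_misclassification_cost(costs):
    """Average the cumulative costs over the sets of iterations."""
    if not costs:
        raise ValueError("at least one set of iterations is required")
    return sum(costs) / len(costs)


# Hypothetical false-negative counts after 10, 20, and 40 boosting rounds.
false_negatives_by_rounds = {10: 30, 20: 24, 40: 18}

costs = [misclassification_cost(fn) for fn in false_negatives_by_rounds.values()]
mean_cost = mean_misclassification_cost(costs)
print(costs, mean_cost)  # [300, 240, 180] 240.0
```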

The Evaluation of AdaBoostWithCost and AdaBoost.
The error summary of the experimental results focuses on three important performance metrics: the total misclassification error, the false-negative error count, and the training and testing error rates. Careful inspection of the error summary shows that the values of the three performance metrics consistently decrease over the sets of 10, 20, and 40 boosting rounds. Specifically, the false-negative error, which is the parameter of utmost importance in this study, is reduced significantly over each interval of boosting rounds.

Interpretation of Empirical Results and Visualizations.
The empirical evaluation of the proposed AdaBoostWithCost algorithm and the AdaBoost classifier has been carried out on the three crucial performance metrics considered in this study. The error summary is shown in Table 7, which reveals a significant difference in experimental results between AdaBoostWithCost and AdaBoost.
The study observed that AdaBoostWithCost significantly reduced the false-negative error counts compared to the traditional boosting classifier AdaBoost. Hence, the summarized results show that AdaBoostWithCost prevails over AdaBoost in terms of false-negative error reduction, which is the most influential parameter in the context of the study. Figure 2 demonstrates how the misclassification error rates of both classifiers monotonically decrease as the number of iterations increases. However, the steep falling edge of the dark blue line (AdaBoostWithCost) shows that AdaBoostWithCost reduces the error rate faster than traditional AdaBoost. Figure 2 also shows that AdaBoostWithCost ultimately outperforms AdaBoost in error rate reduction. The side-by-side graphs show the decreasing pattern of the training and test error rates with each set of iterations for both the AdaBoost and AdaBoostWithCost classifiers: both rates monotonically decrease as the number of iterations increases. On careful inspection, the gap between the two lines (training and test error rates) shows that AdaBoostWithCost reduces the training and test error rates much faster than the traditional AdaBoost classifier. The study also concludes from Figure 3 that the AdaBoostWithCost model does not tend to overfit, whereas the AdaBoost classifier may overfit slightly.
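The two properties read off the plots, a monotonic decrease of the error rate and the gap between training and test error, can also be checked numerically. The sketch below assumes the per-round error rates are available as plain lists; the helper names and the sample values are hypothetical and are not taken from Figures 2 and 3.

```python
def is_monotonically_decreasing(errors):
    """True when no error rate is higher than the one before it."""
    return all(later <= earlier for earlier, later in zip(errors, errors[1:]))


def train_test_gap(train_errors, test_errors):
    """Per-round difference (test minus train); a widening gap hints at overfitting."""
    if len(train_errors) != len(test_errors):
        raise ValueError("both curves need one value per boosting round")
    return [test - train for train, test in zip(train_errors, test_errors)]


# Hypothetical error rates after 10, 20, and 40 boosting rounds.
train_errors = [0.20, 0.15, 0.12]
test_errors = [0.22, 0.17, 0.14]

gaps = train_test_gap(train_errors, test_errors)
print(is_monotonically_decreasing(train_errors))
print(is_monotonically_decreasing(test_errors))
print([round(gap, 2) for gap in gaps])
```

A gap that stays roughly constant while both curves fall is consistent with a model that is not overfitting; a gap that grows with the number of rounds would suggest the opposite.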

Conclusion
Cost-sensitive learning is not new in today's machine learning community. In recent years, it has gained tremendous popularity because of the rising demand for critical real-world cost-sensitive applications. Today's state-of-the-art machine learning algorithms are not designed with financial goals in mind, in the sense that the models do not include the real financial costs during the training and evaluation phases. In the context of telecom churn prediction, a model evaluation based on a traditional measure such as accuracy does not yield the best results when measured by the actual financial cost. Failing to detect true churners impacts telecom operators' revenue far more severely than incorrectly predicting a nonchurning customer as a churner. This paper set out to deal with the challenges of class-dependent cost-sensitive classification and to mitigate the business-specific cost sensitivity.
This paper surveyed various cost-sensitive boosting algorithms in today's machine learning community and summarized their comparison in Table 1. The study also discussed the weight update equations of those cost-sensitive classifiers when dealing with variable cost errors. Beyond this survey, the study contributed to class-dependent cost-sensitive boosting classification in two distinct respects. First, the study devised a novel class-dependent cost-sensitive boosting algorithm, AdaBoostWithCost, whose inventive step is the weight update equation, which incorporates a unique cost function.
The AdaBoostWithCost classifier applies the misclassification cost in the reweighting equation specifically to the high-risk errors (false-negative errors in the telecom churn case) instead of applying it directly to all misclassification errors in each boosting iteration. Second, the study carried out an in-depth inspection of the experimental results summarized in Table 7 and interpreted the graph visualizations (Figures 2 and 3). Finally, the study concludes that the AdaBoostWithCost algorithm consistently outperforms AdaBoost in all aspects of the study's objective.
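To make the idea of applying the cost only to high-risk errors concrete, the sketch below modifies one reweighting step of discrete AdaBoost so that only false negatives receive an extra cost-dependent boost. The form of the cost term, exp(alpha * (1 + fn_cost)), is a hypothetical choice made for illustration and is not the weight update equation of AdaBoostWithCost; the function name and sample data are likewise assumptions. Labels are +1 for churners and -1 for nonchurners.

```python
import math


def cost_aware_weight_update(weights, y_true, y_pred, fn_cost):
    """One reweighting step; only false negatives get the extra cost term.

    Assumes weights sum to 1 and fn_cost is already normalized to [0, 1].
    The cost term is an illustrative assumption, not the paper's equation.
    """
    error = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    if not 0.0 < error < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)  # standard AdaBoost learner weight

    updated = []
    for w, y, p in zip(weights, y_true, y_pred):
        if y == p:
            factor = math.exp(-alpha)                    # correct: weight shrinks
        elif y == 1 and p == -1:
            factor = math.exp(alpha * (1.0 + fn_cost))   # false negative: extra boost
        else:
            factor = math.exp(alpha)                     # false positive: plain AdaBoost
        updated.append(w * factor)

    total = sum(updated)
    return [w / total for w in updated]                  # renormalize to a distribution


# Toy example: index 1 is a false negative, index 2 is a false positive.
weights = [0.2] * 5
y_true = [1, 1, -1, -1, -1]
y_pred = [1, -1, 1, -1, -1]
new_weights = cost_aware_weight_update(weights, y_true, y_pred, fn_cost=1.0)
print([round(w, 3) for w in new_weights])
```

In this toy step the missed churner ends up with a larger weight than the false positive, which in turn outweighs the correctly classified points, so the next weak learner is pushed hardest toward the false-negative case.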

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.