IMBoost: A New Weighting Factor for Boosting to Improve the Classification Performance of Imbalanced Data

Imbalanced datasets pose significant challenges in the field of machine learning, as they consist of samples where one class (the majority) dominates over the other class (the minority). Although AdaBoost is a popular ensemble method known for its good performance in addressing various problems, it fails when dealing with imbalanced data sets due to its bias towards the majority class samples. In this study, we propose a novel weighting factor to enhance the performance of AdaBoost (called IMBoost). Our approach involves computing weights for both minority and majority class samples based on the performance of the classifier on each class individually. Subsequently, we resample the data sets according to these new weights. To evaluate the effectiveness of our method, we compare it with six well-known ensemble methods on 30 imbalanced data sets and 4 synthetic data sets using the ROC AUC, precision-recall AUC, and G-mean metrics. The results demonstrate the superiority of IMBoost. To further analyze the performance, we employ statistical tests, which confirm the excellence of our method.


Introduction
Imbalanced problems occur in various fields such as fault detection [1], anomaly detection [2], credit risk prediction [3], and cancer diagnosis [4]. Imbalanced data sets have been one of the challenging issues in the field of machine learning, where samples belonging to one class (the majority, or negative, class) outnumber those of the other class (the minority, or positive, class). Therefore, most classifiers have difficulties in dealing with such problems, since they tend to bias towards the majority samples, resulting in weak performance in the classification of minority class samples. In recent years, various methods have been developed to cope with imbalanced problems, which can be categorized into three groups, namely, data-level, algorithm-level, and hybrid approaches.
Data-level methods aim to balance the classes and include under-, over-, and hybrid-sampling methods. Undersampling involves removing majority class samples until the minority and majority classes are balanced. This simplifies training and improves run time and storage [5]. However, the main drawback of this method is the removal of potentially useful samples. Oversampling, on the other hand, duplicates or adds minority class samples [6]. While this increases the training set size and training time, it might also add meaningless samples to the data set. Hybrid-sampling methods combine multiple sampling techniques to overcome the limitations of individual sampling methods [7]. Algorithm-level methods modify the algorithm to remove its bias towards the majority class [8]. Cost-sensitive learning is a commonly used approach for handling imbalanced data sets. It involves assigning a higher cost to misclassified samples from the minority class, forcing the classifiers to classify the minority samples correctly [9]. Defining the costs of misclassification is challenging in these methods, which is a disadvantage [10]. Hybrid methods are a combination of the aforementioned techniques, with ensemble methods being one of the most common approaches in this category [8]. Ensemble learning algorithms demonstrate strong performance on balanced problems; however, they encounter challenges when faced with skewed data sets due to their primary focus on maximizing accuracy. To address this limitation, the literature proposes integrating ensemble learning algorithms with other approaches, including algorithm-level and data-level techniques, to effectively handle imbalanced data sets. By combining these methods, ensemble learning algorithms can better adapt to imbalanced problems and improve their classification performance [11].
Overall, ensemble methods train base learners and combine them; the results show that a combination of base classifiers outperforms individual classifiers [12]. Bagging [13] and boosting [14] are the most common ensemble methods. In the bagging technique, classifiers are trained in parallel using training sets known as "bags." These bags are created by sampling the original data set with replacement, a method called bootstrap sampling [13]. On the other hand, boosting algorithms work iteratively by placing a stronger emphasis on difficult samples. This is accomplished by increasing the weights assigned to samples that are incorrectly classified by the base classifier at each iteration [14].
Boosting is a widely recognized approach in ensemble learning, known for its ability to create a strong and robust classifier by combining multiple weak classifiers [15]. The strength of boosting lies in its sequential learning, which leads to excellent generalization. In essence, weak classifiers are learned sequentially with the objective of reducing the errors made by previously trained classifiers. Over time, various boosting approaches have been proposed. AdaBoost is a popular boosting technique used in ensemble schemes. It follows an iterative process where a set of weak classifiers is trained on weighted data. The outputs of these weak classifiers are combined through a weighted summation to produce the final boosted classifier. Despite the good performance of AdaBoost, its performance on imbalanced data sets needs improvement because it gives the same weight to minority and majority samples [16]. Moreover, the weak performance of the base classifiers in AdaBoost can affect its performance on imbalanced problems. In this work, we propose a new approach to improve the performance of AdaBoost in dealing with imbalanced problems. To do this, unlike traditional AdaBoost, we compute the performance of the classifiers on minority and majority samples separately. Furthermore, as part of the boosting process, the training data are resampled based on the weights assigned to each sample. Therefore, a new training data set is created by sampling with replacement. In the new data set, samples with higher weights are repeated multiple times. This repetition allows the classifier to bias towards these particular samples, leading to a better learning process. By giving more importance to these weighted samples, the classifier can focus on capturing their patterns and characteristics, potentially resulting in improved performance. In addition, the initial weights of positive and negative samples are calculated based on their distribution. To validate the effectiveness of our proposed approach, we perform comprehensive experiments and compare our approach with state-of-the-art methods.
The remaining sections of the paper are structured as follows. Section 2 provides a review of existing methods for imbalanced data sets. Section 3 explains the proposed method. Sections 4 and 5 present the experimental setup and results, and conclusions are finally presented in Section 6.

Related Works
The advantages of ensemble learning motivate researchers to investigate the feasibility of using boosting algorithms. The basic concept of ensemble learning is to use multiple learning models instead of a single model to obtain better performance. Recently, many ensemble-based methods have been proposed to cope with imbalanced problems. In this section, we briefly discuss these methods.
Chawla et al. [17] proposed a method to oversample the minority class data, called SMOTEBoost, for learning from imbalanced data sets. The idea of sampling the majority class to balance the data set was first raised in [18]. RUSBoost is an ensemble method that combines data sampling and boosting to enhance classification performance in the presence of imbalanced training data. It addresses the complexity and time-consuming nature of the data sampling in SMOTEBoost.
Undersampling is one of the popular approaches to address imbalanced problems. Liu et al. [19] proposed a method in which many majority class samples were ignored. The main idea of this work was to sample several subsets from the majority class, train a learner on each of them, and finally combine the outputs of those learners.
To improve the performance of the boosting algorithm on imbalanced data, Iosifidis et al. [20] proposed a novel cost-sensitive boosting approach (called AdaCC) that dynamically adjusts the misclassification costs over the boosting rounds in response to the model's performance. This approach is more effective than using a fixed misclassification cost matrix.
Yuan and Ma [21] proposed a novel approach to addressing imbalanced data sets by combining oversampling with AdaBoost and retraining the weights of the classifiers using optimization techniques such as genetic algorithms. This approach allows for the direct optimization of targeted performance measures, such as G-mean and F-measure, and can improve the overall performance of the AdaBoost algorithm on imbalanced data sets.
Mostafaei and Tanha [22] proposed a new technique for addressing imbalanced data sets. The technique involves undersampling the majority class using peak clustering. In addition, the paper proposed a boosting-based algorithm called OUBoost, which combines the peak undersampling technique with SMOTE. It selectively chooses useful examples from the majority class and generates synthetic examples from the minority class, thereby indirectly modifying the update weights.
Ke et al. [23] proposed a method called majority resampling via subclass clustering (MRSC). This approach tried to address the issue of imbalanced data sets. It utilized a clustering algorithm to cluster the majority class instances into numerous groups or subclasses. Subsequently, a new training set was created by combining these subclasses with the minority class data. The new multiclass data set had a lower imbalance ratio (IR) than the original data set. Finally, the classifier was trained on the created data set. Roshan and Asadi [10] proposed a multiobjective approach to address imbalanced problems. They utilized NSGA-II to undersample the majority samples. Finally, the solutions obtained from the Pareto front were used to train classifiers.
Zhai et al. [24] introduced a novel approach to address the shortcomings of SMOTE, particularly its limited diversity and the overlapping nature of the generated minority samples. Their method incorporated generative adversarial networks (GANs) and combined an oversampling technique with a two-class imbalanced data classification approach. The oversampling method consisted of an enhanced GAN model, while the classification method fused classifiers using the fuzzy integral.
Puri and Kumar [25] introduced an enhanced hybrid bag-boost model that incorporates a novel resampling technique. This technique comprises two main components, namely, K-means SMOTE for oversampling and the edited nearest neighbor (ENN) rule for undersampling to eliminate noise from the data set. The technique operated in three steps: first, it clustered the data set using K-means clustering; then, it applied SMOTE within each cluster to address class imbalance by generating synthetic instances of the minority class; and finally, it employed ENN to eliminate instances that contribute to noise.
Wang et al. [26] introduced a hybrid strategy called Majority-to-Minority Resampling (MMR) and a boosting algorithm called Majority-to-Minority Boosting (MMBoost) for classification tasks. MMR was developed to tackle class imbalance by taking samples from the majority class to augment the minority class. This adaptive resampling approach aimed to mitigate information loss. In addition, the MMBoost algorithm adjusts the weights of the sampled instances to further enhance classification performance.
Arafa et al. [27] proposed a preprocessing method called Reduced Noise-SMOTE (RN-SMOTE) to address imbalanced problems. The RN-SMOTE method consisted of several steps. First, it used SMOTE to oversample the training data. However, since SMOTE added noise to the minority class, DBSCAN was applied to identify and remove the noise. Once the noise was eliminated, the clean synthetic instances were combined with the original data. Then, RN-SMOTE was employed to rebalance the data set again using SMOTE before feeding it into the underlying classifier.
Li et al. [28] presented a novel ensemble method that combined ensemble learning techniques with a new undersampling method called binary PSO instance selection. The proposed method aimed to address the challenges of imbalanced classification problems. By leveraging the strengths of ensemble learning and the undersampling technique, the method effectively selected a suitable combination of majority class samples to create a new data set that incorporates the minority class samples. The approach utilized a multiobjective strategy to optimize the performance of imbalanced classification while preserving the integrity of the original data set. Dong and Qian [29] introduced DBRF, a density-based random forest algorithm, to enhance prediction performance on imbalanced problems. DBRF focuses on identifying challenging boundary samples and employs a density-based approach to augment them. Two distinct random forest classifiers are then built to model the augmented boundary samples and the original data set separately. The final output is determined using a bagging technique, which combines the predictions of the two random forest classifiers.
Gu et al. [30] proposed the DCI-ISSA equilibrium ensemble method to address class imbalance issues. It introduced two techniques: data center interpolation (DCI) for creating balanced data sets and the improved sparrow search algorithm (ISSA) for parameter optimization in random forest (RF). These techniques improved classification performance and adapted to different imbalanced data sets.
Zhao et al. [31] introduced a weighted hybrid ensemble method, called WHMBoost, designed to classify imbalanced data in binary classification tasks. WHMBoost combined two data sampling methods and two base classifiers within a boosting algorithm framework. Each sampling method and base classifier was assigned a specific weight to leverage their complementary advantages and enhance performance.
Morais and Vasconcelos [32] introduced the k-influential neighborhood for oversampling (k-INOS) algorithm, which aimed to enhance the robustness of oversampling algorithms when dealing with noisy examples in the minority class. The k-INOS algorithm followed a three-step process. First, it detected and removed instances in the minority class that were likely to be noise. Then, an oversampling algorithm was applied to the noise-filtered data set. Finally, the previously removed minority examples were reintroduced into the data set after oversampling, ensuring that they were not used to augment the minority class directly.
In [33], an ensemble algorithm called BPSO-AdaBoost-KNN was proposed for addressing multiclass imbalanced data classification. The algorithm combines feature selection and boosting techniques within the ensemble framework. In this model, BPSO (binary particle swarm optimization) is employed as the feature selection algorithm, using AUCarea as the fitness measure. Finally, a boosting classifier was built in which KNN was chosen as the base classifier.
Wang and Sun [16] presented a method to enhance the AdaBoost algorithm by introducing weighted vote parameters for the weak classifiers. The proposed approach determined the weighted vote parameters based not only on the global error rate but also on the classification accuracy of the positive class, which is the primary focus of interest.
Piao et al. [34] introduced a cost-sensitive ensemble learning method for classifying imbalanced data, utilizing a support vector machine (SVM) as the base learner within AdaBoost. The method was developed to rebalance the weights of AdaBoost, influencing the training of the base learner. This weighting strategy focused on increasing the sample weight of misclassified minority instances while decreasing the sample weight of misclassified majority instances in each round, effectively equalizing their distributions. By employing this method, the classification performance on imbalanced data can be improved. Moreover, other methods have been developed that adjust instance weights according to their labels, such as AdaC1-2-3 [35], AdaCost [36], and CSB1-2 [37].
Hao and Huang [38] proposed an algorithm that begins by obtaining a group of k base classifiers through k-fold cross validation on a set of training samples. A subclassifier was then obtained using a voting method. Concurrently, weight coefficients for each subclassifier were derived based on the training error, and adjustments were made to the data distribution of the training samples.
Furthermore, multiple base classifier groups contributed to obtaining more diverse subclassifiers. Finally, these subclassifiers were combined with weights to create an integrated enhanced classifier.

IMBoost
3.1. Original AdaBoost. Before delving into our method for improving AdaBoost to deal with imbalanced problems, we first review AdaBoost. Let S = {(x_1, y_1), ..., (x_m, y_m)} be a training set, where each sample x_i belongs to the sample space X and each label y_i belongs to the label space Y.
The concept behind AdaBoost is to utilize a weak learner to make precise predictions by iteratively training it on different distributions of the training samples. The algorithm for AdaBoost is presented in Algorithm 1. AdaBoost assigns weights to the training samples based on their classification errors, increasing the weights of incorrectly classified samples and decreasing those of correctly classified ones.
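For concreteness, a minimal sketch of this loop in Python (decision stumps as the weak learner and labels in {−1, +1}; the variable names and the stump choice are ours, not prescribed by Algorithm 1):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    """Minimal AdaBoost sketch; y must be in {-1, +1}."""
    m = len(y)
    D = np.full(m, 1.0 / m)                 # uniform initial weights
    stumps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.clip(np.sum(D[pred != y]), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)
        D *= np.exp(-alpha * y * pred)      # raise misclassified, lower correct
        D /= D.sum()                        # normalization (the Z_t factor)
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # weighted vote of the weak classifiers
    scores = sum(a * h.predict(X) for h, a in zip(stumps, alphas))
    return np.sign(scores)
```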

3.2. IMBoost.
In our work, we focus on binary imbalanced problems, where the minority class is labeled as y_min = +1 and the majority class is labeled as y_maj = −1. We consider a weak learner that accepts the training set as input along with a distribution D over {1, ..., m}, where m is the number of training samples. Given this input, the weak learner computes a weak hypothesis h that maps x to R. The sign of h(x) determines the predicted label assigned to sample x: if h(x) is positive, the predicted label is the positive class, and if h(x) is negative, the predicted label is the negative class.
In general, AdaBoost is not well suited for imbalanced classification problems. This is because the minority class samples are underrepresented and AdaBoost tends to bias towards the majority class. Consequently, the overall performance of the classifier can be negatively impacted. Therefore, it is important that the weights of minority and majority class samples are adjusted according to the classifier's performance on each of them. In this paper, we propose an improved version of AdaBoost that revises its weighting mechanism for binary imbalanced data sets.
In our method, we first set the initial weights to 1/N_min and 1/N_maj for the minority and majority class samples, respectively, where N_min is the number of samples belonging to the minority class, N_maj is the number of samples belonging to the majority class, and i_min and i_maj denote the ith example belonging to the minority and majority class, respectively.
Similar to the original AdaBoost, we train a weak classifier h_t(x) using the distribution D_t over the training set.
While in AdaBoost the weighted classification error ε is computed over the entire training set, we calculate ε separately for the minority and majority class samples (equations (1) and (2)).
Error rate of minority class samples:

$$\varepsilon_{\min} = \frac{\sum_{i:\, y_i = y_{\min}} D_t(i)\, I\big(h_t(x_i) \neq y_i\big)}{\sum_{i:\, y_i = y_{\min}} D_t(i)}. \tag{1}$$

Error rate of majority class samples:

$$\varepsilon_{\mathrm{maj}} = \frac{\sum_{i:\, y_i = y_{\mathrm{maj}}} D_t(i)\, I\big(h_t(x_i) \neq y_i\big)}{\sum_{i:\, y_i = y_{\mathrm{maj}}} D_t(i)}, \tag{2}$$

where I is an indicator that yields a value of 1 if the condition inside the parentheses is true and 0 if it is false. Then, the weights of h_t on the minority and majority samples, denoted by α_t^min and α_t^maj, are computed based on ε_min and ε_maj. Intuitively, α_t^min and α_t^maj measure the significance of h_t on the minority and majority classes; α_t^min and α_t^maj get larger as ε_min and ε_maj get smaller. They are calculated as follows:

$$\alpha_t^{\min} = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_{\min}}{\varepsilon_{\min}}\right), \qquad \alpha_t^{\mathrm{maj}} = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_{\mathrm{maj}}}{\varepsilon_{\mathrm{maj}}}\right). \tag{3}$$

Once the error rates and hypothesis weights are computed, the weights of the training samples are modified as

$$D_{t+1}(i) = \frac{D_t(i)\, \exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}, \qquad \alpha_t = \begin{cases} \alpha_t^{\min}, & y_i = y_{\min}, \\ \alpha_t^{\mathrm{maj}}, & y_i = y_{\mathrm{maj}}, \end{cases} \tag{4}$$

where Z_t is a normalization factor and exp(−α_t^min y_i h_t(x_i)) and exp(−α_t^maj y_i h_t(x_i)) are the weight update factors for the minority and majority class samples, respectively. Samples misclassified by h_t get higher weights, and correctly classified samples get lower weights. Subsequently, the training data undergo a resampling process based on the updated weights of the samples. At this stage, the weights of both minority and majority class samples are reinitialized. In the updated data set, samples with higher weights have a higher likelihood of being repeated multiple times. As a result, the classifier becomes biased towards these samples, leading to improved learning performance. Finally, the final classifier H(x) is obtained by combining all weak classifiers using their weights. The pseudocode of our method is presented in Algorithm 2.
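To make the round structure concrete, here is a sketch of one IMBoost iteration under our reading of equations (1)–(4); the clipping guard and the exact resampling call are our assumptions, not the authors' reference implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def imboost_round(X, y, D, rng):
    """One IMBoost round as we read it; y in {-1, +1}, +1 = minority."""
    h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
    pred = h.predict(X)
    mino, majo = (y == 1), (y == -1)
    # per-class weighted error rates (equations (1) and (2))
    eps_min = np.clip(D[mino & (pred != y)].sum() / D[mino].sum(), 1e-10, 1 - 1e-10)
    eps_maj = np.clip(D[majo & (pred != y)].sum() / D[majo].sum(), 1e-10, 1 - 1e-10)
    # per-class hypothesis weights (equation (3))
    a_min = 0.5 * np.log((1 - eps_min) / eps_min)
    a_maj = 0.5 * np.log((1 - eps_maj) / eps_maj)
    # weight update with the class-dependent alpha (equation (4))
    alpha = np.where(mino, a_min, a_maj)
    D = D * np.exp(-alpha * y * pred)
    D /= D.sum()
    # resample with replacement: heavier samples repeat more often
    idx = rng.choice(len(y), size=len(y), p=D)
    X_new, y_new = X[idx], y[idx]
    # reinitialize per-class weights (1/N_min, 1/N_maj), then normalize
    D_new = np.where(y_new == 1,
                     1.0 / max((y_new == 1).sum(), 1),
                     1.0 / max((y_new == -1).sum(), 1))
    return h, (a_min, a_maj), X_new, y_new, D_new / D_new.sum()
```

Iterating this for T rounds (with `rng = np.random.default_rng(0)`, say) and combining the weak classifiers through their weights yields H(x); the text does not pin down how the two α values are mixed at prediction time, so applying each class-specific α as in equation (4) is one natural choice.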

3.3. Forward Stagewise Additive Modeling (FSAM).
Friedman et al. [39] presented a statistical interpretation of the binary AdaBoost algorithm and demonstrated that two-class AdaBoost can be considered as forward stagewise additive modeling using the exponential loss function

$$L\big(y, f(x)\big) = \exp\big(-y f(x)\big). \tag{5}$$

In this part, we show that our method can also be interpreted as forward stagewise additive modeling employing the exponential loss.
Given the training set, we try to find

$$f^* = \arg\min_{f} \sum_{i=1}^{m} L\big(y_i, f(x_i)\big). \tag{6}$$

We assume that f(x) has the following form:

$$f(x) = \sum_{m=1}^{M} \beta^{(m)} g^{(m)}(x),$$

where β^(m) is the coefficient and g^(m)(x) is the basis function. We need g(x) to satisfy the symmetric constraint

$$g: x \longrightarrow \{y_{\min}, y_{\mathrm{maj}}\}.$$
FSAM aims to build a predictive model by incrementally adding simple base models to the existing model without adjusting the parameters and coefficients of the models that have already been added. The pseudocode of FSAM is presented in Algorithm 3. Now, using the exponential loss, we are trying to find β^(m) and g^(m)(x) that solve the following equation:

$$\big(\beta^{(m)}, g^{(m)}\big) = \arg\min_{\beta, g} \sum_{i=1}^{m} \exp\Big[-y_i \big(f_{m-1}(x_i) + \beta\, g(x_i)\big)\Big]. \tag{7}$$

Assuming y_i to be y_i^min or y_i^maj, we expand equation (7) as follows:

$$\big(\beta_{\min}^{(m)}, \beta_{\mathrm{maj}}^{(m)}, G^{(m)}\big) = \arg\min_{\beta_{\min}, \beta_{\mathrm{maj}}, G} \sum_{i \in \min} w_i^{\min} \exp\big[-\beta_{\min}\, y_i\, G(x_i)\big] + \sum_{i \in \mathrm{maj}} w_i^{\mathrm{maj}} \exp\big[-\beta_{\mathrm{maj}}\, y_i\, G(x_i)\big], \tag{8}$$

where

$$w_i^{\min} = \exp\big[-y_i f_{m-1}(x_i)\big], \quad i \in \min, \qquad w_i^{\mathrm{maj}} = \exp\big[-y_i f_{m-1}(x_i)\big], \quad i \in \mathrm{maj}, \tag{9}$$

are the weights of the minority and majority class samples, respectively. It is worth noting that when the data set is balanced, (8) reduces to (7). Moreover, note that in (8), w_i^min and w_i^maj are not functions of G and β; they depend on f_{m−1} and change at each iteration. We can expand equation (8) as follows:

$$\big(e^{\beta_{\min}} - e^{-\beta_{\min}}\big) \sum_{i \in \min} w_i^{\min} I\big(y_i \neq G(x_i)\big) + e^{-\beta_{\min}} \sum_{i \in \min} w_i^{\min} + \big(e^{\beta_{\mathrm{maj}}} - e^{-\beta_{\mathrm{maj}}}\big) \sum_{i \in \mathrm{maj}} w_i^{\mathrm{maj}} I\big(y_i \neq G(x_i)\big) + e^{-\beta_{\mathrm{maj}}} \sum_{i \in \mathrm{maj}} w_i^{\mathrm{maj}}. \tag{10}$$

Now, we minimize (10). To do this, we first assume β_min and β_maj are fixed and minimize over G. Therefore, for fixed β_min and β_maj, G(x) is as follows:

$$G^{(m)} = \arg\min_{G}\; \big(e^{\beta_{\min}} - e^{-\beta_{\min}}\big) \sum_{i \in \min} w_i^{\min} I\big(y_i \neq G(x_i)\big) + \big(e^{\beta_{\mathrm{maj}}} - e^{-\beta_{\mathrm{maj}}}\big) \sum_{i \in \mathrm{maj}} w_i^{\mathrm{maj}} I\big(y_i \neq G(x_i)\big). \tag{11}$$

Next, we minimize over β_min and β_maj. We keep β_maj constant, take the derivative with respect to β_min, and set it to zero, which yields

$$\beta_{\min}^{(m)} = \frac{1}{2} \ln\!\left(\frac{1 - \mathrm{err}_{\min}}{\mathrm{err}_{\min}}\right), \qquad \mathrm{err}_{\min} = \frac{\sum_{i \in \min} w_i^{\min} I\big(y_i \neq G^{(m)}(x_i)\big)}{\sum_{i \in \min} w_i^{\min}}. \tag{12}$$

For β_maj, we perform the same calculation as in (12). Thus, β_maj is calculated as follows:

$$\beta_{\mathrm{maj}}^{(m)} = \frac{1}{2} \ln\!\left(\frac{1 - \mathrm{err}_{\mathrm{maj}}}{\mathrm{err}_{\mathrm{maj}}}\right), \qquad \mathrm{err}_{\mathrm{maj}} = \frac{\sum_{i \in \mathrm{maj}} w_i^{\mathrm{maj}} I\big(y_i \neq G^{(m)}(x_i)\big)}{\sum_{i \in \mathrm{maj}} w_i^{\mathrm{maj}}}. \tag{13}$$

Finally, the model and the weights are updated according to equations (14) and (15), respectively:

$$f_m(x) = f_{m-1}(x) + \beta^{(m)} G^{(m)}(x), \tag{14}$$

$$w_i \leftarrow w_i \exp\big[-\beta^{(m)}\, y_i\, G^{(m)}(x_i)\big], \tag{15}$$

where β^(m) equals β_min^(m) for minority samples and β_maj^(m) for majority samples.

ALGORITHM 2: The pseudocode of IMBoost.
Input: training set S = {(x_1, y_1), ..., (x_m, y_m)}, where x_i is the feature vector and y_i is the label; number of iterations: T.
Output: final classifier H(x).
(1) Initialize the weights: D_1(i) = 1/N_min for minority samples and D_1(i) = 1/N_maj for majority samples.
(2) For t = 1 to T:
  (a) Train a weak classifier h_t(x) using the distribution D_t.
  (b) Compute the error rates ε_min and ε_maj with respect to D_t, as given by equations (1) and (2).
  (c) Compute the hypothesis weights α_t^min and α_t^maj using equation (3).
  (d) Update the sample weights using equation (4), where Z_t is a normalization factor.
  (e) Resample the training data according to the weights and initialize the weights of the samples as in step (1).
(3) Output the final classifier H(x).

ALGORITHM 3: The pseudocode of FSAM.
Input: training set S = {(x_1, y_1), ..., (x_m, y_m)}; loss function L; number of stages M.
Output: final model f_M(x).
(1) Initialize f_0(x) = 0.
(2) For m = 1 to M:
  (a) Compute (β^(m), g^(m)) = argmin_{β,g} Σ_i L(y_i, f_{m−1}(x_i) + β g(x_i)).
  (b) Set f_m(x) = f_{m−1}(x) + β^(m) g^(m)(x).
(3) Output f_M(x).
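For completeness, a compact FSAM sketch with the exponential loss (a single shared β per stage, which recovers discrete AdaBoost; the two-β variant above follows the same pattern with class-wise weights, and the stump learner is our choice):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fsam_exponential(X, y, M=10):
    """Forward stagewise additive modeling with exponential loss.

    y in {-1, +1}. Each stage fits one basis function g and its
    coefficient beta, never revisiting earlier stages.
    """
    n = len(y)
    f = np.zeros(n)                        # current additive model on train set
    stages = []
    for _ in range(M):
        w = np.exp(-y * f)                 # per-sample weights induced by f
        w /= w.sum()
        g = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = g.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        beta = 0.5 * np.log((1 - err) / err)   # closed-form minimizer
        f += beta * pred                   # stagewise update, old stages fixed
        stages.append((beta, g))
    return stages
```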

Experimental Setups
Our goal is to demonstrate that the proposed method can improve the performance of AdaBoost on imbalanced data sets. To do this, we conducted comprehensive experiments and compared our method with state-of-the-art methods on different data sets with different IRs. In the following section, information about the data sets and the tuned parameters is presented first, followed by the metrics used.

4.1. Data Sets.
In this study, four synthetic data sets and 30 imbalanced data sets available at KEEL [40] are used to evaluate the proposed method. The distribution of these synthetic data sets is presented in Figure 1. Tables 1 and 2 show the features of the data sets, including the number of attributes, the number of samples, and the ratio of the number of majority class samples to minority ones (IR). In the experiments, the area under the ROC curve, the area under the precision-recall curve, and the geometric mean (G-mean) are estimated using 5-fold cross validation; each data set is partitioned into five folds, with each fold containing an equal number of samples. The learning algorithms are then trained on four of the folds and tested on the remaining fold in each iteration.
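A sketch of this protocol as one might set it up (sklearn's make_classification standing in for the synthetic generators and stratified folds assumed; the paper's exact generators and fold construction may differ):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score

# An imbalanced toy set: ~5% minority, some label noise and class overlap.
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, weights=[0.95, 0.05],
                           flip_y=0.02, class_sep=0.8, random_state=0)

scores = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True,
                              random_state=0).split(X, y):
    clf = AdaBoostClassifier(n_estimators=10).fit(X[tr], y[tr])
    scores.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
print(f"ROC AUC: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```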

4.2. Tuning of Parameters.
The performance of an algorithm can be greatly influenced by the assigned parameter values. Modifying parameter values in algorithms can yield different solutions. This subsection focuses on parameter tuning, where values are selected in a manner that facilitates comparison with other methods under similar conditions. To compare IMBoost with other methods, six methods, i.e., OUBoost, AdaCC, SMOTEBoost, AdaCost, AdaC1, and AdaBoost, are used. In Table 3, the relevant parameters for all methods are presented. Other parameters are set according to the original papers.

4.3. Metrics.
In this section, we introduce the metrics used, namely, the area under the ROC curve, the area under the precision-recall curve, and the geometric mean (G-mean).

4.3.1. ROC and Precision-Recall AUC.
Both the ROC AUC and the precision-recall AUC (PR AUC) are used for imbalanced problems; however, PR AUC is more robust under imbalanced data sets [41]. ROC AUC measures the area under the ROC curve, while PR AUC measures the area under the precision-recall curve (equation (16)):

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}. \tag{16}$$
The ROC AUC represents the area under this curve and measures how well the model can distinguish between minority and majority instances. On the other hand, PR AUC focuses on the performance of the model on the positive class and is particularly relevant when dealing with imbalanced data sets.
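All three metrics are available off the shelf; a hedged helper sketch follows (average_precision_score used as the PR AUC estimate, G-mean computed from sensitivity and specificity at an assumed 0.5 threshold):

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             confusion_matrix)

def evaluate(y_true, y_score, threshold=0.5):
    """ROC AUC, PR AUC, and G-mean for binary labels in {0, 1}."""
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)      # recall on the positive (minority) class
    specificity = tn / (tn + fp)      # recall on the negative (majority) class
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "pr_auc": average_precision_score(y_true, y_score),
        "g_mean": np.sqrt(sensitivity * specificity),
    }
```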

Results and Discussion
In this section, the results of the methods are provided and compared. First, we investigate the performance of the methods on the synthetic data sets. Then, we compare their outcomes on the 30 KEEL data sets.

5.1. Results on Synthetic Data Sets.
To facilitate a more comprehensive comparison, we created four synthetic data sets, each with a different imbalance ratio (IR) and incorporating noise and outliers. In Tables 4-6, we present the performance of the various methods based on ROC AUC, PR AUC, and G-mean. The results reveal that our method achieves the lowest ROC AUC score on synthetic 1 but attains the highest ROC AUC scores on synthetic data sets 2 and 3. In Figure 2, we depict the performance of the different methods on the synthetic data sets. It is evident that our method outperforms the others, especially on data sets with a high IR. However, when considering the PR AUC, our method falls short on synthetics 2 and 4. Furthermore, under the G-mean metric, our method exhibits slightly worse performance on synthetic 4 compared to the other methods.

5.2. Results on KEEL Data Sets.
The efficiency of IMBoost is reported in terms of ROC AUC, PR AUC, and G-mean in Tables 7-9. In these tables, each row corresponds to the results of our method, OUBoost, AdaCC, SMOTEBoost, AdaCost, AdaC1, and AdaBoost on one data set, where the average of the mentioned measures over 5-fold cross validation is reported for each method. The last row of each table reports the average and the number of times each method achieves the first rank.
In Table 7, with respect to the ROC AUC metric, our method achieves the first rank 13 times, while OUBoost and AdaCC hold the second-best rank. Figure 3 compares the average of the obtained results on the different data sets and shows that the proposed method performs better than OUBoost, AdaCC, SMOTEBoost, AdaCost, AdaC1, and AdaBoost. Although OUBoost and AdaCC achieve the first rank more often than SMOTEBoost, SMOTEBoost outperforms them on average. ROC AUC serves as a standard evaluation measure, encompassing both positive (minority) and negative (majority) samples.
Consequently, the precision-recall AUC (PR AUC) emerges as a valuable metric for assessing the performance of the methods specifically on the minority class. In Table 8, we evaluate the performance of the various methods based on PR AUC. Table 8 shows that our method attains the top rank 12 times, while AdaCC secures the first rank on 8 data sets, highlighting the excellence of our method. However, when we consider the average results, there is no significant disparity in performance among these methods. In Figure 4, the performances of the models are compared in terms of their averages.
Moreover, Table 9 shows that our method outperforms the other methods in terms of the G-mean metric. According to the results, our method obtains the first rank 13 times, indicating its superior performance. Figure 5 further illustrates that our method performs well on average.
In addition, Figure 6 depicts the performance of the top four methods, namely, our method, OUBoost, AdaCC, and SMOTEBoost, as a function of the IR. The regression line associated with each classifier illustrates how its performance decreases as the IR increases. However, our method shows a less pronounced reduction in performance; after ecoli3 (with an IR of 8.6), it demonstrates higher performance than the others. In other words, our method performs well on data sets with a higher IR compared to the other methods.
Based on the results obtained from both the synthetic and KEEL data sets, we can conclude that our method exhibits superior performance on data sets with a high IR.

5.3. The Effect of Classifiers.
In this section, we discuss the impact of the number of classifiers on our method. Table 10 presents the results of our method with different numbers of classifiers, displaying the average results and the number of times it achieved the first rank. According to the findings, our method performs well with 10 classifiers, and the addition of extra classifiers does not significantly improve its average performance. However, it is worth noting that our method with 20 and 30 classifiers still outperforms the aforementioned methods on average. Furthermore, our method with different numbers of classifiers still performs poorly on specific data sets, namely, glass1, Wisconsin, glass0, vehicle3, ecoli2, glass2, page-blocks-1-3_vs_4, and abalone9-18. However, when utilizing 20 classifiers, our method demonstrates improved performance on haberman and glass-0-1-2-3_vs_4-5-6. In addition, with 50 classifiers, it performs well on vowel0 and ecoli4, and with 100 classifiers, it achieves good results on new-thyroid1.
Figure 7 illustrates the impact of the number of classifiers, with the x-axis representing the number of classifiers and the y-axis indicating the average results across data sets.
As depicted, the performance of our method decreases as the number of classifiers increases. These results suggest that adding more classifiers is not a suitable way to enhance the performance of IMBoost.

5.4. Time Complexity.
For further analysis, we investigate the time complexity of our method and compare it with the other methods. Table 11 provides a comparison of the execution times of the methods, presented in seconds. It is noteworthy that our method, which includes 10 classifiers, exhibits the worst execution time among the methods. In contrast, AdaC1 demonstrates the lowest average execution time, followed by AdaBoost as the second best. However, their performance is not satisfactory compared to the other methods. Based on the results, it is evident that increasing the number of classifiers in our method leads to higher time complexity. Therefore, since a larger number of classifiers does not improve the performance of our method but increases the time complexity, it is not advisable to increase the number of classifiers.

5.5. Nonparametric Statistical Tests.
To assess the presence of a significant difference between the methods, statistical tests are conducted on the obtained results. To conduct these tests, we employ the two-stage approach proposed by Demšar [42].
In the initial stage, a statistical test based on the ranking of the algorithms according to their performance is conducted. The null hypothesis assumes that the algorithms perform equally. Rejecting the null hypothesis indicates statistically significant differences in the performance of the algorithms.
In the second stage, the algorithm with the highest rank is identified as the "control algorithm." It is subsequently compared to the other algorithms in pairwise comparisons using various nonparametric post hoc statistical tests, such as Holm [43], Hochberg [44], and Hommel [45].
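A sketch of this two-stage procedure with SciPy (Friedman omnibus test, then pairwise Wilcoxon tests against the control with a Holm step-down correction; rank ties and monotonicity enforcement are omitted for brevity, and the exact post hoc variants in [43-45] differ in their adjustment rules):

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def two_stage_test(scores, names, alpha=0.05):
    """scores[i, j] = metric of method j on data set i (higher is better)."""
    stat, p = friedmanchisquare(*scores.T)   # one sample per method
    print(f"Friedman statistic={stat:.3f}, p={p:.4f}")
    if p >= alpha:
        return  # no significant differences detected; stop here
    # average rank of each method across data sets (ties ignored for brevity)
    ranks = (-scores).argsort(axis=1).argsort(axis=1).mean(axis=0) + 1
    control = int(np.argmin(ranks))          # best (lowest) average rank
    print(f"control algorithm: {names[control]}")
    # pairwise Wilcoxon signed-rank tests vs. the control, Holm step-down
    others = [j for j in range(scores.shape[1]) if j != control]
    pvals = [wilcoxon(scores[:, control], scores[:, j]).pvalue for j in others]
    for k, idx in enumerate(np.argsort(pvals)):
        adj = min(pvals[idx] * (len(pvals) - k), 1.0)  # Holm adjustment
        print(f"{names[others[idx]]}: adjusted p={adj:.4f}")
```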
Table 12 presents the ranking results based on the ROC AUC using the Friedman test. The p value, which is smaller than the significance level, indicates the rejection of the null hypothesis, demonstrating significant differences in performance among the algorithms. Our method ranks first among the algorithms in terms of ROC AUC and is, therefore, designated as the "control algorithm." Subsequently, it is compared pairwise to the other algorithms using post hoc statistical tests, including Holm, Hochberg, and Hommel. The results of these tests are reported in Table 13.

Table 13, at a significance level of 0.05, shows that the control algorithm outperforms the other algorithms pairwise, except for AdaCC. To gain insight into how our method performs compared to AdaCC, we conducted a pairwise comparison using the Wilcoxon signed-rank test at a significance level of 0.05. As indicated in Table 14, the p value exceeds the 0.05 significance level. In other words, the observed difference is not substantial enough to be considered statistically significant. Consequently, concerning the ROC AUC metric, there is only a marginal difference between our method and AdaCC. However, when considering the average performance across all data sets, our method surpasses AdaCC, and on most of the data sets, IMBoost demonstrates superior performance. In identical experiments using the PR AUC, as shown in Table 15, AdaCC secures the top position and is designated as the "control algorithm." However, the outcomes of the Holm, Hochberg, and Hommel tests, as presented in Table 16, reveal that AdaCC does not significantly outperform our method. In Table 17, the Wilcoxon signed-rank test, conducted at a significance level of 0.05, indicates no significant difference between our method and AdaCC. In addition, we conducted the Friedman test based on G-mean, and the results are presented in Table 18. According to these results, our method claims the first rank and is designated as the "control algorithm." Moreover, the outcomes of the Holm, Hochberg, and Hommel tests in Table 19 indicate that our method performs better than the other methods at a significance level of 0.05, with the exception of AdaCC. Consequently, the Wilcoxon test is applied at a significance level of 0.05. As shown in Table 20, based on the G-mean metric, our method outperforms the other methods.
Overall, the evaluation conducted using the Friedman test reveals that the proposed method attains a higher rank than the other techniques, with AdaCC securing the second rank. Subsequent post hoc tests confirm that the employed algorithms exhibit varying performance in terms of G-mean and that our method demonstrates the best performance.

Conclusion
Most studies addressing imbalanced problems commonly employ over- and undersampling techniques. However, these methods may introduce noisy data or discard important information, respectively. In this paper, we propose a novel method to enhance the performance of AdaBoost on imbalanced data sets. Our approach involves initializing the weights of minority and majority samples based on their distribution. Subsequently, the weights are updated according to the error of the classifier on the minority and majority samples separately. The data set is then resampled based on these updated weights, and this process is iterated.
To evaluate the effectiveness of our method, we compare it with six ensemble methods on 34 data sets. The performance of these methods is measured using the ROC AUC, PR AUC, and G-mean metrics. The results based on these metrics demonstrate that IMBoost outperforms the others, consistently achieving the highest rank on most data sets. In addition, our method exhibits strong performance on data sets with high imbalance ratios. According to the statistical tests based on the PR and ROC AUC measures, no significant difference was observed between AdaCC and IMBoost; under the G-mean measure, however, our method's superiority was confirmed. It is important to note that our method has a drawback in terms of time complexity, since the increase in samples leads to higher computational requirements.
For future studies, extending IMBoost to multiclass data sets or semisupervised problems could be explored. In addition, applying metaheuristics and comparing the results with our method could be a promising direction for further research.

Figure 1: The distribution of the synthetic data sets.

Figure 6: Regression lines fitted to the AUC against the IR.

Figure 7: The effect of the number of classifiers on the performance of our method.

Table 1: The attributes of the synthetic data sets.

Table 2: Attributes of the data sets.

Table 4: The average ROC AUC of our method and six state-of-the-art methods on the synthetic data sets.

Table 5: The average PR AUC of our method and six state-of-the-art methods on the synthetic data sets.

Table 6: The average G-mean of our method and six state-of-the-art methods on the synthetic data sets.

Table 7: Comparison of the ROC AUC of the different methods (the best results are in bold).

Table 8: Comparison of the PR AUC of the different methods (the best results are in bold).

Table 9: Comparison of the G-mean of the different methods (the best results are in bold).

Table 10: The average AUC of our method with different numbers of classifiers. Based on the number of classifiers, bold values represent the best result for each data set.

Table 11: The execution time of our method compared to five state-of-the-art methods.

Table 12: Average rankings of the algorithms based on the ROC AUC (p value computed by the Friedman test = 0; Friedman statistic = 49.175).

Table 13: Post hoc comparison for α = 0.05 based on the ROC AUC.

Table 14: Results of the Wilcoxon signed-rank test based on the ROC AUC. W+ corresponds to our method and W− to AdaCC.

Table 15: Average rankings of the algorithms based on the PR AUC (p value computed by the Friedman test = 0; Friedman statistic = 25.557143).

Table 16: Post hoc comparison for α = 0.05 based on the PR AUC.

Table 17: Results of the Wilcoxon signed-rank test based on the PR AUC. W− corresponds to our method and W+ to AdaCC.

Table 18: Average rankings of the algorithms based on G-mean (p value computed by the Friedman test = 0; Friedman statistic = 50.067857).

Table 19: Post hoc comparison for α = 0.05 based on G-mean.

Table 20: Results of the Wilcoxon signed-rank test based on G-mean. W+ corresponds to our method and W− to AdaCC.