Recurrent Adaptive Classifier Ensemble for Handling Recurring Concept Drifts



Introduction
Advances in technology in recent years have led to an upsurge in the number of applications that generate data streams at unprecedented volume and speed. Examples of such real-world applications include network intrusion detection [1], sensor networks, spam filtering systems [2], and credit card fraud detection [3].
One of the biggest challenges in data stream learning is concept drift [4], where the data generating mechanism is constantly evolving and the statistical properties of the target concept change over time. Changes in the underlying distribution of the data often lead to a drastic drop in the predictive performance of the learning model. Wang et al. [3] described the term concept in machine learning as the quantity that a learning model is trying to predict. Concept drift often occurs in real-world applications, for example, in weather prediction, where prediction models may change with the seasons, and in consumer behavior, where preferences may change over time due to seasons, fashion, and the economy.
An efficient and effective online learning model must be able to recognize such changes and respond to them accurately. In streaming data, different types of concept drift can be identified. Concept drifts can be categorized based on their speed into sudden and gradual drifts [4]. Sudden concept drift is characterized by severe changes between the underlying class distribution and the incoming instances in a relatively short amount of time. In gradual concept drift, it takes a relatively long time for significant differences to appear between the underlying class distributions of the old instances and the incoming instances. Regardless of the type of drift currently occurring, an online learning model must be able to track the drift, recognize its type, and adapt to changes accordingly.
In many real-world applications, it is common that patterns or concepts recur over time. Context recurrence is a common situation concerning concept drift. Domains associated with context recurrence include weather prediction, where learning models change according to seasons, as well as financial prediction and dynamic control. Recurring contexts may occur due to cyclic phenomena such as seasons of the year or may be associated with irregular phenomena such as inflation rates or market conditions. This phenomenon of recurring concepts is one of the key challenges that online learning algorithms [5] need to deal with. Existing algorithms treat recurring concepts as new concepts, thereby increasing computational overheads as more classification models are generated. If patterns or concepts recur, previously learned classification models should be reapplied so that the predictive performance of the learning model can be optimized. The application of previously learned models may affect learning of the current concept both negatively and positively.
Preserving all previously learned classification models induces overheads in both storage and computation, for example, when repeatedly assessing the performance of previously learned classification models on new training data. For this reason, the number of preserved models should be subject to some constraints instead of increasing indefinitely. A selection scheme is required to decide which previously learned classification models should be preserved. As learning algorithms work at handling different kinds of drift, they tend to better represent the last observed concepts and discard previously learned concepts. Two research questions need to be answered when designing an ensemble classifier to handle recurring concepts: which previously learned classification models should be preserved for future use, and how should the preserved classification models be exploited to facilitate adaptation to recurring concepts?
To address the above research questions, this paper first reviews the latest progress on machine learning algorithms for handling recurring concepts and then proposes the Recurrent Adaptive Classifier Ensemble (RACE), specifically designed to handle recurring concept drifts in dynamic environments. RACE employs J48 Decision Tree, Multilayer Perceptrons (MLPs), and Support Vector Machines (SVMs) as base learners in order to maximize diversity and create dynamic decision boundaries separating the training instances, a change detection algorithm, and a diversity based strategy for preserving previously learned models to handle recurring concepts. When a new data chunk arrives, classification models of high diversity are adapted to the new training data. The rest of this paper is organized as follows. Section 2 presents a review of related work. Section 3 introduces the Recurrent Adaptive Classifier Ensemble (RACE). Section 4 presents the empirical comparison between RACE and other state-of-the-art algorithms designed to handle recurring concepts on selected datasets, considering the accuracies achieved and how the algorithms handle recurring concepts.

Related Work
Scenarios associated with recurring concepts are not uncommon, and a number of contemporary approaches have been proposed to address recurring concepts with minimum overheads. Many machine learning techniques have emerged in the literature as candidate solutions, and ensemble classifiers have demonstrated the ability to handle different types of drifting concepts in nonstationary environments. Hassan [6] proposed a concept drift adaptation technique in a distributed environment for real-world data streams.
The algorithm uses a drift detection method; if concept drift is detected, it retrains the model, and knowledge of previously learned concepts is lost. The approach does not automatically identify the type of drift. Sarnovsky [7] proposed the heterogeneous adaptive ensemble model for data stream classification, which utilizes a dynamic class weighting scheme and a mechanism to maintain the diversity of the ensemble members. The algorithm handles recurring concepts only implicitly, and classifiers with lower weights are discarded, making it difficult to handle recurring concepts. Liu [8] proposed an instance based ensemble learning algorithm called the diverse instance weighting ensemble (DiwE). The algorithm weights classifiers according to their performance, and poorly performing classifiers are discarded. Heusinger [9] proposed a combination of modified versions of Robust Soft Learning Vector Quantization (RSLVQ) and Generalized Learning Vector Quantization (GLVQ) to learn streaming data and adapt to all types of concept drift. The integration of Adadelta and Adamax into RSLVQ and GLVQ improved the prediction performance over their vanilla versions. The combined algorithm does not detect drifts and does not handle concept drift explicitly.
Zheng [10] proposed a semisupervised classification algorithm for data streams with recurring concept drift and concept evolution in which the data are only partially labeled. The framework applies a Jensen-Shannon divergence based change detection technique to the classifier confidence score, instead of the classification error rate, to detect recurring concept drift. The algorithm uses too many parameters that are difficult to tune. Namitha [11] proposed a novel algorithm to identify recurring concepts in data stream clustering. If concept drift is detected, the algorithm retrieves the best matching model from the repository. The algorithm has no strategy to prevent the repository from growing indefinitely. Wing [12] proposed a bagging ensemble that adapts to concept drift by using a dynamic cost-sensitive weighting scheme for component classifiers according to their classification performances and stochastic sensitivities. The algorithm discards classifiers whose weight is below a predefined threshold, making it unable to adapt to recurring concepts. Zang [13] presented the drift detection based incremental ensemble (DIE) that combines concept drift detection with a component update mechanism to react to different types of concept drift. DIE assigns weights to classifiers and discards classifiers whose weight is below a predefined threshold, making it difficult to react to recurring concepts.
Baidari [14] proposed the Accuracy Weighted Diversity based Online Boosting (AWDOB), which is based on Adaptable Diversity based Online Boosting (ADOB). AWDOB uses an accuracy weighting scheme that exploits the accuracy of the current expert and the number of correctly and incorrectly classified instances of all experts to assign the current expert weight to the current instance in the data stream. Experts with lower weights are discarded from the ensemble. The process of calculating and assigning weights takes time and slows the learning process. Gu [15] presented a novel self-organizing fuzzy inference ensemble framework (SOFEnsemble) which is capable of self-learning, processing streaming data on a chunk by chunk basis, and continuously self-updating the decision boundaries by identifying the more representative samples. SOFEnsemble has a high computational efficiency, but the use of fuzzy inference slows down the learning process. Zeng [16] proposed a chunk based incremental ensemble algorithm called Dynamic Updated Ensemble (DUE) for learning imbalanced data streams with concept drift. DUE periodically updates previous components to make the ensemble react to different kinds of concept drift, and the final decision on testing events is based on the weighted voting value of a certain number of best performing classifiers. DUE discards classifiers whose weight is below a predefined threshold, making it unable to accurately react to recurring concepts. Liu et al. [17] proposed a comprehensive online active learning framework (CALMID) that includes an ensemble classifier, a drift detector, a label sliding window, sample sliding windows, and an initialization training sample sequence to learn concept drift. The algorithm has a sample weight formula that assigns weights to classifiers. CALMID was found to be effective and efficient when compared to other state-of-the-art algorithms.
Applied Computational Intelligence and Soft Computing
Most of the ensemble approaches proposed in the literature handle recurring concepts by relearning them as if the concepts were new and not recurring. Existing ensemble classifiers for recurring concepts share a common weakness; that is, when a new data chunk arrives, all the ensembles utilize all previously learned concepts without adapting them to new training data. None of the proposed approaches explores the exploitation of highly diverse previously learned models to handle recurring concepts by first adapting them to the new training data. Therefore, in this paper, a novel and evolving ensemble learning approach called Recurrent Adaptive Classifier Ensemble (RACE) is presented. RACE stores highly diverse models and does not directly combine the prediction outputs of the models. Instead, each diverse model in the archive is first adapted to the new training data, and the model whose removal further increases the diversity of the ensemble is removed from the archive.
In the next section, we present our proposed approach, the Recurrent Adaptive Classifier Ensemble (RACE), that explicitly exploits diversity to handle recurring concepts.

Recurrent Adaptive Classifier Ensemble (RACE)
The Recurrent Adaptive Classifier Ensemble (RACE) employs Support Vector Machines (SVMs) as the base learner.
The algorithm first builds a support vector, denoted as $f_1$, with the first streaming data chunk and stores it in an archive. When a new data chunk arrives, the drift detection algorithm checks whether the data chunk is drawn from the same distribution as the chunk from which the first support vector was built. If the data chunk is from a different underlying distribution, the preserved support vector is adapted to the new data chunk and a new support vector is built from scratch from the new data chunk. The adapted support vector and the new support vector are combined to constitute an ensemble that performs classification at time $t$. RACE does not directly combine the prediction outputs of the stored models in the library. Each preserved previously learned model is first adapted to fit the current data, and then the adapted models and the newly constructed model from the most recent data chunk are combined. Previously learned models are preserved according to a diversity based criterion as opposed to an accuracy based criterion, as the base classifiers have to perform diversely for the ensemble of classifiers to improve its prediction performance. RACE uses Yule's Q Statistic [18] as a diversity measure to minimize the ensemble error.
The diversity measure is recommended due to its simplicity and ease of interpretation [19]. RACE stores highly diverse previously learned models, using the diversity measure to keep only diverse models [20]. The previously learned diverse models are then adapted to the current concept via knowledge transfer. Transfer learning is appropriate as it improves the learning process in terms of accuracy and learning efficiency. To learn new concepts, previously learned diverse models are employed as initial candidates of the ensemble. RACE adapts each previously learned model in the archive to the new training data. The adapted models and the model learned from new training data are combined to predict incoming instances.
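To make the diversity measure concrete, the following is a minimal sketch of the pairwise Yule's Q statistic, assuming each classifier is summarized by a list of per-instance correctness flags on a common evaluation chunk. The function names and the convention of returning 0 when the statistic is undefined are assumptions made for illustration, not details taken from RACE. Values near 1 indicate classifiers that tend to be right and wrong on the same instances, while values near -1 indicate classifiers that err on different instances and are therefore more diverse.

```python
from itertools import combinations


def q_statistic(correct_a, correct_b):
    """Yule's Q for two classifiers from per-instance correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1          # both correct
        elif not a and not b:
            n00 += 1          # both wrong
        elif a:
            n10 += 1          # only the first classifier correct
        else:
            n01 += 1          # only the second classifier correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return 0.0            # assumption: treat the undefined case as 0
    return (n11 * n00 - n01 * n10) / denom


def mean_pairwise_q(correct_matrix):
    """Average Q over all classifier pairs; lower means more diverse."""
    pairs = list(combinations(range(len(correct_matrix)), 2))
    if not pairs:
        return 0.0
    total = sum(q_statistic(correct_matrix[i], correct_matrix[j])
                for i, j in pairs)
    return total / len(pairs)
```

Averaging Q over all pairs gives a single ensemble-level score, which is one common way to compare candidate sets of models.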
The newly built model is stored in the archive if it is not full. The model whose removal will lead to the largest diversity among the remaining models is removed from the archive. Algorithm 1 provides a description of the ensemble framework.
Algorithm 2 provides a description of the RACE algorithm.
The Recurrent Adaptive Classifier Ensemble (RACE) uses the Early Drift Detection Method (EDDM) [21] to detect drift. If concept drift is detected, the preserved models are adapted to fit the current data. EDDM is an online learning system since it does not store the training instances for posterior use. The detailed steps of the Recurrent Adaptive Classifier Ensemble (RACE) algorithm are presented in Algorithm 2, with the assumption that $t$ data chunks $D_1, \ldots, D_t$ arrive sequentially.
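The drift detection step can be sketched as follows. This is a simplified reading of EDDM, which monitors the distance (in instances) between consecutive classification errors and signals when the running value of mean plus two standard deviations falls well below its historical maximum. The thresholds 0.95 and 0.90 and the 30-error warm-up are the values commonly quoted for EDDM and are assumptions here, not settings reported for RACE; the class and method names are hypothetical.

```python
import math


class EDDM:
    """Simplified Early Drift Detection Method (illustrative sketch)."""

    def __init__(self, warning=0.95, drift=0.90, min_errors=30):
        self.warning = warning        # assumed warning threshold
        self.drift = drift            # assumed drift threshold
        self.min_errors = min_errors  # assumed warm-up length
        self.reset()

    def reset(self):
        self.n_errors = 0
        self.mean = 0.0               # running mean of error distances
        self.m2 = 0.0                 # running sum of squared deviations
        self.since_last_error = 0
        self.max_score = 0.0          # largest mean + 2 * std seen so far

    def update(self, is_error):
        """Feed one prediction outcome; returns 'stable', 'warning' or 'drift'."""
        self.since_last_error += 1
        if not is_error:
            return "stable"
        distance = self.since_last_error
        self.since_last_error = 0
        # Welford update of the mean and variance of error distances.
        self.n_errors += 1
        delta = distance - self.mean
        self.mean += delta / self.n_errors
        self.m2 += delta * (distance - self.mean)
        score = self.mean + 2.0 * math.sqrt(self.m2 / self.n_errors)
        if score > self.max_score:
            self.max_score = score
            return "stable"
        if self.n_errors < self.min_errors:
            return "stable"
        ratio = score / self.max_score
        if ratio < self.drift:
            return "drift"
        if ratio < self.warning:
            return "warning"
        return "stable"
```

In a chunk based setting, a caller would typically call `reset()` after a drift signal and then trigger the adaptation of the preserved models.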

Model Preservation.
Preserving previously learned models induces overheads in terms of both storage and computation. For example, iteratively assessing the predictive performance of previously learned models on new data is computationally prohibitive. To prevent the ensemble from growing indefinitely, the size of the ensemble is dynamic. Previously learned models are preserved in an archive of size $n$. When a data chunk arrives at step $t$, the drift detection mechanism checks whether the new data chunk is drawn from a different data distribution, and the preserved models in the archive are adapted to fit the current data. The newly generated model from the current data chunk, $f_t$, is stored directly in the archive if the size of the archive is less than $n$. To optimize diversity, the model whose removal will increase diversity among the remaining models in the archive is discarded from the archive. RACE combines the prediction outputs of previously learned diverse models that are representative of the current concept with the prediction output of a new model built with the current data chunk to form final decisions on test instances of the current concept.
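The preservation rule above can be sketched as follows, under the assumption that each archived model is represented by a dictionary holding an identifier and its per-instance correctness flags on the current chunk, and that diversity is scored by the mean pairwise Yule's Q (lower is more diverse). The names `update_archive`, `mean_q`, and the dictionary layout are hypothetical; the tie-breaking behavior and the choice to add the new model before pruning are also assumptions rather than details stated for RACE.

```python
from itertools import combinations


def q_stat(a, b):
    """Pairwise Yule's Q from two lists of correctness flags."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    denom = n11 * n00 + n01 * n10
    return 0.0 if denom == 0 else (n11 * n00 - n01 * n10) / denom


def mean_q(entries):
    """Mean pairwise Q over archive entries; lower means more diverse."""
    pairs = list(combinations(entries, 2))
    if not pairs:
        return 0.0
    return sum(q_stat(a["correct"], b["correct"]) for a, b in pairs) / len(pairs)


def update_archive(archive, new_entry, max_size):
    """Add the new model; if the archive overflows, drop the model whose
    removal leaves the most diverse (lowest mean Q) set of models."""
    candidates = archive + [new_entry]
    if len(candidates) <= max_size:
        return candidates
    drop = min(range(len(candidates)),
               key=lambda i: mean_q(candidates[:i] + candidates[i + 1:]))
    return candidates[:drop] + candidates[drop + 1:]
```

Because every candidate removal requires recomputing pairwise statistics, the cost grows quickly with the archive size, which is one reason to keep the archive bounded.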

Archive Size and Transfer Operation.
The goal is to minimize computational overheads by maintaining a dynamically sized pool of previously learned models from which the ensemble for learning recurring, sudden, and gradual concepts is generated. RACE performs a transfer of every previously learned model to the new streaming data chunks. To improve the time efficiency of RACE, we implement the transfer operation in a parallel processing manner. By parallelizing the transfer operations, the speedup ratio is improved and the runtime is satisfactory for nonstationary environments. Alongside the knowledge transfer operation, the archive size is dynamic to cater for different types of concepts. Parallelization of the transfer operation works best with a reasonable dynamic archive size which does not grow indefinitely, since models that cause diversity among models to decrease are removed from the archive. The implementation of a drift detection mechanism facilitates detection of recurring concepts. To reduce overheads, a dynamic pool from which models are drawn serves as a better starting point. The goal is to maintain accuracy as the ensemble size fluctuates. To validate the behavior of the RACE algorithm, we conduct two experiments. The first experiment evaluates the validity of RACE using knowledge transfer. In the second experiment, the behavior of RACE is evaluated using Hidden Markov Models (HMM).
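A parallel transfer step of the kind described above might be organized as in the following sketch. `ToyModel` and its `adapt` method are hypothetical stand-ins for an archived classifier and its knowledge transfer routine, and the use of a thread pool is an assumption; for CPU-bound learners in CPython, a process pool would usually be the more effective choice.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyModel:
    """Hypothetical stand-in for an archived classifier."""

    def __init__(self, name):
        self.name = name
        self.seen = 0

    def adapt(self, chunk):
        # Placeholder for the real transfer/adaptation routine.
        self.seen += len(chunk)
        return self


def adapt_archive(archive, chunk, max_workers=4):
    """Adapt every archived model to the new chunk concurrently.

    The returned list keeps the archive order, because Executor.map
    yields results in the order of its inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda model: model.adapt(chunk), archive))
```

Each model is adapted independently, so the transfer operations share no state and can run side by side without locking.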

Experimental Configuration
The empirical experiments to assess the performance of RACE were conducted on the Massive Online Analysis (MOA) framework, a software environment for implementing machine learning algorithms and running experiments for online learning. MOA is an open source framework for data stream mining in evolving environments.
The generalization performance of RACE is compared to other state-of-the-art algorithms designed to handle recurring concepts such as the comprehensive online active learning framework (CALMID) [17], Dynamic Updated Ensemble (DUE) [16], Self-Organizing Fuzzy Ensemble Inference System (SOFEnsemble) [15], and Accuracy Weighted Diversity based Online Boosting (AWDOB) [14].

Datasets Used in the Experiments.
We evaluate the performances of the algorithms with data created by five synthetic dataset generators. All data stream generators are available in MOA. The synthetic datasets contain three types of concept drift, namely, gradual, sudden, and recurring concept drift.
The Hyperplane dataset [22] is represented by the set of points $x$ that satisfy $\sum_{i=1}^{d} w_i x_i = w_0$, where $x_i$ is the $i$th coordinate of $x$. Two classes are distinguished in the following way: instances for which $\sum_{i=1}^{d} w_i x_i > w_0$ are labeled positive, and instances for which $\sum_{i=1}^{d} w_i x_i < w_0$ are labeled negative. Drifts are simulated by changing each weight attribute $w_i = w_i + d\alpha$, where $\alpha$ is the probability that the direction of change is reversed and $d$ is the change applied to every instance. This generator was adopted to create a dataset that contains 1,000,000 instances.
The LED dataset [23] is used to predict the digit displayed on a seven-segment LED display. The particular configuration of the generator used for the experiment produces 24 binary attributes, 17 of which are irrelevant. Concept drift is simulated by interchanging relevant attributes. A stream of 1,000,000 instances was generated.
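As an illustration of the Hyperplane generator described above, the following sketch labels a point by comparing its weighted sum to the threshold and drifts the weights by a fixed magnitude whose direction may be reversed with a given probability. This is one reading of the update rule, in which each weight carries a sign that is flipped with probability `alpha` before the change `d` is applied; the function names and this interpretation are assumptions, not the MOA implementation.

```python
import random


def hyperplane_label(x, w, w0):
    """Return 1 (positive) if sum_i w_i * x_i > w0, else 0 (negative)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > w0 else 0


def drift_weights(w, directions, d, alpha, rng):
    """Shift every weight by d in its current direction.

    Each direction (+1 or -1) is reversed with probability alpha before
    the shift is applied. Returns the new weights and directions.
    """
    new_w, new_dirs = [], []
    for wi, sign in zip(w, directions):
        if rng.random() < alpha:
            sign = -sign
        new_dirs.append(sign)
        new_w.append(wi + sign * d)
    return new_w, new_dirs


if __name__ == "__main__":
    rng = random.Random(7)           # fixed seed, chosen arbitrarily
    w, dirs = [0.5, 0.5], [1, 1]
    for _ in range(3):
        x = [rng.random(), rng.random()]
        print(x, hyperplane_label(x, w, w0=0.5))
        w, dirs = drift_weights(w, dirs, d=0.01, alpha=0.1, rng=rng)
```

With a small `d` the decision boundary rotates slowly, which produces the gradual drift this dataset is used for.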
The Random Tree dataset [24] is generated by the Random Tree generator. The dataset contains 1,000,000 instances and 10 attributes. The dataset has four recurring concepts which are evenly distributed among the instances.
The SEA dataset [25] consists of three attributes, where only two are recognized as relevant attributes. All three attributes have values between 0 and 10. The points of the dataset are divided into four blocks with different concepts. In each block, the classification is done using $f_1 + f_2 \le \theta$, where $f_1$ and $f_2$ represent the first two attributes and $\theta$ is a threshold value. The dataset contains 1,000,000 instances.
The last artificial dataset adopted for this study is the STAGGER Boolean Concepts. The dataset presents enough variety of drifts to perform principled studies. It allows a proper analysis considering several types of drift with different amounts of severity and speed. The STAGGER Boolean Concepts dataset generates data with categorical features using a set of rules to determine the class label. The dataset contains three nominal attributes, namely, size = {small, medium, large}, color = {red, green}, and shape = {circular, noncircular}. Concept drift is simulated by changing the items in the rules. Before the first drift, instances are labeled positive if (color = red) and (size = small). Before the occurrence of the second drift, instances are classified as positive if (color = green) and (shape = circular), and after the second drift, instances are classified as positive only if (size = medium) or (size = large).
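The labeling rules of the SEA and STAGGER streams can be written down directly, as in the sketch below. The SEA comparison `f1 + f2 <= theta` follows the usual definition of the SEA concepts and the threshold values passed in are left to the caller, since they are not listed in the text. The third STAGGER rule is read here as medium or large, because a single instance cannot have both sizes. The function names are hypothetical.

```python
def sea_label(f1, f2, f3, theta):
    """SEA concept: positive when the two relevant attributes sum to at
    most theta; the third attribute is irrelevant and ignored."""
    return f1 + f2 <= theta


def stagger_label(concept, size, color, shape):
    """STAGGER rules for the three concepts described in the text.

    concept 0: before the first drift
    concept 1: between the first and second drift
    concept 2: after the second drift
    """
    if concept == 0:
        return color == "red" and size == "small"
    if concept == 1:
        return color == "green" and shape == "circular"
    if concept == 2:
        return size in ("medium", "large")
    raise ValueError("concept must be 0, 1 or 2")
```

A recurring drift can then be simulated by cycling the `concept` argument back to an earlier value after a fixed number of instances.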

Evaluation of RACE.
This section investigates the proposed algorithm and compares its predictive accuracy and drift handling capabilities with existing ensemble based approaches: CALMID, DUE, SOFEnsemble, and AWDOB. In the second experiment, we also investigate the effect of the Hidden Markov Model on the predictive performance of RACE and on its recurrent drift handling capabilities.
The predictive performance and the recurrent drift handling capabilities of RACE were tested on both artificial and real-world datasets, and corresponding ranks of all algorithms are determined in such a way that higher averages represent lower ranks. Significance tests and post hoc comparisons on ranks are performed to determine significance levels and critical differences. The prediction accuracies and average ranks of RACE, CALMID, DUE, SOFEnsemble, and AWDOB are shown in Table 2.
It is evident from the accuracy measures in the table that RACE performed significantly better than CALMID, DUE, SOFEnsemble, and AWDOB. The Nemenyi test [31] was applied for pairwise comparison. The critical difference is 1.432. From the figure that provides the average ranks of the algorithms compared, it is evident that RACE performed significantly better than the other four algorithms. Figure 1 shows the critical difference plots from post hoc Nemenyi tests of average rankings for experiments on all datasets.
To further evaluate the drift handling capabilities of RACE against the other four representative and current algorithms designed to handle concept drift, we apply two Kappa evaluation measures, Kappa Temporal and Kappa M, to all five algorithms. The Kappa evaluation measure is widely used in data stream learning and can handle both multiclass and imbalanced class problems. The larger the Kappa value, the better the classifier generalizes, and a negative Kappa value is an indication of low predictive accuracy. Kappa Temporal values are shown in Table 3.
Table 4 shows the Kappa M values of all the datasets used. Kappa values for both Temporal and M are positive as the attributes in the datasets are fairly balanced.
The statistical tests applied to Kappa Temporal on artificial and real-world data streams showed significant differences at the specified level of significance. Statistical tests for Kappa M on both artificial and real-world datasets also showed significant differences at the specified level of significance, which for this experiment was 0.05. The Nemenyi test [31] was applied to Kappa Temporal and Kappa M for pairwise comparison. The critical difference (CD) is 1.421. RACE performed significantly better than CALMID, DUE, SOFEnsemble, and AWDOB.
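For reference, both Kappa measures compare the classifier's accuracy $p_0$ with that of a trivial baseline: Kappa M uses a majority class classifier, and Kappa Temporal uses a no-change classifier that predicts the previous true label. The sketch below computes them offline over a finished sequence of labels, which is a simplification of the prequential (incremental) evaluation used in stream learning; the function names are hypothetical, and returning NaN when a baseline is perfect is an assumption.

```python
import math
from collections import Counter


def kappa_m(y_true, y_pred):
    """(p0 - pm) / (1 - pm), with pm the majority class accuracy."""
    n = len(y_true)
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pm = Counter(y_true).most_common(1)[0][1] / n
    if pm == 1.0:
        return math.nan          # baseline is perfect: undefined
    return (p0 - pm) / (1.0 - pm)


def kappa_temporal(y_true, y_pred):
    """(p0 - pe) / (1 - pe), with pe the accuracy of the no-change
    classifier; both are measured from the second instance onward."""
    n = len(y_true) - 1
    p0 = sum(y_true[i] == y_pred[i] for i in range(1, n + 1)) / n
    pe = sum(y_true[i] == y_true[i - 1] for i in range(1, n + 1)) / n
    if pe == 1.0:
        return math.nan          # baseline is perfect: undefined
    return (p0 - pe) / (1.0 - pe)
```

A value of 0 means the classifier does no better than the baseline, and a negative value means it does worse.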

Resources Comparison.
To analyze the benefits in terms of resource usage, we compare the CPU time and memory consumption of RACE, CALMID, DUE, SOFEnsemble, and AWDOB using real-world data streams, since they have large numbers of attributes. The ensemble sizes of all the algorithms are dynamic; that is, they vary in size given the task at hand. For both measures, lower values are better. Corresponding ranks are determined such that higher averages represent lower ranks.
Table 5 shows the memory consumption (MB) of each algorithm on each dataset.
According to Table 5, in most cases, RACE achieved minimal memory consumption while AWDOB consumed the most memory.
The insertion and deletion of models make memory usage lower for RACE when compared to the other algorithms.
Table 6 shows the CPU processing time (s) for each algorithm on each real-world dataset.
As shown in Table 6, through the comparative analysis, we found that RACE consumed the least processing time, followed by CALMID, and SOFEnsemble has the longest CPU processing time.

Accuracy over Time.
Graphical plots are generated for each dataset to describe the performance curves of all the tested algorithms at each time step. The x-axis represents the number of processed observations, and the average accuracy is presented on the y-axis. The graphical plots allow the adaptation abilities of all comparative algorithms under different streaming conditions to be analyzed. As shown in the accuracy over time plots, RACE achieved the highest predictive accuracies on Hyperplane (81.67%), Stagger (79.34%), Covertype (81.56%), and Sensor Data (80.34%). In total, the RACE average ranking in both artificial and real-world data streams is 1.4, CALMID is 3.2, DUE is 3.9, SOFEnsemble is 2.5, and AWDOB is 4.0.
Figure 2 shows the accuracy over time plots of the five algorithms on the Hyperplane dataset that exhibits gradual concept drift. The accuracy of all the algorithms shows the same trend. RACE performs the best, followed by DUE, and CALMID performs the worst. RACE is designed to adapt to all types of concept drift.
Figure 3 demonstrates the accuracy over time plots of the five algorithms on the Stagger dataset which exhibits sudden concept drift. As can be observed, RACE performs the best, followed by DUE, and CALMID is the third, while SOFEnsemble and AWDOB are the worst.
Figure 4 shows the accuracy over time plots of the five algorithms on the LED dataset which is devised to evaluate the ability to handle sudden concept drift. RACE performs the best, followed by AWDOB and then CALMID. SOFEnsemble and DUE perform poorly.
Figure 5 shows the prediction accuracy of the five algorithms on the SEA dataset which is devised to evaluate the ability to handle sudden and gradual drifts. The trend of all five algorithms is basically the same. Among them, RACE performs the best, followed by DUE and AWDOB, and SOFEnsemble performs the worst.
Figure 6 shows the accuracy over time plots of the five algorithms on the Random Tree dataset which is devised to evaluate the ability to handle recurring concepts. AWDOB performs well on the first observed instances, but as the number of observed instances increases, RACE outperforms the other four algorithms.
Artificial data streams are typically designed for controlled environments. When handling real-world classification problems, several challenges emerge. The major issue is that of the identification and location of the concept drifts. Accordingly, RACE was evaluated on real-world data streams, namely, Airlines, Forest Covertype, KDD99 World Cup, Poker Hand, and Sensor Data. With the five real datasets and the five observations, significance tests were performed and the obtained results showed improvements. Figures 7 to 11 show the accuracy over time plots of the five algorithms on the five real-world datasets. The overall average ranking of RACE is 1.4, CALMID 3.2, SOFEnsemble 3.9, DUE 2.5, and AWDOB 4.0.

Figure 7 shows the accuracy over time plots of the five algorithms on the Airlines dataset. DUE performs well on the first observed instances, but as more instances are observed, RACE performs the best. SOFEnsemble performs the worst.
Figure 8 shows the accuracy over time plot of the five algorithms on the KDD99 dataset. RACE performs the best, followed by DUE. SOFEnsemble performs the worst, and the trend is the same for CALMID and AWDOB.
Figure 9 demonstrates the accuracy over time plots of the five algorithms on the Covertype dataset. RACE performs the best, followed by DUE. AWDOB performs the worst.
Figure 10 demonstrates the accuracy of the five algorithms on the Poker Hand dataset. The prediction performance of all the algorithms fluctuates with time. As more instances are observed, RACE performs the best, followed by AWDOB. DUE and SOFEnsemble perform the worst.
Figure 11 shows the accuracy over time plots of the five algorithms on the Sensor Data, used to evaluate gradual concept drift. RACE performs the best, followed by DUE. SOFEnsemble is the third, and AWDOB and CALMID perform the worst. RACE handles recurring changes by reusing previously learned concepts and generalizes well across different concept drift environments. However, other existing ensemble methods do not store previously learned knowledge and lack detection mechanisms, and therefore they adapt poorly to different types of drift.
For all five real-world datasets, RACE subjects all classifiers to a diversity and accuracy evaluation after each iteration. Classifiers that are not representative of the current concept are discarded, while classifiers that are representative of the current concept and those with higher diversity are retained, which allows RACE to deal appropriately with recurring concepts. Poker Hand (84.31%) and DUE (81.36) on the KDD99 dataset are able to deal with concept drifts appropriately, and this can only be attributed to the periodic inclusion of new base learners, while CALMID and SOFEnsemble do not maintain dynamic pools due to their static ensemble size.

Hidden Markov Model-Based RACE
In our next experiment, we investigate the behavior of RACE when we replace the knowledge transfer process with a Hidden Markov Model, a metalearner. Hidden Markov Models (HMM) are known to work well in practice, and efficiently, as prediction, recognition, and identification systems. Hidden Markov Models are based on the assumption that observations are conditionally independent given the hidden states, and therefore the probability of a sequence of observations, given the state sequence, can be expressed as the product of the probabilities of the individual observations. The Hidden Markov Model is a metalearner that is able to predict when recurring concepts will occur. We can then anticipate recurrent drifts and also choose the most appropriate model for the incoming data chunk. The implementation of RACE using Hidden Markov Models allows the algorithm to better handle recurrent situations in classification problems in dynamic environments, thus enabling the evolving base learner to adapt to recurring concepts in a timely manner.
This is made possible by predicting when the drift will happen from training examples at a given time and also by obtaining a similarity level between concepts from a fuzzy similarity function. The drift detection mechanism (DDM) continuously monitors the error rate generated by the learning algorithm; a warning is generated by the DDM if the error rate exceeds a predefined threshold, and a new classifier is learned. A metamodel is trained from the information provided by the drift detection mechanism, and the metamodel evolves as new concepts are detected. The fuzzy concept similarity approach determines whether the underlying concept is recurrent, in which case previously learned models are applied.
In this case, previously learned highly diverse models are no longer trained as they are stable models that adequately represent specific concepts.
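To illustrate how an HMM can anticipate the next concept, the sketch below uses the standard forward algorithm to obtain the belief over hidden states after a sequence of observations and then propagates that belief one step through the transition matrix. Treating hidden states as concepts and observations as discrete signals from the drift detector is an assumption made for illustration, as are the function names and the toy parameters; the text does not specify how the metalearner is parameterized.

```python
def forward(obs, start, trans, emit):
    """Forward algorithm: returns alpha[s] = P(obs, last state = s).

    start[s]    : initial probability of state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of observing symbol o in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return alpha


def predict_next_state(obs, start, trans, emit):
    """Distribution over the next hidden state given the observations."""
    alpha = forward(obs, start, trans, emit)
    total = sum(alpha)                      # likelihood of the sequence
    belief = [a / total for a in alpha]     # filtered state distribution
    n = len(start)
    return [sum(belief[r] * trans[r][s] for r in range(n)) for s in range(n)]


if __name__ == "__main__":
    # Toy example: two concepts that tend to alternate.
    start = [0.5, 0.5]
    trans = [[0.2, 0.8], [0.7, 0.3]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    print(predict_next_state([0, 0, 1], start, trans, emit))
```

The state with the highest predicted probability would indicate which archived model to prepare for the incoming data chunk. For long sequences, the forward values should be rescaled at each step to avoid numerical underflow.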

Experimental Analysis.
To compare the performance of RACE that uses knowledge transfer and the RACE that uses Hidden Markov Models, we use the same synthetic datasets and real-world datasets used to compare the predictive performance of RACE with recent state-of-the-art algorithms designed to handle recurring concepts in dynamic environments.
Using the MOA framework, the performance of the analyzed algorithms is evaluated with respect to accuracy, time efficiency, and memory usage on both synthetic datasets and real-world datasets. Table 7 shows the prediction accuracy of RACE using Hidden Markov Models.
The performance of RACE is also evaluated with respect to CPU processing time in seconds. Table 8 shows the CPU processing time in seconds.
Concerning runtime, online ensembles like AWDOB require the most time for classification, followed by ARF and DP. RACE is the least time-consuming. This is partly because the combination of Hidden Markov Models with a drift detection mechanism offers quicker reactions to sudden and recurring concept drift compared to other methods. For this reason, RACE with Hidden Markov Models is in a better position to capture changes efficiently and to adapt to different types of drift accurately and in a timely manner.
Memory consumption on the real-world datasets that have many attributes is shown in Table 9.
The memory consumption of SOFEnsemble, CALMID, and AWDOB is more than that of RACE and DUE. The three algorithms maintain a large pool of historical concepts which are checked for reuse. RACE and DUE require the least memory storage due to their pruning strategy.

Comparison of Accuracy Performance.
To compare the accuracy of the five algorithms over multiple datasets, we follow the methodology proposed by Demšar [32]. We first use the nonparametric Friedman test to determine whether there is a statistically significant difference between the rankings of the compared algorithms. We then perform the Nemenyi post hoc test with average rank diagrams.
The rankings are depicted on the axis such that the best-ranking algorithms are at the rightmost part of the diagram. The algorithms that do not differ significantly are connected with a line.
The critical difference (CD) is indicated above the graph.
As can be observed from the critical difference (CD) plots, RACE outperforms the other algorithms most of the time.
Figure 12 shows the critical difference plots from post hoc tests of rankings for experiments on the datasets used.
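As an illustration of this procedure, the following sketch computes average ranks, the Friedman chi-square statistic, and the Nemenyi critical difference, CD = q_alpha * sqrt(k(k + 1) / (6N)), for k algorithms and N datasets. The function names are hypothetical, the statistic omits any tie correction, and the q_alpha values are the standard critical values tabulated for alpha = 0.05; no accuracies from this paper are used.

```python
import math

# Two-tailed Nemenyi critical values at alpha = 0.05, indexed by the
# number of algorithms k (standard tabulated values).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def rank_row(scores):
    """Rank the scores of one dataset: rank 1 is the highest score,
    and tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def average_ranks(table):
    """table[d][a] is the score of algorithm a on dataset d."""
    n, k = len(table), len(table[0])
    sums = [0.0] * k
    for row in table:
        for a, r in enumerate(rank_row(row)):
            sums[a] += r
    return [s / n for s in sums]


def friedman_chi2(table):
    """Friedman chi-square statistic (without tie correction)."""
    n, k = len(table), len(table[0])
    avg = average_ranks(table)
    return 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0
    )


def nemenyi_cd(k, n, q_alpha=None):
    """Critical difference between average ranks for k algorithms, n datasets."""
    q = Q_ALPHA_005[k] if q_alpha is None else q_alpha
    return q * math.sqrt(k * (k + 1) / (6.0 * n))
```

Two algorithms are then regarded as significantly different when their average ranks differ by at least the value returned by `nemenyi_cd`, which is the distance drawn as CD on the diagram.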
The nonparametric Friedman test was carried out to extend the analysis of comparing multiple classifiers over multiple datasets. The null hypothesis for the test was that there is no difference between the performances of all the tested algorithms. When the null hypothesis is rejected, the Nemenyi test is employed to verify whether the performance of our algorithm, RACE, is statistically different from that of the other algorithms used for comparison. The critical difference (CD) from the average rank diagram shows that our algorithm is significantly better than the four recent representative algorithms on nonstationary time series data.

[...] weaknesses. The RACE algorithm can be computationally expensive, as it requires a large amount of memory to store all the highly diverse classes, as well as storage during concept transfer. Furthermore, as the ensemble increases in size, convergence to recurring concepts slows down because the concept transfer process requires more time, thus compromising its usability in nonstationary time series data where a classification delay can prove costly. However, regardless of the weaknesses identified, this paper has opened new avenues of research in this area. The expectation is that many more approaches to handling recurring concepts in nonstationary time series data can be explored and developed, so that their prediction performance can be compared with that of the RACE algorithm proposed in this paper.

Figure 1: Average rank diagram of the five algorithms.
Input: (D_1, D_2, ..., D_t): chunks of streaming data
       M: a set of diverse models previously learned
Output: E_t: the generalized ensemble model at time step t
(1) For each data chunk D_t do
(2)   Learn a new base model f_t with D_t
(3)   Select transferred models f_i^t by transferring the highly diverse stored models f_i ∈ M
(4)   Build the generalized ensemble E_t using the transferred models f_i^t and the newly learned model f_t

Input: (D_1, D_2, ..., D_t): the streaming data chunks
       E_t: archive of ensemble models at time step t
       Diversity measure: Q statistic
       Drift detection method: Detect Drift
Output: F_t: the generalized ensemble model at each time step t
(1) For each incoming data chunk D_t do
(2)   Train new model f_t with data chunk D_t
(3)   Test E_t with f_t
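The second listing names the Q statistic as its diversity measure. As a minimal sketch, the pairwise Q statistic of two classifiers can be computed from their predictions on a labeled chunk as Q = (N11 N00 - N01 N10) / (N11 N00 + N01 N10), where N11 and N00 count the instances on which both are correct or both are wrong, and N10 and N01 count the instances on which only one is correct. The selection rule in `select_diverse` (keep the stored models with the lowest mean pairwise Q) is an assumption made for illustration and is not taken from the listing.

```python
def q_statistic(pred_a, pred_b, labels):
    """Yule's Q statistic for two classifiers; values near 1 mean the two
    make the same mistakes, negative values mean they err on different
    instances. Returns 0.0 when the denominator is zero."""
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(pred_a, pred_b, labels):
        a_ok, b_ok = (a == y), (b == y)
        if a_ok and b_ok:
            n11 += 1
        elif not a_ok and not b_ok:
            n00 += 1
        elif a_ok:
            n10 += 1
        else:
            n01 += 1
    num = n11 * n00 - n01 * n10
    den = n11 * n00 + n01 * n10
    return num / den if den else 0.0


def select_diverse(predictions, labels, k):
    """Hypothetical selection rule: return the indices of the k stored
    models with the lowest mean pairwise Q on the current chunk."""
    m = len(predictions)
    mean_q = []
    for i in range(m):
        others = [
            q_statistic(predictions[i], predictions[j], labels)
            for j in range(m)
            if j != i
        ]
        mean_q.append(sum(others) / len(others) if others else 0.0)
    return sorted(range(m), key=lambda i: mean_q[i])[:k]
```

Here `predictions[i]` holds the predictions of stored model i on the current chunk, so the function only needs the archive's outputs and not the models themselves.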

Table 1: Description of real-world datasets.

Table 3: Kappa temporal values for all the ten datasets.

Table 5: Memory consumption of each algorithm on each dataset.

Table 6: CPU processing time for each algorithm on real-world datasets.