An Incremental Interesting Maximal Frequent Itemset Mining Based on FP-Growth Algorithm

Frequent itemset mining is the most important step of association rule mining, and it plays a very important role in incremental data environments. The massive volume of data creates an imminent need for incremental algorithms for maximal frequent itemset mining in order to handle data that grow over time. In this study, we propose an incremental maximal frequent itemset mining algorithm that integrates a subjective interestingness criterion into the process of mining. The proposed framework is designed to deal with incremental data, which usually arrive at different times. It extends the FP-Max algorithm, which is based on the FP-Growth method, by pushing interestingness measures into maximal frequent itemset mining, and it performs dynamic and early pruning to discard uninteresting frequent itemsets and thereby avoid uninteresting rule generation. The framework was implemented and tested on public databases, and the results are promising.


Introduction
Association rule mining (ARM) [1] has been widely used as a leading technique in data mining. It is commonly applied in market basket analysis. ARM comprises two main subtasks, namely mining frequent itemsets that meet a minimum support threshold and generating association rules that satisfy a minimum confidence threshold. Most studies have addressed the efficiency of frequent itemset mining, as it normally demands more resource capacity and computing time [2].
Frequent itemsets (FIs) can be mined from transaction databases through one of the traditional algorithms that can be generally grouped into two methods [3]: Apriori-based method, which is used for generating and filtering candidate itemsets such as Apriori algorithm [4], and tree-based method that is normally used for building FP-tree and then mining FIs from the FP-tree such as FP-Growth [5], TRR [6], PrePost+ [7], FIN [8], dFIN [9], and negFIN [10] algorithms. Since Apriori-based methods depend on continuous scanning of the database to generate multiple candidate itemsets, they require high I/O. On the contrary, tree-based methods scan the database only twice, but they need higher memory for constructing multiple sub-trees [2,11].
In dynamic data, when an incremental database is added to the previous databases, some previous FIs become invalid and new FIs appear due to changes in the support values of some FIs. Generally, the traditional FI mining algorithms are ineffective for incremental data because the entire database must be re-mined afresh [2]. Since transaction databases grow unceasingly and dramatically, incremental FI mining algorithms are needed to support decision making and deliver the real-time information sought by users.
Many attempts have been made to process incremental data without re-mining the entire updated databases, such as the FUP [27], FUFP-tree [28], FUFP-tree maintenance [29], FCFPIM [30], FPISC-tree [31], FIUFP-Growth [2], pre-large [32], pre-FUFP [33], and FPMSIM [11] algorithms. It is noteworthy that these algorithms were used for mining incremental FIs. IM_WMFI [17] and IMU2P-Miner [18] have been proposed to deal with incremental MFIs. The drawbacks of these algorithms are that they do not regard the size and time of data entry and, therefore, require re-mining of the updated database. Since new data arrive over time, researchers have been motivated to propose techniques that update the entire model of previously discovered knowledge (PDK) instead of running the algorithms from scratch, thus presenting a new incremental model such as [34-36] and our proposed framework [37].
For association rules, two kinds of measures have been used: objective and subjective measures. Objective measures are statistical values, such as support, confidence, all-confidence, and lift [38]. These measures were used during mining, such as support in the first task and confidence in the second task of ARM. On the other hand, subjective measures, such as unexpectedness [39], actionability [40], and novelty [34,36,37], were used to capture the user's belief about the domain. However, these subjective measures were applied only at post-mining.
Our proposed approach is motivated by the increasing need for an efficient MFI algorithm that deals with larger data entry over time. As an extended form of FP-Growth and FP-Max algorithms for mining incremental interesting MFIs, our method uses novelty metrics (NM) as a subjective measure during the process of the mining stage. A major contribution of this proposed framework is handling the time-changing data and user-domain knowledge.
This is useful when many databases arrive at various times or from a distributed environment. Certainly, it is desirable to update the discovered frequent itemsets each time new data arrive. Moreover, the incremental nature of the proposed framework makes it possible to mine the interesting frequent itemsets of the current time with respect to the previously discovered ones, rather than re-mining all frequent itemsets from scratch. Thus, dynamic pruning of the discovered frequent itemsets is performed in real time. The objective of dynamic pruning is to save time and reduce search complexity.
This study introduces an algorithm, based on a tree structure, for mining interesting MFIs. The major contributions of our work are as follows: (1) extending the FP-Max algorithm to incremental MFI mining; (2) integrating a subjective interestingness criterion (novelty measure) into the process of mining to reduce the count of discovered interesting MFIs and subsequently reduce search complexity; and (3) introducing a structure that keeps all items (frequent or infrequent) with related information, along with previously discovered IMFIs, for use next time to speed up the construction of the tree and reduce its size.
This study is structured as follows. Section 2 reviews the related work. Section 3 introduces the design issues of our approach. Section 4 discusses the experimental settings and results. Section 5 concludes the study.

Related Work
The concept of association rule mining was introduced by Agrawal et al. in 1993 [1], and the Apriori algorithm was proposed a year later [4] for mining FIs and generating association rules. The algorithm generates and filters candidate itemsets in a level-by-level manner. However, a disadvantage of this method is that it generates many candidate itemsets, which require multiple database scans, thus consuming much time and high I/O. The FP-Growth algorithm [5] was then presented, using a compact data structure, the FP-tree, to compress all transactions of the database into a tree. This algorithm scans the database only twice: first to find the support of each item and second to build the FP-tree. Then, the algorithm recursively builds sub-trees to mine all FIs. A limitation of this algorithm, however, is its need to create multiple sub-trees to mine FIs, which, in turn, require considerable resources and processing time.
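The two-scan construction described above can be illustrated with a minimal sketch; this is our own simplified implementation for illustration (names such as `build_fp_tree` are not from the original papers):

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Scan 1: count the support of every item.
    support = defaultdict(int)
    for t in transactions:
        for item in t:
            support[item] += 1
    frequent = {i for i, s in support.items() if s >= min_sup}
    root = Node(None, None)
    # Scan 2: insert each transaction, keeping only frequent items and
    # ordering them by descending support so shared prefixes share paths.
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-support[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, support
```

The fixed global descending-support order from the first scan is what allows the second scan to compress transactions with common prefixes into shared tree paths.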
The MFI mining concept was later introduced in algorithms such as MAFIA [20], FP-Max [22], FP-Max* [23], PADS [25], SelPMiner [19], and CL-Max [26]. MAFIA is an MFI method that uses a bitmap representation to check itemsets' support information without any database scan. A major disadvantage of this algorithm is its inefficiency on sparse databases [20]. FP-Max was then proposed to mine all MFIs, using the FP-tree and MFI-tree structures [22]. FP-Max* is an FP-tree-based algorithm for mining MFIs that uses its own two-dimensional array structure, called the FP-array, to improve mining performance by reducing the number of tree scans [23]. A drawback of these algorithms, however, is that on dense databases the FP-trees grow large, leading to higher memory usage and slower execution. PADS is another FP-tree-based method that defines and uses a pattern-aware dynamic search order to pre-prune unnecessary operations [25]. It is praised for guaranteeing higher speed compared with the previous approaches. However, it is memory-consuming because it stores conditional databases and time-consuming because of its larger search space [25]. CL-Max is an algorithm that uses the k-means concept for MFI mining [26]. SelPMiner was introduced to exploit search-space pruning optimizations through an itemset-count tree format [19].
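The notion of maximality that all of these algorithms target can be shown with a tiny brute-force sketch (illustrative only; real miners such as FP-Max never materialize all frequent itemsets first):

```python
# A frequent itemset is maximal iff no proper superset of it is also
# frequent. This helper filters a given collection of frequent itemsets
# down to the maximal ones.
def maximal_frequent(frequent_itemsets):
    fis = [frozenset(f) for f in frequent_itemsets]
    # f < g tests "f is a proper subset of g" on frozensets.
    return [f for f in fis if not any(f < g for g in fis)]
```

For example, given the frequent itemsets {a}, {b}, {a,b}, and {c}, only {a,b} and {c} are maximal, since {a} and {b} are subsumed by {a,b}.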
Remarkably, all the abovementioned algorithms deal with static data. Consequently, new methods have been introduced to work on incremental data without re-mining the entire updated databases, such as the FUP [27], FUFP-tree [28], FUFP-tree maintenance [29], pre-large [32], and pre-FUFP [33] algorithms.
Based on the concept of the Apriori method, Cheung et al. introduced the FUP algorithm [27] to handle the new database and update FIs efficiently. This algorithm reduces the number of database scans. Based on the previously found frequent or infrequent itemsets of the previous databases, the algorithm partitions the itemsets discovered from the incremental database into four cases: Case 1: frequent in previous and incremental databases, so frequent in the updated database; Case 2: frequent in the previous database but infrequent in the incremental database, so frequent or infrequent in the updated database; Case 3: infrequent in the previous database but frequent in the incremental database, so frequent or infrequent in the updated database (only if infrequent, rescanning of the previous databases is needed); and Case 4: infrequent in previous and incremental databases, so infrequent in the updated database. However, many researchers preferred FP-Growth-based algorithms for better management of frequent itemset search in a dynamic database, such as the FUFP-tree algorithm, which was introduced to deal effectively with new transactions and improve the efficiency of the updated FP-tree structure by reducing the number of rescans of the previous databases after adding an incremental database [28]. The FUFP-tree was characterized by the double action of node insertion into and deletion from the tree.
The FUFP-tree algorithm [28] handles the itemsets based on the four principles of the FUP algorithm [27]: Case 1: frequent in previous and incremental databases, so the item's support is updated in the header table and FUFP-tree; Case 2: frequent in the previous database but infrequent in the incremental database, so the item may be frequent or infrequent; if frequent, its support is updated in the header table and FUFP-tree, else the item is deleted from the header table and all its nodes from the FUFP-tree; Case 3: infrequent in the previous database but frequent in the incremental database, so the item may be frequent or infrequent; if frequent, the item is placed at the end of the header table and its nodes are added at the end of the corresponding paths in the FUFP-tree, else nothing is done; and Case 4: infrequent in previous and incremental databases, so nothing is done. Like the FUP algorithm, the FUFP-tree maintenance algorithm improves the FUFP-tree structure after the addition of a new database [29]. An improvement was made to the FUFP-tree structure so that, when transactions are deleted from the databases, the mining performance of incremental association rules is efficiently improved and the execution time is reduced. After updating the tree, the algorithm continues to mine all FIs from the updated tree [29]. The pre-FUFP maintenance algorithm was proposed as a modified FUFP-tree algorithm.
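The four FUFP-tree cases can be condensed into a small dispatch sketch; the function name and the action strings are our own illustrative summaries of the cases, not code from [28]:

```python
# Map (frequent in previous DB?, frequent in incremental DB?) to the
# FUFP-tree maintenance action, following the four cases above.
def fufp_case(freq_prev: bool, freq_incr: bool) -> str:
    if freq_prev and freq_incr:
        return "update support in header table and FUFP-tree"         # Case 1
    if freq_prev and not freq_incr:
        return "recheck: update if still frequent, else delete item"  # Case 2
    if not freq_prev and freq_incr:
        return "recheck: if frequent, append item and insert nodes"   # Case 3
    return "do nothing"                                               # Case 4
```

Only Cases 2 and 3 require rechecking against the combined database; Cases 1 and 4 are decided immediately.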
This algorithm was based on the "pre-large" concept, which identifies upper and lower support thresholds [33]. Accordingly, the previous databases need not be rescanned if the count of incremental transactions is less than the safety number f of new transactions. The f number can be obtained according to the following equation:

f = ⌊((S_u − S_l) × d) / (1 − S_u)⌋, (1)

where S_u is the upper support threshold, S_l is the lower support threshold, and d is the count of transactions in the previous databases [33]. The problem with these algorithms is that they were based on a modified FP-tree structure, rescanning the whole database and updating and deleting items in the tree to mine incremental FIs from the updated tree [2]. Other algorithms used an upper bound as an improved method over the traditional Phi correlation for mining item pairs from static databases based on the Apriori method, such as the one proposed by Li et al. [41].
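Assuming the standard pre-large safety bound f = ⌊(S_u − S_l) × d / (1 − S_u)⌋, the computation is a one-liner (the helper name `safety_number` is ours):

```python
import math

# Safety number f of the pre-large concept: as long as the number of
# newly arrived transactions stays at or below f, no itemset can cross
# from below the lower threshold to above the upper one, so the previous
# databases need not be rescanned.
def safety_number(s_upper: float, s_lower: float, d: int) -> int:
    return math.floor((s_upper - s_lower) * d / (1 - s_upper))
```

For instance, with S_u = 0.5, S_l = 0.25, and d = 100 previous transactions, up to f = 50 new transactions can be buffered before a rescan is required.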
Other algorithms used a list structure on a dynamic database to mine erasable patterns and high-utility patterns, such as the IWEL [42], LINE [43], IMSEM [44], VME [45], PRE-HAUIMI [46], and HUI-list-INS [47] algorithms. Some other algorithms were based on the Apriori or tree method [48]. The point of convergence between these studies and this study is that they are incremental, work on frequent pattern mining, and use additional interestingness measures. However, these studies were concerned with erasable and high-utility patterns using a list structure with weighted or pre-large concepts. The main purpose of this study is to develop an FP-Max tree-based algorithm that utilizes subjective and objective measures for mining MFIs from a dynamic database. A list structure can be more effective for mining, especially of high-utility and erasable patterns, with regard to quantity or profit-value conditions and other criteria, which are beyond the scope of our research.
There was a growing need for more comprehensive methods that can operate on both maximal and incremental frequent itemsets, and several incremental MFI algorithms were proposed [17,18,49]. The IM_WMFI algorithm [17] used a tree structure to mine WMFIs from incremental databases using weighted criteria of representative patterns and item importance. It scans the entire incremental database only once and extracts a smaller number of MFIs. The IMU2P-Miner algorithm [18] was introduced for mining MFIs from univariate uncertain data. This algorithm used a tree structure with a local array to keep the updates without the need for tree reconstruction, as it allows only one path to be updated or added.
Interestingness is another feature that has received little attention in the field of incremental MFI mining. Briefly, two kinds of interestingness measures are considered: objective and subjective measures. The most considered objective measures, which reflect the statistical strength of a pattern, are support, confidence, all-confidence, and lift [38]. They are widely used to discover only strong rules and, hence, filter out uninteresting patterns. Since these measures fail to reflect the user's knowledge, subjective measures were introduced to ensure that only interesting rules remain among the discovered ones [36]. The novelty measure (NM), as a subjective measure, was presented in [34-36] according to the following equation:

NM(S_1, S_2) = (|S_1| + |S_2| − 2K + Σ_{i=1}^{K} δ(C_i^1, C_i^2)) / (|S_1| + |S_2|), (2)

where S_1 and S_2 are two conjunct sets with cardinalities |S_1| and |S_2|, respectively, K is the number of pairs of compatible conjuncts between S_1 and S_2, and δ(C_i^1, C_i^2) is the distance of the ith pair of compatible conjuncts. Interested readers may refer to [34-37].
To the best of our knowledge, SelPMiner and CL-Max are the most recent state-of-the-art algorithms for mining MFIs from static databases. The IM_WMFI and IMU2P-Miner algorithms are the only recent algorithms that use the tree method for mining MFIs from incremental databases. These algorithms used additional objective measures in the mining process.
In our proposed approach, an integrated algorithm is used. The basic FP-Growth algorithm is adopted to mine FIs from a tree such as the FP-tree without adjusting the tree structure, and the FP-Max algorithm is used for mining all MFIs based on the MFI-tree structure. Therefore, there is no need to rescan the previous databases or reconstruct the FP-tree. In addition to the incremental nature of our method, NM is used to ensure a reduction in the number of MFIs and, consequently, in the number of discovered interesting rules. Table 1 shows the characteristics of the major reviewed algorithms.

The Proposed Approach
The proposed approach is designed to discover incremental IMFIs, named IIMFIs, from a dynamic database, as shown in Figure 1. The proposed approach contains several components to which three functions are added: (1) keeping all items, frequent or infrequent, with their related information; (2) constructing a tree from the discovered IMFIs and the current database; and (3) dynamic pruning of uninteresting MFIs. The symbols and notation used in this study are shown in Table 2. The approach acts incrementally, firstly by adding any new item to the item list and updating the support and threshold support of each item and, secondly, by adding IMFIs to the list of IMFIs in PD_IMFIs. The PD_IMFI structure will be explained in Section 3.1. The framework architecture contains five phases. The first phase scans all transactions in Di+1 to add any new item as a 1-itemset with its cur_Sup or to update the incr_Sup and incr_Minsup of each item in Pi+1. The second phase builds the tree from the previous IMFIs and the transactions in Di+1, utilizing the FP-Growth algorithm with an FP-tree-like structure. As for the IMFIs (with their corresponding support), only those whose associated items have incr_Sup ≥ incr_Minsup or cur_Sup ≥ cur_Minsup are added. For each transaction in Di+1, only the items with incr_Sup ≥ incr_Minsup or cur_Sup ≥ cur_Minsup are added to the tree. In the third phase, the FP-Max algorithm is used to extract MFIs from the tree using the MFI-tree structure. In the fourth phase, dynamic pruning is performed using NM to compare each new MFI with all IMFIs in the IMFI list in Pi+1. A new MFI is added to the list of IMFIs only if it is interesting; otherwise, it is discarded. In the fifth phase, association rules are generated from the IMFIs. The output of this framework is incremental IMFIs and interesting rules. A detailed description of the proposed algorithm will be given in Section 3.5.

PD-IMFI Structure.
The objective of PD_IMFIs is to speed up the construction of the tree by adding the previous IMFIs to tree nodes with a counter equal to the support value of the IMFIs, instead of rescanning previous transactions, thereby reducing the size of the constructed tree. It keeps all the items (frequent or infrequent) with related information, such as updated support, incremental threshold support, and the list of IMFIs, for use next time. As shown in Figure 2, the PD_IMFI structure consists of two main parts: the 1-Items list and the IMFI list. The 1-Items list part contains four fields: item_Name, cur_Sup, incr_Sup, and incr_Minsup. item_Name is the key and identifier of each item in PD_IMFIs. cur_Sup is the count of transactions containing the item in Di+1. incr_Sup is the accumulated support of each item in DU. incr_Minsup is the accumulated minimum threshold support of each item in DU. It is worth noting that cur_Minsup is a temporal condition in Di+1 used to update incr_Minsup. cur_Minsup is calculated according to the following equation [50]:

cur_Minsup = Min_Sup × n, (3)

where n is the count of transactions in Di+1. As an example, let the count of transactions in Di+1 be 10 and Min_Sup = 0.5, so cur_Minsup = 0.5 × 10 = 5.
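A minimal sketch of one 1-Items list entry and of equation (3) follows; the field names mirror the paper's notation, but the Python class itself is our own illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ItemEntry:
    """One entry of the 1-Items list in the PD_IMFI structure."""
    item_name: str
    cur_sup: int = 0         # support in the current increment D_{i+1}
    incr_sup: int = 0        # accumulated support over D_U
    incr_minsup: int = 0     # accumulated minimum-support threshold
    imfi_list: dict = field(default_factory=dict)  # IMFI -> support in D_U

def cur_minsup(min_sup: float, n: int) -> int:
    # Equation (3): cur_Minsup = Min_Sup * n, e.g. 0.5 * 10 = 5.
    return int(min_sup * n)
```

Each item entry carries its own IMFI list (possibly empty), matching the two-part layout of Figure 2.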
Each item in the 1-Items list part has an IMFI list, which may be null or contain one or more IMFIs. The IMFI list part contains two fields: IMFI, referring to an array of associated itemsets as IMFIs, and support, indicating the frequency value of the IMFIs in DU.

Incremental MFI Mining.
The incremental nature of the proposed approach self-adjusts the minimum support (incr_Minsup) due to the change in the support value (incr_Sup) of each item in the 1-Items list as a result of the Di+1 scan in the first phase (Algorithms 1 and 2). The construction of the tree is based on the previously discovered IMFIs in Pi (IMFI list) and the transactions in Di+1. The tree starts empty, and only those IMFIs in Pi are fetched whose associated items in Pi+1 satisfy cur_Sup ≥ cur_Minsup or incr_Sup ≥ incr_Minsup. In the second scan of Di+1, for each transaction, any item is removed if its entry in the 1-Items list of Pi+1 has cur_Sup < cur_Minsup and incr_Sup < incr_Minsup, and the remaining items are re-sorted in descending order of the incr_Sup value in the 1-Items list of Pi+1. The support in the header table and the node counter are updated only if they are less than the incr_Sup of the corresponding item in the 1-Items list part of Pi+1, as in the second phase (Algorithm 3). As a result, the counter values of the nodes are updated. In the third phase, incremental MFIs with updated support are extracted from this tree. Consequently, the Conf value may change, as it is related to support.
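The per-transaction pruning and reordering of the second scan can be sketched as follows, assuming each item's counters are held in a plain dictionary (an illustrative simplification of the 1-Items list; the helper name is ours):

```python
# Keep an item only if it passes the current-increment threshold OR the
# accumulated threshold; then sort survivors by descending incr_Sup, as
# required for inserting the transaction into the tree.
def prepare_transaction(transaction, items):
    kept = [i for i in transaction
            if items[i]["cur_sup"] >= items[i]["cur_minsup"]
            or items[i]["incr_sup"] >= items[i]["incr_minsup"]]
    return sorted(kept, key=lambda i: (-items[i]["incr_sup"], i))
```

Items failing both thresholds are dropped before insertion, so the tree never stores provably infrequent items from Di+1.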

Incremental Dynamic Pruning of MFIs.
One of the main advantages of our approach is its ability to perform dynamic pruning based on NM. The goal is to reduce the count of the MFIs. The framework computes NM according to equation (2). In fact, we do not need to calculate δ(C_i^1, C_i^2) in the case of association rules, because it is always equal to zero; it is used only in the case of classification. So, equation (2) can be modified as follows:

NM(S_1, S_2) = (|S_1| + |S_2| − 2K) / (|S_1| + |S_2|). (4)

The NM value of a new MFI determines whether it is novel or not. The NM value of each new MFI is calculated against all the IMFIs in the IMFI list part of PD_IMFIs, with

K = |S_1 ∩ S_2|. (5)

From equation (5), which represents the relationship between S_1 and S_2, four cases are observed. Case 1: S_1 is equal to S_2, S_1 = S_2 and K = |S_1| = |S_2|. Case 2: S_2 is a subset of S_1, S_2 ⊂ S_1, K > 0, and K = |S_2|. Case 3: S_1 is a subset of S_2, S_1 ⊂ S_2, K > 0, and K = |S_1|. Case 4: S_1 is not equal to S_2, S_1 ≠ S_2, K ≥ 0, K < |S_1|, and K < |S_2|. Cases 1 and 2 are not used in our approach since the FP-Max algorithm cancels out any recurrence of FIs in the MFI-tree. Case 3 is not applicable to our approach since the tree-building process removes any IMFIs from the IMFI list part when any cur_Sup ≥ cur_Minsup or incr_Sup ≥ incr_Minsup of the items associated with IMFIs in Pi+1. Only Case 4 is utilized in our approach to calculate the value of NM depending on the value of K, as in the fourth phase (Algorithm 4). After the computation of the NM of each IMFI in Pi+1, dynamic pruning is performed by eliminating those new MFIs with NM less than minNovlty, as shown in Algorithm 1. For example, let minNovlty = 0.5, a new MFI = {a,s}, and the three IMFIs in the IMFI list be {{r,m,s}, {r,b,m}, {m,t}}. The NM between the new MFI S_2 = {a,s} (|S_2| = 2) and each IMFI can be calculated as illustrated in Table 3. Note here that minNM = 0.6. So, minNM > minNovlty, and the new MFI {a,s} is interesting; subsequently, it is added to the IMFI list.
Suppose a new MFI = {u,s,m} comes later. The NM between the new MFI S_2 = {u,s,m} (|S_2| = 3) and each IMFI in the IMFI list can be calculated as shown in Table 4. Note here that minNM = 0.2. So, minNM < minNovlty, and the new MFI {u,s,m} is uninteresting; subsequently, it is pruned.
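Using the simplified novelty form NM(S1, S2) = 1 − 2K/(|S1| + |S2|) with K = |S1 ∩ S2|, the first worked example above can be reproduced in a short sketch (the function names are our own, and only the first example is checked here):

```python
# Simplified novelty between two itemsets: K is the number of shared
# items, and NM falls toward 0 as the overlap grows.
def novelty(s1: frozenset, s2: frozenset) -> float:
    k = len(s1 & s2)
    return 1 - 2 * k / (len(s1) + len(s2))

# A new MFI is kept only if its minimum novelty against every previously
# discovered IMFI reaches the minNovlty threshold.
def is_interesting(new_mfi, imfi_list, min_novlty: float) -> bool:
    min_nm = min(novelty(frozenset(s), frozenset(new_mfi)) for s in imfi_list)
    return min_nm >= min_novlty

imfis = [{"r", "m", "s"}, {"r", "b", "m"}, {"m", "t"}]
# First worked example: minNM for {a, s} is 0.6 >= minNovlty = 0.5,
# so {a, s} is added to the IMFI list.
```

Against {r,m,s} the overlap is K = 1, giving NM = 1 − 2/5 = 0.6, while the other two IMFIs share nothing and yield NM = 1, so minNM = 0.6 as in Table 3.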

Generation of Interesting Rules.
NM is used in this approach as a determining measure during the pruning process to discover only the MFIs that are interesting. As such, NM becomes a crucial constraint within the algorithm for reducing the number of MFIs and, subsequently, the count of discovered rules. Therefore, only rules interesting to the user are generated. Interestingly, the incremental nature of the proposed approach makes it significantly flexible in coping with time-changing data and changing user beliefs. This reinforces its functionality when two or more databases arrive at different times, with different volumes, and from different locations. Hence, incremental updating of the discovered knowledge is a striking feature of the proposed algorithm. However, a key step in generating the association rules is to identify the confidence measure (Conf).
This can be calculated according to the following equation:

Conf(A → B) = support(A → B) / support(A), (6)

where support(A → B) is the number of transactions containing itemsets A and B together, and support(A) is the count of transactions containing itemset A [50].
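The confidence measure, Conf(A → B) = support(A → B)/support(A), can be computed directly over a transaction list, as in this illustrative sketch (helper name ours):

```python
# Confidence of the rule A -> B over a list of transactions:
# the fraction of transactions containing A that also contain B.
def confidence(transactions, antecedent, consequent) -> float:
    a = frozenset(antecedent)
    ab = a | frozenset(consequent)
    sup_a = sum(1 for t in transactions if a <= set(t))    # support(A)
    sup_ab = sum(1 for t in transactions if ab <= set(t))  # support(A -> B)
    return sup_ab / sup_a if sup_a else 0.0
```

In the framework, a rule is emitted only when this value reaches minConf.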

Proposed Framework Algorithms.
As stated earlier, five phases have been proposed in our framework, represented by Algorithms 1-4. Algorithm 1 (the IIMFI algorithm) includes these phases, which call the other algorithms.
Phase 1. First scan of Di+1. Lines 1-4 in Algorithm 1 explain Phase 1. This phase consists of two tasks. First Task: a new PD_IMFI structure is created as Pi+1. This is the integration of the previous PD_IMFI Pi and Di+1. Algorithm 2 (created PD_IMFI algorithm) describes this task.
Second Task: the integer value of cur_Minsup is calculated for Di+1 according to equation (3), and incr_Minsup is updated by increasing its value by cur_Minsup for each item in Pi+1.
Phase 2. Create a new tree with the FP-tree structure. Lines 5-6 in Algorithm 1 explain Phase 2, which calls Algorithm 3 (created tree algorithm).
This phase consists of three tasks.
First Task: the tree is built by adding the previously discovered IMFIs of Pi (along with their corresponding support) to the tree.
Second Task: the construction of the tree is completed by reading the records from Di+1.
Third Task: the support value of each item in the header table and the counters of the nodes in the tree are updated so that they equal the incr_Sup of that item in Pi+1.
Phase 3. Mine MFIs from the updated tree using the FP-Max algorithm with the MFI-tree structure, as described above.
Phase 4. Perform dynamic pruning of MFIs using the NM constraint. In this phase, one task is to calculate the NM between the MFIs and all IMFIs in Pi+1 and select the minNM value from the NM values. Algorithm 4 (interesting measure algorithm) explains this task. If minNM ≥ minNovlty, then the MFIs are interesting (IMFIs), and Phase 5 proceeds; otherwise, the MFIs are discarded, and Phase 3 is processed until the first item in Pi+1 is reached.

Phase 5.
This phase runs two tasks. First Task: the list of IMFIs in Pi+1 is updated by adding the new MFIs to the list, as represented in line 8.1.4 of Algorithm 1.
Second Task: rules with Conf ≥ minConf are generated from the MFIs by calling the association rule generation algorithm (Algorithm 1, line 8.1.5).

Time Complexity.
Regarding the time complexity of the developed algorithm (IIMFIs), the complexity varies vis-à-vis the algorithmic conditions of the mining process at the three separate times and the total time spent on the incremental database. The first algorithmic step is to call Algorithm 2 to create PD_IMFIs, and its time complexity is O(n × i), where n is the total number of transactions in the databases and i is the number of items in n.
The time complexity here is similar to that of the other algorithms when performed on static databases. However, in the case of incremental databases, Algorithm 2 of our proposed method is less complex than other algorithms because of the condition that an already discovered item needs only an update of the P data rather than the creation of a new item. When Algorithm 3 is called to create the updated tree, two recursive cycles are involved. The first cycle builds the tree from the previously discovered MFIs associated only with the items having cur_Sup ≥ cur_Minsup or incr_Sup ≥ incr_Minsup, and the time complexity here is O(n). The second recursive cycle conducts the second scan of the databases to complete the construction of the tree, and the time complexity for this step is the same as for Algorithm 2, O(n × i). It is to be noted that the (tree items) column in these tables shows the items in descending order of the incr_Sup value of each item in the 1-Items list, as used in the construction of the tree. Let Min_Sup = 0.5, minNovlty = 0.5, and minConf = 0.7. At T1, in Phase 1, P1 is created, all items in D1 are added to the 1-Items list with cur_Sup and incr_Sup set to the support of each item in D1, and incr_Minsup is set to cur_Minsup, where cur_Minsup = 0.5 × 10 = 5. Figure 3(a) shows P1 after Phase 1. Phase 2: the tree is created as shown in Figure 3(b). [...] rules are generated from it, where each rule has Conf ≥ 0.7. Then, Phase 3 proceeds and item {E} is taken; this phase outputs no MFIs for item {E}, so Phase 4 is not processed; item {B} also has no MFIs. Figure 3(c) shows P1 after Phases 3-4 at T1. Table 8 shows the generated rules with Conf ≥ 0.7 after Phase 5 at T1. (Note: at T1, 9 rules (1-9) are interesting, marked green in Table 8.)

Experiment 1 (Runtime Dynamic State).
In this experiment, the runtime of IIMFIs is computed against the well-known incremental IM_WMFI and IMU2P-Miner algorithms, as shown in Table 14. Note that running time here means the execution time (Sec.), which is the period between the input of the dataset and the finish of mining for each time: time 1 (T1), time 2 (T2), and time 3 (T3). Total runtime refers to the sum of the runtimes (Sec.) at T1, T2, and T3. This experiment was used in the first phase to discover MFIs without stepping into the rule generation phase. The experiment calculates the runtime at T1, T2, and T3 on the three classified times of the given datasets, as shown in Table 13, and then compares the total executed runtime of the IIMFI, IM_WMFI, and IMU2P-Miner algorithms, as shown in Table 14. (Note: in the IIMFI algorithm, useNM = false; subsequently, Algorithm 4, which calculates NM, is not called.)
Figures 6(a)-6(e) reflect the results of the total runtime (Sec.) of the three algorithms on the five datasets with different Min_Sup values. As illustrated in Figure 6(a), which represents the results of the total runtime on T10I4D100K, the IIMFI algorithm records the least runtime, especially at Min_Sup = 0.2%, followed by the IMU2P-Miner algorithm, which takes a reasonable runtime and appears closer to the IIMFI algorithm, particularly at Min_Sup = 1.0%, whereas IM_WMFI shows a slower performance. However, the runtime reduction gap between IIMFIs and IMU2P-Miner and IM_WMFI is not consistent, as it declines at higher Min_Sup values. Similarly, on T25I10D10K, as Figure 6(c) shows, IIMFIs outperform the other algorithms at all Min_Sup values. The other algorithms show an oscillating runtime, where IM_WMFI almost outpaces IMU2P-Miner at all Min_Sup values except at 1.2%, where they take almost the same runtime, and at 1.4%, where IMU2P-Miner shows better runtime performance. Figure 6(b) reveals that on the Mushroom dataset, almost all three algorithms take the same total runtime. A slight difference occurs at Min_Sup = 5%, where IIMFIs appear a little faster. At Min_Sup = 25%, however, the total runtime is almost the same for the three algorithms. When operated on Accidents, as in Figure 6(d), IIMFIs perform generally well at the lower Min_Sup = 40% to 50%, while their performance reverts at the higher Min_Sup = 55% to 60%. On the contrary, IMU2P-Miner shows better performance at higher Min_Sup values, while it slows down at lower Min_Sup values; IM_WMFI stays behind the two, especially at lower Min_Sup values. Figure 6(e) shows the results of the total runtime on Kosarak, where IIMFIs achieve a greater runtime reduction rate at Min_Sup = 0.2% and 0.3%, the same runtime as IMU2P-Miner at 0.4% and 0.5%, and a slower runtime than IMU2P-Miner at 0.6%. IM_WMFI is generally the slowest at all Min_Sup values.
The performance of the three algorithms, IIMFIs, IMU2P-Miner, and IM_WMFI, varies from one dataset to another depending on the characteristics of the datasets (Figures 6(a)-6(e)). Generally, the IIMFI algorithm is faster than IM_WMFI and IMU2P-Miner on all datasets. This may be due to the nature of its incremental tree structure, which reduces counting loops and time by maintaining the MFIs discovered the first time and updating only incr_Sup without rescanning the whole data. Based on the experimental results, IIMFIs generally achieve a higher runtime efficiency rate than the other algorithms, particularly on T10I4D100K, T25I10D10K, and Accidents. However, on the Mushroom and Kosarak datasets, the IIMFI runtime oscillates. We can notice the efficiency of IIMFIs in dealing with high-weight and dense dynamic data, as with T10I4D100K and T25I10D10K, where it scores superiority, especially at lower Min_Sup values. As shown in Figures 6(c) and 6(d), IIMFIs score higher time efficiency at smaller Min_Sup values due to the count of items and MFIs, since T25I10D10K and Accidents are dense datasets. However, at higher Min_Sup and, subsequently, with the higher count of discovered MFIs, it slows slightly. This is because the count of MFIs is higher at T1, which consequently affects the time needed at T2 and, in turn, the total runtime at T3. As shown in Figure 6(a), the higher the Min_Sup value, the greater the increase in the number of MFIs and, consequently, the lower the runtime reduction rate. Since Mushroom is a sparse dataset but with a high item count, the difference in runtime consumption is almost the same for the three algorithms (Figure 6(b)). Therefore, at T1 and T2, the item number is greater, the MFI count is higher, and the Min_Sup value is lower. In this case, IIMFI performance is faster because the tree can hold and update the support of MFIs without the need to re-mine MFIs from the tree.
It is observed at Min_Sup = 1.4% on Mushroom that the IMU2P-Miner algorithm works well at higher Min_Sup values, which indicates that this algorithm operates best on univariate uncertain data, as it uses a tree-structure local array to keep the updates. The evaluation of the execution time of the tested algorithms on dynamic data experimentally revealed that our algorithm was faster than all the other algorithms due to its ability to handle the MFI rule mining problem by updating the tree structure and generating fewer trees and, subsequently, shorter execution time. IM_WMFI took the longest time, especially at lower minimum supports or on heavy datasets. This was most likely due to the larger number of sub-generated trees and the structure-updating scheme of that algorithm, compared with the faster mining of our algorithm, whose tree structure does not allow subtrees and thus admits only new items or updates cur_Sup, thereby reducing the total runtime. It has been experimentally observed that, when operating on large and high-weight datasets, IMU2P-Miner and our algorithm ran similarly well, as they use similar FP-tree structure rules.
This indicated that when the dataset's weight was too high, our algorithm was faster, as it did not rescan the whole database; hence, the runtime required to generate and scan the updated trees in our algorithm was also less than the runtime needed to do the same in the other algorithms, because the updated trees in our algorithm were constructed from conditioned MFIs.

Experiment 2 (Runtime in the Static State).
When comparing the total runtime of the three algorithms experimentally, we notice that their performance fluctuates depending on the dataset characteristics and on the mining structure of the algorithms, i.e., whether they are tree-based or not. The T25I10D10K and Accidents datasets are dense, having an asymmetrical distribution of item count and transaction count (929 and 9976 for T25I10D10K; 468 and 340183 for Accidents, respectively) with relatively long patterns (average length 24 for T25I10D10K and 33 for Accidents), while the datasets T10I4D100K and Kosarak are sparse with a variation of item count and transaction count (870 and 100000 for T10I4D100K; 41270 and 990002 for Kosarak, respectively) and are characterized by relatively short patterns (average length 10 for T10I4D100K and 8 for Kosarak). Mushroom is a sparse dataset characterized by a symmetrical distribution of the maximal FPs, relatively long patterns (average length 23), and small item and transaction counts (119 and 8124, respectively). Regarding the mining structure of the algorithms, IIMFIs and SelPMiner are tree-based, whereas CL-Max is not. Figure 7(a) shows the experimental results on T10I4D100K. As T10I4D100K contains a smaller number of items, the performance of the algorithms fluctuates, so that IIMFIs prove faster than the CL-Max algorithm at all Min_Sup values and outpace SelPMiner at the lower Min_Sup values of 0.2% and 0.4%. However, at Min_Sup = 0.6% to 1%, SelPMiner performs faster than IIMFIs, and it outperforms the CL-Max algorithm at all Min_Sup values.

Also, on T25I10D10K, the IIMFI algorithm shows superiority over the CL-Max and SelPMiner algorithms at all Min_Sup values, as in Figure 7(c). The CL-Max algorithm outpaces SelPMiner at Min_Sup = 0.8% to 1.4%, except for 0.6%, where SelPMiner outperforms CL-Max. As for the performance of the three algorithms on the Accidents dataset, as illustrated in Figure 7(d), the IIMFI algorithm is generally faster than the CL-Max and SelPMiner algorithms at all Min_Sup values, while the SelPMiner algorithm shows comfortably less runtime than CL-Max at all Min_Sup values. When approaching the higher Min_Sup values, they have similar runtime consumption. As for the Mushroom dataset (Figure 7(b)), IIMFIs take less execution time at the lower Min_Sup values of 5% and 10%, but a runtime similar to that of the other algorithms at the higher values of 15% to 25%. This may be attributed to the characteristic parameters of the Mushroom dataset, sparse but large and with a fairly long average pattern, which makes it appropriately workable for all algorithms. Figure 7(e) shows that the SelPMiner algorithm prunes faster than the IIMFI and CL-Max algorithms at all threshold values when implemented on the Kosarak dataset. IIMFIs take the longest time and are outpaced by both SelPMiner and CL-Max at all threshold values.

This may be attributed to the selective-partitioning search pattern of the SelPMiner algorithm, based on an itemset-count tree, which is apt for compact datasets. Its MFI-tree structure works very well, especially when the dataset is sparse but very large. On the other hand, IIMFIs work efficiently when a dataset is dense, so that at lower thresholds IIMFIs show an efficient runtime reduction rate compared with the SelPMiner and CL-Max algorithms. The overall evaluation of the experimental tests shows satisfactory results: our algorithm is still effective and satisfactorily fast.

Experiment 3 (Effect of NM on MFIs and Rules).
In this experiment, we evaluated the effect of NM on the reduction in the MFI count and in the rules generated from MFIs. The experiment was applied to all the datasets with multiple Min_Sup values. Here, useNM = true is set in the proposed algorithm, utilizing the two minNovlty values 0 and 0.2, and minConf = 0.7. Table 15 shows the effect of NM on the reduction in MFIs when minNovlty = 0.2 is applied. MFIs refer to the count of extracted MFIs without NM, i.e., with minNovlty = 0; IMFIs refer to the count of MFIs extracted with dynamic pruning when minNovlty = 0.2. MFI rules refer to the number of rules generated from MFIs where these rules have Conf ≥ minConf; IMFI rules refer to the number of rules generated from IMFIs under the same condition. Reduction (-) indicates the effect of NM at minNovlty = 0.2 on the pruning of MFIs and, subsequently, the reduction in the overall rules in each dataset at different Min_Sup values. The results in Table 15 show that, since we are dealing with fixed values in static datasets, NM values are generally high.
As Table 15 reveals, there is a direct effect of NM on the count of MFIs and rules when minNovlty = 0.2 compared with the case when minNovlty = 0. NM reduces the count of MFIs and rules in all datasets at all degrees of Min_Sup. The highest effect is on T10I4D100K (92%-100%), while the least effect is on Kosarak (4%-9%). The difference in the percentage of the NM effect from one dataset to another is due to the density of each dataset and the degrees of Min_Sup. It has been found that in some datasets the effect is direct, i.e., the higher the degree of Min_Sup is, the greater the effect of NM, as with the T10I4D100K and T25I10D10K datasets. However, some of the datasets show the opposite effect; i.e., the higher the Min_Sup is, the lower the impact ratio of NM, as with the Mushroom, Accidents, and Kosarak datasets. This is due to the count of MFIs at each Min_Sup and the average length of MFIs in each dataset.
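The dynamic pruning step described in this experiment can be illustrated with a minimal sketch. The helper names `support` and `novelty` and the flat list of MFIs are hypothetical stand-ins for the tree-based computations in IIMFIs: an MFI survives only if its novelty score reaches minNovlty, and rules are then generated from the surviving MFIs and kept only when Conf ≥ minConf, so uninteresting rules are never produced.

```python
from itertools import combinations

# Hypothetical sketch of NM-based pruning followed by confidence filtering.
# `support` maps a frozenset itemset to its support count; `novelty` returns
# the subjective novelty score of an MFI (both are assumed callables here).

def prune_and_generate(mfis, support, novelty, min_novlty=0.2, min_conf=0.7):
    # dynamic pruning: keep only MFIs that pass the novelty threshold
    interesting = [m for m in mfis if novelty(m) >= min_novlty]
    rules = []
    for m in interesting:
        # enumerate every non-empty proper antecedent of the MFI
        for k in range(1, len(m)):
            for lhs in combinations(sorted(m), k):
                lhs, rhs = frozenset(lhs), frozenset(m) - frozenset(lhs)
                conf = support(frozenset(m)) / support(lhs)
                if conf >= min_conf:      # keep only rules with Conf >= minConf
                    rules.append((lhs, rhs, conf))
    return interesting, rules
```

For instance, with sup({a}) = 5, sup({b}) = 4, and sup({a, b}) = 4, the MFI {a, b} yields the rules a → b (Conf = 0.8) and b → a (Conf = 1.0), both kept at minConf = 0.7; raising minNovlty instead removes the MFI itself and, with it, both rules, which mirrors the reductions reported in Table 15.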

Conclusion and Future Work
This study introduces a novel approach for mining incremental interesting MFIs by extending FP-Growth and FP-Max to reduce the scanning time and the resulting MFIs in datasets arriving at different times. The framework is structured in five phases representing the proposed IIMFI algorithm, which constitutes the design of this work. The proposed approach is incrementally self-adjusting in nature, integrating the previous and current data by adding and updating only new items or items with clearly defined support and incremental support values. Built on a tree-structure mining method, the proposed approach advantageously integrates an objective and a subjective (novelty) measure test in dynamic pruning, so that only interesting MFIs are produced and only interesting rules are generated. For evaluation purposes, three experiments were conducted on five datasets to test the efficiency of the proposed IIMFI method: two experiments tested the runtime efficiency of IIMFIs against two incremental MFI methods and two state-of-the-art static algorithms, and one experiment tested the effectiveness of IIMFIs in reducing the number of MFIs and discovered rules by incorporating NM during the mining process. The experimental results are promising, revealing that the NM integrated into the proposed IIMFI method generally has a direct effect on the reduction in the MFI count and rules in all datasets at all degrees of Min_Sup. However, the effect of IIMFI performance and of NM on the number of MFIs varies from one dataset to another, which can be attributed to the nature of the dataset, its average pattern length (density), and its number of items.
Future work will focus on testing the effectiveness of the proposed algorithm on other datasets and with different degrees of Min_Sup, using the concept of "pre-large" to avoid errors in calculating the degree of incr_Minsup, or using a list structure to take the prices or quantities of items into account. More experiments may be conducted to evaluate the impact of subjective measures on objective measures. Future work is also suggested for extending the proposed algorithm to deal with parallel processing of data.