Efficient Utility Tree-Based Algorithm to Mine High Utility Patterns Having Strong Correlation

High Utility Itemset Mining (HUIM) is one of the most investigated tasks of data mining. It has broad applications in domains such as product recommendation, market basket analysis, e-learning, text mining, bioinformatics, and web click stream analysis. Insights from such pattern analysis provide numerous benefits, including cost cutting, improved competitive advantage, and increased revenue. However, HUIM methods may discover misleading patterns, as they do not evaluate the correlation of the extracted patterns. As a consequence, a number of algorithms have been proposed to mine correlated HUIs. These algorithms still suffer from a high computational cost in terms of both time and memory consumption. This paper presents an algorithm, named Efficient Correlated High Utility Pattern Mining (ECoHUPM), to efficiently mine high utility patterns whose items are strongly correlated. A new data structure based on the utility tree (UTtree), named CoUTlist, is proposed to store sufficient information for mining the desired patterns. Three pruning properties are introduced to reduce the search space and improve the mining performance. Experiments on sparse, very sparse, dense, and very dense datasets indicate that the proposed ECoHUPM algorithm is more efficient than the state-of-the-art CoHUIM and CoHUI-Miner algorithms in terms of both time and memory consumption.


Introduction
We live in a data age in which a huge amount of data is generated by different devices every day. It is expected that 463 exabytes of data will be generated daily by 2025 [1]. Owing to this explosive growth of data, data mining, which transforms data into useful information, has received a great deal of attention [2]. Pattern mining is an unsupervised data mining approach that aims to find useful, interesting, and meaningful patterns that can support decision-making [3,4]. Different pattern mining techniques are used to mine different types of patterns, including frequent patterns [5], high utility patterns [6], sequential patterns, trends, outliers, and graph structures [2,6].
Frequent itemset mining (FIM) aims to extract patterns containing items that frequently appear together in a transactional database [7]. This task has been studied extensively and remains a very active research area, as it has applications in domains such as market basket analysis, product recommendation, text mining, e-learning, bioinformatics, and web click stream analysis [3,8,9]. Although frequent pattern mining is useful, it relies on the assumption that all items in the dataset are equally important (e.g., in weight or profit). This assumption does not hold in many real-life applications [6,10]. For instance, the pattern {bread, milk} may be extremely frequent in a transaction database but uninteresting because it yields a low profit, whereas patterns such as {champagne, caviar} may yield a higher profit even if they are not frequent [11]. To overcome this limitation of FIM, High Utility Itemset Mining (HUIM), an emerging research area, aims to find high utility, or important, patterns [2,6].
HUIM takes into account the weight of items in the database and their quantities in each transaction. The goal of HUIM is to find all patterns whose utility is not less than a minimum utility threshold. Recently, HUIM has become a very active research area, as it generalizes the problem of FIM and has the same wide range of applications [12][13][14][15]. HUIM algorithms are divided into two main categories. The first category is Two-Phase algorithms [11,16]. These algorithms generate candidates in the first phase and then, in the second phase, calculate the utility of each candidate in order to derive the HUIs. However, because of the huge number of candidates generated in the first phase, these algorithms may suffer from high time and memory consumption. The second category is One-Phase algorithms [6,10,17]. These algorithms overcome this issue by using data structures that store sufficient information for mining the desired patterns without candidate generation, together with various pruning properties that reduce the search space.
One critical downside of High Utility Itemset Mining methods is that they generally extract patterns with high utility whose items may be only weakly correlated. For marketing decisions, such patterns are either useless or misleading [18][19][20][21][22][23]. For instance, in market basket analysis, current High Utility Pattern Mining algorithms may find that buying a pen and a 60-inch plasma TV is a high utility itemset, since these items generally produce a high profit when purchased together. However, these items are weakly correlated and rarely sold together. Hence, it would be a mistake to use this pattern to promote TVs to customers who buy pens [11,21].
To address this issue, a few algorithms have been developed to mine Correlated High Utility Itemsets, such as HUIPM [19], FDHUP [21], FCHMbond [22], FCHMall-confidence [22], CoHUIM [20], and CoHUI-Miner [24]. These algorithms differ in the measures used to evaluate the interestingness of the extracted patterns, in their data structures, and in the pruning properties they use to reduce the search space and improve the mining performance. In [20,24], a projected database is used to reduce the database and improve the efficiency of mining correlated HUIs. The projected database is effective, but it incurs a high computational cost in terms of running time and memory consumption.
In order to address this issue in mining Correlated High Utility Itemsets, this study proposes a new algorithm named Efficient Correlated High Utility Pattern Mining (ECoHUPM). In the proposed algorithm, new efficient data structures and pruning properties are introduced to mine the desired patterns efficiently. The main contributions of this paper are summarized as follows:
(i) It proposes a novel algorithm, ECoHUPM, which adopts the divide-and-conquer approach and employs the UTtree structure, an extended form of the FP-tree [25].
(ii) A new data structure based on the UTtree, named CoUTlist, is proposed to store sufficient information for mining the desired patterns in one phase without candidate generation.
(iii) The proposed algorithm introduces several pruning properties to reduce the search space and improve the mining performance.
(iv) An experimental performance evaluation of the proposed algorithm is conducted on sparse, very sparse, dense, and very dense datasets. The performance of ECoHUPM is compared with that of the CoHUIM and CoHUI-Miner algorithms for Correlated High Utility Itemset Mining. Experimental results show that ECoHUPM outperforms the state-of-the-art CoHUIM and CoHUI-Miner algorithms in terms of both time and memory consumption.
The rest of this paper is organized as follows: Section 2 reviews the literature on HUIM and CoHUIM. Section 3 introduces the mathematical preliminaries and states the problem. Section 4 explains the proposed algorithm in detail. Section 5 describes the experimental setup and analyzes the results. Section 6 concludes the paper.

Related Works
This section reviews the literature on HUIM and CoHUIM.

High Utility Itemset Mining (HUIM).
Yao and Hamilton defined the problem of HUI mining in 2004 [26]. They developed the UMining algorithm for mining itemsets with high utility. UMining is an approximate algorithm and may fail to extract all HUIs. Hence, in order to extract the complete set of HUIs, Liu et al. [16] developed the Two-Phase algorithm, in which a novel upper-bound pruning property named TWU (Transaction Weighted Utilization) was proposed to reduce the search space. The Two-Phase algorithm mines HUIs in two phases. In the first phase, it generates the candidate HUIs whose TWU is not less than the minimum utility threshold. Then, in the second phase, it calculates the utility of each candidate by scanning the database again to derive the HUIs. However, the Two-Phase algorithm suffers from poor time and memory efficiency, mainly because a huge number of candidates may be generated in the first phase.
In [27], a tree-based method called HUPtree is proposed to mine HUIs. It integrates the Two-Phase procedure and the FP-tree concept to construct a compressed tree structure that exploits the TWU property. This approach mines HUIs in three steps: (1) it constructs the tree, (2) it generates the candidate patterns, and (3) it identifies the HUIs among the candidates. The mining performance of this algorithm depends on the number of conditional trees constructed during the whole mining process and on the traversal cost of each conditional tree. Hence, this algorithm suffers from high time and memory consumption due to the generation of a huge number of conditional trees and candidate patterns [28].

Complexity
Several algorithms have been developed to improve the efficiency of HUI mining. To extract HUIs without candidate generation, Liu and Qu proposed the HUI-Miner algorithm [29], which uses a utility-list structure to store sufficient information for mining HUIs in one phase. Then Fournier-Viger et al. developed an algorithm named FHM [30], which introduced the EUCS (Estimated Utility Co-occurrence Structure) and EUCP (Estimated Utility Co-occurrence Pruning) to improve HUI mining performance. HUP-Miner [31] extended HUI-Miner to speed up utility-list construction by utilizing a look-ahead strategy and pruning the search space through database partitioning. Chen and An [32] proposed PHU-Miner, a parallel version of HUI-Miner. A novel algorithm named ULB-Miner [33] introduced an improved utility list, called the utility-list buffer, to speed up the utility-list join operation and reduce memory consumption. A new projection-based algorithm named MAHI [34] was proposed to speed up the discovery of HUIs by utilizing a matrix-based pruning strategy (MAprun).
To mine HUIs without the need to set the minimum utility threshold, Tseng et al. [35] developed two efficient algorithms, named TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase), to extract the top-K high utility itemsets. However, they remain expensive in terms of both runtime and memory usage. Hence, Duong et al. [12] designed a novel algorithm named kHMC to extract the top-K HUIs more effectively. kHMC utilizes three strategies, called COV, RIU, and CUD, to reduce the search space and thus improve the mining performance. Recently, Gunawan et al. [36] developed an algorithm based on binary particle swarm optimization that searches for HUIs without setting the minimum utility threshold beforehand; instead, the threshold is determined in a postprocessing step.
Although High Utility Pattern Mining has several applications, it has some limitations. As a consequence, many extensions of High Utility Pattern Mining appeared in the literature such as Incremental Utility Mining [37,38] which aims to extract HUPs from dynamic databases, On-Shelf High Utility Pattern Mining [39][40][41] in which the shelf time of items is considered, and Concise Representations of High Utility Patterns (e.g., Maximal Itemsets [42,43] and Closed High Utility Itemsets [44][45][46][47]) that aim to extract a small list of meaningful HUPs.

Correlated High Utility Itemset Mining (CoHUIM).
A number of correlation measures have been suggested in the data mining literature for association analysis, such as bond, all-confidence, any-confidence [48,49], coherence [50], and Kulczynsky [51]. As traditional High Utility Itemset Mining algorithms do not consider the correlation of the extracted patterns, they may produce uninteresting or misleading patterns: they usually discover itemsets with high utility, but these itemsets may contain weakly correlated items.
In order to extract more interesting patterns and to avoid the misleading patterns produced by traditional HUI mining methods, a number of algorithms have been proposed to mine Correlated High Utility Itemsets using both utility and correlation measures. Ahmed et al. [19] first proposed an algorithm named High Utility Interesting Pattern Mining (HUIPM) with strong frequency affinity, which mines interesting high utility itemsets whose items are meaningfully related. HUIPM introduced a data structure named Utility Tree based on Frequency Affinity (UTFA) to store the information required for mining the desired patterns, and a pruning property named Knowledge Weighted Utilization (KWU) to reduce the search space. However, HUIPM recursively creates a number of conditional trees to generate candidates and then derives the interesting patterns from them.
This procedure is time-consuming. Thus, Lin et al. [21] developed a fast algorithm for mining discriminative high utility patterns (FDHUP) to improve on HUIPM. FDHUP uses two data structures, the Element Information table (EI-table) and the Frequency-Utility table (FU-table), to store the information required for mining DHUPs efficiently, and a new pruning property based on the sum of the affinitive utility and the remaining affinitive utility to reduce the search space.
Fournier-Viger et al. [22] developed the Fast Correlated High Utility Itemset Miner (FCHM) algorithm, which integrates the concept of correlation into High Utility Itemset Mining in order to extract profitable patterns that are highly correlated. Two versions of the algorithm were proposed, FCHMbond and FCHMall-confidence, based respectively on the bond and all-confidence measures previously used for mining frequent correlated patterns [48,50,52]. FCHM is based on HUI-Miner [29]: it uses the utility-list structure and, as pruning properties, TWU and a strategy based on the sum of the initial and remaining utilities. Moreover, FCHMbond and FCHMall-confidence exploit the anti-monotonicity of the bond and all-confidence measures, respectively, to further improve the mining performance.
Gan et al. [18,20] proposed two algorithms to extract correlated purchase behaviors by considering both correlation and utility measures. The first algorithm [20] is named Correlated High Utility Itemset Mining (CoHUIM), while the second one [18] is the Correlated High Utility Pattern Miner (CoUPM). Both algorithms use the Kulczynsky (abbreviated as Kulc) measure [51] in conjunction with the utility measure to evaluate the interestingness of the desired patterns. CoUPM uses the utility-list structure introduced in [29] to store the information required to mine the desired patterns, whereas CoHUIM develops an efficient projection mechanism and a sorted downward closure property to reduce the database size.
Vo et al. [24] suggested an algorithm called CoHUI-Miner to efficiently extract Correlated High Utility Itemsets. CoHUI-Miner applies a database projection mechanism to reduce the database size. Furthermore, it introduces a new concept, the prefix utility of projected transactions, to directly calculate the utility of itemsets. Table 1 summarizes the Correlated High Utility Itemset Mining algorithms and their features.

Fundamental Concepts
This section presents preliminary concepts related to the problem of Correlated High Utility Itemset Mining. We adopt the definitions presented in previous work [53].
Definition 1 (quantitative database). Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items. Each item $i_p \in I$ ($1 \le p \le m$) has a unit profit (external utility), denoted $pr(i_p)$, and in each transaction $T_d$ it is associated with an internal utility (quantity), denoted $q(i_p, T_d)$. A quantitative database $D = \{T_1, T_2, \ldots, T_n\}$ contains a set of transactions. Table 2 shows the transactional database, while Table 3 shows the external utilities of the items in Table 2.
Definition 3. The utility of an itemset $X$ in a transaction $T_d$ (with $X \subseteq T_d$) is denoted by $u(X, T_d)$ and is defined as $u(X, T_d) = \sum_{i_p \in X} q(i_p, T_d) \times pr(i_p)$, that is, the sum of the utilities of all items of pattern $X$ in transaction $T_d$.
For example, u(bc, …
Definition 4. The utility of an itemset $X$ in the quantitative database $D$ is denoted by $u(X)$ and is defined as $u(X) = \sum_{X \subseteq T_d \in D} u(X, T_d)$, that is, the sum of the utilities of itemset $X$ in all transactions containing it.
For example, for the data presented in Table 2 with minUtil = 90, {bc} is a high utility itemset.
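Definitions 3 and 4 translate directly into code. The sketch below assumes that each transaction is stored as a Python dict mapping item labels to quantities and that the external utilities are a dict of unit profits; the toy data is hypothetical, since Table 2 and Table 3 are not reproduced in this section.

```python
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]   # item -> internal utility (quantity)

def item_utility(item: str, t: Transaction, pr: Dict[str, int]) -> int:
    """Utility of one item in one transaction: quantity x unit profit."""
    return t[item] * pr[item]

def utility_in_transaction(X: FrozenSet[str], t: Transaction, pr: Dict[str, int]) -> int:
    """Definition 3: u(X, T_d), meaningful only when X is contained in T_d."""
    return sum(item_utility(i, t, pr) for i in X)

def utility(X: FrozenSet[str], db: List[Transaction], pr: Dict[str, int]) -> int:
    """Definition 4: u(X), summed over the transactions that contain X."""
    return sum(utility_in_transaction(X, t, pr) for t in db if X.issubset(t))

def is_high_utility(X: FrozenSet[str], db: List[Transaction], pr: Dict[str, int],
                    min_util: int) -> bool:
    return utility(X, db, pr) >= min_util

# Hypothetical toy data (not the paper's Table 2 and Table 3)
pr = {"a": 3, "b": 2, "c": 1}
db = [{"a": 1, "b": 2}, {"b": 1, "c": 4}, {"a": 2, "b": 1, "c": 1}]
print(utility(frozenset("bc"), db, pr))   # 9  (6 in the second transaction + 3 in the third)
```

Only transactions that contain every item of X contribute, which is why {b, c} ignores the first transaction.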

Definition 6. The utility of a transaction $T_d$ is denoted by $tu(T_d)$ and is defined as the sum of the utilities of all items in the transaction: $tu(T_d) = \sum_{i_p \in T_d} q(i_p, T_d) \times pr(i_p)$. For example, the utility of transaction $T_8$ is calculated as …

Definition 7. The Transaction Weighted Utilization (TWU) of an itemset $X$ in database $D$ is defined as $TWU(X) = \sum_{X \subseteq T_d \in D} tu(T_d)$.
Definition 8. An itemset $X$ is called a High Transaction-Weighted Utilization Itemset (HTWUI) if $TWU(X) \ge minUtil$, where minUtil is the minimum utility threshold.
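Definitions 6 to 8 can be sketched the same way. The dict-based representation and the toy data are again hypothetical assumptions used only for illustration.

```python
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]   # item -> quantity

def transaction_utility(t: Transaction, pr: Dict[str, int]) -> int:
    """Definition 6: tu(T_d), the sum of the utilities of all items in T_d."""
    return sum(q * pr[i] for i, q in t.items())

def twu(X: FrozenSet[str], db: List[Transaction], pr: Dict[str, int]) -> int:
    """Definition 7: TWU(X), the sum of tu(T_d) over the transactions containing X."""
    return sum(transaction_utility(t, pr) for t in db if X.issubset(t))

def is_htwui(X: FrozenSet[str], db: List[Transaction], pr: Dict[str, int],
             min_util: int) -> bool:
    """Definition 8: X is an HTWUI if TWU(X) >= minUtil."""
    return twu(X, db, pr) >= min_util

# Hypothetical toy data
pr = {"a": 3, "b": 2, "c": 1}
db = [{"a": 1, "b": 2}, {"b": 1, "c": 4}, {"a": 2, "b": 1, "c": 1}]
print([transaction_utility(t, pr) for t in db])   # [7, 6, 9]
print(twu(frozenset("bc"), db, pr))                # 15  (6 + 9)
```

Because tu(T_d) counts every item of T_d and utilities are non-negative, TWU(X) is never smaller than u(X): for {b, c} in this toy data, TWU is 15 while u({b, c}) is 9, which is what makes TWU usable as an upper bound.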
Different measures have been used to evaluate the interestingness of HUIs, such as frequency affinity, bond, all-confidence, and Kulczynsky. The Kulczynsky measure was recommended in [2] and has been used in [18,20,24]. Kulczynsky (abbreviated as Kulc) is a null-invariant measure: it is not influenced by null transactions and is used to evaluate the inherent correlation of patterns [48,51].
The support of an itemset $X$ in the transactional database $D$ is denoted by $sup(X)$ and is defined as the proportion of transactions in the database that contain $X$: $sup(X) = count(X)/n$, where $n$ is the total number of transactions in the database.
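The Kulc formula itself is not restated in this section, so the sketch below assumes the usual k-itemset generalization of Kulc: the average, over the items i of X, of sup(X)/sup({i}). Support follows the definition above. Transactions are modelled as frozensets of item labels (an assumption), and fractions.Fraction keeps the values exact.

```python
from fractions import Fraction
from typing import FrozenSet, List

def support_count(X: FrozenSet[str], db: List[FrozenSet[str]]) -> int:
    return sum(1 for t in db if X <= t)

def support(X: FrozenSet[str], db: List[FrozenSet[str]]) -> Fraction:
    """sup(X) = count(X) / n."""
    return Fraction(support_count(X, db), len(db))

def kulc(X: FrozenSet[str], db: List[FrozenSet[str]]) -> Fraction:
    """Assumed form: average over i in X of sup(X) / sup({i})."""
    ratios = (support(X, db) / support(frozenset([i]), db) for i in X)
    return sum(ratios, Fraction(0)) / len(X)

# Hypothetical toy data: three transactions
db = [frozenset("ab"), frozenset("bc"), frozenset("abc")]
print(kulc(frozenset("bc"), db))   # 5/6, i.e. (2/3 + 2/2) / 2
```

A 1-itemset always gets a Kulc value of 1, consistent with the later remark that the correlation value of every 1-itemset is 1. Since n cancels in each ratio, raw support counts would give the same result.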

Proposed Algorithm
In order to address the need for a more efficient algorithm for mining Correlated High Utility Itemsets, we propose a new algorithm named Efficient Correlated High Utility Pattern Mining (ECoHUPM).
This section presents the proposed ECoHUPM algorithm in detail, including the data structures it uses to store sufficient information for mining the desired patterns and the pruning properties used to reduce the search space and improve the mining performance.

Database Revising.
The proposed ECoHUPM algorithm revises the input database in its first step. First, Property 1 [16] is used to remove all 1-itemsets whose TWU is less than the minimum utility threshold. For instance, for the data presented in Table 2 with minUtil = 90, item "g" is removed. Second, in each transaction, the utility of each item is computed as quantity × profit, as stated in Definition 2. Third, the items in each transaction are sorted in descending order of their support, and the total utility is assigned to each transaction. Table 4 shows the items in descending order of their support, while Table 5 shows the revised database.
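The three revising steps can be sketched as follows. Several details are assumptions not stated in the text: the representation (a dict per input transaction and a tuple (tid, [(item, utility)], tu) per revised transaction), breaking ties between items of equal support by label, recomputing the total utility over the items that remain, and dropping transactions that become empty. The toy data is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Transaction = Dict[str, int]                         # item -> quantity
RevisedTx = Tuple[int, List[Tuple[str, int]], int]   # (tid, [(item, utility)], tu)

def revise_database(db: List[Transaction], pr: Dict[str, int],
                    min_util: int) -> Tuple[List[str], List[RevisedTx]]:
    # TWU of every 1-itemset, computed from the original transaction utilities
    twu: Counter = Counter()
    for t in db:
        tu = sum(q * pr[i] for i, q in t.items())
        for i in t:
            twu[i] += tu
    # Step 1 (Property 1): keep only items whose TWU reaches minUtil
    keep = {i for i, w in twu.items() if w >= min_util}
    # Global order: descending support, ties broken by label (an assumption)
    sup = Counter(i for t in db for i in t if i in keep)
    order = sorted(keep, key=lambda i: (-sup[i], i))
    rank = {i: r for r, i in enumerate(order)}
    revised: List[RevisedTx] = []
    for tid, t in enumerate(db, start=1):
        # Step 2: item utility = quantity x profit; Step 3: sort by the global order
        items = sorted(((i, q * pr[i]) for i, q in t.items() if i in keep),
                       key=lambda pair: rank[pair[0]])
        if items:  # assumption: transactions left empty are dropped
            # assumption: tu is recomputed over the kept items only
            revised.append((tid, items, sum(u for _, u in items)))
    return order, revised

pr = {"a": 3, "b": 2, "c": 1}   # hypothetical toy data
db = [{"a": 1, "b": 2}, {"b": 1, "c": 4}, {"a": 2, "b": 1, "c": 1}]
order, revised = revise_database(db, pr, min_util=16)
print(order)     # ['b', 'a']  (c is removed because TWU(c) = 15 < 16)
print(revised)   # [(1, [('b', 4), ('a', 3)], 7), (2, [('b', 2)], 2), (3, [('b', 2), ('a', 6)], 8)]
```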

Search Space.
The proposed ECoHUPM algorithm uses a set-enumeration tree as its search space, an approach whose efficiency has been verified in pattern mining [29]. A reversed depth-first traversal of this tree is adopted, as shown in Figure 1. Note that ECoHUPM uses the descending order of support to revise the database and then to construct the UTtree. Hence, with reversed depth-first search, the mining order for the running example is f ≺ e ≺ b ≺ a ≺ d ≺ c.
Definition 12. Given a set-enumeration tree and an itemset X represented by a node N, the nodes that have N as an ancestor are called the extensions (supersets) of X.
For a k-itemset (an itemset containing k items), we call its extensions containing (k + i) items the i-extensions of the itemset. With the reverse depth-first traversal, any extension of itemset X combines X with the item(s) before X.
For instance, in the set-enumeration tree represented in Figure 1, itemset {debf} is a 2-extension of {bf}, while itemset {cdebf} is a 3-extension of {bf}.
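A minimal generator for the reversed depth-first enumeration is sketched below. It assumes the items are given in descending order of support and that each extension prepends an item that precedes every item already in the itemset, the same convention used later by the prefix paths. It enumerates the whole tree without any pruning and is meant only to show the visiting order.

```python
from typing import Iterator, List, Tuple

def reverse_dfs(order: List[str]) -> Iterator[Tuple[str, ...]]:
    """Enumerate all non-empty itemsets in reversed depth-first order.
    Each itemset is written with its newest (earliest-ranked) item first."""
    def visit(itemset: Tuple[str, ...], first: int) -> Iterator[Tuple[str, ...]]:
        yield itemset
        # Extensions add only items that come before every item already in the itemset
        for j in range(first - 1, -1, -1):
            yield from visit((order[j],) + itemset, j)
    # Start from the last item in the order and move backwards
    for i in range(len(order) - 1, -1, -1):
        yield from visit((order[i],), i)

print(list(reverse_dfs(["a", "b", "c"])))
# [('c',), ('b', 'c'), ('a', 'b', 'c'), ('a', 'c'), ('b',), ('a', 'b'), ('a',)]
```

With the running example's support order c, d, a, b, e, f, the 1-itemsets are visited as f, e, b, a, d, c, matching the mining order stated above.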

Utility Tree and Correlation Utility-List Structures.
Once the database is revised, the proposed ECoHUPM algorithm constructs the utility tree (UTtree). A UTtree is a concise structure that stores sufficient information to facilitate mining Correlated High Utility Itemsets in a single phase. It is an extended form of the FP-tree [25], in which each node consists of four fields: label, interLink, parentLink, and utList. The label is the item's label, interLink points to the next node with the same item, parentLink points to the parent node, and utList is a dictionary that stores transaction IDs as keys and the item's utility in each transaction as values.
A UTtree is constructed with only one scan of the revised database, as shown in Algorithm 1. First, the tree is initialized by creating the Root node. Then the transactions are processed one by one, as shown in lines 1 and 2. The information of each transaction is inserted into the tree by calling the insertTree(T_i, itemsList, N_d) function.
Table 1 (partial): FCHMbond [22]: utility and bond; utility list; (1) TWU, (2) sum of iutil and rutil, (3) anti-monotonicity of bond. CoHUIM [20]: projected database; TWU. CoUPM [18]: utility and Kulczynsky; utility list; (1) TWU, (2) sum of iutil and rutil. CoHUI-Miner [24]: projected database with prefix utility.
For the revised database presented in Table 5, Figure 2 shows the UTtree after inserting the first transaction. First, the tree is initialized by the Root node, and then node c is created with label = c, parentLink = Root, and utList = {T1: 8}.
Then, node d is created with label = d, parentLink = c, and utList = {T1: 2}. Node a is created with label = a, parentLink = d, and utList = {T1: 6}. Node b is created with label = b, parentLink = a, and utList = {T1: 24}. Node e is created with label = e, parentLink = b, and utList = {T1: 30}. As the current transaction is the first transaction, the interLink of every node points to the entry with the same label in the header table. Similarly, the second and third transactions are inserted into the UTtree, as shown in Figure 3. The final UTtree after inserting the last transaction is shown in Figure 4.
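A minimal sketch of UTtree construction is shown below, under stated assumptions. Besides the four fields above, each node keeps a `children` dict (a hypothetical helper used only to find an existing child during insertion), and the header table stores, for each label, the most recently created node, whose interLink chains back to earlier nodes with the same label. The first transaction mirrors the walkthrough above; the second transaction is hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

class UTNode:
    """UTtree node with label, interLink, parentLink and utList, plus a
    hypothetical `children` dict used only during insertion."""
    def __init__(self, label: Optional[str], parent: Optional["UTNode"]):
        self.label = label
        self.parent_link = parent
        self.inter_link: Optional["UTNode"] = None
        self.ut_list: Dict[int, int] = {}          # tid -> utility of this item in that transaction
        self.children: Dict[str, "UTNode"] = {}

RevisedTx = Tuple[int, List[Tuple[str, int]], int]   # (tid, [(item, utility)], tu)

def build_uttree(revised: List[RevisedTx]) -> Tuple[UTNode, Dict[str, UTNode]]:
    root = UTNode(None, None)
    header: Dict[str, UTNode] = {}                 # label -> most recently created node
    for tid, items, _tu in revised:                # a single scan of the revised database
        node = root
        for label, util in items:                  # items already in descending-support order
            child = node.children.get(label)
            if child is None:                      # new node: link it into the label's chain
                child = UTNode(label, node)
                child.inter_link = header.get(label)
                header[label] = child
                node.children[label] = child
            child.ut_list[tid] = util              # record this transaction in utList
            node = child
    return root, header

def nodes_of(header: Dict[str, UTNode], label: str) -> Iterator[UTNode]:
    """Follow the interLink chain of one item, starting from the header table."""
    node = header.get(label)
    while node is not None:
        yield node
        node = node.inter_link

revised = [(1, [("c", 8), ("d", 2), ("a", 6), ("b", 24), ("e", 30)], 70),
           (2, [("c", 4), ("a", 3)], 7)]           # second transaction is hypothetical
root, header = build_uttree(revised)
print(root.children["c"].ut_list)                  # {1: 8, 2: 4}
print(len(list(nodes_of(header, "a"))))            # 2
```

Transactions that share a prefix share nodes, which is how the tree compresses the revised database: the second transaction reuses node c and only adds a new node a under it.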
Besides the UTtree, a new condensed data structure named CoUTlist is proposed to store sufficient information for mining superset patterns without scanning the UTtree multiple times.

Definition 13. The Correlation Utility list (CoUTlist) of an itemset X contains a set of elements, where each element represents a node, called a suffix, in which itemset X appears. In the CoUTlist, each element has four fields: (i) nodeNo, a unique identifier for each node, which is used as a sequence number, for example, (1, 2, …, n).
Figure 1: Set-enumeration tree with reversed depth-first traversal.
In the same manner, the CoUTlists of the remaining 1-itemsets are constructed, as shown in Figure 7.
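Section 4.3.1 is only partly preserved, so the following sketch of a 1-itemset CoUTlist goes beyond what the text states and should be read as hypothetical. It builds one element per UTtree node of the item, reached through interLink, and collects the prefixes of each element through parentLink. Only nodeNo is named in Definition 13; the other fields used here (node_utility, prefix_path, and a set of transaction IDs kept for later support counting) are assumptions. The functions u and pu follow the definitions of u(Y) and pu(Y) given in the proof of Property 2. The second and third transactions are invented so that u(a) and pu(a) reproduce the totals quoted in that proof (75 and 115); they are not the paper's Table 5, and the path order c, d, a, b, e is taken as given.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

_node_numbers = itertools.count(1)

class UTNode:
    def __init__(self, label: Optional[str], parent: Optional["UTNode"]):
        self.no = next(_node_numbers)              # nodeNo: unique sequential identifier
        self.label, self.parent_link = label, parent
        self.inter_link: Optional["UTNode"] = None
        self.ut_list: Dict[int, int] = {}          # tid -> item utility
        self.children: Dict[str, "UTNode"] = {}    # helper for insertion (assumption)

def build_header(revised: List[Tuple[int, List[Tuple[str, int]], int]]) -> Dict[str, UTNode]:
    """Build a UTtree and return its header table (label -> head of the interLink chain)."""
    root, header = UTNode(None, None), {}
    for tid, items, _tu in revised:
        node = root
        for label, util in items:
            if label not in node.children:
                child = UTNode(label, node)
                child.inter_link, header[label] = header.get(label), child
                node.children[label] = child
            node = node.children[label]
            node.ut_list[tid] = util
    return header

@dataclass
class Element:
    node_no: int
    node_utility: int              # utility of the itemset over this node's transactions
    prefix_path: Dict[str, int]    # prefix item -> its utility over the same transactions
    tids: FrozenSet[int]           # assumed field, kept for support counting

def colist_1(header: Dict[str, UTNode], label: str) -> List[Element]:
    """CoUTlist of a 1-itemset: one element per UTtree node carrying `label`."""
    elements = []
    node = header.get(label)
    while node is not None:                        # walk the interLink chain
        tids = frozenset(node.ut_list)
        prefix, anc = {}, node.parent_link
        while anc is not None and anc.label is not None:   # walk parentLinks up to Root
            prefix[anc.label] = sum(anc.ut_list[t] for t in tids)
            anc = anc.parent_link
        elements.append(Element(node.no, sum(node.ut_list.values()), prefix, tids))
        node = node.inter_link
    return elements

def u(colist: List[Element]) -> int:
    """u(X): sum of the node utilities in CoUTlist(X)."""
    return sum(e.node_utility for e in colist)

def pu(colist: List[Element]) -> int:
    """pu(X): sum of the prefix utilities stored in the prefix paths of CoUTlist(X)."""
    return sum(sum(e.prefix_path.values()) for e in colist)

revised = [(1, [("c", 8), ("d", 2), ("a", 6), ("b", 24), ("e", 30)], 70),
           (2, [("c", 7), ("d", 70), ("a", 45)], 122),   # hypothetical
           (3, [("c", 28), ("a", 24)], 52)]              # hypothetical
cl_a = colist_1(build_header(revised), "a")
print(u(cl_a), pu(cl_a))   # 75 115
```

Because every transaction passing through a node also passes through all of its ancestors, each prefix utility can be read from the ancestor's utList restricted to the node's transaction IDs.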

4.3.2. The CoUTlist of k-Itemset.
Let itemset $X = \{I_k, I_{k-1}, \ldots, I_2, I_1\}$ be an extension of itemset $Y = \{I_{k-1}, \ldots, I_2, I_1\}$. We denote the element in CoUTlist(Y) as $Y_{node}$ and the element in CoUTlist(X) as $X_{node}$.
Property 1. The proposed algorithm utilizes the TWU property [16] to remove all 1-itemsets whose TWU is less than the minUtil threshold. Let X be a k-itemset, and let Y be a (k − 1)-itemset such that Y ⊂ X. If X is an HTWUI, Y is an HTWUI as well. This means that if an itemset is a Low Transaction-Weighted Utilization Itemset (LTWUI), all its supersets are LTWUIs as well. Hence, this property can be used to reduce the search space by removing LTWUIs together with their supersets.

Property 2.
The first proposed pruning property is the Upper Bound property based on the sum of the Utility and the Path Utilities (UBUPU).
Given an itemset Y, if $u(Y) + pu(Y) < minUtil$, then neither Y nor any superset of Y is a CoHUI.
Proof. Let X be a superset of Y. The utility of Y is calculated as the sum of the nodes' utilities in CoUTlist(Y), $u(Y) = \sum_{Y_{node} \in CoUTlist(Y)} nodeUtility$; for example, $u(a) = 51 + 24 = 75$. The path utility of Y is calculated as the sum of the prefixes' utilities stored in the prefix paths of CoUTlist(Y), $pu(Y) = \sum_{Y_{node} \in CoUTlist(Y)} \sum_{P \in prefixPath} prefixPath[P]$; for example, $pu(a) = (15 + 72) + 28 = 115$. Since $Y \subseteq X$ and, under the reverse depth-first traversal, X is formed by adding items taken from the prefix paths of Y, $u(X) \le u(Y) + pu(Y) < minUtil$. □
Property 3. The second proposed pruning property is the Lower Bound property based on the Node Utility (LBNU). In the CoUTlist of an itemset X, each element (node) represents a set of transactions containing X, whereas in the utility list [18,22] or the projected database [20,24] each element represents a single transaction. Consequently, the utility of an itemset may already reach minUtil within a single element of its CoUTlist, which yields the following lower bound property based on nodeUtility.
Consider an itemset X: for every $X_{node} \in CoUTlist(X)$, if its nodeUtility ≥ minUtil, then every itemset formed by combining X with items in the current path is a high utility itemset (Figure 9).
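Proof. Let CoUTlist(X) be the correlation utility list of itemset X, and consider any $X_{node} \in CoUTlist(X)$ whose nodeUtility is at least minUtil. Denote the set of parent nodes in the current path as $[P_1, P_2, \ldots, P_n]$. The utility of $P_1 + X$ in the transactions represented by $X_{node}$ is $nodeUtility + prefixPath[P_1]$, so $u(P_1 + X) \ge nodeUtility + prefixPath[P_1] \ge nodeUtility \ge minUtil$ because utilities are non-negative; the same argument applies to any combination of $P_1, \ldots, P_n$ with X. □
Property 4. The third proposed pruning property is the Sorted-Reversing Downward Closure (SRDC) property based on the Kulc measure, which is used as the correlation measure in ECoHUPM. With the reverse depth-first traversal, each k-itemset has the form $\{I_k, I_{k-1}, \ldots, I_2, I_1\}$, and because these items are sorted in descending order of support, the sorted-reversing property based on the Kulc measure is formed as follows: if $X = \{I_k, I_{k-1}, \ldots, I_2, I_1\}$ is an extension of $Y = \{I_{k-1}, \ldots, I_2, I_1\}$, then $kulc(X) \le kulc(Y)$.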

Proof. Let $X = \{I_k, I_{k-1}, \ldots, I_2, I_1\}$ be a superset of $Y = \{I_{k-1}, \ldots, I_2, I_1\}$, where the Kulc value of a k-itemset is $kulc(X) = \frac{1}{k}\sum_{j=1}^{k} \frac{sup(X)}{sup(I_j)}$. We know that $sup(X) \le sup(Y)$ and, because $I_k$ precedes $I_{k-1}, \ldots, I_1$ in descending order of support, $sup(I_k) \ge sup(I_j)$ for $1 \le j \le k-1$. Writing $S = \sum_{j=1}^{k-1} \frac{sup(X)}{sup(I_j)}$, this gives $\frac{sup(X)}{sup(I_k)} \le \frac{S}{k-1}$. Hence, $kulc(X) = \frac{1}{k}\left(S + \frac{sup(X)}{sup(I_k)}\right) \le \frac{1}{k}\left(S + \frac{S}{k-1}\right) = \frac{S}{k-1} \le \frac{1}{k-1}\sum_{j=1}^{k-1} \frac{sup(Y)}{sup(I_j)} = kulc(Y)$. □
Note that the proposed SRDC property is similar to the sorted downward closure (SDC) property employed in [20,24]. However, SDC cannot be applied directly in ECoHUPM, because in [20,24] the items are sorted in ascending order of their support, whereas in ECoHUPM they are sorted in descending order. The proposed UBUPU and SRDC properties reduce the search space by removing all supersets of an itemset Y if $kulc(Y) < minCorr$ or $u(Y) + pu(Y) < minUtil$. The proposed LBNU property, on the other hand, speeds up the search as follows: for each element in the CoUTlist of Y, if nodeUtility ≥ minUtil, all possible supersets of Y in the current path are high utility itemsets. Hence, ECoHUPM only needs to calculate the correlation of each superset X in the current path, without checking whether u(X) or u(X) + pu(X) reaches minUtil. Figure 10 shows how these three properties significantly reduce the search space and thus improve the mining performance. For example, with minCorr = 0.4, …
ECoHUPM adopts reverse depth-first traversal for set-enumeration tree searching and employs the pruning properties to reduce the search space. The pseudocode of ECoHUPM is shown in Algorithm 2.
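The SRDC property can be sanity-checked by brute force on a small database. The sketch below is an illustration rather than part of ECoHUPM: it assumes the average-of-ratios form of Kulc used in the proof, orders items by descending support (ties broken by label, an assumption), and for every itemset Y compares kulc(Y) with kulc({p} ∪ Y) for each item p that precedes all items of Y. The toy database is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import FrozenSet, List

def support_count(X: FrozenSet[str], db: List[FrozenSet[str]]) -> int:
    return sum(1 for t in db if X <= t)

def kulc(X: FrozenSet[str], db: List[FrozenSet[str]]) -> Fraction:
    """Assumed form: average over i in X of sup(X) / sup({i}); n cancels out."""
    sx = support_count(X, db)
    ratios = (Fraction(sx, support_count(frozenset([i]), db)) for i in X)
    return sum(ratios, Fraction(0)) / len(X)

def srdc_holds(db: List[FrozenSet[str]]) -> bool:
    """Check kulc({p} | Y) <= kulc(Y) whenever p precedes every item of Y
    in descending-support order (ties broken by label, an assumption)."""
    items = {i for t in db for i in t}
    sup = {i: support_count(frozenset([i]), db) for i in items}
    order = sorted(items, key=lambda i: (-sup[i], i))
    rank = {i: r for r, i in enumerate(order)}
    for k in range(1, len(order)):
        for Y in map(frozenset, combinations(order, k)):
            first = min(rank[i] for i in Y)
            for p in order[:first]:              # only items that precede all of Y
                if kulc(Y | {p}, db) > kulc(Y, db):
                    return False
    return True

db = [frozenset("abc"), frozenset("ab"), frozenset("bd"), frozenset("acd"), frozenset("b")]
print(srdc_holds(db))   # True
```

The exhaustive check is only practical for a handful of items; ECoHUPM instead applies the property during the search, pruning an itemset and all of its extensions as soon as its Kulc value drops below minCorr.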
The input for ECoHUPM is a database D, consisting of a transactional database with external utilities, along with minUtil, the given minimum utility threshold, and minCorr, the given minimum correlation threshold. In line 1, ECoHUPM preprocesses D to obtain the revised database RD, and then it stores the set of unique items, sorted in descending order of support, in itemsList (line 2). Then it runs Algorithm 1 to construct the UTtree with one scan of RD (line 3).
Lines 4 to 15 describe the procedure for extracting the Correlated High Utility Itemsets. In each iteration of the loop starting at line 4, ECoHUPM finds all Correlated HUIs that are supersets of item X. Lines 5 and 6 construct the CoUTlist of the 1-itemset X with the help of the interLink and parentLink of the nodes labelled X in the UTtree, as illustrated in Section 4.3.1. As the correlation value of each 1-itemset is 1, lines 7 to 9 add X to the CoHUPs list if its utility is equal to or greater than minUtil. Line 10 applies the proposed UBUPU property by examining the sum of the utility and the path utilities of X: if $u(X) + pu(X) < minUtil$, all its supersets are pruned. Otherwise, the search function is called to explore its supersets (line 11). This procedure is performed recursively for all 1-itemsets to discover the Correlated High Utility Itemsets (Algorithm 3).
The function search(X, CoUTlist(X), minUtil, minCorr) explores all extensions of itemset X in order to discover all correlated high utility supersets. It scans CoUTlist(X) node by node to find all possible prefixes (lines 1 to 12). If the utility of the current node, nodeUtility, is equal to or greater than minUtil, all prefixes in the prefixPath of the current node are added to the list of high utility prefixes, HUprefixList (lines 5 to 7). All unique prefixes are added to prefixList (lines 8 to 10). Then the procedure in lines 13 to 29 is performed for each prefix P_i in prefixList. First, the 1-extension of X is formed as itemset = P_i + X (line 14), and its CoUTlist is constructed (line 15). Line 16 applies the proposed SRDC property, removing the itemset and all its supersets from the search space if its Kulc value is less than minCorr. Lines 17 to 19 apply the proposed LBNU property: if P_i is in HUprefixList, the itemset is added to the list of CoHUPs and the search function is called to explore its supersets.
Otherwise, lines 21 to 23 add the itemset to the list of CoHUPs if its utility is equal to or greater than minUtil, while lines 24 to 26 apply the proposed UBUPU property: the itemset and all its supersets are pruned if its utility plus path utilities is less than minUtil; otherwise, the search function is called to explore its supersets.
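The search procedure of Algorithm 3 can be sketched in Python as follows. Two parts are assumptions because the corresponding text is not preserved: the construction of CoUTlist(P_i + X) from CoUTlist(X) (here, keep the elements whose prefix path contains P_i, add P_i's prefix utility to the node utility, and keep only the prefixes that precede P_i), and a per-element set of transaction IDs used to count sup(P_i + X) for Kulc, which again takes the average-of-ratios form. The CoUTlist of {b} in the usage example is hand-built from three hypothetical revised transactions, T1 = T2 = (c:5, a:10, b:20) and T3 = (c:2, b:3), with item order c, a, b.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, FrozenSet, List, Set, Tuple

@dataclass
class Element:
    node_no: int
    node_utility: int            # utility of the current itemset over this node's transactions
    prefix_path: Dict[str, int]  # prefix item -> its utility over the same transactions
    tids: FrozenSet[int]         # assumed field, used to count support

def extend(colist: List[Element], p: str, rank: Dict[str, int]) -> List[Element]:
    """Hypothetical construction of CoUTlist(p + X) from CoUTlist(X) (line 15)."""
    return [Element(e.node_no, e.node_utility + e.prefix_path[p],
                    {q: w for q, w in e.prefix_path.items() if rank[q] < rank[p]}, e.tids)
            for e in colist if p in e.prefix_path]

def kulc(itemset: Tuple[str, ...], colist: List[Element], item_sup: Dict[str, int]) -> Fraction:
    sx = sum(len(e.tids) for e in colist)   # elements are distinct nodes: tid sets are disjoint
    return sum((Fraction(sx, item_sup[i]) for i in itemset), Fraction(0)) / len(itemset)

def search(X: Tuple[str, ...], colist: List[Element], min_util: int, min_corr: float,
           item_sup: Dict[str, int], rank: Dict[str, int], cohups: Set[Tuple[str, ...]]) -> None:
    hu_prefixes: Set[str] = set()
    prefixes: List[str] = []
    for e in colist:                                   # lines 3-12: collect prefixes
        for p in e.prefix_path:
            if e.node_utility >= min_util:             # LBNU: any p + X is high utility
                hu_prefixes.add(p)
            if p not in prefixes:
                prefixes.append(p)
    for p in prefixes:                                 # lines 13-29
        itemset = (p,) + X
        cl = extend(colist, p, rank)
        if kulc(itemset, cl, item_sup) < min_corr:     # SRDC: prune itemset and its extensions
            continue
        if p in hu_prefixes:
            cohups.add(itemset)
            search(itemset, cl, min_util, min_corr, item_sup, rank, cohups)
        else:
            u = sum(e.node_utility for e in cl)
            pu = sum(sum(e.prefix_path.values()) for e in cl)
            if u >= min_util:
                cohups.add(itemset)
            if u + pu >= min_util:                     # UBUPU: otherwise prune the subtree
                search(itemset, cl, min_util, min_corr, item_sup, rank, cohups)

# Hypothetical, hand-built CoUTlist of {b} (node numbers are arbitrary)
rank = {"c": 0, "a": 1, "b": 2}
item_sup = {"c": 3, "a": 2, "b": 3}
colist_b = [Element(4, 40, {"c": 10, "a": 20}, frozenset({1, 2})),
            Element(5, 3, {"c": 2}, frozenset({3}))]
found: Set[Tuple[str, ...]] = set()
search(("b",), colist_b, 50, 0.5, item_sup, rank, found)
print(sorted(found))   # [('a', 'b'), ('c', 'a', 'b'), ('c', 'b')]
```

In this example, {c, b} (u = 55) and {a, b} (u = 60) pass the utility check, while {c, a, b} (u = 70) is accepted through LBNU, because the element of {a, b} already has a nodeUtility of 60 ≥ 50. Raising min_corr to 0.8 drops {c, a, b}, whose Kulc value is 7/9.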

Experiment Design
In this section, we present the design of the experiments for performance evaluation. Experiments were performed on a computer with an Intel® Core™ i7-6600U CPU @ 2.60 GHz (4 CPUs), 2.8 GHz, and 8 GB of memory, running 64-bit Windows 10 Pro. The performance of the proposed ECoHUPM algorithm was compared with that of the CoHUIM and CoHUI-Miner algorithms in terms of both runtime and memory consumption. All algorithms were implemented in Python 3 using Jupyter Notebook.

Datasets Used.
We used five standard datasets downloaded from the SPMF library [54]: two real-life datasets with real utility values (Foodmart and Ecommerce) and three real-life datasets with synthetic utility values (BMS, Chess, and Mushroom). The characteristics of these datasets are shown in Table 6. To evaluate the efficiency of the proposed ECoHUPM algorithm, minCorr is set to three different values on each sparse dataset and four different values on each dense dataset; the corresponding runs are denoted ECoHUPM-minCorr1, ECoHUPM-minCorr2, ECoHUPM-minCorr3, and ECoHUPM-minCorr4. The three different minCorr thresholds are, respectively, set as follows: (1) …

5.2. Runtime.
The runtime of the proposed ECoHUPM algorithm was compared with those of two state-of-the-art Correlated HUI mining algorithms, CoHUIM and CoHUI-Miner. For each dataset, the minUtil threshold was adjusted, …
Input: Database D, minUtil, and minCorr. Output: All Correlated High Utility Itemsets.
Function: search(X, CoUTlist(X), minUtil, minCorr)
(1) HUprefixList ← ∅
(2) prefixList ← ∅
(3) for each X_node ∈ CoUTlist(X) do
(4)     for each P_i ∈ prefixPath do
(5)         if nodeUtility ≥ minUtil then
(6)             HUprefixList ← HUprefixList ∪ P_i
(7)         end if
(8)         if P_i ∉ prefixList then
(9)             prefixList ← prefixList ∪ P_i
(10)        end if
(11)    end for
(12) end for
(13) for each P_i ∈ prefixList do
(14)     itemset ← P_i + X
(15)     Scan CoUTlist(X) to construct CoUTlist(itemset)
(16)     if Kulc(itemset) ≥ minCorr then
(17)         if P_i ∈ HUprefixList then
(18)             CoHUPs ← CoHUPs ∪ itemset
(19)             Call search(itemset, CoUTlist(itemset), minUtil, minCorr)
(20)         else
(21)             if u(itemset) ≥ minUtil then
(22)                 CoHUPs ← CoHUPs ∪ itemset
(23)             end if
(24)             if u(itemset) + pu(itemset) ≥ minUtil then
(25)                 Call search(itemset, CoUTlist(itemset), minUtil, minCorr)
(26)             end if
(27)         end if
(28)     end if
(29) end for
ALGORITHM 3: Algorithm for searching the list of extensions of itemset X.
On the other hand, ECoHUPM is faster than CoHUI-Miner on sparse datasets such as Foodmart and Ecommerce (up to 5 and 2.2 times, respectively) and slightly faster on very sparse datasets such as BMS (up to 1.3 times). For dense datasets with low threshold values, ECoHUPM is significantly faster than CoHUI-Miner on the Mushroom dataset (up to 3.2 times) and on very dense datasets such as Chess (up to 4.3 times). With high threshold values, the margin is smaller: ECoHUPM is up to 2.1 times faster than CoHUI-Miner on the Mushroom dataset and up to 1.4 times faster on the Chess dataset. The main reason why ECoHUPM is always faster than CoHUIM and CoHUI-Miner is that the CoUTlist structure is more effective at reducing the database size than the projection mechanism used in CoHUIM and CoHUI-Miner. That is, in the CoUTlist, each element represents a set of transactions in which the itemset occurs in the same path.
In the projected database, by contrast, each element represents a single transaction in which the itemset occurs. Moreover, the proposed pruning properties help reduce the search space. CoHUIM is a two-phase algorithm: it first generates the candidate itemsets whose correlation is equal to or greater than the min Corr threshold, and then it calculates the utility of each candidate. Hence, on all datasets, CoHUIM is much slower than CoHUI-Miner and the proposed ECoHUPM. The size of the projected database of an itemset grows with the density of the dataset, and so does the cost of building the projected databases of its supersets. Thus, CoHUIM could not find the Correlated HUIs on the Mushroom and Chess datasets with low min Util and min Corr thresholds, because it suffers from excessive dataset scanning in the second phase.
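The two-phase behaviour described above can be sketched on a toy utility database. The example computes Kulc as the average of sup(X)/sup(i) over the items i of X (the Kulczynski measure, assumed here to match the paper's definition), keeps the candidates that pass min Corr in Phase 1, and then rescans the database once per candidate to obtain its utility in Phase 2. Candidate generation is brute-force enumeration up to a hypothetical `max_len`, not CoHUIM's actual procedure, and the data and function names are illustrative only.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility
# (quantity x unit profit, already multiplied).
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 4, "c": 3},
    {"a": 6, "b": 4},
    {"b": 2, "c": 2, "d": 7},
    {"a": 2, "b": 1, "c": 4},
]

def support(itemset, db):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in db if all(i in t for i in itemset))

def kulc(itemset, db):
    """Kulczynski measure: mean of sup(X)/sup(i) over the items i of X."""
    sup_x = support(itemset, db)
    return sum(sup_x / support((i,), db) for i in itemset) / len(itemset)

def utility(itemset, db):
    """u(X): summed utility of X's items over the transactions containing X."""
    return sum(sum(t[i] for i in itemset)
               for t in db if all(i in t for i in itemset))

def two_phase(db, min_util, min_corr, max_len=3):
    items = sorted({i for t in db for i in t})
    # Phase 1: keep candidates whose correlation passes min_corr (utility unknown yet).
    candidates = [c for k in range(1, max_len + 1)
                  for c in combinations(items, k)
                  if kulc(c, db) >= min_corr]
    # Phase 2: every utility() call is another full pass over db.
    result = {}
    for c in candidates:
        u = utility(c, db)
        if u >= min_util:
            result[c] = u
    return result
```

For example, `two_phase(DB, 15, 0.6)` returns `{('a',): 17, ('a', 'b'): 20, ('a', 'c'): 19}`; each of the nine correlated candidates costs its own pass over `DB` in Phase 2, and that per-candidate scanning is what grows with database size and density.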
CoHUI-Miner, on the other hand, is a one-phase algorithm. However, because the projected database of each itemset is much larger than its CoUTlist, especially in dense and very dense datasets, the proposed ECoHUPM is significantly faster than CoHUI-Miner on the Mushroom and Chess datasets.
In very sparse datasets, the CoUTlist of each itemset is only slightly smaller than its projected database. Hence, the proposed ECoHUPM is only slightly faster than CoHUI-Miner on the BMS dataset.
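Why the gap narrows on very sparse data can be illustrated with a deliberately simplified model. For a fixed item order, a projection-style list keeps one entry per transaction containing an item, whereas a grouped list keeps one entry per distinct prefix path, since transactions sharing a path in an FP-tree-like UTtree end at the same node. The function names, the count and utility fields, and the toy data are hypothetical; the real CoUTlist stores further information (such as path utilities) that is not modeled here.

```python
from collections import defaultdict

def projected_entries(db, item, order):
    """Projection-style view: one (prefix, utility) entry per transaction containing item."""
    rank = {i: r for r, i in enumerate(order)}
    entries = []
    for t in db:
        if item in t:
            # Items ranked before `item`, in tree order, form its prefix path.
            prefix = tuple(sorted((i for i in t if rank[i] < rank[item]), key=rank.get))
            entries.append((prefix, t[item]))
    return entries

def grouped_entries(db, item, order):
    """Path-grouped view: one entry per distinct prefix path (hypothetical simplification)."""
    groups = defaultdict(lambda: [0, 0])  # prefix -> [transaction count, summed utility of item]
    for prefix, u in projected_entries(db, item, order):
        groups[prefix][0] += 1
        groups[prefix][1] += u
    return dict(groups)

order = ["a", "b", "c"]
dense = [{"a": 1, "b": 1, "c": 1} for _ in range(4)] + [{"a": 1, "c": 2}]
sparse = [{"a": 1, "c": 1}, {"b": 1, "c": 1}, {"c": 3}, {"a": 1, "b": 1, "c": 1}]
```

On `dense`, the five transactions containing `c` collapse into two grouped entries, `{('a', 'b'): [4, 4], ('a',): [1, 2]}`; on `sparse`, every transaction has its own path, so grouping saves nothing, which mirrors the small gain observed on BMS.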

Memory Usage.
The memory usage of the proposed ECoHUPM is compared with that of CoHUIM and CoHUI-Miner in Figure 13, where the Y-axis represents memory usage as measured by the memory usage module in Python.
The proposed ECoHUPM algorithm consumes less memory than both CoHUIM and CoHUI-Miner on all datasets. More specifically, on the sparse Foodmart and Ecommerce datasets, CoHUIM occupies 1.5 and 3 times the memory of ECoHUPM, respectively, and on the very sparse BMS dataset it occupies 2.2 times the memory of ECoHUPM. Likewise, on Foodmart, Ecommerce, and BMS, CoHUI-Miner occupies 1.07, 1.8, and 1.3 times the memory of ECoHUPM, respectively. On the dense Mushroom and very dense Chess datasets, CoHUI-Miner occupies 1.7 and 2.2 times the memory of ECoHUPM, respectively.
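The paper names only "the memory usage module in Python". As a hedged illustration of how a peak-memory figure for one mining run can be collected, the sketch below uses the standard-library `tracemalloc` module instead; it may differ from the authors' tool, and it counts only allocations made through Python's allocator. The helper name `peak_memory_bytes` is hypothetical.

```python
import tracemalloc

def peak_memory_bytes(func, *args, **kwargs):
    """Run func(*args, **kwargs); return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()  # also clears the collected traces
    return result, peak
```

Running two miners through the same helper on the same dataset and thresholds, the quotient of their peaks gives a ratio of the same kind as the 1.07 to 3 times figures above.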

Conclusion
This paper proposed an efficient algorithm named ECoHUPM for mining Correlated HUIs.
The ECoHUPM algorithm adopts a divide-and-conquer approach and employs the UTtree structure, which is an extended form of the FP-tree. A novel data structure based on the UTtree, named CoUTlist, is proposed in ECoHUPM to store sufficient information for mining the desired patterns efficiently. Three new pruning properties have been introduced and applied to reduce the search space and improve the mining performance: the Upper Bound property based on the summation of the Utility and the Path Utilities (UBUPU), the Lower Bound property based on the Node Utility (LBNU), and the Sorted-Reversing Downward Closure (SRDC) property based on the Kulc measure.
An extensive experimental evaluation was conducted on five datasets covering sparse, very sparse, dense, and very dense data.
The experimental results show that the proposed ECoHUPM algorithm outperforms the state-of-the-art CoHUIM and CoHUI-Miner algorithms in terms of both runtime and memory consumption.
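How the three pruning properties gate the recursion in Algorithm 3 (its Kulc test, its HUprefixList test, and its u(itemset) + pu(itemset) test) can be sketched as a single decision function. This is a hedged reading of the pseudocode, not the authors' code: the `Extension` record and `decide` function are hypothetical, the Kulc, u and pu values are assumed to be already computed from CoUTlist(itemset), building that list is not shown, and the mapping of each property to a pseudocode test is inferred from the property names.

```python
from dataclasses import dataclass

@dataclass
class Extension:
    """Hypothetical summary of one candidate itemset P_i + X."""
    kulc: float          # Kulc(itemset)
    in_hu_prefix: bool   # P_i is in HUprefixList (some node utility >= min_util)
    utility: float       # u(itemset)
    path_utility: float  # pu(itemset)

def decide(ext, min_util, min_corr):
    """Return (output_as_correlated_hui, recurse_into_supersets)."""
    if ext.kulc < min_corr:
        # Kulc test (SRDC): the pseudocode neither outputs nor extends such itemsets.
        return False, False
    if ext.in_hu_prefix:
        # Node-utility test (LBNU): node utility >= min_util is taken as a lower
        # bound on u(itemset), so the itemset is output and extended without computing u.
        return True, True
    is_hui = ext.utility >= min_util
    # u + pu test (UBUPU): extensions are searched only if u(itemset) + pu(itemset) >= min_util.
    explore = ext.utility + ext.path_utility >= min_util
    return is_hui, explore
```

For instance, with min Util = 30 and min Corr = 0.5, an extension with u = 20 and pu = 15 is not output but is still extended, because 20 + 15 ≥ 30.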
Data Availability
The data used in the experiments of this paper are available in the SPMF library [54]: https://www.philippe-fournier-viger.com/spmf/.
16 Complexity