PHUIMUS: A Potential High Utility Itemsets Mining Algorithm Based on Stream Data with Uncertainty

High utility itemsets (HUIs) mining has recently become a hot topic; it finds profitable itemsets by considering both the quantity and the profit of items. HUIs mining over uncertain datasets and HUIs mining over data streams have each been studied separately. However, to the best of our knowledge, HUIs mining over uncertain data streams has seldom been studied. In this paper, the PHUIMUS (potential high utility itemsets mining over uncertain data stream) algorithm is proposed to mine potential high utility itemsets (PHUIs), that is, itemsets with both high utility and high existential probability, over uncertain data streams based on sliding windows. To realize the algorithm, a potential utility list over uncertain data stream (PUS-list) is designed so that PHUIs can be mined without rescanning the analyzed uncertain data stream. A transaction weighted probability and utility tree over uncertain data stream (TWPUS-tree) is also designed to reduce the number of candidate itemsets generated by the PHUIMUS algorithm. Substantial experiments are conducted in terms of run-time, number of discovered PHUIs, memory consumption, and scalability on real-life and synthetic databases. The results show that the proposed algorithm is reasonable and acceptable for mining meaningful PHUIs from uncertain data streams.


Introduction
Knowledge discovery in databases (KDD) is an emerging field, since important, implicit, previously unknown, and potentially useful information can be found in huge databases [1,2]. Frequent itemsets mining (FIM), which mines the itemsets whose occurrence frequencies are no less than a minimum support threshold, is one of the most important and common tasks of data mining [3]. Apriori [4], based on breadth-first search, and FP-growth [2], based on depth-first search, are well-known fundamental FIM algorithms. However, these traditional FIM algorithms assume that the profit of every item is the same and that the frequency value of every item in a transaction is 0 or 1. In real-life applications, the itemsets that bring high profit to retailers and managers are the useful ones [5], not the most frequent itemsets. Thus, factors such as quantity, price, and profit need to be included in FIM.
To deal with the limitations of FIM, Chan et al. [6] first proposed a high utility itemsets mining algorithm over nonbinary databases with different profit values of items. The goal of HUIs mining is to discover itemsets that bring considerable profit to users, even if they are not frequent itemsets. Level-wise approaches [6,7], pattern growth approaches [8][9][10], and list based approaches [11,12] are the three main frameworks for HUIs mining; they address the lack of a downward closure property for utility and the resulting combinatorial explosion.
These traditional HUIs mining algorithms are designed for static databases and ignore the timeliness of itemsets. Therefore, Tseng et al. first proposed THUI-Mine [13] to mine HUIs from data streams according to the two-phase model based on sliding windows. Afterward, many improved algorithms [14][15][16][17][18] were proposed to handle this problem more efficiently. However, the above algorithms can only deal with precise data streams; they cannot deal with uncertainty.
In real-life applications, when the data is collected from noisy data sources, uncertainty may be introduced. But most HUIs mining algorithms are developed to handle precise
(2) As HUIs mining over uncertain data stream brings existential probability and sliding windows into consideration, the calculation of item utility, itemset utility, transaction utility, and transaction weighted utility is changed. In this paper, new definitions of them are given, and a novel type of itemsets named PHUIs is designed.
(3) The PHUIMUS algorithm is proposed to mine PHUIs over uncertain data stream based on the developed PUS-list and TWPUS-tree in the current window, which can efficiently prune the unpromising itemsets and get PHUIs without rescanning the analyzed uncertain data stream.
(4) Substantial experiments have been conducted on real-life and synthetic databases. Results show that the designed algorithm can effectively discover PHUIs over uncertain data stream and performs well on run-time, number of discovered PHUIs, memory consumption, and scalability.
The remainder of this paper is organized as follows. In Section 2, we describe the related work. In Section 3, we present our new definitions and the problem of HUIs mining over uncertain data stream. In Section 4, we develop the proposed PUS-list and TWPUS-tree and design the PHUIMUS algorithm for the stated problem. In Section 5, our experimental results are presented and analyzed. Finally, in Section 6, conclusions are drawn.

Related Work
In this section, related work on HUIs mining over data streams and over uncertain databases is briefly reviewed.

HUIs Mining over Data Stream.
As an extension of FIM, HUIs mining focuses on finding itemsets whose utilities are not lower than a minimum utility threshold. It has been widely studied recently and can be used in various areas, such as web click analysis, biological gene analysis, and retail marketing [18]. Its goal is to discover items or itemsets in transactions that are valuable to users, not the most frequent ones. Several typical algorithms have been proposed to deal with the lack of a downward closure property and the resulting combinatorial explosion, such as the two-phase model [7], HUP-growth [9], HUI-Miner [11], HUPEumu-GRAM [21], and HUIs mining-BPSO [22].
In contrast to discovering HUIs from a static database, THUI-Mine [13] is the first algorithm for mining HUIs from a data stream. It follows the two-phase model [7] based on sliding windows and thus suffers from the problem of level-wise candidate generation. Afterward, to reduce the number of candidate itemsets of THUI-Mine, Li et al. [14,15] proposed MHUI-BT and MHUI-TID, which use bit vectors and TID-lists for each distinct item. The experiments show that MHUI-TID is an outstanding algorithm for mining HUIs from data streams, since the TID-list is an efficient data structure that sharply reduces the number of candidate itemsets.
Moreover, Shie et al. [16] proposed efficient algorithms for mining maximal HUIs from data streams with different models. Ahmed et al. [17] designed an interactive mining algorithm for high utility patterns over data streams. Zihayat and An [18] suggested an algorithm for mining top-k high utility patterns over data streams. However, the above algorithms can only deal with precise data streams; they cannot deal with uncertainty.

HUIs Mining over Uncertain Database.
Most HUIs mining algorithms assume that the information stored in the databases is precise and ignore the existential probability of itemsets. Thus, traditional HUIs mining algorithms are insufficient for processing transactions with uncertainty in real-life applications. In fact, for an uncertain database, itemsets with both high utility and high existential probability are useful to users, not itemsets with only one of them. To the best of our knowledge, the only algorithms that solve the HUIs mining problem over uncertain databases are PHUI-UP, based on the two-phase model, and PHUI-List, based on a list structure, both proposed by Lin et al. [19], and UHUI-apriori, based on Apriori, proposed by Lan et al. [20]. However, these algorithms can only handle static databases with uncertainty; they cannot deal with uncertain data streams. In this paper, the concepts of HUIs mining over data streams and over uncertain databases are combined to discover HUIs from uncertain data streams. To the best of our knowledge, the proposed PHUIMUS algorithm is the first work that discovers HUIs over uncertain data stream.

New Definitions and Problem Statement
In this section, some definitions of HUIs mining are extended from the precise and static databases to the uncertain data streams.New definitions and problem statement related to HUIs mining over uncertain data stream are given.
In an uncertain data stream, the data cannot be stored completely because its volume is unbounded, and the storage structure must be adjusted dynamically to reflect the evolution of itemset utility. So a sliding window is needed, which consists of the $w$ most recent batches, represented by $SW_k = \{B_k, B_{k+1}, \ldots, B_{k+w-1}\}$. A batch $B_k$ consists of a fixed number $h$ of consecutive transactions in a time period; that is, $B_k = \{T_i, T_{i+1}, \ldots, T_{i+h-1}\}$. HUIs mining over uncertain data stream based on sliding windows mines PHUIs from every new window, which is formed once the oldest batch is removed from the window and the newest batch is inserted into the window.
For example, Figure 1 shows part of an uncertain data stream and its profit table. Assume that each batch includes two transactions and each sliding window includes three batches. There are four batches in the stream, $B_1 = \{T_1, T_2\}$, $B_2 = \{T_3, T_4\}$, $B_3 = \{T_5, T_6\}$, and $B_4 = \{T_7, T_8\}$, and the first three batches form the first sliding window: $SW_1 = \{B_1, B_2, B_3\}$. When the fourth batch $B_4$ is filled up with transactions, the second sliding window is formed: $SW_2 = \{B_2, B_3, B_4\}$.
What distinguishes PHUIs mining from traditional HUIs mining over data stream is that the items in the transactions have existential probabilities, which changes the calculation of itemset utility, transaction utility, and transaction weighted utilization. New definitions of them are presented below.
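The batch and window bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sliding_windows` and the string labels `T1` to `T8` are hypothetical, and transactions are treated as opaque values.

```python
from collections import deque

def sliding_windows(transactions, batch_size, window_size):
    """Yield every full window as a list of batches (each batch is a list of transactions)."""
    window = deque(maxlen=window_size)  # appending to a full deque drops the oldest batch
    for start in range(0, len(transactions) - batch_size + 1, batch_size):
        window.append(transactions[start:start + batch_size])
        if len(window) == window_size:
            yield [list(batch) for batch in window]

# Eight transactions, two per batch, three batches per window.
stream = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8"]
windows = list(sliding_windows(stream, batch_size=2, window_size=3))
```

With these settings the generator produces two windows, one covering the first three batches and one formed after the fourth batch arrives.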
Definition 5 (see [20] (minimum potential utility value in a window)). Given the minimum utility threshold $\delta_{SW_k}$, the minimum potential utility value in the window $SW_k$, $minutil_{SW_k}$, is defined as $minutil_{SW_k} = \delta_{SW_k} \times \sum_{T_q \in SW_k} ptu(T_q)$. For example, for $SW_2$ in Figure 1 with $\delta_{SW_2} = 0.2$, $minutil_{SW_2} = 231.69 \times 0.2 = 46.338$.
Definition 6 (see [20] (PHUIs in a window)). An itemset $X$ is a PHUI in the window $SW_k$ if $pu_{SW_k}(X) \ge minutil_{SW_k}$. Finding PHUIs in window $SW_k$ means finding all the itemsets $X$ that satisfy $pu_{SW_k}(X) \ge minutil_{SW_k}$.
The potential utility of itemsets has no downward closure property [4]; that is, the potential utility constraint is neither monotone nor anti-monotone. Hence, unlike support in FIM, the potential utility of an itemset cannot be used to prune the search space. To deal with this problem, an overestimate of the itemset's utility, the transaction weighted probability and utility (TWPU), is used in the PHUIs mining process to prune the search space.
Proof. Let $X$ be an itemset that is contained in $SW_k$ and $Y$ be a superset of $X$. Then, in any transaction where $X$ is absent, $Y$ cannot be present. So, according to Definition 8, the TWPU value of $Y$ is no larger than that of $X$, that is, $twpu_{SW_k}(Y) \le twpu_{SW_k}(X)$, and if $twpu_{SW_k}(X)$ is less than $minutil_{SW_k}$, $Y$ cannot be a HTWPUI.
Lemma 11. For a window $SW_k$ and a minimum utility threshold $\delta_{SW_k}$, the set of PHUIs ($PHUIs_{SW_k}$) is a subset of the set of HTWPUIs ($HTWPUIs_{SW_k}$).
Proof. Let $X$ be a PHUI in $SW_k$. According to Definitions 3 and 8, $pu_{SW_k}(X)$ must be less than or equal to $twpu_{SW_k}(X)$. So, if $X$ is a PHUI, it must be a HTWPUI in $SW_k$. It follows that $X$ is a member of the set $HTWPUIs_{SW_k}$, and $PHUIs_{SW_k} \subseteq HTWPUIs_{SW_k}$.
Since $twpu_{SW_k}(X)$ is an overestimate of $pu_{SW_k}(X)$, no PHUI will be missed. But the true potential utility of the generated HTWPUIs may be lower than the minimum potential utility value. So in our algorithm, while the HTWPUIs in $SW_k$ are found by the TWPUS-tree, the PHUIs among them are calculated by the PUS-list.
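The relationship between the exact potential utility and its TWPU overestimate can be checked on a toy window. The sketch below is not the tree-based procedure of this paper; it enumerates all itemsets by brute force. The window contents are invented, and two readings are assumed rather than taken from the definitions: the potential transaction utility is the sum of the item potential utilities in a row, and the potential utility of an itemset is the sum of its items' potential utilities over the transactions that contain it.

```python
from itertools import combinations

# Hypothetical window: one dict per transaction, item -> potential utility of that item.
window = [
    {"a": 4.0, "b": 6.0, "c": 2.0},
    {"a": 3.0, "c": 5.0},
    {"b": 8.0, "d": 1.0},
    {"a": 2.0, "b": 4.0, "d": 9.0},
]

def ptu(transaction):
    """Potential transaction utility, assumed here to be the row sum."""
    return sum(transaction.values())

def twpu(itemset, window):
    """Overestimate: sum of PTU over the transactions containing every item of itemset."""
    return sum(ptu(t) for t in window if set(itemset).issubset(t))

def pu(itemset, window):
    """Exact potential utility: item utilities summed over the containing transactions."""
    return sum(t[i] for t in window if set(itemset).issubset(t) for i in itemset)

threshold = 0.3                                     # assumed minimum utility threshold
minutil = threshold * sum(ptu(t) for t in window)   # roughly 13.2 for this window

items = sorted({i for t in window for i in t})
all_itemsets = [c for n in range(1, len(items) + 1) for c in combinations(items, n)]
htwpuis = {x for x in all_itemsets if twpu(x, window) >= minutil}   # candidates
phuis = {x for x in all_itemsets if pu(x, window) >= minutil}       # verified results
```

In this toy window the single item `a` is not a PHUI while its superset `{a, b}` is, which illustrates why potential utility alone cannot prune the search space, whereas every PHUI still appears among the HTWPUIs.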

Problem Statement.
Given a continuous uncertain data stream, a predefined profit table, and a user-specified minimum utility threshold $\delta_{SW_k}$, the problem of HUIs mining over uncertain data stream is to find the PHUIs whose potential utilities are no lower than $minutil_{SW_k}$.

The Proposed Algorithm for Mining HUIs over Uncertain Data Stream
In this section, we first develop the proposed PUS-list and TWPUS-tree. Then, the algorithm for HUIs mining over uncertain data stream based on sliding windows, PHUIMUS, is designed. Lastly, the proposed algorithm is thoroughly described and analyzed.

Construction of PUS-List and TWPUS-Tree over Uncertain Data Stream.
This section describes the construction procedure of the proposed PUS-list and TWPUS-tree, which are used to deal with the problem of HUIs mining over uncertain data stream. The PUS-list stores the potential utility of the items and of the transactions in the current window. It is used to calculate the potential utility of the candidates generated by mining the TWPUS-tree, without rescanning the analyzed uncertain data stream. Its number of rows is equal to the number of transactions in the current window, and its number of columns is equal to the number of items in the utility table. The items in the TWPUS-tree and in its header table are arranged in lexicographic order. The item-id and TWPU value of each item are maintained in the header table to get HTWPUIs. Each node in the TWPUS-tree maintains the item-id and batch-by-batch TWPU information so that the tree can follow the sliding window. Adjacent links are also maintained in the tree structure to facilitate tree traversals.
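A minimal sketch of a PUS-list-like structure follows. It assumes a row is stored as a dictionary from item to potential utility (a sparse form of the item columns) and that the transaction value of a row is the sum of its entries; the class name `PUSList`, its methods, and the sample numbers are hypothetical.

```python
from collections import deque

class PUSList:
    """Sliding table with one row per transaction (item -> potential utility)."""

    def __init__(self, batches_per_window):
        # A bounded deque drops the oldest batch when a new one is appended.
        self.batches = deque(maxlen=batches_per_window)

    def add_batch(self, batch):
        """batch: list of dicts mapping item -> potential utility in that transaction."""
        self.batches.append([dict(row) for row in batch])

    def rows(self):
        return [row for batch in self.batches for row in batch]

    def ptu(self, index):
        """Potential utility of the transaction in row index (0-based), as the row sum."""
        return sum(self.rows()[index].values())

    def potential_utility(self, itemset):
        """Exact potential utility of itemset in the current window, read from the rows."""
        return sum(row[i] for row in self.rows()
                   if all(j in row for j in itemset) for i in itemset)

pus = PUSList(batches_per_window=2)
pus.add_batch([{"a": 1.0, "b": 2.0}, {"a": 3.0}])
pus.add_batch([{"b": 4.0, "c": 5.0}, {"a": 6.0, "b": 7.0}])
before = pus.potential_utility(("a", "b"))   # uses both batches
pus.add_batch([{"a": 8.0, "b": 9.0}, {"c": 10.0}])
after = pus.potential_utility(("a", "b"))    # first batch has been dropped
```

Because every lookup reads only rows held in memory, a candidate can be verified without going back to the stream, which is the role the text assigns to the PUS-list.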
For the example uncertain data stream in Figure 1, when $B_1$ arrives, its items are sorted in lexicographic order and the current window $SW_1$ starts to form; SW1_$i$ denotes the $i$th transaction in $SW_1$. The potential utility of each item and of each transaction in $B_1$ is calculated while the uncertain data stream is scanned, which gives the PUS-list in Figure 2(a). Subsequently, as the first transaction $T_1$ has a PTU value of 21.26 and includes three items, its first item is inserted into the TWPUS-tree by creating a node with a TWPU value of 21.26, and its other two items are then inserted with TWPU values of 21.26. At the same time, their TWPU values are also inserted into the header table in lexicographic order. After transactions $T_1$ and $T_2$ are inserted into the TWPUS-tree, Figure 2(b) shows the TWPUS-tree constructed for $B_1$.
Similarly, $B_2$ and $B_3$ are inserted into the PUS-list and TWPUS-tree. Since $SW_1$ contains the first three batches, Figures 3(a) and 3(b) are the final PUS-list and TWPUS-tree for it, respectively.
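The insertion step can be sketched as follows. This is a simplified illustration, not the authors' implementation: the adjacent links are omitted, the names `Node` and `insert_transaction` are hypothetical, and the item labels `a`, `b`, `c` as well as the second transaction are invented (only the PTU value 21.26 of the first transaction comes from the example).

```python
class Node:
    def __init__(self, item, window_size):
        self.item = item
        self.twpu = [0.0] * window_size   # one TWPU counter per batch of the window
        self.children = {}

def insert_transaction(root, header, items, ptu, batch_pos):
    """Insert one transaction along a path of lexicographically sorted items."""
    node = root
    for item in sorted(items):
        child = node.children.get(item)
        if child is None:
            child = Node(item, len(root.twpu))
            node.children[item] = child
        child.twpu[batch_pos] += ptu                 # batch-by-batch TWPU on the node
        header[item] = header.get(item, 0.0) + ptu   # window-wide TWPU in the header table
        node = child

root, header = Node(None, 3), {}
insert_transaction(root, header, ["a", "b", "c"], 21.26, batch_pos=0)
insert_transaction(root, header, ["a", "c"], 10.0, batch_pos=0)   # hypothetical transaction
```

Keeping one counter per batch, instead of a single total, is what later allows the oldest batch to be removed without rebuilding the tree.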
When $B_4$ arrives, the information of $B_4$ needs to be inserted into the PUS-list and TWPUS-tree, and the information of $B_1$ needs to be deleted from them. In detail, the PUS-list removes the transactions of $B_1$ from the top of the list, inserts the transactions of $B_4$ at the bottom of the list, and changes SW1_$i$ to SW2_$i$ correspondingly. The TWPU counters of the nodes in the TWPUS-tree are shifted one position left to remove the TWPU information of $B_1$, and the TWPU information of $B_4$ is inserted subsequently. Figures 4 and 5 show the deleting and inserting processes, respectively. In Figure 4(b), a node that contains information for $B_1$, $B_2$, and $B_3$ holds the counters {49.18, 16.98, 0} after the shift. On the other hand, since its child contains no information for $B_2$ and $B_3$, the child's counters become {0, 0, 0} after the shift operation and the child is deleted from the tree. The same operations are performed on the other nodes in Figure 4(b). Subsequently, $B_4$ is inserted into the tree, and the result is shown in Figure 5.
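The left shift and the removal of emptied nodes might look like the sketch below. The names `Node` and `slide`, the item labels `x` and `y`, and the first counter value 21.26 are assumptions made for illustration; the counters 49.18 and 16.98 are the ones quoted in the example. Updating the header table is left out.

```python
class Node:
    def __init__(self, item, twpu):
        self.item = item
        self.twpu = list(twpu)   # one TWPU counter per batch, oldest batch first
        self.children = {}

def slide(node):
    """Shift every counter array one position left and delete nodes that become all zero."""
    for item in list(node.children):
        child = node.children[item]
        child.twpu = child.twpu[1:] + [0.0]   # drop the oldest batch, open a slot for the newest
        slide(child)
        if not any(child.twpu):
            del node.children[item]

root = Node(None, [0.0, 0.0, 0.0])
x = Node("x", [21.26, 49.18, 16.98])   # present in all three batches
y = Node("y", [21.26, 0.0, 0.0])       # present only in the oldest batch
root.children["x"] = x
x.children["y"] = y
slide(root)
```

After the shift, the rightmost counter of each surviving node is free to receive the TWPU information of the newly arrived batch.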

Mining Process of PHUIMUS.
This section describes the mining procedure of the proposed PHUIMUS algorithm, which combines a pattern growth approach with a list based approach. In the proposed algorithm, a prefix tree is created for the bottommost item of the header table by taking all the branches that prefix that item, together with their TWPU values. For convenience, the batch-by-batch TWPU values of each node in the prefix tree are added up into one value. Subsequently, the conditional tree of that item is built from the prefix tree by removing the nodes with low TWPU values. Lastly, the potential utility of the candidates generated by mining the TWPUS-tree is calculated from the current PUS-list, which avoids rescanning the uncertain data stream.
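The prefix tree and conditional tree steps can be approximated on a flat list of prefix paths. The sketch below is a simplification and not the recursive tree mining itself: it represents the prefix tree of one bottom item as `(path, summed TWPU)` pairs, removes items whose TWPU within these paths is below the minimum potential utility value, and enumerates the remaining combinations. The function name, item labels, and path values are hypothetical; they were chosen so that the outcome has the same shape as the worked example (eight candidates, each with TWPU 82.16).

```python
from itertools import combinations

def candidates_for(bottom_item, prefix_paths, minutil):
    """Return {candidate itemset: TWPU} for one bottom item of the header table."""
    # TWPU of each prefix item, restricted to the paths that end in bottom_item.
    item_twpu = {}
    for path, value in prefix_paths:
        for item in path:
            item_twpu[item] = item_twpu.get(item, 0.0) + value
    keep = {i for i, v in item_twpu.items() if v >= minutil}
    # Conditional paths: the prefix paths with unpromising items removed.
    conditional = [(frozenset(i for i in path if i in keep), value)
                   for path, value in prefix_paths]
    result = {}
    for size in range(len(keep) + 1):
        for subset in combinations(sorted(keep), size):
            total = sum(v for p, v in conditional if set(subset) <= p)
            if total >= minutil:
                result[subset + (bottom_item,)] = total
    return result

# Hypothetical prefix paths of bottom item "f"; "d" and "e" fall below minutil.
paths = [(("a", "b", "c", "d"), 40.0), (("a", "b", "c", "e"), 42.16)]
found = candidates_for("f", paths, minutil=46.338)
```

The TWPU values returned here are still overestimates, so each candidate has to be checked against its exact potential utility afterwards.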
For the example in Figure 1, mining the recent PHUIs means that all the PHUIs in $SW_2$ must be found. Let $\delta_{SW_2} = 0.2$, so that $minutil_{SW_2} = 231.69 \times 0.2 = 46.338$. The prefix tree of the bottom item of the header table is shown in Figure 6(a). It shows that two of the items cannot form any candidate itemset with the bottom item, as their TWPU values are lower than $minutil_{SW_2}$. Hence, by deleting all the nodes that contain these two items from the prefix tree, the conditional tree of the bottom item is constructed, as shown in Figure 6(b).
Moreover, the proposed algorithm satisfies the following properties.

Lemma 12. The number of candidate itemsets generated by the PHUIMUS algorithm ($N_1$) is no larger than that of the level-wise based algorithms ($N_2$), denoted as $N_1 \le N_2$.
Proof. When all the subsets of an itemset $X$ are candidate itemsets (HTWPUIs), $X$ becomes a candidate itemset in the existing level-wise based algorithms. However, $X$ itself may have a TWPU value too low to be a candidate, or may not appear in the current window at all. In the PHUIMUS algorithm, if $X$ is not a HTWPUI, it is pruned. So the candidate set of PHUIMUS contains only the true HTWPUIs, and $N_1$ cannot be larger than $N_2$.
Proof. As the proposed PHUIMUS algorithm combines a pattern growth approach with a list based approach, the exact potential utility of the global candidate itemsets generated by the TWPUS-tree can be calculated directly from the PUS-list, without rescanning the analyzed uncertain data stream.

Experimental Results
In this section, four experiments are used to evaluate the performance of the proposed algorithm over uncertain data stream in terms of run-time, memory consumption, number of discovered PHUIs, and scalability. Because PHUIMUS is, to our knowledge, the first work on HUIs mining over uncertain data stream, and MHUI-TID is an outstanding algorithm for mining HUIs from data streams, the performance of the designed PHUIMUS algorithm is compared only with MHUI-TID. A comparison between PHUIs and HUIs is also made to evaluate whether the proposed algorithm is acceptable. All algorithms are implemented in Matlab, and the experiments run on a PC with an Intel Core i5-4590 dual core processor, 4 GB RAM, and a 32-bit Microsoft Windows operating system. Experimental results and discussions follow.

Datasets.
Experiments were performed on synthetic database T10I4D100K and real-life databases mushroom, connect, and accidents, which are widely used in the issue of HUIs mining.Parameters and characteristics of these databases are, respectively, shown in Tables 1 and 2.
As these databases do not provide the external utility, internal utility, and existential probability of each item, a simulation model [7] is employed. The model generates random numbers that follow a log-normal distribution in the [1, 5] interval and in the [1, 1000] interval, which correspond to internal and external utility, respectively. In addition, to model the uncertainty of the items in each transaction, their existential probabilities follow a uniform distribution in the [0.5, 1] interval. Moreover, as the proposed algorithm targets uncertain data streams, each database is divided into windows that contain a fixed number of batches; the batch size and window size of each database are also shown in Table 2.
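A data generator in the spirit of this simulation model could be written as below. It is a sketch only: the log-normal parameters, the clamping of draws into the stated intervals, the rounding of quantities, and the function names are all assumptions, since the text does not specify them.

```python
import random

def simulate_profits(items, rng):
    """External utility per item: log-normal draw clamped into [1, 1000] (assumed mu, sigma)."""
    return {item: min(1000.0, max(1.0, rng.lognormvariate(3.0, 1.0))) for item in items}

def simulate_transaction(items, rng):
    """Attach an internal utility in [1, 5] and an existential probability in [0.5, 1]."""
    row = {}
    for item in items:
        quantity = min(5, max(1, round(rng.lognormvariate(0.5, 0.5))))  # assumed mu, sigma
        probability = rng.uniform(0.5, 1.0)
        row[item] = (quantity, probability)
    return row

rng = random.Random(0)   # fixed seed so that a run can be repeated
profits = simulate_profits(["a", "b", "c"], rng)
row = simulate_transaction(["a", "c"], rng)
```

Fixing the seed keeps the generated utilities and probabilities identical across the compared algorithms, which matters when run-times and result counts are compared on the same data.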

Run-Time.
The threshold $\delta_{SW_k}$ is set to the same value in all the windows and is abbreviated as $\delta$. The run-time of the proposed algorithm is compared with that of MHUI-TID on the above four databases for various $\delta$, and the result is shown in Figure 7. Notice that the databases processed by MHUI-TID do not contain probability values, so only precise versions of the four databases are used by MHUI-TID. Figure 7 indicates that the proposed algorithm runs faster than MHUI-TID. This result is reasonable, since the proposed algorithm discovers PHUIs from the TWPUS-tree and PUS-list directly, which avoids spending time on database scans. It also indicates that the combination of a pattern growth approach and a list based approach performs well on the problem of HUIs mining over uncertain data stream.

Number of Discovered PHUIs.
When the existential probability of every item in an uncertain data stream is set to 1, the stream degenerates into a precise data stream. In this case, PHUIs mining algorithms obtain the whole set of HUIs, which is the same as the result of traditional HUIs mining algorithms. As no algorithm had previously been developed for discovering HUIs over uncertain data stream, the result of the designed algorithm is compared with that of the MHUI-TID algorithm, which ignores the probability values of the uncertain data stream. This comparison between PHUIs and HUIs is used to evaluate whether the proposed algorithm is acceptable. The numbers of discovered HUIs and PHUIs under various $\delta$ are shown in Figure 8.
Figure 8 shows that, for various $\delta$ on the four databases, the number of PHUIs is usually no larger than that of HUIs. This is because the proposed algorithm considers both the probability and the utility, whereas MHUI-TID only considers the utility. Besides, the numbers of both HUIs and PHUIs decrease as $\delta$ increases. This result also shows that few PHUIs are produced from the numerous discovered HUIs once the probability constraint is considered. So, in real-life applications, many HUIs may not be the itemsets that users need for making effective decisions, especially when $\delta$ is set high. Therefore, PHUIs are fewer and more valuable than HUIs, as PHUIs carry distinct probability values.

Memory Consumption.
The peak memory consumption of the proposed algorithm and of the MHUI-TID algorithm is compared, and the results under various $\delta$ are shown in Figure 9.
Figure 9 shows that, for various $\delta$ on the four databases, the proposed algorithm consumes slightly less memory than the MHUI-TID algorithm. This result is reasonable, since the proposed PHUIMUS algorithm discovers PHUIs by taking both the probability and utility constraints into consideration through the designed TWPUS-tree and PUS-list, so more efficient pruning strategies can be applied. As a result, the memory consumption of the PHUIMUS algorithm is slightly lower than that of the MHUI-TID algorithm.

Efficiency of PHUIMUS with Window Size Variation.
In terms of run-time and memory consumption, stream data mining algorithms based on sliding windows are greatly influenced by the window size. The window size depends on the number of transactions in a batch and the number of batches in a window. Therefore, for a fixed $\delta$, both of these parameters are varied, and the run-time of the proposed algorithm is compared with that of the existing MHUI-TID algorithm, as shown in Figure 10.
Figure 10 shows that the proposed algorithm is faster than the existing one as the window size changes. The advantage of the proposed algorithm is more prominent when the window size is larger or the number of distinct items increases.
Moreover, Figure 11 shows the memory consumption of the proposed algorithm and of the MHUI-TID algorithm under various window sizes. It shows that the proposed algorithm consumes less memory than the existing MHUI-TID algorithm under different window sizes. The main reason is that the proposed TWPUS-tree represents all the useful information in a compressed form. More importantly, with the PUS-list, the algorithm discovers PHUIs without rescanning the analyzed uncertain data stream.

Conclusions
For precise data streams, several algorithms have been proposed to mine HUIs, such as THUI-Mine, MHUI-BT, and MHUI-TID. Moreover, some extensions of HUIs mining, such as maximal high utility itemsets mining, interactive mining, and top-k high utility itemsets mining, have been studied recently.
For uncertain databases, itemsets with both high utility and high existential probability are useful to users, not itemsets with only one of them. To the best of our knowledge, the only algorithms that solve the HUIs mining problem over uncertain databases are PHUI-UP, based on the two-phase model, and PHUI-List, based on a list structure, both proposed by Lin et al., and UHUI-apriori, based on Apriori, proposed by Lan et al.
Building on this research, this paper provides an efficient method for HUIs mining over uncertain data stream. New definitions of item utility, itemset utility, transaction utility, and transaction weighted utility are given. A novel tree structure (TWPUS-tree), a list structure (PUS-list), and a new algorithm (PHUIMUS) are proposed. In detail, the TWPUS-tree maintains a fixed sort order and batch-by-batch information, which makes it easy to construct and maintain with a sliding window. The PUS-list gives the exact potential utility of the candidate itemsets generated by the TWPUS-tree without rescanning the analyzed uncertain data stream. By using the TWPUS-tree and PUS-list, the PHUIMUS algorithm can capture recent changes of information in an uncertain data stream adaptively. Experimental results show that our algorithm outperforms the existing algorithm in run-time, number of discovered PHUIs, memory usage, and scalability.
To the best of our knowledge, this is the first algorithm for finding HUIs over uncertain data stream. By combining a pattern growth approach with a list based approach, the proposed algorithm can significantly reduce the number of candidate itemsets as well as the overall run-time. Moreover, by keeping the recent information efficiently in the TWPUS-tree and PUS-list, the algorithm also saves memory space. Further work can be done to improve the efficiency of discovering HUIs over uncertain data stream.

Figure 1: Part of an uncertain data stream with its profit table.
Figure 2(a): PUS-list of $B_1$.

Figure 7: Run-time of the compared algorithms for various $\delta$.

Figure 8: Number of HUIs and PHUIs under various $\delta$.

Figure 10: Effect of window size variation on run-time.

Figure 11: Effect of window size variation on memory consumption.
Eight candidate itemsets are generated here, each with a TWPU value of 82.16: one 4-itemset, three 3-itemsets, three 2-itemsets, and one 1-itemset. Next, it is judged whether the TWPU value of each of the other items is less than $minutil_{SW_2}$. If so, no superset of that item can be a candidate itemset or a PHUI, according to the downward closure property, so no prefix or conditional tree needs to be created for it. If not, candidate itemsets are generated for that item in the same way as for the bottom item. All the candidate itemsets are added to the global candidate list, which is reset to NULL when the sliding window changes. The exact potential utility in $SW_2$ of each candidate itemset is calculated directly from the PUS-list in Figure 5(b). For example, one candidate 3-itemset with a TWPU value of 82.16 exists in SW2_3 and SW2_6, so summing the potential utilities of its three items in these two transactions gives $(11.16 + 16.56) + (8.7 + 24.9) + (3.78 + 3.78) = 68.88$ from the PUS-list in Figure 5(b), without rescanning the analyzed uncertain data stream. As $68.88 > minutil_{SW_2} = 46.338$, this itemset is a PHUI. The same calculation is performed for the other candidate itemsets in the global candidate list; then the PHUIs for the current window $SW_2$ are obtained.

Figure 4: Construction of PUS-list and TWPUS-tree for deleting $B_1$.

Algorithm Description and Analysis.
In this section, the PHUIMUS algorithm is described and analyzed. The description of the PHUIMUS algorithm is given in Algorithm 1. In Algorithm 1, Steps 1 to 3 create a global header table H that keeps all the items in lexicographic order, a TWPUS-tree that is initialized as NULL, and a global PUS-list that keeps the potential utility of items and transactions. Steps 4 to 8 are the construction process of the TWPUS-tree. When a new batch $B_k$ arrives, Step 4 sorts the items of each transaction $T_j$ in batch $B_k$, Step 5 calculates the potential utility of items and transactions and inserts them into the PUS-list, and Step 6 updates the header table H. If the current batch number $k$ is no more than the window size $w$ (the number of batches in a window), $B_k$ must be included in the first sliding window $SW_1$, and Step 7 only changes the $k$th position of the TWPU counter arrays, whose size is $w$, whether or not the items already exist in the TWPUS-tree. Otherwise, when the current batch number $k$ is larger than $w$, Step 8 first performs a one-position left shift on all the TWPU counter arrays to remove the oldest batch. Then, it removes the transactions of the oldest batch from the PUS-list and inserts the transactions of $B_k$. Subsequently, it updates the header table H and deletes the nodes whose TWPU counter arrays contain only zeros. Lastly, Step 8 changes the rightmost position of the TWPU counter arrays and keeps the size of the arrays as $w$, whether or not the items already exist in the TWPUS-tree. Steps 9 to 15 are the mining process of PHUIs from the current window $SW_k$. For each bottom item $x$ of H, a prefix tree $PT_x$ with its header table $HT_x$ is created. According to the user-specified $\delta_{SW_k}$, the items whose TWPU values are less than $minutil_{SW_k}$ are deleted from $PT_x$ and $HT_x$, and the conditional tree $CT_x$ and its header table $HC_x$ are created. Subsequently, all the candidate itemsets $CT_x$_list are mined from $CT_x$, and {$CT_x$_list, $x$} is added to the global candidate list. Furthermore, the PHUIs are calculated from the global candidate list by the PUS-list. Finally, the current bottom item $x$ of H is deleted, and when H becomes NULL, the loop ends.
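The verification of a candidate against the PUS-list can be reproduced numerically. In the sketch below, the six potential utility values, the row labels SW2_3 and SW2_6, the window total 231.69, and the threshold 0.2 come from the worked example; the item names `x1` to `x3` are placeholders, and the row `SW2_1` is an invented row that does not contain the candidate.

```python
# Rows of a PUS-list, item -> potential utility; x1..x3 are placeholder item names.
pus_rows = {
    "SW2_1": {"x1": 5.0},                              # hypothetical row without x2 and x3
    "SW2_3": {"x1": 11.16, "x2": 8.7, "x3": 3.78},
    "SW2_6": {"x1": 16.56, "x2": 24.9, "x3": 3.78},
}

def exact_potential_utility(candidate, rows):
    """Sum the item potential utilities over the rows that contain the whole candidate."""
    return sum(row[i] for row in rows.values()
               if all(j in row for j in candidate) for i in candidate)

minutil = 231.69 * 0.2   # minimum potential utility value of the window
value = exact_potential_utility(("x1", "x2", "x3"), pus_rows)
is_phui = value >= minutil
```

The sum over the two matching rows reproduces the value 68.88 reported in the example, up to floating-point rounding, and exceeds the minimum potential utility value of 46.338.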

Table 1: Parameters of used databases.
$X$ is not a HTWPUI, it is pruned. So the candidate set of PHUIMUS contains only the true HTWPUIs, and $N_1$ cannot be larger than $N_2$.
Algorithm 1: The PHUIMUS algorithm.
Input: A transactional uncertain data stream, utility table, $\delta_{SW_k}$ for the current window $SW_k$, number of batches in a window ($w$), number of transactions in a batch ($h$).
Output: PHUIs for the current window $SW_k$.
Step 1. Create a global header table H to keep the items in lexicographic order;
Step 2. Create the root of the TWPUS-tree and initialize it as NULL;
Step 3. Create a global PUS-list to keep the potential utility of items and transactions, whose number of rows is $w \times h$ and whose number of columns is equal to the number of items in the utility table;
while a new batch $B_k$ arrives do
    for each transaction $T_j$ in batch $B_k$ do
        Step 4. Sort the items of $T_j$ in lexicographic order;
        Step 5. Calculate the potential utility of items and transactions and insert them into the PUS-list;
        Step 6. Update the twpu values in the header table H;
        Step 7. if $k \le w$ then
            if the item $x$ in $T_j$ does not exist in the TWPUS-tree then
                Create a new node for it, which consists of the item's name and the twpu counter array $(0_{B_1}, \ldots, twpu_{B_k}(x), \ldots, 0_{B_w})$;
            else
                Insert the twpu value of $x$ at the $k$th position of its twpu counter array, denoted as $(twpu_{B_1}(x), \ldots, twpu_{B_k}(x), \ldots, 0_{B_w})$;
        Step 8. else
            Perform a one-position left shift on all the twpu counter arrays;
            Remove the transactions of the oldest batch from the PUS-list and insert the transactions of $B_k$;
            Update them in the header table H;
            Delete the nodes whose twpu counter arrays contain only zeros;
            if the item $x$ in $T_j$ does not exist in the TWPUS-tree then
                Create a new node for it, which consists of the item's name and the twpu counter array $(0_{B_{k-w+1}}, \ldots, 0, \ldots, twpu_{B_k}(x))$;
            else
                Insert the twpu value of $x$ at the rightmost position of its twpu counter array, denoted as $(twpu_{B_{k-w+1}}(x), \ldots, twpu_{B_k}(x))$;
    Step 9. for each item $x$ from the bottom of H do
        Step 10. Create the prefix tree $PT_x$ with its header table $HT_x$ for item $x$;
        Step 11. for each item $y$ of $HT_x$ do
            if $twpu_{SW_k}(y) < minutil_{SW_k}$ then
                Delete $y$ from $PT_x$ and $HT_x$ to create the conditional tree $CT_x$ and its header table $HC_x$;
        Step 12. Create all the candidate itemsets from $CT_x$, represented by $CT_x$_list;
        Step 13. Add {$CT_x$_list, $x$} to the global candidate list;
        Step 14. Calculate PHUIs from the global candidate list by the PUS-list;
        Step 15. Delete the current bottom item $x$ of H; when H becomes NULL, leave the loop;

Table 2: Characteristics of the databases.