Association rule mining is an important technology in data mining. The FPGrowth (frequent-pattern growth) algorithm is a classical association rule mining algorithm, but it must scan the database twice, which reduces its efficiency. Through a study of association rule mining and the FPGrowth algorithm, we developed two improved versions of FPGrowth: the PaintingGrowth algorithm and the N (not) PaintingGrowth algorithm, which removes the painting steps and achieves the same result in another way. We compared the two improved algorithms with FPGrowth. Experimental results show that PaintingGrowth outperforms FPGrowth when the data volume exceeds 1050 transactions, and N PaintingGrowth outperforms FPGrowth when the data volume is below 10000 transactions.
Data mining is a process to obtain potentially useful, previously unknown, and ultimately understandable knowledge from the data [
The FPGrowth (frequent-pattern growth) algorithm, an improvement on the Apriori algorithm, was proposed by Jiawei Han et al. [
In this paper, we propose two improved algorithms: N PaintingGrowth and PaintingGrowth. N PaintingGrowth builds two-item permutation sets to find the association sets of all frequent items and then mines all frequent item sets from these association sets. PaintingGrowth builds an association picture from the two-item permutation sets to find the association sets of all frequent items and then mines all frequent item sets from these association sets. Both improved algorithms scan the database only once, removing the overhead of the second scan in the traditional FPGrowth algorithm, and complete the mining using only the two-item permutation sets. They therefore run faster, occupy little memory, have low complexity, and are easy to maintain. We expect the improved algorithms to serve as a reference for further research on association rule mining.
Set
For a given minimum support, minsup, if the item set meets support
When the length of the item set
FPGrowth algorithm [
Transaction database.
TID  Items 

001  A, B, C, D, E 
002  B, C, E 
003  C, E, D 
004  A, C, D 
Scanning the database for the first time yields the frequent items and their support counts. The frequent items are sorted in decreasing order of support count, and the resulting list is {C:4, D:3, E:3, A:2, B:2}.
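The first scan can be sketched in Java, the language used for the experiments. This is a minimal illustration, not the authors' code: the class `FirstScan`, its methods, and the minimum support count of 2 are assumptions, and ties between equal counts are broken alphabetically, which the text does not specify.

```java
import java.util.*;

// Hypothetical sketch of FP-Growth's first database scan.
final class FirstScan {
    // Counts how many transactions contain each item.
    static Map<String, Integer> countItems(List<List<String>> db) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> t : db)
            for (String item : new HashSet<>(t))      // count an item once per transaction
                counts.merge(item, 1, Integer::sum);
        return counts;
    }

    // Keeps items whose count reaches minCount, sorted by decreasing count (ties: alphabetical, an assumption).
    static List<String> frequentItemList(List<List<String>> db, int minCount) {
        Map<String, Integer> counts = countItems(db);
        List<String> items = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() >= minCount) items.add(e.getKey());
        items.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // decreasing count
            return byCount != 0 ? byCount : a.compareTo(b);              // alphabetical tie-break
        });
        return items;
    }

    public static void main(String[] args) {
        List<List<String>> db = List.of(
                List.of("A", "B", "C", "D", "E"), List.of("B", "C", "E"),
                List.of("C", "E", "D"), List.of("A", "C", "D"));
        System.out.println(frequentItemList(db, 2)); // [C, D, E, A, B]
    }
}
```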
To make tree traversal convenient, the algorithm creates an item header table in which each item points, through a node link, to its occurrences in the FP-tree. After all transactions are scanned, we obtain the FP-tree displayed in Figure
Generating the FP-tree.
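A minimal Java sketch of the second scan follows, assuming the frequent-item order C, D, E, A, B obtained from the first scan. The class `FPTree` and its members are hypothetical. New nodes are prepended to each item's node-link chain, a common but not mandatory design choice; appending works equally well.

```java
import java.util.*;

// Hypothetical sketch of FP-tree construction with an item header table (FP-Growth's second scan).
final class FPTree {
    static final class Node {
        final String item;                                   // null for the root
        final Node parent;                                   // used when reading prefix paths (not shown)
        final Map<String, Node> children = new LinkedHashMap<>();
        int count;
        Node nodeLink;                                       // next node holding the same item
        Node(String item, Node parent) { this.item = item; this.parent = parent; }
    }

    final Node root = new Node(null, null);
    final Map<String, Node> header = new LinkedHashMap<>(); // item -> head of its node-link chain

    // Builds the tree; `order` lists the frequent items by decreasing support.
    static FPTree build(List<List<String>> db, List<String> order) {
        FPTree tree = new FPTree();
        for (List<String> t : db) {
            List<String> sorted = new ArrayList<>();
            for (String item : order)
                if (t.contains(item)) sorted.add(item);         // drop infrequent items and reorder
            tree.insert(sorted);
        }
        return tree;
    }

    // Inserts one sorted transaction, sharing existing prefixes and incrementing their counts.
    void insert(List<String> sortedItems) {
        Node cur = root;
        for (String item : sortedItems) {
            Node child = cur.children.get(item);
            if (child == null) {
                child = new Node(item, cur);
                cur.children.put(item, child);
                child.nodeLink = header.get(item);              // prepend to the node-link chain
                header.put(item, child);
            }
            child.count++;
            cur = child;
        }
    }

    // Follows an item's node links; the summed counts equal the item's support count.
    int support(String item) {
        int total = 0;
        for (Node n = header.get(item); n != null; n = n.nodeLink) total += n.count;
        return total;
    }

    // Number of tree nodes that carry the item.
    int nodeCount(String item) {
        int nodes = 0;
        for (Node n = header.get(item); n != null; n = n.nodeLink) nodes++;
        return nodes;
    }
}
```

On the four transactions above, the root has a single child C with count 4, and item E occurs in two nodes (counts 2 and 1) whose node links together give its support count of 3.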
Mining the FP-tree by creating conditional pattern bases.
Item  Conditional pattern base  Conditional FP-tree  Frequent patterns

B  {(C D E A:1), (C E:1)}  <C:2, E:2>  C B:2, E B:2, C E B:2
A  {(C D E:1), (C D:1)}  <C:2, D:2>  C A:2, D A:2, C D A:2
E  {(C D:2), (C:1)}  <C:3, D:2>  C E:3, D E:2, C D E:2
D  {(C:3)}  <C:3>  C D:3
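Because every FP-tree path is a shared prefix of the sorted transactions, an item's conditional pattern base can also be read directly from the sorted transactions: in each transaction, the items preceding the item form one prefix path. The Java sketch below (the class `PatternBase` is hypothetical) computes these bases and the item counts inside each base; items whose count reaches the minimum support count form the conditional FP-tree. It does not perform the recursive mining step.

```java
import java.util.*;

// Hypothetical sketch: conditional pattern bases read from transactions sorted by the frequent-item order.
final class PatternBase {
    // item -> (prefix path before the item -> number of transactions sharing that path)
    static Map<String, Map<List<String>, Integer>> conditionalBases(List<List<String>> sortedDb) {
        Map<String, Map<List<String>, Integer>> bases = new LinkedHashMap<>();
        for (List<String> t : sortedDb)
            for (int i = 1; i < t.size(); i++) {            // the first item has an empty prefix
                List<String> prefix = List.copyOf(t.subList(0, i));
                bases.computeIfAbsent(t.get(i), k -> new LinkedHashMap<>())
                     .merge(prefix, 1, Integer::sum);
            }
        return bases;
    }

    // Counts each item within one base; items reaching the minimum count make up the conditional FP-tree.
    static Map<String, Integer> itemCounts(Map<List<String>, Integer> base) {
        Map<String, Integer> counts = new TreeMap<>();
        base.forEach((path, c) -> path.forEach(item -> counts.merge(item, c, Integer::sum)));
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> sortedDb = List.of(
                List.of("C", "D", "E", "A", "B"), List.of("C", "E", "B"),
                List.of("C", "D", "E"), List.of("C", "D", "A"));
        Map<String, Map<List<String>, Integer>> bases = conditionalBases(sortedDb);
        System.out.println(bases.get("E"));             // {[C, D]=2, [C]=1}
        System.out.println(itemCounts(bases.get("E"))); // {C=3, D=2}
    }
}
```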
Frequent-pattern mining algorithms have been applied in many fields, and studying their system model helps in understanding them better. Figure
Association rules mining system model.
Through the data mining platform, the user obtains the knowledge produced by data mining. The platform includes data definition, a mining designer, and a pattern filter. Data definition preprocesses the data and makes incomplete data usable; the mining designer applies the improved algorithms to mine the data and obtain useful patterns (here, frequent item sets); the pattern filter selects the interesting patterns from those obtained.
The FPGrowth algorithm must scan the database twice, which limits its efficiency. This paper proposes two improved algorithms, PaintingGrowth and N PaintingGrowth, which mine using two-item permutation sets. Both algorithms obtain the mining results with a single database scan.
Taking the transaction database in Table
(1) The algorithm scans the database once, obtains the two-item permutation sets of all transactions, and paints the peak set (the set of all distinct items in the transaction database). Here we take the first transaction as an example.
The first transaction is {A, B, C, D, E}.
The two-item permutation sets after scanning the first transaction are
The other transactions are processed in the same way. After the database is scanned, the peak set is {A, B, C, D, E}.
(2) After obtaining the peak set and the two-item permutation sets of all transactions, the algorithm paints the association picture from them: it links the two items of each two-item permutation, and each time the permutation appears again, the link count increases by 1. The association picture is shown in Figure
The association picture.
(3) Based on the association picture, the algorithm uses the support count to remove infrequent associations, which yields the frequent item association sets as follows:
Here we take the item A as an example.
(4) From the frequent item association sets, we obtain all two-item frequent sets of this transaction database:
(5) According to the frequent item association sets
And according to the frequent item association sets
Similarly, according to the frequent item association sets
(6) At this point, all frequent item sets have been obtained.
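Steps (1) to (5) can be sketched in Java without the drawing. This is an illustrative sketch under stated assumptions, not the authors' implementation. A two-item permutation is taken to be an ordered pair (x, y) with x ≠ y from the same transaction. The nested map `hm` (item → associated item → count) mirrors `hm.get(z).get(y)` in the pseudocode, and the minimum support count is 2. The text does not fully specify how multi-item sets are counted, so the sketch confirms each candidate's support against transactions kept in memory after the single scan. The class `PaintingSteps` and its methods are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of PaintingGrowth steps (1)-(5) with the drawing omitted.
final class PaintingSteps {
    // Step (1): ordered two-item permutations (x, y), x != y, of one transaction, written "x,y".
    static List<String> permutations(Collection<String> transaction) {
        Set<String> t = new TreeSet<>(transaction);
        List<String> out = new ArrayList<>();
        for (String x : t)
            for (String y : t)
                if (!x.equals(y)) out.add(x + "," + y);
        return out;
    }

    // Steps (1)-(2): one scan; hm.get(x).get(y) is the link count of x and y. Fills peakSet as a side effect.
    static Map<String, Map<String, Integer>> scan(List<List<String>> db, Set<String> peakSet) {
        Map<String, Map<String, Integer>> hm = new TreeMap<>();
        for (List<String> t : db) {
            peakSet.addAll(t);
            for (String p : permutations(t)) {
                String[] s = p.split(",");                       // s[0] = x, s[1] = y
                hm.computeIfAbsent(s[0], k -> new TreeMap<>()).merge(s[1], 1, Integer::sum);
            }
        }
        return hm;
    }

    // Step (3): remove associations below minCount (the role of it0.remove() in the pseudocode).
    static void removeInfrequent(Map<String, Map<String, Integer>> hm, int minCount) {
        for (Map<String, Integer> assoc : hm.values())
            assoc.values().removeIf(count -> count < minCount);
        hm.values().removeIf(Map::isEmpty);                      // items left with no association
    }

    // Steps (4)-(5): two-item sets from hm, then level-wise growth of larger sets.
    static Map<List<String>, Integer> frequentSets(Map<String, Map<String, Integer>> hm,
                                                   List<List<String>> db, int minCount) {
        Map<List<String>, Integer> result = new LinkedHashMap<>();
        List<List<String>> level = new ArrayList<>();
        for (String x : hm.keySet())
            for (Map.Entry<String, Integer> e : hm.get(x).entrySet())
                if (x.compareTo(e.getKey()) < 0) {                // each unordered pair once
                    List<String> pair = List.of(x, e.getKey());
                    result.put(pair, e.getValue());
                    level.add(pair);
                }
        while (!level.isEmpty()) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> set : level) {
                String last = set.get(set.size() - 1);
                for (String y : hm.getOrDefault(last, Map.of()).keySet()) {
                    if (y.compareTo(last) <= 0) continue;        // extend in sorted order only
                    boolean linked = true;                       // y must be associated with every member
                    for (String x : set) linked &= hm.getOrDefault(x, Map.of()).containsKey(y);
                    if (!linked) continue;
                    List<String> cand = new ArrayList<>(set);
                    cand.add(y);
                    int count = 0;                               // support check against stored transactions
                    for (List<String> t : db) if (t.containsAll(cand)) count++;
                    if (count >= minCount) { result.put(cand, count); next.add(cand); }
                }
            }
            level = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> db = List.of(
                List.of("A", "B", "C", "D", "E"), List.of("B", "C", "E"),
                List.of("C", "E", "D"), List.of("A", "C", "D"));
        Set<String> peakSet = new TreeSet<>();
        Map<String, Map<String, Integer>> hm = scan(db, peakSet);
        removeInfrequent(hm, 2);
        System.out.println(peakSet);                 // [A, B, C, D, E]
        System.out.println(frequentSets(hm, db, 2));
    }
}
```

On the example database, this sketch yields the two-item sets {A, C}, {A, D}, {B, C}, {B, E}, {C, D}, {C, E}, {D, E} and the three-item sets {A, C, D}, {B, C, E}, {C, D, E}. Requiring the new item to be associated with every member is necessary but not sufficient for a candidate to be frequent, which is why the sketch adds the explicit support check.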
The algorithm pseudocode is as follows.
HashMap
List
List
paint(Graphics g) //painting method
String
String z, y;
HashMap
for (int i = 0; i < list.size(); i++)
{
s = list.get(i).split(","); // split list.get(i) into a String array
drawLine(
HashMap
}
Iterator it = hm.keySet().iterator();
z = it.next();
Iterator it0 = hm.get(z).keySet().iterator();
y = it0.next();
if (hm.get(z).get(y) < minsup * N) // if the association count in hm is below the minimum support count
{ it0.remove(); }
List
for (int j = 0; j < list0.size(); j++)
{
x = list0.get(j).split(",");
if (count(hm.contain(z + "," + list0.get(j))) == 1 + x.length) // if the number of the item set's items found in hm equals its length (including the key z of hm)
{ hm0.put(z + "," + list0.get(j), value); } // save the item set and its support count in hm0
}
return hm0; // return all frequent item sets
super.paintComponents(g); //execute painting method
The idea of the N PaintingGrowth algorithm is the same as that of PaintingGrowth, but the implementation differs: N PaintingGrowth removes the painting steps. Its mining process is as follows.
The algorithm scans the database once and obtains the two-item permutation sets of all transactions.
Then, the algorithm counts each permutation in the two-item permutation sets to obtain the association sets of all items.
Next, the algorithm removes infrequent associations according to the support count and obtains the frequent item association sets.
Finally, it obtains all frequent item sets from the frequent item association sets, and mining ends.
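The counting and pruning steps can be sketched without any drawing, in the spirit of N PaintingGrowth. As an assumption for illustration, this variant stores each permutation as a string key "x,y" in a single flat map rather than a nested map, and then groups the surviving permutations by their first item to form the frequent item association sets. The class `NPaintingSketch` is hypothetical, the minimum support count of 2 in `main` is assumed, and items are assumed to contain no commas and not to repeat within a transaction.

```java
import java.util.*;

// Hypothetical sketch of N PaintingGrowth's counting and pruning with no drawing step.
final class NPaintingSketch {
    // Scan once and count every ordered two-item permutation, keyed as "x,y".
    static Map<String, Integer> countPermutations(List<List<String>> db) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> t : db)
            for (String x : t)
                for (String y : t)
                    if (!x.equals(y)) counts.merge(x + "," + y, 1, Integer::sum);
        return counts;
    }

    // Drop permutations below minCount and group the rest by first item (frequent item association sets).
    static Map<String, Set<String>> frequentAssociations(Map<String, Integer> counts, int minCount) {
        Map<String, Set<String>> assoc = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < minCount) continue;        // infrequent association
            String[] p = e.getKey().split(",");            // p[0] = x, p[1] = y
            assoc.computeIfAbsent(p[0], k -> new TreeSet<>()).add(p[1]);
        }
        return assoc;
    }

    public static void main(String[] args) {
        List<List<String>> db = List.of(
                List.of("A", "B", "C", "D", "E"), List.of("B", "C", "E"),
                List.of("C", "E", "D"), List.of("A", "C", "D"));
        System.out.println(frequentAssociations(countPermutations(db), 2));
        // {A=[C, D], B=[C, E], C=[A, B, D, E], D=[A, C, E], E=[B, C, D]}
    }
}
```

The flat map keeps the code short, but every lookup builds and splits a string key, a cost that a nested map avoids.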
As these steps show, N PaintingGrowth is PaintingGrowth with the painting steps removed. The implementations differ: PaintingGrowth imports java.awt and javax.swing and performs mining by calling super.paintComponents(g), whereas N PaintingGrowth simply instantiates a class in the main function.
The biggest advantage of the improved algorithms, PaintingGrowth and N PaintingGrowth, is that they scan the database only once. Compared with the two scans required by FPGrowth, this improves time efficiency.
Another advantage is simplicity: all mining is completed using only the transactions' two-item permutation sets. FPGrowth also completes mining from a single structure, the FP-tree, but the FP-tree is complex to build and has a large memory overhead. By comparison, the two-item permutation sets are easy to obtain.
The improved algorithms also have disadvantages. PaintingGrowth must build the association picture, which leads to a large memory overhead. The implementation of N PaintingGrowth is less visual than that of PaintingGrowth. When mining multi-item frequent sets, both algorithms repeatedly scan the frequent item association sets for counting, which reduces time efficiency.
To verify the advantages of the two improved algorithms over FPGrowth, we implemented PaintingGrowth, N PaintingGrowth, and FPGrowth in Java, using the Eclipse development environment on 64-bit Windows 7. The experimental data come from Data Tang, a research data sharing platform. The databases contain 1050, 5250, 10500, 21000, 31500, 42000, and 52500 transactions, respectively.
In the experiments, all three algorithms receive the same input data and support parameter. Each algorithm is run 20 times per trial, and the mean is taken as the result.
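The averaging over 20 runs can be expressed as a small timing harness. The sketch below is hypothetical (`MeanTimer` and `meanMillis` are not from the paper) and uses `System.nanoTime()`. It does not account for JVM warm-up, so a few unmeasured runs before timing may give steadier numbers.

```java
// Hypothetical timing harness: runs a task a fixed number of times and reports the mean in milliseconds.
final class MeanTimer {
    static double meanMillis(Runnable task, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;   // elapsed time of this run
        }
        return totalNanos / (runs * 1_000_000.0);      // mean, converted from ns to ms
    }
}
```

For example, `MeanTimer.meanMillis(() -> miner.run(), 20)` would time 20 runs, where `miner` is a hypothetical object wrapping one of the three algorithms.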
Figure
Three-algorithm transactions-execution time comparison.
On the other hand, at 1050 transactions, PaintingGrowth takes slightly longer than FPGrowth. As the number of transactions increases, however, its execution time becomes significantly lower than that of FPGrowth. The transactions-execution time comparison thus shows that PaintingGrowth is more stable and efficient than FPGrowth.
In addition, PaintingGrowth and N PaintingGrowth differ in implementation and therefore in performance. Although N PaintingGrowth omits the painting steps, its execution time is only slightly lower than that of PaintingGrowth between about 1050 and 10500 transactions. As the number of transactions grows further, PaintingGrowth performs far better than N PaintingGrowth. This suggests that the implementation of N PaintingGrowth consumes a large amount of memory, which makes its execution time grow faster.
Figure
The increase rate of three algorithms in different transaction stages.
From Figure
Secondly, in the first three stages, the increase rate of the execution time of N PaintingGrowth is lower than that of FPGrowth, so it performs well. In later stages, however, its increase rate is higher than those of FPGrowth and PaintingGrowth in almost every stage, which explains why the execution time of N PaintingGrowth rises rapidly.
Finally, although the overall trend of the FPGrowth increase rate is similar to that of the improved algorithms, it changes more sharply than theirs in stages 2 and 5. FPGrowth is therefore less stable than the improved algorithms.
From the above, we conclude that PaintingGrowth offers a clear improvement for data analysis. When the data size is suitable, the improved algorithms can be adopted for better performance: when there are fewer than 10000 transactions, N PaintingGrowth can be considered; in other cases, PaintingGrowth performs better and is the preferable choice.
In this paper, we proposed two improved algorithms, PaintingGrowth and N PaintingGrowth. Both algorithms obtain all frequent item sets using only the two-item permutation sets of the transactions; they are simple in principle, easy to implement, and scan the database only once. For an appropriate number of transactions, the improved algorithms are therefore worth considering. They also have problems, however: on large data sets, the performance of N PaintingGrowth is disappointing. Our future work is to make the performance of the improved algorithms more stable, make the removal of infrequent item associations more efficient, and speed up the mining of multi-item frequent sets.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by the Fundamental Research Funds for the Central Universities (XDJK2009C027) and Science & Technology Project (2013001287).