FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
Data mining is the process of discovering patterns in large amounts of data. There are many data mining techniques, such as clustering, classification, and association rule mining. Among the most popular is association rule mining, which is divided into two parts:
i) generating the frequent itemsets
ii) generating association rules from the frequent itemsets.
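The two parts above can be sketched on a toy dataset. This is a minimal, single-machine illustration and not the FiDoop method: it assumes a level-wise candidate-generation search for part i and the usual confidence measure, support(X ∪ Y) / support(X), for part ii. The function names, the sample transactions, and the thresholds are all hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Part i: level-wise search; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data (illustrative choice).
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemset candidates ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def association_rules(frequent, min_confidence):
    """Part ii: derive rules X -> Y from the frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support count is already in the dictionary.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Hypothetical sample data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=2)
rules = association_rules(frequent, min_confidence=0.9)
```

With these sample values, the only rule that clears the confidence threshold is butter -> bread, because every transaction containing butter also contains bread.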
Frequent itemset mining (FIM) is the core problem in association rule mining. Sequential FIM algorithms suffer from performance deterioration when they operate on a huge amount of data on a single machine. To address this problem, parallel FIM algorithms were proposed. Two types of algorithms can be used for mining frequent itemsets: those that generate candidate itemsets and those that do not. The Apriori algorithm is an example of the candidate-generation approach, and the FP-growth algorithm is an example of the approach without candidate generation.
An important data-mining problem is discovering association rules between frequent itemsets. To find the best method for mining in parallel, we explore a spectrum of trade-offs between computation, synchronization, communication, and memory usage. Count distribution, data distribution, and candidate distribution are three algorithms for discovering association rules between frequent itemsets. The count distribution algorithm focuses on minimizing communication, and it does so even at the expense of carrying out redundant duplicate computations in parallel. The data distribution algorithm uses the main memory of the system more effectively, but it is a communication-happy algorithm: each node broadcasts its local data to all other nodes. The candidate distribution algorithm exploits the semantics of the particular problem to segment the database according to the patterns that the different transactions support. This algorithm also incorporates load balancing.
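The count distribution idea can be illustrated with a small sequential simulation. This sketch assumes each node holds its own partition of the transactions plus the full candidate set, counts locally, and then exchanges only the counts. The "nodes" here are plain Python lists processed one after another, the global sum stands in for the real count exchange, and all names and sample data are hypothetical.

```python
from collections import Counter

def local_counts(partition, candidates):
    """Work done on one node: count every candidate against the local partition only."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                counts[candidate] += 1
    return counts

def count_distribution_pass(partitions, candidates, min_support):
    """One pass: independent local counting, then a global sum of the counts."""
    # Each node scans the same candidate set (redundant computation) ...
    per_node = [local_counts(p, candidates) for p in partitions]
    # ... and only the small count tables are combined, never the transactions.
    global_counts = sum(per_node, Counter())
    return {c: n for c, n in global_counts.items() if n >= min_support}

# Hypothetical data split across two simulated nodes.
partitions = [
    [frozenset({"bread", "milk"}), frozenset({"bread", "butter"})],
    [frozenset({"bread", "milk", "butter"}), frozenset({"milk"})],
]
candidates = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "butter"}),
]
frequent_pairs = count_distribution_pass(partitions, candidates, min_support=2)
```

The design choice this shows is the trade-off described above: every node repeats the candidate bookkeeping, but the data that crosses node boundaries is limited to one count per candidate.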
Overall Proposed Work