
A Fast Clustering Feature Based on Subselection Algorithm In Big Data using Fidoop

IJCSEC

Abstract
Nowadays, large amounts of data are collected through the Internet of Things. Big Data is a promising and emerging technology for medical and industrial applications such as business intelligence and marketing. The main task of this paper is to identify the essential features through feature subset selection. Traditional methods do not provide mechanisms such as parallelization, load balancing, data distribution, and fault tolerance. To overcome this problem, a Frequent Itemset Mining algorithm can be run on Hadoop. Frequent itemset mining is a well-established data mining technique for extracting knowledge, here from data stored on a Hadoop cluster. Using this technique, both redundant and irrelevant data can be eliminated during subset selection. Our proposed solution is Fidoop, running on a Hadoop cluster with the MapReduce programming model, in which mappers independently decompose the itemsets. The proposed approach addresses these existing problems and improves energy efficiency on the Hadoop cluster. It also achieves compressed storage and avoids building conditional pattern bases.
Keywords: Frequent Itemset Mining, Feature subset selection, Hadoop cluster, MapReduce
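The abstract describes mappers that independently decompose itemsets, with the cluster aggregating their output. The following minimal sketch illustrates that general MapReduce pattern for counting frequent itemsets. It is a single-process Python simulation, not Fidoop itself. It uses naive candidate enumeration rather than a tree structure that avoids conditional pattern bases. The function names, the fixed itemset size `k`, and the `min_support` threshold are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, simplified simulation of one MapReduce pass for frequent
# itemset counting; this is NOT Fidoop's actual implementation.


def map_phase(transaction, k):
    """Mapper: decompose one transaction into its k-itemsets, each emitted with count 1."""
    items = sorted(set(transaction))
    for itemset in combinations(items, k):
        yield itemset, 1


def shuffle(pairs):
    """Group mapper output by key, standing in for Hadoop's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values, min_support):
    """Reducer: sum the counts for one itemset and keep it only if it is frequent."""
    total = sum(values)
    if total >= min_support:
        return key, total
    return None


def frequent_itemsets(transactions, k, min_support):
    """Run one simulated MapReduce pass and return all frequent k-itemsets with counts."""
    # Each transaction is decomposed independently, as separate mappers would do.
    mapped = (pair for t in transactions for pair in map_phase(t, k))
    grouped = shuffle(mapped)
    result = {}
    for key, values in grouped.items():
        out = reduce_phase(key, values, min_support)
        if out is not None:
            result[out[0]] = out[1]
    return result


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    print(frequent_itemsets(transactions, k=2, min_support=3))
```

On a real cluster, `map_phase` and `reduce_phase` would correspond to Hadoop Mapper and Reducer tasks, and the framework, not the `shuffle` helper, would group the keys. Enumerating every k-combination per transaction grows combinatorially with transaction length, so practical miners typically prune candidates or use compact structures instead.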
