A Comparative Survey on Different Text Categorization Techniques

IJCSEC Front Page

Abstract:
Classifying billions of documents manually is expensive and very time-consuming, and a huge amount of the data on the internet is uncategorized. Text Categorization (also called Text Classification) is the task of automatically sorting a set of documents into classes from a predefined set. It depends mostly on information retrieval and machine learning techniques, and it is performed on the basis of an endogenous collection of data. Classifier algorithms are used to distinguish the various meanings of sentences. The major difficulty of this approach is the high dimensionality of the feature space. Documents are classified using supervised, unsupervised, or semi-supervised learning. A key element is linking extracted data together to form new hypotheses or facts to be analyzed. This paper surveys text categorization classifiers, including Decision Tree, K-Nearest Neighbor, Bayesian approaches, Support Vector Machine, and Neural Network, and compares them on criteria such as complexity and performance.

Keywords: Decision Tree, K-Nearest Neighbor, Bayesian Approaches, Support Vector Machine, Neural Network, Text Categorization, Supervised Learning, Semi-Supervised Learning, Information Retrieval, Machine Learning Techniques.
