Evaluation of Stemming and Stop Word Techniques on Text Classification Problem

Dharmendra Sharma and Suresh Jain

Section: Research Paper, Product Type: Isroset-Conference
Vol. 3, Issue 2, pp. 1-4, Mar-2015

Online published on Jun 22, 2015

Copyright © Dharmendra Sharma, Suresh Jain. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Dharmendra Sharma and Suresh Jain, “Evaluation of Stemming and Stop Word Techniques on Text Classification Problem,” International Journal of Scientific Research in Computer Science and Engineering, Vol. 3, Issue 2, pp. 1-4, 2015.

MLA Style Citation: Dharmendra Sharma and Suresh Jain. "Evaluation of Stemming and Stop Word Techniques on Text Classification Problem." International Journal of Scientific Research in Computer Science and Engineering 3.2 (2015): 1-4.

APA Style Citation: Dharmendra Sharma, & Suresh Jain. (2015). Evaluation of Stemming and Stop Word Techniques on Text Classification Problem. International Journal of Scientific Research in Computer Science and Engineering, 3(2), 1-4.

BibTex Style Citation:
@article{Sharma_2015,
author = {Dharmendra Sharma and Suresh Jain},
title = {Evaluation of Stemming and Stop Word Techniques on Text Classification Problem},
journal = {International Journal of Scientific Research in Computer Science and Engineering},
issue_date = {3 2015},
volume = {3},
number = {2},
month = {3},
year = {2015},
issn = {2347-2693},
pages = {1--4},
url = {https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=188},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=188
TI  - Evaluation of Stemming and Stop Word Techniques on Text Classification Problem
T2  - International Journal of Scientific Research in Computer Science and Engineering
AU  - Sharma, Dharmendra
AU  - Jain, Suresh
PY  - 2015
DA  - 2015/06/22
PB  - IJCSE, Indore, INDIA
SP  - 1
EP  - 4
IS  - 2
VL  - 3
SN  - 2347-2693
ER  -

Abstract:
A huge amount of information is now available on the internet in electronic format. This data can be analyzed to support intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. Its basic phases are preprocessing, extracting relevant features against the features in a database, and finally categorizing a set of documents into predefined categories. Most research in text categorization focuses on developing algorithms that optimize the preprocessing techniques. This paper summarizes the impact of stop word removal and stemming on feature selection.
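
The effect of the two preprocessing steps on the feature set can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop word list, the suffix list, the sample documents, and the function names (`tokenize`, `remove_stop_words`, `naive_stem`, `vocabulary`) are all assumptions made for illustration, and the stemmer is a crude suffix stripper standing in for a real algorithm such as Porter's.

```python
import re

# Assumption: a tiny illustrative stop word list, not the list used in the paper.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

# Assumption: crude suffix stripping, far simpler than the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def vocabulary(docs, use_stop_words=False, use_stemming=False):
    """Collect the set of distinct features (terms) over all documents."""
    vocab = set()
    for doc in docs:
        tokens = tokenize(doc)
        if use_stop_words:
            tokens = remove_stop_words(tokens)
        if use_stemming:
            tokens = [naive_stem(t) for t in tokens]
        vocab.update(tokens)
    return vocab


# Hypothetical two-document corpus.
docs = [
    "The classifier is learning to classify documents",
    "A document is classified and the classifier learns",
]
raw = vocabulary(docs)
filtered = vocabulary(docs, use_stop_words=True)
stemmed = vocabulary(docs, use_stop_words=True, use_stemming=True)
print(len(raw), len(filtered), len(stemmed))  # prints: 12 7 5
```

On this toy corpus the candidate feature set shrinks from 12 terms to 7 after stop word removal and to 5 after stemming, because variants such as "documents" and "document" collapse into one feature. The naive stemmer still leaves "classify" and "classifi" as separate features, which is the kind of case a full stemming algorithm is designed to handle better.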

Keywords / Index Terms:
Machine Learning, Stemming, Feature Selection
