№1, 2019

AUTO STEMMING OF AZERBAIJANI LANGUAGE

Morteza B. Hasan Alizadeh, Seyyed Amin H. Seyyedi

Finding the root of a word is one of the important tasks in natural language processing. Stemming means removing prefixes, suffixes, and infixes to find the root of a word. Its applications include information retrieval, text mining, machine translation, and root-based word lookup. Stemming improves document retrieval by 10-50% in most international languages, and it also reduces the size of web-based document index tables by up to 50%. In this paper, a stemming system for the Azerbaijani language is developed by analyzing existing stemming approaches and by using structural methods and a deterministic finite automaton together with the 274 existing prefixes of the language (linkage). Experimental results demonstrate that the proposed algorithm achieves more than 97% accuracy (pp.59-66).

Keywords: Natural Language Processing, Stemming, information retrieval, machine translation, Azerbaijani Language.
DOI: 10.25045/jpit.v10.i1.06
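
The abstract names a deterministic finite automaton and a fixed affix inventory but gives no implementation details. The following is only a minimal sketch of how such a stemmer could be organized, not the authors' algorithm. It assumes a trie-shaped automaton that reads the word from its last character and strips the longest matching affix repeatedly. The six-item suffix list (plural -lar/-lər, locative -da/-də, ablative -dan/-dən), the names build_dfa, longest_suffix_len, and stem, and the MIN_STEM_LEN guard are all illustrative assumptions; the paper's 274-affix list is not reproduced here.

```python
# Illustrative affix inventory: an assumption, NOT the paper's 274-affix list.
SUFFIXES = ["lar", "lər", "da", "də", "dan", "dən"]
MIN_STEM_LEN = 2  # hypothetical guard against stripping a word down to nothing


def build_dfa(suffixes):
    """Build a trie-shaped DFA that reads suffixes from right to left.

    State 0 is the start state; transitions[state][char] is the next state.
    A state is accepting when the characters read so far spell a full suffix.
    """
    transitions = [{}]
    accepting = set()
    for suffix in suffixes:
        state = 0
        for ch in reversed(suffix):
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def longest_suffix_len(word, transitions, accepting):
    """Run the DFA from the last character of the word backwards.

    Returns the length of the longest suffix accepted, or 0 if none matches.
    """
    state, best = 0, 0
    for i, ch in enumerate(reversed(word), start=1):
        state = transitions[state].get(ch)
        if state is None:  # no transition: the automaton rejects from here on
            break
        if state in accepting:
            best = i
    return best


def stem(word, suffixes=SUFFIXES):
    """Strip the longest matching suffix repeatedly until none applies."""
    transitions, accepting = build_dfa(suffixes)
    while True:
        n = longest_suffix_len(word, transitions, accepting)
        if n == 0 or len(word) - n < MIN_STEM_LEN:
            return word
        word = word[:-n]


if __name__ == "__main__":
    for w in ["kitablarda", "evlərdən", "ada"]:
        print(w, "->", stem(w))
```

In this sketch "kitablarda" loses -da and then -lar, while "ada" is left unchanged because stripping -da would leave a stem shorter than MIN_STEM_LEN. A real system would need the full affix inventory and rules for ambiguous cases, which this example does not attempt.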