№2, 2015

DETECTING TERRORISM-RELATED ARTICLES ON THE E-GOVERNMENT USING TEXT-MINING TECHNIQUES

Ramiz M. Aliguliyev, Gunay Y. Niftaliyeva

This paper proposes a method based on text-mining techniques for detecting terror-related articles in e-government. The method consists of eight stages: 1) creation of a terror-related vocabulary; 2) creation of a semantic network of words; 3) morphological analysis of words; 4) initial filtration of documents; 5) calculation of the semantic similarity between words using the semantic network of words; 6) determination of the semantic similarity between sentences; 7) determination of the semantic similarity between documents; 8) classification of documents. Hybrid similarity measures are introduced to calculate the similarity between words, sentences and documents. To identify terror-related articles, a hybrid classification method is proposed that combines the kNN and Bayes methods with the newly proposed Ramiz-Gunay method (pp. 36-46).

Keywords: e-government; e-government security; terrorism; text mining; hybrid similarity measure; kNN method; modified Bayes method; Ramiz-Gunay method; hybrid classification method.
DOI: 10.25045/jpit.v06.i2.04
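
Stages 5 to 8 of the abstract (word, sentence and document similarity, followed by classification) can be illustrated with a minimal sketch. The abstract does not define the hybrid similarity measures or the Ramiz-Gunay method, so none of them is reproduced here. Everything below is an assumption made for illustration only: the toy semantic network `PARENT`, the Wu-Palmer-style word similarity, the best-match averaging used for sentences and documents, the sample data `TRAIN`, and the plain kNN majority vote that stands in for the hybrid classifier. Documents are assumed to be already tokenized and filtered (stages 3 and 4).

```python
from collections import Counter

# Hypothetical toy semantic network: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "event": "entity", "attack": "event", "bombing": "attack",
    "hijacking": "attack", "festival": "event", "concert": "festival",
    "object": "entity", "weapon": "object", "explosive": "weapon",
    "instrument": "object", "guitar": "instrument",
}

def path_to_root(word):
    """Return the chain [word, ..., root]; its length is the word's depth."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path

def word_sim(a, b):
    """Wu-Palmer-style similarity: 2*depth(lcs) / (depth(a) + depth(b))."""
    if a not in PARENT or b not in PARENT:
        return 1.0 if a == b else 0.0  # fallback for out-of-network words
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_b = set(pb)
    lcs = next(w for w in pa if w in ancestors_b)  # deepest shared ancestor
    return 2 * len(path_to_root(lcs)) / (len(pa) + len(pb))

def best_match_avg(xs, ys, sim):
    """Average, over items of xs, of the best similarity to any item of ys."""
    if not xs or not ys:
        return 0.0
    return sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)

def symmetric_sim(xs, ys, sim):
    """Symmetrize the best-match average so that sim(x, y) == sim(y, x)."""
    return 0.5 * (best_match_avg(xs, ys, sim) + best_match_avg(ys, xs, sim))

def sentence_sim(s1, s2):
    """Sentences are lists of words."""
    return symmetric_sim(s1, s2, word_sim)

def document_sim(d1, d2):
    """Documents are lists of sentences."""
    return symmetric_sim(d1, d2, sentence_sim)

def knn_classify(doc, labeled_docs, k=3):
    """Majority vote among the k labeled documents most similar to doc."""
    ranked = sorted(labeled_docs,
                    key=lambda item: document_sim(doc, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled documents.
TRAIN = [
    ([["bombing", "explosive"], ["attack", "weapon"]], "terror"),
    ([["hijacking", "weapon"]], "terror"),
    ([["concert", "guitar"]], "other"),
    ([["festival", "instrument"], ["concert"]], "other"),
]

if __name__ == "__main__":
    print(knn_classify([["attack", "explosive"]], TRAIN))
    print(knn_classify([["concert", "instrument"]], TRAIN))
```

In this sketch the same best-match averaging is reused at the sentence and document levels, so a change to the word-level measure propagates upward without further code changes. A real system would replace the toy network with a full lexical resource and the single kNN vote with the combined classifier described in the paper.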