№1, 2015

A NEW SIMILARITY MEASURE AND MATHEMATICAL MODEL FOR TEXT SUMMARIZATION

Alguliyev Rasim M., Aliguliyev Ramiz M., Isazade Nicat R.

This paper proposes a new text similarity measure and a mathematical model for automatic text summarization. The model consists of two stages. In the first stage, the sentences in the document collection are clustered to detect topics; the k-means algorithm is used for this clustering. In the second stage, the model generates a summary by extracting relevant sentences from each cluster. The sentence selection process is formalized as an optimization problem. To select relevant sentences from each cluster and avoid redundancy in the summary, the model uses both the sentence-to-cluster relation and the sentence-to-sentence relation. To solve the optimization problem, a differential evolution algorithm with an adaptive mutation strategy is developed. (pp. 42-53)

Keywords: new RRN similarity measure, sentence clustering, k-means, optimization model, differential evolution algorithm, modified mutation operator
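
The two-stage scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the proposed similarity measure, so plain cosine similarity over term frequencies is used as a stand-in, and a simple greedy rule replaces the differential evolution algorithm. The names `summarize` and `lam`, the seeding of k-means with the first k sentences, and the fixed iteration count are assumptions made only for this illustration.

```python
import math
from collections import Counter


def tf_vector(sentence):
    """Bag-of-words term-frequency vector for one sentence."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity (a stand-in, NOT the measure proposed in the paper)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-frequency vector of a non-empty list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def kmeans(vectors, k, iterations=20):
    """Stage 1: plain k-means; seeding with the first k vectors is an assumption."""
    k = min(k, len(vectors))
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each sentence to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute centroids; an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


def summarize(sentences, k, lam=0.5):
    """Stage 2 (greedy stand-in for the optimization step): pick one sentence
    per cluster, rewarding similarity to the cluster centroid and penalizing
    similarity to sentences already selected. `lam` is a hypothetical weight."""
    vectors = [tf_vector(s) for s in sentences]
    labels, centroids = kmeans(vectors, k)
    chosen = []
    for c in range(len(centroids)):
        best, best_score = None, -math.inf
        for i, v in enumerate(vectors):
            if labels[i] != c:
                continue
            relevance = cosine(v, centroids[c])      # sentence-to-cluster
            redundancy = max((cosine(v, vectors[j]) for j in chosen),
                             default=0.0)            # sentence-to-sentence
            score = relevance - lam * redundancy
            if score > best_score:
                best, best_score = i, score
        if best is not None:
            chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "the cat slept on the mat",
    "stock markets rose sharply today",
]
summary = summarize(sentences, k=2)
print(summary)
```

The greedy rule only shows how the sentence-to-cluster and sentence-to-sentence relations can be traded off in one score; the paper instead poses the selection as an optimization problem and solves it with differential evolution.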