№2, 2015

AN OPTIMIZATION MODEL FOR AUTOMATIC TEXT SUMMARIZATION

Ramiz Aliguliyev M., Makrufa Hajirahimova S.

This paper proposes an unsupervised approach to automatic document summarization based on sentence selection, which is modeled as an optimization problem. The model attempts to optimize three properties: relevance (the summary should contain informative sentences that carry the main topics of the source text); redundancy (the summary should not contain multiple sentences that convey the same information); and length (the summary is bounded in length) (pp. 84-90).

Keywords: information overload, text mining, text summarization, redundancy, coverage, optimization model.
DOI: 10.25045/jpit.v06.i2.10
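
The abstract does not give the model's objective function or solver, so the following is only a minimal sketch of how sentence selection can trade off the three properties it names. It assumes a greedy heuristic with cosine similarity over word counts, which is not necessarily the authors' formulation; the names `summarize`, `max_words`, and `redundancy_weight` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, max_words, redundancy_weight=0.5):
    """Greedy sentence selection (assumption: not the paper's solver).

    relevance  = similarity of a sentence to the whole document
    redundancy = highest similarity to an already selected sentence
    length     = total word count of the summary stays <= max_words
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    document = sum(vectors, Counter())
    lengths = [sum(v.values()) for v in vectors]
    selected, used = [], 0
    candidates = list(range(len(sentences)))
    while candidates:
        best, best_score = None, 0.0
        for i in candidates:
            if used + lengths[i] > max_words:
                continue  # adding this sentence would break the length bound
            relevance = cosine(vectors[i], document)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            score = relevance - redundancy_weight * redundancy
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break  # no remaining sentence fits or improves the summary
        selected.append(best)
        used += lengths[best]
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]
```

With `redundancy_weight=1.0`, an exact duplicate of an already selected sentence can never score above zero, so it is skipped. Smaller weights tolerate more overlap in exchange for relevance.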