№1, 2010

MULTIDOCUMENT SUMMARIZATION THROUGH CLUSTERING AND RANKING OF SENTENCES

Aliguliyev R.M.

A two-stage unsupervised approach to multidocument summarization is proposed. In the first stage, the sentences of a document set are grouped into topics; in the second stage, the informative sentences are extracted. Topics are identified through sentence clustering, and the informative sentences are selected with a ranking algorithm. It is shown that summarization results depend on the clustering method, the ranking algorithm, and the similarity measure. Experiments on the open benchmarks DUC2001 and DUC2002 show that the proposed clustering methods and ranking algorithm outperform the k-means method and the ranking algorithms PageRank and HITS. (pp. 26-37)

Keywords: multidocument summarization, sentence clustering, sentence ranking.
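
The two-stage idea in the abstract (cluster sentences into topics, then rank sentences and extract the most informative ones) can be illustrated with a minimal Python sketch. This is not the paper's method: the abstract does not specify the proposed clustering methods, ranking algorithm, or similarity measure, so the sketch substitutes a k-means-style clustering (the baseline named in the abstract), cosine similarity over bag-of-words vectors, and a simple "closest to the cluster centroid" ranking. All function names (vectorize, cosine, cluster_sentences, summarize) are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def vectorize(sentence):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of member vectors; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def cluster_sentences(vectors, k, iterations=10):
    """Stage 1 (stand-in): k-means-style clustering under cosine similarity.

    Seeds are simply the first k vectors, so k must not exceed len(vectors).
    """
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each sentence to the most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


def summarize(sentences, k):
    """Stage 2 (stand-in): pick the sentence closest to each cluster centroid."""
    vectors = [vectorize(s) for s in sentences]
    labels, centroids = cluster_sentences(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(
                max(members, key=lambda i: cosine(vectors[i], centroids[c]))
            )
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling summarize(sentences, k) on a list of sentence strings returns at most k sentences, one per non-empty topic cluster. Swapping in a different clustering method, ranking function, or similarity measure changes the output, which is the dependence the abstract reports.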