Exploring content selection strategies for Multilingual Multi-Document Summarization based on the Universal Network Language (UNL)

Multilingual Multi-Document Summarization aims at ranking the sentences of a cluster with (at least) 2 news texts (1 in the user’s language and 1 in a foreign language) and selecting the top-ranked sentences for a summary in the user’s language. We explored three concept-based statistics and one super...

Bibliographic Details
Main Authors: Matheus Rigobelo Chaud, Ariani Di Felippo
Format: Article
Language: English
Published: Universidade Federal de Minas Gerais 2017-11-01
Series: Revista de Estudos da Linguagem
Subjects: content selection; concept; statistical measure; multilingual corpus; multi-document summarization
Online Access: http://periodicos.letras.ufmg.br/index.php/relin/article/view/10857
id doaj-d46da1401d64482c948d3e5d64ca34d2
record_format Article
spelling doaj-d46da1401d64482c948d3e5d64ca34d2 | 2020-11-24T21:29:08Z | eng | Universidade Federal de Minas Gerais | Revista de Estudos da Linguagem | 0104-0588 | 2237-2083 | 2017-11-01 | 26 | 1 | 45 | 71 | 10.17851/2237-2083.26.1.45-71 | 9343 | Exploring content selection strategies for Multilingual Multi-Document Summarization based on the Universal Network Language (UNL) | Matheus Rigobelo Chaud (0) | Ariani Di Felippo (1) | Universidade de São Paulo | Universidade Federal de São Carlos
collection DOAJ
language English
format Article
sources DOAJ
author Matheus Rigobelo Chaud
Ariani Di Felippo
title Exploring content selection strategies for Multilingual Multi-Document Summarization based on the Universal Network Language (UNL)
publisher Universidade Federal de Minas Gerais
series Revista de Estudos da Linguagem
issn 0104-0588
2237-2083
publishDate 2017-11-01
description Multilingual Multi-Document Summarization aims at ranking the sentences of a cluster with (at least) 2 news texts (1 in the user’s language and 1 in a foreign language) and selecting the top-ranked sentences for a summary in the user’s language. We explored three concept-based statistics and one superficial strategy for sentence ranking. We used a bilingual corpus (Brazilian Portuguese-English) encoded in UNL (Universal Network Language) with source and summary sentences aligned based on content overlap. Our experiment shows that “concept frequency normalized by the number of concepts in the sentence” is the measure that best ranks the sentences selected by humans. However, it does not outperform the superficial strategy based on the position of the sentences in the texts. This indicates that the most frequent concepts are not always contained in the first sentences, which humans usually select to build the summaries because they convey the main information of the collection. Keywords: content selection; concept; statistical measure; multilingual corpus; multi-document summarization.
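The two ranking strategies named in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not define the measure formally, so the sketch assumes that "concept frequency normalized by the number of concepts in the sentence" means the sum of the cluster-wide frequencies of a sentence's concepts divided by the number of concepts in that sentence. The function names, the toy cluster, and the plain-string concept labels (standing in for UNL concepts) are all hypothetical.

```python
from collections import Counter


def rank_by_normalized_concept_frequency(sentences):
    """Rank sentence indices, best first.

    ASSUMED definition: score = sum of cluster-wide frequencies of the
    sentence's concepts / number of concepts in the sentence.
    """
    # Frequency of each concept across every sentence in the cluster.
    freq = Counter(c for concepts in sentences for c in concepts)
    scored = []
    for idx, concepts in enumerate(sentences):
        # Sentences with no concepts get a score of 0.0.
        score = sum(freq[c] for c in concepts) / len(concepts) if concepts else 0.0
        scored.append((score, idx))
    # Highest score first; ties fall back to the original order.
    return [idx for _, idx in sorted(scored, key=lambda p: (-p[0], p[1]))]


def rank_by_position(positions):
    """Position baseline: sentences earlier in their source text rank first."""
    return sorted(range(len(positions)), key=lambda i: (positions[i], i))


# Hypothetical cluster: each sentence is a list of concept labels.
cluster = [
    ["earthquake", "hit", "city"],
    ["earthquake", "kill", "person"],
    ["rescue", "team", "arrive", "city"],
    ["earthquake"],
]
# Hypothetical position of each sentence within its own source text.
positions = [0, 0, 1, 1]

concept_ranking = rank_by_normalized_concept_frequency(cluster)
position_ranking = rank_by_position(positions)
```

In this toy cluster the two rankings disagree: the short sentence containing only the most frequent concept ranks first under the assumed frequency measure, while the position baseline favors the opening sentences of each text.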
topic content selection
concept
statistical measure
multilingual corpus
multi-document summarization
url http://periodicos.letras.ufmg.br/index.php/relin/article/view/10857