Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction
Automatic text summarization and keyphrase extraction are two areas of research that span natural language processing and information retrieval. They have recently gained popularity because of their wide applicability. Devising generic techniques for these tasks is challenging...
Main Author: | Hamid, Fahmida |
---|---|
Other Authors: | Tarau, Paul |
Format: | Others |
Language: | English |
Published: | University of North Texas, 2016 |
Subjects: | Evaluation Technique; Summarization; Keyphrase Extraction; Graph-based Algorithms; Absolute Scale; Relativized Scale; Degree of Agreement; Baseline; Computer Science |
Online Access: | https://digital.library.unt.edu/ark:/67531/metadc862796/ |
id |
ndltd-unt.edu-info-ark-67531-metadc862796 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-unt.edu-info-ark-67531-metadc8627962020-07-15T07:09:31Z Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction Hamid, Fahmida Evaluation Technique Summarization Keyphrase Extraction Graph-based Algorithms Absolute Scale Relativized Scale Degree of Agreement Baseline Computer Science Automatic text summarization and keyphrase extraction are two areas of research that span natural language processing and information retrieval. They have recently gained popularity because of their wide applicability. Devising generic techniques for these tasks is challenging for several reasons, yet many intelligent systems perform them. Because different systems are designed from different perspectives, evaluating their performance with a generic strategy is crucial, and it is increasingly important to do so with minimal human effort. In our work, we focus on designing a relativized scale for evaluating different algorithms. This is our major contribution, and it challenges the traditional approach of working with an absolute scale. We consider the impact of several environment variables (the lengths of the document, the references, and the system-generated outputs) on performance; instead of fixing rigid lengths, we show how to adjust to their variation. We prove a mathematically sound baseline that works for all kinds of documents. We emphasize automatically determining the syntactic well-formedness of structures (sentences), and we propose defining an equivalence class for each unit (e.g., word) instead of exact string matching. We present an evaluation approach that uses the weighted relatedness of multiple references to adjust to the degree of disagreement among the gold standards. We publish the proposed approach as a free tool so that other systems can use it. 
We have also assembled a dataset of scientific articles with a reference summary and keyphrases for each document. Our approach applies not only to single-document tasks but also to multi-document tasks. We have tested our evaluation method on three intrinsic tasks (from DUC 2004), and in all three cases it correlates positively with ROUGE. In our experiments on the DUC 2004 question-answering task, it correlates with human judgments (an extrinsic task) with 36.008% accuracy. In general, the proposed relativized scale performs as well as the popular ROUGE technique while remaining flexible about output length. As part of this work, we also devised a new graph-based algorithm for sentiment analysis. The proposed model extracts units (e.g., words or sentences) from the original text that belong to either the positive or the negative sentiment pole. It embeds both types of sentiment flow into a single text graph, with words or phrases as nodes and their relations as edges. By recursively applying two mutually exclusive relations, the model builds a final ranking of the nodes and, based on it, extracts two segments from the article: one with highly positive sentiment and one with highly negative sentiment. The output of this model was compared with the output of the non-polar TextRank to quantify how well the polar summaries cover the facts along with the sentiment. University of North Texas Tarau, Paul Mihalcea, Rada, 1974- Buckles, Bill Blanco, Eduardo 2016-08 Thesis or Dissertation Text local-cont-no: submission_365 https://digital.library.unt.edu/ark:/67531/metadc862796/ ark: ark:/67531/metadc862796 English Public Hamid, Fahmida Copyright Copyright is held by the author, unless otherwise noted. All rights reserved. |
collection |
NDLTD |
language |
English |
format |
Others
|
sources |
NDLTD |
topic |
Evaluation Technique Summarization Keyphrase Extraction Graph-based Algorithms Absolute Scale Relativized Scale Degree of Agreement Baseline Computer Science |
spellingShingle |
Evaluation Technique Summarization Keyphrase Extraction Graph-based Algorithms Absolute Scale Relativized Scale Degree of Agreement Baseline Computer Science Hamid, Fahmida Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
description |
Automatic text summarization and keyphrase extraction are two areas of research that span natural language processing and information retrieval. They have recently gained popularity because of their wide applicability. Devising generic techniques for these tasks is challenging for several reasons, yet many intelligent systems perform them. Because different systems are designed from different perspectives, evaluating their performance with a generic strategy is crucial, and it is increasingly important to do so with minimal human effort.
In our work, we focus on designing a relativized scale for evaluating different algorithms. This is our major contribution, and it challenges the traditional approach of working with an absolute scale. We consider the impact of several environment variables (the lengths of the document, the references, and the system-generated outputs) on performance; instead of fixing rigid lengths, we show how to adjust to their variation. We prove a mathematically sound baseline that works for all kinds of documents. We emphasize automatically determining the syntactic well-formedness of structures (sentences), and we propose defining an equivalence class for each unit (e.g., word) instead of exact string matching. We present an evaluation approach that uses the weighted relatedness of multiple references to adjust to the degree of disagreement among the gold standards. We publish the proposed approach as a free tool so that other systems can use it. We have also assembled a dataset of scientific articles with a reference summary and keyphrases for each document. Our approach applies not only to single-document tasks but also to multi-document tasks.
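The idea of matching equivalence classes of units rather than exact strings can be sketched as follows. This is an illustrative stand-in only: the crude suffix-stripping rule and the `overlap_score` helper are assumptions for demonstration, not the thesis's actual definition of unit equivalence or its scoring formula.

```python
def equiv_class(word):
    """Map a word to a rough equivalence class (illustrative stand-in:
    lowercase the word and strip a few common English suffixes)."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

def overlap_score(system, reference):
    """Fraction of reference units matched by the system output,
    comparing equivalence classes rather than exact strings."""
    sys_classes = {equiv_class(w) for w in system.split()}
    ref_classes = [equiv_class(w) for w in reference.split()]
    if not ref_classes:
        return 0.0
    matched = sum(1 for c in ref_classes if c in sys_classes)
    return matched / len(ref_classes)
```

Under this relaxed matching, "graphs ranked" fully covers the reference "graph ranking", whereas exact string matching would score zero.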
We have tested our evaluation method on three intrinsic tasks (from DUC 2004), and in all three cases it correlates positively with ROUGE. In our experiments on the DUC 2004 question-answering task, it correlates with human judgments (an extrinsic task) with 36.008% accuracy. In general, the proposed relativized scale performs as well as the popular ROUGE technique while remaining flexible about output length.
As part of this work, we also devised a new graph-based algorithm for sentiment analysis. The proposed model extracts units (e.g., words or sentences) from the original text that belong to either the positive or the negative sentiment pole. It embeds both types of sentiment flow into a single text graph, with words or phrases as nodes and their relations as edges. By recursively applying two mutually exclusive relations, the model builds a final ranking of the nodes and, based on it, extracts two segments from the article: one with highly positive sentiment and one with highly negative sentiment. The output of this model was compared with the output of the non-polar TextRank to quantify how well the polar summaries cover the facts along with the sentiment. |
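A polarity-aware TextRank variant of the kind described can be sketched as follows. This is a generic reconstruction under stated assumptions, not the thesis's exact model: words are nodes, within-sentence co-occurrence supplies edges, and two seeded (personalized) PageRank runs — one from positive seed words, one from negative — produce the two rankings; the seed lists and damping value are illustrative choices.

```python
import itertools

def polar_rank(sentences, pos_seeds, neg_seeds, damping=0.85, iters=30):
    """Rank words by positive and negative sentiment flow over a single
    co-occurrence text graph, via two personalized PageRank runs."""
    # Build an undirected co-occurrence graph: words sharing a sentence
    # are connected.
    neighbors = {}
    for sent in sentences:
        words = set(sent.lower().split())
        for a, b in itertools.combinations(words, 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    nodes = list(neighbors)

    def personalized_pagerank(seeds):
        seeds = [s for s in seeds if s in neighbors]
        # Restart mass is concentrated on the seed words of one pole.
        base = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
        rank = dict(base)
        for _ in range(iters):
            rank = {
                n: (1 - damping) * base[n]
                + damping * sum(rank[m] / len(neighbors[m])
                                for m in neighbors[n])
                for n in nodes
            }
        return rank

    return personalized_pagerank(pos_seeds), personalized_pagerank(neg_seeds)
```

Sorting the two rankings then yields one segment dominated by positive-pole units and one by negative-pole units, which can be compared against a plain (non-polar) TextRank summary.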
author2 |
Tarau, Paul |
author_facet |
Tarau, Paul Hamid, Fahmida |
author |
Hamid, Fahmida |
author_sort |
Hamid, Fahmida |
title |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_short |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_full |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_fullStr |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_full_unstemmed |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_sort |
evaluation techniques and graph-based algorithms for automatic summarization and keyphrase extraction |
publisher |
University of North Texas |
publishDate |
2016 |
url |
https://digital.library.unt.edu/ark:/67531/metadc862796/ |
work_keys_str_mv |
AT hamidfahmida evaluationtechniquesandgraphbasedalgorithmsforautomaticsummarizationandkeyphraseextraction |
_version_ |
1719329339402616832 |