Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction

Automatic text summarization and keyphrase extraction are two active research areas that span natural language processing and information retrieval. They have recently become popular because of their wide applicability. Devising generic techniques for these tasks is challenging...

Bibliographic Details
Main Author: Hamid, Fahmida
Other Authors: Tarau, Paul
Format: Others
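Language: English
Published: University of North Texas, 2016
Subjects: Evaluation Technique; Summarization; Keyphrase Extraction; Graph-based Algorithms; Absolute Scale; Relativized Scale; Degree of Agreement; Baseline; Computer Science
Online Access: https://digital.library.unt.edu/ark:/67531/metadc862796/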
id ndltd-unt.edu-info-ark-67531-metadc862796
record_format oai_dc
Record Datestamp: 2020-07-15T07:09:31Z
Contributors: Tarau, Paul; Mihalcea, Rada, 1974-; Buckles, Bill; Blanco, Eduardo
Date: 2016-08
Resource Type: Thesis or Dissertation (Text)
Local Control Number: submission_365
ARK: ark:/67531/metadc862796
Access: Public
Rights: Copyright is held by the author (Hamid, Fahmida), unless otherwise noted. All rights reserved.
collection NDLTD
sources NDLTD
topic Evaluation Technique
Summarization
Keyphrase Extraction
Graph-based Algorithms
Absolute Scale
Relativized Scale
Degree of Agreement
Baseline
Computer Science
description Automatic text summarization and keyphrase extraction are two active research areas that span natural language processing and information retrieval. They have recently become popular because of their wide applicability. Devising generic techniques for these tasks is challenging for several reasons, yet many intelligent systems now perform them. Because different systems are designed from different perspectives, evaluating their performance with a generic strategy is crucial, and it has become important to do so with minimal human effort. In our work, we focus on designing a relativized scale for evaluating different algorithms. This is our major contribution, and it challenges the traditional approach of working with an absolute scale. We consider the impact of several environment variables (the lengths of the document, the references, and the system-generated outputs) on performance. Instead of fixing rigid lengths, we show how to adjust to their variations. We derive a mathematically sound baseline that should work for all kinds of documents. We emphasize automatically determining the syntactic well-formedness of the structures (sentences). We also propose defining an equivalence class for each unit (e.g., a word) instead of relying on exact string matching. We present an evaluation approach that uses the weighted relatedness of multiple references to account for the degree of disagreement among the gold standards. We release the proposed approach as a free tool so that other systems can use it. We have also compiled a dataset of scientific articles, each with a reference summary and keyphrases. Our approach applies to both single-document and multi-document tasks.
We tested our evaluation method on three intrinsic tasks from the DUC 2004 conference, and in all three cases it correlates positively with ROUGE. In our experiments on the DUC 2004 question-answering task (an extrinsic task), it matches human decisions with an accuracy of 36.008%. Overall, the proposed relativized scale performs as well as the popular ROUGE technique while offering flexibility with respect to output length. As part of the evaluation, we also devised a new graph-based algorithm for sentiment analysis. The proposed model extracts units (e.g., words or sentences) from the original text that belong to either the positive or the negative sentiment pole. It embeds both positive and negative sentiment flow into a single text graph, whose nodes are words or phrases and whose edges are the relations between them. By recursively applying two mutually exclusive relations, the model computes the final rank of the nodes. From this ranking, it extracts two segments from the article: one with highly positive sentiment and one with highly negative sentiment. The output of this model was compared with non-polar TextRank output to quantify how much of the factual content the polar summaries cover along with the sentiment.
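The abstract compares the polar model's output against non-polar TextRank. As a rough reference point, the following minimal Python sketch shows a generic TextRank-style keyword ranking: a word co-occurrence graph scored with the damped PageRank recurrence. It is not the thesis's polar algorithm, which embeds positive and negative relations in a single graph. The regex tokenizer, the tiny stopword list, the window size of 2, the damping factor of 0.85, and the function names are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
# TextRank as published also filters candidates by part of speech; omitted here.
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
    "of", "on", "or", "that", "the", "this", "to", "with",
}


def build_cooccurrence_graph(text, window=2):
    """Build an undirected word graph: two candidate words are linked when they
    occur within `window` positions of each other after stopword removal."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:  # no self-loops
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, tol=1e-6, max_iter=100):
    """Score nodes with the damped PageRank recurrence used by TextRank:
    S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u)."""
    scores = {node: 1.0 for node in graph}
    for _ in range(max_iter):
        new_scores = {
            node: (1 - damping)
            + damping * sum(scores[u] / len(graph[u]) for u in graph[node])
            for node in graph
        }
        delta = max((abs(new_scores[n] - scores[n]) for n in graph), default=0.0)
        scores = new_scores
        if delta < tol:  # stop once no score moves by more than `tol`
            break
    return scores


def top_keywords(text, k=5, window=2):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = textrank(build_cooccurrence_graph(text, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    sample = (
        "Graph-based ranking algorithms rank the vertices of a text graph. "
        "TextRank applies graph-based ranking to keyword extraction and summarization."
    )
    print(top_keywords(sample, k=3))
```

Raising `window` links words that are farther apart and yields a denser graph; the sketch keeps the graph unweighted for simplicity.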