Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction
Automatic text summarization and keyphrase extraction are two areas of research that span natural language processing and information retrieval. They have recently gained popularity because of their wide applicability. Devising generic techniques for these tasks is challenging...
Main Author: | Hamid, Fahmida |
---|---|
Other Authors: | Tarau, Paul |
Format: | Others |
Language: | English |
Published: | University of North Texas, 2016 |
Subjects: | Evaluation Technique; Summarization; Keyphrase Extraction; Graph-based Algorithms; Absolute Scale; Relativized Scale; Degree of Agreement; Baseline; Computer Science |
Online Access: | https://digital.library.unt.edu/ark:/67531/metadc862796/ |
id |
ndltd-unt.edu-info-ark-67531-metadc862796 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-unt.edu-info-ark-67531-metadc8627962020-07-15T07:09:31Z Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction Hamid, Fahmida Evaluation Technique Summarization Keyphrase Extraction Graph-based Algorithms Absolute Scale Relativized Scale Degree of Agreement Baseline Computer Science Automatic text summarization and keyphrase extraction are two areas of research that span natural language processing and information retrieval. They have recently gained popularity because of their wide applicability. Devising generic techniques for these tasks is challenging for several reasons, yet many intelligent systems perform them. Because different systems are designed from different perspectives, evaluating their performance with a generic strategy is crucial, and it is increasingly important to do so with minimal human effort. In our work, we focus on designing a relativized scale for evaluating different algorithms. This is our major contribution, and it challenges the traditional approach of working with an absolute scale. We consider the impact of several environment variables (the lengths of the document, the references, and the system-generated outputs) on performance; instead of fixing rigid lengths, we show how to adjust to their variation. We prove a mathematically sound baseline that works for all kinds of documents. We emphasize automatically determining the syntactic well-formedness of structures (sentences), and we propose defining an equivalence class for each unit (e.g., word) instead of exact string matching. We present an evaluation approach that uses the weighted relatedness of multiple references to adjust to the degree of disagreement among the gold standards. We publish the proposed approach as a free tool so that other systems can use it. 
We have also assembled a dataset of scientific articles with a reference summary and keyphrases for each document. Our approach applies not only to single-document tasks but also to multi-document tasks. We have tested our evaluation method on three intrinsic tasks (from DUC 2004), and in all three cases it correlates positively with ROUGE. In our experiments on the DUC 2004 question-answering task, it correlates with human judgments (an extrinsic task) with 36.008% accuracy. In general, the proposed relativized scale performs as well as the popular ROUGE technique while remaining flexible about output length. As part of this work, we also devised a new graph-based algorithm for sentiment analysis. The proposed model extracts units (e.g., words or sentences) from the original text that belong to either the positive or the negative sentiment pole. It embeds both types of sentiment flow into a single text graph, with words or phrases as nodes and their relations as edges. By recursively applying two mutually exclusive relations, the model builds a final ranking of the nodes and, based on it, extracts two segments from the article: one with highly positive sentiment and one with highly negative sentiment. The output of this model was compared with the output of the non-polar TextRank to quantify how well the polar summaries cover the facts along with the sentiment. University of North Texas Tarau, Paul Mihalcea, Rada, 1974- Buckles, Bill Blanco, Eduardo 2016-08 Thesis or Dissertation Text local-cont-no: submission_365 https://digital.library.unt.edu/ark:/67531/metadc862796/ ark: ark:/67531/metadc862796 English Public Hamid, Fahmida Copyright Copyright is held by the author, unless otherwise noted. All rights reserved. |
collection |
NDLTD |
language |
English |
format |
Others
|
sources |
NDLTD |
topic |
Evaluation Technique Summarization Keyphrase Extraction Graph-based Algorithms Absolute Scale Relativized Scale Degree of Agreement Baseline Computer Science |
spellingShingle |
Evaluation Technique Summarization Keyphrase Extraction Graph-based Algorithms Absolute Scale Relativized Scale Degree of Agreement Baseline Computer Science Hamid, Fahmida Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
description |
Automatic text summarization and keyphrase extraction are two areas of research that span natural language processing and information retrieval. They have recently gained popularity because of their wide applicability. Devising generic techniques for these tasks is challenging for several reasons, yet many intelligent systems perform them. Because different systems are designed from different perspectives, evaluating their performance with a generic strategy is crucial, and it is increasingly important to do so with minimal human effort.
In our work, we focus on designing a relativized scale for evaluating different algorithms. This is our major contribution, and it challenges the traditional approach of working with an absolute scale. We consider the impact of several environment variables (the lengths of the document, the references, and the system-generated outputs) on performance; instead of fixing rigid lengths, we show how to adjust to their variation. We prove a mathematically sound baseline that works for all kinds of documents. We emphasize automatically determining the syntactic well-formedness of structures (sentences), and we propose defining an equivalence class for each unit (e.g., word) instead of exact string matching. We present an evaluation approach that uses the weighted relatedness of multiple references to adjust to the degree of disagreement among the gold standards. We publish the proposed approach as a free tool so that other systems can use it. We have also assembled a dataset of scientific articles with a reference summary and keyphrases for each document. Our approach applies not only to single-document tasks but also to multi-document tasks.
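The idea of matching equivalence classes of units rather than exact strings can be sketched as follows. This is an illustrative stand-in only: the crude suffix-stripping rule and the `overlap_score` helper are assumptions for demonstration, not the thesis's actual definition of unit equivalence or its scoring formula.

```python
def equiv_class(word):
    """Map a word to a rough equivalence class (illustrative stand-in:
    lowercase the word and strip a few common English suffixes)."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

def overlap_score(system, reference):
    """Fraction of reference units matched by the system output,
    comparing equivalence classes rather than exact strings."""
    sys_classes = {equiv_class(w) for w in system.split()}
    ref_classes = [equiv_class(w) for w in reference.split()]
    if not ref_classes:
        return 0.0
    matched = sum(1 for c in ref_classes if c in sys_classes)
    return matched / len(ref_classes)
```

Under this relaxed matching, "graphs ranked" fully covers the reference "graph ranking", whereas exact string matching would score zero.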
We have tested our evaluation method on three intrinsic tasks (from DUC 2004), and in all three cases it correlates positively with ROUGE. In our experiments on the DUC 2004 question-answering task, it correlates with human judgments (an extrinsic task) with 36.008% accuracy. In general, the proposed relativized scale performs as well as the popular ROUGE technique while remaining flexible about output length.
As part of this work, we also devised a new graph-based algorithm for sentiment analysis. The proposed model extracts units (e.g., words or sentences) from the original text that belong to either the positive or the negative sentiment pole. It embeds both types of sentiment flow into a single text graph, with words or phrases as nodes and their relations as edges. By recursively applying two mutually exclusive relations, the model builds a final ranking of the nodes and, based on it, extracts two segments from the article: one with highly positive sentiment and one with highly negative sentiment. The output of this model was compared with the output of the non-polar TextRank to quantify how well the polar summaries cover the facts along with the sentiment. |
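A polarity-aware TextRank variant of the kind described can be sketched as follows. This is a generic reconstruction under stated assumptions, not the thesis's exact model: words are nodes, within-sentence co-occurrence supplies edges, and two seeded (personalized) PageRank runs — one from positive seed words, one from negative — produce the two rankings; the seed lists and damping value are illustrative choices.

```python
import itertools

def polar_rank(sentences, pos_seeds, neg_seeds, damping=0.85, iters=30):
    """Rank words by positive and negative sentiment flow over a single
    co-occurrence text graph, via two personalized PageRank runs."""
    # Build an undirected co-occurrence graph: words sharing a sentence
    # are connected.
    neighbors = {}
    for sent in sentences:
        words = set(sent.lower().split())
        for a, b in itertools.combinations(words, 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    nodes = list(neighbors)

    def personalized_pagerank(seeds):
        seeds = [s for s in seeds if s in neighbors]
        # Restart mass is concentrated on the seed words of one pole.
        base = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
        rank = dict(base)
        for _ in range(iters):
            rank = {
                n: (1 - damping) * base[n]
                + damping * sum(rank[m] / len(neighbors[m])
                                for m in neighbors[n])
                for n in nodes
            }
        return rank

    return personalized_pagerank(pos_seeds), personalized_pagerank(neg_seeds)
```

Sorting the two rankings then yields one segment dominated by positive-pole units and one by negative-pole units, which can be compared against a plain (non-polar) TextRank summary.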
author2 |
Tarau, Paul |
author_facet |
Tarau, Paul Hamid, Fahmida |
author |
Hamid, Fahmida |
author_sort |
Hamid, Fahmida |
title |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_short |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_full |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_fullStr |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_full_unstemmed |
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction |
title_sort |
evaluation techniques and graph-based algorithms for automatic summarization and keyphrase extraction |
publisher |
University of North Texas |
publishDate |
2016 |
url |
https://digital.library.unt.edu/ark:/67531/metadc862796/ |
work_keys_str_mv |
AT hamidfahmida evaluationtechniquesandgraphbasedalgorithmsforautomaticsummarizationandkeyphraseextraction |
_version_ |
1719329339402616832 |