Section-Based Focus Time Estimation of News Articles

Information retrieval systems embed temporal information to retrieve news documents related to temporal queries. An important aspect of a news document is its focus time, the time to which the document's content refers. The contemporary state of the art does not exploit focus time to...


Bibliographic Details
Main Authors: Shafiq Ur Rehman Khan, Muhammad Arshad Islam, Muhammad Aleem, Muhammad Azhar Iqbal, Usman Ahmed
Format: Article
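Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Information retrieval; temporal information retrieval; focus time; inverted pyramid; news retrieval
Online Access: https://ieeexplore.ieee.org/document/8543588/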
id doaj-f4861f4dbfa14c319b63a70c570cd84a
record_format Article
spelling doaj-f4861f4dbfa14c319b63a70c570cd84a; 2021-03-29T21:39:38Z; eng; IEEE; IEEE Access; 2169-3536; 2018-01-01; 6; 75452; 75460; 10.1109/ACCESS.2018.2882988; 8543588; Section-Based Focus Time Estimation of News Articles
Shafiq Ur Rehman Khan (https://orcid.org/0000-0002-1475-0190)
Muhammad Arshad Islam
Muhammad Aleem
Muhammad Azhar Iqbal
Usman Ahmed (https://orcid.org/0000-0002-3933-4273)
Department of Computer Science, Capital University of Science and Technology, Islamabad, Pakistan (all five authors)
collection DOAJ
language English
format Article
sources DOAJ
author Shafiq Ur Rehman Khan
Muhammad Arshad Islam
Muhammad Aleem
Muhammad Azhar Iqbal
Usman Ahmed
author_sort Shafiq Ur Rehman Khan
title Section-Based Focus Time Estimation of News Articles
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description Information retrieval systems embed temporal information to retrieve news documents related to temporal queries. An important aspect of a news document is its focus time, the time to which the document's content refers. The contemporary state of the art does not exploit focus time to retrieve relevant news documents. This paper investigates the inverted pyramid news paradigm to determine the focus time of news documents by extracting temporal expressions, normalizing their values, and assigning them a score on the basis of their position in the text. In this method, news documents are first divided into three sections following the inverted pyramid news paradigm. The paper presents a comprehensive analysis of four methods for splitting news documents into sections: the paragraph-based method, the words-based method, the sentence-based method, and the semantic-based method (SeBM). Temporal expressions in each section are assigned weights using a linear regression model. Finally, a scoring function calculates a temporal score for each time expression appearing in the document. These temporal expressions are then ranked by temporal score, so that the most suitable expression appears on top. The effectiveness of the proposed method is evaluated on a diverse dataset of news related to popular events; the results show that the proposed splitting methods achieved an average error of less than 5.6 years, whereas SeBM achieved precision scores of 0.35 and 0.77 at positions 1 and 2, respectively.
topic Information retrieval
temporal information retrieval
focus time
inverted pyramid
news retrieval
url https://ieeexplore.ieee.org/document/8543588/
_version_ 1724192580621565952
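
The pipeline summarized in the description (split a document into three sections, weight the temporal expressions in each section, score, then rank) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the section weights are hypothetical placeholders (the paper fits them with a linear regression model, and this record does not give the fitted values), temporal expressions are reduced to four-digit years, and only a simple sentence-based split is shown. The names `SECTION_WEIGHTS`, `split_into_sections`, and `rank_focus_years` are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical weights for the top, middle, and bottom sections.
# The paper learns its weights by linear regression; these are placeholders.
SECTION_WEIGHTS = (0.6, 0.3, 0.1)

# Simplifying assumption: a temporal expression is a four-digit year (1000-2099).
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def split_into_sections(sentences, parts=3):
    """Sentence-based split into `parts` contiguous sections.

    Boundaries use integer ceiling division, so any extra sentences
    go to the earlier sections.
    """
    n = len(sentences)
    bounds = [(i * n + parts - 1) // parts for i in range(parts + 1)]
    return [sentences[bounds[i]:bounds[i + 1]] for i in range(parts)]


def rank_focus_years(text, weights=SECTION_WEIGHTS):
    """Return (year, score) pairs sorted by descending score.

    Each occurrence of a year adds the weight of the section it appears in;
    ties are broken by the earlier year.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    sections = split_into_sections(sentences, len(weights))
    scores = defaultdict(float)
    for section, weight in zip(sections, weights):
        for sentence in section:
            for year in YEAR_PATTERN.findall(sentence):
                scores[int(year)] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    article = (
        "The moon landing took place in 1969. Apollo 11 launched from Florida. "
        "The program began in 1961. It ended in 1972. "
        "The story was retold in 2009. Another retelling came in 2009."
    )
    for year, score in rank_focus_years(article):
        print(year, round(score, 2))
```

With these placeholder weights, a year mentioned once in the opening section outranks a year mentioned twice in the closing section, which mirrors the inverted pyramid assumption that the most important information comes first.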