A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts. Design/methodology/approach: The method for data-usage statements extraction starts with seed entities and iteratively learns patterns and data-usage statements from unlabele...
Main Authors: | Qiuzi Zhang, Qikai Cheng, Yong Huang, Wei Lu |
---|---|
Format: | Article |
Language: | English |
Published: | Chinese Academy of Sciences, 2016-03-01 |
Series: | Journal of Data and Information Science |
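Subjects: | Data-usage statements extraction; Information extraction; Bootstrapping; Unsupervised learning; Academic text-mining |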
Online Access: | http://www.jdis.org/CN/html/Article8610.htm |
id |
doaj-ca5ae15221ed4e82b876286458b3f68d |
---|---|
record_format |
Article |
spelling |
doaj-ca5ae15221ed4e82b876286458b3f68d; 2020-11-24T22:23:07Z; eng; Chinese Academy of Sciences; Journal of Data and Information Science; ISSN 2096-157X, 2096-157X; 2016-03-01; vol. 1, no. 1, pp. 69-85; DOI 10.20309/jdis.201606; A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications; Qiuzi Zhang, Qikai Cheng, Yong Huang, Wei Lu (all: School of Information Management, Wuhan University, Wuhan 430072, China)
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts. Design/methodology/approach: The method for data-usage statements extraction starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper. Findings: The performance of the method is verified by means of experiments on real data collected from computer science journals. The results show that the method can achieve satisfactory performance regarding precision of extraction and extensibility of obtained patterns. Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it is unable to handle complex sentences. Additional features that can address complex sentences should thus be explored in the future. Practical implications: Data-usage statements extraction is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.
Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
http://www.jdis.org/CN/html/Article8610.htm; Data-usage statements extraction; Information extraction; Bootstrapping; Unsupervised learning; Academic text-mining |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Qiuzi Zhang; Qikai Cheng; Yong Huang; Wei Lu |
spellingShingle |
Qiuzi Zhang; Qikai Cheng; Yong Huang; Wei Lu; A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications; Journal of Data and Information Science; Data-usage statements extraction; Information extraction; Bootstrapping; Unsupervised learning; Academic text-mining |
author_facet |
Qiuzi Zhang; Qikai Cheng; Yong Huang; Wei Lu |
author_sort |
Qiuzi Zhang |
title |
A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications |
title_sort |
bootstrapping-based method to automatically identify data-usage statements in publications |
publisher |
Chinese Academy of Sciences |
series |
Journal of Data and Information Science |
issn |
2096-157X 2096-157X |
publishDate |
2016-03-01 |
description |
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.
Design/methodology/approach: The method for data-usage statements extraction starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper.
Findings: The performance of the method is verified by means of experiments on real data collected from computer science journals. The results show that the method can achieve satisfactory performance regarding precision of extraction and extensibility of obtained patterns.
Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it is unable to handle complex sentences. Additional features that can address complex sentences should thus be explored in the future.
Practical implications: Data-usage statements extraction is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.
Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
|
topic |
Data-usage statements extraction; Information extraction; Bootstrapping; Unsupervised learning; Academic text-mining |
url |
http://www.jdis.org/CN/html/Article8610.htm |
work_keys_str_mv |
AT qiuzizhang abootstrappingbasedmethodtoautomaticallyidentifydatausagestatementsinpublications AT qikaicheng abootstrappingbasedmethodtoautomaticallyidentifydatausagestatementsinpublications AT yonghuang abootstrappingbasedmethodtoautomaticallyidentifydatausagestatementsinpublications AT weilu abootstrappingbasedmethodtoautomaticallyidentifydatausagestatementsinpublications AT qiuzizhang bootstrappingbasedmethodtoautomaticallyidentifydatausagestatementsinpublications AT qikaicheng bootstrappingbasedmethodtoautomaticallyidentifydatausagestatementsinpublications AT yonghuang bootstrappingbasedmethodtoautomaticallyidentifydatausagestatementsinpublications AT weilu bootstrappingbasedmethodtoautomaticallyidentifydatausagestatementsinpublications |
_version_ |
1725765817550766080 |
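
The abstract describes a loop that starts from seed entities, scores candidate patterns, and grows both the pattern list and the set of extracted statements on each pass. The following is a minimal Python sketch of that general idea, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the `(subject, verb, object)` triple layout, the sample sentences, the function name `bootstrap`, the `min_score` threshold, and the scoring rule (the fraction of a pattern's objects that are already known entities). The record does not state the paper's actual scoring formula or its three seed-selection strategies.

```python
from collections import defaultdict

# Hypothetical sentences, already reduced to (subject, verb, object) triples.
SAMPLE_TRIPLES = [
    ("we", "use", "MNIST"),
    ("we", "use", "CIFAR-10"),
    ("we", "evaluate on", "CIFAR-10"),
    ("we", "evaluate on", "ImageNet"),
    ("we", "propose", "a new model"),
    ("experiments", "use", "ImageNet"),
]


def bootstrap(triples, seeds, iterations=3, min_score=0.5):
    """Grow patterns and entities from seed entities (illustrative only).

    A pattern is a (subject, verb) pair. Its score is the share of the
    objects it matches that are already known entities; this scoring rule
    is an assumption, not the one used in the paper.
    """
    entities = set(seeds)
    patterns = set()
    for _ in range(iterations):
        hits = defaultdict(int)   # matches whose object is a known entity
        total = defaultdict(int)  # all matches of the candidate pattern
        for subj, verb, obj in triples:
            total[(subj, verb)] += 1
            if obj in entities:
                hits[(subj, verb)] += 1
        # Keep candidates that score high enough and are not yet accepted.
        new_patterns = {
            p for p in total if hits[p] / total[p] >= min_score
        } - patterns
        if not new_patterns:
            break  # nothing new was learned, so stop early
        patterns |= new_patterns
        # Apply every accepted pattern to harvest new entities.
        for subj, verb, obj in triples:
            if (subj, verb) in patterns:
                entities.add(obj)
    statements = [t for t in triples if (t[0], t[1]) in patterns]
    return patterns, entities, statements


if __name__ == "__main__":
    patterns, entities, statements = bootstrap(SAMPLE_TRIPLES, {"MNIST"})
    print(sorted(entities))  # ['CIFAR-10', 'ImageNet', 'MNIST']
    print(len(statements))   # 5
```

In this toy run the single seed `MNIST` first validates the pattern `("we", "use")`, which yields `CIFAR-10`; that in turn validates `("we", "evaluate on")`, and so on, while `("we", "propose")` never reaches the threshold. A lower `min_score` admits more patterns at the cost of precision, which is the usual trade-off in bootstrapping approaches.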