Recognizing speculative language in biomedical research articles: a linguistically motivated perspective

Abstract. Background: Due to the nature of scientific methodology, research articles are rich in speculative and tentative statements, also known as hedges. We explore a linguistically motivated approach to the problem of recognizing such language in biom...

Bibliographic Details
Main Authors: Bergler Sabine, Kilicoglu Halil
Format: Article
Language: English
Published: BMC 2008-11-01
Series: BMC Bioinformatics
id doaj-fa6cafe64aa1496088f0e07efba02ed4
record_format Article
spelling doaj-fa6cafe64aa1496088f0e07efba02ed4; 2020-11-25T00:27:26Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-11-01; 9; Suppl 11; S10; 10.1186/1471-2105-9-S11-S10; Recognizing speculative language in biomedical research articles: a linguistically motivated perspective; Bergler Sabine; Kilicoglu Halil
collection DOAJ
language English
format Article
sources DOAJ
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2008-11-01
description Abstract
Background: Due to the nature of scientific methodology, research articles are rich in speculative and tentative statements, also known as hedges. We explore a linguistically motivated approach to the problem of recognizing such language in biomedical research articles. Our approach draws on prior linguistic work as well as existing lexical resources to create a dictionary of hedging cues and extends it by introducing syntactic patterns. Furthermore, recognizing that hedging cues differ in speculative strength, we assign them weights in two ways: automatically using the information gain (IG) measure and semi-automatically based on their types and centrality to hedging. Weights of hedging cues are used to determine the speculative strength of sentences.
Results: We test our system on two publicly available hedging datasets. On the fruit-fly dataset, we achieve a precision-recall breakeven point (BEP) of 0.85 using the semi-automatic weighting scheme and a lower BEP of 0.80 with the information gain weighting scheme. These results are competitive with the previously reported best results (BEP of 0.85). On the BMC dataset, using semi-automatic weighting yields a BEP of 0.82, a statistically significant improvement (p < 0.01) over the previously reported best result (BEP of 0.76), while information gain weighting yields a BEP of 0.70.
Conclusion: Our results demonstrate that speculative language can be recognized successfully with a linguistically motivated approach and confirm that the selection of hedging devices affects the speculative strength of the sentence, which can be captured reasonably well by weighting the hedging cues. The improvement obtained on the BMC dataset with a semi-automatic weighting scheme indicates that our linguistically oriented approach is more portable than the machine-learning-based approaches. Lower performance obtained with the information gain weighting scheme suggests that this method may benefit from a larger, manually annotated corpus for automatically inducing the weights.
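The abstract names information gain weighting of hedging cues, sentence-level speculative strength, and the precision-recall breakeven point, but this record does not give the formulas used. The following is only a minimal Python sketch of those general ideas, not the authors' implementation. The function names, the whitespace tokenization, the choice to sum cue weights per sentence, and the toy sentences are all assumptions made for illustration; none of them come from the article or its datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def has_cue(sentence, cue):
    """Assumed matching rule: the cue is a whitespace-separated token."""
    return cue in sentence.lower().split()


def information_gain(sentences, labels, cue):
    """IG of the binary feature 'cue occurs in the sentence' w.r.t. the labels."""
    with_cue = [y for s, y in zip(sentences, labels) if has_cue(s, cue)]
    without_cue = [y for s, y in zip(sentences, labels) if not has_cue(s, cue)]
    n = len(labels)
    conditional = (len(with_cue) / n) * entropy(with_cue) \
        + (len(without_cue) / n) * entropy(without_cue)
    return entropy(labels) - conditional


def sentence_score(sentence, weights):
    """Assumed aggregation: sum the weights of all cues found in the sentence."""
    return sum(w for cue, w in weights.items() if has_cue(sentence, cue))


def breakeven_point(scores, labels):
    """Precision at the cutoff where precision equals recall.

    Ranking by score and cutting after as many sentences as there are
    positives makes precision and recall share the same denominator.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:n_pos]) / n_pos


# Toy data invented for this sketch (1 = speculative, 0 = not speculative).
toy_sentences = [
    "these results suggest that x may regulate y",
    "x binds y",
    "x may inhibit z",
    "z is expressed in embryos",
]
toy_labels = [1, 0, 1, 0]

toy_weights = {cue: information_gain(toy_sentences, toy_labels, cue)
               for cue in ("may", "suggest")}
toy_scores = [sentence_score(s, toy_weights) for s in toy_sentences]
print(toy_weights)
print(breakeven_point(toy_scores, toy_labels))
```

A semi-automatic scheme of the kind the abstract mentions could be mimicked in this sketch by replacing `toy_weights` with a hand-assigned dictionary, leaving the scoring and evaluation functions unchanged.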