Is searching full text more effective than searching abstracts?
Abstract. Background: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly acce...
Main Author: | Lin Jimmy |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2009-02-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/10/46 |
id |
doaj-278e3fda137541b6960b7e9ab32723a7 |
---|---|
record_format |
Article |
spelling |
doaj-278e3fda137541b6960b7e9ab32723a7 2020-11-25T00:13:16Z eng BMC BMC Bioinformatics 1471-2105 2009-02-01 10 1 46 10.1186/1471-2105-10-46 Is searching full text more effective than searching abstracts? Lin Jimmy Abstract. Background: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: bm25 and the ranking algorithm implemented in the open-source Lucene search engine. Results: Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that the highest overall effectiveness may be achieved by combining evidence from spans and full articles. Conclusion: Users searching full text are more likely to find relevant articles than users searching only abstracts. This finding affirms the value of full-text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations. http://www.biomedcentral.com/1471-2105/10/46 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Lin Jimmy |
spellingShingle |
Lin Jimmy Is searching full text more effective than searching abstracts? BMC Bioinformatics |
author_facet |
Lin Jimmy |
author_sort |
Lin Jimmy |
title |
Is searching full text more effective than searching abstracts? |
title_short |
Is searching full text more effective than searching abstracts? |
title_full |
Is searching full text more effective than searching abstracts? |
title_fullStr |
Is searching full text more effective than searching abstracts? |
title_full_unstemmed |
Is searching full text more effective than searching abstracts? |
title_sort |
is searching full text more effective than searching abstracts? |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2009-02-01 |
description |
Abstract. Background: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: bm25 and the ranking algorithm implemented in the open-source Lucene search engine. Results: Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that the highest overall effectiveness may be achieved by combining evidence from spans and full articles. Conclusion: Users searching full text are more likely to find relevant articles than users searching only abstracts. This finding affirms the value of full-text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations. |
url |
http://www.biomedcentral.com/1471-2105/10/46 |
work_keys_str_mv |
AT linjimmy issearchingfulltextmoreeffectivethansearchingabstracts |
_version_ |
1725395273314729984 |