Is searching full text more effective than searching abstracts?

Abstract. Background: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly acce...


Bibliographic Details
Main Author: Lin, Jimmy
Format: Article
Language: English
Published: BMC 2009-02-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/10/46
id doaj-278e3fda137541b6960b7e9ab32723a7
record_format Article
spelling doaj-278e3fda137541b6960b7e9ab32723a7 2020-11-25T00:13:16Z eng BMC BMC Bioinformatics 1471-2105 2009-02-01 10 1 46 10.1186/1471-2105-10-46 Is searching full text more effective than searching abstracts? Lin Jimmy http://www.biomedcentral.com/1471-2105/10/46
collection DOAJ
language English
format Article
sources DOAJ
author Lin Jimmy
author_facet Lin Jimmy
author_sort Lin Jimmy
title Is searching full text more effective than searching abstracts?
title_sort is searching full text more effective than searching abstracts?
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2009-02-01
description Abstract. Background: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: bm25 and the ranking algorithm implemented in the open-source Lucene search engine. Results: Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that the highest overall effectiveness may be achieved by combining evidence from spans and full articles. Conclusion: Users searching full text are more likely to find relevant articles than those searching only abstracts. This finding affirms the value of full-text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations.
url http://www.biomedcentral.com/1471-2105/10/46