PubMed related articles: a probabilistic topic-based model for content similarity
Abstract. Background: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from...
Main Authors: | Lin, Jimmy; Wilbur, W John |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2007-10-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/8/423 |
id | doaj-6821395bdd57417c8eb2c04c6b123810 |
---|---|
record_format | Article |
spelling | doaj-6821395bdd57417c8eb2c04c6b123810; 2020-11-24T22:18:46Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2007-10-01; 8; 1; 423; 10.1186/1471-2105-8-423; PubMed related articles: a probabilistic topic-based model for content similarity; Lin Jimmy; Wilbur W John; http://www.biomedcentral.com/1471-2105/8/423 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Lin Jimmy; Wilbur W John |
title | PubMed related articles: a probabilistic topic-based model for content similarity |
publisher | BMC |
series | BMC Bioinformatics |
issn | 1471-2105 |
publishDate | 2007-10-01 |
description | Abstract. Background: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance; rather, our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH® in MEDLINE®. Results: The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision. Conclusion: Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search. |
url | http://www.biomedcentral.com/1471-2105/8/423 |
work_keys_str_mv | AT linjimmy pubmedrelatedarticlesaprobabilistictopicbasedmodelforcontentsimilarity AT wilburwjohn pubmedrelatedarticlesaprobabilistictopicbasedmodelforcontentsimilarity |
_version_ | 1725781735239581696 |
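
The abstract states that whether a document is about a topic is computed from term frequencies modeled as Poisson distributions. This record does not give the pmra formulas, so the following is only a rough illustration using a generic two-Poisson mixture, not necessarily the authors' formulation. The function names, the two rates, and the prior are all hypothetical values chosen for the example.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing exactly k occurrences under Poisson(rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def prob_about_topic(k: int, topic_rate: float, background_rate: float,
                     prior: float) -> float:
    """Posterior probability that a document is about a topic, given that
    the topic's term occurs k times.

    Assumption (illustrative only): term counts follow Poisson(topic_rate)
    in documents about the topic and Poisson(background_rate) otherwise,
    and `prior` is the fraction of documents about the topic.
    """
    about = prior * poisson_pmf(k, topic_rate)
    not_about = (1.0 - prior) * poisson_pmf(k, background_rate)
    return about / (about + not_about)  # Bayes' rule


if __name__ == "__main__":
    # Hypothetical parameters: the term is expected 3 times in on-topic
    # documents, 0.5 times elsewhere, and 20% of documents are on-topic.
    for count in range(6):
        p = prob_about_topic(count, topic_rate=3.0, background_rate=0.5, prior=0.2)
        print(f"term count {count}: P(about topic) = {p:.3f}")
```

Under these assumed parameters the posterior rises with the term count, which is the qualitative behaviour one would expect from a frequency-based notion of "aboutness". When the two rates are equal, the count carries no information and the function returns the prior.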