PubMed related articles: a probabilistic topic-based model for content similarity

Bibliographic Details
Main Authors: Lin, Jimmy; Wilbur, W. John
Format: Article
Language: English
Published: BMC, 2007-10-01
Series: BMC Bioinformatics
Online Access:http://www.biomedcentral.com/1471-2105/8/423
ISSN: 1471-2105
Volume: 8, article 423
DOI: 10.1186/1471-2105-8-423

Abstract

Background

We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance; rather, our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process relies on the MeSH® terms assigned to citations in MEDLINE®.

Results

The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision.

Conclusion

Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search.
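The abstract states the core idea without formulas: whether a document is about a term's topic is inferred from term frequencies modeled as Poisson distributions. Below is a minimal sketch of one way to do that, assuming a two-Poisson mixture (rate `lam` per word when a document is about the topic, rate `mu` otherwise, prior probability `eta` of being about it) and an idf-weighted sum of per-term topic probabilities as the pairwise relatedness score. The function names, parameters, scoring sum and example idf values are illustrative assumptions, not the paper's published formula or PubMed's implementation, and the parameters here are arbitrary rather than estimated from MeSH as the paper describes.

```python
import math
from collections import Counter


def p_topic(k: int, length: int, lam: float, mu: float, eta: float) -> float:
    """Posterior probability that a document is about a term's topic, given
    k occurrences of the term in a document of `length` words.

    Assumed two-Poisson mixture (illustrative, not the paper's exact model):
      on-topic:  k ~ Poisson(lam * length)
      off-topic: k ~ Poisson(mu * length)
      prior P(on-topic) = eta
    """
    # Likelihood ratio P(k | off-topic) / P(k | on-topic);
    # the k! and length**k factors cancel.
    ratio = (mu / lam) ** k * math.exp(-(mu - lam) * length)
    # Bayes' rule rearranged: 1 / (1 + ratio * (1 - eta) / eta)
    return 1.0 / (1.0 + ratio * (1.0 / eta - 1.0))


def relatedness(doc_a, doc_b, idf, lam, mu, eta):
    """Hypothetical similarity: sum over shared terms of idf(term) times the
    topic probabilities of both documents for that term.

    Simplification: terms missing from either document are skipped, although
    the model gives them a small nonzero topic probability.
    """
    counts_a, counts_b = Counter(doc_a), Counter(doc_b)
    len_a, len_b = len(doc_a), len(doc_b)
    score = 0.0
    for term in counts_a.keys() & counts_b.keys():
        score += (idf.get(term, 0.0)
                  * p_topic(counts_a[term], len_a, lam, mu, eta)
                  * p_topic(counts_b[term], len_b, lam, mu, eta))
    return score


if __name__ == "__main__":
    # Toy documents and made-up idf weights, for illustration only.
    a = "gene expression in yeast gene regulation".split()
    b = "yeast gene knockout study".split()
    idf = {"gene": 1.0, "yeast": 2.0, "expression": 1.5,
           "regulation": 1.5, "knockout": 2.5, "study": 0.5, "in": 0.1}
    print(relatedness(a, b, idf, lam=0.05, mu=0.01, eta=0.2))
```

Because each term's contribution is the product of two topic probabilities, a term counts heavily only when both documents are likely about it; a word repeated often in a short document pushes its probability toward 1, while the same count in a long document counts for less.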