The expected metric principle for probabilistic information retrieval

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (leaves 125-128). Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some...

Bibliographic Details
Main Author:	Chen, Harr
Other Authors:	David R. Karger.
Format:	Others
Language:	English
Published:	Massachusetts Institute of Technology 2007
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/38672

id	ndltd-MIT-oai-dspace.mit.edu-1721.1-38672
record_format	oai_dc
spelling	ndltd-MIT-oai-dspace.mit.edu-1721.1-38672 2019-05-02T16:16:50Z The expected metric principle for probabilistic information retrieval Chen, Harr David R. Karger. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. by Harr Chen. S.M. 2007-08-29T20:42:13Z 2007-08-29T20:42:13Z 2007 2007 Thesis http://hdl.handle.net/1721.1/38672 163943285 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 128 leaves application/pdf Massachusetts Institute of Technology
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Electrical Engineering and Computer Science.
spellingShingle	Electrical Engineering and Computer Science. Chen, Harr The expected metric principle for probabilistic information retrieval
description	Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (leaves 125-128). Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the Probability Ranking Principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user's information need. One example is when the user would be satisfied with some limited number of relevant documents, rather than needing all relevant documents. We show that in such a scenario, an attempt to return many relevant documents can actually reduce the chances of finding any relevant documents. In this thesis, we introduce the Expected Metric Principle, which generalizes the Probability Ranking Principle in a way that intimately connects the evaluation metric and the retrieval model. We observe that given a probabilistic model of relevance, it is appropriate to rank so as to directly optimize these metrics in expectation. We consider a number of metrics from the literature, such as the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations, as well as introducing our own new metrics. While direct optimization of a metric's expected value may be computationally intractable, we explore heuristic search approaches, and show that a simple approximate greedy optimization algorithm produces rankings for TREC queries that outperform the standard approach based on the probability ranking principle. by Harr Chen. S.M.
author2	David R. Karger.
author_facet	David R. Karger. Chen, Harr
author	Chen, Harr
author_sort	Chen, Harr
title	The expected metric principle for probabilistic information retrieval
title_short	The expected metric principle for probabilistic information retrieval
title_full	The expected metric principle for probabilistic information retrieval
title_fullStr	The expected metric principle for probabilistic information retrieval
title_full_unstemmed	The expected metric principle for probabilistic information retrieval
title_sort	expected metric principle for probabilistic information retrieval
publisher	Massachusetts Institute of Technology
publishDate	2007
url	http://hdl.handle.net/1721.1/38672
work_keys_str_mv	AT chenharr theexpectedmetricprincipleforprobabilisticinformationretrieval AT chenharr expectedmetricprincipleforprobabilisticinformationretrieval
_version_	1719037891538059264
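
The contrast drawn in the description, between ranking by probability of relevance and greedily optimizing a metric in expectation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the two-interpretation toy model (PRIOR, REL), the document names, and the helper functions are hypothetical, and the metric is a simple "at least one relevant result in the top k" success measure, with documents assumed conditionally independent given the query interpretation.

```python
from math import prod

# Hypothetical toy model (not from the thesis): a query with two possible
# interpretations, and P(document is relevant | interpretation).
PRIOR = {"A": 0.6, "B": 0.4}
REL = {
    "a1": {"A": 0.9, "B": 0.0},
    "a2": {"A": 0.9, "B": 0.0},
    "b1": {"A": 0.0, "B": 0.9},
}


def marginal_relevance(doc):
    """P(doc is relevant), averaged over interpretations."""
    return sum(PRIOR[t] * REL[doc][t] for t in PRIOR)


def expected_success(ranking):
    """Expected value of the 'at least one relevant result' metric.

    Assumes documents are conditionally independent given the interpretation.
    """
    return sum(
        PRIOR[t] * (1.0 - prod(1.0 - REL[d][t] for d in ranking))
        for t in PRIOR
    )


def prp_ranking(docs, k):
    """Probability Ranking Principle: sort by marginal relevance, keep top k."""
    return sorted(docs, key=marginal_relevance, reverse=True)[:k]


def greedy_ranking(docs, k):
    """Greedy sketch: repeatedly append the document that most increases
    the expected metric of the ranking built so far."""
    ranking = []
    remaining = list(docs)
    while remaining and len(ranking) < k:
        best = max(remaining, key=lambda d: expected_success(ranking + [d]))
        ranking.append(best)
        remaining.remove(best)
    return ranking


if __name__ == "__main__":
    docs = list(REL)
    for name, rank in (("PRP", prp_ranking), ("greedy", greedy_ranking)):
        result = rank(docs, 2)
        print(name, result, round(expected_success(result), 3))
```

In this toy model the PRP ranking returns two documents for the likelier interpretation (expected success 0.594), while the greedy ranking covers both interpretations (expected success 0.9), which is the kind of gap the description refers to. The numbers hold only for these made-up probabilities.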

Similar Items