Optimizing the construction of information retrieval test collections

We consider the problem of optimally allocating a limited budget to acquire relevance judgments when constructing an information retrieval test collection. We assume that there is a large set of test queries, for each of which a large number of documents need to be judged. However, the available budget only permits judging a subset of them.

We begin by developing a mathematical framework for query selection as a mechanism for reducing the cost of constructing information retrieval test collections. The framework provides valuable insights into the properties of the optimal subset of queries: the selected queries should be least correlated with one another, yet strongly correlated with the remaining queries. In contrast to previous work, which is mostly retrospective, our framework does not assume that relevance judgments are available a priori, and is hence designed to work in practice.
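
As an illustration of the selection property above, here is a minimal greedy sketch in Python. It assumes a query-query correlation matrix C is already available (for example, estimated from per-query system effectiveness scores); both that estimation step and the greedy scoring rule are simplifying assumptions of this sketch, not the thesis's exact method.

    import numpy as np

    def select_queries(C, k):
        # C: (n, n) symmetric query-query correlation matrix (assumed given,
        # e.g. estimated from per-query system effectiveness scores).
        # Greedily pick k queries that are weakly correlated with the queries
        # already selected but strongly correlated with the unselected rest.
        n = C.shape[0]
        selected, remaining = [], set(range(n))
        for _ in range(k):
            best, best_score = None, -np.inf
            for q in remaining:
                others = remaining - {q}
                # reward correlation with the queries left unjudged ...
                gain = np.mean([C[q, r] for r in others]) if others else 0.0
                # ... and penalize redundancy with the queries already chosen
                cost = np.mean([C[q, s] for s in selected]) if selected else 0.0
                if gain - cost > best_score:
                    best, best_score = q, gain - cost
            selected.append(best)
            remaining.remove(best)
        return selected

A greedy pass is only a stand-in for the optimization the framework derives, but it makes the two competing criteria concrete.
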

The framework is then extended to accommodate both query selection and document selection, arriving at a unified budget allocation method that prioritizes query-document pairs and selects the subset with the highest priority scores to be judged. The unified budget allocation is formulated as a convex optimization problem, permitting an efficient solution and providing a flexible framework for incorporating further optimization constraints.
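
The simplest convex instance of such an allocation is a linear program over relaxed selection variables. The sketch below, which assumes per-pair priority scores and judging costs are given, is a hedged illustration of the general formulation, not the thesis's actual objective or constraint set.

    import numpy as np
    from scipy.optimize import linprog

    def allocate_budget(priority, cost, budget):
        # Relaxation: choose x[i] in [0, 1] for each query-document pair to
        # maximize total priority subject to the judging budget. linprog
        # minimizes, so the priorities are negated.
        n = len(priority)
        res = linprog(c=-np.asarray(priority, dtype=float),
                      A_ub=np.asarray(cost, dtype=float).reshape(1, n),
                      b_ub=[budget],
                      bounds=[(0.0, 1.0)] * n,
                      method="highs")
        return res.x  # fractional selections; threshold to pick pairs to judge

Pairs whose x[i] lands at 1 are judged first; the convex form is what makes it easy to add further constraints, e.g. a per-query judging cap, as extra rows of A_ub.
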
Once a subset of query-document pairs is selected, crowdsourcing can be used to collect the associated relevance judgments. While crowdsourced labels are relatively inexpensive, they vary in quality, introducing noise into the relevance judgments. To deal with noisy judgments, multiple labels for a document are collected from different assessors. Common practice in information retrieval is to aggregate these labels by majority voting; in contrast, we develop a probabilistic model that provides accurate relevance judgments from a smaller number of labels per document (see the sketch after this description).

We demonstrate the effectiveness of our cost optimization approach on three experimental datasets: (i) various TREC tracks, (ii) a web test collection of an online search engine, and (iii) crowdsourced data collected for the INEX 2010 Book Search track. Our approach should assist research institutes, e.g. the National Institute of Standards and Technology (NIST), and commercial search engines, e.g. Google and Bing, in constructing test collections where there are large document collections and large query logs, but where economic constraints prohibit gathering comprehensive relevance judgments.
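
Returning to the label-aggregation step: a minimal probabilistic alternative to majority voting is an EM procedure in the spirit of Dawid and Skene, which jointly estimates each assessor's accuracy and each document's probability of relevance. The thesis's model is richer; the binary, symmetric-accuracy version below is only a stand-in.

    import numpy as np

    def aggregate_labels(L, iters=50):
        # L: (n_docs, n_workers) float array; entries are 1.0 (relevant),
        # 0.0 (non-relevant) or np.nan where a worker gave no label.
        n_docs, n_workers = L.shape
        mask = ~np.isnan(L)
        acc = np.full(n_workers, 0.8)  # initial guess at worker accuracy
        p = np.zeros(n_docs)
        for _ in range(iters):
            # E-step: posterior probability that each document is relevant,
            # under a uniform prior and independent worker errors.
            log_rel = np.zeros(n_docs)
            log_non = np.zeros(n_docs)
            for w in range(n_workers):
                m = mask[:, w]
                l = L[m, w]
                log_rel[m] += np.log(np.where(l == 1.0, acc[w], 1 - acc[w]))
                log_non[m] += np.log(np.where(l == 0.0, acc[w], 1 - acc[w]))
            p = 1.0 / (1.0 + np.exp(log_non - log_rel))
            # M-step: a worker's accuracy is the expected rate of agreement
            # between their labels and the inferred truth.
            for w in range(n_workers):
                m = mask[:, w]
                if not m.any():
                    continue
                agree = np.where(L[m, w] == 1.0, p[m], 1 - p[m])
                acc[w] = np.clip(agree.mean(), 1e-3, 1 - 1e-3)
        return p  # threshold at 0.5 for binary relevance judgments

Because unreliable workers are automatically down-weighted, the same label accuracy is typically reached with fewer labels per document than plain majority voting requires, which is the point of the thesis's probabilistic model.
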
Bibliographic Details
Main Author: Hosseini, M.
Published: University College London (University of London) 2013
Subjects: 004
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.626052
http://discovery.ucl.ac.uk/1382616/
Format: Electronic Thesis or Dissertation