Efficient processing of complex features for information retrieval

Bibliographic Details
Main Author: Strohman, Trevor
Language: English
Published: ScholarWorks@UMass Amherst 2008
Online Access: https://scholarworks.umass.edu/dissertations/AAI3315499
Description
Summary: Research on text search systems has primarily focused on simple occurrences of query terms within documents to compute document relevance scores. However, recent research shows that additional document features are crucial for improving retrieval effectiveness. We develop a series of techniques for efficiently processing queries with feature-based models. Our TupleFlow framework, an extension of MapReduce, provides a basis for custom binned indexes, which efficiently store feature data. Our work in binning probabilities shows how to effectively map language model probabilities into the space of small positive integers, which improves query speed without reducing query effectiveness. We also show new efficient query processing results for both document-sorted and score-sorted indexes. All of our work is evaluated using the largest available research dataset.
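
Illustrative note: the summary describes mapping language model probabilities into small positive integers. The sketch below shows one simple way such a mapping could work, assuming uniform quantization of log-probabilities over their observed range. It is not taken from the dissertation; the function names (`make_binner`, `to_bin`), the bin count, and the sample probabilities are all hypothetical.

```python
import math


def make_binner(log_probs, num_bins=8):
    """Return a function mapping a log-probability to an integer bin in 1..num_bins.

    Assumption: bins are equal-width intervals over the observed
    log-probability range (a simple choice, not the dissertation's scheme).
    """
    lo, hi = min(log_probs), max(log_probs)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_bins or 1.0

    def to_bin(lp):
        # Clamp to the observed range, then quantize uniformly.
        clamped = min(max(lp, lo), hi)
        return min(int((clamped - lo) / width) + 1, num_bins)

    return to_bin


# Hypothetical term probabilities under some document language model.
probs = [0.0001, 0.001, 0.01, 0.05, 0.2]
log_probs = [math.log(p) for p in probs]

to_bin = make_binner(log_probs, num_bins=4)
bins = [to_bin(lp) for lp in log_probs]
print(bins)
```

Because the mapping is monotonic, a higher probability never receives a lower bin, so a ranking computed from the bins roughly preserves the original order while each value fits in a few bits.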