Discovering and using implicit data for information retrieval

In real-world information retrieval (IR) tasks, the searched items and/or the users' queries often have implicit information associated with them—information that describes unspecified aspects of the items or queries. For example, in web search tasks, web pages are often pointed to by hyperlink...

Bibliographic Details
Main Author:	Yi, Xing
Language:	ENG
Published:	ScholarWorks@UMass Amherst 2011
Subjects:	Computer science
Online Access:	https://scholarworks.umass.edu/dissertations/AAI3482732

id	ndltd-UMASS-oai-scholarworks.umass.edu-dissertations-6338
record_format	oai_dc
spelling	ndltd-UMASS-oai-scholarworks.umass.edu-dissertations-6338 2020-12-02T14:32:23Z Discovering and using implicit data for information retrieval Yi, Xing 2011-01-01T08:00:00Z text https://scholarworks.umass.edu/dissertations/AAI3482732 Doctoral Dissertations Available from Proquest ENG ScholarWorks@UMass Amherst Computer science
collection	NDLTD
language	ENG
sources	NDLTD
topic	Computer science
spellingShingle	Computer science Yi, Xing Discovering and using implicit data for information retrieval
description	In real-world information retrieval (IR) tasks, the searched items and/or the users' queries often have implicit information associated with them, that is, information that describes unspecified aspects of the items or queries. For example, in web search tasks, web pages are often pointed to by hyperlinks (known as anchors) from other pages, and thus have human-generated succinct descriptions of their content (anchor text) associated with them. This indirectly available information has been shown to improve search effectiveness for different retrieval tasks. However, in many real-world IR challenges this information is sparse in the data; i.e., it is incomplete or missing in a large portion of the data. In this work, we explore how to discover and use implicit information in large amounts of data in the context of IR. We present a general perspective for discovering implicit information and demonstrate how to use the discovered data in four specific IR challenges: (1) finding relevant records in semi-structured databases where many records contain incomplete or empty fields; (2) searching web pages that have little or no associated anchor text; (3) using click-through records in web query logs to help search pages that have few or no clicks; and (4) discovering plausible geographic locations for web queries that contain no explicit geographic information. The intuition behind our approach is that data similar in some aspects are often similar in other aspects. Thus we can (a) use the observed information of queries/documents to find similar queries/documents, and then (b) use those similar queries/documents to reconstruct plausible implicit information for the original queries/documents. We develop language-modeling-based techniques to effectively use content similarity among data for our work. Using the four different search tasks on large-scale noisy datasets, we empirically demonstrate the effectiveness of our approach. We further discuss the advantages and weaknesses of two complementary approaches within our general perspective of handling implicit information for retrieval purposes. Taken together, we describe a general perspective that uses contextual similarity among data to discover implicit information for IR challenges. Using this general perspective, we formally present two language-modeling-based information discovery approaches. We empirically evaluate our approaches using different IR challenges. Our research shows that supporting information discovery tailored to different search tasks can enhance IR systems' search performance and improve users' search experience.
author	Yi, Xing
author_facet	Yi, Xing
author_sort	Yi, Xing
title	Discovering and using implicit data for information retrieval
title_short	Discovering and using implicit data for information retrieval
title_full	Discovering and using implicit data for information retrieval
title_fullStr	Discovering and using implicit data for information retrieval
title_full_unstemmed	Discovering and using implicit data for information retrieval
title_sort	discovering and using implicit data for information retrieval
publisher	ScholarWorks@UMass Amherst
publishDate	2011
url	https://scholarworks.umass.edu/dissertations/AAI3482732
work_keys_str_mv	AT yixing discoveringandusingimplicitdataforinformationretrieval
_version_	1719364469093564416
