Development of an information retrieval and distillation agent

Although many commercial search engines are available today, most of them still demand tedious manual effort from the user. Moreover, much of the information they return may not be relevant to the intended query. Furthermore, there is a lack of sys...


Bibliographic Details
Main Author: Liu, Yongsheng
Other Authors: Liang, Ming
Format: Others
Language: en
Published: University of Ottawa (Canada) 2013
Subjects: Computer Science
Online Access: http://hdl.handle.net/10393/26514
http://dx.doi.org/10.20381/ruor-18223
id ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-26514
record_format oai_dc
2013-11-07T17:24:47Z 2013-11-07T17:24:47Z 2003 2003 Thesis Source: Masters Abstracts International, Volume: 42-06, page: 2237. http://hdl.handle.net/10393/26514 http://dx.doi.org/10.20381/ruor-18223 en 195 p. University of Ottawa (Canada)
collection NDLTD
language en
format Others
sources NDLTD
topic Computer Science.
description Although many commercial search engines are available today, most of them still demand tedious manual effort from the user. Moreover, much of the information they return may not be relevant to the intended query, and there is no systematic approach for quantifying the value of the retrieved information for the user's needs. In this thesis, to free the user from the drudgery of searching and to provide a basis for building a personalized database on a particular topic, we develop a web search and distillation agent. To retrieve higher-quality information, we modify the existing Term Frequency-Inverse Document Frequency (TFIDF) term weighting scheme and combine it with the Hyperlink-Induced Topic Search (HITS) method to obtain a measure of both the importance and the relevance of a document. To construct a dynamic graph and keep continuous search affordable, we propose a Sliding Window Model (SWM) that controls the size of the graph's node set. To make the search agent more intelligent, we employ the Exponential Smoothing (ES) approach to guide the search. Our experimental results show that the proposed web search and distillation approach is effective compared with other algorithms and models: the improved TFIDF algorithm yields more reasonable search results; the proposed SWM controls the size of the node set as expected; and the ES algorithm employed in the SWM further reduces computing time, helps the search agent harvest higher-quality information, and offers clear advantages over the other methods implemented in the search agent.
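The building blocks named in the description can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record does not say how TFIDF was modified, how the two scores are combined, or how ES steers the search, so the code uses textbook TF-IDF, plain HITS by power iteration, a hypothetical weighted-sum blend, a fixed-size window, and simple exponential smoothing. All function names and parameters (weight, alpha, size) are assumptions made for illustration.

```python
import math
from collections import Counter, deque


def tfidf_scores(docs, query_terms):
    """Textbook TF-IDF relevance per document (not the thesis's modified scheme).

    docs maps a document name to its text; query_terms are lowercase words.
    """
    n = len(docs)
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for toks in tokenized.values() if t in toks) for t in query_terms}
    scores = {}
    for name, toks in tokenized.items():
        counts = Counter(toks)
        score = 0.0
        for t in query_terms:
            if df[t] == 0 or not toks:
                continue
            tf = counts[t] / len(toks)
            idf = math.log(n / df[t]) + 1.0  # +1 keeps ubiquitous terms from vanishing
            score += tf * idf
        scores[name] = score
    return scores


def hits_authority(links, iterations=50):
    """Plain HITS authority scores; links maps a page to the pages it points to."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the pages linking to v.
        auth = {v: sum(hub[u] for u in links if v in links[u]) for v in nodes}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub score of u: sum of authority scores of the pages u links to.
        hub = {u: sum(auth[v] for v in links.get(u, ())) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return auth


def combined_score(relevance, authority, weight=0.5):
    """Hypothetical blend: weighted sum of max-normalised relevance and authority."""
    max_rel = max(relevance.values(), default=0.0) or 1.0
    max_auth = max(authority.values(), default=0.0) or 1.0
    return {
        name: weight * rel / max_rel
        + (1 - weight) * authority.get(name, 0.0) / max_auth
        for name, rel in relevance.items()
    }


def make_window(size):
    """Fixed-size node window: appending beyond `size` evicts the oldest node."""
    return deque(maxlen=size)


def exponential_smoothing(values, alpha=0.5):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = None
    for x in values:
        smoothed = x if smoothed is None else alpha * x + (1 - alpha) * smoothed
    return smoothed
```

In such a sketch, a search agent could call combined_score to rank the pages currently held in the window and feed the scores of recently visited pages to exponential_smoothing to estimate whether a branch is still worth following. How the thesis actually couples these pieces is not stated in this record.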
author2 Liang, Ming,
author Liu, Yongsheng
title Development of an information retrieval and distillation agent
publisher University of Ottawa (Canada)
publishDate 2013
url http://hdl.handle.net/10393/26514
http://dx.doi.org/10.20381/ruor-18223