A Domain-Specific Deep Web Query Interface Classifier

Master's === Tamkang University === Master's Program, Department of Information Management === 99 === According to previous research, the deep web contains about 400 to 550 times more data than the surface web. To retrieve deep web content residing in databases, we need to find the entrances to those databases, which are the deep web...


Bibliographic Details
Main Authors:	Pei-Tzu Chang, 張珮慈
Other Authors:	Chichang Jou
Format:	Others
Language:	zh-TW
Published:	2011
Online Access:	http://ndltd.ncl.edu.tw/handle/87554416458031123645

id	ndltd-TW-099TKU05396001
record_format	oai_dc
spelling	ndltd-TW-099TKU05396001 2015-10-30T04:05:41Z http://ndltd.ncl.edu.tw/handle/87554416458031123645 A Domain-Specific Deep Web Query Interface Classifier 一個識別特定主題深網查詢介面的分類器 Pei-Tzu Chang 張珮慈 Master's, Tamkang University, Master's Program, Department of Information Management, 99 Chichang Jou 周清江 2011 thesis 80 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Tamkang University === Master's Program, Department of Information Management === 99 === According to previous research, the deep web contains about 400 to 550 times more data than the surface web. To retrieve deep web content residing in databases, we need to find the entrances to those databases, which are the deep web query interfaces. Moreover, because deep web content is domain-specific, we propose a two-phase analysis methodology that combines pre-query and post-query analyses to identify deep web query interfaces among various web forms, and we develop an automatic deep web query interface classification technique. The technique not only identifies deep web query forms but also filters out search engine forms and site search forms, which retrieve static web pages inside a site. Before classification, we build feature words for non-query forms and crawl a large number of domain-specific query forms to extract the semantics of the popular fields of that domain. In the pre-query analysis phase, the classification system uses the feature words to filter out non-query forms, which reduces processing time in the next phase. In the post-query analysis phase, it uses the field semantics to fill in values and submit forms automatically, and then classifies the forms according to their query results. The experimental results show that the two-phase analysis methodology achieves high precision. It filters out not only search engine forms and site search forms but also deep web query forms that link to disabled databases.
author2	Chichang Jou
author_facet	Chichang Jou Pei-Tzu Chang 張珮慈
author	Pei-Tzu Chang 張珮慈
spellingShingle	Pei-Tzu Chang 張珮慈 A Domain-Specific Deep Web Query Interface Classifier
author_sort	Pei-Tzu Chang
title	A Domain-Specific Deep Web Query Interface Classifier
title_sort	domain-specific deep web query interface classifier
publishDate	2011
url	http://ndltd.ncl.edu.tw/handle/87554416458031123645
_version_	1718116793790234624
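
The pre-query analysis phase summarized in the description field can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the feature-word list, the `min_hits` threshold, and the names `FormCollector`, `is_non_query_form`, and `prefilter` are all assumptions made for illustration, and the post-query phase (filling in values and submitting forms) is not shown.

```python
import re
from html.parser import HTMLParser

# Hypothetical feature words for non-query forms. The thesis builds its own
# list, which is not reproduced in this record.
NON_QUERY_FEATURE_WORDS = {"login", "password", "register", "subscribe", "comment"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class FormCollector(HTMLParser):
    """Collect the tokens found in the attributes and text of each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one token list per completed form
        self._current = None   # tokens of the form being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._current = []
        if self._current is not None:
            for _name, value in attrs:
                if value:
                    self._current.extend(tokenize(value))

    def handle_data(self, data):
        if self._current is not None:
            self._current.extend(tokenize(data))

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def is_non_query_form(tokens, feature_words=NON_QUERY_FEATURE_WORDS, min_hits=1):
    """Assumed rule: a form is non-query if it contains enough feature words."""
    hits = sum(1 for token in tokens if token in feature_words)
    return hits >= min_hits


def prefilter(html):
    """Return the token lists of forms that survive the pre-query filter."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return [tokens for tokens in parser.forms if not is_non_query_form(tokens)]
```

Forms that survive `prefilter` would then be passed to the more expensive post-query phase, which is why removing obvious non-query forms first reduces overall processing time.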


Similar Items