A Domain-Specific Deep Web Query Interface Classifier
Master's === Tamkang University === Master's Program, Department of Information Management === 99 === Previous research estimates that the deep web holds about 400 to 550 times as much data as the surface web. To retrieve deep web content residing in databases, we need to find the entrances to those databases, which are the deep web...
Main Authors: | Pei-Tzu Chang, 張珮慈 |
---|---|
Other Authors: | Chichang Jou |
Format: | Others |
Language: | zh-TW |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/87554416458031123645 |
id |
ndltd-TW-099TKU05396001 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-099TKU05396001 2015-10-30T04:05:41Z http://ndltd.ncl.edu.tw/handle/87554416458031123645 A Domain-Specific Deep Web Query Interface Classifier 一個識別特定主題深網查詢介面的分類器 Pei-Tzu Chang 張珮慈 Master's Tamkang University Master's Program, Department of Information Management 99 Chichang Jou 周清江 2011 學位論文 ; thesis 80 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Tamkang University === Master's Program, Department of Information Management === 99 === Previous research estimates that the deep web holds about 400 to 550 times as much data as the surface web. To retrieve deep web content residing in databases, we need to find the entrances to those databases, which are the deep web query interfaces. Moreover, because deep web content is domain-specific, we propose a two-phase analysis methodology that combines pre-query and post-query analyses to identify deep web query interfaces among various web forms, and we develop an automatic deep web query interface classification technique. The technique not only identifies deep web query forms but also filters out search engine forms and site search forms, which retrieve only static web pages within a site.
Before classification, we build feature words for non-query forms and crawl a large number of domain-specific query forms to extract the semantics of the popular fields of that domain. In the pre-query analysis phase, the classification system uses the feature words to filter out non-query forms, which reduces processing time in the next phase. In the post-query analysis phase, it uses the field semantics to fill in values and submit forms automatically, and then classifies the forms according to their query results. The experimental results show that the two-phase analysis methodology achieves high precision. It filters out not only search engine forms and site search forms but also deep web query forms that link to disabled databases.
|
author2 |
Chichang Jou |
author_facet |
Chichang Jou Pei-Tzu Chang 張珮慈 |
author |
Pei-Tzu Chang 張珮慈 |
spellingShingle |
Pei-Tzu Chang 張珮慈 A Domain-Specific Deep Web Query Interface Classifier |
author_sort |
Pei-Tzu Chang |
title |
A Domain-Specific Deep Web Query Interface Classifier |
title_sort |
domain-specific deep web query interface classifier |
publishDate |
2011 |
url |
http://ndltd.ncl.edu.tw/handle/87554416458031123645 |
work_keys_str_mv |
AT peitzuchang adomainspecificdeepwebqueryinterfaceclassifier AT zhāngpèicí adomainspecificdeepwebqueryinterfaceclassifier AT peitzuchang yīgèshíbiétèdìngzhǔtíshēnwǎngcháxúnjièmiàndefēnlèiqì AT zhāngpèicí yīgèshíbiétèdìngzhǔtíshēnwǎngcháxúnjièmiàndefēnlèiqì AT peitzuchang domainspecificdeepwebqueryinterfaceclassifier AT zhāngpèicí domainspecificdeepwebqueryinterfaceclassifier |
_version_ |
1718116793790234624 |
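
The two-phase flow described in the abstract (a pre-query filter based on feature words, followed by a post-query check on the results of an automatically filled form) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the feature words, the field-to-value mapping, the class and function names, the label strings, and the rule that an empty result list marks a form with no usable results are all assumptions made for this example. The `submit` callable stands in for real HTTP form submission.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed feature words suggesting a form is NOT a query form (hypothetical).
NON_QUERY_FEATURE_WORDS = {"login", "password", "register", "subscribe"}

# Assumed field semantics for one domain: field label -> sample value (hypothetical).
DOMAIN_FIELD_VALUES = {"title": "database", "author": "smith"}


@dataclass
class WebForm:
    url: str
    field_labels: List[str]


# Stand-in for form submission: takes a form and its filled values,
# returns the list of result records extracted from the response page.
SubmitFn = Callable[[WebForm, Dict[str, str]], List[str]]


def passes_pre_query(form: WebForm) -> bool:
    """Pre-query phase: reject forms whose labels contain a non-query feature word."""
    labels = {label.lower() for label in form.field_labels}
    return labels.isdisjoint(NON_QUERY_FEATURE_WORDS)


def fill_values(form: WebForm) -> Dict[str, str]:
    """Choose a value for every field whose label matches a known domain field."""
    return {
        label: DOMAIN_FIELD_VALUES[label.lower()]
        for label in form.field_labels
        if label.lower() in DOMAIN_FIELD_VALUES
    }


def classify_form(form: WebForm, submit: SubmitFn) -> str:
    """Run both phases and return a label (label strings are assumptions)."""
    if not passes_pre_query(form):
        return "non-query form"
    values = fill_values(form)
    if not values:
        # No field matches the domain semantics, so nothing can be submitted.
        return "no recognized domain fields"
    # Post-query phase: submit the filled form and inspect the results.
    results = submit(form, values)
    if not results:
        return "query form with no results"
    return "deep web query form"


if __name__ == "__main__":
    book_form = WebForm("http://example.com/books", ["Title", "Author"])
    print(classify_form(book_form, lambda form, values: ["record 1"]))
```

Running the cheap label check first means that only forms surviving the filter are ever submitted, which matches the abstract's point that the pre-query phase reduces processing time in the next phase.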