Block-level Ranking for Intra-Website Pages
Master's === National Chiao Tung University === In-service Master's Program, College of Computer Science, Computer Science Division === 95 === According to statistical data, there were more than 14 billion web pages worldwide as of June 2007. Using this huge database efficiently is an important problem. For information whose location we do not know, we usually use search en...
Main Authors: | Wen-Feng Yao, 姚文鋒 |
---|---|
Other Authors: | I-Chen Wu |
Format: | Others |
Language: | zh-TW |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/2tvrgw |
id |
ndltd-TW-095NCTU5392009 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-095NCTU5392009 2019-05-15T19:48:25Z http://ndltd.ncl.edu.tw/handle/2tvrgw Block-level Ranking for Intra-Website Pages 網站內網頁之區塊等級分析 Wen-Feng Yao 姚文鋒 Master's National Chiao Tung University In-service Master's Program, College of Computer Science, Computer Science Division 95 According to statistical data, there were more than 14 billion web pages worldwide as of June 2007. Using this huge database efficiently is an important problem. For information whose location we do not know, we usually rely on search engines to find it. For information whose location we do know, we use data extraction to improve efficiency. BODE (Browser Oriented Data Extraction), developed by our laboratory, is such a web data extraction system. Users indicate the data they want to retrieve through its GUI, and the system generates the BODE script used in the extraction process and then starts extracting. However, users need basic knowledge of BODE script syntax, XPath, and HTML tags to build a BODE script. To lower the barrier to using the BODE system, this thesis proposes an algorithm that distinguishes the useful information blocks within a single website, with the goal of generating BODE scripts automatically. I-Chen Wu 吳毅成 2007 degree thesis ; thesis 37 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chiao Tung University === In-service Master's Program, College of Computer Science, Computer Science Division === 95 === According to statistical data, there were more than 14 billion web pages worldwide as of June 2007. Using this huge database efficiently is an important problem. For information whose location we do not know, we usually rely on search engines to find it. For information whose location we do know, we use data extraction to improve efficiency.
BODE (Browser Oriented Data Extraction), developed by our laboratory, is such a web data extraction system. Users indicate the data they want to retrieve through its GUI, and the system generates the BODE script used in the extraction process and then starts extracting.
However, users need basic knowledge of BODE script syntax, XPath, and HTML tags to build a BODE script. To lower the barrier to using the BODE system, this thesis proposes an algorithm that distinguishes the useful information blocks within a single website, with the goal of generating BODE scripts automatically.
|
author2 |
I-Chen Wu |
author_facet |
I-Chen Wu Wen-Feng Yao 姚文鋒 |
author |
Wen-Feng Yao 姚文鋒 |
spellingShingle |
Wen-Feng Yao 姚文鋒 Block-level Ranking for Intra-Website Pages |
author_sort |
Wen-Feng Yao |
title |
Block-level Ranking for Intra-Website Pages |
title_sort |
block-level ranking for intra-website pages |
publishDate |
2007 |
url |
http://ndltd.ncl.edu.tw/handle/2tvrgw |
work_keys_str_mv |
AT wenfengyao blocklevelrankingforintrawebsitepages AT yáowénfēng blocklevelrankingforintrawebsitepages AT wenfengyao wǎngzhànnèiwǎngyèzhīqūkuàiděngjífēnxī AT yáowénfēng wǎngzhànnèiwǎngyèzhīqūkuàiděngjífēnxī |
_version_ |
1719094157033603072 |