Block-level Ranking for Intra-Website Pages

Master's === National Chiao Tung University === College of Computer Science, In-service Master's Program, Computer Science Division === 95 === According to statistical data, there were more than 14 billion web pages worldwide as of June 2007. Using this huge body of data efficiently is therefore an important problem. To find information whose location we do not know, we usually rely on search en...


Bibliographic Details
Main Authors: Wen-Feng Yao, 姚文鋒
Other Authors: I-Chen Wu
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/2tvrgw
id ndltd-TW-095NCTU5392009
record_format oai_dc
spelling ndltd-TW-095NCTU5392009 2019-05-15T19:48:25Z http://ndltd.ncl.edu.tw/handle/2tvrgw Block-level Ranking for Intra-Website Pages 網站內網頁之區塊等級分析 Wen-Feng Yao 姚文鋒 Master's National Chiao Tung University College of Computer Science, In-service Master's Program, Computer Science Division 95 According to statistical data, there were more than 14 billion web pages worldwide as of June 2007. Using this huge body of data efficiently is therefore an important problem. To find information whose location we do not know, we usually rely on search engines. For information whose location we do know, we use data extraction to improve efficiency. BODE (Browser Oriented Data Extraction), developed by our laboratory, is one such web data extraction system. Users indicate the data they want to retrieve through its GUI; the system then generates the BODE script used in the extraction process and starts extracting. However, users need basic knowledge of BODE script syntax, XPath, and HTML tags to build a BODE script. To lower the barrier to using the BODE system, this thesis proposes an algorithm that identifies the useful information blocks within a single website, with the goal of generating BODE scripts automatically. I-Chen Wu 吳毅成 2007 thesis 37 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chiao Tung University === College of Computer Science, In-service Master's Program, Computer Science Division === 95 === According to statistical data, there were more than 14 billion web pages worldwide as of June 2007. Using this huge body of data efficiently is therefore an important problem. To find information whose location we do not know, we usually rely on search engines. For information whose location we do know, we use data extraction to improve efficiency. BODE (Browser Oriented Data Extraction), developed by our laboratory, is one such web data extraction system. Users indicate the data they want to retrieve through its GUI; the system then generates the BODE script used in the extraction process and starts extracting. However, users need basic knowledge of BODE script syntax, XPath, and HTML tags to build a BODE script. To lower the barrier to using the BODE system, this thesis proposes an algorithm that identifies the useful information blocks within a single website, with the goal of generating BODE scripts automatically.
author2 I-Chen Wu
author_facet I-Chen Wu
Wen-Feng Yao
姚文鋒
author Wen-Feng Yao
姚文鋒
author_sort Wen-Feng Yao
title Block-level Ranking for Intra-Website Pages
publishDate 2007
url http://ndltd.ncl.edu.tw/handle/2tvrgw