Block-level Ranking for Intra-Website Pages


Bibliographic Details
Main Authors: Wen-Feng Yao, 姚文鋒
Other Authors: I-Chen Wu
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/2tvrgw
Description
Summary: Master's === National Chiao Tung University === College of Computer Science, In-service Master's Program, Computer Science Division === 95 === According to statistics, there were more than 14 billion web pages worldwide as of June 2007. Using this huge body of data efficiently is an important problem. To find information whose location we do not know, we usually rely on search engines. For information whose location we do know, we use data extraction to work more efficiently. BODE (Browser Oriented Data Extraction), developed by our laboratory, is one such web data extraction system. Users indicate the data they want to retrieve through its GUI; the system then generates the BODE script used in the extraction process and starts extracting. However, users need basic knowledge of BODE script syntax, XPath, and HTML tags to build a BODE script. To lower the barrier to using the BODE system, this thesis proposes an algorithm that identifies the useful information blocks within a single website, with the goal of generating BODE scripts automatically.