Automatic Identification of Data Blocks based on Web Page Structure

Bibliographic Details
Main Authors: Yi-Chen Liso, 廖益辰
Other Authors: Yih-Jia Tsai
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/34610846490727929812
Description
Summary: Master's thesis, Tamkang University, Department of Computer Science and Information Engineering (In-service Master's Program), ROC academic year 98. The Internet has become a major source of information. It has replaced paper as the most popular medium, for example through news websites, so developing technology that collects data automatically is very important. At present, Really Simple Syndication (RSS) is a common way for users to collect data. Alternatively, a purpose-built program can analyze a web page's structure to extract its information, but whenever the page changes, that program must be rewritten. This thesis therefore proposes a method that analyzes web page structure automatically. The method finds the recurring pattern in a web page and verifies that this pattern can serve as an extraction rule. The method has been tested on the automatic collection of web page data.
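The summary does not say how the page pattern is found. As an illustration only, the following Python sketch assumes one common heuristic, which is not necessarily the thesis's method: a data block is a DOM node whose children repeat the same tag structure, as the items of a news list do. The names `TreeBuilder`, `signature`, `find_data_block`, `extract_records`, `SAMPLE_PAGE`, and the `min_repeat` threshold are hypothetical. Only the standard-library `html.parser` module is used.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never contain children, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []


class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds a minimal DOM-like tree from raw HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> become leaf nodes.
        self.current.children.append(Node(tag, self.current))

    def handle_endtag(self, tag):
        # Climb to the matching open tag; stray end tags are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def signature(node, depth=2):
    """Structural fingerprint: the tag plus its children's tags, `depth` levels down."""
    if depth == 0 or not node.children:
        return node.tag
    return node.tag + "(" + ",".join(signature(c, depth - 1) for c in node.children) + ")"


def find_data_block(root, min_repeat=3):
    """Return (parent, count, signature) for the node whose children repeat one
    signature most often (at least min_repeat times); earlier nodes win ties."""
    best, stack = None, [root]
    while stack:
        node = stack.pop()
        stack.extend(reversed(node.children))  # pre-order traversal
        if not node.children or len(node.children) < min_repeat:
            continue
        sig, count = Counter(signature(c) for c in node.children).most_common(1)[0]
        if count >= min_repeat and (best is None or count > best[1]):
            best = (node, count, sig)
    return best


def all_text(node):
    """Concatenate text in simple pre-order (a node's own text, then its children's)."""
    parts = list(node.text)
    for child in node.children:
        parts.append(all_text(child))
    return " ".join(p for p in parts if p)


def extract_records(html, min_repeat=3):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    found = find_data_block(builder.root, min_repeat)
    if found is None:
        return None, []
    parent, _, sig = found
    return sig, [all_text(c) for c in parent.children if signature(c) == sig]


SAMPLE_PAGE = """
<html><body>
  <div id="nav"><a href="/">Home</a><a href="/about">About</a></div>
  <ul class="news">
    <li><a href="/n/1">Budget passes</a><span>2010-01-04</span></li>
    <li><a href="/n/2">Typhoon warning</a><span>2010-01-05</span></li>
    <li><a href="/n/3">Rail line opens</a><span>2010-01-06</span></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    print(extract_records(SAMPLE_PAGE))
```

On `SAMPLE_PAGE`, the `<ul>` is chosen because its three `<li>` children share the signature `li(a,span)`, while the two-link navigation bar falls below `min_repeat`. In this sketch, the repeated signature plays the role of the rule described in the summary. Because it is re-derived from the page each time instead of being hand-coded, a layout change does not require rewriting the extractor, although a real system would need more robust heuristics than this one.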