Automatic Identification of Data Blocks based on Web Page Structure
碩士 === 淡江大學 === 資訊工程學系碩士在職專班 === 98 === The internet has been a major source of information. It has taken the place of paper and become the most popular medium, such as: News web sites. Therefore, developing an automatic data collection technology is very important. At present the Really Simple Syn...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | zh-TW |
Published: |
2010
|
Online Access: | http://ndltd.ncl.edu.tw/handle/34610846490727929812 |
Summary: | 碩士 === 淡江大學 === 資訊工程學系碩士在職專班 === 98 === The internet has been a major source of information. It has taken the place of paper and become the most popular medium, such as: News web sites. Therefore, developing an automatic data collection technology is very important.
At present the Really Simple Syndication (RSS) is a general of data collection method for the users. Besides, it is use the specific program analysis web page structures to obtain the web page information. When the web page changed, the program must be rewritten. Therefore, this paper provides an automated analysis web page structure method. Using this method find the web page pattern and approved it can be the rule. It has been tested in automatic collection of web page data.
|
---|