Summary: | Master's thesis === National Chiao Tung University === College of Computer Science, In-service Master's Program, Computer Science Group === 98 === According to statistics, as of 2010 there were a total of 113 million websites, 99.9% of which had been created within the preceding 15 years. Given such a large and rapidly changing body of web pages, using this data effectively is a very important matter. To find information whose location we do not know, we usually rely on a search engine; to retrieve information whose location we do know, we use data extraction to work more efficiently. Whether for a search engine or an information extraction tool, the first step in analyzing a complex web page is to segment it into its topical regions, and doing so well is key to using this huge body of data efficiently.
Since a Microsoft team released the Vision-based Page Segmentation algorithm (VIPS) in 2003, most papers have based their segmentation on visual cues. In recent years, however, more and more web page layouts have been built with DHTML techniques, and applying the original VIPS method to such pages exposes small defects that its original design did not take into account. Subsequent studies have proposed many page segmentation algorithms that combine VIPS with additional features to make up for these deficiencies.
Because these approaches rely on non-visual features to compensate for VIPS, however, they give up part of the visual cues that characterize visual segmentation. This thesis presents a method that remains based on visual segmentation but incorporates the rendering features of the HTML document, in order to solve the problem that visual segmentation may fail to find visual separators in DHTML pages.
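As a rough illustration of what a visual separator is and why rendered positions matter, the following Python sketch detects horizontal separators from block bounding boxes. It is a simplified version of the separator-detection idea in VIPS (start with one separator covering the whole page, split or shrink it around each block, then drop separators lying on the page border), not the method proposed in this thesis. The `Box` type and `horizontal_separators` function are hypothetical, and the boxes are assumed to come from a layout engine's rendered coordinates rather than from DOM order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rendered vertical extent of a visual block, in page pixels (hypothetical input)."""
    top: float
    bottom: float  # top + rendered height


def horizontal_separators(boxes, page_top, page_bottom):
    """Return sorted (start, end) bands of the page not covered vertically by any block.

    Simplified VIPS-style detection: begin with one separator spanning the
    page, then for each block keep only the parts of each separator that lie
    above or below it.
    """
    separators = [(page_top, page_bottom)]
    for box in boxes:
        updated = []
        for start, end in separators:
            if box.bottom <= start or box.top >= end:
                # Block does not overlap this separator: keep it unchanged.
                updated.append((start, end))
                continue
            # Block overlaps: keep the band above it and the band below it, if any.
            if box.top > start:
                updated.append((start, box.top))
            if box.bottom < end:
                updated.append((box.bottom, end))
        separators = updated
    # Discard separators that touch the page border, as VIPS does.
    return sorted(s for s in separators if s[0] > page_top and s[1] < page_bottom)


if __name__ == "__main__":
    blocks = [Box(0, 100), Box(120, 300), Box(310, 400)]
    print(horizontal_separators(blocks, 0, 400))
```

Because the function looks only at rendered coordinates, the order in which blocks are supplied does not affect the result. That is one reason rendered positions can help when CSS positioning in DHTML pages makes DOM order differ from visual order.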
|