Sequence-based Web Page Template Detection

abstract: Templates are widely used in Web site development. Finding the template for a given set of Web pages can be important for applications such as Web page classification and monitoring content and structure changes of Web pages. In this thesis, two novel sequence-based We...


Bibliographic Details
Other Authors: Huang, Wei (Author)
Format: Dissertation
Language: English
Published: 2011
Subjects: Computer Science
Online Access: http://hdl.handle.net/2286/R.I.9268
id ndltd-asu.edu-item-9268
record_format oai_dc
spelling ndltd-asu.edu-item-9268 2018-06-22T03:01:53Z Sequence-based Web Page Template Detection abstract: Templates are widely used in Web site development. Finding the template for a given set of Web pages can be important for applications such as Web page classification and monitoring content and structure changes of Web pages. In this thesis, two novel sequence-based Web page template detection algorithms are presented. Unlike tree mapping algorithms, which are based on tree edit distance, sequence-based template detection algorithms operate on the Prüfer/Consolidated Prüfer sequences of trees. Since there are one-to-one correspondences between Prüfer/Consolidated Prüfer sequences and trees, sequence-based template detection algorithms identify the template by finding a common subsequence between two Prüfer/Consolidated Prüfer sequences. This subsequence should be a sequential representation of a common subtree of the input trees. Experiments on real-world Web pages showed that our approaches detect templates effectively and efficiently. Dissertation/Thesis Huang, Wei (Author) Candan, Kasim Selçuk (Advisor) Sundaram, Hari (Committee member) Davulcu, Hasan (Committee member) Arizona State University (Publisher) Computer Science eng 70 pages M.S. Computer Science 2011 Masters Thesis http://hdl.handle.net/2286/R.I.9268 http://rightsstatements.org/vocab/InC/1.0/ All Rights Reserved 2011
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Computer Science
spellingShingle Computer Science
Sequence-based Web Page Template Detection
description abstract: Templates are widely used in Web site development. Finding the template for a given set of Web pages can be important for applications such as Web page classification and monitoring content and structure changes of Web pages. In this thesis, two novel sequence-based Web page template detection algorithms are presented. Unlike tree mapping algorithms, which are based on tree edit distance, sequence-based template detection algorithms operate on the Prüfer/Consolidated Prüfer sequences of trees. Since there are one-to-one correspondences between Prüfer/Consolidated Prüfer sequences and trees, sequence-based template detection algorithms identify the template by finding a common subsequence between two Prüfer/Consolidated Prüfer sequences. This subsequence should be a sequential representation of a common subtree of the input trees. Experiments on real-world Web pages showed that our approaches detect templates effectively and efficiently. === Dissertation/Thesis === M.S. Computer Science 2011
author2 Huang, Wei (Author)
author_facet Huang, Wei (Author)
title Sequence-based Web Page Template Detection
title_sort sequence-based web page template detection
publishDate 2011
url http://hdl.handle.net/2286/R.I.9268
_version_ 1718699663113060352
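
The abstract describes encoding trees as Prüfer sequences and then looking for a common subsequence between two such sequences. The sketch below illustrates only that general idea, under assumptions that do not come from this record: it uses the classic Prüfer code of a tree with integer node labels and a standard longest-common-subsequence routine. The function names are hypothetical, the Consolidated Prüfer sequence is not reproduced, and an arbitrary common subsequence of labels is not guaranteed to encode a common subtree, so this is not the thesis's algorithm.

```python
import heapq
from collections import defaultdict


def prufer_sequence(edges):
    """Classic Prüfer code of a labeled tree given as (u, v) edges.

    Assumption: node labels are distinct, comparable values (integers here).
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Min-heap of current leaves, so the smallest-labeled leaf is removed first.
    leaves = [node for node in adj if len(adj[node]) == 1]
    heapq.heapify(leaves)

    seq = []
    for _ in range(len(adj) - 2):
        leaf = heapq.heappop(leaves)
        neighbor = next(iter(adj[leaf]))  # a leaf has exactly one neighbor left
        seq.append(neighbor)
        adj[neighbor].discard(leaf)
        if len(adj[neighbor]) == 1:
            heapq.heappush(leaves, neighbor)
    return seq


def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table forward to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


# Two small example trees (made up for illustration, not taken from the thesis).
tree_a = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
tree_b = [(1, 4), (2, 4), (3, 5), (4, 5), (5, 6)]
seq_a = prufer_sequence(tree_a)
seq_b = prufer_sequence(tree_b)
common = longest_common_subsequence(seq_a, seq_b)
```

For real Web pages the trees would be built from the pages' DOM structure, and some scheme for assigning labels to nodes would be needed; neither is specified in this record.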