Finding structure and characteristic of web documents for classification.

by Wong, Wai Ching. === Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. === Includes bibliographical references (leaves 91-94). === Abstracts in English and Chinese. === Abstract --- p.ii === Acknowledgments --- p.v === Chapter 1 --- Introduction --- p.1 === Chapter 1.1 --- Semistructure...

Bibliographic Details
Other Authors: Wong, Wai Ching.
Format: Others
Language: English
Chinese
Published: 2000
Subjects: World Wide Web
Information organization
Web search engines
Online Access: http://library.cuhk.edu.hk/record=b5890340
http://repository.lib.cuhk.edu.hk/en/item/cuhk-323129
id ndltd-cuhk.edu.hk-oai-cuhk-dr-cuhk_323129
record_format oai_dc
spelling ndltd-cuhk.edu.hk-oai-cuhk-dr-cuhk_323129 2019-02-26T03:33:35Z
Finding structure and characteristic of web documents for classification.
World Wide Web; Information organization; Web search engines
by Wong, Wai Ching. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 91-94). Abstracts in English and Chinese.
Wong, Wai Ching. Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
2000 Text bibliography print xii, 94 leaves : ill. ; 30 cm.
cuhk:323129 http://library.cuhk.edu.hk/record=b5890340 eng chi
Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
http://repository.lib.cuhk.edu.hk/en/islandora/object/cuhk%3A323129/datastream/TN/view/Finding%20structure%20and%20characteristic%20of%20web%20documents%20for%20classification.jpg
http://repository.lib.cuhk.edu.hk/en/item/cuhk-323129
collection NDLTD
language English
Chinese
format Others
sources NDLTD
topic World Wide Web
Information organization
Web search engines
description by Wong, Wai Ching.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 91-94).
Abstracts in English and Chinese.
Abstract
Acknowledgments
Chapter 1 --- Introduction
Chapter 1.1 --- Semistructured Data
Chapter 1.2 --- Problem Addressed in the Thesis
Chapter 1.2.1 --- Labels and Values
Chapter 1.2.2 --- Discover Labels for the Same Attribute
Chapter 1.2.3 --- Classifying A Web Page
Chapter 1.3 --- Organization of the Thesis
Chapter 2 --- Background
Chapter 2.1 --- Related Work on Web Data
Chapter 2.1.1 --- Object Exchange Model (OEM)
Chapter 2.1.2 --- Schema Extraction
Chapter 2.1.3 --- Discovering Typical Structure
Chapter 2.1.4 --- Information Extraction of Web Data
Chapter 2.2 --- Automatic Text Processing
Chapter 2.2.1 --- Stopwords Elimination
Chapter 2.2.2 --- Stemming
Chapter 3 --- Web Data Definition
Chapter 3.1 --- Web Page
Chapter 3.2 --- Problem Description
Chapter 4 --- Hierarchical Structure
Chapter 4.1 --- Types of HTML Tags
Chapter 4.2 --- Tag-tree
Chapter 4.3 --- Hierarchical Structure Construction
Chapter 4.4 --- Hierarchical Structure Statistics
Chapter 5 --- Similar Labels Discovery
Chapter 5.1 --- Expression of Hierarchical Structure
Chapter 5.2 --- Labels Discovery Algorithm
Chapter 5.2.1 --- Phase 1: Remove Non-label Nodes
Chapter 5.2.2 --- Phase 2: Identify Label Nodes
Chapter 5.2.3 --- Phase 3: Discover Similar Labels
Chapter 5.3 --- Performance Evaluation of Labels Discovery Algorithm
Chapter 5.3.1 --- Phase 1 Results
Chapter 5.3.2 --- Phase 2 Results
Chapter 5.3.3 --- Phase 3 Results
Chapter 5.4 --- Classifying a Web Page
Chapter 5.4.1 --- Similarity Measurement
Chapter 5.4.2 --- Performance Evaluation
Chapter 6 --- Conclusion
author2 Wong, Wai Ching.
title Finding structure and characteristic of web documents for classification.
publishDate 2000
url http://library.cuhk.edu.hk/record=b5890340
http://repository.lib.cuhk.edu.hk/en/item/cuhk-323129
_version_ 1718982621595172864