Finding structure and characteristic of web documents for classification.
by Wong, Wai Ching. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 91-94). Abstracts in English and Chinese.
Other Authors: | Wong, Wai Ching. |
---|---|
Format: | Others |
Language: | English Chinese |
Published: | 2000 |
Subjects: | World Wide Web; Information organization; Web search engines |
Online Access: | http://library.cuhk.edu.hk/record=b5890340 http://repository.lib.cuhk.edu.hk/en/item/cuhk-323129 |
id |
ndltd-cuhk.edu.hk-oai-cuhk-dr-cuhk_323129 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-cuhk.edu.hk-oai-cuhk-dr-cuhk_323129 2019-02-26T03:33:35Z Finding structure and characteristic of web documents for classification. World Wide Web; Information organization; Web search engines. by Wong, Wai Ching. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 91-94). Abstracts in English and Chinese. Wong, Wai Ching. Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering. 2000 Text bibliography print xii, 94 leaves : ill. ; 30 cm. cuhk:323129 http://library.cuhk.edu.hk/record=b5890340 eng chi Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) http://repository.lib.cuhk.edu.hk/en/islandora/object/cuhk%3A323129/datastream/TN/view/Finding%20structure%20and%20characteristic%20of%20web%20documents%20for%20classification.jpg http://repository.lib.cuhk.edu.hk/en/item/cuhk-323129 |
collection |
NDLTD |
language |
English Chinese |
format |
Others |
sources |
NDLTD |
topic |
World Wide Web; Information organization; Web search engines |
description |
by Wong, Wai Ching.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 91-94).
Abstracts in English and Chinese.
Abstract --- p.ii
Acknowledgments --- p.v
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Semistructured Data --- p.2
Chapter 1.2 --- Problem Addressed in the Thesis --- p.4
Chapter 1.2.1 --- Labels and Values --- p.4
Chapter 1.2.2 --- Discover Labels for the Same Attribute --- p.5
Chapter 1.2.3 --- Classifying A Web Page --- p.6
Chapter 1.3 --- Organization of the Thesis --- p.8
Chapter 2 --- Background --- p.8
Chapter 2.1 --- Related Work on Web Data --- p.8
Chapter 2.1.1 --- Object Exchange Model (OEM) --- p.9
Chapter 2.1.2 --- Schema Extraction --- p.11
Chapter 2.1.3 --- Discovering Typical Structure --- p.15
Chapter 2.1.4 --- Information Extraction of Web Data --- p.17
Chapter 2.2 --- Automatic Text Processing --- p.19
Chapter 2.2.1 --- Stopwords Elimination --- p.19
Chapter 2.2.2 --- Stemming --- p.20
Chapter 3 --- Web Data Definition --- p.22
Chapter 3.1 --- Web Page --- p.22
Chapter 3.2 --- Problem Description --- p.27
Chapter 4 --- Hierarchical Structure --- p.32
Chapter 4.1 --- Types of HTML Tags --- p.33
Chapter 4.2 --- Tag-tree --- p.36
Chapter 4.3 --- Hierarchical Structure Construction --- p.41
Chapter 4.4 --- Hierarchical Structure Statistics --- p.50
Chapter 5 --- Similar Labels Discovery --- p.53
Chapter 5.1 --- Expression of Hierarchical Structure --- p.53
Chapter 5.2 --- Labels Discovery Algorithm --- p.55
Chapter 5.2.1 --- Phase 1: Remove Non-label Nodes --- p.57
Chapter 5.2.2 --- Phase 2: Identify Label Nodes --- p.61
Chapter 5.2.3 --- Phase 3: Discover Similar Labels --- p.66
Chapter 5.3 --- Performance Evaluation of Labels Discovery Algorithm --- p.76
Chapter 5.3.1 --- Phase 1 Results --- p.75
Chapter 5.3.2 --- Phase 2 Results --- p.77
Chapter 5.3.3 --- Phase 3 Results --- p.81
Chapter 5.4 --- Classifying a Web Page --- p.83
Chapter 5.4.1 --- Similarity Measurement --- p.84
Chapter 5.4.2 --- Performance Evaluation --- p.86
Chapter 6 --- Conclusion --- p.89 |
author2 |
Wong, Wai Ching. |
author_facet |
Wong, Wai Ching. |
title |
Finding structure and characteristic of web documents for classification. |
publishDate |
2000 |
url |
http://library.cuhk.edu.hk/record=b5890340 http://repository.lib.cuhk.edu.hk/en/item/cuhk-323129 |