Improving Retrieval Accuracy in Main Content Extraction from HTML Web Documents
The rapid growth of text-based information on the World Wide Web and various applications making use of this data motivate the need for efficient and effective methods to identify and separate the “main content” from the additional content items, such as navigation menus, advertisements, design ele...
Main Author: | |
---|---|
Other Authors: | |
Format: | Doctoral Thesis |
Language: | English |
Published: | Universitätsbibliothek Leipzig, 2013 |
Subjects: | |
Online Access: | http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-130500 http://www.qucosa.de/fileadmin/data/qucosa/documents/13050/Thesis-Hadi-Mohammadzadeh.pdf |