A Framework for Extraction Plans and Heuristics in an Ontology-Based Data-Extraction System

Bibliographic Details
Main Author: Wessman, Alan E.
Format: Others
Published: BYU ScholarsArchive 2005
Subjects:
OSM
Online Access: https://scholarsarchive.byu.edu/etd/238
https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=1237&context=etd
id ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-1237
record_format oai_dc
date 2005-01-26T08:00:00Z
type text (application/pdf)
rights http://lib.byu.edu/about/copyright/
series All Theses and Dissertations, BYU ScholarsArchive
collection NDLTD
sources NDLTD
topic data extraction
ontology
framework
extraction plan
inference
conceptual modeling
data frame
information extraction
OSMX
OSM
Ontos
OntosEngine
OntologyEditor
Computer Sciences
description Extraction of information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that ontologies can be used to achieve a high degree of accuracy in data extraction while remaining resilient to changes in the documents. Ontologies do not, however, reduce the complexity of a data-extraction system. As research in the field progresses, the need grows for a modular data-extraction system that decouples the various functional processes involved. In this thesis we propose a framework for such a system. The framework allows new algorithms and ideas to be incorporated into a data-extraction system without rewriting large parts of the system’s source code. It also lets researchers focus on the parts of the system relevant to their research without worrying about introducing incompatibilities with the remaining components. We demonstrate the value of the framework by providing an implementation of it, and we show that this implementation achieves extraction accuracy comparable to that of the legacy BYU-Ontos data-extraction system. We also suggest alternative ways in which the framework may be extended and implemented, and we supply documentation on the framework for future use by data-extraction researchers.