Ontology-based Feature Construction on Non-structured Data
Main Author: | Ni, Weizeng |
---|---|
Language: | English |
Published: | University of Cincinnati / OhioLINK, 2015 |
Subjects: | Engineering; Feature construction; ontology; domain knowledge; knowledge discovery |
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=ucin1439309340 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1439309340 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1439309340 2021-08-03T06:32:47Z Ontology-based Feature Construction on Non-structured Data Ni, Weizeng Engineering Feature construction ontology domain knowledge knowledge discovery Data mining on non-structured data is a relatively under-researched area because most efforts in the KDD community in recent decades have been devoted to mining relational structured data. With the information explosion of the big-data era, the majority of knowledge is emerging in various forms of non-structured data. This necessitates new methodologies for constructing meaningful features from non-structured data to facilitate knowledge learning. Most existing data-driven methods only serve the objective of improving the discriminative power of features, while severely underestimating the importance of interpretability. In many domains, discovering and learning new hypotheses and knowledge from non-structured data in a meaningful and understandable form is the primary aim. In this study, an ontology-based feature construction framework is proposed. The framework represents the structural relations embedded in domain knowledge in the form of an ontology. Features at three levels are defined based on granularity in the ontology. A feature, representing a domain hypothesis, can be readily constructed by evolving the ontology. Support and confidence are the two criteria proposed to evaluate the usefulness of features and to guide the search for optimal ones. Furthermore, domain experts are involved interactively to explore new hypotheses with the aid of data-driven heuristic algorithms. The ontology can also be flexibly reconstructed to accommodate different hypotheses. A comprehensive case study is conducted in which the proposed methodology is applied to miscellaneous medical claim data to build features that are both interpretable and highly predictive for hospitalization forecasting. 
A medical professor is constantly consulted to bring in domain insights that aid ontology evolution and to assess the meaningfulness of the constructed features and predictions. The constructed features outperform those based on the initial hypotheses in terms of prediction accuracy. Moreover, the ability to discover new and useful knowledge is demonstrated by the meaningfulness of the new features and the evolved ontology. 2015-09-10 English text University of Cincinnati / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=ucin1439309340 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Engineering Feature construction ontology domain knowledge knowledge discovery |
author |
Ni, Weizeng |
title |
Ontology-based Feature Construction on Non-structured Data |
publisher |
University of Cincinnati / OhioLINK |
publishDate |
2015 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1439309340 |