Learning from multirelational data through multiple views

Bibliographic Details
Main Author: Guo, Hongyu
Format: Others
Language: English
Published: University of Ottawa (Canada), 2013
Subjects: Computer Science
Online Access: http://hdl.handle.net/10393/29521
http://dx.doi.org/10.20381/ruor-19790
Thesis date: 2008
Source: Dissertation Abstracts International, Volume: 69-08, Section: B, page: 4842
Extent: 216 p.

Description

Since their first release in the 1970s, relational databases have been routinely used to collect and organize real-world data, from financial transactions and marketing surveys to health informatics observations. Traditional data mining methods expect data in the form of a single table and therefore cannot deal directly with such relational repositories. Multirelational Data Mining, on the other hand, aims to discover useful patterns across multiple interconnected relations in a relational database. To this end, this work focuses on how to build classification models for relational databases through multiple views (feature sets). This study developed four multi-view strategies for mining multirelational data. First, the thesis introduces the Multi-View Relational Classification (MRC) framework, which constructs hypotheses from sets of attributes of the presented data. The MRC strategy differs from existing multirelational mining algorithms in that it removes the need either to transform multiple relations into a single universal table or to devise new techniques for direct relational learning. When mining diverse relational databases, the MRC algorithm offers gains in both predictive performance and efficiency over current relational models. Second, the MRC-IM method extends the MRC approach to handle skewed-class multirelational data, in which the number of examples in one class is much higher than in the others and correctly classifying the underrepresented examples is of prime importance. The MRC-IM method offers performance gains over a current relational model not only on majority-class instances but also on underrepresented examples.

While the MRC and MRC-IM methods construct each individual view from the features within a single relation, the third multi-view strategy formulated in this work, the MRC-Cross approach, searches for and collects relevant attributes across multiple relations when constructing individual views. Finally, we present the SESP technique for pre-pruning uninteresting relations in complex relational databases. By identifying uninteresting views within the MRC framework, SESP creates a pruned structure while minimizing the loss of predictive performance in the final classification model. The results of this study thus suggest that learning from multiple views sets a new direction for efficiently mining data in many relational forms, including relational databases, graphs, and social networks.