Statistical models and analysis techniques for learning in relational data

Many data sets routinely captured by organizations are relational in nature, from marketing and sales transactions to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information. This work focuses on how to learn accurate statistical models of complex relational data sets and develops two novel probabilistic models to represent, learn, and reason about statistical dependencies in these data. Relational dependency networks are the first relational model capable of learning general autocorrelation dependencies, an important class of statistical dependencies that are ubiquitous in relational data. Latent group models are the first relational model to generalize about the properties of underlying group structures to improve inference accuracy and efficiency. Not only do these two models offer performance gains over current relational models, but they also offer efficiency gains that will make relational modeling feasible for large relational data sets where current methods are computationally intensive, if not intractable. We also formulate a novel analysis framework to assess relational model performance and ascribe errors to model learning and inference procedures. Within this framework, we explore the effects of data characteristics and representation choices on inference accuracy and investigate the mechanisms behind model performance. In particular, we show that the inference process in relational models can be a significant source of error and that relative model performance varies significantly across different types of relational data.
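
The autocorrelation dependencies mentioned in the abstract can be made concrete with a small, hypothetical example: relational autocorrelation measures how strongly an attribute (e.g., a paper's topic) is correlated across linked objects (e.g., papers joined by citations). The toy graph, label values, and NumPy-based estimate below are illustrative assumptions, not material from the dissertation.

    # Illustrative sketch (assumption, not from the dissertation): estimating
    # relational autocorrelation of a binary "topic" label across citation links.
    import numpy as np

    # Hypothetical toy citation graph: paper id -> topic label (1 = ML, 0 = other).
    topic = {1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0}

    # Citation links between papers, treated as undirected pairs of paper ids.
    links = [(1, 2), (1, 4), (2, 4), (2, 3), (3, 5), (3, 6), (5, 6)]

    # Relational autocorrelation: Pearson correlation of the label values at the
    # two endpoints of each link, computed over both orderings so the estimate
    # is symmetric in the endpoints.
    x = np.array([topic[i] for i, j in links] + [topic[j] for i, j in links], float)
    y = np.array([topic[j] for i, j in links] + [topic[i] for i, j in links], float)
    autocorrelation = np.corrcoef(x, y)[0, 1]

    print(f"relational autocorrelation of 'topic': {autocorrelation:.2f}")

A value near 1 indicates that linked papers tend to share topics, the kind of dependency the abstract says relational dependency networks are designed to capture; a value near 0 indicates the labels of linked papers are roughly independent.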


Bibliographic Details
Main Author: Neville, Jennifer
Language: English
Published: ScholarWorks@UMass Amherst 2006
Subjects: Computer science
Online Access: https://scholarworks.umass.edu/dissertations/AAI3242344