Statistical models and analysis techniques for learning in relational data

Many data sets routinely captured by organizations are relational in nature, from marketing and sales transactions to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information. This work focuses on how to learn accurate statistical models of complex relational data sets and develops two novel probabilistic models to represent, learn, and reason about statistical dependencies in these data. Relational dependency networks are the first relational model capable of learning general autocorrelation dependencies, an important class of statistical dependencies that are ubiquitous in relational data. Latent group models are the first relational model to generalize about the properties of underlying group structures to improve inference accuracy and efficiency. Not only do these two models offer performance gains over current relational models, but they also offer efficiency gains that will make relational modeling feasible for large relational data sets where current methods are computationally intensive, if not intractable. We also formulate a novel analysis framework to assess relational model performance and ascribe errors to model learning and inference procedures. Within this framework, we explore the effects of data characteristics and representation choices on inference accuracy and investigate the mechanisms behind model performance. In particular, we show that the inference process in relational models can be a significant source of error and that relative model performance varies significantly across different types of relational data.
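
The autocorrelation dependencies mentioned in the abstract can be made concrete with a small, hypothetical example: relational autocorrelation measures how strongly an attribute (e.g., a paper's topic) is correlated across linked objects (e.g., papers joined by citations). The toy graph, label values, and NumPy-based estimate below are illustrative assumptions, not material from the dissertation.

    # Illustrative sketch (assumption, not from the dissertation): estimating
    # relational autocorrelation of a binary "topic" label across citation links.
    import numpy as np

    # Hypothetical toy citation graph: paper id -> topic label (1 = ML, 0 = other).
    topic = {1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0}

    # Citation links between papers, treated as undirected pairs of paper ids.
    links = [(1, 2), (1, 4), (2, 4), (2, 3), (3, 5), (3, 6), (5, 6)]

    # Relational autocorrelation: Pearson correlation of the label values at the
    # two endpoints of each link, computed over both orderings so the estimate
    # is symmetric in the endpoints.
    x = np.array([topic[i] for i, j in links] + [topic[j] for i, j in links], float)
    y = np.array([topic[j] for i, j in links] + [topic[i] for i, j in links], float)
    autocorrelation = np.corrcoef(x, y)[0, 1]

    print(f"relational autocorrelation of 'topic': {autocorrelation:.2f}")

A value near 1 indicates that linked papers tend to share topics, the kind of dependency the abstract says relational dependency networks are designed to capture; a value near 0 indicates the labels of linked papers are roughly independent.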


Bibliographic Details
Main Author: Neville, Jennifer
Language: English
Published: ScholarWorks@UMass Amherst 2006
Subjects: Computer science
Online Access: https://scholarworks.umass.edu/dissertations/AAI3242344