Data mining relational databases with probabilistic relational models

Bibliographic Details
Main Author: Chen, Yu, 1979-
Format: Others
Language: en
Published: McGill University 2006
Subjects:
Online Access: http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=97928
Description
Summary: Relational databases are a popular method for organizing and storing data. Unfortunately, many machine-learning techniques are unable to handle complex relational models. The Probabilistic Relational Model (PRM) is an extension of the Bayesian Network framework that can express relational structure as well as probabilistic dependencies. In this thesis, we significantly expand and improve an implementation of PRMs that allows defining conditional probability distributions over discrete and continuous variables. The thesis takes as its starting point an implementation that has various problems and runs very slowly when using a database management system (DBMS) as storage. It discusses alternative algorithms that improve the accuracy of the learned models, improve computing performance, and correct the inference problems of the existing implementation. The focus is on techniques that reduce the running time of the algorithms when the implementation learns from data stored in a DBMS. The thesis provides experimental results using this package on both synthetic and real data sets.