An Investigation of the Cost and Accuracy Tradeoffs of Supplanting AFDs with Bayes Network in Query Processing in the Presence of Incompleteness in Autonomous Databases

abstract: As the information available to lay users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of inc...


Bibliographic Details
Other Authors: Raghunathan, Rohit (Author)
Format: Dissertation
Language: English
Published: 2011
Subjects: Computer science; Autonomous Databases; Bayes Networks; Incompleteness; Uncertainty
Online Access: http://hdl.handle.net/2286/R.I.14249
id ndltd-asu.edu-item-14249
record_format oai_dc
spelling ndltd-asu.edu-item-14249 2018-06-22T03:02:11Z An Investigation of the Cost and Accuracy Tradeoffs of Supplanting AFDs with Bayes Network in Query Processing in the Presence of Incompleteness in Autonomous Databases abstract: As the information available to lay users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as Query Processing over Incomplete Autonomous Databases (QPIAD) aim to mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when tuples contain missing values for multiple correlated attributes. In this thesis, I present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. I learn this distribution in terms of Bayes networks. My approach involves mining ("learning") Bayes networks from a sample of the database and using them to do both imputation (predicting a missing value) and query rewriting (retrieving relevant results with incompleteness on the query-constrained attributes, when the data sources are autonomous). I present empirical studies to demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayes networks provide significantly higher classification accuracy and (ii) the relevant possible answers retrieved by the queries reformulated using Bayes networks provide higher precision and recall than those retrieved using AFDs, while keeping query processing costs manageable.
Dissertation/Thesis; Raghunathan, Rohit (Author); Kambhampati, Subbarao (Advisor); Liu, Huan (Committee member); Lee, Joohyung (Committee member); Arizona State University (Publisher); Computer science; Autonomous Databases; Bayes Networks; Incompleteness; Uncertainty; eng; 44 pages; M.S. Computer Science 2011; Masters Thesis; http://hdl.handle.net/2286/R.I.14249; http://rightsstatements.org/vocab/InC/1.0/; All Rights Reserved; 2011
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Computer science
Autonomous Databases
Bayes Networks
Incompleteness
Uncertainty
author2 Raghunathan, Rohit (Author)
title An Investigation of the Cost and Accuracy Tradeoffs of Supplanting AFDs with Bayes Network in Query Processing in the Presence of Incompleteness in Autonomous Databases
publishDate 2011
url http://hdl.handle.net/2286/R.I.14249