Data Quality Through Active Constraint Discovery and Maintenance

Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data...


Bibliographic Details
Main Author: Chiang, Fei Yen
Other Authors: Miller, Renee J.
Language: en_ca
Published: 2012
Subjects: data management; data quality
Online Access: http://hdl.handle.net/1807/33955
id ndltd-LACETR-oai-collectionscanada.gc.ca-OTU.1807-33955
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-OTU.1807-33955
2013-11-02T03:43:08Z
Data Quality Through Active Constraint Discovery and Maintenance
Chiang, Fei Yen
data management
data quality
0984
Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data quality. To ensure that the data conforms to the intended application domain semantics, we develop two algorithms focusing on constraint discovery. The first algorithm discovers a class of conditional constraints, which hold over a subset of the relation, under specific conditional values. The second algorithm discovers attribute domain constraints, which bind specific values to the attributes of a relation for a given domain. These two types of constraints have been shown to be useful for data cleaning. In practice, weak enforcement of constraints often occurs for performance reasons. This leads to inconsistencies between the data and the set of defined constraints. To resolve this inconsistency, we must determine whether it is the constraints or the data that is incorrect, and then make the necessary corrections. We develop a repair model that considers repairs to the data and repairs to the constraints on an equal footing. We present repair algorithms that find the necessary repairs to bring the data and the constraints back to a consistent state. Finally, we study the efficiency and quality of our techniques. We show that our constraint discovery algorithms find meaningful constraints with good precision and recall. We also show that our repair algorithms resolve many inconsistencies with high quality repairs, and propose repairs that previous algorithms did not consider.
Miller, Renee J.
2012-11
2012-12-10T21:26:04Z
NO_RESTRICTION
2012-12-10T21:26:04Z
2012-12-10
Thesis
http://hdl.handle.net/1807/33955
en_ca
collection NDLTD
language en_ca
sources NDLTD
topic data management
data quality
0984
spellingShingle data management
data quality
0984
Chiang, Fei Yen
Data Quality Through Active Constraint Discovery and Maintenance
description Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data quality. To ensure that the data conforms to the intended application domain semantics, we develop two algorithms focusing on constraint discovery. The first algorithm discovers a class of conditional constraints, which hold over a subset of the relation, under specific conditional values. The second algorithm discovers attribute domain constraints, which bind specific values to the attributes of a relation for a given domain. These two types of constraints have been shown to be useful for data cleaning. In practice, weak enforcement of constraints often occurs for performance reasons. This leads to inconsistencies between the data and the set of defined constraints. To resolve this inconsistency, we must determine whether it is the constraints or the data that is incorrect, and then make the necessary corrections. We develop a repair model that considers repairs to the data and repairs to the constraints on an equal footing. We present repair algorithms that find the necessary repairs to bring the data and the constraints back to a consistent state. Finally, we study the efficiency and quality of our techniques. We show that our constraint discovery algorithms find meaningful constraints with good precision and recall. We also show that our repair algorithms resolve many inconsistencies with high quality repairs, and propose repairs that previous algorithms did not consider.
author2 Miller, Renee J.
author_facet Miller, Renee J.
Chiang, Fei Yen
author Chiang, Fei Yen
author_sort Chiang, Fei Yen
title Data Quality Through Active Constraint Discovery and Maintenance
title_short Data Quality Through Active Constraint Discovery and Maintenance
title_full Data Quality Through Active Constraint Discovery and Maintenance
title_fullStr Data Quality Through Active Constraint Discovery and Maintenance
title_full_unstemmed Data Quality Through Active Constraint Discovery and Maintenance
title_sort data quality through active constraint discovery and maintenance
publishDate 2012
url http://hdl.handle.net/1807/33955
work_keys_str_mv AT chiangfeiyen dataqualitythroughactiveconstraintdiscoveryandmaintenance
_version_ 1716612486715670528