Data Quality Through Active Constraint Discovery and Maintenance
Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data...
Main Author: | Chiang, Fei Yen |
---|---|
Other Authors: | Miller, Renee J. |
Language: | en_ca |
Published: | 2012 |
Subjects: | data management, data quality, 0984 |
Online Access: | http://hdl.handle.net/1807/33955 |
id |
ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-33955 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-33955; 2013-04-19T19:58:37Z; Data Quality Through Active Constraint Discovery and Maintenance; Chiang, Fei Yen; data management; data quality; 0984; Miller, Renee J.; 2012-11; 2012-12-10T21:26:04Z; NO_RESTRICTION; 2012-12-10T21:26:04Z; 2012-12-10; Thesis; http://hdl.handle.net/1807/33955; en_ca |
collection |
NDLTD |
language |
en_ca |
sources |
NDLTD |
topic |
data management; data quality; 0984 |
description |
Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data quality. To ensure that the data conforms to the intended application domain semantics, we develop two algorithms focusing on constraint discovery. The first algorithm discovers a class of conditional constraints, which hold over a subset of the relation, under specific conditional values. The second algorithm discovers attribute domain constraints, which bind specific values to the attributes of a relation for a given domain. These two types of constraints have been shown to be useful for data cleaning.
In practice, weak enforcement of constraints often occurs for performance reasons. This leads to inconsistencies between the data and the set of defined constraints. To resolve this inconsistency, we must determine whether it is the constraints or the data that is incorrect, and then make the necessary corrections. We develop a repair model that considers repairs to the data and repairs to the constraints on an equal footing. We present repair algorithms that find the necessary repairs to bring the data and the constraints back to a consistent state. Finally, we study the efficiency and quality of our techniques. We show that our constraint discovery algorithms find meaningful constraints with good precision and recall. We also show that our repair algorithms resolve many inconsistencies with high quality repairs, and propose repairs that previous algorithms did not consider. |
author2 |
Miller, Renee J. |
author_facet |
Miller, Renee J.; Chiang, Fei Yen |
author |
Chiang, Fei Yen |
title |
Data Quality Through Active Constraint Discovery and Maintenance |
publishDate |
2012 |
url |
http://hdl.handle.net/1807/33955 |