Data Quality By Design: A Goal-oriented Approach

A successful information system is the one that meets its design goals. Expressing these goals and subsequently translating them into a working solution is a major challenge for information systems engineering. This thesis adopts the concepts and techniques from goal-oriented (software) requirements...

Full description

Bibliographic Details
Main Author: Jiang, Lei
Other Authors: Mylopoulos, John
Language:en_ca
Published: 2010
Subjects:
Online Access:http://hdl.handle.net/1807/24774
Description
Summary:A successful information system is the one that meets its design goals. Expressing these goals and subsequently translating them into a working solution is a major challenge for information systems engineering. This thesis adopts the concepts and techniques from goal-oriented (software) requirements engineering research for conceptual database design, with a focus on data quality issues. Based on a real-world case study, a goal-oriented process is proposed for database requirements analysis and modeling. It spans from analysis of high-level stakeholder goals to detailed design of a conceptual databases schema. This process is then extended specifically for dealing with data quality issues: data of low quality may be detected and corrected by performing various quality assurance activities; to support these activities, the schema needs to be revised by accommodating additional data requirements. The extended process therefore focuses on analyzing and modeling quality assurance data requirements. A quality assurance activity supported by a revised schema may involve manual work, and/or rely on some automatic techniques, which often depend on the specification and enforcement of data quality rules. To address the constraint aspect in conceptual database design, data quality rules are classified according to a number of domain and application independent properties. This classification can be used to guide rule designers and to facilitate building of a rule repository. A quantitative framework is then proposed for measuring and comparing DQ rules according to one of these properties: effectiveness; this framework relies on derivation of formulas that represent the effectiveness of DQ rules under different probabilistic assumptions. A semi-automatic approach is also presented to derive these effectiveness formulas.