The notion of 'information content of data' for databases
Main Author: | Xu, Kaibo |
---|---|
Published: | University of the West of Scotland, 2009 |
Subjects: | 005.75 |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.544372 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-544372 |
collection |
NDLTD |
sources |
NDLTD |
topic |
005.75 |
description |
This thesis is concerned with a fundamental notion of information in the context of databases. The problem of the information content of a conceptual data schema appears elusive. The conventional definition of information rests on the entropy-based quantitative theory proposed by Shannon (1948), which is widely used to measure, via the notion of entropy, the amount of information created and transmitted through a communication channel. However, this approach seems unable to account for phenomena concerning the content aspect of information. Moreover, how the information content of data in a database may be reasoned about does not appear to have been adequately addressed. We therefore believe that the notion of the information content of data should be fully investigated and formally defined. To this end, the notion of the information content of a signal is redefined by modifying the well-known definition of information content given by Dretske (1981, p. 65). We then define what we call the information content inclusion relation (IIR) between two random events, a partial order on random events. A set of inference rules is presented for reasoning about the information content of a random event, and we explore how these ideas and rules may be used in a database setting, including the derivation of otherwise hidden information by deriving new IIRs from a given set of IIRs.

Furthermore, the problem of whether the instances of one data schema may be recovered from those of another does not seem to have been well investigated, and we believe it is fundamental to the relationship between two schemata. In the literature, the works closest to this question are based on the notion of relative information capacity, which is concerned with whether one schema may replace another without losing the system's capacity to store data. We also observe that the rationale of that approach is overly intuitive (even though the techniques involved are sophisticated): a convincing answer should rest on whether one or more instances of a schema can tell us truly what an instance of another schema would be. This is a matter of one thing carrying information about another. To capture such a relationship, we introduce the notion of information carrying between states of affairs, through which we examine informational relationships at much finer levels than the conventional entropy-based approach does, namely random events and particulars of random events. The validity of our ideas is demonstrated by applying them to schema transformations that preserve information-bearing capability. These include, among others, some aspects of normalization for relational databases and schema transformation with Miller et al.'s (1994) Schema Intension Graph (SIG) model. To verify our ideas on reasoning about the information content of data, a prototype called IIR-Reasoning is presented, which shows how our ideas might be exploited in a real database setting, including how real-world events and database values are aligned. |
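The abstract mentions deriving otherwise hidden information by deriving new IIRs from a given set of IIRs. Because the IIR is described as a partial order, reflexivity and transitivity alone already license some derivations. The Python sketch below applies only those two properties. It is not the thesis's own set of inference rules, which is not reproduced here. The pair encoding `(a, b)`, the function names `derive_iirs` and `hidden_iirs`, and the event labels are all hypothetical.

```python
def derive_iirs(given):
    """Return the reflexive-transitive closure of a set of IIR pairs.

    Assumption (hypothetical encoding): a pair (a, b) means that the
    information content of event b is included in that of event a.
    Only the partial-order properties are applied; the thesis's own
    inference rules are not reproduced.
    """
    events = sorted({e for pair in given for e in pair})
    # Reflexivity: each event includes its own information content.
    closure = set(given) | {(e, e) for e in events}
    # Transitivity via Warshall's algorithm: if a includes k and k
    # includes b, then a includes b.
    for k in events:
        for a in events:
            if (a, k) not in closure:
                continue
            for b in events:
                if (k, b) in closure:
                    closure.add((a, b))
    return closure


def hidden_iirs(given):
    """IIR pairs that follow from `given` but are neither stated nor trivial."""
    stated = set(given)
    return {
        (a, b)
        for (a, b) in derive_iirs(stated)
        if a != b and (a, b) not in stated
    }


if __name__ == "__main__":
    # Hypothetical random events labelled r, s, t.
    stated = {("r", "s"), ("s", "t")}
    print(sorted(hidden_iirs(stated)))  # [('r', 't')]
```

Here `("r", "t")` is "hidden" in the abstract's sense. It is stated nowhere, yet it follows from the two given pairs by transitivity.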
author |
Xu, Kaibo |
title |
The notion of 'information content of data' for databases |
publisher |
University of the West of Scotland |
publishDate |
2009 |
url |
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.544372 |