Information Loss due to the Data Reduction of Sample Data from Discrete Distributions
In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set to the probability of the statistic’s value for this data set. We focus on sufficient statistics for the parameter of interest and develop a general formula, independent of the parameter, for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the real-valued statistic, but neither the parameter nor the data. Our approach would also work for non-sufficient statistics, but the lost information and associated entropy would then involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.
Main Authors: | Maryam Moghimi, H. W. Corley |
---|---|
Affiliation: | Center on Stochastic Modeling, Optimization, and Statistics (COSMOS), The University of Texas at Arlington, Arlington, TX 76013, USA |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-09-01 |
Series: | Data |
ISSN: | 2306-5729 |
DOI: | 10.3390/data5030084 |
Subjects: | data reduction; Shannon information; entropy; information loss |
Online Access: | https://www.mdpi.com/2306-5729/5/3/84 |
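
The abstract states that, for a sufficient statistic, the lost Shannon information does not depend on the parameter. The following is a minimal sketch of that idea for the simplest case; it is not taken from the paper, and the Bernoulli setup and the helper names `lost_information_bernoulli` and `check` are assumptions introduced here for illustration only.

```python
import math

# Illustrative sketch (not the authors' formula): for an i.i.d. Bernoulli(p)
# sample reduced to its sum T = sum(x_i), a sufficient statistic, the Shannon
# information lost is log2 P(T = t) - log2 P(X = x) = log2 C(n, t),
# which does not involve the parameter p.

def lost_information_bernoulli(sample):
    """Bits of information lost when a 0/1 sample is reduced to its sum."""
    n, t = len(sample), sum(sample)
    return math.log2(math.comb(n, t))

def check(sample, p):
    """Compute the same loss directly from the two log-probabilities."""
    n, t = len(sample), sum(sample)
    log_p_data = t * math.log2(p) + (n - t) * math.log2(1 - p)   # log2 P(X = x)
    log_p_stat = math.log2(math.comb(n, t)) + log_p_data         # log2 P(T = t)
    return log_p_stat - log_p_data                               # = log2 C(n, t)

sample = [1, 0, 1, 1, 0]
print(lost_information_bernoulli(sample))      # log2 C(5, 3) = log2(10) ≈ 3.32 bits
print(check(sample, 0.3), check(sample, 0.7))  # same value for any p in (0, 1)
```

For the sample [1, 0, 1, 1, 0] the loss is log2 C(5, 3) ≈ 3.32 bits, and `check` returns the same value for any p, consistent with the abstract's claim that, for sufficient statistics, the lost information is parameter-free.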