Information Loss due to the Data Reduction of Sample Data from Discrete Distributions

In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set to the probability of the statistic’s val...

Bibliographic Details
Main Authors: Maryam Moghimi, H. W. Corley
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Data
Subjects: data reduction; Shannon information; entropy; information loss
Online Access: https://www.mdpi.com/2306-5729/5/3/84
id doaj-e8bba6e1356f46efbcb51d1ec2b52fd5
record_format Article
spelling doaj-e8bba6e1356f46efbcb51d1ec2b52fd5; 2020-11-25T03:05:32Z; eng; MDPI AG; Data; 2306-5729; 2020-09-01; 5; 84; 84; 10.3390/data5030084; Information Loss due to the Data Reduction of Sample Data from Discrete Distributions; Maryam Moghimi (0); H. W. Corley (1); (0) Center on Stochastic Modeling, Optimization, and Statistics (COSMOS), The University of Texas at Arlington, Arlington, TX 76013, USA; (1) Center on Stochastic Modeling, Optimization, and Statistics (COSMOS), The University of Texas at Arlington, Arlington, TX 76013, USA; In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set to the probability of the statistic’s value for this data set. We focus on sufficient statistics for the parameter of interest and develop a general formula independent of the parameter for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the real-valued statistic but neither the parameter nor the data. Our approach would also work for non-sufficient statistics, but the lost information and associated entropy would involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.; https://www.mdpi.com/2306-5729/5/3/84; data reduction; Shannon information; entropy; information loss
collection DOAJ
language English
format Article
sources DOAJ
author Maryam Moghimi
H. W. Corley
spellingShingle Maryam Moghimi
H. W. Corley
Information Loss due to the Data Reduction of Sample Data from Discrete Distributions
Data
data reduction
Shannon information
entropy
information loss
author_facet Maryam Moghimi
H. W. Corley
author_sort Maryam Moghimi
title Information Loss due to the Data Reduction of Sample Data from Discrete Distributions
title_short Information Loss due to the Data Reduction of Sample Data from Discrete Distributions
title_full Information Loss due to the Data Reduction of Sample Data from Discrete Distributions
title_fullStr Information Loss due to the Data Reduction of Sample Data from Discrete Distributions
title_full_unstemmed Information Loss due to the Data Reduction of Sample Data from Discrete Distributions
title_sort information loss due to the data reduction of sample data from discrete distributions
publisher MDPI AG
series Data
issn 2306-5729
publishDate 2020-09-01
description In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set to the probability of the statistic’s value for this data set. We focus on sufficient statistics for the parameter of interest and develop a general formula independent of the parameter for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the real-valued statistic but neither the parameter nor the data. Our approach would also work for non-sufficient statistics, but the lost information and associated entropy would involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.
topic data reduction
Shannon information
entropy
information loss
url https://www.mdpi.com/2306-5729/5/3/84
work_keys_str_mv AT maryammoghimi informationlossduetothedatareductionofsampledatafromdiscretedistributions
AT hwcorley informationlossduetothedatareductionofsampledatafromdiscretedistributions
_version_ 1724678080780304384
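
Illustrative note on the description field: the comparison it describes, between the probability of a particular data set and the probability of the statistic's value for that data set, can be sketched numerically. The sketch below is not taken from the article. It assumes an i.i.d. Bernoulli(p) sample with the sample sum as the sufficient statistic, measures the lost information in bits as log2 of the ratio P(T = t) / P(X = x), and uses hypothetical function names. The article's exact definitions, logarithm base, and choice of distributions may differ.

```python
from math import comb, log2, isclose


def sample_prob(x, p):
    """Probability of the exact 0/1 sequence x under i.i.d. Bernoulli(p) trials."""
    t = sum(x)
    return p**t * (1 - p) ** (len(x) - t)


def statistic_prob(t, n, p):
    """Probability that the sample sum T equals t (binomial pmf)."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


def lost_information_bits(x, p):
    """Assumed measure: log2 of P(T = t) / P(X = x) for the observed sample x.

    This equals -log2 P(X = x | T = t), the extra bits needed to identify x
    once only the value t of the statistic is known.
    """
    t, n = sum(x), len(x)
    return log2(statistic_prob(t, n, p) / sample_prob(x, p))


x = (1, 0, 1, 1, 0)  # hypothetical sample, n = 5, t = 3
for p in (0.2, 0.5, 0.9):
    print(p, lost_information_bits(x, p))
```

Under these assumptions the parameter cancels in the ratio, so every value of p gives log2 C(5, 3) = log2 10, about 3.32 bits. This is consistent with the description's statement that, for a sufficient statistic, the lost information does not depend on the parameter.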