Information Loss due to the Data Reduction of Sample Data from Discrete Distributions
In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set to the probability of the statistic’s value for this data set. We focus on sufficient statistics for the parameter of interest and develop a general formula, independent of the parameter, for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the real-valued statistic, but neither the parameter nor the data. Our approach would also work for non-sufficient statistics, but the lost information and associated entropy would then involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.
Main Authors: | Maryam Moghimi, H. W. Corley |
---|---|
Affiliation: | Center on Stochastic Modeling, Optimization, and Statistics (COSMOS), The University of Texas at Arlington, Arlington, TX 76013, USA |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-09-01 |
Series: | Data |
ISSN: | 2306-5729 |
DOI: | 10.3390/data5030084 |
Subjects: | data reduction; Shannon information; entropy; information loss |
Online Access: | https://www.mdpi.com/2306-5729/5/3/84 |
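
The abstract states that, for a sufficient statistic, the lost Shannon information does not depend on the parameter. The following is a minimal sketch of that idea for the simplest case; it is not taken from the paper, and the Bernoulli setup and the helper names `lost_information_bernoulli` and `check` are assumptions introduced here for illustration only.

```python
import math

# Illustrative sketch (not the authors' formula): for an i.i.d. Bernoulli(p)
# sample reduced to its sum T = sum(x_i), a sufficient statistic, the Shannon
# information lost is log2 P(T = t) - log2 P(X = x) = log2 C(n, t),
# which does not involve the parameter p.

def lost_information_bernoulli(sample):
    """Bits of information lost when a 0/1 sample is reduced to its sum."""
    n, t = len(sample), sum(sample)
    return math.log2(math.comb(n, t))

def check(sample, p):
    """Compute the same loss directly from the two log-probabilities."""
    n, t = len(sample), sum(sample)
    log_p_data = t * math.log2(p) + (n - t) * math.log2(1 - p)   # log2 P(X = x)
    log_p_stat = math.log2(math.comb(n, t)) + log_p_data         # log2 P(T = t)
    return log_p_stat - log_p_data                               # = log2 C(n, t)

sample = [1, 0, 1, 1, 0]
print(lost_information_bernoulli(sample))      # log2 C(5, 3) = log2(10) ≈ 3.32 bits
print(check(sample, 0.3), check(sample, 0.7))  # same value for any p in (0, 1)
```

For the sample [1, 0, 1, 1, 0] the loss is log2 C(5, 3) ≈ 3.32 bits, and `check` returns the same value for any p, consistent with the abstract's claim that, for sufficient statistics, the lost information is parameter-free.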