Small-Area Estimation with Zero-Inflated Data – a Simulation Study

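Many target variables in official statistics follow a semicontinuous distribution with a mixture of zeros and continuously distributed positive values. Such variables are called zero inflated. When reliable estimates for subpopulations with small sample sizes are required, model-based small-area estimators can be used, which improve the accuracy of the estimates by borrowing information from other subpopulations. In this article, three small-area estimators are investigated. The first estimator is the EBLUP, which can be considered the most common small-area estimator and is based on a linear mixed model that assumes normal distributions. Therefore, the EBLUP is model misspecified in the case of zero-inflated variables. The other two small-area estimators are based on a model that takes zero inflation explicitly into account. Both the Bayesian and the frequentist approach are considered. These small-area estimators are compared with each other and with design-based estimation in a simulation study with zero-inflated target variables. Both a simulation with artificial data and a simulation with real data from the Dutch Household Budget Survey are carried out. It is found that the small-area estimators improve the accuracy compared to the design-based estimator. The amount of improvement strongly depends on the properties of the population and the subpopulations of interest.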

Bibliographic Details
Main Authors: Krieg, Sabine; Boonstra, Harm Jan; Smeets, Marc
Format: Article
Language: English
Published: Sciendo 2016-12-01
Series: Journal of Official Statistics
Subjects: generalized linear mixed model; EBLUP; MCMC; logit; Dutch Household Budget Survey
Online Access: https://doi.org/10.1515/jos-2016-0051
id doaj-cce3cdd9727142eca1db48520808fcc3
record_format Article
spelling doaj-cce3cdd9727142eca1db48520808fcc3 | 2021-09-06T19:40:52Z | eng | Sciendo | Journal of Official Statistics | 2001-7367 | 2016-12-01 | 32 | 4 | 963 | 986 | 10.1515/jos-2016-0051 | jos-2016-0051 | Small-Area Estimation with Zero-Inflated Data – a Simulation Study | Krieg Sabine [0]; Boonstra Harm Jan [1]; Smeets Marc [2] | [0] Statistics Netherlands, Postbus 4481, 6401CZ Heerlen, Netherlands; [1] Statistics Netherlands, Postbus 4481, 6401CZ Heerlen, Netherlands; [2] Statistics Netherlands, Postbus 4481, 6401CZ Heerlen, Netherlands | https://doi.org/10.1515/jos-2016-0051 | generalized linear mixed model; eblup; mcmc; logit; dutch household budget survey
collection DOAJ
language English
format Article
sources DOAJ
author Krieg Sabine
Boonstra Harm Jan
Smeets Marc
title Small-Area Estimation with Zero-Inflated Data – a Simulation Study
publisher Sciendo
series Journal of Official Statistics
issn 2001-7367
publishDate 2016-12-01
description Many target variables in official statistics follow a semicontinuous distribution with a mixture of zeros and continuously distributed positive values. Such variables are called zero inflated. When reliable estimates for subpopulations with small sample sizes are required, model-based small-area estimators can be used, which improve the accuracy of the estimates by borrowing information from other subpopulations. In this article, three small-area estimators are investigated. The first estimator is the EBLUP, which can be considered the most common small-area estimator and is based on a linear mixed model that assumes normal distributions. Therefore, the EBLUP is model misspecified in the case of zero-inflated variables. The other two small-area estimators are based on a model that takes zero inflation explicitly into account. Both the Bayesian and the frequentist approach are considered. These small-area estimators are compared with each other and with design-based estimation in a simulation study with zero-inflated target variables. Both a simulation with artificial data and a simulation with real data from the Dutch Household Budget Survey are carried out. It is found that the small-area estimators improve the accuracy compared to the design-based estimator. The amount of improvement strongly depends on the properties of the population and the subpopulations of interest.
topic generalized linear mixed model
eblup
mcmc
logit
dutch household budget survey
url https://doi.org/10.1515/jos-2016-0051