Small-Area Estimation with Zero-Inflated Data – a Simulation Study
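Many target variables in official statistics follow a semicontinuous distribution: a mixture of zeros and continuously distributed positive values. Such variables are called zero inflated. When reliable estimates are required for subpopulations with small sample sizes, model-based small-area estimators can be used; they improve the accuracy of the estimates by borrowing information from other subpopulations. In this article, three small-area estimators are investigated. The first is the EBLUP (empirical best linear unbiased predictor), which can be considered the most common small-area estimator and is based on a linear mixed model that assumes normal distributions. For zero-inflated variables, the model underlying the EBLUP is therefore misspecified. The other two small-area estimators are based on a model that takes zero inflation explicitly into account; both a Bayesian and a frequentist approach are considered. These small-area estimators are compared with each other and with design-based estimation in a simulation study with zero-inflated target variables. Two simulations are carried out: one with artificial data and one with real data from the Dutch Household Budget Survey. The small-area estimators are found to be more accurate than the design-based estimator. The size of the improvement strongly depends on the properties of the population and of the subpopulations of interest.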
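The abstract contrasts an EBLUP built on a normal linear mixed model with estimators built on a model that treats the zeros explicitly. The Python sketch below illustrates both ideas on toy data; it does not reproduce the article's estimators. Its assumptions are labeled here and in the comments: positive values are lognormal, the probability of a nonzero value follows a logit model with a random area effect, and the EBLUP is stood in for by an intercept-only, Fay-Herriot-style composite estimator with a pooled sampling variance and a crude moment estimate of the between-area variance. All function names and parameter values (`two_part_mean`, `shrinkage_estimates`, `simulate`, the variances and sample sizes) are hypothetical.

```python
import math
import random
import statistics


def two_part_mean(p_nonzero: float, mu_log: float, sigma_log: float) -> float:
    """Mean of a semicontinuous variable: P(y > 0) * E[y | y > 0].

    Assumption (not from the article): positive values are lognormal, so
    E[y | y > 0] = exp(mu_log + sigma_log**2 / 2).
    """
    return p_nonzero * math.exp(mu_log + sigma_log ** 2 / 2)


def shrinkage_estimates(samples: list[list[float]]) -> list[float]:
    """Intercept-only, Fay-Herriot-style composite estimator (a simplification).

    Each area's direct mean is pulled toward the overall mean with weight
    gamma_i = s2_v / (s2_v + psi_i), where psi_i is the sampling variance of
    the direct mean and s2_v is a crude moment estimate of between-area variance.
    """
    direct = [statistics.fmean(s) for s in samples]
    # Pooled within-area variance; per-area variances would be 0 for all-zero samples.
    n_total = sum(len(s) for s in samples)
    ss_within = sum(sum((y - m) ** 2 for y in s) for s, m in zip(samples, direct))
    s2_within = ss_within / (n_total - len(samples))
    psi = [s2_within / len(s) for s in samples]
    synthetic = statistics.fmean(direct)
    # Moment estimate: spread of direct means minus average sampling variance.
    s2_v = max(0.0, statistics.variance(direct) - statistics.fmean(psi))
    estimates = []
    for d, p in zip(direct, psi):
        gamma = s2_v / (s2_v + p) if s2_v + p > 0 else 1.0
        estimates.append(gamma * d + (1 - gamma) * synthetic)
    return estimates


def simulate(n_areas: int = 40, n_sample: int = 5, seed: int = 1) -> tuple[float, float]:
    """Toy simulation: MSE of direct vs. shrinkage estimates of model-based area means."""
    rng = random.Random(seed)
    sigma_log = 0.8  # assumed spread of log(y) among positive values
    truth, samples = [], []
    for _ in range(n_areas):
        eta = rng.gauss(0.0, 0.7)  # logit of P(y > 0): intercept 0 plus area effect
        p = 1 / (1 + math.exp(-eta))
        mu = 1.0 + rng.gauss(0.0, 0.3)  # log-scale mean of positives plus area effect
        truth.append(two_part_mean(p, mu, sigma_log))
        # Simple random sample within the area: zero, or a lognormal positive value.
        samples.append([rng.lognormvariate(mu, sigma_log) if rng.random() < p else 0.0
                        for _ in range(n_sample)])
    direct = [statistics.fmean(s) for s in samples]  # design-based estimate under SRS
    shrunk = shrinkage_estimates(samples)

    def mse(est: list[float]) -> float:
        return statistics.fmean((e - t) ** 2 for e, t in zip(est, truth))

    return mse(direct), mse(shrunk)


if __name__ == "__main__":
    mse_direct, mse_shrunk = simulate()
    print(f"MSE direct: {mse_direct:.3f}  MSE shrinkage: {mse_shrunk:.3f}")
```

Shrinkage is expected to help when sampling variances are large relative to the variation between area means, as with a handful of units per area, but the gain is not guaranteed for every seed or parameter setting. Note also that a normal-theory estimator like this one ignores the point mass at zero, which is exactly the misspecification the article's zero-inflation models address.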
Main Authors: | Krieg, Sabine; Boonstra, Harm Jan; Smeets, Marc |
---|---|
Affiliation: | Statistics Netherlands, Postbus 4481, 6401CZ Heerlen, Netherlands |
Format: | Article |
Language: | English |
Published: | Sciendo, 2016-12-01 |
Series: | Journal of Official Statistics, vol. 32, no. 4, pp. 963-986 |
ISSN: | 2001-7367 |
Subjects: | generalized linear mixed model; EBLUP; MCMC; logit; Dutch Household Budget Survey |
Online Access: | https://doi.org/10.1515/jos-2016-0051 |
Source: | DOAJ |