Statistical Challenges in Combining Survey and Auxiliary Data to Produce Official Statistics
Combining survey and auxiliary data to produce official statistics is gaining interest at federal agencies and among policy makers due to its efficiency. Recent studies have shown the practicality of small area estimation modeling approaches in the context of integrating data from multiple sources t...
Main Authors: | Erciulescu Andreea L., Cruze Nathan B., Nandram Balgobin |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2020-03-01 |
Series: | Journal of Official Statistics |
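Subjects: | administrative data, benchmarking, incomplete data, not-in-sample prediction, small area estimation |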
Online Access: | https://doi.org/10.2478/jos-2020-0004 |
id |
doaj-b961c9d93427435a9e47f22accaf339a |
---|---|
record_format |
Article |
spelling |
doaj-b961c9d93427435a9e47f22accaf339a; 2021-09-06T19:41:48Z; eng; Sciendo; Journal of Official Statistics; ISSN 2001-7367; 2020-03-01; vol. 36, no. 1, pp. 63-88; 10.2478/jos-2020-0004; jos-2020-0004
Statistical Challenges in Combining Survey and Auxiliary Data to Produce Official Statistics
Erciulescu Andreea L. (0); Cruze Nathan B. (1); Nandram Balgobin (2)
(0) Westat, 1600 Research Blvd., Rockville M.D., U.S.A.
(1) USDA National Agricultural Statistics Service, Research and Development Division, 1400 Independence Avenue, SW, Washington D.C., U.S.A.
(2) Worcester Polytechnic Institute, Mathematical Sciences, Stratton Hall, 100 Institute Road, Worcester, MA 01609-2247, Massachusetts, 01609, U.S.A.
Combining survey and auxiliary data to produce official statistics is gaining interest at federal agencies and among policy makers due to its efficiency. Recent studies have shown the practicality of small area estimation modeling approaches in the context of integrating data from multiple sources to improve estimation at fine levels of aggregation. In this article, agricultural predictions are constructed using a hierarchical Bayes subarea-level model, fit to data available from different sources. Auxiliary data are initially used to complement the survey data and define the prediction space, and then to define covariates for the model. Finally, not-in-sample predictions are constructed using the model output, and benchmarking constraints are imposed on the final set of in-sample and not-in-sample predictions. Unlike most of the studies discussing not-in-sample prediction, this article illustrates a method that uses the data available from multiple sources to define the prediction space. As a consequence, the resulting framework provides a larger set of nationwide predictions as candidate for official statistics, and extrapolation is not of concern. Challenges in developing the methods to combine different data sources are discussed in the context of planted acreage prediction.
https://doi.org/10.2478/jos-2020-0004
administrative data; benchmarking; incomplete data; not-in-sample prediction; small area estimation |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Erciulescu Andreea L.; Cruze Nathan B.; Nandram Balgobin |
title |
Statistical Challenges in Combining Survey and Auxiliary Data to Produce Official Statistics |
publisher |
Sciendo |
series |
Journal of Official Statistics |
issn |
2001-7367 |
publishDate |
2020-03-01 |
description |
Combining survey and auxiliary data to produce official statistics is gaining interest at federal agencies and among policy makers due to its efficiency. Recent studies have shown the practicality of small area estimation modeling approaches in the context of integrating data from multiple sources to improve estimation at fine levels of aggregation. In this article, agricultural predictions are constructed using a hierarchical Bayes subarea-level model, fit to data available from different sources. Auxiliary data are initially used to complement the survey data and define the prediction space, and then to define covariates for the model. Finally, not-in-sample predictions are constructed using the model output, and benchmarking constraints are imposed on the final set of in-sample and not-in-sample predictions. Unlike most of the studies discussing not-in-sample prediction, this article illustrates a method that uses the data available from multiple sources to define the prediction space. As a consequence, the resulting framework provides a larger set of nationwide predictions as candidate for official statistics, and extrapolation is not of concern. Challenges in developing the methods to combine different data sources are discussed in the context of planted acreage prediction. |
topic |
administrative data; benchmarking; incomplete data; not-in-sample prediction; small area estimation |
url |
https://doi.org/10.2478/jos-2020-0004 |