Statistical Challenges in Combining Survey and Auxiliary Data to Produce Official Statistics

Combining survey and auxiliary data to produce official statistics is gaining interest at federal agencies and among policy makers due to its efficiency. Recent studies have shown the practicality of small area estimation modeling approaches in the context of integrating data from multiple sources to improve estimation at fine levels of aggregation.

Bibliographic Details
Main Authors: Erciulescu Andreea L., Cruze Nathan B., Nandram Balgobin
Format: Article
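Language: English
Published: Sciendo 2020-03-01
Series: Journal of Official Statistics
Subjects: administrative data; benchmarking; incomplete data; not-in-sample prediction; small area estimation
Online Access: https://doi.org/10.2478/jos-2020-0004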
id doaj-b961c9d93427435a9e47f22accaf339a
record_format Article
spelling doaj-b961c9d93427435a9e47f22accaf339a
2021-09-06T19:41:48Z
eng
Sciendo
Journal of Official Statistics
2001-7367
2020-03-01
vol. 36, no. 1, pp. 63-88
10.2478/jos-2020-0004
jos-2020-0004
Statistical Challenges in Combining Survey and Auxiliary Data to Produce Official Statistics
Erciulescu Andreea L. (Westat, 1600 Research Blvd., Rockville, MD, U.S.A.)
Cruze Nathan B. (USDA National Agricultural Statistics Service, Research and Development Division, 1400 Independence Avenue, SW, Washington D.C., U.S.A.)
Nandram Balgobin (Worcester Polytechnic Institute, Mathematical Sciences, Stratton Hall, 100 Institute Road, Worcester, MA 01609-2247, Massachusetts, 01609, U.S.A.)
https://doi.org/10.2478/jos-2020-0004
administrative data
benchmarking
incomplete data
not-in-sample prediction
small area estimation
collection DOAJ
language English
format Article
sources DOAJ
author Erciulescu Andreea L.
Cruze Nathan B.
Nandram Balgobin
title Statistical Challenges in Combining Survey and Auxiliary Data to Produce Official Statistics
publisher Sciendo
series Journal of Official Statistics
issn 2001-7367
publishDate 2020-03-01
description Combining survey and auxiliary data to produce official statistics is gaining interest at federal agencies and among policy makers due to its efficiency. Recent studies have shown the practicality of small area estimation modeling approaches in the context of integrating data from multiple sources to improve estimation at fine levels of aggregation. In this article, agricultural predictions are constructed using a hierarchical Bayes subarea-level model, fit to data available from different sources. Auxiliary data are initially used to complement the survey data and define the prediction space, and then to define covariates for the model. Finally, not-in-sample predictions are constructed using the model output, and benchmarking constraints are imposed on the final set of in-sample and not-in-sample predictions. Unlike most of the studies discussing not-in-sample prediction, this article illustrates a method that uses the data available from multiple sources to define the prediction space. As a consequence, the resulting framework provides a larger set of nationwide predictions as candidates for official statistics, and extrapolation is not of concern. Challenges in developing the methods to combine different data sources are discussed in the context of planted acreage prediction.
topic administrative data
benchmarking
incomplete data
not-in-sample prediction
small area estimation
url https://doi.org/10.2478/jos-2020-0004