Non-unique Records in International Survey Projects: The Need for Extending Data Quality Control

For a given survey data file we define a non-unique record, NUR, as a sequence of all values in a given case (record), which is identical to that of another case in the same dataset. We analyzed 1,721 national surveys in 22 international projects, covering 142 countries and 2.3 million respondent...

Bibliographic Details
Main Authors: Kazimierz Maciek Slomczynski, Przemek Powalko, Tadeusz Krauze
Format: Article
Language: English
Published: European Survey Research Association 2017-04-01
Series: Survey Research Methods
Subjects: Survey Data Quality, Duplicate Records, Rare Events, Non-Random Errors in Survey Data
Online Access: https://ojs.ub.uni-konstanz.de/srm/article/view/6557
id doaj-ef8ba490de384cbdac06c45856446b70
record_format Article
spelling doaj-ef8ba490de384cbdac06c45856446b70 | 2020-11-24T21:28:58Z | eng | European Survey Research Association | Survey Research Methods | 1864-3361 | 1864-3361 | 2017-04-01 | 11 | 1 | 1 | 16 | 10.18148/srm/2017.v11i1.6557 | 6481
collection DOAJ
language English
format Article
sources DOAJ
author Kazimierz Maciek Slomczynski
Przemek Powalko
Tadeusz Krauze
title Non-unique Records in International Survey Projects: The Need for Extending Data Quality Control
publisher European Survey Research Association
series Survey Research Methods
issn 1864-3361
publishDate 2017-04-01
description For a given survey data file we define a non-unique record, NUR, as a sequence of all values in a given case (record) that is identical to that of another case in the same dataset. We analyzed 1,721 national surveys in 22 international projects, covering 142 countries and 2.3 million respondents, and found a total of 5,893 NURs concentrated in 162 national surveys, in 17 projects and 80 countries. We show that the probability of the occurrence of any NUR in an average survey sample is exceedingly small, and although NURs constitute a minor fraction of all records, it is unlikely that they are solely the result of random chance. We describe how NURs are distributed across projects, countries, time, modes of data collection, and sampling methods. We demonstrate that NURs diminish data quality and potentially have undesirable effects on the results of statistical analyses. Identifying NURs allows researchers to examine the consequences of their existence in data files. We argue that such records should be flagged in all publicly available data archives. We provide a complete list of NURs for all analyzed national surveys.
topic Survey Data Quality, Duplicate Records, Rare Events, Non-Random Errors in Survey Data
url https://ojs.ub.uni-konstanz.de/srm/article/view/6557
_version_ 1725968210367348736
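
The description defines a non-unique record (NUR) as a case whose full sequence of values is identical to that of another case in the same dataset. A minimal Python sketch of how such records might be flagged follows. It is an illustration, not the authors' procedure: the function name `flag_non_unique_records` and the toy data are hypothetical, every member of a group of identical cases is flagged, and any case identifiers are assumed to have been dropped beforehand, since a unique ID would make every record unique.

```python
from collections import defaultdict


def flag_non_unique_records(rows):
    """Return one boolean per row: True if the row's full sequence of
    values is identical to that of at least one other row."""
    positions = defaultdict(list)
    for index, row in enumerate(rows):
        # Use the whole row, as a tuple, as the grouping key.
        positions[tuple(row)].append(index)

    flags = [False] * len(rows)
    for indices in positions.values():
        if len(indices) > 1:  # the same value sequence occurs more than once
            for index in indices:
                flags[index] = True
    return flags


# Hypothetical toy data: four cases with four answers each (IDs removed).
survey = [
    (1, 3, 2, 5),
    (2, 3, 2, 5),
    (1, 3, 2, 5),  # identical to the first case
    (4, 1, 1, 2),
]
flags = flag_non_unique_records(survey)
print(flags)  # [True, False, True, False]
```

Flagging rather than deleting such cases matches the recommendation in the description, because it leaves the decision about how to treat them to the analyst.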