Non-unique Records in International Survey Projects: The Need for Extending Data Quality Control
Main Authors: | Kazimierz Maciek Slomczynski, Przemek Powalko, Tadeusz Krauze |
---|---|
Format: | Article |
Language: | English |
Published: | European Survey Research Association, 2017-04-01 |
Series: | Survey Research Methods |
Online Access: | https://ojs.ub.uni-konstanz.de/srm/article/view/6557 |
id |
doaj-ef8ba490de384cbdac06c45856446b70 |
---|---|
doi |
10.18148/srm/2017.v11i1.6557 |
collection |
DOAJ |
sources |
DOAJ |
author |
Kazimierz Maciek Slomczynski Przemek Powalko Tadeusz Krauze |
issn |
1864-3361 |
description |
For a given survey data file we define a non-unique record, NUR, as a sequence of all values
in a given case (record), which is identical to that of another case in the same dataset. We
analyzed 1,721 national surveys in 22 international projects, covering 142 countries and 2.3
million respondents, and found a total of 5,893 NURs concentrated in 162 national surveys, in
17 projects and 80 countries. We show that the probability of the occurrence of any NUR in
an average survey sample is exceedingly small, and although NURs constitute a minor fraction
of all records, it is unlikely that they are solely the result of random chance. We describe how
NURs are distributed across projects, countries, time, modes of data collection, and sampling
methods. We demonstrate that NURs diminish data quality and potentially have undesirable
effects on the results of statistical analyses. Identifying NURs allows researchers to examine
the consequences of their existence in data files. We argue that such records should be flagged
in all publicly available data archives. We provide a complete list of NURs for all analyzed
national surveys. |
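Read literally, the NUR definition is an exact-match check on whole records: a case is non-unique when its full sequence of values also occurs in another case of the same file. The Python snippet below is a minimal sketch of that check, not the authors' procedure. It assumes each case is already a sequence of response values with technical identifiers (such as a respondent ID) removed, since any unique ID column would make every record unique. It also assumes missing values are coded with a fixed sentinel (for example `None` or `-9`), because two separately created NaN values do not compare equal. The function name `flag_non_unique_records` is hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def flag_non_unique_records(records: Sequence[Sequence[Hashable]]) -> list[bool]:
    """Return one flag per record: True if the record's complete sequence of
    values also appears in at least one other record of the same dataset.

    Assumes ID-like variables were dropped beforehand (hypothetical setup).
    """
    # Convert each record to a tuple so it can serve as a dictionary key.
    keys = [tuple(r) for r in records]
    # Count how many times each full value sequence occurs in the file.
    counts = Counter(keys)
    # A record is an NUR when its value sequence occurs more than once.
    return [counts[k] > 1 for k in keys]


if __name__ == "__main__":
    # Toy data: four cases, the first and third are identical.
    survey = [
        (1, 3, 2, "yes"),
        (2, 3, 2, "no"),
        (1, 3, 2, "yes"),
        (4, 1, 1, "no"),
    ]
    flags = flag_non_unique_records(survey)
    print(flags)       # [True, False, True, False]
    print(sum(flags))  # 2
```

Every member of a duplicated group is flagged, so a pair of identical cases yields two flagged records. This follows a literal reading of the definition; other counting conventions (for example, flagging only the second and later copies) are possible and would change the totals.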
topic |
Survey Data Quality, Duplicate Records, Rare Events, Non-Random Errors in Survey Data |