Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system

Bibliographic Details
Main Authors: Maria Eugénia Ferrão, Paula Prata, Maria Teresa Gonzaga Alves
Format: Article
Language: English
Published: Fundação CESGRANRIO, 2020-07-01
Series: Ensaio
Subjects: prova brasil, missing data, r, multiple imputation
Online Access: https://www.scielo.br/scielo.php?script=sci_arttext&pid=S0104-40362020000300599&lng=pt&nrm=iso
id doaj-7585bd3e54a44da0bc0d7314df4ad202
doi 10.1590/s0104-40362020002802346
volume 28
issue 108
pages 599-621
affiliation Maria Eugénia Ferrão (https://orcid.org/0000-0002-1317-0629): University of Beira Interior, Covilhã/Center for Mathematics Applied to Economic Forecasting and Decision Making, Lisboa, Portugal
Paula Prata (https://orcid.org/0000-0002-3072-0186): University of Beira Interior, Instituto de Telecomunicações, Covilhã, Portugal
Maria Teresa Gonzaga Alves (https://orcid.org/0000-0001-5820-4311): Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
collection DOAJ
issn 0104-4036
1809-4465
description Almost all quantitative studies in educational assessment, evaluation and educational research are based on incomplete data sets, a problem that has persisted for years without a single solution. The use of big identifiable data poses new challenges in dealing with missing values. In the first part of this paper, we present the state of the art of the topic in the Brazilian education scientific literature and describe how researchers have dealt with missing data since the turn of the century. Next, we use open access software to analyze real-world data, the 2017 Prova Brasil, for several federation units to document how the naïve assumption of missing completely at random may substantially affect statistical conclusions, researcher interpretations, and subsequent implications for policy and practice. We conclude with straightforward suggestions for education researchers on applying R routines to test the hypothesis that data are missing completely at random and, if this null hypothesis is rejected, on implementing multiple imputation, which appears to be one of the most appropriate methods for handling missing data.
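Multiple imputation, the approach the abstract recommends when the MCAR hypothesis is rejected, works in three steps. First, m completed data sets are created. Second, the same analysis is run on each one. Third, the m results are combined with Rubin's rules. The pooled estimate is the mean of the m estimates. The total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance.

The sketch below illustrates only that pooling step. It is written in Python rather than the R routines the article uses, and `pool_rubin` is a hypothetical helper, not code from the paper. It assumes each analysis returns one scalar estimate together with its squared standard error.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation results with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed data set
    variances: squared standard errors U_1..U_m of those estimates
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")

    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b   # total variance

    # Rubin's (1987) degrees of freedom for the t reference distribution;
    # r is the relative increase in variance due to nonresponse.
    r = (1 + 1 / m) * b / u_bar if u_bar > 0 else math.inf
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else math.inf

    return {"estimate": q_bar, "variance": t, "se": math.sqrt(t), "df": df}


if __name__ == "__main__":
    # Hypothetical results from m = 3 imputed data sets (illustrative numbers only).
    print(pool_rubin([10.0, 12.0, 11.0], [4.0, 4.0, 4.0]))
```

Consider three imputations with estimates 10, 12 and 11, each with variance 4. They pool to an estimate of 11 with total variance 4 + (4/3)(1), or about 5.33. In R, `mice::pool()` performs this combination for models fitted to each imputed data set.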
topic prova brasil
missing data
r
multiple imputation