Missing data management and statistical measurement of socio-economic status: application of big data


Bibliographic Details
Main Author: Habtamu Tilaye Wubetie (Department of Statistics, Debre Markos University)
Format: Article
Language: English
Published: SpringerOpen, 2017-12-01
Series: Journal of Big Data
ISSN: 2196-1115
DOI: 10.1186/s40537-017-0099-y
Subjects: African countries; socio-economic development; missing data management; principal component analysis; factor analysis and cluster analysis
Online Access: http://link.springer.com/article/10.1186/s40537-017-0099-y
Collection: DOAJ

Abstract
Measuring socio-economic status is an ongoing problem, and researchers have proposed a variety of measures. This work investigates a socio-economic status measure derived from the natural correlations among variables, one that can cluster African countries by level of status better and more meaningfully. The researcher used yearly socio-economic time series for 48 African countries from 1993 to 2013, taken from the IMF 2013 data set, for data management (i.e., 2737 variables over 21 years); the analysis itself, however, is based on the most recent 14 years of the series. During data management, missing values are imputed with regression estimates, Lagrange interpolation, linear interpolation or linear spline interpolation, selecting at each time level the method that best fits the trend of the data with minimum error. Principal component and factor analysis of the time-averaged data yield 7 principal factors, contributed by 84 variables and explaining 70% of the variation in the data set, which are suggested as the components for measuring socio-economic status. Based on these components, the clustering methods considered (K-means, average linkage, Ward’s method and bootstrap Ward’s method) agree on six clusters of countries that are statistically significant at the 95% level, while three countries identified as outliers each form a cluster of their own.
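The imputation step described in the abstract (choosing among regression estimates, Lagrange interpolation, linear interpolation and linear spline interpolation according to which fits the observed trend with minimum error) can be sketched in Python as follows. This is a hypothetical illustration, not the author's code. It assumes leave-one-out mean absolute error on the observed points as the "minimum error" criterion, picks one method per series rather than per time level, and models "regression estimates" as an ordinary least-squares linear trend in time. A linear spline with knots at every observed point coincides with piecewise-linear interpolation, so it is not coded separately. All function names are made up for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (time index, observed value)
Method = Callable[[Sequence[Point], float], float]


def regression_estimate(pts: Sequence[Point], x: float) -> float:
    """Ordinary least-squares linear trend in time, evaluated at x (assumed form of 'regression')."""
    n = len(pts)
    mx = sum(t for t, _ in pts) / n
    my = sum(v for _, v in pts) / n
    sxx = sum((t - mx) ** 2 for t, _ in pts)
    if sxx == 0.0:
        return my
    slope = sum((t - mx) * (v - my) for t, v in pts) / sxx
    return my + slope * (x - mx)


def lagrange_interp(pts: Sequence[Point], x: float) -> float:
    """Lagrange polynomial through every observed point, evaluated at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(pts):
        term = yi
        for j, (xj, _) in enumerate(pts):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def linear_interp(pts: Sequence[Point], x: float) -> float:
    """Piecewise-linear interpolation; outside the data it extends the end segments."""
    pts = sorted(pts)
    if len(pts) == 1:
        return pts[0][1]
    for i in range(1, len(pts)):
        if x <= pts[i][0] or i == len(pts) - 1:
            (x0, y0), (x1, y1) = pts[i - 1], pts[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise AssertionError("unreachable")


METHODS: Dict[str, Method] = {
    "regression": regression_estimate,
    "lagrange": lagrange_interp,
    "linear": linear_interp,
}


def loo_error(method: Method, pts: List[Point]) -> float:
    """Mean absolute error when each observed point is predicted from all the others."""
    errors = [abs(method(pts[:k] + pts[k + 1:], pts[k][0]) - pts[k][1]) for k in range(len(pts))]
    return sum(errors) / len(errors)


def impute_series(times: Sequence[float], values: Sequence[Optional[float]]) -> Tuple[str, List[float]]:
    """Fill None entries with whichever method has the lowest leave-one-out error."""
    observed = [(float(t), float(v)) for t, v in zip(times, values) if v is not None]
    if len(observed) < 3:
        raise ValueError("need at least three observed values")
    best = min(METHODS, key=lambda name: loo_error(METHODS[name], observed))
    fill = METHODS[best]
    filled = [float(v) if v is not None else fill(observed, float(t)) for t, v in zip(times, values)]
    return best, filled


# Toy series following t**2 with the value at t = 2 missing.
method, filled = impute_series([0, 1, 2, 3, 4, 5], [0.0, 1.0, None, 9.0, 16.0, 25.0])
print(method, filled[2])
```

Lagrange interpolation through many points can oscillate strongly between and beyond them (Runge's phenomenon); scoring every candidate on held-out observed points is one way to keep such a fit from being chosen when a simpler trend describes the data better. On the quadratic toy series only the Lagrange polynomial reproduces the held-out points exactly, so it is the one selected.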
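To make the 70% criterion concrete, the Python sketch below (standard library only) computes the eigenvalues of a correlation matrix with the cyclic Jacobi method and returns how many leading principal components are needed for their cumulative share of variance to reach a target such as 0.70. It is an illustration under assumptions, not the paper's procedure: rows are taken to be countries and columns time-averaged variables, and it stops at the eigenvalues, whereas the paper continues with factor analysis to arrive at 7 factors built from 84 variables. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def correlation_matrix(data: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlations between the columns of data (rows = countries, columns = variables)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1)) for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]


def jacobi_eigenvalues(sym: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in sym]
    p = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle chosen so that the (i, j) entry becomes zero.
                theta = (a[j][j] - a[i][i]) / (2.0 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(p):  # columns: A <- A P
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # rows: A <- P^T A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def components_for_share(data: Sequence[Sequence[float]], target: float = 0.70) -> Tuple[int, float]:
    """Smallest number of leading components explaining at least `target` of total variance."""
    eig = jacobi_eigenvalues(correlation_matrix(data))
    total = sum(eig)  # equals the number of variables for a correlation matrix
    cumulative = 0.0
    for k, lam in enumerate(eig, start=1):
        cumulative += lam
        if cumulative / total >= target:
            return k, cumulative / total
    return len(eig), 1.0


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0 * v for v in x1]               # perfectly correlated with x1
x3 = [1.0, -1.0, -1.0, -1.0, -1.0, 1.0]  # uncorrelated with x1
data = [list(row) for row in zip(x1, x2, x3)]
print(jacobi_eigenvalues(correlation_matrix(data)))
print(components_for_share(data, 0.70))
```

In the toy data x1 and x2 carry the same information, so the correlation matrix has eigenvalues 2, 1 and 0: the first component explains two-thirds of the variance, just short of 70%, and a second component is needed to pass the threshold.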
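The clustering step can be illustrated with Ward’s method, one of the four methods the abstract lists. The Python sketch below is a naive agglomerative implementation: at each step it merges the two clusters whose union increases the total within-cluster sum of squares the least, that increase being n_a n_b / (n_a + n_b) times the squared distance between their centroids. It assumes each country is represented by a vector of scores (for example, its scores on the retained factors); the function name and toy points are hypothetical, and the sketch does not reproduce the paper's significance testing or its bootstrap variant.

```python
from typing import List, Sequence


def ward_clusters(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Agglomerative Ward clustering of row indices into k clusters."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]
    centroids = [[float(x) for x in p] for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                d2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                cost = na * nb / (na + nb) * d2  # increase in within-cluster sum of squares
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # Merge b into a; the new centroid is the size-weighted mean of the two.
        centroids[a] = [(na * x + nb * y) / (na + nb) for x, y in zip(centroids[a], centroids[b])]
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b], centroids[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(ward_clusters(points, 3))
```

In the toy data the isolated point at (20, 0) ends up in a cluster by itself, loosely mirroring how the abstract reports three outlier countries each forming their own cluster. The exhaustive pair search makes the whole procedure cubic in the number of countries, which is unproblematic for 48 countries.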