Missing data management and statistical measurement of socio-economic status: application of big data

Abstract: Socio-economic status measurement is an ongoing problem, and researchers have proposed different measures. This work investigates a socio-economic status measurement, derived from natural correlations among variables, that can cluster African countries better and more meaningfully for...


Bibliographic Details
Main Author:	Habtamu Tilaye Wubetie
Format:	Article
Language:	English
Published:	SpringerOpen 2017-12-01
Series:	Journal of Big Data
Subjects:	African countries; socio-economic development; missing data management; Principal component analysis; Factor analysis and cluster analysis
Online Access:	http://link.springer.com/article/10.1186/s40537-017-0099-y
ISSN:	2196-1115
DOI:	10.1186/s40537-017-0099-y
Affiliation:	Department of Statistics, Debre Markos University

Abstract

Socio-economic status measurement is an ongoing problem, and researchers have proposed different measures. This work investigates a socio-economic status measurement, derived from natural correlations among variables, that can cluster African countries by level of status better and more meaningfully. The researcher used yearly socio-economic time series data for 48 African countries from 1993 to 2013, taken from the IMF 2013 data set, for data management (i.e., 2737 variables over 21 years). However, the analysis itself is based on the most recent 14 years of the series. In data management, missing values are imputed using regression estimates, Lagrange interpolation, linear interpolation or linear spline interpolation. At each time level, the method chosen is the one that best fits the trend of the data with minimum error. From principal component and factor analysis of the averaged time series data, 7 principal factors are suggested as socio-economic status measuring components. These factors are contributed by 84 variables and explain 70% of the variation in the data set. Using these components, the clustering methods considered (K-means, average linkage, Ward's method and bootstrap Ward's method) agree on six clusters of countries, which are statistically significant at the 95% level. Three countries identified as outliers each formed a single-country cluster.
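The imputation step in the abstract selects, for each series, the interpolation method that best follows the trend with minimum error. A minimal sketch of that idea follows. It compares only two of the four candidates named above: linear spline and Lagrange interpolation. Regression estimates are omitted. The sketch scores each candidate by mean absolute leave-one-out error on the observed points. That criterion is an assumption, since the record does not state the paper's exact error measure. All function names and the example series are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (year, value)
Method = Callable[[Sequence[Point], float], float]


def linear_spline(points: Sequence[Point], x: float) -> float:
    """Piecewise-linear interpolation; outside the data the end segment is extended."""
    pts = sorted(points)
    if len(pts) == 1:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x <= x1:
            break  # (x0, y0)-(x1, y1) is the segment to use
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def lagrange(points: Sequence[Point], x: float) -> float:
    """Evaluate the single polynomial through all points, in Lagrange form."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        basis = 1.0
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def loo_error(method: Method, points: List[Point]) -> float:
    """Mean absolute error when each observed point is predicted from the others."""
    errors = []
    for k, (x, y) in enumerate(points):
        rest = points[:k] + points[k + 1:]
        errors.append(abs(method(rest, x) - y))
    return sum(errors) / len(errors)


def impute(years: Sequence[int], values: Sequence[Optional[float]]) -> Tuple[str, List[float]]:
    """Fill None entries using the candidate with the lowest leave-one-out error."""
    observed = [(float(t), v) for t, v in zip(years, values) if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    methods: Dict[str, Method] = {"linear_spline": linear_spline, "lagrange": lagrange}
    best = min(methods, key=lambda name: loo_error(methods[name], observed))
    fill = methods[best]
    filled = [v if v is not None else fill(observed, float(t)) for t, v in zip(years, values)]
    return best, filled


# Hypothetical series with gaps in 1995 and 1998; values follow (year - 1993) ** 2.
name, filled = impute([1993, 1994, 1995, 1996, 1997, 1998], [0.0, 1.0, None, 9.0, 16.0, None])
print(name, [round(v, 6) for v in filled])  # prints: lagrange [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
```

The toy series follows a quadratic trend, so the Lagrange polynomial reproduces every held-out point. It is therefore selected over the piecewise-linear fit. On long or noisy series a single global polynomial can oscillate badly. That is one reason to score candidates per series rather than fix one method for all data.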
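The abstract retains 7 principal factors that together explain 70% of the variation. One common way to apply such a threshold is sketched below. The sketch assumes components come from the correlation matrix of standardized variables. It sorts the eigenvalues and keeps the smallest number k whose cumulative share reaches the target:

share(k) = (λ1 + ... + λk) / (λ1 + ... + λp)

For a correlation matrix of p variables, the denominator equals p. The function name and the eigenvalues in the example are invented for illustration. They are not taken from the paper.

```python
from typing import Sequence, Tuple


def components_for_share(eigenvalues: Sequence[float], target: float = 0.70) -> Tuple[int, float]:
    """Return the smallest k whose leading eigenvalues reach `target` of total variance.

    Also returns the cumulative share actually reached with those k components.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k, cumulative / total
    return len(ordered), cumulative / total  # only reached through rounding error


# Made-up eigenvalues of a 6 x 6 correlation matrix (they sum to 6, the number of variables).
k, share = components_for_share([2.4, 1.5, 0.9, 0.6, 0.36, 0.24])
print(k, round(share, 3))  # k == 3, share about 0.8
```

Here the third component lifts the cumulative share from 65% to 80%, so three components are kept. Factor analysis, which the paper also used, adds a rotation and interpretation step that this sketch does not cover.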
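K-means is the simplest of the clustering methods listed in the abstract to sketch. The version below is plain Lloyd's algorithm with caller-supplied starting centroids. The 2-D points stand in for country factor scores and are invented for illustration. Average linkage, Ward's method and the bootstrap Ward's procedure are not shown. In practice K-means is usually run from several starting points, because the result depends on initialization.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], initial: Sequence[Vector],
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids: List[Vector] = [tuple(c) for c in initial]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Invented 2-D scores for six hypothetical countries, forming two obvious groups.
scores = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centres = kmeans(scores, initial=[scores[0], scores[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The algorithm converges after one update here: the first group's centroid moves to (1/3, 1/3) and the second to (31/3, 31/3). No point changes cluster afterwards.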

