Missing data management and statistical measurement of socio-economic status: application of big data
Abstract Socio-economic status measurement is an ongoing problem for which researchers have proposed different measures. This work investigates a socio-economic status measure, derived from the natural correlations among variables, that can cluster African countries more meaningfully for...
Main Author: | Habtamu Tilaye Wubetie |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2017-12-01 |
Series: | Journal of Big Data |
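Subjects: | African countries; socio-economic development; missing data management; Principal component analysis; Factor analysis and cluster analysis |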
Online Access: | http://link.springer.com/article/10.1186/s40537-017-0099-y |
id |
doaj-5eb7ecce6074446b8c5012f8b145c5b8 |
---|---|
record_format |
Article |
spelling |
doaj-5eb7ecce6074446b8c5012f8b145c5b8 2020-11-24T23:24:42Z eng SpringerOpen Journal of Big Data 2196-1115 2017-12-01 41144 10.1186/s40537-017-0099-y Missing data management and statistical measurement of socio-economic status: application of big data Habtamu Tilaye Wubetie (Department of Statistics, Debre Markos University) Abstract Socio-economic status measurement is an ongoing problem for which researchers have proposed different measures. This work investigates a socio-economic status measure, derived from the natural correlations among variables, that can cluster African countries by level of status more meaningfully. The researcher used yearly socio-economic time series data for 48 African countries from 1993 to 2013, taken from the IMF 2013 data set, for data management (i.e., 2737 variables over 21 years); the analysis, however, is based on the most recent 14 years of data. In data management, missing values are imputed using regression estimates, Lagrange interpolation, linear interpolation, or linear spline interpolation, choosing whichever method best fits the trend of the data with minimum error at each time level. From principal component and factor analysis of the time-averaged data, 7 principal factors, contributed by 84 variables and explaining 70% of the variation in the data set, are suggested as components for measuring socio-economic status. The clustering methods considered (K-means, average linkage, Ward’s method, and bootstrap Ward’s method) agree on six clusters of countries that are statistically significant at the 95% level, whereas three countries, each suggested as an outlier, form individual clusters. http://link.springer.com/article/10.1186/s40537-017-0099-y African countries; socio-economic development; missing data management; Principal component analysis; Factor analysis and cluster analysis |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Habtamu Tilaye Wubetie |
spellingShingle |
Habtamu Tilaye Wubetie Missing data management and statistical measurement of socio-economic status: application of big data Journal of Big Data African countries socio-economic development missing data management Principal component analysis Factor analysis and cluster analysis |
author_facet |
Habtamu Tilaye Wubetie |
author_sort |
Habtamu Tilaye Wubetie |
title |
Missing data management and statistical measurement of socio-economic status: application of big data |
title_short |
Missing data management and statistical measurement of socio-economic status: application of big data |
title_full |
Missing data management and statistical measurement of socio-economic status: application of big data |
title_fullStr |
Missing data management and statistical measurement of socio-economic status: application of big data |
title_full_unstemmed |
Missing data management and statistical measurement of socio-economic status: application of big data |
title_sort |
missing data management and statistical measurement of socio-economic status: application of big data |
publisher |
SpringerOpen |
series |
Journal of Big Data |
issn |
2196-1115 |
publishDate |
2017-12-01 |
description |
Abstract Socio-economic status measurement is an ongoing problem for which researchers have proposed different measures. This work investigates a socio-economic status measure, derived from the natural correlations among variables, that can cluster African countries by level of status more meaningfully. The researcher used yearly socio-economic time series data for 48 African countries from 1993 to 2013, taken from the IMF 2013 data set, for data management (i.e., 2737 variables over 21 years); the analysis, however, is based on the most recent 14 years of data. In data management, missing values are imputed using regression estimates, Lagrange interpolation, linear interpolation, or linear spline interpolation, choosing whichever method best fits the trend of the data with minimum error at each time level. From principal component and factor analysis of the time-averaged data, 7 principal factors, contributed by 84 variables and explaining 70% of the variation in the data set, are suggested as components for measuring socio-economic status. The clustering methods considered (K-means, average linkage, Ward’s method, and bootstrap Ward’s method) agree on six clusters of countries that are statistically significant at the 95% level, whereas three countries, each suggested as an outlier, form individual clusters. |
topic |
African countries; socio-economic development; missing data management; Principal component analysis; Factor analysis and cluster analysis |
url |
http://link.springer.com/article/10.1186/s40537-017-0099-y |
work_keys_str_mv |
AT habtamutilayewubetie missingdatamanagementandstatisticalmeasurementofsocioeconomicstatusapplicationofbigdata |
_version_ |
1725559354620379136 |
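
The abstract describes imputing missing values with whichever of several methods fits the trend of the data with minimum error. The record does not say how that error is measured, so the following is only a minimal sketch under stated assumptions: it compares three of the named methods (regression estimate, linear interpolation, Lagrange interpolation) by leave-one-out absolute error on the observed points of a single yearly series. The function names, the `METHODS` table, and the error criterion are hypothetical and are not taken from the article.

```python
import numpy as np


def lagrange_estimate(t_known, y_known, t_new):
    """Evaluate the Lagrange polynomial through all known points at t_new."""
    total = 0.0
    for i, (ti, yi) in enumerate(zip(t_known, y_known)):
        weight = 1.0
        for j, tj in enumerate(t_known):
            if j != i:
                weight *= (t_new - tj) / (ti - tj)
        total += yi * weight
    return total


def linear_estimate(t_known, y_known, t_new):
    """Piecewise-linear interpolation between the neighbouring known points."""
    return float(np.interp(t_new, t_known, y_known))


def regression_estimate(t_known, y_known, t_new):
    """Straight-line least-squares trend evaluated at t_new."""
    slope, intercept = np.polyfit(t_known, y_known, 1)
    return float(slope * t_new + intercept)


# Hypothetical lookup table of candidate imputation methods.
METHODS = {
    "regression": regression_estimate,
    "linear": linear_estimate,
    "lagrange": lagrange_estimate,
}


def leave_one_out_error(method, t, y):
    """Assumed criterion: mean absolute error when each interior observed
    point is hidden in turn and re-estimated from the remaining points."""
    errors = []
    for k in range(1, len(t) - 1):
        t_rest, y_rest = np.delete(t, k), np.delete(y, k)
        errors.append(abs(method(t_rest, y_rest, t[k]) - y[k]))
    return float(np.mean(errors))


def impute_series(years, values):
    """Fill NaNs in one series with the method that has the lowest error."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    t = years - years[0]  # shift the time axis to keep the arithmetic stable
    observed = ~np.isnan(values)
    t_obs, y_obs = t[observed], values[observed]

    errors = {name: leave_one_out_error(fn, t_obs, y_obs)
              for name, fn in METHODS.items()}
    best = min(errors, key=errors.get)

    filled = values.copy()
    for idx in np.flatnonzero(~observed):
        filled[idx] = METHODS[best](t_obs, y_obs, t[idx])
    return filled, best, errors


if __name__ == "__main__":
    # Made-up illustrative series, not data from the article.
    demo_years = range(2000, 2008)
    demo_values = [0, 1, 4, float("nan"), 16, 25, 36, 49]
    filled, best, errors = impute_series(demo_years, demo_values)
    print("selected method:", best)
    print("filled series:", filled)
```

In this sketch one method is selected per series; the abstract's wording ("at each time level") suggests the article may select per time point instead, which would mean moving the comparison inside the loop over missing indices.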