A robust machine learning approach to SDG data segmentation
Abstract In the light of the recent technological advances in computing and data explosion, the complex interactions of the Sustainable Development Goals (SDG) present both a challenge and an opportunity to researchers and decision makers across fields and sectors. The deep and wide socio-economic,...
Main Authors: | Kassim S. Mwitondi, Isaac Munyakazi, Barnabas N. Gatsheni |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2020-11-01 |
Series: | Journal of Big Data |
Subjects: | Big Data; Consilience; Data randomness; Data Science; Development Science Framework; K-Means |
Online Access: | http://link.springer.com/article/10.1186/s40537-020-00373-y |
id |
doaj-f142c964410b4b569b2ee954bdd22c9c |
---|---|
record_format |
Article |
spelling |
doaj-f142c964410b4b569b2ee954bdd22c9c; 2020-11-25T04:09:41Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2020-11-01; 71117; 10.1186/s40537-020-00373-y; A robust machine learning approach to SDG data segmentation; Kassim S. Mwitondi (0), Isaac Munyakazi (1), Barnabas N. Gatsheni (2); (0) College of Business, Technology and Engineering, Sheffield Hallam University; (1) Ministry of Education; (2) Department of Applied Information Systems, University of Johannesburg; http://link.springer.com/article/10.1186/s40537-020-00373-y; Big Data; Consilience; Data randomness; Data Science; Development Science Framework; K-Means |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Kassim S. Mwitondi Isaac Munyakazi Barnabas N. Gatsheni |
author_sort |
Kassim S. Mwitondi |
title |
A robust machine learning approach to SDG data segmentation |
title_sort |
robust machine learning approach to sdg data segmentation |
publisher |
SpringerOpen |
series |
Journal of Big Data |
issn |
2196-1115 |
publishDate |
2020-11-01 |
description |
Abstract In the light of the recent technological advances in computing and data explosion, the complex interactions of the Sustainable Development Goals (SDG) present both a challenge and an opportunity to researchers and decision makers across fields and sectors. The deep and wide socio-economic, cultural and technological variations across the globe entail a unified understanding of the SDG project. The complexity of SDGs interactions and the dynamics through their indicators align naturally to technical and application specifics that require interdisciplinary solutions. We present a consilient approach to expounding triggers of SDG indicators. Illustrated through data segmentation, it is designed to unify our understanding of the complex overlap of the SDGs by utilising data from different sources. The paper treats each SDG as a Big Data source node, with the potential to contribute towards a unified understanding of applications across the SDG spectrum. Data for five SDGs was extracted from the United Nations SDG indicators data repository and used to model spatio-temporal variations in search of robust and consilient scientific solutions. Based on a number of pre-determined assumptions on socio-economic and geo-political variations, the data is subjected to sequential analyses, exploring distributional behaviour, component extraction and clustering. All three methods exhibit pronounced variations across samples, with initial distributional and data segmentation patterns isolating South Africa from the remaining five countries. Data randomness is dealt with via a specially developed algorithm for sampling, measuring and assessing, based on repeated samples of different sizes. Results exhibit consistent variations across samples, based on socio-economic, cultural and geo-political variations entailing a unified understanding, across disciplines and sectors. 
The findings highlight novel paths towards attaining informative patterns for a unified understanding of the triggers of SDG indicators and open new paths to interdisciplinary research. |
topic |
Big Data Consilience Data randomness Data Science Development Science Framework K-Means |
url |
http://link.springer.com/article/10.1186/s40537-020-00373-y |
_version_ |
1724422222818312192 |
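The description field above mentions component extraction, clustering, and repeated samples of different sizes. The following is a minimal illustrative sketch of that general idea, not the authors' algorithm: it assumes principal components via SVD, a plain Lloyd's K-Means, and synthetic data in place of the UN SDG indicators. All function names (`pca_scores`, `kmeans`, `repeated_sample_wss`) and parameter values are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project standardised data onto its leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's algorithm; returns labels and within-cluster sum of squares."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old centre if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    wss = d[np.arange(len(X)), labels].sum()
    return labels, wss


def repeated_sample_wss(X, k, sample_sizes, n_repeats, seed=0):
    """For each sample size, draw repeated random subsamples, run PCA and
    K-Means on each, and summarise the per-observation WSS (mean, sd)."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sample_sizes:
        scores = []
        for _ in range(n_repeats):
            idx = rng.choice(len(X), size=n, replace=False)
            _, wss = kmeans(pca_scores(X[idx]), k, rng)
            scores.append(wss / n)
        results[n] = (float(np.mean(scores)), float(np.std(scores)))
    return results


# Synthetic stand-in data (assumption): two groups of 60 rows, 5 indicators each.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(60, 5)),
    data_rng.normal(4.0, 1.0, size=(60, 5)),
])

summary = repeated_sample_wss(X, k=2, sample_sizes=[30, 60, 120], n_repeats=20)
for n, (mean_wss, sd_wss) in summary.items():
    print(n, round(mean_wss, 3), round(sd_wss, 3))
```

Comparing the mean and spread of the clustering score across sample sizes is one simple way to judge how sensitive a segmentation is to sampling randomness; the published article should be consulted for the procedure the authors actually used.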