Using Big Data-machine learning models for diabetes prediction and flight delays analytics

Abstract. Introduction: Large volumes of data are now generated daily at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, are increasing. The tools and models have to be optimized. In this paper we appl...

Bibliographic Details
Main Authors: Thérence Nibareke, Jalal Laassiri
Format: Article
Language: English
Published: SpringerOpen 2020-09-01
Series: Journal of Big Data
Subjects: Big Data, Hadoop, Spark, HBase, Machine learning, Data analytics
Online Access: http://link.springer.com/article/10.1186/s40537-020-00355-0
id doaj-cced16a888da4ee1ba5257d74e9ffde8
record_format Article
spelling doaj-cced16a888da4ee1ba5257d74e9ffde8
2020-11-25T03:27:53Z
eng
SpringerOpen
Journal of Big Data
2196-1115
2020-09-01
vol. 7, no. 1, pp. 1-18
10.1186/s40537-020-00355-0
Using Big Data-machine learning models for diabetes prediction and flight delays analytics
Thérence Nibareke (Informatics Systems and Optimization Laboratory, Ibn Tofail University)
Jalal Laassiri (Informatics Systems and Optimization Laboratory, Ibn Tofail University)
collection DOAJ
language English
format Article
sources DOAJ
author Thérence Nibareke
Jalal Laassiri
title Using Big Data-machine learning models for diabetes prediction and flight delays analytics
publisher SpringerOpen
series Journal of Big Data
issn 2196-1115
publishDate 2020-09-01
description Abstract. Introduction: Large volumes of data are now generated daily at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, are increasing. The tools and models have to be optimized. In this paper we applied and compared Machine Learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes. Furthermore, we performed analytics on flight delays. The main contribution of this paper is to give an overview of Big Data tools and machine learning models. We highlight some metrics that allow us to choose a more accurate model. We predict diabetes using three machine learning models and then compare their performance. Furthermore, we analyzed flight delays and produced a dashboard that can help managers of airline companies get a 360° view of their flights and make strategic decisions. Case description: We applied three Machine Learning algorithms to predict diabetes and compared their performance to see which model gives the best results. We performed analytics on flight datasets to support decision making and predict flight delays. Discussion and evaluation: The experiment shows that Linear Regression, Naïve Bayes, and Decision Tree give the same accuracy (0.766), but Decision Tree outperforms the two other models with the greatest score (1) and the smallest error (0). For the flight delays analytics, the model could show, for example, the airport that recorded the most flight delays. Conclusions: Several tools and machine learning models for big data analytics have been discussed in this paper. We concluded that, for the same datasets, the model used for prediction has to be chosen carefully. In future work, we will test different models in other fields (climate, banking, insurance).
topic Big Data
Hadoop
Spark
HBase
Machine learning
Data analytics
url http://link.springer.com/article/10.1186/s40537-020-00355-0