Successful Data Science Projects: Lessons Learned from Kaggle Competition

In a data science project, the workflow from data understanding to deployment of an analytical model begins with framing the problem at hand, a task that is typically business-oriented and requires human-to-human interaction. However, the next three steps: data understanding, feature extraction, and mo...


Bibliographic Details
Main Authors: Mohammed Zuhair Al-Taie, Naomie Salim, Adekunle Isiaka Obasa
Format: Article
Language: English
Published: Sulaimani Polytechnic University 2017-08-01
Series: Kurdistan Journal of Applied Research
Subjects: Data Science Pipeline, E-businesses, Kaggle Competition, Model Ensembling, Relevance Prediction
Online Access: http://kjar.spu.edu.iq/index.php/kjar/article/view/83
id doaj-26009821ba2b43e683a440858d11158a
record_format Article
spelling doaj-26009821ba2b43e683a440858d11158a 2020-11-25T00:42:44Z eng
Kurdistan Journal of Applied Research, vol. 2, no. 3, pp. 40-49 (2017-08-01)
doi 10.24017/science.2017.3.18
Mohammed Zuhair Al-Taie: Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
Naomie Salim: Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
Adekunle Isiaka Obasa: Department of Computer Science, College of Science and Technology, Kaduna Polytechnic, Kaduna, Nigeria
collection DOAJ
sources DOAJ
author Mohammed Zuhair Al-Taie
Naomie Salim
Adekunle Isiaka Obasa
title Successful Data Science Projects: Lessons Learned from Kaggle Competition
publisher Sulaimani Polytechnic University
series Kurdistan Journal of Applied Research
issn 2411-7684
2411-7706
publishDate 2017-08-01
description In a data science project, the workflow from data understanding to deployment of an analytical model begins with framing the problem at hand, a task that is typically business-oriented and requires human-to-human interaction. The three steps that follow in the pipeline (data understanding, feature extraction, and model building) are the key to successful data science projects. Failing to fully understand the requirements of each of these three steps can degrade the performance of the proposed system. The current study therefore asks: “What are the requirements of a successful data science project?” To answer this question, we examine the solution that we built to measure the relevance of local search results for small online e-businesses and submitted to the Kaggle data science platform, in order to shed light on why it did not achieve a top position among the competitors. We evaluate the submitted design in light of the three winning submissions. Our results reveal that well-performed data preprocessing, well-defined features, and model ensembling are critical for building successful data science projects. This analysis provides insight into specific aspects of model design that can help others, including Kagglers, avoid possible mistakes when approaching their data science projects.
topic Data Science Pipeline, E-businesses, Kaggle Competition, Model Ensembling, Relevance Prediction.
url http://kjar.spu.edu.iq/index.php/kjar/article/view/83