Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies

The application of machine learning models for prediction and prognosis of disease development has become an integral part of cancer studies aimed at improving the subsequent therapy and management of patients. The application of machine learning models for accurate prediction of survival time in...

Bibliographic Details
Main Authors: Iliyan Mihaylov, Maria Nisheva, Dimitar Vassilev
Format: Article
Language: English
Published: MDPI AG 2019-03-01
Series: Information
Subjects: bioinformatics; machine learning; breast cancer; survival time prognosis; cross-validation
Online Access: http://www.mdpi.com/2078-2489/10/3/93
id doaj-024e3eeb5dad4d7cb1a6caed785c36de
record_format Article
spelling doaj-024e3eeb5dad4d7cb1a6caed785c36de
2020-11-24T23:09:39Z
eng
MDPI AG
Information
2078-2489
2019-03-01
10(3):93
10.3390/info10030093
info10030093
Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies
Iliyan Mihaylov, Maria Nisheva, Dimitar Vassilev
Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, 5 James Bourchier Blvd., Sofia 1164, Bulgaria (all three authors)
http://www.mdpi.com/2078-2489/10/3/93
bioinformatics
machine learning
breast cancer
survival time prognosis
cross-validation
collection DOAJ
language English
format Article
sources DOAJ
author Iliyan Mihaylov
Maria Nisheva
Dimitar Vassilev
title Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2019-03-01
description The application of machine learning models for prediction and prognosis of disease development has become an integral part of cancer studies aimed at improving the subsequent therapy and management of patients. The main objective of the presented study is the accurate prediction of survival time in breast cancer from clinical data using machine learning models. The paper discusses an approach in which the main factor used to predict survival time is an originally developed tumor-integrated clinical feature that combines tumor stage, tumor size, and age at diagnosis. Two datasets from corresponding breast cancer studies are merged through horizontal and vertical data integration, using document-oriented and graph databases that show good performance and no data loss. In addition to data normalization and classification, the applied machine learning methods provide promising results in terms of accuracy of survival time prediction. The analysis of the experiments shows an advantage for linear Support Vector Regression, Lasso regression, Kernel Ridge regression, K-neighborhood regression, and Decision Tree regression; these models achieve the most accurate survival prognosis results. Cross-validation for accuracy demonstrates the best performance of the same models on the studied breast cancer data. In support of the proposed approach, a Python-based workflow has been developed, and plans for its further improvement are discussed in the paper.
topic bioinformatics
machine learning
breast cancer
survival time prognosis
cross-validation
url http://www.mdpi.com/2078-2489/10/3/93
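
The abstract in this record names K-neighborhood regression and cross-validation among the methods applied to a feature built from tumor stage, tumor size, and age at diagnosis. The sketch below illustrates that general idea only. It is not the authors' workflow: the data are synthetic, the scaling and the survival formula are invented for illustration, and every function name (`knn_predict`, `k_fold_mae`, `make_synthetic_patients`) is hypothetical.

```python
import random
import statistics


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to `query`."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )[:k]
    return statistics.fmean(train_y[i] for i in nearest)


def k_fold_mae(xs, ys, folds=5, k=3):
    """Mean absolute error of the k-NN regressor, averaged over the folds."""
    n = len(xs)
    fold_errors = []
    for f in range(folds):
        # Every `folds`-th sample, starting at offset f, is held out for testing.
        test_idx = list(range(f, n, folds))
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        errors = [
            abs(knn_predict(train_x, train_y, xs[i], k) - ys[i])
            for i in test_idx
        ]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


def make_synthetic_patients(n=100, seed=0):
    """Generate made-up (stage, size, age) features and survival times.

    Assumption: the linear relation and noise level below are invented
    and carry no clinical meaning.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        stage = rng.randint(1, 4)          # tumor stage I-IV
        size_mm = rng.uniform(5.0, 60.0)   # tumor size in millimetres
        age = rng.uniform(30.0, 85.0)      # age at diagnosis in years
        # Scale each feature to [0, 1] so distances are comparable.
        features = ((stage - 1) / 3, (size_mm - 5) / 55, (age - 30) / 55)
        survival_months = (
            120 - 60 * features[0] - 30 * features[1] - 10 * features[2]
            + rng.gauss(0, 5)
        )
        xs.append(features)
        ys.append(survival_months)
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_synthetic_patients()
    print(f"5-fold MAE (months): {k_fold_mae(xs, ys):.2f}")
```

The example uses only the standard library so that it stays self-contained. A real study would more likely rely on an established machine learning library and on the actual clinical datasets.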