Predicting breast cancer risk using personal health data and machine learning models.

Among women, breast cancer is a leading cause of death. Breast cancer risk predictions can inform screening and preventative actions. Previous works found that adding inputs to the widely-used Gail model improved its ability to predict breast cancer risk. However, these models used simple statistica...

Bibliographic Details
Main Authors: Gigi F Stark, Gregory R Hart, Bradley J Nartowt, Jun Deng
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0226765
id doaj-998c973930fe4c918e1750b988fc5c6f
record_format Article
spelling doaj-998c973930fe4c918e1750b988fc5c6f | 2021-03-04T11:20:02Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2019-01-01 | 14 | 12 | e0226765 | 10.1371/journal.pone.0226765 | Predicting breast cancer risk using personal health data and machine learning models. | Gigi F Stark | Gregory R Hart | Bradley J Nartowt | Jun Deng | https://doi.org/10.1371/journal.pone.0226765
collection DOAJ
language English
format Article
sources DOAJ
author Gigi F Stark
Gregory R Hart
Bradley J Nartowt
Jun Deng
title Predicting breast cancer risk using personal health data and machine learning models.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2019-01-01
description Among women, breast cancer is a leading cause of death. Breast cancer risk predictions can inform screening and preventative actions. Previous work found that adding inputs to the widely used Gail model improved its ability to predict breast cancer risk. However, these models used simple statistical architectures, and the additional inputs were derived from costly and/or invasive procedures. By contrast, we developed machine learning models that used highly accessible personal health data to predict five-year breast cancer risk. We created machine learning models using only the Gail model inputs and models using both Gail model inputs and additional personal health data relevant to breast cancer risk. For both sets of inputs, six machine learning models were trained and evaluated on the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial data set. The area under the receiver operating characteristic curve metric quantified each model's performance. Since this data set has a small percentage of positive breast cancer cases, we also reported sensitivity, specificity, and precision. We used DeLong tests (p < 0.05) to compare the testing data set performance of each machine learning model to that of the Breast Cancer Risk Prediction Tool (BCRAT), an implementation of the Gail model. None of the machine learning models with only BCRAT inputs were significantly stronger than the BCRAT. However, the logistic regression, linear discriminant analysis, and neural network models with the broader set of inputs were all significantly stronger than the BCRAT. These results suggest that relative to the BCRAT, additional easy-to-obtain personal health inputs can improve five-year breast cancer risk prediction. Our models could be used as non-invasive and cost-effective risk stratification tools to increase early breast cancer detection and prevention, motivating both immediate actions like screening and long-term preventative measures such as hormone replacement therapy and chemoprevention.
url https://doi.org/10.1371/journal.pone.0226765