Predicting breast cancer risk using personal health data and machine learning models.
Among women, breast cancer is a leading cause of death. Breast cancer risk predictions can inform screening and preventative actions. Previous works found that adding inputs to the widely-used Gail model improved its ability to predict breast cancer risk. However, these models used simple statistica...
Main Authors: | Gigi F Stark, Gregory R Hart, Bradley J Nartowt, Jun Deng |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2019-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0226765 |
id |
doaj-998c973930fe4c918e1750b988fc5c6f |
---|---|
record_format |
Article |
spelling |
doaj-998c973930fe4c918e1750b988fc5c6f; 2021-03-04T11:20:02Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2019-01-01; 14; 12; e0226765; 10.1371/journal.pone.0226765; Predicting breast cancer risk using personal health data and machine learning models.; Gigi F Stark; Gregory R Hart; Bradley J Nartowt; Jun Deng; https://doi.org/10.1371/journal.pone.0226765 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Gigi F Stark, Gregory R Hart, Bradley J Nartowt, Jun Deng |
title |
Predicting breast cancer risk using personal health data and machine learning models. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2019-01-01 |
description |
Among women, breast cancer is a leading cause of death. Breast cancer risk predictions can inform screening and preventative actions. Previous works found that adding inputs to the widely-used Gail model improved its ability to predict breast cancer risk. However, these models used simple statistical architectures and the additional inputs were derived from costly and/or invasive procedures. By contrast, we developed machine learning models that used highly accessible personal health data to predict five-year breast cancer risk. We created machine learning models using only the Gail model inputs and models using both Gail model inputs and additional personal health data relevant to breast cancer risk. For both sets of inputs, six machine learning models were trained and evaluated on the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial data set. The area under the receiver operating characteristic curve metric quantified each model's performance. Since this data set has a small percentage of positive breast cancer cases, we also reported sensitivity, specificity, and precision. We used DeLong tests (p < 0.05) to compare the testing data set performance of each machine learning model to that of the Breast Cancer Risk Prediction Tool (BCRAT), an implementation of the Gail model. None of the machine learning models with only BCRAT inputs were significantly stronger than the BCRAT. However, the logistic regression, linear discriminant analysis, and neural network models with the broader set of inputs were all significantly stronger than the BCRAT. These results suggest that relative to the BCRAT, additional easy-to-obtain personal health inputs can improve five-year breast cancer risk prediction. Our models could be used as non-invasive and cost-effective risk stratification tools to increase early breast cancer detection and prevention, motivating both immediate actions like screening and long-term preventative measures such as hormone replacement therapy and chemoprevention. |
url |
https://doi.org/10.1371/journal.pone.0226765 |
_version_ |
1714803943444316160 |
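
The description above names four evaluation metrics: area under the ROC curve, sensitivity, specificity, and precision. The following is a minimal sketch of how those quantities can be computed from predicted risk scores. It is not the authors' code: the function names `roc_auc` and `threshold_metrics`, the toy labels and scores, and the 0.05 decision threshold are all hypothetical and chosen only for illustration. The AUC here is computed by direct pairwise comparison, which is simple but quadratic in the number of cases.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and precision when every score at or
    above `threshold` is treated as a positive prediction."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
    }


# Hypothetical toy data: 2 positive cases among 10 (imbalanced on purpose).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.05, 0.09]

auc = roc_auc(labels, scores)
metrics = threshold_metrics(labels, scores, threshold=0.05)
print(f"AUC = {auc:.3f}")
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

On this toy data the classifier catches both positive cases yet has low precision, which illustrates why threshold-based metrics are worth reporting alongside AUC when positive cases are rare.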