Parameter estimation for multistage clonal expansion models from cancer incidence data: A practical identifiability analysis.

Bibliographic Details
Main Authors: Andrew F Brouwer, Rafael Meza, Marisa C Eisenberg
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-03-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC5367820?pdf=render
Description
Summary: Many cancers are understood to be the product of multiple somatic mutations or other rate-limiting events. Multistage clonal expansion (MSCE) models are a class of continuous-time Markov chain models that capture the multi-hit initiation-promotion-malignant-conversion hypothesis of carcinogenesis. These models have been used broadly to investigate the epidemiology of many cancers, assess the impact of carcinogen exposures on cancer risk, and evaluate the potential impact of cancer prevention and control strategies on cancer rates. Structural identifiability (the analysis of the maximum parametric information available for a model given perfectly measured data) of certain MSCE models has been previously investigated. However, structural identifiability is a theoretical property and does not address the limitations of real data. In this study, we use pancreatic cancer as a case study to examine the practical identifiability of the two-, three-, and four-stage clonal expansion models given age-specific cancer incidence data using a numerical profile-likelihood approach. We demonstrate that, in the case of the three- and four-stage models, several parameters that are structurally identifiable in theory are unidentifiable in practice. This result means that key parameters such as the intermediate cell mutation rates are not individually identifiable from the data and that estimation of those parameters, even if structurally identifiable, will not be stable. We also show that products of these practically unidentifiable parameters are practically identifiable, and, based on this, we propose new reparameterizations of the model hazards that resolve the parameter estimation problems. Our results highlight the importance of identifiability to the interpretation of model parameter estimates.
ISSN: 1553-734X, 1553-7358
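
Illustrative note: the summary refers to a numerical profile-likelihood approach and to parameter products that are identifiable when the individual parameters are not. The Python sketch below shows that idea on a deliberately simplified toy problem and is not the authors' code or model. It assumes a hypothetical hazard h(t) = a*b*t (not an MSCE hazard), made-up incidence counts, and a Poisson likelihood. The names `log_likelihood`, `maximize`, `profile_a`, and `profile_product` are introduced here for illustration only. A profile that stays flat as a parameter varies suggests that the parameter is not identifiable; a profile that drops by more than about 1.92 log-likelihood units (half the 95% chi-square quantile with one degree of freedom) on both sides of its maximum suggests a finite confidence interval.

```python
import math

# Hypothetical toy data (not from the paper): age, person-years, observed cases.
AGES = [40.0, 50.0, 60.0, 70.0, 80.0]
PERSON_YEARS = [1e5] * 5
CASES = [8, 10, 12, 14, 16]


def log_likelihood(a, b):
    """Poisson log-likelihood (up to a constant) for the toy hazard h(t) = a*b*t."""
    total = 0.0
    for t, n, y in zip(AGES, PERSON_YEARS, CASES):
        mu = n * a * b * t  # expected number of cases in this age group
        total += y * math.log(mu) - mu
    return total


def maximize(f, lo, hi, iterations=100):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - ratio * (hi - lo)
    x2 = lo + ratio * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(iterations):
        if f1 < f2:
            # The maximum lies in [x1, hi]; reuse x2 as the new lower probe.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = f(x2)
        else:
            # The maximum lies in [lo, x2]; reuse x1 as the new upper probe.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = f(x1)
    x = 0.5 * (lo + hi)
    return x, f(x)


def profile_a(a):
    """Profile log-likelihood of a: fix a, maximize over the nuisance parameter b."""
    # Search over u = log(b); the log-likelihood is concave in u.
    _, best = maximize(lambda u: log_likelihood(a, math.exp(u)), math.log(1e-8), 0.0)
    return best


def profile_product(p):
    """Profile log-likelihood of p = a*b.

    After reparameterizing with b = p/a, the toy likelihood depends on p only,
    so any value of a (here 1.0) gives the same result.
    """
    return log_likelihood(1.0, p)


# Profile of a alone over two orders of magnitude: expected to be flat.
a_grid = [10.0 ** e for e in (-4.0, -3.5, -3.0, -2.5, -2.0)]
profile_over_a = [profile_a(a) for a in a_grid]
flat_range = max(profile_over_a) - min(profile_over_a)

# Profile of the product: closed-form maximum for this toy model, then a
# comparison against half and double that value.
p_hat = sum(CASES) / sum(n * t for n, t in zip(PERSON_YEARS, AGES))
profile_over_p = [profile_product(p) for p in (0.5 * p_hat, p_hat, 2.0 * p_hat)]
drop_low = profile_over_p[1] - profile_over_p[0]
drop_high = profile_over_p[1] - profile_over_p[2]

print("range of profile over a:", flat_range)
print("drop at p_hat/2 and 2*p_hat:", drop_low, drop_high)
```

In this toy setting the profile of `a` is flat to within numerical error, while the profile of the product `a*b` falls well past the 1.92 threshold on both sides. This is the same qualitative pattern the summary reports for the three- and four-stage models, although the real MSCE hazards and data are considerably more involved.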