Evaluating the use of neighborhoods for query dependent estimation of survival prognosis for oropharyngeal cancer patients

Bibliographic Details
Main Author: Shay, Keegan P.
Other Authors: Canahuate, Guadalupe
Format: Others
Language: English
Published: University of Iowa 2019
Subjects:
QED
Online Access: https://ir.uiowa.edu/etd/6854
https://ir.uiowa.edu/cgi/viewcontent.cgi?article=8388&context=etd
Description
Summary: Oropharyngeal cancer diagnoses make up three percent of all cancer diagnoses in the United States each year. The incidence of HPV-associated oropharyngeal cancer has recently increased, so prior survival estimation techniques must be updated to account for this demographic shift. Clinicians depend on accurate survival prognosis estimates to create treatment plans that prolong patient life while minimizing adverse treatment side effects. Additionally, recent advances in data analysis have resulted in richer and more complex data, motivating the use of more advanced analysis techniques. Sophisticated survival analysis techniques can leverage complex data from a variety of sources, resulting in improved personalized prediction. Current survival prognosis prediction methods often rely on summary statistics and on underlying assumptions regarding distribution or overall risk. We propose a k-nearest-neighbor-influenced approach for predicting oropharyngeal cancer survival outcomes. We evaluate our approach for overall survival (OS), recurrence-free survival (RFS), and recurrence-free overall survival (RF+OS). We define two distance functions, not subject to the curse of dimensionality, that reconcile heterogeneous features with patient-to-patient similarity scores to produce a meaningful overall measure of distance. Using these distance functions, we obtain the k nearest neighbors of each patient, forming neighborhoods of similar patients. We leverage these neighborhoods for prediction in two novel ensemble methods. The first ensemble method uses each patient's nearest neighbors to combine globally trained predictions, weighted by their accuracies within a selected neighborhood. The second ensemble method combines Kaplan-Meier predictions from a variety of neighborhoods. Both proposed methods outperform an ensemble of standard global survival predictive models, with statistically significant calibration.
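
The neighborhood-based Kaplan-Meier idea described in the summary can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the summary does not specify the two distance functions or how predictions are combined, so the Gower-style `mixed_distance`, the use of several values of k as the "variety of neighborhoods", the plain averaging in `ensemble_survival_at`, and all function names are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def mixed_distance(a: Sequence, b: Sequence,
                   numeric_ranges: Sequence[Optional[float]]) -> float:
    """Gower-style distance (an assumed stand-in, not the thesis's function).

    numeric_ranges[i] is the value range of feature i if it is numeric,
    or None if the feature is categorical.
    """
    total = 0.0
    for x, y, rng in zip(a, b, numeric_ranges):
        if rng is None:
            total += 0.0 if x == y else 1.0      # categorical mismatch
        else:
            total += abs(x - y) / rng if rng > 0 else 0.0
    return total / len(numeric_ranges)


def k_nearest(query, patients, numeric_ranges, k: int) -> List[int]:
    """Indices of the k patients closest to the query patient."""
    order = sorted(
        range(len(patients)),
        key=lambda i: mixed_distance(query, patients[i], numeric_ranges),
    )
    return order[:k]


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier curve as (time, survival) pairs at each event time.

    events[i] is True if the event was observed, False if censored.
    """
    curve = []
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - observed / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the step-function survival estimate at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


def ensemble_survival_at(query, patients, times, events,
                         numeric_ranges, ks: Sequence[int], t: float) -> float:
    """Average neighborhood Kaplan-Meier estimates over several sizes k.

    Plain averaging is an assumption; the source does not state the rule.
    """
    estimates = []
    for k in ks:
        idx = k_nearest(query, patients, numeric_ranges, k)
        curve = kaplan_meier([times[i] for i in idx],
                             [events[i] for i in idx])
        estimates.append(survival_at(curve, t))
    return sum(estimates) / len(estimates)
```

Given a cohort of feature tuples with follow-up times and event indicators, `ensemble_survival_at(query, patients, times, events, numeric_ranges, ks=[3, 5], t=24)` would return a query-dependent survival estimate at 24 time units. A production analysis would more likely rely on an established survival library than on this hand-rolled estimator.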