Summary: | Analytic measurement of serum tumour markers is one of the commonly used methods for cancer risk management in certain areas of the world (e.g. Taiwan). Recently, cancer screening based on multiple serum tumour markers has been frequently discussed. However, the risk-benefit outcomes appear to be unfavourable for patients because of the low sensitivity and specificity. In this study, cancer screening models based on multiple serum tumour markers were designed using machine learning methods, namely support vector machine (SVM), k-nearest neighbour (KNN), and logistic regression, to improve the screening performance for multiple cancers in a large asymptomatic population. AFP, CEA, CA19-9, CYFRA21-1, and SCC were determined for 20 696 eligible individuals. PSA was measured in men, and CA15-3 and CA125 in women. A variable selection process was applied to select robust variables from these serum tumour markers to design cancer detection models. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), area under the curve (AUC), and Youden index of the models based on single tumour markers, the combined test, and machine learning methods were compared. Moreover, relative risk reduction, absolute risk reduction (ARR), and absolute risk increase (ARI) were evaluated. To design cancer detection models using machine learning methods, CYFRA21-1 and SCC were selected for women, and all tumour markers were selected for men. SVM and KNN models significantly outperformed the single tumour markers and the combined test for men. All 3 studied machine learning methods outperformed single tumour markers and the combined test for women. For both men and women, the ARRs were between 0.003 and 0.008, and the ARIs were between 0.119 and 0.306. Machine learning methods outperformed the combined test in analysing multiple tumour markers for cancer detection.
However, cancer screening based solely on the application of multiple tumour markers remains unfavourable because of the inadequate PPV, ARR, and ARI, even when machine learning methods were incorporated into the analysis.
|
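The screening metrics named in the summary (sensitivity, specificity, PPV, NPV, and the Youden index) can all be derived from the four cells of a confusion matrix. The following is a minimal sketch, not the authors' code: the function name `screening_metrics`, the `ScreeningMetrics` container, and the example counts are hypothetical and chosen only for illustration. ARR and ARI are left out because the summary does not state how they were defined.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    youden_index: float


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> ScreeningMetrics:
    """Compute standard screening metrics from confusion-matrix counts.

    tp/fn: people with cancer who tested positive/negative.
    fp/tn: people without cancer who tested positive/negative.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # probability of cancer given a positive test
    npv = tn / (tn + fn)           # probability of no cancer given a negative test
    youden_index = sensitivity + specificity - 1
    return ScreeningMetrics(sensitivity, specificity, ppv, npv, youden_index)


# Invented counts (NOT study data): 100 cancers among 10 000 people screened,
# with a test that has 80% sensitivity and 90% specificity.
example = screening_metrics(tp=80, fp=990, fn=20, tn=8910)
```

With these invented counts the PPV is 80 / 1070, roughly 0.075, even though sensitivity and specificity look respectable. This is the general reason a low-prevalence asymptomatic population tends to yield an inadequate PPV.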