Summary: | Predicting the survival of patients with second primary lung cancer (SPLC) is of both theoretical and practical importance: survivability prediction can inform clinical decisions and support personalized medicine. The Surveillance, Epidemiology, and End Results (SEER) program provides large data sets suitable for analysis with machine learning methods. SPLC cases are identified and labeled in the SEER database, and the resulting data set is then preprocessed with improved eigenvector centrality-based feature selection (IECFS). IECFS ranks features using interclass and intraclass dispersions together with its ranking criteria. The best performance is obtained by adjusting the value of the <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula> parameter and the number of features selected. The experiments are run with the data divided into five folds. For five-year survivability, the method yields a prediction accuracy of 90.998%, higher than both the original classification accuracy (89.16%) and the accuracy of the other state-of-the-art feature selection methods compared. For three-year survivability, the proposed method yields a prediction accuracy of 83.16%, slightly outperforming all of the compared methods. These results indicate that the method is effective and generalizable.
|
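To make the idea of eigenvector centrality-based feature ranking concrete, the following is a minimal sketch and not the authors' IECFS implementation. It assumes a Fisher-style score (interclass over intraclass dispersion) per feature, mixes it with a normalized spread term through a parameter `alpha`, and ranks features by the leading eigenvector of the resulting feature graph. The function names, the form of the adjacency matrix, and the toy data are all hypothetical choices made for illustration.

```python
import math


def fisher_scores(X, y):
    """Per-feature ratio of interclass to intraclass dispersion (assumed criterion)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall_mean = sum(col) / len(col)
        between, within = 0.0, 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - overall_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    return scores


def feature_stds(X):
    """Population standard deviation of each feature column."""
    stds = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / len(col)
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)))
    return stds


def eigenvector_centrality(A, iters=500, tol=1e-12):
    """Leading eigenvector of a non-negative symmetric matrix via power iteration."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def rank_features(X, y, alpha=0.5):
    """Return (feature indices sorted by centrality, centrality values)."""
    f = fisher_scores(X, y)
    s = feature_stds(X)
    f_max, s_max = max(f) or 1.0, max(s) or 1.0
    f = [v / f_max for v in f]  # normalize both terms to [0, 1]
    s = [v / s_max for v in s]
    n = len(f)
    # Assumed edge weight: alpha trades off class separation against spread.
    A = [[alpha * max(f[i], f[j]) + (1 - alpha) * max(s[i], s[j])
          for j in range(n)] for i in range(n)]
    centrality = eigenvector_centrality(A)
    ranking = sorted(range(n), key=lambda i: -centrality[i])
    return ranking, centrality


# Toy data (hypothetical): feature 0 separates the classes, features 1-2 do not.
X_demo = [[0.0, 1.0, 5.0], [0.2, 2.0, 4.0], [0.1, 3.0, 6.0],
          [10.0, 1.0, 5.0], [10.2, 2.0, 4.0], [10.1, 3.0, 6.0]]
y_demo = [0, 0, 0, 1, 1, 1]
demo_ranking, demo_centrality = rank_features(X_demo, y_demo, alpha=0.5)
```

In a workflow like the one summarized above, one would presumably keep the top-ranked features, train a classifier on them, and compare accuracy across folds for several values of `alpha` and several feature counts; the specific search procedure is not given in the summary and is not reproduced here.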