Summary: | Master's thesis === Fu Jen Catholic University === Graduate Institute of Management === 94 === Lung cancer is a common and serious cancer that consistently ranks among the leading causes of death. Survival analysis for lung cancer is an important and challenging task because it can provide useful information for better diagnosis and treatment. Conventional survival analysis models, such as Kaplan-Meier analysis and Cox proportional hazards models, are often criticized for their strict model assumptions and limited classification accuracy. To avoid these drawbacks of the traditional methods, this study aims to build lung cancer survival classification models using commonly discussed data mining techniques.
When a model is built from a large dataset, analysis based on sampling or on reducing the data dimensions may distort the results and lead to biased conclusions. In this study, the SEER lung cancer dataset, with more than 300,000 records, is used to build the lung cancer survivability classification models, and 10-fold cross-validation is applied to reduce possible bias. Five commonly adopted classification techniques, namely linear discriminant analysis, stepwise logistic regression, backpropagation neural networks (BPN), multivariate adaptive regression splines (MARS), and support vector machines (SVM), are used to build the
survival prediction models. The analytic results show that all five models achieve similar classification results, with correct classification rates close to 90 percent. In addition, integrated models that use the significant independent variables identified by the MARS model as the input variables of the BPN and SVM models are also discussed. The proposed two-stage modeling procedure substantially reduces model training time while retaining similar classification capability. The results strongly suggest that survivability classification models should be built to support better diagnosis, and that all seven models provide similar classification results.
|
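The 10-fold cross-validation procedure mentioned in the summary can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names (`k_fold_indices`, `cross_validated_rate`, `fit_threshold`, `predict_threshold`) are hypothetical, the data are synthetic rather than SEER records, and the single-threshold classifier is only a stand-in, not one of the five techniques used in the study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle range(n_samples) and split it into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rate(X, y, fit, predict, k=10, seed=0):
    """Average correct classification rate over k held-out folds."""
    rates = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Train on the other k-1 folds, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        rates.append(correct / len(held_out))
    return sum(rates) / len(rates)


def fit_threshold(X_train, y_train):
    """Stand-in classifier (assumption): midpoint between the class means."""
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    """Predict class 1 when the single feature exceeds the threshold."""
    return 1 if x[0] > threshold else 0


# Synthetic two-class data with one feature (not SEER data).
rng = random.Random(42)
X = [[rng.gauss(0, 1)] for _ in range(100)] + [[rng.gauss(4, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

rate = cross_validated_rate(X, y, fit_threshold, predict_threshold)
print(f"mean correct classification rate over 10 folds: {rate:.3f}")
```

Each record is held out exactly once, so the averaged rate reflects performance on data the model did not see during training. Any of the classifiers named in the summary could in principle be substituted through the `fit` and `predict` arguments.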