Summary: | In the field of bioinformatics, feature selection in classification of cancer is a primary area of research and utilized to select the most informative genes from thousands of genes in the microarray. Microarray data is generally noisy, is highly redundant, and has an extremely asymmetric dimensionality, as the majority of the genes present here are believed to be uninformative. The paper adopts a methodology of classification of high dimensional lung cancer microarray data utilizing feature selection and optimization techniques. The methodology is divided into two stages; firstly, the ranking of each gene is done based on the standard gene selection techniques like Information Gain, Relief–F test, Chi-square statistic, and T-statistic test. As a result, the gathering of top scored genes is assimilated, and a new feature subset is obtained. In the second stage, the new feature subset is further optimized by using swarm intelligence techniques like Grasshopper Optimization (GO), Moth Flame Optimization (MFO), Bacterial Foraging Optimization (BFO), Krill Herd Optimization (KHO), and Artificial Fish Swarm Optimization (AFSO), and finally, an optimized subset is utilized. The selected genes are used for classification, and the classifiers used here are Naïve Bayesian Classifier (NBC), Decision Trees (DT), Support Vector Machines (SVM), and K-Nearest Neighbour (KNN). The best results are shown when Relief-F test is computed with AFSO and classified with Decision Trees classifier for hundred genes, and the highest classification accuracy of 99.10% is obtained.
|