Robust Model Selection for Classification of Microarrays

Recently, microarray-based cancer diagnosis systems have been increasingly investigated. However, cost reduction and reliability assurance of such diagnosis systems remain open problems in real clinical settings. To reduce the cost, we need a supervised classifier involving the smallest number o...

Bibliographic Details
Main Authors: Ikumi Suzuki, Takashi Takenouchi, Miki Ohira, Shigeyuki Oba, Shin Ishii
Format: Article
Language: English
Published: SAGE Publishing 2009-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S2704
id doaj-b69bc498df7b495791c5c531c4e18e4a
record_format Article
spelling doaj-b69bc498df7b495791c5c531c4e18e4a 2020-11-25T04:01:00Z eng SAGE Publishing Cancer Informatics 1176-9351 2009-01-01 7 10.4137/CIN.S2704 Robust Model Selection for Classification of Microarrays
Ikumi Suzuki [0], Takashi Takenouchi [1], Miki Ohira [2], Shigeyuki Oba [3], Shin Ishii [4]
[0] Graduate School of Information Science, Nara Institute of Science and Technology, Takayama, Ikoma, Nara 630-0192, Japan.
[1] Graduate School of Information Science, Nara Institute of Science and Technology, Takayama, Ikoma, Nara 630-0192, Japan.
[2] Division of Biochemistry, Chiba Cancer Center Research Institute, Chiba 260-8717, Japan.
[3] PRESTO, Japan Science and Technology Corporation.
[4] Graduate School of Informatics, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.
collection DOAJ
language English
format Article
sources DOAJ
author Ikumi Suzuki
Takashi Takenouchi
Miki Ohira
Shigeyuki Oba
Shin Ishii
title Robust Model Selection for Classification of Microarrays
publisher SAGE Publishing
series Cancer Informatics
issn 1176-9351
publishDate 2009-01-01
description Recently, microarray-based cancer diagnosis systems have been increasingly investigated. However, cost reduction and reliability assurance of such diagnosis systems remain open problems in real clinical settings. To reduce the cost, we need a supervised classifier involving the smallest number of genes, as long as the classifier is sufficiently reliable. To achieve a reliable classifier, we should assess candidate classifiers and select the best one. In selecting the best classifier, however, the assessment criterion inevitably has large variance because of the limited number of samples and non-negligible observation noise. Therefore, even if a classifier with a very small number of genes exhibits the smallest leave-one-out cross-validation (LOO) error rate, it is not necessarily reliable, because classifiers based on a small number of genes tend to show large variance. We propose a robust model selection criterion, the min-max criterion, based on a bootstrap resampling simulation that assesses the variance of the estimated classification error rates. We applied our assessment framework to four published real gene expression datasets and one synthetic dataset. We found that a state-of-the-art procedure, weighted voting classifiers with the LOO criterion, had a non-negligible risk of selecting extremely poor classifiers, whereas the new min-max criterion could eliminate that risk. These findings suggest that our criterion offers a safer procedure for designing a practical cancer diagnosis system.
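The abstract does not spell out how the min-max criterion is computed, so the following is only a rough sketch of one plausible reading: for each candidate classifier, estimate a spread of error rates by bootstrap resampling, then choose the candidate whose worst resampled error rate is smallest. The function names, the use of per-sample 0/1 mistake indicators, and the keying of candidates by gene count are all assumptions made for illustration, not the authors' procedure.

```python
import random


def bootstrap_error_rates(mistakes, n_boot=200, seed=0):
    """Resample per-sample 0/1 mistake indicators with replacement and
    return the error rate of each bootstrap replicate.
    (Hypothetical helper; the paper's resampling scheme may differ.)"""
    rng = random.Random(seed)
    n = len(mistakes)
    return [
        sum(rng.choice(mistakes) for _ in range(n)) / n
        for _ in range(n_boot)
    ]


def select_by_point_estimate(mistakes_by_model):
    """Pick the model with the lowest observed error rate.
    Ties are broken in favour of the smaller key (fewer genes)."""
    return min(
        mistakes_by_model,
        key=lambda k: (sum(mistakes_by_model[k]) / len(mistakes_by_model[k]), k),
    )


def select_by_min_max(boot_errors_by_model):
    """Pick the model whose worst bootstrap error rate is smallest.
    Ties are broken in favour of the smaller key (fewer genes)."""
    return min(
        boot_errors_by_model,
        key=lambda k: (max(boot_errors_by_model[k]), k),
    )


if __name__ == "__main__":
    # Keys are hypothetical gene counts; values are made-up mistake indicators.
    mistakes = {
        2: [0, 0, 0, 1, 0, 0, 0, 0],
        10: [0, 1, 0, 0, 0, 0, 1, 0],
    }
    boot = {k: bootstrap_error_rates(v) for k, v in mistakes.items()}
    print("point-estimate choice:", select_by_point_estimate(mistakes))
    print("min-max choice:", select_by_min_max(boot))
```

The point of the contrast is that a point estimate such as the LOO error rate can favour a small model that merely looks good on the available samples, while a worst-case view over resampled estimates penalizes candidates whose error rate varies widely.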