Robust Model Selection for Classification of Microarrays
Recently, microarray-based cancer diagnosis systems have been increasingly investigated. However, cost reduction and reliability assurance of such diagnosis systems remain open problems in real clinical settings. To reduce the cost, we need a supervised classifier involving the smallest number o...
Main Authors: | Ikumi Suzuki, Takashi Takenouchi, Miki Ohira, Shigeyuki Oba, Shin Ishii |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2009-01-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.4137/CIN.S2704 |
id |
doaj-b69bc498df7b495791c5c531c4e18e4a |
---|---|
record_format |
Article |
spelling |
doaj-b69bc498df7b495791c5c531c4e18e4a; 2020-11-25T04:01:00Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2009-01-01; 7; 10.4137/CIN.S2704; Robust Model Selection for Classification of Microarrays; Ikumi Suzuki (0), Takashi Takenouchi (1), Miki Ohira (2), Shigeyuki Oba (3), Shin Ishii (4); (0) Graduate School of Information Science, Nara Institute of Science and Technology, Takayama, Ikoma, Nara 630-0192, Japan; (1) Graduate School of Information Science, Nara Institute of Science and Technology, Takayama, Ikoma, Nara 630-0192, Japan; (2) Division of Biochemistry, Chiba Cancer Center Research Institute, Chiba 260-8717, Japan; (3) PRESTO, Japan Science and Technology Corporation; (4) Graduate School of Informatics, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan; Recently, microarray-based cancer diagnosis systems have been increasingly investigated. However, cost reduction and reliability assurance of such diagnosis systems remain open problems in real clinical settings. To reduce the cost, we need a supervised classifier involving the smallest number of genes, as long as the classifier is sufficiently reliable. To achieve a reliable classifier, we should assess candidate classifiers and select the best one. In selecting the best classifier, however, the assessment criterion necessarily has large variance because of the limited number of samples and non-negligible observation noise. Therefore, even if a classifier with a very small number of genes exhibits the smallest leave-one-out cross-validation (LOO) error rate, it is not necessarily reliable, because classifiers based on a small number of genes tend to show large variance. We propose a robust model selection criterion, the min-max criterion, based on a bootstrap resampling simulation to assess the variance of the estimated classification error rates. We applied our assessment framework to four published real gene expression datasets and one synthetic dataset. We found that a state-of-the-art procedure, weighted voting classifiers with the LOO criterion, had a non-negligible risk of selecting extremely poor classifiers, whereas the new min-max criterion could eliminate that risk. These findings suggest that our criterion offers a safer procedure for designing a practical cancer diagnosis system.; https://doi.org/10.4137/CIN.S2704 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ikumi Suzuki Takashi Takenouchi Miki Ohira Shigeyuki Oba Shin Ishii |
spellingShingle |
Ikumi Suzuki Takashi Takenouchi Miki Ohira Shigeyuki Oba Shin Ishii Robust Model Selection for Classification of Microarrays Cancer Informatics |
author_facet |
Ikumi Suzuki Takashi Takenouchi Miki Ohira Shigeyuki Oba Shin Ishii |
author_sort |
Ikumi Suzuki |
title |
Robust Model Selection for Classification of Microarrays |
title_short |
Robust Model Selection for Classification of Microarrays |
title_full |
Robust Model Selection for Classification of Microarrays |
title_fullStr |
Robust Model Selection for Classification of Microarrays |
title_full_unstemmed |
Robust Model Selection for Classification of Microarrays |
title_sort |
robust model selection for classification of microarrays |
publisher |
SAGE Publishing |
series |
Cancer Informatics |
issn |
1176-9351 |
publishDate |
2009-01-01 |
description |
Recently, microarray-based cancer diagnosis systems have been increasingly investigated. However, cost reduction and reliability assurance of such diagnosis systems remain open problems in real clinical settings. To reduce the cost, we need a supervised classifier involving the smallest number of genes, as long as the classifier is sufficiently reliable. To achieve a reliable classifier, we should assess candidate classifiers and select the best one. In selecting the best classifier, however, the assessment criterion necessarily has large variance because of the limited number of samples and non-negligible observation noise. Therefore, even if a classifier with a very small number of genes exhibits the smallest leave-one-out cross-validation (LOO) error rate, it is not necessarily reliable, because classifiers based on a small number of genes tend to show large variance. We propose a robust model selection criterion, the min-max criterion, based on a bootstrap resampling simulation to assess the variance of the estimated classification error rates. We applied our assessment framework to four published real gene expression datasets and one synthetic dataset. We found that a state-of-the-art procedure, weighted voting classifiers with the LOO criterion, had a non-negligible risk of selecting extremely poor classifiers, whereas the new min-max criterion could eliminate that risk. These findings suggest that our criterion offers a safer procedure for designing a practical cancer diagnosis system. |
url |
https://doi.org/10.4137/CIN.S2704 |
work_keys_str_mv |
AT ikumisuzuki robustmodelselectionforclassificationofmicroarrays AT takashitakenouchi robustmodelselectionforclassificationofmicroarrays AT mikiohira robustmodelselectionforclassificationofmicroarrays AT shigeyukioba robustmodelselectionforclassificationofmicroarrays AT shinishii robustmodelselectionforclassificationofmicroarrays |
_version_ |
1724448064391872512 |
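
Illustrative note: the description in this record names a min-max criterion built on bootstrap resampling but does not define it. The sketch below shows one plausible reading only, under stated assumptions: each candidate classifier is scored by its out-of-bag error rate on many bootstrap resamples, and the candidate whose worst (maximum) error rate is smallest is selected. The nearest-centroid classifier stands in for the weighted voting classifier mentioned in the description, and every name here (make_centroid_classifier, bootstrap_error_rates, min_max_select) is hypothetical, not taken from the article.

```python
import random


def make_centroid_classifier(genes):
    """Return a nearest-centroid classifier restricted to the given gene indices.

    Assumption: this simple classifier is a stand-in, not the article's method.
    """
    def classify(train_X, train_y, x):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(train_y)):
            rows = [r for r, t in zip(train_X, train_y) if t == label]
            dist = 0.0
            for g in genes:
                centroid = sum(r[g] for r in rows) / len(rows)
                dist += (x[g] - centroid) ** 2
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label
    return classify


def bootstrap_error_rates(X, y, classify, n_boot=200, seed=0):
    """Out-of-bag error rate of `classify` on each bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    rates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        if not oob:                                  # nothing left to test on
            continue
        train_X = [X[i] for i in idx]
        train_y = [y[i] for i in idx]
        wrong = sum(classify(train_X, train_y, X[i]) != y[i] for i in oob)
        rates.append(wrong / len(oob))
    return rates


def min_max_select(rates_by_model):
    """Assumed min-max rule: pick the model whose worst error rate is smallest."""
    return min(rates_by_model, key=lambda m: max(rates_by_model[m]))


if __name__ == "__main__":
    # Synthetic data: gene 0 separates the classes, genes 1 and 2 are noise.
    rng = random.Random(1)
    X = [[rng.gauss(label * 2.0, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)]
         for label in (0, 1) for _ in range(15)]
    y = [label for label in (0, 1) for _ in range(15)]
    candidates = {"1 gene": [0], "3 genes": [0, 1, 2]}
    rates = {name: bootstrap_error_rates(X, y, make_centroid_classifier(g))
             for name, g in candidates.items()}
    print("selected:", min_max_select(rates))
```

Under this reading, a candidate with a low typical error but an occasional very poor resample loses to a candidate that is slightly worse on average but never fails badly, which matches the risk the description attributes to selection by LOO error alone.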