Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data
This paper discusses mathematical and statistical aspects in analysis methods applied to microarray gene expressions. We focus on pattern recognition to extract informative features embedded in the data for prediction of phenotypes. It has been pointed out that there are severely difficult problems...
Main Authors: | Osamu Komori, Mari Pritchard, Shinto Eguchi |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2013-01-01 |
Series: | Computational and Mathematical Methods in Medicine |
Online Access: | http://dx.doi.org/10.1155/2013/798189 |
id |
doaj-66cdb1fb8ade48ecb07d90fc1c1a8046 |
---|---|
record_format |
Article |
spelling |
doaj-66cdb1fb8ade48ecb07d90fc1c1a8046; 2020-11-24T21:30:00Z; eng; Hindawi Limited; Computational and Mathematical Methods in Medicine; 1748-670X; 1748-6718; 2013-01-01; 2013; 10.1155/2013/798189; 798189; Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data; Osamu Komori (0); Mari Pritchard (1); Shinto Eguchi (2); (0) The Institute of Statistical Mathematics, Midori-cho, Tachikawa, Tokyo 190-8562, Japan; (1) CLC Bio Japan, Inc., Daikanyama Park Side Village 204, 9-8 Sarugakucho, Shibuya-ku, Tokyo 150-0033, Japan; (2) The Institute of Statistical Mathematics, Midori-cho, Tachikawa, Tokyo 190-8562, Japan; http://dx.doi.org/10.1155/2013/798189 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Osamu Komori Mari Pritchard Shinto Eguchi |
author_sort |
Osamu Komori |
title |
Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data |
title_sort |
multiple suboptimal solutions for prediction rules in gene expression data |
publisher |
Hindawi Limited |
series |
Computational and Mathematical Methods in Medicine |
issn |
1748-670X 1748-6718 |
publishDate |
2013-01-01 |
description |
This paper discusses mathematical and statistical aspects in analysis methods applied to microarray gene expressions. We focus on pattern recognition to extract informative features embedded in the data for prediction of phenotypes. It has been pointed out that there are severely difficult problems due to the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data and detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to extract only the informative, high-performing genes from among all observed genes. We investigate why this difficulty persists even though analysis methods and learning algorithms are actively being proposed in statistical machine learning. We focus on the mutual coherence, or the absolute value of the Pearson correlation between two genes, and describe the distribution of this correlation for the selected set of genes and for the total set. We show that the problem of finding informative genes in high-dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence. |
url |
http://dx.doi.org/10.1155/2013/798189 |
_version_ |
1725964594937069568 |
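
The description above compares the absolute Pearson correlations between pairs of genes for a selected gene set and for the total set. A minimal sketch of that comparison follows, assuming a subjects-by-genes expression matrix. The random data, the matrix sizes, the `selected` indices, and the function name `abs_pairwise_correlations` are all hypothetical, and summarizing each distribution by its maximum is an assumption borrowed from the usual definition of mutual coherence, not something the record states.

```python
import numpy as np


def abs_pairwise_correlations(X):
    """Return |Pearson r| for every distinct pair of columns (genes) of X.

    X is assumed to have shape (n_subjects, n_genes).
    """
    corr = np.corrcoef(X, rowvar=False)      # (n_genes, n_genes) correlation matrix
    upper = np.triu_indices_from(corr, k=1)  # each unordered pair once, no diagonal
    return np.abs(corr[upper])


# Hypothetical data: far more genes than subjects, as in microarray studies.
rng = np.random.default_rng(0)
n_subjects, n_genes = 30, 200
X = rng.standard_normal((n_subjects, n_genes))

# Hypothetical selected gene set (column indices chosen only for illustration).
selected = [0, 5, 17, 42]

all_pairs = abs_pairwise_correlations(X)
sel_pairs = abs_pairwise_correlations(X[:, selected])

# Assumed summary: the largest absolute correlation in each set.
coherence_all = all_pairs.max()
coherence_sel = sel_pairs.max()

print(f"pairs in total set: {all_pairs.size}, max |r| = {coherence_all:.3f}")
print(f"pairs in selected set: {sel_pairs.size}, max |r| = {coherence_sel:.3f}")
```

With real data, `all_pairs` and `sel_pairs` could be passed to a histogram to compare the two distributions directly rather than only their maxima.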