Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets
Because experiments in bioinformatics are time-consuming, biological datasets are sometimes small but high-dimensional. Probability theory tells us that discovering knowledge from a set of data requires a sufficient number of samples; otherwise, the error bounds can become too large...
Main Authors: | Shen Lu, Richard S. Segall |
---|---|
Format: | Article |
Language: | English |
Published: | International Institute of Informatics and Cybernetics, 2013-04-01 |
Series: | Journal of Systemics, Cybernetics and Informatics |
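Subjects: | Weights Vector, Sample Selection, Bayesian Decision Theory, Feature selection, Self-Organizing Maps |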
Online Access: | http://www.iiisci.org/Journal/CV$/sci/pdfs/ISA619SF.pdf |

id | doaj-642a1297df23431ca194aaa57acebd90 |
---|---|
record_format | Article |
affiliation | Shen Lu: University of Arkansas at Little Rock; Richard S. Segall: Arkansas State University |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Shen Lu, Richard S. Segall |
title | Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets |
publisher | International Institute of Informatics and Cybernetics |
series | Journal of Systemics, Cybernetics and Informatics |
issn | 1690-4524 |
publishDate | 2013-04-01 |
description | Because experiments in bioinformatics are time-consuming, biological datasets are sometimes small but high-dimensional. Probability theory tells us that discovering knowledge from a set of data requires a sufficient number of samples; otherwise, the error bounds can become too large to be useful. In the SOM (Self-Organizing Map) algorithm, the initial map is based on the training data. To avoid the bias caused by insufficient training data, in this paper we present an algorithm called Multi-SOM. Multi-SOM builds a number of small self-organizing maps instead of one big map, and uses Bayesian decision theory to make the final decision among similar neurons on different maps. In this way, we can better ensure that the initial weight vectors are truly random, the choice of map size matters less, and errors tend to average out. In our experiments on microarray datasets, which are data-intensive and composed of gene-related information, the precision of Multi-SOMs is 10.58% greater than that of SOMs, and their recall is 11.07% greater. Thus, the Multi-SOMs algorithm is practical. |
topic | Weights Vector, Sample Selection, Bayesian Decision Theory, Feature selection, Self-Organizing Maps |
url | http://www.iiisci.org/Journal/CV$/sci/pdfs/ISA619SF.pdf |
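
The abstract names only the ingredients of Multi-SOM: several small self-organizing maps instead of one large map, each with random initial weights, and Bayesian decision theory to choose among similar neurons on different maps. The Python sketch below is one hypothetical reading of that idea, not the authors' implementation. The names `MultiSOM`, `train_som`, `neuron_posteriors` and `best_matching_unit`, the bootstrap resampling per map, the 2x2 grid with a Gaussian neighbourhood, the Laplace-smoothed per-neuron class posteriors, and the rule of summing log-posteriors across maps are all assumptions made for illustration; the toy two-cluster data stands in for, and is not, a microarray dataset.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda k: sq_dist(weights[k], x))


def train_som(data, rows, cols, epochs, rng, lr0=0.5):
    """Train one small rectangular SOM with a Gaussian neighbourhood."""
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Each map draws its own random initial weights inside the data's bounding box.
    weights = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
               for _ in range(rows * cols)]
    sigma0, total, step = max(rows, cols) / 2.0, epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, frac = data[i], step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = divmod(best_matching_unit(weights, x), cols)
            for k, w in enumerate(weights):
                r, c = divmod(k, cols)
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


def neuron_posteriors(weights, data, labels, classes, alpha=1.0):
    """Laplace-smoothed P(class | neuron) from the samples mapped to each neuron."""
    counts = [Counter() for _ in weights]
    for x, y in zip(data, labels):
        counts[best_matching_unit(weights, x)][y] += 1
    return [{c: (cnt[c] + alpha) / (sum(cnt.values()) + alpha * len(classes))
             for c in classes} for cnt in counts]


class MultiSOM:
    """Hypothetical sketch: several small SOMs combined by a MAP decision rule."""

    def __init__(self, n_maps=5, rows=2, cols=2, epochs=20, seed=0):
        self.n_maps, self.rows, self.cols, self.epochs = n_maps, rows, cols, epochs
        self.rng = random.Random(seed)

    def fit(self, data, labels):
        self.classes = sorted(set(labels))
        self.maps = []
        for _ in range(self.n_maps):
            # Assumption: each map is trained on its own bootstrap resample.
            idx = [self.rng.randrange(len(data)) for _ in data]
            xs, ys = [data[i] for i in idx], [labels[i] for i in idx]
            w = train_som(xs, self.rows, self.cols, self.epochs, self.rng)
            self.maps.append((w, neuron_posteriors(w, xs, ys, self.classes)))
        return self

    def predict(self, x):
        # Multiply the posteriors of the best-matching neuron on every map (summing
        # logs, i.e. treating maps as independent) and return the most probable class.
        score = {c: 0.0 for c in self.classes}
        for weights, posts in self.maps:
            post = posts[best_matching_unit(weights, x)]
            for c in self.classes:
                score[c] += math.log(post[c])
        return max(self.classes, key=lambda c: score[c])


rng = random.Random(42)
data = ([[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(40)]
        + [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(40)])
labels = ["A"] * 40 + ["B"] * 40
model = MultiSOM(n_maps=5, rows=2, cols=2, epochs=20, seed=1).fit(data, labels)
print(model.predict([0.0, 0.0]), model.predict([5.0, 5.0]))
```

Because each map is small and starts from its own random weights and resample, a badly initialised map can be outvoted by the others, which is in the spirit of the abstract's remark that errors tend to average out. This record does not state how the paper actually combines neurons across maps, so the log-posterior sum here is only one plausible choice.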