Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets

Since it takes time to do experiments in bioinformatics, biological datasets are sometimes small but with high dimensionality. From probability theory, in order to discover knowledge from a set of data, we have to have a sufficient number of samples. Otherwise, the error bounds can become too large...


Bibliographic Details
Main Authors: Shen Lu, Richard S. Segall
Format: Article
Language: English
Published: International Institute of Informatics and Cybernetics 2013-04-01
Series: Journal of Systemics, Cybernetics and Informatics
Subjects: Weights Vector, Sample Selection, Bayesian Decision Theory, Feature selection, Self-Organizing Maps
Online Access: http://www.iiisci.org/Journal/CV$/sci/pdfs/ISA619SF.pdf
id doaj-642a1297df23431ca194aaa57acebd90
record_format Article
spelling doaj-642a1297df23431ca194aaa57acebd90
2020-11-24T22:36:37Z
eng
International Institute of Informatics and Cybernetics
Journal of Systemics, Cybernetics and Informatics
1690-4524
2013-04-01
vol. 11, no. 2, pp. 41-46
Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets
Shen Lu (University of Arkansas at Little Rock)
Richard S. Segall (Arkansas State University)
collection DOAJ
language English
format Article
sources DOAJ
author Shen Lu
Richard S. Segall
title Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets
publisher International Institute of Informatics and Cybernetics
series Journal of Systemics, Cybernetics and Informatics
issn 1690-4524
publishDate 2013-04-01
description Because experiments in bioinformatics take time, biological datasets are sometimes small yet high-dimensional. Probability theory tells us that discovering knowledge from a set of data requires a sufficient number of samples; otherwise, the error bounds can become too large to be useful. In the SOM (Self-Organizing Map) algorithm, the initial map is based on the training data. To avoid the bias caused by insufficient training data, this paper presents an algorithm called Multi-SOM, which builds a number of small self-organizing maps instead of one large map. Bayesian decision theory is used to make the final decision among similar neurons on different maps. In this way, the initial weight vectors are more likely to be truly random, map size becomes less of a concern, and errors tend to average out. In experiments on microarray datasets, which contain large amounts of gene-related information, the precision of Multi-SOM is 10.58% greater than that of SOM, and its recall is 11.07% greater. Thus, the Multi-SOM algorithm is practical.
topic Weights Vector
Sample Selection
Bayesian Decision Theory
Feature selection
Self-Organizing Maps
url http://www.iiisci.org/Journal/CV$/sci/pdfs/ISA619SF.pdf
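The description field above outlines the idea of Multi-SOM: several small maps, each with its own random initial weights, combined by a Bayesian decision. The record does not give the authors' procedure, so the following is only a minimal sketch under stated assumptions: each unit of each map stores a Laplace-smoothed estimate of P(class | unit), and the final class maximizes the sum of log-posteriors over the maps. The function names, the 3x3 map size, the decay schedules, and the toy data are all hypothetical and chosen for illustration.

```python
import numpy as np

def train_som(X, rows, cols, n_iter, rng):
    """Train one small SOM with the classic online update rule."""
    # Grid coordinates of each unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Random initial weights drawn inside the per-feature data range.
    lo, hi = X.min(axis=0), X.max(axis=0)
    W = rng.uniform(lo, hi, size=(rows * cols, X.shape[1]))
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking radius, never below 0.5
        x = X[rng.integers(len(X))]           # one randomly chosen training sample
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
        W += lr * h[:, None] * (x - W)
    return W

def bmu_index(W, X):
    """Index of the best-matching unit for every row of X."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_multi_som(X, y, n_classes, n_maps=5, rows=3, cols=3, n_iter=500, seed=0):
    """Train n_maps small SOMs, each with its own random initialization."""
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(n_maps):
        W = train_som(X, rows, cols, n_iter, rng)
        # ASSUMPTION: Laplace-smoothed class counts per unit estimate P(class | unit).
        counts = np.ones((rows * cols, n_classes))
        np.add.at(counts, (bmu_index(W, X), y), 1.0)
        post = counts / counts.sum(axis=1, keepdims=True)
        maps.append((W, post))
    return maps

def predict_multi_som(maps, X):
    """ASSUMPTION: pick the class with the largest summed log-posterior over maps."""
    log_post = sum(np.log(post[bmu_index(W, X)]) for W, post in maps)
    return log_post.argmax(axis=1)

def make_toy_data(n_per_class=20, n_features=50, seed=0):
    """Synthetic stand-in for a small, high-dimensional dataset (not real microarray data)."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    X1 = rng.normal(3.0, 1.0, size=(n_per_class, n_features))
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)
```

A typical call would be `X, y = make_toy_data()`, then `maps = fit_multi_som(X, y, n_classes=2)` and `predict_multi_som(maps, X)`. Summing log-posteriors treats the maps as independent sources of evidence, which is a simplification made for this sketch and not a claim about the published method.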