Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets

Bibliographic Details
Main Authors:	Shen Lu, Richard S. Segall
Format:	Article
Language:	English
Published:	International Institute of Informatics and Cybernetics 2013-04-01
Series:	Journal of Systemics, Cybernetics and Informatics
Subjects:	Weights Vector; Sample Selection; Bayesian Decision Theory; Feature selection; Self-Organizing Maps
Online Access:	http://www.iiisci.org/Journal/CV$/sci/pdfs/ISA619SF.pdf

id	doaj-642a1297df23431ca194aaa57acebd90
affiliation	Shen Lu (University of Arkansas at Little Rock); Richard S. Segall (Arkansas State University)
collection	DOAJ
issn	1690-4524
description	Because experiments in bioinformatics are time-consuming, biological datasets are sometimes small but high-dimensional. Probability theory tells us that discovering knowledge from a set of data requires a sufficient number of samples; otherwise, the error bounds can become too large to be useful. In the SOM (Self-Organizing Map) algorithm, the initial map is based on the training data. To avoid the bias caused by insufficient training data, this paper presents an algorithm called Multi-SOM, which builds a number of small self-organizing maps instead of one big map. Bayesian decision theory is used to make the final decision among similar neurons on different maps. In this way, the initial weight vectors are more likely to be truly random, the choice of map size matters less, and errors tend to average out. In experiments on microarray datasets, which are data-intensive collections of gene-related information, the precision of Multi-SOM is 10.58% greater than that of the standard SOM, and its recall is 11.07% greater. Thus, the Multi-SOM algorithm is practical.
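The abstract describes Multi-SOM only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' published procedure. It assumes that (a) the small maps differ only in their random seed, (b) each neuron is labelled by counting the classes of the training samples that map onto it, and (c) the "Bayesian decision" is made by averaging Laplace-smoothed estimates of P(class | best-matching neuron) across maps and choosing the class with the largest averaged posterior (the minimum-error Bayes decision rule). The names `SmallSOM`, `build_multi_som`, `multi_som_predict` and `make_toy_data`, the 3x3 map size, the learning schedule and the toy data are all hypothetical and unrelated to the paper's microarray experiments.

```python
import math
import random


class SmallSOM:
    """Minimal rectangular self-organizing map (illustrative sketch only)."""

    def __init__(self, rows, cols, dim, seed):
        self.rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # Random initial weights in [0, 1); assumes features are pre-scaled to [0, 1].
        self.weights = [[self.rng.random() for _ in range(dim)] for _ in range(rows * cols)]
        self.label_counts = [{} for _ in range(rows * cols)]

    def bmu(self, x):
        # Best-matching unit: neuron with the smallest squared Euclidean distance to x.
        return min(range(len(self.weights)),
                   key=lambda i: sum((w - v) ** 2 for w, v in zip(self.weights[i], x)))

    def train(self, data, epochs=20, lr0=0.5):
        radius0 = max(self.rows, self.cols) / 2.0
        steps = epochs * len(data)
        for t in range(steps):
            x = data[self.rng.randrange(len(data))]
            frac = t / steps
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = divmod(self.bmu(x), self.cols)
            for i, w in enumerate(self.weights):
                r, c = divmod(i, self.cols)
                # Gaussian neighbourhood on the map grid around the BMU.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])

    def label(self, data, labels):
        # Count how many training samples of each class map onto each neuron.
        for x, y in zip(data, labels):
            counts = self.label_counts[self.bmu(x)]
            counts[y] = counts.get(y, 0) + 1

    def posterior(self, x, classes):
        # Laplace-smoothed estimate of P(class | best-matching neuron for x).
        counts = self.label_counts[self.bmu(x)]
        total = sum(counts.values())
        return {c: (counts.get(c, 0) + 1) / (total + len(classes)) for c in classes}


def build_multi_som(data, labels, n_maps=5, rows=3, cols=3):
    # Several small maps that differ only in their random initialisation (seed).
    maps = []
    for seed in range(n_maps):
        som = SmallSOM(rows, cols, len(data[0]), seed)
        som.train(data)
        som.label(data, labels)
        maps.append(som)
    return maps


def multi_som_predict(maps, x, classes):
    # Assumed combination rule (not specified in the abstract): average the
    # per-map posteriors, then choose the class with the largest average.
    avg = {c: 0.0 for c in classes}
    for som in maps:
        for c, p in som.posterior(x, classes).items():
            avg[c] += p / len(maps)
    return max(classes, key=lambda c: avg[c])


def make_toy_data(seed=0, dim=20, per_class=15):
    # Hypothetical toy data: two tight clusters in a high-dimensional space.
    rng = random.Random(seed)
    data, labels = [], []
    for cls, centre in (("A", 0.2), ("B", 0.8)):
        for _ in range(per_class):
            data.append([min(max(rng.gauss(centre, 0.05), 0.0), 1.0) for _ in range(dim)])
            labels.append(cls)
    return data, labels


if __name__ == "__main__":
    data, labels = make_toy_data()
    maps = build_multi_som(data, labels)
    print(multi_som_predict(maps, [0.2] * 20, sorted(set(labels))))
```

Averaging the posteriors, rather than multiplying them in a naive-Bayes style, keeps a single poorly initialised map from vetoing a class; a product of posteriors is a reasonable alternative if the maps are treated as independent evidence.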