Parallel Distance-Based Instance Selection Algorithm for Feed-Forward Neural Network


Bibliographic Details
Main Author: Fuangkhon Piyabute
Format: Article
Language: English
Published: De Gruyter 2017-04-01
Series: Journal of Intelligent Systems
Online Access: https://doi.org/10.1515/jisys-2015-0039
Description
Summary: Instance selection aims to decide which instances from a data set should be retained for further use during the learning process. It can improve the generalization of the learning model, shorten learning time, or scale up to large data sources. This paper presents a parallel distance-based instance selection approach for a feed-forward neural network (FFNN), which can utilize all available processing power to reduce the data set while obtaining a level of classification accuracy similar to that of the original data set. The algorithm identifies the instances at the decision boundary between consecutive classes of data, which are essential for placing hyperplane decision surfaces, and retains these instances in the reduced data set (subset). Each identified instance, called a prototype, is one of the representatives of the decision boundary of its class that constitutes the shape or distribution model of the data set. No feature or dimension is sacrificed in the reduction process. Regarding reduction capability, the algorithm achieves approximately 85% reduction on non-overlapping two-class synthetic data sets, 70% reduction on highly overlapping two-class synthetic data sets, and 77% reduction on multiclass real-world data sets. Regarding generalization, the reduced data sets achieve classification accuracy similar to that obtained with the original data set on both the FFNN and a support vector machine. Regarding execution time, the speedup of the parallel algorithm over the serial algorithm is proportional to the number of threads the processor can run concurrently.
ISSN: 0334-1860
2191-026X
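
To make the idea in the summary concrete, the sketch below illustrates one possible reading of distance-based boundary selection: an instance is kept as a prototype when its nearest neighbor belonging to another class lies within a distance threshold, and the scan over the instances is split into chunks handled by a thread pool. This is only an illustrative approximation, not the paper's algorithm; the function names (select_instances, boundary_mask), the Euclidean metric, the fixed threshold parameter, and the ThreadPoolExecutor-based parallelization are all assumptions introduced here.

```python
# Hypothetical sketch of distance-based boundary instance selection,
# parallelized over chunks of the data set. Not the paper's exact algorithm:
# an instance is kept when its nearest neighbor of a different class lies
# within a chosen distance threshold.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def boundary_mask(X, y, idx_chunk, threshold):
    """Return a boolean mask over idx_chunk marking boundary instances."""
    keep = np.zeros(len(idx_chunk), dtype=bool)
    for k, i in enumerate(idx_chunk):
        other = X[y != y[i]]                      # instances of the other classes
        d = np.linalg.norm(other - X[i], axis=1)  # Euclidean distances to them
        keep[k] = d.min() <= threshold            # near-boundary -> prototype
    return keep

def select_instances(X, y, threshold=1.0, n_threads=4):
    """Reduce (X, y) to the instances near the class decision boundaries."""
    chunks = np.array_split(np.arange(len(X)), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        masks = list(pool.map(lambda c: boundary_mask(X, y, c, threshold), chunks))
    keep = np.concatenate(masks)
    return X[keep], y[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
    y = np.array([0] * 200 + [1] * 200)
    Xr, yr = select_instances(X, y, threshold=1.5)
    print(f"kept {len(Xr)} of {len(X)} instances")
```

Note that in CPython a thread pool mainly overlaps I/O; for CPU-bound distance computations such as this, a process pool or a compiled implementation would be needed to approach the thread-proportional speedup reported in the paper.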