Analysis of Learning Influence of Training Data Selected by Distribution Consistency

This study proposes a method to select core data that will be helpful for machine learning. Specifically, we form a two-dimensional distribution based on the similarity of the training data and compose grids with fixed ratios on the distribution. In each grid, we select data based on the distribution consistency (DC) of the target class data and examine how this affects the classifier. We use CIFAR-10 for the experiment and set various grid ratios from 0.5 to 0.005. The influence of these variables was analyzed using different training-data sizes selected by high-DC, low-DC (the inverse of high DC), and random (no criteria) selection. As a result, accuracy improved by 0.95 percentage points (±0.65) on average and by 1.54 percentage points (±0.59) for grid configurations of 0.008 and 0.005, respectively. These outcomes demonstrate improved performance compared with the existing approach (data distribution search). We confirmed that learning performance improved when the training data were selected with very small grids and high-DC settings.

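The record does not define how DC is computed, but the selection procedure it describes (partition a 2-D similarity embedding into fixed-ratio grid cells, then keep the best samples per cell) can be sketched roughly as follows. The `scores` array here is a placeholder standing in for the per-sample DC value, and `select_by_grid` is a hypothetical helper, not the authors' implementation:

```python
import numpy as np
from collections import defaultdict

def select_by_grid(points, scores, grid_ratio=0.008, per_cell=1, mode="high"):
    """Pick training samples cell-by-cell from a 2-D embedding.

    points     : (N, 2) array, 2-D distribution of the training data.
    scores     : (N,) array, stand-in for the distribution consistency (DC)
                 of each sample (the paper's actual metric is not given here).
    grid_ratio : cell side length as a fraction of each axis range
                 (e.g. 0.5 down to 0.005, as in the experiment).
    mode       : "high" keeps the highest-scoring samples per cell,
                 "low" the lowest (the paper's inverse selection).
    """
    mins, maxs = points.min(axis=0), points.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # avoid division by zero
    # Map each point to an integer grid-cell index along both axes.
    cells = np.floor((points - mins) / (span * grid_ratio)).astype(int)
    # Group sample indices by cell.
    buckets = defaultdict(list)
    for i, cell in enumerate(map(tuple, cells)):
        buckets[cell].append(i)
    # Within each cell, keep the per_cell best samples by score.
    selected = []
    for idx in buckets.values():
        order = sorted(idx, key=lambda i: scores[i], reverse=(mode == "high"))
        selected.extend(order[:per_cell])
    return sorted(selected)
```

Smaller `grid_ratio` values produce more, finer cells, so the selection tracks the local structure of the distribution more closely, which matches the record's finding that very small grids with high-DC selection worked best.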

Bibliographic Details

Main Authors: Myunggwon Hwang, Yuna Jeong, Won-Kyung Sung
Format: Article
Language: English
Published: MDPI AG, 2021-02-01
Series: Sensors
Subjects: learning influence; machine learning; training data similarity; distribution consistency
Online Access: https://www.mdpi.com/1424-8220/21/4/1045
DOI: 10.3390/s21041045
ISSN: 1424-8220
Volume 21, Issue 4, Article 1045
Author affiliation (all three authors): Intelligent Infrastructure Technology Research Center, Korea Institute of Science and Technology Information (KISTI), Daejeon 34141, Korea