Summary: | In many big data mining scenarios with large numbers of samples, heavy computation cost hinders the application of machine learning: training must iteratively pass over the whole dataset without considering the roles that different samples play in the computation. However, we argue that most of the samples that dominate computation resources contribute little to the gradient-based model update, particularly when the model is close to convergence. We define this observation as the Sample Contribution Pattern (SCP) in machine learning. This paper proposes two approaches that exploit SCP by detecting gradient characteristics and triggering the reuse of outdated gradients. In particular, this paper reports research results on (1) the definition and description of SCP, which reveals an intrinsic gradient contribution pattern across samples; (2) a novel SCP-based optimization algorithm (SCPOA) that outperforms the alternative algorithms tested in terms of computation overhead; (3) a variant of SCPOA that incorporates discarding-recovering mechanisms to carefully trade off model accuracy against computation cost; (4) the implementation and evaluation of the two algorithms on popular distributed big data mining platforms with typical sample sets; (5) an intuitive convergence proof of both algorithms. Our experimental results show that the proposed approaches significantly reduce computation cost while achieving competitive accuracy.
|