Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data

Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimiza...

Bibliographic Details
Main Authors: Martin Sarnovsky, Marek Olejnik
Format: Article
Language: English
Published: MDPI AG 2019-03-01
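Series: Informatics
Subjects: text classification; multi-label classification; distributed text-mining; task assignment; resource optimization; grid computing
Online Access: http://www.mdpi.com/2227-9709/6/1/12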
id doaj-db6ebcbcc77d4ca79606386365c7b367
record_format Article
spelling doaj-db6ebcbcc77d4ca79606386365c7b367
2020-11-25T01:23:29Z
eng
MDPI AG
Informatics
2227-9709
2019-03-01
6
1
12
10.3390/informatics6010012
informatics6010012
Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data
Martin Sarnovsky 0
Marek Olejnik 1
Department of Cybernetics and Artificial Intelligence, Technical University Košice, Letná 9/A, 040 01 Košice, Slovakia
Department of Cybernetics and Artificial Intelligence, Technical University Košice, Letná 9/A, 040 01 Košice, Slovakia
Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimization methods used to increase the efficiency of distributed implementations of a text-mining model utilizing information about the text-mining task extracted from the data and information about the current state of the distributed environment obtained from a computational node, and to improve the distribution of the task on the distributed infrastructure. Two optimization solutions are developed and implemented, both based on the prediction of the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built based on two standard text data collections.
http://www.mdpi.com/2227-9709/6/1/12
text classification
multi-label classification
distributed text-mining
task assignment
resource optimization
grid computing
collection DOAJ
language English
format Article
sources DOAJ
author Martin Sarnovsky
Marek Olejnik
title Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data
publisher MDPI AG
series Informatics
issn 2227-9709
publishDate 2019-03-01
description Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimization methods used to increase the efficiency of distributed implementations of a text-mining model utilizing information about the text-mining task extracted from the data and information about the current state of the distributed environment obtained from a computational node, and to improve the distribution of the task on the distributed infrastructure. Two optimization solutions are developed and implemented, both based on the prediction of the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built based on two standard text data collections.
topic text classification
multi-label classification
distributed text-mining
task assignment
resource optimization
grid computing
url http://www.mdpi.com/2227-9709/6/1/12