Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data

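Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimization methods used to increase the efficiency of distributed implementations of a text-mining model utilizing information about the text-mining task extracted from the data and information about the current state of the distributed environment obtained from a computational node, and to improve the distribution of the task on the distributed infrastructure. Two optimization solutions are developed and implemented, both based on the prediction of the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built based on two standard text data collections.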

Bibliographic Details
Main Authors:	Martin Sarnovsky, Marek Olejnik
Format:	Article
Language:	English
Published:	MDPI AG 2019-03-01
Series:	Informatics
Subjects:	text classification; multi-label classification; distributed text-mining; task assignment; resource optimization; grid computing
Online Access:	http://www.mdpi.com/2227-9709/6/1/12

id	doaj-db6ebcbcc77d4ca79606386365c7b367
record_format	Article
spelling	doaj-db6ebcbcc77d4ca79606386365c7b367 | 2020-11-25T01:23:29Z | eng | MDPI AG | Informatics | 2227-9709 | 2019-03-01 | vol. 6, no. 1, art. 12 | 10.3390/informatics6010012 | informatics6010012 | Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data | Martin Sarnovsky (Department of Cybernetics and Artificial Intelligence, Technical University Košice, Letná 9/A, 040 01 Košice, Slovakia) | Marek Olejnik (Department of Cybernetics and Artificial Intelligence, Technical University Košice, Letná 9/A, 040 01 Košice, Slovakia) | Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimization methods used to increase the efficiency of distributed implementations of a text-mining model utilizing information about the text-mining task extracted from the data and information about the current state of the distributed environment obtained from a computational node, and to improve the distribution of the task on the distributed infrastructure. Two optimization solutions are developed and implemented, both based on the prediction of the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built based on two standard text data collections. | http://www.mdpi.com/2227-9709/6/1/12 | text classification; multi-label classification; distributed text-mining; task assignment; resource optimization; grid computing
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Martin Sarnovsky Marek Olejnik
title	Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data
publisher	MDPI AG
series	Informatics
issn	2227-9709
publishDate	2019-03-01
description	Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimization methods used to increase the efficiency of distributed implementations of a text-mining model utilizing information about the text-mining task extracted from the data and information about the current state of the distributed environment obtained from a computational node, and to improve the distribution of the task on the distributed infrastructure. Two optimization solutions are developed and implemented, both based on the prediction of the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built based on two standard text data collections.
topic	text classification; multi-label classification; distributed text-mining; task assignment; resource optimization; grid computing
url	http://www.mdpi.com/2227-9709/6/1/12
