Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data
Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimiza...
Main Authors: | Martin Sarnovsky, Marek Olejnik |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-03-01 |
Series: | Informatics |
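Subjects: | text classification, multi-label classification, distributed text-mining, task assignment, resource optimization, grid computing |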
Online Access: | http://www.mdpi.com/2227-9709/6/1/12 |
id |
doaj-db6ebcbcc77d4ca79606386365c7b367 |
---|---|
record_format |
Article |
spelling |
doaj-db6ebcbcc77d4ca79606386365c7b367; 2020-11-25T01:23:29Z; eng; MDPI AG; Informatics; 2227-9709; 2019-03-01; 6; 1; 12; 10.3390/informatics6010012; informatics6010012; Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data; Martin Sarnovsky 0; Marek Olejnik 1; Department of Cybernetics and Artificial Intelligence, Technical University Košice, Letná 9/A, 040 01 Košice, Slovakia; Department of Cybernetics and Artificial Intelligence, Technical University Košice, Letná 9/A, 040 01 Košice, Slovakia; Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimization methods used to increase the efficiency of distributed implementations of a text-mining model utilizing information about the text-mining task extracted from the data and information about the current state of the distributed environment obtained from a computational node, and to improve the distribution of the task on the distributed infrastructure. Two optimization solutions are developed and implemented, both based on the prediction of the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built based on two standard text data collections.; http://www.mdpi.com/2227-9709/6/1/12; text classification; multi-label classification; distributed text-mining; task assignment; resource optimization; grid computing |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Martin Sarnovsky, Marek Olejnik |
author_sort |
Martin Sarnovsky |
title |
Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data |
publisher |
MDPI AG |
series |
Informatics |
issn |
2227-9709 |
publishDate |
2019-03-01 |
description |
Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimization methods used to increase the efficiency of distributed implementations of a text-mining model utilizing information about the text-mining task extracted from the data and information about the current state of the distributed environment obtained from a computational node, and to improve the distribution of the task on the distributed infrastructure. Two optimization solutions are developed and implemented, both based on the prediction of the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built based on two standard text data collections. |
topic |
text classification, multi-label classification, distributed text-mining, task assignment, resource optimization, grid computing |
url |
http://www.mdpi.com/2227-9709/6/1/12 |