INTERNATIONAL SYSTEM OF KNOWLEDGE EXCHANGE FOR YOUNG SCIENTISTS

The paper proposes a system that serves as electronic storage for qualification works of students from different countries and makes it possible to identify and connect young scientists researching related problem areas. The purpose of developing this system is to provide opportun...

Bibliographic Details
Main Authors: Olesia Barkovska, Vladyslav Kholiev, Georgiy Ivaschenko, Dmytro Rosinskiy
Format: Article
Language: English
Published: National Technical University "Kharkiv Polytechnic Institute" 2021-02-01
Series: Сучасні інформаційні системи
Subjects:
nlp
Online Access: http://ais.khpi.edu.ua/article/view/226836/226384
id doaj-c4027de2bf2a4c30a1f37336b3636b9c
record_format Article
spelling doaj-c4027de2bf2a4c30a1f37336b3636b9c
Record timestamp: 2021-05-18T05:55:05Z
Language: eng
Publisher: National Technical University "Kharkiv Polytechnic Institute"
Series: Сучасні інформаційні системи
ISSN: 2522-9052
Published: 2021-02-01
Volume 5, issue 1, pages 69-74
DOI: 10.20998/2522-9052.2021.1.09
Authors:
Olesia Barkovska, https://orcid.org/0000-0001-7496-4353, Kharkiv National University of RadioElectronics
Vladyslav Kholiev, https://orcid.org/0000-0002-9148-1561, Kharkiv National University of RadioElectronics
Georgiy Ivaschenko, https://orcid.org/0000-0003-1027-5262, Kharkiv National University of RadioElectronics
Dmytro Rosinskiy, https://orcid.org/0000-0002-0725-392X, Kharkiv National University of RadioElectronics
collection DOAJ
language English
format Article
sources DOAJ
author Olesia Barkovska
Vladyslav Kholiev
Georgiy Ivaschenko
Dmytro Rosinskiy
title INTERNATIONAL SYSTEM OF KNOWLEDGE EXCHANGE FOR YOUNG SCIENTISTS
publisher National Technical University "Kharkiv Polytechnic Institute"
series Сучасні інформаційні системи
issn 2522-9052
publishDate 2021-02-01
description The paper proposes a system that serves as electronic storage for qualification works of students from different countries and makes it possible to identify and connect young scientists researching related problem areas. The purpose of developing this system is to provide opportunities for knowledge exchange and team research on a common problem, as well as to identify scientific trends in different countries. The paper studies how preprocessing methods affect the performance of classifiers such as Logistic Regression, LSTM, BERT, and LightGBM, comparing classification speed and F1 score. Conclusions. Lemmatization ran almost twice as fast as stemming and scored 5 percent better on average, so the Logistic Regression classifier with lemmatization at the text preparation stage was chosen for the subsequent operation of the proposed ISKE.
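The comparison described in the abstract (two preprocessing methods, judged by running time and F1 score) can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce their results: `toy_stem`, `toy_lemmatize`, `TOY_LEMMAS`, `timed`, and `f1_score` are hypothetical names introduced here, the stemmer is a crude suffix stripper, and the lemmatizer is a three-entry lookup table standing in for a real dictionary.

```python
import time

# Hypothetical toy lemma table; a real lemmatizer uses a full dictionary.
TOY_LEMMAS = {"studies": "study", "studied": "study", "networks": "network"}


def toy_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are not destroyed.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: len(token) - len(suffix)]
    return token


def toy_lemmatize(token):
    """Dictionary lookup that maps a word form to its base form."""
    return TOY_LEMMAS.get(token, token)


def timed(normalize, texts):
    """Apply a normalizer to every token and report the elapsed seconds."""
    start = time.perf_counter()
    tokens = [[normalize(tok) for tok in text.lower().split()] for text in texts]
    return tokens, time.perf_counter() - start


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Here the stemmer turns "studies" into the non-word "studi", while the lemmatizer returns "study"; that difference in the resulting vocabulary is one way the preprocessing step can influence a downstream classifier's score. Timings from a toy like this say nothing about real stemmers or lemmatizers.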
topic system
nlp
text
processing
acceleration
shingles
proximity
likeness
classification
preprocessing
lemmatization
stemming
url http://ais.khpi.edu.ua/article/view/226836/226384