Hybrid extraction of multi-word terms: an application on vibration-based condition monitoring technique

In this paper, we present an advanced domain-specific multi-word terminology extraction method. Our hybrid approach for automatic term identification benefits from both statistical and linguistic approaches. Our main goal is to reduce as much as possible the human effort in term selection tasks as w...


Bibliographic Details
Main Authors: Konstantinos Chatzitheodorou, Vassilios Kappatos
Format: Article
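Language: English
Published: JVE International 2021-01-01
Series: Mathematical Models in Engineering
Subjects: hybrid extraction; terminology; multi-word terms; condition monitoring; vibration; specialized languages; term-base; technical language
Online Access: https://www.jvejournals.com/article/21850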
id doaj-57bc47cbcfe84d458be7b4af949e68ac
record_format Article
spelling doaj-57bc47cbcfe84d458be7b4af949e68ac
2021-04-01T19:07:04Z
eng
JVE International
Mathematical Models in Engineering
2351-5279
2424-4627
2021-01-01
7119
10.21595/mme.2021.21850
21850
Konstantinos Chatzitheodorou (Department of Foreign Languages, Translation and Interpreting, Ionian University, Corfu, GR49100, Greece)
Vassilios Kappatos (Hellenic Institute of Transport, Centre for Research and Technology Hellas, 6th Km Charilaou Thermi, 60361, Thermi, Thessaloniki, Greece)
collection DOAJ
language English
format Article
sources DOAJ
author Konstantinos Chatzitheodorou
Vassilios Kappatos
title Hybrid extraction of multi-word terms: an application on vibration-based condition monitoring technique
publisher JVE International
series Mathematical Models in Engineering
issn 2351-5279
2424-4627
publishDate 2021-01-01
description In this paper, we present an advanced method for extracting domain-specific multi-word terminology. Our hybrid approach to automatic term identification combines statistical and linguistic techniques. Our main goal is to reduce the human effort in term selection tasks as much as possible while providing a wide-ranging and representative terminology of a domain. We focus on identifying multi-word terms that are verb or noun phrases, as well as neologisms and technical jargon. Our architecture applies the term frequency-inverse document frequency (TF-IDF) algorithm to a domain-specific textual corpus to measure the importance of each unit in that corpus. We also filter out terms nested within a candidate term, taking into account how often each nested term occurs on its own in the corpus. In addition, the extracted terms are filtered using a stop-word list and linguistic criteria. To further reduce the number of candidate terms and obtain accurate and precise terminologies, our method automatically validates them against a general-purpose corpus. Our study, based on a small corpus from the vibration-based condition monitoring domain, shows that most extracted terms correspond well to the concepts and notions of condition monitoring.
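The steps named in the description (TF-IDF scoring, nested-term filtering, stop-word filtering, validation against a general-purpose corpus) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the 2- to 3-word candidate window, the `min_freq` threshold, the smoothed IDF formula, the "drop anything seen in the general corpus" rule and all function names are assumptions made for illustration, and the linguistic (part-of-speech) criteria are left out.

```python
import math
import re
from collections import Counter

# Toy stop-word list; an assumption, not the list used in the paper.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens, keeping hyphenated words."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def ngrams(tokens, n_min=2, n_max=3):
    """Yield n_min..n_max word candidates that neither start nor end with a stop word."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] not in STOP_WORDS and gram[-1] not in STOP_WORDS:
                yield " ".join(gram)


def tfidf_scores(docs):
    """Sum each candidate's TF-IDF over all documents, using a smoothed IDF."""
    counts = [Counter(ngrams(tokenize(d))) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n = len(docs)
    scores = Counter()
    for c in counts:
        for term, tf in c.items():
            scores[term] += tf * (math.log((1 + n) / (1 + df[term])) + 1.0)
    return scores


def drop_nested(freq):
    """Keep a term nested in a longer term only if it also occurs on its own."""
    kept = {}
    for term, f in freq.items():
        # Highest frequency of any longer term that contains this one as whole words.
        nested_in = max((g for other, g in freq.items()
                         if other != term and f" {term} " in f" {other} "), default=0)
        if f - nested_in > 0:
            kept[term] = f
    return kept


def validate(scores, general_docs):
    """Discard candidates that also occur in the general-purpose corpus."""
    general = set()
    for d in general_docs:
        general.update(ngrams(tokenize(d)))
    return {t: s for t, s in scores.items() if t not in general}


def extract_terms(domain_docs, general_docs, min_freq=2):
    """Run the four steps and return candidate terms ranked by TF-IDF score."""
    freq = Counter()
    for d in domain_docs:
        freq.update(ngrams(tokenize(d)))
    frequent = {t: f for t, f in freq.items() if f >= min_freq}
    kept = drop_nested(frequent)
    scores = tfidf_scores(domain_docs)
    ranked = validate({t: scores[t] for t in kept}, general_docs)
    return sorted(ranked, key=ranked.get, reverse=True)
```

Calling `extract_terms(domain_docs, general_docs)` with two lists of plain-text documents returns the surviving candidates, highest score first. Taking the maximum rather than the sum in `drop_nested` is a deliberately cautious choice, since overlapping longer candidates would otherwise be counted twice against the same occurrence of a nested term.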
topic hybrid extraction
terminology
multi-word terms
condition monitoring
vibration
specialized languages
term-base
technical language
url https://www.jvejournals.com/article/21850