Hybrid extraction of multi-word terms: an application on vibration-based condition monitoring technique
In this paper, we present an advanced domain-specific multi-word terminology extraction method. Our hybrid approach for automatic term identification benefits from both statistical and linguistic approaches. Our main goal is to reduce as much as possible the human effort in term selection tasks as w...
Main Authors: | Konstantinos Chatzitheodorou, Vassilios Kappatos |
---|---|
Format: | Article |
Language: | English |
Published: | JVE International 2021-01-01 |
Series: | Mathematical Models in Engineering |
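Subjects: | hybrid extraction, terminology, multi-word terms, condition monitoring, vibration, specialized languages, term-base, technical language |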
Online Access: | https://www.jvejournals.com/article/21850 |
id |
doaj-57bc47cbcfe84d458be7b4af949e68ac |
---|---|
record_format |
Article |
spelling |
doaj-57bc47cbcfe84d458be7b4af949e68ac; 2021-04-01T19:07:04Z; eng; JVE International; Mathematical Models in Engineering; 2351-5279; 2424-4627; 2021-01-01; 7119; 10.21595/mme.2021.21850; 21850; Hybrid extraction of multi-word terms: an application on vibration-based condition monitoring technique; Konstantinos Chatzitheodorou 0; Vassilios Kappatos 1; Department of Foreign Languages, Translation and Interpreting, Ionian University, Corfu, GR49100, Greece; Hellenic Institute of Transport, Centre for Research and Technology Hellas, 6th Km Charilaou Thermi, 60361, Thermi, Thessaloniki, Greece; In this paper, we present an advanced domain-specific multi-word terminology extraction method. Our hybrid approach for automatic term identification benefits from both statistical and linguistic approaches. Our main goal is to reduce the human effort in term selection tasks as much as possible and to provide a wide-ranging and representative terminology of a domain. We emphasize the identification of multi-word terms that are verb or noun phrases, as well as neologisms and technical jargon. Our architecture applies the term frequency-inverse document frequency (TF-IDF) algorithm to a domain-specific textual corpus in order to measure a unit’s importance in it. We also use techniques to filter out nested terms of a candidate term, taking into consideration its frequency by itself in the corpus. In addition, the extracted terms are filtered based on a stop-word list and linguistic criteria. To further reduce the number of candidate terms and achieve accurate and precise terminologies, our method automatically validates them against a general-purpose corpus. Our study, based on a small corpus from the vibration-based condition monitoring domain, shows that most extracted terms correspond well to the concepts and notions of the condition monitoring domain.; https://www.jvejournals.com/article/21850; hybrid extraction, terminology, multi-word terms, condition monitoring, vibration, specialized languages, term-base, technical language |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Konstantinos Chatzitheodorou Vassilios Kappatos |
spellingShingle |
Konstantinos Chatzitheodorou Vassilios Kappatos Hybrid extraction of multi-word terms: an application on vibration-based condition monitoring technique Mathematical Models in Engineering hybrid extraction terminology multi-word terms condition monitoring vibration specialized languages term-base technical language |
author_facet |
Konstantinos Chatzitheodorou Vassilios Kappatos |
author_sort |
Konstantinos Chatzitheodorou |
title |
Hybrid extraction of multi-word terms: an application on vibration-based condition monitoring technique |
publisher |
JVE International |
series |
Mathematical Models in Engineering |
issn |
2351-5279 2424-4627 |
publishDate |
2021-01-01 |
description |
In this paper, we present an advanced domain-specific multi-word terminology extraction method. Our hybrid approach for automatic term identification benefits from both statistical and linguistic approaches. Our main goal is to reduce the human effort in term selection tasks as much as possible and to provide a wide-ranging and representative terminology of a domain. We emphasize the identification of multi-word terms that are verb or noun phrases, as well as neologisms and technical jargon. Our architecture applies the term frequency-inverse document frequency (TF-IDF) algorithm to a domain-specific textual corpus in order to measure a unit’s importance in it. We also use techniques to filter out nested terms of a candidate term, taking into consideration its frequency by itself in the corpus. In addition, the extracted terms are filtered based on a stop-word list and linguistic criteria. To further reduce the number of candidate terms and achieve accurate and precise terminologies, our method automatically validates them against a general-purpose corpus. Our study, based on a small corpus from the vibration-based condition monitoring domain, shows that most extracted terms correspond well to the concepts and notions of the condition monitoring domain. |
topic |
hybrid extraction terminology multi-word terms condition monitoring vibration specialized languages term-base technical language |
url |
https://www.jvejournals.com/article/21850 |
work_keys_str_mv |
AT konstantinoschatzitheodorou hybridextractionofmultiwordtermsanapplicationonvibrationbasedconditionmonitoringtechnique AT vassilioskappatos hybridextractionofmultiwordtermsanapplicationonvibrationbasedconditionmonitoringtechnique |
_version_ |
1724175918322155520 |
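
The pipeline outlined in the description field (TF-IDF scoring of candidate multi-word units, stop-word filtering, nested-term filtering, and validation against a general-purpose corpus) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the stop-word list, the n-gram lengths, the smoothed IDF formula, the rule that a nested term is dropped when it never occurs outside one longer candidate, and the names `candidates` and `extract_terms` are all assumptions made for the example. The linguistic (part-of-speech) criteria mentioned in the abstract are omitted.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "for", "to", "it", "and"}


def candidates(text, n_min=2, n_max=3):
    """Return word n-grams that do not start or end with a stop word.

    N-grams never cross punctuation, so a candidate stays inside one clause.
    """
    found = []
    for segment in re.split(r"[.,;:!?()]", text.lower()):
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", segment)
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] not in STOP_WORDS and gram[-1] not in STOP_WORDS:
                    found.append(" ".join(gram))
    return found


def extract_terms(domain_docs, general_docs):
    """Score candidates with TF-IDF, then apply the two filters."""
    tf, df = Counter(), Counter()
    for doc in domain_docs:
        counts = Counter(candidates(doc))
        tf.update(counts)          # total occurrences across the corpus
        df.update(counts.keys())   # number of documents containing the term

    n_docs = len(domain_docs)
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    scores = {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in tf}

    def is_nested(term):
        # Assumed rule: drop a term if some longer candidate contains it and
        # occurs at least as often, i.e. the term never appears by itself.
        return any(
            term != longer and f" {term} " in f" {longer} " and tf[longer] >= tf[term]
            for longer in tf
        )

    # Candidates that also occur in the general-purpose corpus are rejected.
    general = {g for doc in general_docs for g in candidates(doc)}
    return {t: s for t, s in scores.items() if not is_nested(t) and t not in general}


# Toy corpora, invented for this example only.
domain_docs = [
    "The rolling element bearing failed. A rolling element bearing needs condition monitoring.",
    "Condition monitoring is a good idea for the gearbox.",
]
general_docs = ["It is a good idea to read."]

terms = extract_terms(domain_docs, general_docs)
for term, score in sorted(terms.items(), key=lambda item: -item[1]):
    print(f"{score:.2f}  {term}")
```

In this toy run, "rolling element" is discarded because it only ever appears inside "rolling element bearing", while "condition monitoring" survives because it also occurs on its own. A real system would replace the exact-match general-corpus check with a frequency comparison and add the linguistic filters.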