Hybrid extraction of multi-word terms: an application on vibration-based condition monitoring technique

In this paper, we present an advanced domain-specific multi-word terminology extraction method. Our hybrid approach for automatic term identification benefits from both statistical and linguistic approaches. Our main goal is to reduce as much as possible the human effort in term selection tasks as w...

Full description

Bibliographic Details
Main Authors: Konstantinos Chatzitheodorou, Vassilios Kappatos
Format: Article
Language:English
Published: JVE International 2021-01-01
Series:Mathematical Models in Engineering
Subjects:
Online Access:https://www.jvejournals.com/article/21850
Description
Summary:In this paper, we present an advanced domain-specific multi-word terminology extraction method. Our hybrid approach for automatic term identification benefits from both statistical and linguistic approaches. Our main goal is to reduce as much as possible the human effort in term selection tasks as well as to provide a wide-range and representative terminology of a domain. We emphasize in identification of verb or noun phrases multi-word terms, in neologisms and technical jargons. Our architecture applies the term frequency-inverse document frequency (TF-IDF) algorithm to a domain-specific textual corpus in order to measure a unit’s importance in it. We also use techniques to filter out nested terms of a candidate term taking into consideration its frequency by itself in the corpus. In addition, the exported terms are filtered out based on a stop-word list and linguistic criteria. To further reduce the size of the candidate terms and achieve accurate and precise terminologies, our method automatically validates them against a general-purpose corpus. Our study based on a small corpus of vibration-based condition monitoring domain shows that most extracted terms have nice correspondence to the domain of condition monitoring concepts and notions.
ISSN:2351-5279
2424-4627