Term acquisition : a text-probing approach

Bibliographic Details
Main Author: Fulford, Heather
Published: University of Surrey 1997
Subjects:
410
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.389321
Description
Summary: In order to assist terminologists in the compilation of terminology collections in specialist domains, a "text probing" approach to the acquisition of English terms from special language texts is specified, designed, implemented, and evaluated. This approach draws on aspects of general language corpus linguistics and computational lexicography, and follows current trends towards corpus-based terminology compilation work. Our text-probing approach is founded specifically on observations about the linguistic features of English terms and their collocational behaviour in special language texts, and represents an effort to extend the scope of existing collocation studies from general language to special language. It aims to be both domain- and text-type independent. By operating on the premise that a term is likely to reside in a special language text between boundary markers comprising closed class words/punctuation, it permits the acquisition of single- and multi-word terms spanning a range of word classes. Our approach has been implemented in a prototype computer program ("Termspotter") which has been written in Quintus Prolog. This program processes untagged special language texts, either individually or in batches. It functions by "probing" texts for closed class words and punctuation, extracting as term candidates those items which reside between them. A systematic evaluation of the text-probing approach is presented in which, using an innovative experimental design, the term acquisition efficiency of Termspotter is measured against the manual scanning output of domain experts, as well as compared with the scanning output of terminologists. Results in the special language texts studied so far indicate that, on average, Termspotter can accurately retrieve 80% of the terms identified by a domain expert, and can typically partially retrieve the remaining 20%. The program performed very favourably in comparison with human terminologists.
Extensions of our text-probing approach to other languages are anticipated. Moreover, wider applications of the notion of text probing are envisaged, both within and beyond the terminology community, for abstracting other structures from special language texts.
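
The boundary-marker idea described in the summary can be illustrated with a minimal sketch. This is not Termspotter itself, which the record says was written in Quintus Prolog; it is a Python approximation under stated assumptions. The closed-class word list, the tokenising pattern, and the function name probe_text are all hypothetical and chosen only for illustration; the thesis's actual marker inventory is not given in this record.

```python
import re

# Assumption: a small illustrative set of closed class words. The real
# boundary-marker inventory used in the thesis is not stated in this record.
CLOSED_CLASS = {
    "a", "an", "the", "of", "in", "on", "at", "by", "for", "to", "from",
    "with", "and", "or", "but", "is", "are", "was", "were", "be", "been",
    "it", "this", "that", "these", "those", "which", "as", "can", "has", "have",
}

# A token is either a word (allowing internal hyphens or apostrophes)
# or a single punctuation character.
TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*)|(?P<punct>[^\sA-Za-z0-9])"
)


def probe_text(text):
    """Return the word sequences found between boundary markers.

    Boundary markers here are closed class words and punctuation; every
    maximal run of other words is emitted as one term candidate.
    """
    candidates = []
    current = []
    for match in TOKEN_RE.finditer(text):
        word = match.group("word")  # None when the token is punctuation
        if word and word.lower() not in CLOSED_CLASS:
            current.append(word)
        elif current:
            # Hit a boundary marker: close off the candidate collected so far.
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates


if __name__ == "__main__":
    sample = "The fuel injection system is connected to the exhaust manifold."
    print(probe_text(sample))
    # ['fuel injection system', 'connected', 'exhaust manifold']
```

The sample output shows both the strength and the noise of the premise: multi-word candidates such as "fuel injection system" are captured whole, while open class non-terms such as "connected" also surface and would need filtering or expert review.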