Collocation extraction : a generic substitution-based approach

Bibliographic Details
Main Author: Pearce, Darren Michael
Published: University of Sussex 2009
Subjects: 401.43
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.496797
Description: One of the fundamental aspects of any natural language is the set of words used within it. In addition to knowing how individual words can be combined to communicate meaning, competent language users also know a large number of specific word combinations whose grammatical or distributional behaviour or meaning is idiosyncratic. This research is concerned with computational aspects of one important type of word combination: collocation. There is no agreed formal definition of collocation but it can be informally characterised as a sequence of words that occurs more often than would be expected by chance and whose combination tends to produce an element of added meaning. One of the often-cited characteristics of collocations is that they restrict substitution for their constituent words. This thesis develops a generic framework for the extraction of collocations that exploits this restriction. Experiments exploring the performance of such techniques use frequency counts derived from the WWW as well as large amounts of analysed text from conventional corpora and show that substitution-based techniques can out-perform many existing approaches to collocation extraction. The thesis concludes with a discussion of the many ways in which further research can leverage the genericity of the framework and utilise substitution for collocation extraction.
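The abstract names two properties of collocations: a word pair occurs more often than chance predicts, and its constituent words resist substitution. The Python sketch below illustrates both on a toy corpus. It is not the framework developed in the thesis. The function names, the substitution_score ratio, the invented corpus and the hand-picked substitute list are all hypothetical. Pointwise mutual information (PMI) stands in for a conventional chance-based measure. In practice the counts would come from a large corpus or from web frequency counts, and the substitutes would come from a thesaurus or similar resource (an assumption here).

```python
import math
from collections import Counter


def count_ngrams(tokens):
    """Return unigram and adjacent-bigram counts for a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information: log2 of observed vs. chance co-occurrence.

    Returns -inf for pairs never observed together.
    """
    p_xy = bigrams[(w1, w2)] / sum(bigrams.values())
    if p_xy == 0:
        return float("-inf")
    n = sum(unigrams.values())
    p_x = unigrams[w1] / n
    p_y = unigrams[w2] / n
    return math.log2(p_xy / (p_x * p_y))


def substitution_score(w1, w2, substitutes, bigrams):
    """Hypothetical substitution-restriction score for the pair (w1, w2).

    Compares the count of the original pair with the counts obtained by
    replacing w1 with each candidate substitute. A value near 1.0 means the
    substitutes rarely fill w1's slot, i.e. substitution is restricted.
    """
    observed = bigrams[(w1, w2)]
    alternatives = sum(bigrams[(s, w2)] for s in substitutes)
    total = observed + alternatives
    return observed / total if total else 0.0


# Toy corpus, invented for illustration (not data from the thesis).
corpus = ("strong tea . strong tea . strong tea . "
          "powerful engine . powerful engine . strong engine").split()
unigrams, bigrams = count_ngrams(corpus)

# Hypothetical substitute lists: swap the modifier for a near-synonym.
strong_tea = substitution_score("strong", "tea", ["powerful"], bigrams)
powerful_engine = substitution_score("powerful", "engine", ["strong"], bigrams)

if __name__ == "__main__":
    print(f"PMI(strong, tea) = {pmi('strong', 'tea', unigrams, bigrams):.3f}")
    print(f"substitution_score(strong tea) = {strong_tea:.3f}")
    print(f"substitution_score(powerful engine) = {powerful_engine:.3f}")
```

On this toy corpus, "strong tea" scores 1.0 because "powerful" never fills the slot before "tea". "powerful engine" scores about 0.67 because "strong engine" also occurs. The ratio deliberately ignores how frequent each substitute is overall. A fuller treatment would normalise for that.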