Data-oriented parsing with discontinuous constituents and function tags

Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.

Bibliographic Details
Main Authors:	Andreas van Cranenburgh, Remko Scha, Rens Bod
Affiliations:	Andreas van Cranenburgh: Huygens ING, Royal Dutch Academy of Science, and Institute for Logic, Language and Computation, University of Amsterdam; Remko Scha and Rens Bod: Institute for Logic, Language and Computation, University of Amsterdam
Format:	Article
Language:	English
Published:	Polish Academy of Sciences, 2016-04-01
Series:	Journal of Language Modelling, vol. 4, no. 1
ISSN:	2299-856X, 2299-8470
DOI:	10.15398/jlm.v4i1.100
Subjects:	discontinuous constituents; statistical parsing; tree-substitution grammar
Online Access:	https://jlm.ipipan.waw.pl/index.php/JLM/article/view/100
Record ID:	doaj-e46fd530c4294e7c8e7af9ab8a9f0305 (collection: DOAJ)

Similar Items