Parsing Chinese Sentences with Grammatical Relations

We report our work on building linguistic resources and data-driven parsers in the grammatical relation (GR) analysis for Mandarin Chinese. Chinese, as an analytic language, encodes grammatical information in a highly configurational rather than morphological way. Accordingly, it is possible and rea...


Bibliographic Details
Main Authors: Weiwei Sun, Yufei Chen, Xiaojun Wan, Meichun Liu
Format: Article
Language: English
Published: The MIT Press 2019-03-01
Series: Computational Linguistics
Online Access: https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00343
id doaj-4a9f35009198403c88ba4acf3d5f9603
record_format Article
spelling doaj-4a9f35009198403c88ba4acf3d5f9603
2020-11-24T21:29:07Z
eng
The MIT Press
Computational Linguistics
1530-9312
2019-03-01
volume 45, issue 1, pages 95-136
10.1162/coli_a_00343
coli_a_00343
Parsing Chinese Sentences with Grammatical Relations
Weiwei Sun (0): Peking University, Institute of Computer Science and Technology and Center for Chinese Linguistics. ws@pku.edu.cn
Yufei Chen (1): Peking University, Institute of Computer Science and Technology. yufei.chen@pku.edu.cn
Xiaojun Wan (2): Peking University, Institute of Computer Science and Technology. wanxiaojun@pku.edu.cn
Meichun Liu (3): City University of Hong Kong, Department of Linguistics and Translation. meichliu@cityu.edu.hk
We report our work on building linguistic resources and data-driven parsers in the grammatical relation (GR) analysis for Mandarin Chinese. Chinese, as an analytic language, encodes grammatical information in a highly configurational rather than morphological way. Accordingly, it is possible and reasonable to represent almost all grammatical relations as bilexical dependencies. In this work, we propose to represent grammatical information using general directed dependency graphs. Both only-local and rich long-distance dependencies are explicitly represented. To create high-quality annotations, we take advantage of an existing TreeBank, namely, Chinese TreeBank (CTB), which is grounded on the Government and Binding theory. We define a set of linguistic rules to explore CTB’s implicit phrase structural information and build deep dependency graphs. The reliability of this linguistically motivated GR extraction procedure is highlighted by manual evaluation. Based on the converted corpus, data-driven, including graph- and transition-based, models are explored for Chinese GR parsing. For graph-based parsing, a new perspective, graph merging, is proposed for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. Two key problems are discussed in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. For transition-based parsing, we introduce a neural parser based on a list-based transition system. We also discuss several other key problems, including dynamic oracle and beam search for neural transition-based parsing. Evaluation gauges how successful GR parsing for Chinese can be by applying data-driven models. The empirical analysis suggests several directions for future study.
https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00343
collection DOAJ
language English
format Article
sources DOAJ
author Weiwei Sun
Yufei Chen
Xiaojun Wan
Meichun Liu
author_sort Weiwei Sun
title Parsing Chinese Sentences with Grammatical Relations
publisher The MIT Press
series Computational Linguistics
issn 1530-9312
publishDate 2019-03-01
description We report our work on building linguistic resources and data-driven parsers in the grammatical relation (GR) analysis for Mandarin Chinese. Chinese, as an analytic language, encodes grammatical information in a highly configurational rather than morphological way. Accordingly, it is possible and reasonable to represent almost all grammatical relations as bilexical dependencies. In this work, we propose to represent grammatical information using general directed dependency graphs. Both only-local and rich long-distance dependencies are explicitly represented. To create high-quality annotations, we take advantage of an existing TreeBank, namely, Chinese TreeBank (CTB), which is grounded on the Government and Binding theory. We define a set of linguistic rules to explore CTB’s implicit phrase structural information and build deep dependency graphs. The reliability of this linguistically motivated GR extraction procedure is highlighted by manual evaluation. Based on the converted corpus, data-driven, including graph- and transition-based, models are explored for Chinese GR parsing. For graph-based parsing, a new perspective, graph merging, is proposed for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. Two key problems are discussed in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. For transition-based parsing, we introduce a neural parser based on a list-based transition system. We also discuss several other key problems, including dynamic oracle and beam search for neural transition-based parsing. Evaluation gauges how successful GR parsing for Chinese can be by applying data-driven models. The empirical analysis suggests several directions for future study.
url https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00343