Learning Bilingual Parsing from Parallel Corpus and Monolingual Treebank

Bibliographic Details
Main Authors: Chung-Chi Huang, 黃仲淇
Other Authors: Jason S. Chang
Format: Others
Language: en_US
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/75066565675156337686
Description
Summary: Master's === National Tsing Hua University === Department of Computer Science === 94 === We present a new method for learning to parse a bilingual sentence using an Inversion Transduction Grammar (ITG) trained on a parallel corpus and a monolingual treebank. The method produces a parse tree for a bilingual sentence, showing the shared syntactic structures of the individual sentences and the differences in word order within a syntactic structure. The method involves estimating lexical translation probabilities based on an existing word alignment system and inferring the probabilities of ITG rules. At runtime, a CYK-style bottom-up parser is employed to construct the most probable bilingual parse tree for any given sentence pair. We also describe an implementation of the proposed method. The experimental results indicate that the proposed model produces better word alignments than GIZA++, a state-of-the-art word alignment system, in terms of alignment error rate and F-measure. The bilingual parse trees produced for the parallel corpus can be exploited to refine the initial ITG rules and to train a decoder for statistical machine translation.
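
Illustration: the summary mentions a CYK-style bottom-up parser that finds the most probable bilingual parse under an ITG. The sketch below is not the thesis implementation. It is a minimal Viterbi parser for a one-nonterminal bracketing ITG, written to show how straight and inverted rules combine sub-spans of the two sentences. The function names (itg_viterbi, alignment), the rule probabilities (p_straight, p_inverted, p_lex), the smoothing constants (p_floor, p_null), and the toy lexical table are all assumptions made for this example. The treebank-derived syntactic labels described in the summary are omitted.

```python
import math
from functools import lru_cache

def itg_viterbi(src, tgt, lex_prob, p_straight=0.4, p_inverted=0.2,
                p_lex=0.4, p_floor=1e-6, p_null=1e-4):
    """Best bracketing-ITG parse of a non-empty sentence pair.

    Returns (log_prob, tree). All probabilities are illustrative assumptions.
    """
    log = math.log

    @lru_cache(maxsize=None)
    def best(i, j, k, l):
        # Covers source words src[i:j] and target words tgt[k:l].
        n, m = j - i, l - k
        cands = []
        if n == 1 and m == 1:
            # Lexical rule X -> e/f, with a floor for unseen word pairs.
            t = lex_prob.get((src[i], tgt[k]), p_floor)
            cands.append((log(p_lex) + log(t), ("lex", i, k)))
        elif n + m == 1:
            # Null rule: a word on one side aligned to nothing.
            leaf = ("lex", i, None) if n == 1 else ("lex", None, k)
            cands.append((log(p_lex) + log(p_null), leaf))
        if n + m >= 2:
            for s in range(i, j + 1):
                for u in range(k, l + 1):
                    # Straight rule X -> [X X]: same order in both languages.
                    if (s, u) not in ((i, k), (j, l)):
                        a, b = best(i, s, k, u), best(s, j, u, l)
                        cands.append((log(p_straight) + a[0] + b[0],
                                      ("straight", a[1], b[1])))
                    # Inverted rule X -> <X X>: order swapped on the target side.
                    if (s, u) not in ((i, l), (j, k)):
                        a, b = best(i, s, u, l), best(s, j, k, u)
                        cands.append((log(p_inverted) + a[0] + b[0],
                                      ("inverted", a[1], b[1])))
        return max(cands, key=lambda c: c[0])

    return best(0, len(src), 0, len(tgt))

def alignment(tree):
    """Collect (source_index, target_index) links from the leaves of a parse."""
    if tree[0] == "lex":
        _, i, k = tree
        return set() if i is None or k is None else {(i, k)}
    return alignment(tree[1]) | alignment(tree[2])

# Toy example with made-up lexical translation probabilities.
src = ["the", "white", "house"]
tgt = ["la", "maison", "blanche"]
lex = {("the", "la"): 0.9, ("white", "blanche"): 0.8, ("house", "maison"): 0.8}
score, tree = itg_viterbi(src, tgt, lex)
links = alignment(tree)
```

In this toy pair the best parse joins "the/la" to an inverted constituent covering "white house" and "maison blanche", which is the kind of word-order difference the summary says the bilingual parse tree exposes. The word alignment falls out of the leaves of the tree. The search here is exhaustive over all span pairs, so it is only practical for short sentences.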