Feature-Based Decipherment for Machine Translation
Orthographic similarities across languages provide a strong signal for unsupervised probabilistic transduction (decipherment) for closely related language pairs. The existing decipherment models, however, are not well suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed log-linear model. To address this challenge, we perform approximate inference via Markov chain Monte Carlo sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence outperforms the existing generative decipherment models by exploiting the orthographic features. The model both scales to large vocabularies and preserves accuracy in low- and no-resource contexts.
Main Authors: Iftekhar Naim (Google), Parker Riley (University of Rochester), Daniel Gildea (University of Rochester)
Format: Article
Language: English
Published: The MIT Press, 2018-09-01
Series: Computational Linguistics, 44(3), pp. 525–546
ISSN: 1530-9312
DOI: 10.1162/coli_a_00326
Online Access: https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00326
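The abstract describes training a latent-variable log-linear transduction model with orthographic similarity features, using Markov chain Monte Carlo sampling and contrastive divergence to sidestep the expensive normalization of exact maximum-likelihood training. The following minimal Python sketch is only an illustration of that general idea, not the authors' implementation: the feature set (normalized edit distance, shared-prefix length), the toy data, and the sampling scheme are all assumptions, and the positive word pair is given here for simplicity, whereas in the paper's unsupervised decipherment setting it would itself be a sampled latent hypothesis.

```python
# Minimal, illustrative sketch of a contrastive-divergence-style update for a
# log-linear word-transduction model with orthographic features.
# NOT the authors' implementation; features, data, and sampler are toy assumptions.
import math
import random

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[len(a)][len(b)]

def features(src, tgt):
    """Orthographic similarity features for a candidate pair (hypothetical choices)."""
    max_len = max(len(src), len(tgt))
    norm_edit = edit_distance(src, tgt) / max_len
    shared_prefix = 0
    for c1, c2 in zip(src, tgt):
        if c1 != c2:
            break
        shared_prefix += 1
    return [1.0 - norm_edit, shared_prefix / max_len, 1.0]  # last entry is a bias

def score(w, src, tgt):
    """Unnormalized log-linear score w . phi(src, tgt)."""
    return sum(wi * fi for wi, fi in zip(w, features(src, tgt)))

def sample_tgt(w, src, tgt_vocab, start, n_steps=5):
    """Approximate a draw from p(tgt | src) with a few Metropolis-Hastings steps (CD-n)."""
    cur = start
    for _ in range(n_steps):
        prop = random.choice(tgt_vocab)  # symmetric uniform proposal
        accept = math.exp(min(0.0, score(w, src, prop) - score(w, src, cur)))
        if random.random() < accept:
            cur = prop
    return cur

def cd_update(w, src, positive_tgt, tgt_vocab, lr=0.1):
    """One contrastive-divergence step: raise the positive pair, lower a model sample."""
    negative_tgt = sample_tgt(w, src, tgt_vocab, positive_tgt)
    pos, neg = features(src, positive_tgt), features(src, negative_tgt)
    return [wi + lr * (p - n) for wi, p, n in zip(w, pos, neg)]

if __name__ == "__main__":
    # Toy example with made-up cognate-like pairs.
    tgt_vocab = ["nation", "music", "table", "garden"]
    pairs = [("nacion", "nation"), ("musica", "music")]
    w = [0.0, 0.0, 0.0]
    for _ in range(50):
        for src, tgt in pairs:
            w = cd_update(w, src, tgt, tgt_vocab)
    print("learned weights:", w)
    print("best candidate for 'nacion':",
          max(tgt_vocab, key=lambda t: score(w, "nacion", t)))
```

The key point the sketch shows is the contrastive-divergence gradient approximation: instead of computing the full expectation over the target vocabulary, the model expectation is replaced by the features of a single sample obtained after a few MCMC steps, which is what keeps training tractable for large vocabularies.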