A Statistical Model for Lost Language Decipherment
URL to paper listed on conference site
Main Authors: | , , |
---|---|
Other Authors: | , |
Format: | Article |
Language: | English |
Published: | Association for Computational Linguistics, 2011-05-10T17:57:45Z. |
Summary: | In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a non-parametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences. This formulation enables us to encode some of the linguistic intuitions that have guided human decipherers. When applied to the ancient Semitic language Ugaritic, the model correctly maps nearly all letters to their Hebrew counterparts and deduces the correct Hebrew cognate for over half of the Ugaritic words that have cognates in Hebrew. |
Funding: | National Science Foundation (U.S.) (CAREER grant IIS-0448168); National Science Foundation (U.S.) (Career award IIS 0835445) |
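
To make the idea of "low-level character mappings" inside a Bayesian framework more concrete, the sketch below estimates letter-mapping probabilities with a Dirichlet-smoothed (Dirichlet-multinomial posterior predictive) estimate and uses them to rank candidate cognates. This is a deliberately simplified illustration, not the authors' model: per the summary, the paper's non-parametric model jointly captures character mappings and morphemic correspondences, which this sketch does not attempt. The toy data, the Latin placeholder letters, the `alpha` parameter, and the helper names `mapping_probs`, `log_score`, and `best_cognate` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (lost letter, known letter) pairs read off
# currently hypothesized cognate pairs. Latin placeholders stand in for
# real Ugaritic and Hebrew characters.
ALIGNED_PAIRS = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "z"), ("b", "y")]
KNOWN_ALPHABET = ["x", "y", "z"]


def mapping_probs(pairs, alphabet, alpha=1.0):
    """Estimate P(known | lost) under a symmetric Dirichlet prior.

    Uses the Dirichlet-multinomial posterior predictive
    (count(lost, known) + alpha / |V|) / (count(lost) + alpha),
    so unseen mappings keep a small nonzero probability.
    """
    joint = Counter(pairs)
    totals = Counter(lost for lost, _ in pairs)
    base = 1.0 / len(alphabet)
    return {
        lost: {
            known: (joint[(lost, known)] + alpha * base) / (totals[lost] + alpha)
            for known in alphabet
        }
        for lost in totals
    }


def log_score(lost_word, known_word, probs, alphabet):
    """Log-probability of a same-length word pair, mapped letter by letter."""
    if len(lost_word) != len(known_word):
        return float("-inf")
    uniform = 1.0 / len(alphabet)  # fallback for lost letters never observed
    return sum(
        math.log(probs.get(l, {}).get(k, uniform))
        for l, k in zip(lost_word, known_word)
    )


def best_cognate(lost_word, candidates, probs, alphabet):
    """Return the candidate known-language word with the highest score."""
    return max(candidates, key=lambda w: log_score(lost_word, w, probs, alphabet))


probs = mapping_probs(ALIGNED_PAIRS, KNOWN_ALPHABET)
best = best_cognate("ab", ["xy", "zy", "yx"], probs, KNOWN_ALPHABET)
print(round(probs["a"]["x"], 3), best)  # prints "0.583 xy"
```

The smoothing term `alpha / |V|` keeps every mapping possible while letting observed counts dominate; in a full decipherment system the aligned pairs would themselves be re-estimated (for example by sampling) rather than fixed in advance as they are here.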