Bayesian Learning of Latent Representations of Language Structures

We borrow the concept of representation learning from deep learning research, and we argue that the quest for Greenbergian implicational universals can be reformulated as the learning of good latent representations of languages, or sequences of surface typological features. By projecting languages i...

Bibliographic Details
Main Author:	Yugo Murawaki
Format:	Article
Language:	English
Published:	The MIT Press 2019-06-01
Series:	Computational Linguistics
Online Access:	https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00346

id	doaj-902bb8ada4b345a9a4d44e1ee004df64
record_format	Article
spelling	doaj-902bb8ada4b345a9a4d44e1ee004df64; 2020-11-25T01:52:00Z; eng; The MIT Press; Computational Linguistics; 1530-9312; 2019-06-01; 45; 2; 199; 228; 10.1162/coli_a_00346; coli_a_00346; Bayesian Learning of Latent Representations of Language Structures; Yugo Murawaki; 0; Kyoto University, Graduate School of Informatics. murawaki@i.kyoto-u.ac.jp; We borrow the concept of representation learning from deep learning research, and we argue that the quest for Greenbergian implicational universals can be reformulated as the learning of good latent representations of languages, or sequences of surface typological features. By projecting languages into latent representations and performing inference in the latent space, we can handle complex dependencies among features in an implicit manner. The most challenging problem in turning the idea into a concrete computational model is the alarmingly large number of missing values in existing typological databases. To address this problem, we keep the number of model parameters relatively small to avoid overfitting, adopt the Bayesian learning framework for its robustness, and exploit phylogenetically and/or spatially related languages as additional clues. Experiments show that the proposed model recovers missing values more accurately than others and that some latent variables exhibit phylogenetic and spatial signals comparable to those of surface features.; https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00346
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Yugo Murawaki
spellingShingle	Yugo Murawaki Bayesian Learning of Latent Representations of Language Structures Computational Linguistics
author_facet	Yugo Murawaki
author_sort	Yugo Murawaki
title	Bayesian Learning of Latent Representations of Language Structures
title_short	Bayesian Learning of Latent Representations of Language Structures
title_full	Bayesian Learning of Latent Representations of Language Structures
title_fullStr	Bayesian Learning of Latent Representations of Language Structures
title_full_unstemmed	Bayesian Learning of Latent Representations of Language Structures
title_sort	bayesian learning of latent representations of language structures
publisher	The MIT Press
series	Computational Linguistics
issn	1530-9312
publishDate	2019-06-01
description	We borrow the concept of representation learning from deep learning research, and we argue that the quest for Greenbergian implicational universals can be reformulated as the learning of good latent representations of languages, or sequences of surface typological features. By projecting languages into latent representations and performing inference in the latent space, we can handle complex dependencies among features in an implicit manner. The most challenging problem in turning the idea into a concrete computational model is the alarmingly large number of missing values in existing typological databases. To address this problem, we keep the number of model parameters relatively small to avoid overfitting, adopt the Bayesian learning framework for its robustness, and exploit phylogenetically and/or spatially related languages as additional clues. Experiments show that the proposed model recovers missing values more accurately than others and that some latent variables exhibit phylogenetic and spatial signals comparable to those of surface features.
url	https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00346
work_keys_str_mv	AT yugomurawaki bayesianlearningoflatentrepresentationsoflanguagestructures
_version_	1724995473352163328

Similar Items