Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled...

Bibliographic Details
Main Authors: Ponti, Edoardo Maria; O’Horan, Helen; Berzak, Yevgeni; Vulić, Ivan; Reichart, Roi; Poibeau, Thierry; Shutova, Ekaterina; Korhonen, Anna
Format: Article
Language: English
Published: The MIT Press 2019-09-01
Series: Computational Linguistics
Online Access: https://www.mitpressjournals.org/doi/abs/10.1162/coli_a_00357
id doaj-e766c2c989b842e388991cd857e2c997
record_format Article
spelling doaj-e766c2c989b842e388991cd857e2c997; 2020-11-25T03:21:26Z; eng; The MIT Press; Computational Linguistics; ISSN 0891-2017, 1530-9312; 2019-09-01; vol. 45, no. 3, pp. 559-601; doi:10.1162/coli_a_00357
collection DOAJ
language English
format Article
sources DOAJ
author Ponti, Edoardo Maria
O’Horan, Helen
Berzak, Yevgeni
Vulić, Ivan
Reichart, Roi
Poibeau, Thierry
Shutova, Ekaterina
Korhonen, Anna
title Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
publisher The MIT Press
series Computational Linguistics
issn 0891-2017
1530-9312
publishDate 2019-09-01
description Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.
url https://www.mitpressjournals.org/doi/abs/10.1162/coli_a_00357