On the accuracy of language trees.

Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different...


Bibliographic Details
Main Authors:	Simone Pompei, Vittorio Loreto, Francesca Tria
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2011-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC3108590?pdf=render

id	doaj-01f785d9f2c346619ea2966f06326938
doi	10.1371/journal.pone.0020109
citation	PLoS ONE 6(6): e20109 (2011)
collection	DOAJ
issn	1932-6203
description	Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective, the reconstruction of language trees is an example of an inverse problem: starting from present, incomplete, and often noisy information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees, we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores, we quantify the relative performance of the distance-based algorithms considered. Further, we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally, we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.
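The abstract names neither the distance-based algorithms compared nor the two generalized scores. As a minimal sketch of the kind of pipeline it describes, the Python below builds a rooted tree from a toy distance matrix with UPGMA (one classic distance-based method, chosen here only for brevity and not necessarily among those evaluated) and compares it with a multifurcating, expert-style reference using the standard rooted Robinson-Foulds distance (the number of clades present in only one of the two trees). This is not the authors' generalized score. The languages A, B, C, D, E and all distances are hypothetical, made-up values, and the function names `upgma`, `clades` and `rf_distance` are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# A tree is a leaf label (str) or a tuple of subtrees (any arity).
Tree = Union[str, tuple]


def upgma(labels: List[str], d: Dict[FrozenSet[str], float]) -> Tree:
    """Rooted binary tree by UPGMA (average linkage) from pairwise distances."""
    # Active clusters: key -> (subtree, number of leaves). Internal keys "#k"
    # are assumed not to clash with leaf labels.
    clusters: Dict[str, Tuple[Tree, int]] = {x: (x, 1) for x in labels}
    dist = dict(d)
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair; ties go to the first pair in sorted order.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: dist[frozenset(p)])
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        new = f"#{next_id}"
        next_id += 1
        # Distance to the merged cluster: size-weighted mean of the old ones.
        for c in clusters:
            dist[frozenset((new, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        clusters[new] = ((ta, tb), na + nb)
    ((tree, _),) = clusters.values()
    return tree


def clades(t: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes except the root (non-trivial clades)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    # The root clade is shared by any two trees on the same leaves, so drop it.
    found.discard(walk(t))
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Standard rooted Robinson-Foulds distance: clades found in only one tree."""
    return len(clades(t1) ^ clades(t2))


# Toy, made-up distances among hypothetical languages A..E (not real data).
toy = {("A", "B"): 0.2, ("A", "C"): 0.5, ("B", "C"): 0.5, ("D", "E"): 0.3}
d = {frozenset(p): toy.get(p, 0.8) for p in combinations("ABCDE", 2)}

inferred = upgma(list("ABCDE"), d)
# A multifurcating reference, in the style of an expert classification.
reference = (("A", "B", "C"), ("D", "E"))

print(inferred)                          # (('D', 'E'), (('A', 'B'), 'C'))
print(rf_distance(inferred, reference))  # 1
```

The single unit of disagreement comes from the clade {A, B}, which the binary UPGMA tree resolves but the reference leaves unresolved. Plain Robinson-Foulds counts that extra resolution as an error, which illustrates why comparing binary inferred trees with non-binary expert classifications can call for adapted scores.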


Similar Items