The Potential of Automatic Word Comparison for Historical Linguistics.

The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be l...


Bibliographic Details
Main Authors:	Johann-Mattis List, Simon J Greenhill, Russell D Gray
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2017-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC5271327?pdf=render

id	doaj-4c6835a47a604322916aad99a4c9e556
record_format	Article
spelling	doaj-4c6835a47a604322916aad99a4c9e556 | 2020-11-24T20:45:59Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2017-01-01 | 12 | 1 | e0170046 | 10.1371/journal.pone.0170046 | The Potential of Automatic Word Comparison for Historical Linguistics. | Johann-Mattis List | Simon J Greenhill | Russell D Gray | The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics. | http://europepmc.org/articles/PMC5271327?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Johann-Mattis List Simon J Greenhill Russell D Gray
spellingShingle	Johann-Mattis List Simon J Greenhill Russell D Gray The Potential of Automatic Word Comparison for Historical Linguistics. PLoS ONE
author_facet	Johann-Mattis List Simon J Greenhill Russell D Gray
author_sort	Johann-Mattis List
title	The Potential of Automatic Word Comparison for Historical Linguistics.
title_sort	potential of automatic word comparison for historical linguistics.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2017-01-01
description	The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks, leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.
url	http://europepmc.org/articles/PMC5271327?pdf=render


Similar Items