Generalising Ward's Method for Use with Manhattan Distances.

Bibliographic Details
Main Authors: Trudie Strauss, Michael Johan von Maltitz
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5235383?pdf=render
Citation: PLoS ONE, vol. 12, no. 1, e0168288 (2017-01-01)
DOI: 10.1371/journal.pone.0168288
Collection: DOAJ
ISSN: 1932-6203
Description: The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised to use with l1 norm or Manhattan distances. We argue that the generalisation of Ward's linkage method to incorporate Manhattan distances is theoretically sound and provide an example of where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward's method of hierarchical clustering to classify the languages, using the Euclidean distance and the Manhattan distance. Results obtained from using the different distance metrics are compared to show that the Ward's algorithm characteristic of minimising intra-cluster variation and maximising inter-cluster variation is not violated when using the Manhattan metric.
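
The pipeline summarised in the description (relative bi-gram frequencies, a pairwise distance matrix, then Ward-style agglomeration under both metrics) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration and is not taken from the article: the function names, the three made-up toy sentences, and in particular the use of the Lance-Williams recurrence for Ward's linkage applied directly to the supplied dissimilarities. This record does not say whether the authors' generalisation takes that form.

```python
from collections import Counter
from itertools import combinations


def bigram_signature(text):
    """Relative frequency of each within-word letter pair in `text`."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    counts = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}


def manhattan(p, q):
    """l1 distance between two sparse frequency signatures."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def euclidean(p, q):
    """l2 distance between two sparse frequency signatures."""
    return sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in set(p) | set(q)) ** 0.5


def ward_merges(names, dist):
    """Agglomerate using the Lance-Williams update with Ward's coefficients.

    Assumption: the update is applied to the dissimilarities as given
    (not squared). Returns a list of (left, right, height) merges.
    """
    clusters = {i: (name,) for i, name in enumerate(names)}
    d = {frozenset(pr): dist[pr[0]][pr[1]] for pr in combinations(clusters, 2)}
    merges, next_id = [], len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of current clusters
        i, j = sorted(pair)
        d_ij = d.pop(pair)
        left, right = clusters.pop(i), clusters.pop(j)
        ni, nj = len(left), len(right)
        for k, members in clusters.items():
            nk = len(members)
            d_ki = d.pop(frozenset((k, i)))
            d_kj = d.pop(frozenset((k, j)))
            d[frozenset((k, next_id))] = (
                (ni + nk) * d_ki + (nj + nk) * d_kj - nk * d_ij
            ) / (ni + nj + nk)
        clusters[next_id] = left + right
        merges.append((left, right, d_ij))
        next_id += 1
    return merges


# Made-up toy sentences, far too short to be a real language signature.
samples = {
    "english": "the weather is nice and the water is cold",
    "dutch": "het weer is mooi en het water is koud",
    "spanish": "el tiempo es bueno y el agua esta fria",
}
names = list(samples)
sigs = [bigram_signature(samples[n]) for n in names]
for metric in (euclidean, manhattan):
    dist = [[metric(p, q) for q in sigs] for p in sigs]
    print(metric.__name__, ward_merges(names, dist))
```

Keeping the signatures as sparse dictionaries avoids fixing a bi-gram alphabet in advance; a real analysis would use much longer texts and would likely rely on an established clustering library rather than this hand-rolled loop.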