Generalising Ward's Method for Use with Manhattan Distances.
The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised to use with l1 norm or Manhattan distances. We argue that the generalisation of Ward's linkage metho...
Main Authors: | Trudie Strauss, Michael Johan von Maltitz |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2017-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC5235383?pdf=render |
id |
doaj-c2a58292bff14c9caa11fbc4e4d112a8 |
---|---|
record_format |
Article |
spelling |
doaj-c2a58292bff14c9caa11fbc4e4d112a8; 2020-11-25T02:47:44Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2017-01-01; 12(1): e0168288; DOI 10.1371/journal.pone.0168288; Generalising Ward's Method for Use with Manhattan Distances.; Trudie Strauss; Michael Johan von Maltitz; http://europepmc.org/articles/PMC5235383?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Trudie Strauss; Michael Johan von Maltitz |
title |
Generalising Ward's Method for Use with Manhattan Distances. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2017-01-01 |
description |
The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised to use with l1 norm or Manhattan distances. We argue that the generalisation of Ward's linkage method to incorporate Manhattan distances is theoretically sound and provide an example of where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward's method of hierarchical clustering to classify the languages, using the Euclidean distance and the Manhattan distance. Results obtained from using the different distance metrics are compared to show that the Ward's algorithm characteristic of minimising intra-cluster variation and maximising inter-cluster variation is not violated when using the Manhattan metric. |
url |
http://europepmc.org/articles/PMC5235383?pdf=render |
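The pipeline summarised in the description (bi-gram signatures, a distance matrix, then Ward clustering under two metrics) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy sample sentences, and the choice to apply the Lance-Williams form of Ward's update to squared dissimilarities are all assumptions made here for the example.

```python
from collections import Counter
from itertools import combinations


def bigram_signature(text):
    """Relative frequencies of adjacent letter pairs, counted within words."""
    counts = Counter()
    for word in text.lower().split():
        word = "".join(c for c in word if c.isalpha())
        counts.update(a + b for a, b in zip(word, word[1:]))
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {bg: n / total for bg, n in counts.items()}


def minkowski(p, q, order):
    """order=1 gives the Manhattan (l1) distance, order=2 the Euclidean."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) ** order for k in keys) ** (1 / order)


def ward_merges(signatures, order):
    """Agglomerate with the Lance-Williams update for Ward's criterion.

    Assumption: the update is applied to squared dissimilarities whatever
    the metric, which is one way to read "Ward with Manhattan distances".
    Returns the merges in order as (members_a, members_b) tuples.
    """
    names = list(signatures)
    clusters = {i: (name,) for i, name in enumerate(names)}
    sq = {
        frozenset((i, j)): minkowski(signatures[names[i]], signatures[names[j]], order) ** 2
        for i, j in combinations(range(len(names)), 2)
    }
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest squared dissimilarity.
        i, j = min(combinations(sorted(clusters), 2), key=lambda pair: sq[frozenset(pair)])
        ni, nj = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            sq[frozenset((k, next_id))] = (
                (ni + nk) * sq[frozenset((k, i))]
                + (nj + nk) * sq[frozenset((k, j))]
                - nk * sq[frozenset((i, j))]
            ) / (ni + nj + nk)
        merges.append((clusters[i], clusters[j]))
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges


# Hypothetical toy samples; the paper uses 32 Indo-European languages.
samples = {
    "english": "the quick brown fox jumps over the lazy dog",
    "dutch": "de snelle bruine vos springt over de luie hond",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso",
    "italian": "la rapida volpe marrone salta sopra il cane pigro",
}
signatures = {name: bigram_signature(text) for name, text in samples.items()}

for label, order in (("Manhattan", 1), ("Euclidean", 2)):
    print(label, ward_merges(signatures, order))
```

Comparing the two printed merge sequences is the toy analogue of the comparison the description mentions. With samples this short the result says nothing about real language relationships.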