Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis
Technical disciplines are evolving rapidly leading to changes in their associated vocabularies. Confusion in interdisciplinary communication occurs due to this evolving terminology. Two causes of confusion are multiple definitions (overloaded terms) and synonymous terms. The formal names for these t...
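Main Author: | Riley, Owen G. |
---|---|
Format: | Others |
Published: | BYU ScholarsArchive 2014 |
Subjects: | cosine similarity; LSI; LDA; text similarity; hierarchical clustering; polysemy; Construction Engineering and Management |
Online Access: | https://scholarsarchive.byu.edu/etd/4030 https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5029&context=etd |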
id | ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-5029 |
---|---|
record_format | oai_dc |
spelling | ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-5029 2021-09-12T05:01:08Z Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis Riley, Owen G. 2014-04-23T07:00:00Z text application/pdf https://scholarsarchive.byu.edu/etd/4030 https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5029&context=etd http://lib.byu.edu/about/copyright/ Theses and Dissertations BYU ScholarsArchive cosine similarity LSI LDA text similarity hierarchical clustering polysemy Construction Engineering and Management |
collection | NDLTD |
format | Others |
sources | NDLTD |
topic | cosine similarity; LSI; LDA; text similarity; hierarchical clustering; polysemy; Construction Engineering and Management |
description | Technical disciplines are evolving rapidly leading to changes in their associated vocabularies. Confusion in interdisciplinary communication occurs due to this evolving terminology. Two causes of confusion are multiple definitions (overloaded terms) and synonymous terms. The formal names for these two problems are polysemy and synonymy. Termediator-I, a web application built on top of a collection of glossaries, uses definition count as a measure of term confusion. This tool was an attempt to identify confusing cross-disciplinary terms. As more glossaries were added to the collection, this measure became ineffective. This thesis provides a measure of term polysemy. Term polysemy is effectively measured by semantically clustering the text concepts, or definitions, of each term and counting the number of resulting clusters. Hierarchical clustering uses a measure of proximity between the text concepts. Three such measures are evaluated: cosine similarity, latent semantic indexing, and latent Dirichlet allocation. Two linkage types, for determining cluster proximity during the hierarchical clustering process, are also evaluated: complete linkage and average linkage. Crowdsourcing through a web application was unsuccessfully attempted to obtain a viable clustering threshold by public consensus. An alternate metric of polysemy, convergence value, is identified and tested as a viable clustering threshold. Six resulting lists of terms ranked by cluster count based on convergence values are generated, one for each similarity measure and linkage type combination. Each combination produces a competitive list, and no clear combination can be determined as superior. Semantic clustering successfully identifies polysemous terms, but each similarity measure and linkage type combination provides slightly different results. |
author | Riley, Owen G. |
title | Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis |
publisher | BYU ScholarsArchive |
publishDate | 2014 |
url | https://scholarsarchive.byu.edu/etd/4030 https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5029&context=etd |
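
The abstract describes measuring polysemy by clustering a term's definitions hierarchically and counting the resulting clusters. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes plain bag-of-words cosine similarity (LSI and LDA are not reproduced) and a fixed similarity threshold in place of the thesis's convergence value. The function names, the sample definitions of "bridge", and the threshold of 0.3 are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased word counts for one definition."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_clusters(definitions, threshold, linkage="average"):
    """Merge definitions bottom-up; return how many clusters remain.

    Merging stops once the two closest clusters are less similar than
    `threshold` (an assumed stand-in for a tuned cut-off).
    """
    vectors = [bag_of_words(d) for d in definitions]
    clusters = [[i] for i in range(len(vectors))]

    def proximity(c1, c2):
        sims = [cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2]
        # Complete linkage: least similar pair. Average linkage: mean of all pairs.
        return min(sims) if linkage == "complete" else sum(sims) / len(sims)

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), proximity(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[x] += clusters.pop(y)  # y > x, so index x stays valid
    return len(clusters)


# Hypothetical definitions of "bridge" from two disciplines.
definitions = [
    "structure spanning river or road",
    "structure carrying road across river",
    "device connecting two computer networks",
    "hardware connecting two networks",
]
print(count_clusters(definitions, threshold=0.3))
```

A term whose definitions collapse into one cluster would be treated as unambiguous under this sketch, while a higher cluster count suggests more distinct senses. The count depends on the chosen threshold and linkage.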