Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis

Technical disciplines are evolving rapidly, leading to changes in their associated vocabularies. This evolving terminology causes confusion in interdisciplinary communication. Two causes of confusion are multiple definitions (overloaded terms) and synonymous terms. The formal names for these t...

Bibliographic Details
Main Author: Riley, Owen G.
Format: Others
Published: BYU ScholarsArchive 2014
Subjects:
LSI
LDA
Online Access: https://scholarsarchive.byu.edu/etd/4030
https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5029&context=etd
id ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-5029
record_format oai_dc
spelling ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-5029 2021-09-12T05:01:08Z 2014-04-23T07:00:00Z text application/pdf http://lib.byu.edu/about/copyright/ Theses and Dissertations BYU ScholarsArchive
collection NDLTD
format Others
sources NDLTD
topic cosine similarity
LSI
LDA
text similarity
hierarchical clustering
polysemy
Construction Engineering and Management
description Technical disciplines are evolving rapidly, leading to changes in their associated vocabularies. This evolving terminology causes confusion in interdisciplinary communication. Two causes of confusion are multiple definitions (overloaded terms) and synonymous terms; the formal names for these two problems are polysemy and synonymy. Termediator-I, a web application built on top of a collection of glossaries, uses definition count as a measure of term confusion. This tool was an attempt to identify confusing cross-disciplinary terms. As more glossaries were added to the collection, this measure became ineffective. This thesis provides a measure of term polysemy. Term polysemy is effectively measured by semantically clustering the text concepts, or definitions, of each term and counting the number of resulting clusters. Hierarchical clustering uses a measure of proximity between the text concepts. Three such measures are evaluated: cosine similarity, latent semantic indexing (LSI), and latent Dirichlet allocation (LDA). Two linkage types for determining cluster proximity during the hierarchical clustering process are also evaluated: complete linkage and average linkage. An attempt to obtain a viable clustering threshold by public consensus, through a crowdsourcing web application, was unsuccessful. An alternate metric of polysemy, convergence value, is identified and tested as a viable clustering threshold. Six lists of terms ranked by cluster count based on convergence values are generated, one for each combination of similarity measure and linkage type. Each combination produces a competitive list, and no combination is clearly superior. Semantic clustering successfully identifies polysemous terms, but each combination of similarity measure and linkage type gives slightly different results.
author Riley, Owen G.
title Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis
publisher BYU ScholarsArchive
publishDate 2014
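
The clustering approach summarized in the description (cluster a term's definitions by semantic proximity, then count the clusters) can be illustrated with a minimal sketch. This is not the thesis implementation: it covers only the bag-of-words cosine similarity measure (not LSI or LDA), the function names and the example definitions for the term "bridge" are hypothetical, and the threshold of 0.5 is an arbitrary assumption rather than the convergence value the thesis describes, which this record does not define.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercase a definition and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_proximity(c1, c2, sim, linkage):
    """Similarity between two clusters under the chosen linkage type."""
    pair_sims = [sim[i][j] for i in c1 for j in c2]
    if linkage == "complete":
        return min(pair_sims)  # the least similar pair decides
    return sum(pair_sims) / len(pair_sims)  # average linkage


def count_clusters(definitions, threshold, linkage="average"):
    """Agglomeratively merge definitions until no pair of clusters is at
    least `threshold` similar, then return the number of clusters left."""
    vectors = [bag_of_words(d) for d in definitions]
    n = len(vectors)
    sim = [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per definition
    while len(clusters) > 1:
        # Find the most similar pair of clusters (a < b by construction).
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_proximity(
                clusters[p[0]], clusters[p[1]], sim, linkage),
        )
        if cluster_proximity(clusters[a], clusters[b], sim, linkage) < threshold:
            break  # nothing left that is similar enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return len(clusters)


# Hypothetical glossary definitions of one term ("bridge") from two disciplines.
example_definitions = [
    "A structure that carries a road or railway across a river or valley.",
    "A structure built to carry a road across a river.",
    "A network device that connects two network segments.",
    "A device that connects network segments at the data link layer.",
]

print(count_clusters(example_definitions, threshold=0.5))  # prints 2
```

In this sketch a cluster count above 1 flags the term as a candidate polysemous term. The count depends heavily on the threshold: a very high threshold leaves every definition in its own cluster, which reproduces the plain definition count that the description reports as ineffective.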