Using term proximity measures for identifying compound concepts : an exploratory study

Bibliographic Details
Main Author: Yin, Nawei
Format: Others
Language: English
Published: 2009
Online Access: http://hdl.handle.net/2429/15815
Record ID: ndltd-UBC-oai-circle.library.ubc.ca-2429-15815 (record timestamp 2018-01-05T17:37:58Z)
Collection: NDLTD
Type: Text; Thesis/Dissertation
Affiliation: Sauder School of Business, Division of Management Information Systems (Graduate)
Dates: 2004 (2004-11); 2009-11-26T21:11:01Z
Rights: For non-commercial purposes only, such as research, private study and education. Additional conditions apply; see Terms of Use https://open.library.ubc.ca/terms_of_use.
File: application/pdf, 5784465 bytes

Abstract: With the rapid development of information technology, individuals using the technology are liable to be overwhelmed by the excessive amounts of information available when conducting online (local or remote) document searches. It is therefore important that users specify the correct search terms. However, a user does not always know which terms to use, and the same idea can often be described by different terms. Constructing lists of possible search terms for different domains would require a very substantial effort by experts in each domain. To alleviate these problems, automated techniques for extracting concepts and meaningful phrases in specific domains can be valuable. This work is an exploratory study of the automated extraction of compound concepts from a collection of documents in a specific domain. The concept-extraction methods used in this study employed clustering techniques based on distance measures that reflect term affinity statistics, rather than on the similarity measures adopted in most previous work. The study compared different methods of calculating affinities, which differ in the size of the textual unit within which terms co-occur and in how they treat directionality and asymmetry between terms. The accounting domain was used as a case study to provide the data. An accounting expert evaluated the clusters produced by the clustering program. The results show that the method identified meaningful accounting compound concepts and phrases. The research also indicated which affinity types generated better results; for example, affinities based on the occurrence of terms within the same document produced the poorest results. Significant manual effort was needed to preprocess the data prior to compound concept identification.
However, we believe the techniques explored could help users search for relevant information within individual domains and could be extended to support the construction of domain-specific thesauri.
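The abstract describes clustering terms with distance measures derived from term affinity statistics, where affinities depend on the size of the textual unit in which terms co-occur and on directionality and asymmetry between terms. The record does not give the thesis's formulas, so what follows is a hypothetical sketch under stated assumptions, not the author's method: the directed affinity of term a to term b is taken to be the fraction of units containing a that also contain b; the two directions are combined with `max` or `min` into a distance of 1 minus that affinity; and clustering is single-linkage cut at a distance threshold. The function names and the toy accounting phrases are invented for illustration.

```python
"""Hypothetical sketch of asymmetric term-affinity clustering.

Assumptions (not taken from the thesis):
- affinity(a -> b) = P(b occurs in a unit | a occurs in that unit)
- distance(a, b)   = 1 - combine(affinity(a -> b), affinity(b -> a))
- single-linkage clustering cut at a fixed distance threshold
"""
from collections import Counter
from itertools import combinations


def unit_term_sets(units):
    """Turn each textual unit (sentence, paragraph or document) into a set of lowercase terms."""
    return [set(unit.lower().split()) for unit in units]


def directed_affinity(unit_sets):
    """Return {(a, b): P(b in unit | a in unit)} for every pair of terms that co-occur.

    The measure is asymmetric: (a, b) is normalised by the number of units
    containing a, and (b, a) by the number of units containing b.
    """
    term_count = Counter()
    pair_count = Counter()
    for terms in unit_sets:
        term_count.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_count[(a, b)] += 1
    affinity = {}
    for (a, b), n_ab in pair_count.items():
        affinity[(a, b)] = n_ab / term_count[a]
        affinity[(b, a)] = n_ab / term_count[b]
    return affinity


def distance(affinity, a, b, combine=max):
    """Combine the two directed affinities into a distance in [0, 1]; unseen pairs get 1."""
    return 1.0 - combine(affinity.get((a, b), 0.0), affinity.get((b, a), 0.0))


def single_linkage_clusters(terms, affinity, threshold, combine=max):
    """Single-linkage clustering cut at `threshold`, implemented with union-find.

    Two terms share a cluster if a chain of pairs, each with
    distance <= threshold, connects them.
    """
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(sorted(terms), 2):
        if distance(affinity, a, b, combine) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), set()).add(t)
    # Deterministic output: each cluster sorted, clusters ordered by first term.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    # Toy sentence-level units (invented example data).
    sentences = [
        "deferred income tax",
        "deferred income tax liability",
        "cash flow statement",
        "operating cash flow",
    ]
    sets = unit_term_sets(sentences)
    aff = directed_affinity(sets)
    terms = set().union(*sets)
    print(single_linkage_clusters(terms, aff, 0.25, combine=max))
    # → [['cash', 'flow', 'operating', 'statement'], ['deferred', 'income', 'liability', 'tax']]
    print(single_linkage_clusters(terms, aff, 0.25, combine=min))
    # → [['cash', 'flow'], ['deferred', 'income', 'tax'], ['liability'], ['operating'], ['statement']]
```

With `combine=max`, a rare term such as "liability" joins "deferred income tax" because every unit containing it also contains those terms; with `combine=min`, the weaker reverse direction keeps it apart, which shows how the treatment of asymmetry changes the clusters. Passing whole documents instead of sentences as units coarsens the co-occurrence window. Stopword removal and the other preprocessing the abstract mentions are omitted here.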