Using term proximity measures for identifying compound concepts : an exploratory study

With the rapid development of information technology, individuals using the technology are liable to be overwhelmed by the excessive amounts of information available when conducting online (local or remote) document searches. It is therefore important that users specify the correct search terms. How...

Bibliographic Details
Main Author:	Yin, Nawei
Format:	Others
Language:	English
Published:	2009
Online Access:	http://hdl.handle.net/2429/15815

id	ndltd-UBC-oai-circle.library.ubc.ca-2429-15815
record_format	oai_dc
spelling	ndltd-UBC-oai-circle.library.ubc.ca-2429-15815 2018-01-05T17:37:58Z Using term proximity measures for identifying compound concepts : an exploratory study Yin, Nawei Business, Sauder School of Management Information Systems, Division of Graduate 2009-11-26T21:11:01Z 2009-11-26T21:11:01Z 2004 2004-11 Text Thesis/Dissertation http://hdl.handle.net/2429/15815 eng For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. 5784465 bytes application/pdf
collection	NDLTD
language	English
format	Others
sources	NDLTD
description	With the rapid development of information technology, individuals using the technology are liable to be overwhelmed by the excessive amounts of information available when conducting online (local or remote) document searches. It is therefore important that users specify the correct search terms. However, a user does not always know which terms to use, and often the same idea can be described by different terms. Constructing lists of possible search terms for different domains would require a very substantial effort by experts in each domain. To alleviate these problems, automated techniques can be valuable for extracting concepts and meaningful phrases for specific domains. This work is an exploratory study of automated extraction of compound concepts from a collection of documents in a specific domain. The concept-extraction methods used in this study employed clustering techniques based on distance measures that reflect term affinity statistics, rather than techniques based on the similarity measures adopted in most previous works. The study compared the effects of different methods of calculating affinities, depending on the sizes of the textual units in which terms co-occur and on directionality and asymmetry between terms. The accounting context was used as a case study to provide the data. An accounting expert evaluated the clusters produced by the clustering program. As demonstrated by our results, the method identified meaningful accounting compound concepts and phrases. The research also indicated which affinity types generated better results. For example, affinities based on occurrence of terms within a document produced the poorest results. There was a significant manual effort involved in "preprocessing" the data prior to compound concept identification. However, we believe the techniques explored might be useful for users searching for relevant information within individual domains and can be extended to support the construction of domain-specific thesauri. === Business, Sauder School of === Management Information Systems, Division of === Graduate
author	Yin, Nawei
title	Using term proximity measures for identifying compound concepts : an exploratory study
publishDate	2009
url	http://hdl.handle.net/2429/15815
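
The description above mentions directional (asymmetric) term affinities computed over textual units and clustering on distances derived from them. The record does not give the thesis's formulas, so the following is only a minimal sketch under stated assumptions: affinity is taken as the share of units containing one term that also contain the other, distance as one minus the mean of the two directed affinities, and clustering as simple single-linkage grouping under a threshold. The function names, the toy sentences, and the threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def affinities(units):
    """Directed co-occurrence affinity: aff[a, b] = P(b in unit | a in unit).

    `units` is a list of term lists, one per textual unit (sentence,
    paragraph, or whole document). This formula is an illustrative
    assumption, not the measure defined in the thesis.
    """
    unit_count = Counter()  # number of units containing each term
    pair_count = Counter()  # number of units containing both terms
    for unit in units:
        terms = sorted(set(unit))
        unit_count.update(terms)
        pair_count.update(combinations(terms, 2))
    aff = {}
    for (a, b), n in pair_count.items():
        aff[a, b] = n / unit_count[a]
        aff[b, a] = n / unit_count[b]
    return aff


def distance(aff, a, b):
    """Symmetric distance (assumed): 1 minus the mean of both directions."""
    return 1.0 - (aff.get((a, b), 0.0) + aff.get((b, a), 0.0)) / 2.0


def cluster(terms, aff, threshold):
    """Single-linkage grouping: join terms whose distance is <= threshold."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the group representative.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(sorted(terms), 2):
        if distance(aff, a, b) <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())


# Toy sentences, already tokenized and stop-word filtered (hypothetical data).
sentences = [
    ["cash", "flow", "statement"],
    ["cash", "flow", "forecast"],
    ["balance", "sheet"],
    ["balance", "sheet", "statement"],
]
aff = affinities(sentences)
vocab = {t for s in sentences for t in s}
clusters = cluster(vocab, aff, threshold=0.2)
print(clusters)
```

In this toy data "forecast" always appears with "cash" while "cash" appears with "forecast" only half the time, which is the kind of asymmetry the description refers to. Passing whole documents instead of sentences as `units` changes the size of the textual unit without changing the code.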

Similar Items