An effective information representation for opinion-oriented applications.
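Today, more and more users turn to online tools such as forums, blogs, and Facebook to express their opinions on topics such as products, films, and politics. These opinions not only help users make decisions but also provide valuable feedback for many business and social domains. As a result, opinion-oriented applications, including opinion retrieval, opinion summarization, and opinion question answering, have become one of the most active research areas. The fundamental difference between opinion-oriented and fact-oriented applications lies in the information need: the latter calls for traditional objective information, the former for subjective information. Subjective information refers to opinions or comments about a specific target. To represent it, opinionatedness, topical relevance, and the association between opinion and topic should all be considered. Existing bag-of-words representations treat the word as the basic semantic unit for describing objective information, and they represent topical relevance well enough to meet the need for objective information. Subjective information, however, requires...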
Other Authors: | Li, Binyang |
---|---|
Format: | Others |
Language: | English; Chinese |
Published: | 2013 |
Subjects: | Public opinion--Data processing; User-generated content; Online social networks |
Online Access: | http://library.cuhk.edu.hk/record=b5549839 http://repository.lib.cuhk.edu.hk/en/item/cuhk-328173 |
id |
ndltd-cuhk.edu.hk-oai-cuhk-dr-cuhk_328173 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
English; Chinese |
format |
Others |
sources |
NDLTD |
topic |
Public opinion--Data processing; User-generated content; Online social networks |
description |
Today, more and more users turn to online tools such as forums, blogs, and Facebook to express their opinions on topics such as products, films, and politics. These opinions not only help users make decisions but also provide valuable feedback for many business and social domains. As a result, opinion-oriented applications, including opinion retrieval, opinion summarization, and opinion question answering, have become one of the most active research areas. The fundamental difference between opinion-oriented and fact-oriented applications lies in the information need: the latter calls for traditional objective information, the former for subjective information. Subjective information refers to opinions or comments about a specific target. To represent it, opinionatedness, topical relevance, and the association between opinion and topic should all be considered. Existing bag-of-words representations treat the word as the basic semantic unit for describing objective information, and they represent topical relevance well enough to meet the need for objective information. Subjective information, however, requires opinionatedness and topical relevance to be considered together; because a single word cannot express both, the word is no longer the smallest semantic unit. In addition, bag-of-words representations ignore word order and word sense, so opinionatedness and relevance are usually mixed together and hard to tell apart. Bag-of-words methods therefore cannot represent subjective information accurately, which seriously degrades the performance of opinion-oriented applications.
=== This thesis answers the following research questions raised by the inadequate representation of subjective information: 1. Since a single word is no longer the basic semantic unit for subjective information, is there an effective representation that can describe it? 2. Since subjective information combines opinion information and relevance information, how can the new representation describe the association between the two? 3. How can subjective information be quantified so that documents can be retrieved and analyzed? 4. How can the new subjective information representation be realized in opinion-oriented applications?
=== Because the results of opinion retrieval directly affect the performance of other opinion-oriented applications, this thesis starts from the opinion retrieval problem. First, we propose a sentence-based approach to analyze the limitations of the bag-of-words representation. On that basis, we define a semantically rich representation of subjective information, the word pair, which consists of a sentiment word and its associated target word appearing in the same sentence. We then propose a series of methods to describe and capture two kinds of contextual information: 1) intra-opinion information: we give three methods for extracting word pairs to capture the association between opinion and topic; 2) inter-opinion information: we propose a weighting method that measures the degree of relatedness between word pairs, thereby capturing the relations among them. Finally, we integrate intra-opinion and inter-opinion information and propose the latent sentimental association model to solve the opinion retrieval problem. Experimental results on standard datasets show that the word-pair representation describes subjective information effectively and that the latent sentimental association model captures the associations between words, so that contextual information is used to improve opinion retrieval.
=== In addition, we apply word pairs to opinion summarization and opinion question answering; evaluation results on standard datasets show that the word-pair representation of subjective information is equally effective for other opinion-oriented applications.
=== A growing number of users express their opinions about products, films, and politics through online tools such as forums, blogs, and Facebook. These opinions can not only help users make decisions, e.g., whether to buy a product, but also provide valuable feedback for business and social events. Today, research on opinion-oriented applications (OOAs), including opinion retrieval, opinion summarization, and opinion question answering, is attracting much attention. The difference between fact-based and opinion-oriented applications lies in the users' information need. The former requires objective information and the latter subjective information, which comprises opinions or comments expressed on a specific target. To meet the need for subjective information, both opinionatedness and relevance, together with the association between them, should be taken into account. Existing systems represent documents as bags of words. However, this representation fails to distinguish opinionatedness from relevance. Moreover, because word order is ignored, word associations are lost. For this reason, the bag-of-words representation is ineffective for subjective information and seriously degrades the performance of OOAs.
=== In this thesis, we try to answer the following challenging questions arising in subjective information representation. Since the word is no longer the basic semantic unit, how should subjective information be represented? Subjective information is a combination of opinionatedness and relevance, so how should the association between them be modeled? How should subjective information be measured for the purpose of document ranking, retrieval, and analysis? How can opinion-oriented applications benefit from subjective information?
=== We start by solving the problem of opinion retrieval, whose results directly influence the performance of other opinion-oriented applications. We first present a sentence-based approach to analyze the limitations of the bag-of-words representation and define a semantically richer representation for subjective information, namely the word pair. A word pair consists of a sentiment word and its associated target co-occurring in a sentence. We then propose techniques to capture two kinds of contextual information. 1) Intra-opinion information: three methods are proposed to extract word pairs. 2) Inter-opinion information: a weighting scheme is presented to measure the weight of each individual word pair. Finally, we devise an algorithm to integrate both intra-opinion and inter-opinion information into a latent sentimental association model for opinion retrieval. The evaluation on three benchmark datasets suggests that the word pair is effective and that the latent sentimental association retrieval model provides insight into word associations, so that opinion retrieval benefits from the pairwise representation. We also apply the word pair to opinion summarization and opinion question answering. The evaluation on two benchmark datasets shows that the word pair performs effectively in these applications.
=== Detailed summary in vernacular field only. === Li, Binyang. === Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. === Includes bibliographical references (leaves [96]-103). === Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. === Abstract also in Chinese.
=== Abstract --- p.ii === Abstract in Chinese --- p.iv === Acknowledgements --- p.vi === Contents --- p.viii === List of Tables --- p.xi === List of Figures --- p.xiii === Chapter 1. --- Introduction --- p.1 === Chapter 1.1. --- Problem and Challenges --- p.3 === Chapter 1.1.1 --- Subjective Information Representation --- p.3 === Chapter 1.1.2 --- Associative Information in an Opinion Expression --- p.4 === Chapter 1.1.3 --- Opinion Expression Measurement --- p.5 === Chapter 1.1.4 --- Applications of Subjective Information Representation to Different OOAs --- p.6 === Chapter 1.2. --- Contributions --- p.6 === Chapter 1.3. --- Chapter Summary --- p.7
=== Chapter 2. --- Pairwise Representation --- p.9 === Chapter 2.1 --- Related Works on Opinion Retrieval --- p.10 === Chapter 2.1.1 --- Opinion Retrieval Models --- p.10 === Chapter 2.1.2 --- Lexicon-based Opinion Identification --- p.12 === Chapter 2.2 --- Sentence-based Approach for Opinion Retrieval --- p.13 === Chapter 2.2.1 --- The Limitations of Document-based Approaches for Opinion Retrieval --- p.13 === Chapter 2.2.2 --- Sentence-based Approach for Opinion Retrieval --- p.16 === Chapter 2.2.3 --- Evaluation and Results --- p.21 === Chapter 2.2.4 --- Summary --- p.26 === Chapter 2.3 --- Pairwise Representation --- p.28 === Chapter 2.3.1 --- Definition of Word Pair --- p.28 === Chapter 2.3.2 --- Sentiment Lexicon Construction --- p.29 === Chapter 2.3.3 --- Topic Term Lexicon Construction --- p.30 === Chapter 2.3.4 --- Word Pair Construction --- p.31 === Chapter 2.4 --- Graph-based Model for Opinion Retrieval --- p.33 === Chapter 2.4.1 --- HITS Model for Opinion Retrieval --- p.34 === Chapter 2.4.2 --- PageRank Model for Opinion Retrieval --- p.37 === Chapter 2.4.3 --- Evaluation and Results --- p.40 === Chapter 2.5 --- Chapter Summary --- p.50
=== Chapter 3. --- Pairwise Representation Measurement --- p.51 === Chapter 3.1 --- Word Pair Weighting Scheme --- p.52 === Chapter 3.1.1 --- PMI-based Weighting Scheme --- p.52 === Chapter 3.1.2 --- Evaluation and Results --- p.56 === Chapter 3.1.3 --- Summary --- p.60 === Chapter 3.2 --- Latent Sentimental Association --- p.61 === Chapter 3.2.1 --- Problem Formulation --- p.61 === Chapter 3.2.2 --- LSA Integrated Generative Model --- p.62 === Chapter 3.2.3 --- Modeling the Dependency between Q and d --- p.64 === Chapter 3.2.4 --- Modeling the Dependency between O and d --- p.67 === Chapter 3.3 --- Parameter Estimation --- p.67 === Chapter 3.3.1 --- Estimating P(Q --- p.67 === Chapter 3.3.2 --- Estimating MI(Q,O --- p.69 === Chapter 3.4 --- Evaluation and Results --- p.69 === Chapter 3.5 --- Chapter Summary --- p.72
=== Chapter 4. --- Pairwise Representation in Opinion-oriented Application --- p.75 === Chapter 4.1. --- Opinion Questioning and Answering --- p.76 === Chapter 4.1.1 --- Problem Statement --- p.76 === Chapter 4.1.2 --- Existing Solution --- p.78 === Chapter 4.1.3 --- A Word Pair based Approach for Sentence Ranking --- p.79 === Chapter 4.1.4 --- Answer Generation --- p.82 === Chapter 4.1.5 --- Evaluation and Results --- p.82 === Chapter 4.2. --- Opinion Summarization --- p.86 === Chapter 4.2.1 --- Problem Statement --- p.86 === Chapter 4.2.2 --- Existing Solution --- p.87 === Chapter 4.2.3 --- Sentence Ranking --- p.88 === Chapter 4.2.4 --- Summary Generation --- p.88 === Chapter 4.2.5 --- Evaluation and Results --- p.89 === Chapter 4.3. --- Chapter Summary --- p.91 === Chapter 5. --- Conclusions and Future Works --- p.93 === Bibliography --- p.97 |
author2 |
Li, Binyang |
author_facet |
Li, Binyang |
title |
An effective information representation for opinion-oriented applications. |
title_sort |
effective information representation for opinion-oriented applications. |
publishDate |
2013 |
url |
http://library.cuhk.edu.hk/record=b5549839 http://repository.lib.cuhk.edu.hk/en/item/cuhk-328173 |
_version_ |
1719001553796333568 |
spelling |
ndltd-cuhk.edu.hk-oai-cuhk-dr-cuhk_328173 2019-03-12T03:35:30Z An effective information representation for opinion-oriented applications. CUHK electronic theses & dissertations collection Public opinion--Data processing; User-generated content; Online social networks Li, Binyang Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management. 2013 Text bibliography electronic resource remote 1 online resource (xiv, 103 leaves) : ill. cuhk:328173 http://library.cuhk.edu.hk/record=b5549839 eng chi Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) http://repository.lib.cuhk.edu.hk/en/islandora/object/cuhk%3A328173/datastream/TN/view/An%20%20effective%20information%20representation%20for%20opinion-oriented%20applications.jpg http://repository.lib.cuhk.edu.hk/en/item/cuhk-328173 |