Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems

Bibliographic Details
Main Author:	Jakob, Niklas
Format:	Others
Language:	English en
Published:	2011
Online Access:	https://tuprints.ulb.tu-darmstadt.de/2609/1/Diss.pdf Jakob, Niklas <http://tuprints.ulb.tu-darmstadt.de/view/person/Jakob=3ANiklas=3A=3A.html> (2011): Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems. Darmstadt, Technische Universität, [Ph.D. Thesis]

id	ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-2609
record_format	oai_dc
collection	NDLTD
language	English en
format	Others
sources	NDLTD
description	With the growing popularity of online shopping, most e-commerce websites now let their customers leave feedback about their purchases. This form of user interaction is also very popular on Web 2.0 websites. Online databases, e.g. of movies, give their users incentives to participate in content creation by letting them rate films and write reviews about them. Entire websites, e.g. rateitall.com, have emerged that allow their users to rate and review virtually anything they care about. As more and more content is created and aggregated on these websites, a strong demand has emerged for automatic approaches capable of extracting structured information from mostly unstructured text. Automatically extracting the opinions expressed in thousands of user-generated texts can provide useful data for several other tasks, such as question answering, information retrieval and summarization. All of these tasks require an opinion mining system that analyzes the individual elements of an opinion at the sentence level, i.e. the terms which express the opinion, their polarity, and what the opinion is about. In this thesis, we present a comprehensive study of the automatic extraction of opinions with a focus on opinion targets, an essential step for enabling other tasks, e.g. information retrieval or question answering on opinionated content. We analyze the state of the art in opinion mining and divide it into three subtasks, one of which is the extraction of opinion targets. We perform a comparative evaluation of two unsupervised algorithms on the task of opinion target extraction, using datasets of customer reviews and blog postings which span four domains: digital cameras, cars, movies and web services. We show how the identification of opinion expressions influences the opinion target extraction performance of each algorithm. 
We also show that a simple word distance-based heuristic significantly outperforms both unsupervised algorithms, which make their relevance decisions by analyzing word frequencies in the corpus. The word distance-based heuristic reaches an F-Measure between 0.372 and 0.491 on the four datasets. Furthermore, we evaluate a state-of-the-art supervised algorithm on the task of opinion target extraction and present a new approach based on Conditional Random Fields (CRF). Our approach significantly outperforms the state-of-the-art baseline on all four datasets, reaching an F-Measure between 0.497 and 0.702. We also evaluate both algorithms in a cross-domain opinion target extraction task, since a common problem with supervised algorithms is the domain dependence of the learned model. In this setting, our CRF-based approach also outperforms the baseline on all four datasets, and it outperforms the best unsupervised approach, which is by design not prone to domain dependence, on three of the four datasets. In the cross-domain opinion target extraction task, the CRF-based approach reaches an F-Measure between 0.360 and 0.518 on the four datasets. The extraction of opinion targets that are referenced by anaphoric expressions is a challenge frequently encountered in opinion mining at the phrase level. For the first time, we integrate anaphora resolution algorithms into a supervised opinion mining system. We perform a comparative evaluation of two algorithms, in which we require them to extract the correct antecedent of anaphoric targets. Our results indicate that one of the algorithms, which was designed for high-precision anaphora resolution, is better suited to the opinion mining setting. By extending the algorithm that yields the best results in its off-the-shelf configuration, we achieve significant improvements in the extraction of opinion targets on three of the four datasets. 
Finally, we show how an opinion mining system can be successfully employed to improve another application. Recommendation systems are now widely used in online platforms and desktop applications to suggest goods or works of art that users do not know yet but are likely to enjoy. The recommendations for a user U1 are determined by first profiling the taste and interests of all users of the recommendation system. The algorithm then identifies other users U2 ... Un whose taste is similar to that of U1, and recommends to U1 the items which those users enjoyed. A user's taste and interests are typically profiled by giving the user the option to rate entities they have consumed. As mentioned above, website operators have also given users the opportunity to leave their ratings not only on a numerical scale, but also as a free-text review. We hypothesize that these free-text reviews contain a lot of information, expressed in the users' opinions, which would allow us to model each user's taste and preferences at a very fine granularity. We show that, by integrating our opinion mining system as a feature provider into a state-of-the-art recommendation system, we can significantly improve the accuracy of the recommendations, which we evaluate on a dataset of movie ratings and reviews.
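The neighbourhood-based recommendation step described in the abstract (profile all users, find users U2 ... Un with taste similar to U1, recommend what they enjoyed) can be illustrated with a minimal sketch. It is not the system evaluated in the thesis: the cosine similarity measure, the function names, the parameters k and top_n, and the toy ratings are all assumptions made for illustration, and it uses numerical ratings only, without any features derived from review text.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "U1": {"Alien": 5, "Heat": 4},
    "U2": {"Alien": 5, "Heat": 4, "Blade Runner": 5},
    "U3": {"Alien": 1, "Heat": 2, "Notting Hill": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating profiles; unrated items count as 0."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, target, k=2, top_n=3):
    """Rank items unseen by `target` using its k most similar users."""
    # Step 1: compare the target's profile with every other user's profile.
    neighbours = sorted(
        (
            (cosine_similarity(ratings[target], profile), user)
            for user, profile in ratings.items()
            if user != target
        ),
        reverse=True,
    )[:k]

    # Step 2: score each unseen item by similarity-weighted neighbour ratings.
    scores = {}
    for similarity, user in neighbours:
        if similarity <= 0:
            continue
        for item, rating in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + similarity * rating

    # Step 3: return the highest-scoring items first.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend(ratings, "U1"))
```

In this toy data U2's ratings agree closely with U1's, so items U2 enjoyed rank ahead of items liked only by the dissimilar user U3. Features extracted from free-text reviews would enter such a system as additional profile information; how the thesis does this is not specified in the abstract.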
author	Jakob, Niklas
title	Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems
publishDate	2011
url	https://tuprints.ulb.tu-darmstadt.de/2609/1/Diss.pdf Jakob, Niklas <http://tuprints.ulb.tu-darmstadt.de/view/person/Jakob=3ANiklas=3A=3A.html> (2011): Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems. Darmstadt, Technische Universität, [Ph.D. Thesis]
spelling	ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-2609 2020-07-15T07:09:31Z http://tuprints.ulb.tu-darmstadt.de/2609/ Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems Jakob, Niklas 2011-05-24 Ph.D. Thesis NonPeerReviewed application/pdf eng CC-BY-NC-ND 2.5 de - Creative Commons, Attribution Non-commercial, No-derivatives https://tuprints.ulb.tu-darmstadt.de/2609/1/Diss.pdf Jakob, Niklas <http://tuprints.ulb.tu-darmstadt.de/view/person/Jakob=3ANiklas=3A=3A.html> (2011): Extracting Opinion Targets from User-Generated Discourse with an Application to Recommendation Systems. Darmstadt, Technische Universität, [Ph.D. Thesis] en info:eu-repo/semantics/doctoralThesis info:eu-repo/semantics/openAccess
