Two-Phase Crowdsourced Comment Integration Method Based on Reward Prediction and Policy Gradient


Bibliographic Details
Main Author: RONG Huan, MA Tinghuai
Format: Article
Language: zho
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2021-08-01
Series: Jisuanji kexue yu tansuo
Subjects:
Online Access: http://fcst.ceaj.org/CN/abstract/abstract2832.shtml
Description
Summary: In recent years, with the rapid development of the Internet, people frequently post comments about specific objects online. Promptly grasping the key information in these crowdsourced comments is crucial for decision-making and service adjustment and has considerable application value, which motivates research on the crowdsourced comment integration problem. The goal of crowdsourced comment integration is to condense different users' comments on a target object into a shorter integrated document at a given compression rate, forming a description of the target object that matches public perception. To solve this problem, a two-phase crowdsourced comment integration method based on reward prediction and policy gradient is proposed. The method does not rely on any manually produced ground truth and requires only the crowdsourced comments themselves. An agent, guided by experience or reward, extracts key sentences from the crowdsourced comments to generate the integrated comment. Specifically, in the first phase, the content quality of the integrated comment is measured by the relevance and redundancy of its sentences and taken as the reward; the long-term reward from selecting the current sentence through to the end of the integration process is predicted as a Q-value, which guides the agent to learn an optimal sentence selection policy. In the second phase, the sentiment intensity of the integrated comment is taken as the reward, and the sentence selection policy learned in the first phase is further adjusted by policy gradient, so that the integrated comment highlights sentiment intensity from an objective perspective and reflects users' attitudes more clearly while maintaining content quality. According to the experimental results, compared with existing methods, the proposed method achieves the best overall performance in terms of both the content quality and the sentiment intensity of the integrated comment, while the time consumed for generation remains at an acceptable level.
ISSN: 1673-9418
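
The two phases described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bag-of-words cosine similarity, the "relevance minus redundancy" form of the reward, the lexicon-based sentiment score, and every function name and parameter (`content_reward`, `discounted_returns`, `reinforce_step`, `gamma`, `lr`) are assumptions chosen for illustration. The sketch shows a phase-one content-quality reward, the discounted long-term return that a Q-value predictor would be trained to estimate, and a phase-two REINFORCE-style policy-gradient update driven by a sentiment-intensity reward.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector (assumed representation; the paper may use another)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def content_reward(candidate, selected, centroid):
    """Phase 1 reward (assumed form): relevance to all comments minus
    redundancy with the sentences already selected."""
    relevance = cosine(candidate, centroid)
    redundancy = max((cosine(candidate, s) for s in selected), default=0.0)
    return relevance - redundancy


def discounted_returns(rewards, gamma=0.9):
    """Long-term return from each step to the end of the episode;
    this is the target a Q-value predictor would regress towards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def sentiment_intensity(sentence, lexicon):
    """Phase 2 reward (assumed form): mean absolute polarity per word,
    using a hypothetical word-to-polarity lexicon."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(abs(lexicon.get(w, 0.0)) for w in words) / len(words)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, baseline=0.0, lr=0.5):
    """One REINFORCE update on per-sentence logits.
    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]
```

In this sketch, the logits passed to `reinforce_step` would be initialised from the phase-one Q-value estimates, so the sentiment reward adjusts the learned selection policy and does not replace it. A full system would also need a learned Q-value predictor and a stopping rule tied to the compression rate, both omitted here.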