Summary: | Master's === National Taiwan Normal University === Graduate Institute of Computer Science and Information Engineering === 99 === In recent research on opinion mining, the feature terms of products are usually assigned manually or determined by term frequency. Consequently, switching to a different product incurs considerable cost. The goal of this thesis is therefore to study how to extract product feature terms from forum documents automatically and effectively. We select forum posts and expert commentaries as the corpora. Within a corpus, the nouns appearing in the documents are selected as candidate feature terms. For each candidate term, the term frequency is counted over the documents discussing a given brand, which indicates the popularity of the term. The divergence of probability between different brands is calculated for each candidate term, which indicates how distinctive the term is for a brand. The correlation of a feature term with a brand is also calculated to identify the terms related to that brand. Furthermore, the divergence of probability between the two corpora is calculated for each candidate term to identify the terms particular to each corpus. Finally, we propose an importance measure function that combines the scores of the above evaluation methods to evaluate the importance of terms. The experimental results show that the ranked list of feature terms obtained with the importance measure function extracts product feature terms automatically and effectively.
|
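The scoring idea in the summary can be illustrated with a minimal Python sketch. The summary does not give the actual formulas, so everything here is an assumption made for illustration: the function names (`term_probs`, `divergence`, `importance`, `rank_terms`), the pointwise KL-style divergence, the smoothing constant, and the equal weights are all hypothetical. Only two of the four signals (term frequency and divergence between brands) are shown; the correlation and cross-corpus divergence scores are left out.

```python
import math
from collections import Counter


def term_probs(docs):
    """Relative frequency of each candidate term over tokenized documents."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def divergence(p, q, eps=1e-9):
    """Pointwise KL-style contribution p * log(p / q).

    The eps smoothing is an assumption to avoid division by zero.
    """
    return p * math.log((p + eps) / (q + eps))


def importance(term, brand_probs, other_probs, weights=(0.5, 0.5)):
    """Hypothetical weighted sum of the frequency and divergence scores."""
    p_brand = brand_probs.get(term, 0.0)
    p_other = other_probs.get(term, 0.0)
    w_tf, w_div = weights
    return w_tf * p_brand + w_div * divergence(p_brand, p_other)


def rank_terms(brand_docs, other_docs):
    """Rank the terms seen for one brand by the combined score, highest first."""
    brand_probs = term_probs(brand_docs)
    other_probs = term_probs(other_docs)
    return sorted(
        brand_probs,
        key=lambda term: importance(term, brand_probs, other_probs),
        reverse=True,
    )


# Toy data: each document is a list of already-extracted nouns.
brand_docs = [["battery", "screen", "battery"], ["battery", "price"]]
other_docs = [["screen", "price"], ["screen", "camera"]]
ranking = rank_terms(brand_docs, other_docs)
```

In this toy data, `battery` is both frequent for the brand and absent from the other brand's documents, so it ranks first. A term common to both brands, such as `screen`, is pushed down by its negative divergence score.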