A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example

Master's thesis === National Taiwan University of Science and Technology === Department of Information Management === 104 === With the popularity of the Internet, people readily share their experiences with products by posting review articles online. Review articles affect a customer's attitude toward purchasing a product. In the past, consumers may ask th...


Bibliographic Details
Main Authors: Yin-Hsuan Hsieh, 謝尹瑄
Other Authors: Yung-Ho Leu
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/d4qmy9
id ndltd-TW-104NTUS5396090
record_format oai_dc
spelling ndltd-TW-104NTUS5396090 2019-05-15T23:01:18Z http://ndltd.ncl.edu.tw/handle/d4qmy9 A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example 以資料探勘技術判斷網路上產品使用心得文章的適當性以美妝產品為例 Yin-Hsuan Hsieh 謝尹瑄 碩士 國立臺灣科技大學 資訊管理系 104 Yung-Ho Leu 呂永和 2016 學位論文 ; thesis 44 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Taiwan University of Science and Technology === Department of Information Management === 104 === With the popularity of the Internet, people readily share their experiences with products by posting review articles online. Review articles affect a customer's attitude toward purchasing a product. In the past, consumers might ask friends or relatives for their opinions before purchasing a product. Today, consumers usually browse review articles on a blog or forum before buying. Because review articles influence purchasing behavior, they are regulated by law: a review article may exaggerate a product's effects to entice customers to buy it, so there are regulations on its content. This thesis aims to automatically screen out improper review articles from review articles on the Internet, using cosmetics as the subject of study. First, we built a thesaurus of illegal words by referencing the website of the Ministry of Health and Welfare of Taiwan. Next, we randomly selected 500 articles from 6000 review articles on Urcosme, a cosmetics forum in Taiwan, and classified them into two categories, proper and improper. A review article is improper if it contains words from the thesaurus; otherwise, it is proper. Subsequently, we used the Naïve Bayes and Decision Tree algorithms of Weka to classify this training dataset. Under 10-fold cross-validation, with the improper category defined as the positive class, both algorithms achieved recall greater than 70 percent and specificity greater than 90 percent. These results indicate that the proposed method is an effective way to automatically identify improper review articles on the Internet.
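The labeling rule and the two reported metrics in the description above can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which used Weka. The entries in `PROHIBITED_TERMS` are hypothetical placeholders, since this record does not list the actual thesaurus, and the function names are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical placeholder entries; the real thesaurus of illegal words
# is not included in this catalog record.
PROHIBITED_TERMS = {"term_a", "term_b"}


def label_review(text: str, terms: Iterable[str] = PROHIBITED_TERMS) -> str:
    """Label a review "improper" if it contains any listed term, else "proper"."""
    return "improper" if any(term in text for term in terms) else "proper"


def recall_and_specificity(
    actual: List[str], predicted: List[str], positive: str = "improper"
) -> Tuple[float, float]:
    """Compute recall and specificity, treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # improper article correctly flagged
            else:
                fn += 1  # improper article missed
        else:
            if p == positive:
                fp += 1  # proper article wrongly flagged
            else:
                tn += 1  # proper article correctly passed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, specificity
```

With the improper category as the positive class, recall is the share of improper articles that a classifier flags, and specificity is the share of proper articles that it leaves unflagged.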
author2 Yung-Ho Leu
author_facet Yung-Ho Leu
Yin-Hsuan Hsieh
謝尹瑄
author Yin-Hsuan Hsieh
謝尹瑄
spellingShingle Yin-Hsuan Hsieh
謝尹瑄
A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
author_sort Yin-Hsuan Hsieh
title A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
title_short A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
title_full A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
title_fullStr A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
title_full_unstemmed A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
title_sort data mining approach for identifying improper review articles on the internet - taking cosmetics as an example
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/d4qmy9
work_keys_str_mv AT yinhsuanhsieh adataminingapproachforidentifyingimproperreviewarticlesontheinternettakingcosmeticsasanexample
AT xièyǐnxuān adataminingapproachforidentifyingimproperreviewarticlesontheinternettakingcosmeticsasanexample
AT yinhsuanhsieh yǐzīliàotànkānjìshùpànduànwǎnglùshàngchǎnpǐnshǐyòngxīndéwénzhāngdeshìdāngxìngyǐměizhuāngchǎnpǐnwèilì
AT xièyǐnxuān yǐzīliàotànkānjìshùpànduànwǎnglùshàngchǎnpǐnshǐyòngxīndéwénzhāngdeshìdāngxìngyǐměizhuāngchǎnpǐnwèilì
AT yinhsuanhsieh dataminingapproachforidentifyingimproperreviewarticlesontheinternettakingcosmeticsasanexample
AT xièyǐnxuān dataminingapproachforidentifyingimproperreviewarticlesontheinternettakingcosmeticsasanexample
_version_ 1719139243451744256