A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example

Master's === National Taiwan University of Science and Technology === Department of Information Management === 104 === Because the Internet is so widely used, people readily share their opinions on products by posting review articles online. Review articles affect a customer's attitude toward purchasing a product. In the past, consumers may ask th...


Bibliographic Details
Main Authors:	Yin-Hsuan Hsieh, 謝尹瑄
Other Authors:	Yung-Ho Leu
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/d4qmy9

id	ndltd-TW-104NTUS5396090
record_format	oai_dc
spelling	ndltd-TW-104NTUS5396090 2019-05-15T23:01:18Z http://ndltd.ncl.edu.tw/handle/d4qmy9 A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example 以資料探勘技術判斷網路上產品使用心得文章的適當性以美妝產品為例 Yin-Hsuan Hsieh 謝尹瑄 Master's, National Taiwan University of Science and Technology, Department of Information Management, 104 Yung-Ho Leu 呂永和 2016 學位論文 ; thesis 44 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taiwan University of Science and Technology === Department of Information Management === 104 === Because the Internet is so widely used, people readily share their opinions on products by posting review articles online. Review articles affect a customer's attitude toward purchasing a product. In the past, consumers might ask friends or relatives for their opinions before purchasing a product. Today, consumers usually browse review articles on a blog or a forum before buying. Because review articles influence customers' purchasing behavior, they are regulated by law: a review article may exaggerate the effect of a product to entice a customer to purchase it, so there are regulations on the contents of a review article. This thesis aims to automatically screen out improper review articles from review articles on the Internet, taking cosmetics as the subject of study. First, we built a thesaurus of illegal words by referencing the website of the Ministry of Health and Welfare of Taiwan. Next, we randomly selected 500 articles from 6000 review articles on Urcosme, a cosmetics forum in Taiwan, and classified the selected articles into two categories: proper and improper. A review article is improper if it contains words from the thesaurus; otherwise, it is proper. Subsequently, we used the Naïve Bayes and Decision Tree algorithms of Weka to classify this training dataset. Under 10-fold cross-validation, with the improper category defined as the positive class, the recall of both algorithms was greater than 70 percent and the specificity of both was greater than 90 percent. These results show that the proposed method offers an effective way to automatically identify improper review articles on the Internet.
author2	Yung-Ho Leu
author_facet	Yung-Ho Leu Yin-Hsuan Hsieh 謝尹瑄
author	Yin-Hsuan Hsieh 謝尹瑄
spellingShingle	Yin-Hsuan Hsieh 謝尹瑄 A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
author_sort	Yin-Hsuan Hsieh
title	A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
title_sort	data mining approach for identifying improper review articles on the internet - taking cosmetics as an example
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/d4qmy9
_version_	1719139243451744256
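
The labeling rule and the two evaluation measures named in the description can be illustrated with a minimal Python sketch. This is not the thesis's implementation (the record says Weka was used); the thesaurus terms, function names, and sample data below are hypothetical placeholders chosen only to show the rule "improper if the article contains a thesaurus word" and how recall and specificity are computed when the improper category is the positive class.

```python
from typing import Iterable, List, Tuple

# Hypothetical placeholder terms; the thesis's actual thesaurus is not reproduced here.
THESAURUS = {"cures acne", "permanent whitening"}


def label_article(text: str, thesaurus: Iterable[str]) -> str:
    """Return 'improper' if the text contains any thesaurus term, else 'proper'."""
    lowered = text.lower()
    if any(term.lower() in lowered for term in thesaurus):
        return "improper"
    return "proper"


def recall_and_specificity(
    actual: List[str], predicted: List[str], positive: str = "improper"
) -> Tuple[float, float]:
    """Compute (recall, specificity) with `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # improper article correctly flagged
            else:
                fn += 1  # improper article missed
        else:
            if p == positive:
                fp += 1  # proper article wrongly flagged
            else:
                tn += 1  # proper article correctly passed
    # Recall = TP / (TP + FN); specificity = TN / (TN + FP).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, specificity
```

With "improper" as the positive class, recall is the share of improper articles that a classifier catches, and specificity is the share of proper articles that it leaves alone.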


Similar Items