A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example
Master's === National Taiwan University of Science and Technology === Department of Information Management === 104 === As the Internet has become widespread, people readily share their experience with products by posting review articles online. Review articles affect a customer's attitude toward purchasing a product. In the past, consumers may ask th...
Main Authors: | Yin-Hsuan Hsieh, 謝尹瑄 |
---|---|
Other Authors: | Yung-Ho Leu |
Format: | Others |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/d4qmy9 |
id |
ndltd-TW-104NTUS5396090 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-104NTUS5396090 2019-05-15T23:01:18Z http://ndltd.ncl.edu.tw/handle/d4qmy9 A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example 以資料探勘技術判斷網路上產品使用心得文章的適當性以美妝產品為例 Yin-Hsuan Hsieh 謝尹瑄 Master's National Taiwan University of Science and Technology Department of Information Management 104 As the Internet has become widespread, people readily share their experience with products by posting review articles online. Review articles affect a customer's attitude toward purchasing a product. In the past, consumers might ask friends or relatives for their opinions before purchasing a product. Today, consumers usually browse review articles on a blog or forum before buying. Because review articles influence customers' purchasing behavior, they are regulated by law: a review article may exaggerate a product's effects to entice customers to purchase it, so there are regulations on its content. This thesis aims to automatically screen out improper review articles from those on the Internet, taking cosmetics as the subject of study. First, we built a thesaurus of illegal words by referencing the website of the Ministry of Health and Welfare of Taiwan. Next, we randomly selected 500 articles from 6,000 review articles on Urcosme, a cosmetics forum in Taiwan, and classified them into two categories: proper and improper. A review article is improper if it contains words from the thesaurus; otherwise, it is proper. We then used the Naïve Bayes and Decision Tree algorithms in Weka to classify this training dataset. Under 10-fold cross-validation, with the improper category defined as the positive class, both algorithms achieved recall above 70 percent and specificity above 90 percent. These results show that the proposed method offers an effective way to automatically identify improper review articles on the Internet. Yung-Ho Leu 呂永和 2016 thesis 44 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Information Management === 104 === As the Internet has become widespread, people readily share their experience with products by posting review articles online. Review articles affect a customer's attitude toward purchasing a product. In the past, consumers might ask friends or relatives for their opinions before purchasing a product. Today, consumers usually browse review articles on a blog or forum before buying. Because review articles influence customers' purchasing behavior, they are regulated by law: a review article may exaggerate a product's effects to entice customers to purchase it, so there are regulations on its content. This thesis aims to automatically screen out improper review articles from those on the Internet, taking cosmetics as the subject of study. First, we built a thesaurus of illegal words by referencing the website of the Ministry of Health and Welfare of Taiwan. Next, we randomly selected 500 articles from 6,000 review articles on Urcosme, a cosmetics forum in Taiwan, and classified them into two categories: proper and improper. A review article is improper if it contains words from the thesaurus; otherwise, it is proper. We then used the Naïve Bayes and Decision Tree algorithms in Weka to classify this training dataset. Under 10-fold cross-validation, with the improper category defined as the positive class, both algorithms achieved recall above 70 percent and specificity above 90 percent. These results show that the proposed method offers an effective way to automatically identify improper review articles on the Internet. |
author2 |
Yung-Ho Leu |
author_facet |
Yung-Ho Leu Yin-Hsuan Hsieh 謝尹瑄 |
author |
Yin-Hsuan Hsieh 謝尹瑄 |
spellingShingle |
Yin-Hsuan Hsieh 謝尹瑄 A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example |
author_sort |
Yin-Hsuan Hsieh |
title |
A data mining approach for identifying improper review articles on the internet - Taking Cosmetics as an example |
title_sort |
data mining approach for identifying improper review articles on the internet - taking cosmetics as an example |
publishDate |
2016 |
url |
http://ndltd.ncl.edu.tw/handle/d4qmy9 |
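The labeling rule and the two reported metrics in the description above can be illustrated with a minimal Python sketch. It is not the thesis's Weka workflow: the thesaurus terms, sample texts, and function names below are hypothetical placeholders, and the sketch only shows how a thesaurus match yields the proper/improper label and how recall and specificity are computed when "improper" is the positive class.

```python
from typing import Iterable, Sequence, Tuple


def label_by_thesaurus(text: str, thesaurus: Iterable[str]) -> str:
    """Label an article 'improper' if it contains any thesaurus term."""
    return "improper" if any(term in text for term in thesaurus) else "proper"


def recall_and_specificity(
    actual: Sequence[str],
    predicted: Sequence[str],
    positive: str = "improper",
) -> Tuple[float, float]:
    """Compute recall and specificity, treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # improper article correctly flagged
            else:
                fn += 1  # improper article missed
        else:
            if p == positive:
                fp += 1  # proper article wrongly flagged
            else:
                tn += 1  # proper article correctly passed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, specificity


# Hypothetical thesaurus terms, for illustration only.
TERMS = ["cures acne", "removes wrinkles permanently"]

# Hypothetical ground-truth labels and classifier predictions.
actual = ["improper", "improper", "proper", "proper", "proper"]
predicted = ["improper", "proper", "proper", "proper", "improper"]
recall, specificity = recall_and_specificity(actual, predicted)
```

Recall here is the share of improper articles that the classifier catches, and specificity is the share of proper articles that it leaves unflagged, which matches the description's choice of the improper category as the positive class.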