Customers’ Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth
Customers of various services are often invited to write a summarizing review via an Internet portal. Such reviews, written in natural language, are typically unstructured and also carry a numeric rating on a scale between “good” and “bad.” The more reviews there are, the better the feedback that can be acquired for i...
Main Authors: | Jan Žižka, Arnošt Svoboda |
---|---|
Format: | Article |
Language: | English |
Published: | Mendel University Press, 2015-01-01 |
Series: | Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis |
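Subjects: | text mining; customer opinion analysis; decision trees; decision rules; windowing; large data volumes |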
Online Access: | https://acta.mendelu.cz/63/6/2229/ |
id |
doaj-fe6b2cdedf14476da1508d2a508ea61a |
---|---|
record_format |
Article |
spelling |
doaj-fe6b2cdedf14476da1508d2a508ea61a; 2020-11-25T00:18:28Z; eng; Mendel University Press; Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis; ISSN 1211-8516, 2464-8310; 2015-01-01; vol. 63, no. 6, pp. 2229-2237; DOI 10.11118/actaun201563062229; Customers’ Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth; Jan Žižka (Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic); Arnošt Svoboda (Department of Applied Mathematics and Computer Science, Faculty of Economics and Administration, Masaryk University, Žerotínovo nám. 617/9, 601 77 Brno, Czech Republic); https://acta.mendelu.cz/63/6/2229/; text mining, customer opinion analysis, decision trees, decision rules, windowing, large data volumes |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jan Žižka, Arnošt Svoboda |
author_facet |
Jan Žižka, Arnošt Svoboda |
author_sort |
Jan Žižka |
title |
Customers’ Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth |
title_sort |
customers’ opinion mining from extensive amount of textual reviews in relation to induced knowledge growth |
publisher |
Mendel University Press |
series |
Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis |
issn |
1211-8516 2464-8310 |
publishDate |
2015-01-01 |
description |
Customers of various services are often invited to write a summarizing review via an Internet portal. Such reviews, written in natural language, are typically unstructured and also carry a numeric rating on a scale between “good” and “bad.” The more reviews there are, the better the feedback that can be acquired for improving the service. However, once massive data have accumulated, the non-linearly growing processing complexity may exceed the computational capacity available to analyze the text contents. Decision tree inducers like c5 can reveal understandable knowledge from data, but they need the data as a whole. This article describes an application of windowing, a technique for generating dataset subsamples that provide enough information for an inducer to train a classifier and obtain results similar to those achieved by training a model on the entire dataset. The windowing results, which significantly reduce the complexity of the learning problem, are demonstrated using hundreds of thousands of reviews written in English by hotel-service customers. A user obtains knowledge represented by significant words. The results show classification errors, training and testing times, tree sizes, and the words relevant to the review meaning as a function of the training subsample size. Finally, a method for estimating a suitable training-set size is suggested. |
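The abstract does not spell out how the windows are built, so the following is only a minimal sketch of the general idea, assuming the classic iterative scheme: train on a small random subsample, classify the remaining reviews, add the misclassified ones to the window, and repeat. It is not the authors' implementation and does not use c5; a tiny word-presence decision tree stands in for the inducer. The toy reviews, the labels `positive`/`negative`, and all function names (`tokenize`, `build_tree`, `predict`, `windowing`) are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy reviews (not data from the article): (text, label).
REVIEWS = [
    ("good clean room", "positive"),
    ("good friendly staff", "positive"),
    ("good breakfast and location", "positive"),
    ("very good value", "positive"),
    ("bad dirty room", "negative"),
    ("bad rude staff", "negative"),
    ("bad noisy location", "negative"),
    ("very bad breakfast", "negative"),
]

def tokenize(text):
    """Represent a review as the set of its lower-cased words."""
    return frozenset(text.lower().split())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, depth=0, max_depth=5):
    """Grow a tree that tests word presence.

    rows is a list of (word_set, label). A leaf is a label string;
    an inner node is a tuple (word, subtree_if_present, subtree_if_absent).
    """
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None  # (weighted impurity, word)
    for word in sorted(set().union(*(words for words, _ in rows))):
        yes = [label for words, label in rows if word in words]
        no = [label for words, label in rows if word not in words]
        if not yes or not no:
            continue  # the word does not split this node
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or score < best[0]:
            best = (score, word)
    if best is None:
        return majority
    word = best[1]
    present = [row for row in rows if word in row[0]]
    absent = [row for row in rows if word not in row[0]]
    return (word,
            build_tree(present, depth + 1, max_depth),
            build_tree(absent, depth + 1, max_depth))

def predict(tree, words):
    """Follow word-presence tests down to a leaf label."""
    while isinstance(tree, tuple):
        word, present, absent = tree
        tree = present if word in words else absent
    return tree

def windowing(rows, initial_size, max_rounds=10, seed=0):
    """Train on a growing window; return (tree, final window size)."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    window = set(order[:initial_size])
    for _ in range(max_rounds):
        tree = build_tree([rows[i] for i in sorted(window)])
        # Reviews outside the window that the current tree gets wrong.
        missed = [i for i in order
                  if i not in window and predict(tree, rows[i][0]) != rows[i][1]]
        if not missed:
            break
        window.update(missed)
    else:
        # Round limit reached: retrain so the tree matches the final window.
        tree = build_tree([rows[i] for i in sorted(window)])
    return tree, len(window)

ROWS = [(tokenize(text), label) for text, label in REVIEWS]
tree, window_size = windowing(ROWS, initial_size=2)
print(f"window: {window_size} of {len(ROWS)} reviews; tree: {tree}")
```

The words tested in the inner nodes of the resulting tree play the role of the "significant words" mentioned in the abstract. On real data the window would ideally stay much smaller than the full collection, which is where the savings in training time would come from.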
topic |
text mining; customer opinion analysis; decision trees; decision rules; windowing; large data volumes |
url |
https://acta.mendelu.cz/63/6/2229/ |
work_keys_str_mv |
AT janzizka customersopinionminingfromextensiveamountoftextualreviewsinrelationtoinducedknowledgegrowth AT arnostsvoboda customersopinionminingfromextensiveamountoftextualreviewsinrelationtoinducedknowledgegrowth |
_version_ |
1725376407525130240 |