Customers’ Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth

Bibliographic Details
Main Authors: Jan Žižka, Arnošt Svoboda
Format: Article
Language: English
Published: Mendel University Press, 2015-01-01
Series: Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis
Online Access: https://acta.mendelu.cz/63/6/2229/
Affiliations: Jan Žižka, Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic; Arnošt Svoboda, Department of Applied Mathematics and Computer Science, Faculty of Economics and Administration, Masaryk University, Žerotínovo nám. 617/9, 601 77 Brno, Czech Republic
Volume/Issue: 63(6), pages 2229-2237
DOI: 10.11118/actaun201563062229
ISSN: 1211-8516, 2464-8310

Abstract: Customers of various services are often invited to write a summarizing review on an Internet portal. Such reviews, written in natural language, are typically unstructured and also include a numeric evaluation on a scale from “good” to “bad.” The more reviews are collected, the better the feedback available for improving the service. However, once massive amounts of data have accumulated, the processing complexity, which grows non-linearly, may exceed the computational capacity available for analyzing the text content. Decision-tree inducers such as C5 can reveal understandable knowledge from data, but they need the whole dataset at once. This article describes an application of windowing, a technique for generating dataset subsamples that give an inducer enough information to train a classifier whose results are similar to those of a model trained on the entire dataset. The windowing results, which significantly reduce the complexity of the learning problem, are demonstrated on hundreds of thousands of reviews written in English by hotel-service customers. The user obtains knowledge represented by significant words. The results show classification errors, training and testing times, tree sizes, and the words relevant to a review's meaning as functions of the training-subsample size. Finally, a method for estimating a suitable training-set size is suggested.

Keywords: text mining; customer opinion analysis; decision trees; decision rules; windowing; large data volumes
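The windowing idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use C5: as an assumption, a tiny ID3-style tree over binary word-presence features stands in for the inducer, and the helper names (`build_tree`, `classify`, `windowing`), the toy `REVIEWS` data, and the parameters `initial_size`, `increment`, `max_rounds`, and `seed` are all hypothetical. The loop follows the classic windowing scheme: train on a random subsample, move examples that the current tree misclassifies into the window, and retrain until the remaining data are explained or the round limit is reached.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of classic windowing; it is not the article's code and
# does not use C5. A tiny ID3-style tree over word-presence features stands
# in for the inducer, and all names below are illustrative assumptions.


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(examples, depth=0, max_depth=5):
    """Grow a tree; a leaf is a label, an inner node is (word, present, absent)."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    n, base = len(labels), entropy(labels)
    best_word, best_gain = None, 0.0
    vocabulary = set().union(*(words for words, _ in examples))
    for word in sorted(vocabulary):  # sorted order makes tie-breaking deterministic
        has = [label for words, label in examples if word in words]
        lacks = [label for words, label in examples if word not in words]
        if not has or not lacks:
            continue  # the word does not split this node
        gain = base - len(has) / n * entropy(has) - len(lacks) / n * entropy(lacks)
        if gain > best_gain:
            best_word, best_gain = word, gain
    if best_word is None:
        return majority(labels)  # no word gives positive information gain
    present = [ex for ex in examples if best_word in ex[0]]
    absent = [ex for ex in examples if best_word not in ex[0]]
    return (best_word,
            build_tree(present, depth + 1, max_depth),
            build_tree(absent, depth + 1, max_depth))


def classify(tree, words):
    """Follow word-presence tests down to a leaf label."""
    while isinstance(tree, tuple):
        word, if_present, if_absent = tree
        tree = if_present if word in words else if_absent
    return tree


def windowing(data, initial_size, increment, max_rounds=10, seed=0):
    """Train on a random window, then repeatedly move up to `increment`
    misclassified examples from outside the window into it and retrain."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    window, rest = order[:initial_size], order[initial_size:]
    tree = build_tree([data[i] for i in window])
    for _ in range(max_rounds):
        wrong = [i for i in rest if classify(tree, data[i][0]) != data[i][1]]
        if not wrong:
            break  # the current tree explains every example outside the window
        moved = set(wrong[:increment])
        window += [i for i in rest if i in moved]
        rest = [i for i in rest if i not in moved]
        tree = build_tree([data[i] for i in window])
    return tree, window


# Hypothetical toy reviews: (set of words, rating label).
REVIEWS = [
    (frozenset({"clean", "friendly", "staff"}), "good"),
    (frozenset({"great", "location", "clean"}), "good"),
    (frozenset({"friendly", "breakfast"}), "good"),
    (frozenset({"dirty", "room", "noisy"}), "bad"),
    (frozenset({"rude", "staff"}), "bad"),
    (frozenset({"noisy", "street"}), "bad"),
    (frozenset({"great", "breakfast"}), "good"),
    (frozenset({"dirty", "bathroom"}), "bad"),
]

if __name__ == "__main__":
    tree, window = windowing(REVIEWS, initial_size=3, increment=2)
    print(f"trained on {len(window)} of {len(REVIEWS)} reviews")
    print(tree)
```

Each call to `build_tree` sees only the window, so the cost of a training run depends on the window size rather than on the whole collection. The `initial_size` and `increment` values here are arbitrary; choosing them well relates to the training-set size estimation the abstract mentions, which this sketch does not attempt to reproduce.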