Flexible and efficient exploration of rated datasets

As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric,...


Bibliographic Details
Main Author:	Kolloju, Naresh Kumar
Language:	English
Published:	University of British Columbia 2013
Online Access:	http://hdl.handle.net/2429/44028

id	ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-44028
record_format	oai_dc
spelling	ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-44028 2014-03-26T03:39:30Z Flexible and efficient exploration of rated datasets Kolloju, Naresh Kumar As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric, item-centric and top-k exploration. More specifically, canned queries on user demographics aggregate opinion for an item or a collection of items such as 18-29 year old males in CA rated the movie The Social Network at 8.2 on average. Combining ratings, demographics, and item attributes is a powerful exploration mechanism that allows operations such as comparing the opinion of the same users for two items, comparing two groups of users on their opinion for a given class of items, and finding a group whose rating distribution is nearly unanimous for an item. To enable those operations, it is necessary to (i) adopt the right measure to compare ratings, and to (ii) develop efficient algorithms to find relevant <user,item,rating> groups. We argue that rating average is a weak measure for capturing such comparisons. We propose contrasting and querying rating distributions, instead, using the Earth Mover's Distance (EMD), a measure that intuitively reflects the amount of work needed to transform one distribution into another. We show that the problem of finding groups matching given rating distributions is NP-hard under different settings and develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms. 2013-03-13T18:44:23Z 2013-08-31T07:00:00Z 2013 2013-03-13 2013-05 Electronic Thesis or Dissertation http://hdl.handle.net/2429/44028 eng University of British Columbia
collection	NDLTD
language	English
sources	NDLTD
description	As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric, item-centric and top-k exploration. More specifically, canned queries on user demographics aggregate opinion for an item or a collection of items such as 18-29 year old males in CA rated the movie The Social Network at 8.2 on average. Combining ratings, demographics, and item attributes is a powerful exploration mechanism that allows operations such as comparing the opinion of the same users for two items, comparing two groups of users on their opinion for a given class of items, and finding a group whose rating distribution is nearly unanimous for an item. To enable those operations, it is necessary to (i) adopt the right measure to compare ratings, and to (ii) develop efficient algorithms to find relevant <user,item,rating> groups. We argue that rating average is a weak measure for capturing such comparisons. We propose contrasting and querying rating distributions, instead, using the Earth Mover's Distance (EMD), a measure that intuitively reflects the amount of work needed to transform one distribution into another. We show that the problem of finding groups matching given rating distributions is NP-hard under different settings and develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms.
author	Kolloju, Naresh Kumar
spellingShingle	Kolloju, Naresh Kumar Flexible and efficient exploration of rated datasets
author_facet	Kolloju, Naresh Kumar
author_sort	Kolloju, Naresh Kumar
title	Flexible and efficient exploration of rated datasets
title_short	Flexible and efficient exploration of rated datasets
title_full	Flexible and efficient exploration of rated datasets
title_fullStr	Flexible and efficient exploration of rated datasets
title_full_unstemmed	Flexible and efficient exploration of rated datasets
title_sort	flexible and efficient exploration of rated datasets
publisher	University of British Columbia
publishDate	2013
url	http://hdl.handle.net/2429/44028
work_keys_str_mv	AT kollojunareshkumar flexibleandefficientexplorationofrateddatasets
_version_	1716656623916679168
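
The abstract's claim that a rating average is a weak measure, and that EMD reflects the work needed to turn one distribution into another, can be illustrated with a minimal sketch. It is not taken from the thesis. It assumes a one-dimensional ordinal rating scale (1 to 5 stars) with unit distance between adjacent rating values and histograms normalized to sum to 1, in which case EMD reduces to the sum of absolute differences between the two cumulative distributions. The function names (`normalize`, `emd_1d`, `mean_rating`) and the sample histograms are hypothetical.

```python
from itertools import accumulate


def normalize(counts):
    """Turn raw rating counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty rating histogram")
    return [c / total for c in counts]


def emd_1d(counts_a, counts_b):
    """EMD between two rating histograms on the same ordinal scale.

    Assumption: adjacent rating values are one unit apart, so the
    distance equals the summed absolute difference of the two CDFs.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must cover the same rating scale")
    cdf_a = accumulate(normalize(counts_a))
    cdf_b = accumulate(normalize(counts_b))
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def mean_rating(counts):
    """Average rating, where index 0 holds the count of 1-star ratings."""
    total = sum(counts)
    return sum((i + 1) * c for i, c in enumerate(counts)) / total


# Hypothetical groups rating the same item on a 1..5 scale.
polarized = [50, 0, 0, 0, 50]   # half love it, half hate it
lukewarm = [0, 0, 100, 0, 0]    # everyone rates it 3

if __name__ == "__main__":
    print("means:", mean_rating(polarized), mean_rating(lukewarm))
    print("EMD:", emd_1d(polarized, lukewarm))
```

Both hypothetical groups have the same average, so an average-based comparison cannot tell them apart, while the EMD between them is clearly nonzero: half of the mass has to move two rating steps from each end of the scale.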


Similar Items