Flexible and efficient exploration of rated datasets
Main Author: | Kolloju, Naresh Kumar |
---|---|
Language: | English |
Published: | University of British Columbia, 2013 |
Online Access: | http://hdl.handle.net/2429/44028 |
id |
ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-44028 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-44028 2014-03-26T03:39:30Z Flexible and efficient exploration of rated datasets Kolloju, Naresh Kumar As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric, item-centric and top-k exploration. More specifically, canned queries on user demographics aggregate opinion for an item or a collection of items such as 18-29 year old males in CA rated the movie The Social Network at 8.2 on average. Combining ratings, demographics, and item attributes is a powerful exploration mechanism that allows operations such as comparing the opinion of the same users for two items, comparing two groups of users on their opinion for a given class of items, and finding a group whose rating distribution is nearly unanimous for an item. To enable those operations, it is necessary to (i) adopt the right measure to compare ratings, and to (ii) develop efficient algorithms to find relevant <user,item,rating> groups. We argue that rating average is a weak measure for capturing such comparisons. We propose contrasting and querying rating distributions, instead, using the Earth Mover's Distance (EMD), a measure that intuitively reflects the amount of work needed to transform one distribution into another. We show that the problem of finding groups matching given rating distributions is NP-hard under different settings and develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms. 2013-03-13T18:44:23Z 2013-08-31T07:00:00Z 2013 2013-03-13 2013-05 Electronic Thesis or Dissertation http://hdl.handle.net/2429/44028 eng University of British Columbia |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
description |
As users increasingly rely on collaborative rating sites to accomplish mundane tasks such as purchasing a product or renting a movie, they face a deluge of ratings and reviews. Traditionally, the exploration of rated datasets has been enabled by rating averages that support user-centric, item-centric, and top-k exploration. More specifically, canned queries on user demographics aggregate opinion for an item or a collection of items, for example: "18-29 year old males in CA rated the movie The Social Network at 8.2 on average." Combining ratings, demographics, and item attributes is a powerful exploration mechanism that enables operations such as comparing the opinion of the same users on two items, comparing two groups of users on their opinion of a given class of items, and finding a group whose rating distribution for an item is nearly unanimous. To enable these operations, it is necessary (i) to adopt the right measure for comparing ratings and (ii) to develop efficient algorithms for finding relevant <user,item,rating> groups. We argue that the rating average is a weak measure for capturing such comparisons. We propose instead to contrast and query rating distributions using the Earth Mover's Distance (EMD), a measure that intuitively reflects the amount of work needed to transform one distribution into another. We show that the problem of finding groups matching given rating distributions is NP-hard under different settings, and we develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms. |
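The abstract's central claim is that an average hides how opinion is spread, while EMD captures it. The sketch below illustrates that claim on made-up data; it is not the thesis's algorithm. Assumptions, all hypothetical: integer ratings on a 1..10 scale (the "8.2" example suggests this, but the record does not say so), ground distance |i - j| between rating values, toy <user,item,rating> triples, and a single age-band attribute per user. The names `TRIPLES`, `AGE_BAND`, `group_ratings`, `rating_histogram`, and `emd_1d` are illustrative only. For 1-D histograms under this ground distance, EMD reduces to the summed absolute difference of the two cumulative distributions.

```python
from collections import Counter

# Assumption: integer ratings on a 1..10 scale (not stated in the record).
RATING_SCALE = range(1, 11)

# Hypothetical toy data: <user, item, rating> triples plus one demographic
# attribute (age band) per user. Not taken from the thesis.
TRIPLES = [
    ("u1", "The Social Network", 2), ("u2", "The Social Network", 2),
    ("u3", "The Social Network", 10), ("u4", "The Social Network", 10),
    ("u5", "The Social Network", 6), ("u6", "The Social Network", 6),
    ("u7", "The Social Network", 6), ("u8", "The Social Network", 6),
]
AGE_BAND = {"u1": "18-29", "u2": "18-29", "u3": "18-29", "u4": "18-29",
            "u5": "30-44", "u6": "30-44", "u7": "30-44", "u8": "30-44"}


def group_ratings(triples, age_band, item, band):
    """Ratings given to `item` by users whose age band equals `band`."""
    return [r for user, it, r in triples if it == item and age_band[user] == band]


def mean_rating(ratings):
    return sum(ratings) / len(ratings)


def rating_histogram(ratings):
    """Normalized rating distribution over RATING_SCALE."""
    counts = Counter(ratings)
    return [counts[r] / len(ratings) for r in RATING_SCALE]


def emd_1d(p, q):
    """EMD between two histograms on the same ordered scale, with ground
    distance |i - j| between rating values. In 1-D this equals the summed
    absolute difference of the cumulative distributions."""
    cum_p = cum_q = work = 0.0
    for p_i, q_i in zip(p, q):
        cum_p += p_i
        cum_q += q_i
        work += abs(cum_p - cum_q)
    return work


young = group_ratings(TRIPLES, AGE_BAND, "The Social Network", "18-29")
older = group_ratings(TRIPLES, AGE_BAND, "The Social Network", "30-44")
print(mean_rating(young), mean_rating(older))  # 6.0 6.0
print(emd_1d(rating_histogram(young), rating_histogram(older)))  # 4.0
```

Both toy groups average 6.0, so an average-based query reports them as agreeing. The EMD of 4.0 instead reflects that the polarized group's mass must move 4 rating points on average to match the unanimous group: half the mass moves from 2 up to 6 and half from 10 down to 6.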
author |
Kolloju, Naresh Kumar |
title |
Flexible and efficient exploration of rated datasets |
publisher |
University of British Columbia |
publishDate |
2013 |
url |
http://hdl.handle.net/2429/44028 |