Summary: | As users increasingly rely on collaborative rating sites to achieve mundane
tasks such as purchasing a product or renting a movie, they are facing the
data deluge of ratings and reviews. Traditionally, the exploration of rated
data sets has been enabled by rating averages that allow user-centric, item-centric,
and top-k exploration. More specifically, canned queries on user
demographics aggregate opinion for an item or a collection of items, such as
"18-29 year old males in CA rated the movie The Social Network at 8.2 on
average." Combining ratings, demographics, and item attributes is a powerful
exploration mechanism that allows operations such as comparing the
opinion of the same users for two items, comparing two groups of users on
their opinion for a given class of items, and finding a group whose rating
distribution is nearly unanimous for an item. To enable those operations,
it is necessary to (i) adopt the right measure to compare ratings, and to
(ii) develop efficient algorithms to find relevant <user,item,rating> groups.
We argue that rating average is a weak measure for capturing such comparisons.
We propose contrasting and querying rating distributions, instead,
using the Earth Mover's Distance (EMD), a measure that intuitively reflects
the amount of work needed to transform one distribution into another. We
show that the problem of finding groups matching given rating distributions
is NP-hard under different settings and develop appropriate heuristics.
Our experiments on real and synthetic datasets validate the utility of our
approach and demonstrate the scalability of our algorithms.
|
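To illustrate why a rating average can be a weak measure for comparing groups, the following minimal sketch contrasts two hypothetical rating histograms that share the same mean but have very different shapes. It assumes a 1-to-5 rating scale with unit distance between adjacent ratings, in which case the EMD reduces to the sum of absolute differences between the two cumulative distributions. The function names and the sample data are illustrative assumptions, not taken from the paper.

```python
from itertools import accumulate


def mean_rating(hist):
    """Average rating of a histogram over the scale 1..len(hist)."""
    total = sum(hist)
    return sum((i + 1) * count for i, count in enumerate(hist)) / total


def emd_1d(p, q):
    """EMD between two histograms on the same ordinal scale.

    Assumes unit ground distance between adjacent ratings, so the EMD
    equals the sum of absolute differences of the normalized CDFs.
    """
    if len(p) != len(q):
        raise ValueError("histograms must cover the same rating scale")
    total_p, total_q = sum(p), sum(q)
    cdf_p = accumulate(count / total_p for count in p)
    cdf_q = accumulate(count / total_q for count in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Hypothetical groups: counts of ratings 1, 2, 3, 4, 5.
unanimous = [0, 0, 10, 0, 0]   # everyone rates 3
polarized = [5, 0, 0, 0, 5]    # half rate 1, half rate 5

print(mean_rating(unanimous), mean_rating(polarized))  # same average
print(emd_1d(unanimous, polarized))                    # distributions differ
```

Both groups average 3.0, so a comparison based on averages cannot tell them apart, while the EMD between them is 2.0: half of the rating mass must move two steps up from 1 to 3, and the other half two steps down from 5 to 3.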