Flexible and efficient exploration of rated datasets

As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric,...


Bibliographic Details
Main Author: Kolloju, Naresh Kumar
Language: English
Published: University of British Columbia 2013
Online Access: http://hdl.handle.net/2429/44028
id ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-44028
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-44028 2014-03-26T03:39:30Z Flexible and efficient exploration of rated datasets Kolloju, Naresh Kumar As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric, item-centric and top-k exploration. More specifically, canned queries on user demographics aggregate opinion for an item or a collection of items such as 18-29 year old males in CA rated the movie The Social Network at 8.2 on average. Combining ratings, demographics, and item attributes is a powerful exploration mechanism that allows operations such as comparing the opinion of the same users for two items, comparing two groups of users on their opinion for a given class of items, and finding a group whose rating distribution is nearly unanimous for an item. To enable those operations, it is necessary to (i) adopt the right measure to compare ratings, and to (ii) develop efficient algorithms to find relevant <user,item,rating> groups. We argue that rating average is a weak measure for capturing such comparisons. We propose contrasting and querying rating distributions, instead, using the Earth Mover's Distance (EMD), a measure that intuitively reflects the amount of work needed to transform one distribution into another. We show that the problem of finding groups matching given rating distributions is NP-hard under different settings and develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms. 2013-03-13T18:44:23Z 2013-08-31T07:00:00Z 2013 2013-03-13 2013-05 Electronic Thesis or Dissertation http://hdl.handle.net/2429/44028 eng University of British Columbia
collection NDLTD
language English
sources NDLTD
description As users increasingly rely on collaborative rating sites to achieve mundane tasks such as purchasing a product or renting a movie, they are facing the data deluge of ratings and reviews. Traditionally, the exploration of rated data sets has been enabled by rating averages that allow user-centric, item-centric and top-k exploration. More specifically, canned queries on user demographics aggregate opinion for an item or a collection of items such as 18-29 year old males in CA rated the movie The Social Network at 8.2 on average. Combining ratings, demographics, and item attributes is a powerful exploration mechanism that allows operations such as comparing the opinion of the same users for two items, comparing two groups of users on their opinion for a given class of items, and finding a group whose rating distribution is nearly unanimous for an item. To enable those operations, it is necessary to (i) adopt the right measure to compare ratings, and to (ii) develop efficient algorithms to find relevant <user,item,rating> groups. We argue that rating average is a weak measure for capturing such comparisons. We propose contrasting and querying rating distributions, instead, using the Earth Mover's Distance (EMD), a measure that intuitively reflects the amount of work needed to transform one distribution into another. We show that the problem of finding groups matching given rating distributions is NP-hard under different settings and develop appropriate heuristics. Our experiments on real and synthetic datasets validate the utility of our approach and demonstrate the scalability of our algorithms.
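The abstract's claim that a rating average is a weak measure can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes a 1-5 rating scale, a ground distance of 1 between adjacent rating values, and made-up rating counts. The names emd_1d and mean_rating are hypothetical. Under those assumptions, the Earth Mover's Distance between two histograms on an ordered scale equals the sum of absolute differences between their cumulative distributions.

```python
from itertools import accumulate


def normalize(counts):
    """Turn raw rating counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("rating histogram is empty")
    return [c / total for c in counts]


def mean_rating(counts):
    """Average rating, assuming bucket i holds the count for rating i + 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("rating histogram is empty")
    return sum((i + 1) * c for i, c in enumerate(counts)) / total


def emd_1d(p_counts, q_counts):
    """EMD between two histograms on the same ordered rating scale.

    Assumption (not from the source): adjacent ratings are 1 unit apart,
    so the EMD reduces to the L1 distance between the two CDFs.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must use the same rating scale")
    cdf_p = accumulate(normalize(p_counts))
    cdf_q = accumulate(normalize(q_counts))
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Hypothetical groups: one split between 1 and 5 stars, one unanimous at 3.
polarized = [50, 0, 0, 0, 50]
unanimous = [0, 0, 100, 0, 0]

print(mean_rating(polarized), mean_rating(unanimous))  # identical averages
print(emd_1d(polarized, unanimous))                    # nonzero distance
```

Both hypothetical groups have the same average, so an average-based comparison cannot tell them apart, while the distance between their distributions is clearly nonzero. The group-finding problems that the abstract describes as NP-hard are not addressed by this sketch.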
author Kolloju, Naresh Kumar
title Flexible and efficient exploration of rated datasets
publisher University of British Columbia
publishDate 2013
url http://hdl.handle.net/2429/44028