Visualizing and modeling partial incomplete ranking data
Analyzing ranking data is an essential component in a wide range of important applications including web-search and recommendation systems. Rankings are difficult to visualize or model due to the computational difficulties associated with the large number of items. On the other hand, partial or inco...
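Main Author: | Sun, Mingxuan |
---|---|
Published: | Georgia Institute of Technology, 2013 |
Subjects: | Recommender systems; Weighted Hoeffding distance; Kernel smoothing; Search algorithm dissimilarity; Partial incomplete ranking; Algorithms; Ranking and selection (Statistics) |
Online Access: | http://hdl.handle.net/1853/45793 |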
id |
ndltd-GATECH-oai-smartech.gatech.edu-1853-45793 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-GATECH-oai-smartech.gatech.edu-1853-45793; 2013-05-30T03:05:55Z; Visualizing and modeling partial incomplete ranking data; Sun, Mingxuan; Recommender systems; Weighted Hoeffding distance; Kernel smoothing; Search algorithm dissimilarity; Partial incomplete ranking; Algorithms; Ranking and selection (Statistics); Georgia Institute of Technology; 2013-01-17T21:10:22Z; 2013-01-17T21:10:22Z; 2012-08-23; Dissertation; http://hdl.handle.net/1853/45793 |
collection |
NDLTD |
topic |
Recommender systems; Weighted Hoeffding distance; Kernel smoothing; Search algorithm dissimilarity; Partial incomplete ranking; Algorithms; Ranking and selection (Statistics) |
description |
Analyzing ranking data is an essential component of a wide range of important applications, including web search and recommendation systems. Rankings are difficult to visualize or model because of the computational difficulties associated with a large number of items. Partial or incomplete rankings introduce further difficulties, since approaches that work well for typical types of rankings do not apply generally to all types. While the analysis of ranking data has a long history in statistics, constructing an efficient framework to analyze incomplete ranking data (with or without ties) is currently an open problem. This thesis addresses the scalability of visualizing and modeling partial incomplete rankings. In particular, we propose a distance measure for top-k rankings with three properties: (1) it is a metric, (2) it emphasizes top ranks, and (3) it is computationally efficient. Given this distance measure, the data can be projected into a low-dimensional continuous vector space via multidimensional scaling (MDS) for easy visualization. We further propose a non-parametric model for estimating distributions of partial incomplete rankings. For the non-parametric estimator, we use a triangular kernel that is a direct analogue of the Euclidean triangular kernel. For a large number of items n, the computational difficulties are reduced by exploiting combinatorial properties and generating functions associated with symmetric groups. We show that our estimator is computationally efficient for rankings of arbitrary incompleteness and tie structure. Moreover, we propose an efficient learning algorithm to construct a preference elicitation system from partial incomplete rankings, which can be used to address the cold-start problem in ranking recommendations. The proposed approaches are evaluated in experiments on real search-engine and movie-recommendation data. |
author |
Sun, Mingxuan |
title |
Visualizing and modeling partial incomplete ranking data |
publisher |
Georgia Institute of Technology |
publishDate |
2013 |
url |
http://hdl.handle.net/1853/45793 |
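The record's description names three properties for the proposed top-k distance (it is a metric, it emphasizes top ranks, it is computationally efficient) and a triangular kernel estimator, but gives no formulas. The Python sketch below illustrates how such pieces can fit together under stated assumptions. It does not reproduce the thesis's weighted Hoeffding distance: as a stand-in it uses a position-weighted Spearman footrule, in which items absent from a list are placed at position k+1 and positional weights decay geometrically (the `decay` parameter is an assumption). Because each list maps injectively to a vector and the distance is an L1 distance between those vectors, the stand-in is a metric, and it costs O(k) per pair. The kernel score uses (1 - d/h)+ but is left unnormalized, since normalizing over the space of rankings is the combinatorial step the thesis handles with generating functions. The names `cumulative_weights`, `topk_distance`, and `triangular_kernel_score` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

# Hypothetical stand-in, NOT the thesis's weighted Hoeffding distance:
# a position-weighted Spearman footrule on top-k lists.


def cumulative_weights(k: int, decay: float = 0.5) -> List[float]:
    """Return W with W[r] defined for positions r = 1..k+1 (W[0] is unused).

    Moving an item from position r to r+1 costs decay**(r-1), so with
    0 < decay < 1 disagreements near the top cost more. Position k+1
    means "not in the top k".
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must lie in (0, 1]")
    W = [0.0] * (k + 2)
    for r in range(2, k + 2):
        W[r] = W[r - 1] + decay ** (r - 2)
    return W


def topk_distance(a: Sequence[Hashable], b: Sequence[Hashable], k: int,
                  decay: float = 0.5) -> float:
    """Position-weighted footrule between two lists of at most k distinct items.

    Each list is mapped to the vector (W[pos(item)]) over all items, and the
    distance is the L1 distance between these vectors. W is strictly
    increasing, so the map is injective and the result is a metric.
    """
    if len(a) > k or len(b) > k:
        raise ValueError("lists may hold at most k items")
    W = cumulative_weights(k, decay)
    pos_a: Dict[Hashable, int] = {item: r for r, item in enumerate(a, start=1)}
    pos_b: Dict[Hashable, int] = {item: r for r, item in enumerate(b, start=1)}
    # Items missing from both lists sit at k+1 in both and contribute 0,
    # so scanning the union of the two lists suffices: O(k) work per pair.
    total = 0.0
    for item in pos_a.keys() | pos_b.keys():
        total += abs(W[pos_a.get(item, k + 1)] - W[pos_b.get(item, k + 1)])
    return total


def triangular_kernel_score(query: Sequence[Hashable],
                            sample: Sequence[Sequence[Hashable]],
                            k: int, bandwidth: float,
                            decay: float = 0.5) -> float:
    """Unnormalized estimate (1/m) * sum_i max(0, 1 - d(query, s_i) / h)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    weights = (max(0.0, 1.0 - topk_distance(query, s, k, decay) / bandwidth)
               for s in sample)
    return sum(weights) / len(sample)


if __name__ == "__main__":
    a, b, c = ["x", "y", "z"], ["y", "x", "z"], ["x", "y", "w"]
    print(topk_distance(a, b, k=3))  # 2.0: the top two items are swapped
    print(topk_distance(a, c, k=3))  # 0.5: the lists differ only at position 3
    print(triangular_kernel_score(a, [a, b, c], k=3, bandwidth=4.0))
```

With `decay=0.5`, swapping the top two items costs 2.0, while swapping positions 2 and 3 costs 1.0, which is the top-weighted behavior the description asks for. A matrix of such pairwise distances could then be handed to any multidimensional scaling routine that accepts precomputed dissimilarities.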