Learning Distributed Representations for Statistical Language Modelling and Collaborative Filtering

With the increasing availability of large datasets, machine learning techniques are becoming an increasingly attractive alternative to expert-designed approaches to solving complex problems in domains where data is abundant. In this thesis we introduce several models for large sparse discrete datasets...


Bibliographic Details
Main Author: Mnih, Andriy
Other Authors: Hinton, Geoffrey
Language: en_ca
Published: 2010
Subjects: statistical language modelling; collaborative filtering; neural networks; distributed representations; machine learning
Online Access: http://hdl.handle.net/1807/24832
id ndltd-LACETR-oai-collectionscanada.gc.ca-OTU.1807-24832
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-OTU.1807-24832 2013-04-17T04:18:17Z
Learning Distributed Representations for Statistical Language Modelling and Collaborative Filtering
Mnih, Andriy
statistical language modelling; collaborative filtering; neural networks; distributed representations; machine learning; 0800
[abstract text; given in full under "description" below]
Hinton, Geoffrey
2010-06
2010-08-31T19:47:40Z
NO_RESTRICTION
2010-08-31T19:47:40Z
2010-08-31T19:47:40Z
Thesis
http://hdl.handle.net/1807/24832
en_ca
collection NDLTD
language en_ca
sources NDLTD
topic statistical language modelling
collaborative filtering
neural networks
distributed representations
machine learning
0800
description With the increasing availability of large datasets, machine learning techniques are becoming an increasingly attractive alternative to expert-designed approaches to solving complex problems in domains where data is abundant. In this thesis we introduce several models for large sparse discrete datasets. Our approach, which is based on probabilistic models that use distributed representations to alleviate the effects of data sparsity, is applied to statistical language modelling and collaborative filtering. We introduce three probabilistic language models that represent words using learned real-valued vectors. Two of the models are based on the Restricted Boltzmann Machine (RBM) architecture, while the third is a simple deterministic model. We show that the deterministic model outperforms the widely used n-gram models and learns sensible word representations. To reduce the time complexity of training and making predictions with the deterministic model, we introduce a hierarchical version of the model that can be exponentially faster. The speedup is achieved by structuring the vocabulary as a tree over words and taking advantage of this structure. We propose a simple feature-based algorithm for automatic construction of trees over words from data and show that the resulting models can outperform non-hierarchical neural models as well as the best n-gram models. We then turn our attention to collaborative filtering and show how RBM models can be used to model the distribution of sparse high-dimensional user rating vectors efficiently, presenting inference and learning algorithms that scale linearly in the number of observed ratings. We also introduce the Probabilistic Matrix Factorization (PMF) model, which is based on the probabilistic formulation of the low-rank matrix approximation problem for partially observed matrices. The two models are then extended to allow conditioning on the identities of the rated items, whether or not the actual rating values are known. Our results on the Netflix Prize dataset show that both the RBM and PMF models outperform online SVD models.
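To make the two formulations named above concrete, here is a minimal sketch in standard notation; the symbols below (U_i, V_j, I_{ij}, sigma, lambda, d_n, s) are illustrative assumptions and are not taken from this record. In the widely used PMF formulation, each user i and item j is given a D-dimensional latent vector, an observed rating R_{ij} is modelled as Gaussian around the inner product of the corresponding vectors, and zero-mean spherical Gaussian priors are placed on the latent vectors. MAP estimation then reduces to a regularized squared error that sums only over observed entries (I_{ij} = 1), so the cost of a gradient step grows with the number of observed ratings:

    p(R \mid U, V, \sigma^2) = \prod_{i,j} \mathcal{N}(R_{ij} \mid U_i^\top V_j, \sigma^2)^{I_{ij}}
    E = \frac{1}{2} \sum_{i,j} I_{ij} \left( R_{ij} - U_i^\top V_j \right)^2 + \frac{\lambda_U}{2} \sum_i \|U_i\|^2 + \frac{\lambda_V}{2} \sum_j \|V_j\|^2

Likewise, the exponential speedup of the hierarchical language model comes from replacing a single V-way normalization over the vocabulary with a product of binary decisions along the path from the root of the word tree to the target word, roughly

    P(w \mid h) = \prod_{n \in \mathrm{path}(w)} \sigma\left( d_n \, s(n, h) \right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}

where d_n \in \{+1, -1\} encodes the decision taken at internal node n and s(n, h) is the model's score for that decision given context h; for a reasonably balanced tree the product has on the order of \log_2 V factors rather than V terms.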
author2 Hinton, Geoffrey
author Mnih, Andriy
author_sort Mnih, Andriy
title Learning Distributed Representations for Statistical Language Modelling and Collaborative Filtering
publishDate 2010
url http://hdl.handle.net/1807/24832