Application of MapReduce to Ranking SVM for Large-Scale Datasets


Bibliographic Details
Main Author: Su-Hsien Hu (胡書嫻)
Other Authors: Shie-Jue Lee
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/06090824360552388888
Record ID: ndltd-TW-098NSYS5442077
Record Timestamp: 2015-10-13T18:39:46Z
Title (Chinese): 應用雲端運算於排序支援向量機之研究
Other Authors (Chinese name): Shie-Jue Lee (李錫智)
Type: thesis (學位論文), 87 pages
Description: Master's thesis, National Sun Yat-sen University, Graduate Institute of Electrical Engineering, academic year 98. Search engines increasingly rely on machine learning techniques to construct models for ranking web pages, using past user queries and clicks as training data. Among the several learning-to-rank methods for information retrieval, the ranking support vector machine (Ranking SVM) has attracted considerable attention in the information retrieval community. One difficulty with Ranking SVM is that constructing a ranking model is computationally very expensive, because a large training dataset produces a huge number of training data pairs. We adopt the MapReduce programming model to address this difficulty. MapReduce is a distributed computing framework introduced by Google and commonly adopted in cloud computing centers. It can readily process large-scale datasets using a large number of computers. Moreover, it hides the messy details of parallelization, fault tolerance, data distribution, and load balancing from programmers, allowing them to focus only on the underlying problem to be solved. In this thesis, we apply MapReduce to Ranking SVM for processing large-scale datasets. We specify the Map function to solve the dual subproblems involved in Ranking SVM, and the Reduce function to aggregate all outputs that share the same intermediate key from the Map functions running on the distributed machines. Experimental results show that the proposed approach improves the efficiency of Ranking SVM.
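The description does not give the Map and Reduce functions themselves. As a rough illustration only, the single-process Python sketch below shows one way the idea could look: each Map call builds the pairwise difference vectors for its share of the queries and solves a linear SVM dual on them, and Reduce combines the weight vectors emitted under one intermediate key. The function names (`make_pairs`, `solve_dual`, `map_fn`, `reduce_fn`, `run_job`), the partitioning by query, the single key `"w"`, the dual coordinate descent solver, the pair-count-weighted averaging, and the toy data are all assumptions made for this sketch, not details taken from the thesis.

```python
from collections import defaultdict
from itertools import combinations
import random


def make_pairs(docs):
    """Turn one query's (features, relevance) list into difference vectors.

    Each pair with different relevance yields x_high - x_low, which a
    ranking weight vector w should score positively.
    """
    pairs = []
    for (xa, ya), (xb, yb) in combinations(docs, 2):
        if ya == yb:
            continue  # equal relevance gives no ordering constraint
        hi, lo = (xa, xb) if ya > yb else (xb, xa)
        pairs.append([h - l for h, l in zip(hi, lo)])
    return pairs


def solve_dual(pairs, C=1.0, epochs=50, seed=0):
    """Dual coordinate descent for a linear hinge-loss SVM without bias.

    Every difference vector z_i has label +1, so the dual variables
    satisfy 0 <= alpha_i <= C and w = sum_i alpha_i * z_i.
    """
    rng = random.Random(seed)
    w = [0.0] * len(pairs[0])
    alpha = [0.0] * len(pairs)
    order = list(range(len(pairs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = pairs[i]
            q = sum(v * v for v in z)
            if q == 0.0:
                continue
            grad = sum(wj * zj for wj, zj in zip(w, z)) - 1.0
            new_alpha = min(max(alpha[i] - grad / q, 0.0), C)
            delta = new_alpha - alpha[i]
            alpha[i] = new_alpha
            w = [wj + delta * zj for wj, zj in zip(w, z)]
    return w


def map_fn(partition):
    """Assumed Map: solve the dual sub-problem for one partition of queries."""
    pairs = [z for docs in partition for z in make_pairs(docs)]
    if pairs:
        yield "w", (solve_dual(pairs), len(pairs))


def reduce_fn(key, values):
    """Assumed Reduce: pair-count-weighted average of the local weight vectors."""
    total = sum(n for _, n in values)
    dim = len(values[0][0])
    return [sum(w[j] * n for w, n in values) / total for j in range(dim)]


def run_job(partitions):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    grouped = defaultdict(list)
    for part in partitions:
        for key, value in map_fn(part):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Toy data (hypothetical): each query is a list of (feature vector, relevance).
q1 = [([1.0, 0.2], 2), ([0.5, 0.9], 1), ([0.1, 0.4], 0)]
q2 = [([0.9, 0.8], 1), ([0.2, 0.1], 0)]
q3 = [([0.8, 0.1], 2), ([0.3, 0.7], 0)]
partitions = [[q1], [q2, q3]]  # one partition per simulated machine

w = run_job(partitions)["w"]
print(w)
```

Averaging the local solutions is only one simple way to combine them and does not in general reproduce the model trained on all pairs at once; it is used here to keep the sketch short. Note also how quickly the pair count grows: a query with n documents of distinct relevance produces n(n-1)/2 difference vectors, which is the cost the description refers to.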