Application of MapReduce to Ranking SVM for Large-Scale Datasets

Bibliographic Details
Main Author:	Su-Hsien Hu (胡書嫻)
Other Authors:	Shie-Jue Lee
Format:	Others
Language:	zh-TW
Published:	2010
Online Access:	http://ndltd.ncl.edu.tw/handle/06090824360552388888

id	ndltd-TW-098NSYS5442077
title (Chinese)	應用雲端運算於排序支援向量機之研究 (A Study on Applying Cloud Computing to Ranking Support Vector Machines)
advisor	Shie-Jue Lee 李錫智
type	thesis (學位論文)
extent	87
collection	NDLTD
description	Master's thesis, National Sun Yat-sen University, Department of Electrical Engineering (Graduate Institute), academic year 98 (ROC calendar, 2009–2010). Search engines increasingly rely on machine learning to build models for ranking web pages, using past user queries and clicks as training data. Among the several learning-to-rank methods for information retrieval, Ranking SVM (ranking support vector machine) has attracted considerable attention in the information retrieval community. One difficulty with Ranking SVM is that building a ranking model on a large training set is computationally expensive, because the method trains on pairs of documents and the number of pairs grows much faster than the number of documents (up to quadratically within each query). We adopt the MapReduce programming model to address this difficulty. MapReduce is a distributed computing framework introduced by Google and widely used in cloud computing centers. It handles large-scale datasets by spreading the work over many computers, and it hides the messy details of parallelization, fault tolerance, data distribution, and load balancing, so that the programmer can focus only on the problem to be solved. In this thesis, we apply MapReduce to Ranking SVM for processing large-scale datasets. The Map function solves the dual subproblems that arise in Ranking SVM, and the Reduce function aggregates all Map outputs from the distributed machines that share the same intermediate key. Experimental results show that the proposed approach improves the efficiency of Ranking SVM training.
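The abstract describes the approach only at a high level. The following is a minimal, hypothetical Python sketch under stated assumptions, not the thesis's implementation: it assumes a linear Ranking SVM built on the standard pairwise transform (feature differences of same-query documents with different relevance labels), uses dual coordinate descent as a swapped-in solver for each shard's dual subproblem, and assumes the Reduce step simply averages the per-shard weight vectors that share the intermediate key "w". The function names and the single-process driver are illustrative; a real deployment would run the map and reduce functions on a MapReduce framework such as Hadoop.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np


def make_pairs(X, y, qid):
    """Pairwise transform: for two documents of the same query with different
    labels, emit x_i - x_j and +1 if doc i is more relevant, else -1."""
    by_query = defaultdict(list)
    for idx, q in enumerate(qid):
        by_query[q].append(idx)
    diffs, signs = [], []
    for idxs in by_query.values():
        for i, j in combinations(idxs, 2):
            if y[i] == y[j]:
                continue  # equal labels give no ordering constraint
            diffs.append(X[i] - X[j])
            signs.append(1.0 if y[i] > y[j] else -1.0)
    return np.array(diffs), np.array(signs)


def solve_dual_subproblem(Z, s, C=1.0, epochs=20, seed=0):
    """Dual coordinate descent for a linear hinge-loss SVM without bias,
    min_w 0.5*||w||^2 + C * sum_k max(0, 1 - s_k * w.z_k), on one shard."""
    rng = np.random.default_rng(seed)
    alpha = np.zeros(len(s))
    w = np.zeros(Z.shape[1])
    qdiag = np.einsum("ij,ij->i", Z, Z)  # ||z_k||^2 for each pair
    for _ in range(epochs):
        for k in rng.permutation(len(s)):
            if qdiag[k] == 0.0:
                continue
            grad = s[k] * w.dot(Z[k]) - 1.0
            # Newton step on alpha_k, projected onto the box [0, C].
            new_alpha = min(max(alpha[k] - grad / qdiag[k], 0.0), C)
            w += (new_alpha - alpha[k]) * s[k] * Z[k]
            alpha[k] = new_alpha
    return w


def map_fn(shard):
    """Hypothetical Map: solve the subproblem of one shard, emit under key 'w'."""
    Z, s = shard
    yield "w", solve_dual_subproblem(Z, s)


def reduce_fn(key, values):
    """Hypothetical Reduce: average all weight vectors sharing the same key."""
    return key, np.mean(values, axis=0)


def run_local_mapreduce(shards):
    """Single-process stand-in for the runtime: map every shard, group the
    intermediate values by key, then reduce each group."""
    groups = defaultdict(list)
    for shard in shards:
        for key, value in map_fn(shard):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    qid = np.repeat(np.arange(20), 10)  # 20 synthetic queries, 10 docs each
    y = np.digitize(X @ np.array([1.0, -2.0, 0.5]), [-1.0, 1.0])  # labels 0..2
    Z, s = make_pairs(X, y, qid)
    shards = list(zip(np.array_split(Z, 4), np.array_split(s, 4)))
    print("pairs:", len(s), "averaged w:", run_local_mapreduce(shards)["w"])
```

Averaging independently trained shard models is a simple way to combine Map outputs but only approximates the solution of the full problem; the thesis may combine the subproblem outputs differently.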

Similar Items