Application of MapReduce to Ranking SVM for Large-Scale Datasets
Master's === National Sun Yat-sen University === Department of Electrical Engineering (Graduate Institute) === 98 === Nowadays, search engines rely increasingly on machine learning techniques to build models for ranking web pages, using past user queries and clicks as training data. There are several learning-to-rank methods for information retrieval, and among them ranki...
Main Authors: | Su-Hsien Hu, 胡書嫻 |
---|---|
Other Authors: | Shie-Jue Lee |
Format: | Others |
Language: | zh-TW |
Published: | 2010 |
Online Access: | http://ndltd.ncl.edu.tw/handle/06090824360552388888 |
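The abstract attributes the high cost of Ranking SVM to the huge number of training pairs. As background, here is a minimal sketch of the usual pairwise transformation, assuming each document carries a feature vector, a graded relevance label, and a query id. The function name `pairwise_transform` and this input layout are hypothetical; the record does not state the exact pair-construction rules used in the thesis.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = List[float]


def pairwise_transform(
    features: Sequence[Vector],
    relevance: Sequence[int],
    query_ids: Sequence[str],
) -> Tuple[List[Vector], List[int]]:
    """Build Ranking SVM training pairs from per-document examples (sketch).

    For each two documents of the same query whose relevance labels differ,
    emit the difference vector x_i - x_j, labelled +1 if document i should
    rank above document j and -1 otherwise. Pairs from different queries and
    pairs with tied labels are skipped.
    """
    diffs: List[Vector] = []
    labels: List[int] = []
    # combinations() visits every unordered pair once, so a query with n
    # documents contributes up to n * (n - 1) / 2 pairs.
    for i, j in combinations(range(len(features)), 2):
        if query_ids[i] != query_ids[j] or relevance[i] == relevance[j]:
            continue
        diffs.append([a - b for a, b in zip(features[i], features[j])])
        labels.append(1 if relevance[i] > relevance[j] else -1)
    return diffs, labels
```

For example, four documents in one query with labels 3, 2, 1, 0 already yield six pairs, and 100 distinctly labelled documents yield 4,950. This quadratic growth is what makes a single-machine solver expensive. A production version would also group documents by query before pairing instead of scanning all document pairs.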
id | ndltd-TW-098NSYS5442077 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-098NSYS5442077 2015-10-13T18:39:46Z http://ndltd.ncl.edu.tw/handle/06090824360552388888 Application of MapReduce to Ranking SVM for Large-Scale Datasets 應用雲端運算於排序支援向量機之研究 (A Study on Applying Cloud Computing to Ranking Support Vector Machines) Su-Hsien Hu 胡書嫻 Master's National Sun Yat-sen University Department of Electrical Engineering (Graduate Institute) 98 Shie-Jue Lee 李錫智 2010 thesis 87 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's === National Sun Yat-sen University === Department of Electrical Engineering (Graduate Institute) === 98 === Nowadays, search engines rely increasingly on machine learning techniques to build models for ranking web pages, using past user queries and clicks as training data. There are several learning-to-rank methods for information retrieval, and among them the ranking support vector machine (Ranking SVM) has attracted much attention in the information retrieval community. One difficulty with Ranking SVM is that constructing a ranking model is computationally very expensive when the training dataset is large, because a large training set yields a huge number of training data pairs. We adopt the MapReduce programming model to address this difficulty. MapReduce is a distributed computing framework introduced by Google and commonly adopted in cloud computing centers. It handles large-scale datasets by spreading the work over a large number of computers. Moreover, it hides the messy details of parallelization, fault tolerance, data distribution, and load balancing from the programmer, who can then focus only on the underlying problem to be solved. In this thesis, we apply MapReduce to Ranking SVM for processing large-scale datasets. We specify the Map function to solve the dual subproblems involved in Ranking SVM and the Reduce function to aggregate all outputs that share the same intermediate key from the Map functions running on distributed machines. Experimental results show that the proposed approach improves the efficiency of Ranking SVM. |
author2 | Shie-Jue Lee |
author | Su-Hsien Hu 胡書嫻 |
title | Application of MapReduce to Ranking SVM for Large-Scale Datasets |
publishDate | 2010 |
url | http://ndltd.ncl.edu.tw/handle/06090824360552388888 |
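The abstract says the Map function solves the dual subproblems of Ranking SVM and the Reduce function aggregates the outputs that share an intermediate key. The single-process sketch below traces that flow under assumptions the record does not confirm: the training pairs are already split into partitions, each mapper fits a bias-free linear SVM on its partition's difference vectors by dual coordinate descent (a standard solver for the linear SVM dual, swapped in here for illustration), every mapper emits its weight vector under one hypothetical key `"w"`, and the reducer averages those vectors. All function names are hypothetical, and averaging is only one possible way to aggregate.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Vector = List[float]
Pair = Tuple[Vector, int]  # (difference vector x_i - x_j, label +1 or -1)


def dual_cd_linear_svm(X: List[Vector], y: List[int], C: float = 1.0,
                       epochs: int = 50, seed: int = 0) -> Vector:
    """Solve the hinge-loss linear SVM dual (no bias) by coordinate descent.

    Maintains w = sum_i alpha_i * y_i * x_i with 0 <= alpha_i <= C.
    """
    w = [0.0] * len(X[0])
    alpha = [0.0] * len(X)
    q_diag = [sum(v * v for v in x) for x in X]  # Q_ii = x_i . x_i
    order = list(range(len(X)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            if q_diag[i] == 0.0:
                continue  # a zero vector cannot change w
            grad = y[i] * sum(wk * xk for wk, xk in zip(w, X[i])) - 1.0
            # Projected gradient: ignore moves that would leave [0, C].
            if alpha[i] == 0.0:
                pg = min(grad, 0.0)
            elif alpha[i] == C:
                pg = max(grad, 0.0)
            else:
                pg = grad
            if pg == 0.0:
                continue
            old = alpha[i]
            alpha[i] = min(max(old - grad / q_diag[i], 0.0), C)
            step = (alpha[i] - old) * y[i]
            w = [wk + step * xk for wk, xk in zip(w, X[i])]
    return w


def map_fn(partition_id: int, pairs: List[Pair]) -> Iterable[Tuple[str, Vector]]:
    """Hypothetical mapper: solve one dual subproblem on a partition of pairs."""
    X = [x for x, _ in pairs]
    y = [label for _, label in pairs]
    # Seeding by partition id keeps each mapper's visiting order reproducible.
    yield "w", dual_cd_linear_svm(X, y, seed=partition_id)


def reduce_fn(key: str, values: List[Vector]) -> Tuple[str, Vector]:
    """Hypothetical reducer: average all weight vectors sharing a key (assumption)."""
    n = len(values)
    return key, [sum(col) / n for col in zip(*values)]


def run_mapreduce(partitions: List[List[Pair]], mapper, reducer) -> Dict[str, Vector]:
    """In-memory stand-in for the map -> shuffle (group by key) -> reduce pipeline."""
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for pid, records in enumerate(partitions):
        for key, value in mapper(pid, records):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

On a real cluster the `run_mapreduce` loop would be replaced by the framework's own distribution and shuffle (Hadoop MapReduce, for example); here it only groups mapper outputs by key in memory so the data flow from Map to Reduce can be followed step by step.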