Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering
Master's === National Tsing Hua University === Department of Computer Science === 103 === The scalability of algorithms and implementations, meaning that computational efficiency is sustained as more machines are added, is one of the most crucial performance factors in big data processing. Nowadays, the number of machines and the amount of storage can be extended to match...
Main Authors: | Yu, Hsiu-Cheng, 余修丞 |
---|---|
Other Authors: | Lee, Che-Rung, 李哲榮 |
Format: | Others |
Language: | en_US |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/01260555691292068543 |
id |
ndltd-TW-103NTHU5392002 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-103NTHU5392002 2016-12-19T04:14:35Z http://ndltd.ncl.edu.tw/handle/01260555691292068543 Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering 以雲端平台實作瘦長QR分解及其應用 Yu, Hsiu-Cheng 余修丞 Master's National Tsing Hua University Department of Computer Science 103 The scalability of algorithms and implementations, meaning that computational efficiency is sustained as more machines are added, is one of the most crucial performance factors in big data processing. Nowadays, the number of machines and the amount of storage can easily be extended to match the growth of data size. However, without scalable algorithms, adding machines can even slow down data processing. In this thesis, we investigated and improved the scalability of algorithms and implementations of the QR decomposition for tall-and-skinny matrices on cloud platforms. Our algorithm is based on the TSQR (Tall-and-Skinny QR) algorithm proposed by Demmel et al., which has been shown to be optimal in communication cost for the QR decomposition of tall-and-skinny matrices. However, our analysis shows that disk I/O dominates the performance of the MapReduce implementation. Therefore, we implemented TSQR using Apache Spark, an in-memory programming model for distributed computing environments. We applied our TSQR implementation to SSVD-based Collaborative Filtering (CF). CF is a computational kernel commonly used in e-commerce, such as Amazon recommendations, Google Ads, and Facebook friend suggestions. SSVD-based CF has superior performance and accuracy compared to existing methods. However, its performance bottleneck is the QR decomposition step within the SSVD (Stochastic SVD) computation. Experiments show that our implementation of TSQR in Spark is more efficient than the one in Hadoop MapReduce, and that the overall performance of TSQR can be improved by up to 400% for several benchmarks. Lee, Che-Rung 李哲榮 2014 thesis 70 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Tsing Hua University === Department of Computer Science === 103 === The scalability of algorithms and implementations, meaning that computational efficiency is sustained as more machines are added, is one of the most crucial performance factors in big data processing. Nowadays, the number of machines and the amount of storage can easily be extended to match the growth of data size. However, without scalable algorithms, adding machines can even slow down data processing.
In this thesis, we investigated and improved the scalability of algorithms and implementations of the QR decomposition for tall-and-skinny matrices on cloud platforms. Our algorithm is based on the TSQR (Tall-and-Skinny QR) algorithm proposed by Demmel et al., which has been shown to be optimal in communication cost for the QR decomposition of tall-and-skinny matrices. However, our analysis shows that disk I/O dominates the performance of the MapReduce implementation. Therefore, we implemented TSQR using Apache Spark, an in-memory programming model for distributed computing environments.
We applied our TSQR implementation to SSVD-based Collaborative Filtering (CF). CF is a computational kernel commonly used in e-commerce, such as Amazon recommendations, Google Ads, and Facebook friend suggestions. SSVD-based CF has superior performance and accuracy compared to existing methods. However, its performance bottleneck is the QR decomposition step within the SSVD (Stochastic SVD) computation. Experiments show that our implementation of TSQR in Spark is more efficient than the one in Hadoop MapReduce, and that the overall performance of TSQR can be improved by up to 400% for several benchmarks.
|
author2 |
Lee, Che-Rung |
author_facet |
Lee, Che-Rung Yu, Hsiu-Cheng 余修丞 |
author |
Yu, Hsiu-Cheng 余修丞 |
spellingShingle |
Yu, Hsiu-Cheng 余修丞 Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering |
author_sort |
Yu, Hsiu-Cheng |
title |
Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering |
title_short |
Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering |
title_full |
Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering |
title_fullStr |
Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering |
title_full_unstemmed |
Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering |
title_sort |
implementations of tsqr for cloud platforms and its applications of ssvd and collaborative filtering |
publishDate |
2014 |
url |
http://ndltd.ncl.edu.tw/handle/01260555691292068543 |
work_keys_str_mv |
AT yuhsiucheng implementationsoftsqrforcloudplatformsanditsapplicationsofssvdandcollaborativefiltering AT yúxiūchéng implementationsoftsqrforcloudplatformsanditsapplicationsofssvdandcollaborativefiltering AT yuhsiucheng yǐyúnduānpíngtáishízuòshòuzhǎngqrfēnjiějíqíyīngyòng AT yúxiūchéng yǐyúnduānpíngtáishízuòshòuzhǎngqrfēnjiějíqíyīngyòng |
_version_ |
1718401245738172416 |
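
For readers unfamiliar with the TSQR factorization named in the abstract, the following is a minimal single-machine sketch in NumPy. It is not the thesis's Spark or MapReduce implementation: the function name `tsqr`, the `num_blocks` parameter, and the use of a single reduction level (rather than a reduction tree) are assumptions made purely for illustration. Each row block stands in for the data held by one worker.

```python
import numpy as np


def tsqr(A, num_blocks=4):
    """One-level TSQR sketch: return Q (m x n) and R (n x n) with A = Q @ R.

    Hypothetical helper for illustration; a distributed version would run
    stage 1 on separate workers and may reduce the R factors in a tree.
    """
    m, n = A.shape
    if m // num_blocks < n:
        # Every row block needs at least n rows so each local R is n x n.
        raise ValueError("each row block must have at least n rows")

    # Stage 1 ("map"): independent reduced QR on each row block.
    blocks = np.array_split(A, num_blocks, axis=0)
    local = [np.linalg.qr(block) for block in blocks]

    # Stage 2 ("reduce"): stack the small n x n R factors and factor again.
    # Only these small factors would need to be communicated.
    r_stack = np.vstack([r for _, r in local])
    q2, R = np.linalg.qr(r_stack)

    # Stage 3: rebuild the global Q by combining each local Q with the
    # matching n x n slice of the second-stage Q.
    q2_blocks = np.split(q2, num_blocks, axis=0)
    Q = np.vstack([q1 @ q2_i for (q1, _), q2_i in zip(local, q2_blocks)])
    return Q, R
```

Calling `tsqr(A, num_blocks=5)` on a matrix with many more rows than columns should return a `Q` with orthonormal columns and an upper-triangular `R`. The point of the design is that only the small `n x n` factors cross block boundaries, which is why the method suits tall-and-skinny inputs.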