Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering

Master's === National Tsing Hua University === Department of Computer Science === 103 === Scalability of algorithms and implementations, which ensures that computational efficiency is sustained as machines are added, is one of the most crucial performance factors in big data processing. Nowadays, the scale of machines and storage can be extended to match...


Bibliographic Details
Main Authors:	Yu, Hsiu-Cheng, 余修丞
Other Authors:	Lee, Che-Rung
Format:	Others
Language:	en_US
Published:	2014
Online Access:	http://ndltd.ncl.edu.tw/handle/01260555691292068543

id	ndltd-TW-103NTHU5392002
record_format	oai_dc
spelling	ndltd-TW-103NTHU5392002 2016-12-19T04:14:35Z http://ndltd.ncl.edu.tw/handle/01260555691292068543 Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering 以雲端平台實作瘦長QR分解及其應用 Yu, Hsiu-Cheng 余修丞 Master's, National Tsing Hua University, Department of Computer Science, 103 Lee, Che-Rung 李哲榮 2014 學位論文 ; thesis 70 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Tsing Hua University === Department of Computer Science === 103 === Scalability of algorithms and implementations, which ensures that computational efficiency is sustained as machines are added, is one of the most crucial performance factors in big data processing. Nowadays, the scale of machines and storage can easily be extended to match the growth of data size. However, without scalable algorithms, adding machines can even slow down data processing. In this thesis, we investigated and improved the scalability of the algorithms and implementations of the QR decomposition for tall-and-skinny matrices on cloud platforms. Our algorithm is based on the TSQR (Tall-and-Skinny QR) algorithm, proposed by Demmel et al., which has been shown to be optimal in communication cost for the QR decomposition of tall-and-skinny matrices. However, our analysis shows that disk I/O dominates the overall performance of the MapReduce implementation. Therefore, we implemented it using Apache Spark, an in-memory processing programming model for distributed computing environments. We applied our TSQR implementation to SSVD-based Collaborative Filtering (CF). CF is a computational kernel commonly used in e-commerce, such as Amazon recommendations, Google Ads, and Facebook friend suggestions. SSVD-based CF has superior performance and accuracy compared to existing methods. However, its performance bottleneck is the QR decomposition step in the SSVD (Stochastic SVD) step. Experiments show that our implementation of TSQR in Spark is more efficient than that in Hadoop MapReduce, and the overall performance of TSQR can be improved by up to 400% for several benchmarks.
author2	Lee, Che-Rung
author_facet	Lee, Che-Rung Yu, Hsiu-Cheng 余修丞
author	Yu, Hsiu-Cheng 余修丞
spellingShingle	Yu, Hsiu-Cheng 余修丞 Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering
author_sort	Yu, Hsiu-Cheng
title	Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering
title_short	Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering
title_full	Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering
title_fullStr	Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering
title_full_unstemmed	Implementations of TSQR for Cloud Platforms and Its Applications of SSVD and Collaborative Filtering
title_sort	implementations of tsqr for cloud platforms and its applications of ssvd and collaborative filtering
publishDate	2014
url	http://ndltd.ncl.edu.tw/handle/01260555691292068543
work_keys_str_mv	AT yuhsiucheng implementationsoftsqrforcloudplatformsanditsapplicationsofssvdandcollaborativefiltering AT yúxiūchéng implementationsoftsqrforcloudplatformsanditsapplicationsofssvdandcollaborativefiltering AT yuhsiucheng yǐyúnduānpíngtáishízuòshòuzhǎngqrfēnjiějíqíyīngyòng AT yúxiūchéng yǐyúnduānpíngtáishízuòshòuzhǎngqrfēnjiějíqíyīngyòng
_version_	1718401245738172416

Similar Items