Identifying the Data Discrepancy Existing in Hadoop Clusters

Master's === Fu Jen Catholic University === Master's Program, Department of Computer Science and Information Engineering === 104 === In recent years, cloud computing has developed rapidly in the realm of the Internet. Among the many cloud computing platforms, Hadoop is widely used because of its stability and performance. It can easily handle a large number of files efficiently. Had...

Bibliographic Details
Main Authors:	YU TZU-TING, 游資婷
Other Authors:	葉佐任
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/21290145999940618636

id	ndltd-TW-104FJU00396006
record_format	oai_dc
spelling	ndltd-TW-104FJU00396006 2017-04-29T04:31:40Z http://ndltd.ncl.edu.tw/handle/21290145999940618636 Identifying the Data Discrepancy Existing in Hadoop Clusters 實現雲端運算Hadoop叢集儲存資料之差異分析 YU TZU-TING 游資婷 Master's, Fu Jen Catholic University, Master's Program, Department of Computer Science and Information Engineering 104 In recent years, cloud computing has developed rapidly in the realm of the Internet. Among the many cloud computing platforms, Hadoop is widely used because of its stability and performance. It can easily handle a large number of files efficiently. Hadoop is a distributed system, and the Hadoop Distributed File System (HDFS) is its default file system. HDFS consists of a NameNode and multiple DataNodes. The NameNode records file metadata, including file location, file owner, and other related information. The DataNodes are where the file contents are actually stored. Each file is generally replicated on several DataNodes. However, file contents can no longer be retrieved if the NameNode is lost, or if all DataNodes storing those files are destroyed at the same time. To address this problem, important files can be backed up on multiple Hadoop clusters. Nevertheless, errors can occur during file duplication. We design and implement a scheme to identify discrepancies between Hadoop clusters so that users can fix mismatches between files duplicated on different Hadoop clusters. 葉佐任 2016 degree thesis ; thesis 46 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Fu Jen Catholic University === Master's Program, Department of Computer Science and Information Engineering === 104 === In recent years, cloud computing has developed rapidly in the realm of the Internet. Among the many cloud computing platforms, Hadoop is widely used because of its stability and performance. It can easily handle a large number of files efficiently. Hadoop is a distributed system, and the Hadoop Distributed File System (HDFS) is its default file system. HDFS consists of a NameNode and multiple DataNodes. The NameNode records file metadata, including file location, file owner, and other related information. The DataNodes are where the file contents are actually stored. Each file is generally replicated on several DataNodes. However, file contents can no longer be retrieved if the NameNode is lost, or if all DataNodes storing those files are destroyed at the same time. To address this problem, important files can be backed up on multiple Hadoop clusters. Nevertheless, errors can occur during file duplication. We design and implement a scheme to identify discrepancies between Hadoop clusters so that users can fix mismatches between files duplicated on different Hadoop clusters.
author2	葉佐任
author_facet	葉佐任 YU TZU-TING 游資婷
author	YU TZU-TING 游資婷
author_sort	YU TZU-TING
title	Identifying the Data Discrepancy Existing in Hadoop Clusters
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/21290145999940618636

Similar Items