Dynamic De-duplication Decision in a Hadoop Distributed File System

Bibliographic Details
Main Author: Kuo-Zheng Fan (范國拯)
Other Authors: Ruay-Shiung Chang
Format: Others
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/12180320597103126420
Record ID: ndltd-TW-101NDHU5392060
Chinese Title: 動態重複資料刪除在Hadoop分散式檔案系統上 (Dynamic Data De-duplication on the Hadoop Distributed File System)
Degree: Master's thesis (碩士), Department of Computer Science and Information Engineering (資訊工程學系), National Dong Hwa University (國立東華大學), academic year 101 (2012–2013)
Advisor: Ruay-Shiung Chang (張瑞雄)
Extent: 58 pages
Collection: NDLTD

Abstract: Data is now generated and updated every second, which makes handling such fast-growing and heterogeneous volumes of data a major challenge. The Hadoop Distributed File System (HDFS) is the first-choice solution for most users. To prevent data loss, storage systems usually keep multiple backup copies of data, and HDFS does the same by replicating each block. These copies occupy a large amount of storage space, so accommodating them requires substantial investment in infrastructure, which not every organization can afford. De-duplication technology, which has attracted growing attention in research and in commercial products, can use storage space much more efficiently, and we apply it in our implementation. In this thesis, we propose a dynamic de-duplication decision mechanism that runs on HDFS to improve storage space utilization. Under a storage space limit, the system formulates an appropriate de-duplication strategy according to the capability of the cluster and the current utilization of storage space. In this way, the usage of the storage system is improved.
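The abstract describes the de-duplication decision only at a high level, so the sketch below is a minimal illustration under stated assumptions and not the thesis's actual method. It assumes fixed-size chunking with SHA-256 fingerprints to estimate how much stored content is duplicated, and a hypothetical policy that de-duplicates only when storage utilization crosses a threshold and average CPU load leaves room for fingerprinting. The names `ClusterState`, `chunk_fingerprints`, `dedup_ratio`, `should_deduplicate` and the threshold values 0.8 and 0.5 are all hypothetical. For context, HDFS's default replication factor (`dfs.replication`) is 3, so 1 TB of logical data occupies roughly 3 TB of raw capacity.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ClusterState:
    # Hypothetical snapshot of cluster status; field names are illustrative only.
    used_bytes: int       # raw bytes currently stored (including replicas)
    capacity_bytes: int   # total raw capacity of the cluster
    cpu_load: float       # average CPU utilization across nodes, 0.0 to 1.0


def chunk_fingerprints(data: bytes, chunk_size: int) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 fingerprint per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Fraction of chunks that repeat an earlier chunk (0.0 means no savings)."""
    prints = chunk_fingerprints(data, chunk_size)
    if not prints:
        return 0.0
    return 1.0 - len(set(prints)) / len(prints)


def should_deduplicate(state: ClusterState,
                       space_threshold: float = 0.8,
                       load_threshold: float = 0.5) -> bool:
    """Hypothetical policy: de-duplicate only when storage is nearly full
    and the cluster has spare CPU capacity for fingerprinting."""
    utilization = state.used_bytes / state.capacity_bytes
    return utilization >= space_threshold and state.cpu_load <= load_threshold


if __name__ == "__main__":
    sample = b"ABCD" * 3 + b"WXYZ"          # four 4-byte chunks, two unique
    print(dedup_ratio(sample, chunk_size=4))
    print(should_deduplicate(ClusterState(900, 1000, 0.3)))
```

For example, `dedup_ratio(b"ABCD" * 3 + b"WXYZ", chunk_size=4)` splits the input into four chunks, two of them unique, and returns 0.5. Gating on CPU load is one possible reading of the abstract's point that the strategy depends on cluster capability: fingerprinting costs compute, so a busy cluster might defer de-duplication until load drops.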