Dynamic De-duplication Decision in a Hadoop Distributed File System
Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 101 === Data is now generated and updated every second, which makes coping with such fast-growing and varied data a serious challenge. The Hadoop Distributed File System (HDFS) is the first-choice solution for most people. However, data is usually...
Main Authors: | Kuo-Zheng Fan, 范國拯 |
---|---|
Other Authors: | Ruay-Shiung Chang |
Format: | Others |
Published: | 2013 |
Online Access: | http://ndltd.ncl.edu.tw/handle/12180320597103126420 |
id |
ndltd-TW-101NDHU5392060 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-101NDHU5392060 2016-02-21T04:20:16Z http://ndltd.ncl.edu.tw/handle/12180320597103126420 Dynamic De-duplication Decision in a Hadoop Distributed File System 動態重複資料刪除在Hadoop分散式檔案系統上 Kuo-Zheng Fan 范國拯 Master's National Dong Hwa University Department of Computer Science and Information Engineering 101 Data is now generated and updated every second, which makes coping with such fast-growing and varied data a serious challenge. The Hadoop Distributed File System (HDFS) is the first-choice solution for most people. However, data is usually protected against loss by keeping multiple backup copies, and HDFS does the same. These duplicates occupy a great deal of storage space, which means that substantial investment in infrastructure is needed. This is not a workable approach for everyone, since it may be unaffordable. De-duplication technology can therefore use storage space more effectively; it has been gaining attention in research and in products, and it is also applied in our implementation. In this paper, we propose a dynamic de-duplication decision scheme that runs on HDFS to improve the use of storage space. Under a storage space limit, the system formulates a suitable de-duplication strategy according to the capability of the cluster and the utilization of storage space. In this way, the usage of the storage system can be improved. Ruay-Shiung Chang 張瑞雄 2013 Degree thesis ; thesis 58 |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
description |
Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 101 === Data is now generated and updated every second, which makes coping with such fast-growing and varied data a serious challenge. The Hadoop Distributed File System (HDFS) is the first-choice solution for most people. However, data is usually protected against loss by keeping multiple backup copies, and HDFS does the same. These duplicates occupy a great deal of storage space, which means that substantial investment in infrastructure is needed. This is not a workable approach for everyone, since it may be unaffordable.
De-duplication technology can therefore use storage space more effectively; it has been gaining attention in research and in products, and it is also applied in our implementation. In this paper, we propose a dynamic de-duplication decision scheme that runs on HDFS to improve the use of storage space. Under a storage space limit, the system formulates a suitable de-duplication strategy according to the capability of the cluster and the utilization of storage space. In this way, the usage of the storage system can be improved.
|
author2 |
Ruay-Shiung Chang |
author_facet |
Ruay-Shiung Chang Kuo-Zheng Fan 范國拯 |
author |
Kuo-Zheng Fan 范國拯 |
spellingShingle |
Kuo-Zheng Fan 范國拯 Dynamic De-duplication Decision in a Hadoop Distributed File System |
author_sort |
Kuo-Zheng Fan |
title |
Dynamic De-duplication Decision in a Hadoop Distributed File System |
title_short |
Dynamic De-duplication Decision in a Hadoop Distributed File System |
title_full |
Dynamic De-duplication Decision in a Hadoop Distributed File System |
title_fullStr |
Dynamic De-duplication Decision in a Hadoop Distributed File System |
title_full_unstemmed |
Dynamic De-duplication Decision in a Hadoop Distributed File System |
title_sort |
dynamic de-duplication decision in a hadoop distributed file system |
publishDate |
2013 |
url |
http://ndltd.ncl.edu.tw/handle/12180320597103126420 |
work_keys_str_mv |
AT kuozhengfan dynamicdeduplicationdecisioninahadoopdistributedfilesystem AT fànguózhěng dynamicdeduplicationdecisioninahadoopdistributedfilesystem AT kuozhengfan dòngtàizhòngfùzīliàoshānchúzàihadoopfēnsànshìdàngànxìtǒngshàng AT fànguózhěng dòngtàizhòngfùzīliàoshānchúzàihadoopfēnsànshìdàngànxìtǒngshàng |
_version_ |
1718192595891388416 |