Design and Implementation of a Big Data Processing and Analysis Framework on the Hadoop Ecosystem

Bibliographic Details
Main Authors: Chien-Chung Chien, 簡玠忠
Other Authors: Hsung-Pin Chang
Format: Others
Language: zh-TW
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/44145308637212771842
Description
Summary: Master's thesis === National Chung Hsing University === Department of Computer Science and Engineering === 101 === Research by IDC indicates that the volume of information worldwide doubles every two years, outpacing Moore’s Law. Moreover, with the growth of digital information and the widespread adoption of cloud computing, the amount of digital data is predicted to reach 35 ZB by 2020, and one third of it will be stored and processed through cloud computing. Such large volumes of digital data therefore represent a business opportunity for corporations and individuals. However, analyzing big data is still limited by current technology, because most of the data is unstructured and stored across different systems, which makes it hard to analyze with databases and other conventional methods. Finding new ways to retrieve, search, discover, and analyze big data is thus a key challenge in data processing. The main purpose of this research is to build a big data processing platform in a private cloud environment so that the potential profit in big data can be analyzed efficiently and promptly. Based on the Hadoop ecosystem, we integrate HBase, Pig, and other similar tools, examine the purpose and usage of each tool, and construct a log data analysis framework, providing enterprises and organizations with a platform for high-speed processing and analysis of big data.