Correlation Aware Technique for SQL to NoSQL Transformation

Master's === Chung Hua University === Master's Program, Department of Computer Science and Information Engineering === 102 === To make parallel and distributed computing more efficient, Apache Hadoop distributes imported data randomly across data nodes. This mechanism works well for general data analysis, but when the data sets are related to one another (example...


Bibliographic Details
Main Authors: Hsu, Jen-Chun, 徐仁淳
Other Authors: Hsu, Ching-Hsien
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/77732462514698260699
id ndltd-TW-102CHPI5392021
record_format oai_dc
spelling ndltd-TW-102CHPI5392021 2017-02-17T16:16:41Z http://ndltd.ncl.edu.tw/handle/77732462514698260699 Correlation Aware Technique for SQL to NoSQL Transformation 關聯感知技術應用於SQL 與 NoSQL 資料轉換 Hsu, Jen-Chun 徐仁淳 Master's, Chung Hua University, Master's Program, Department of Computer Science and Information Engineering, 102 Hsu, Ching-Hsien 許慶賢 2014 Degree thesis ; thesis 28 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Chung Hua University === Master's Program, Department of Computer Science and Information Engineering === 102 === To make parallel and distributed computing more efficient, Apache Hadoop distributes imported data randomly across data nodes. This mechanism works well for general data analysis, but when the data sets are related to one another (for example, the tables of a database), how to store related data together becomes a widely discussed issue. As data volumes grow and more data must be stored in databases, a traditional database can no longer provide efficient service to real-time systems, so many users turn to Hadoop to improve database performance. For this purpose, Apache provides a tool named Sqoop, which imports databases into a Hadoop environment through a command-line interface. Following the same concept as Hadoop, Apache Sqoop splits each table into four parts and distributes them randomly across data nodes. This data placement mechanism, however, still raises a database performance concern. This thesis proposes a correlation-aware method for Sqoop (CA_Sqoop) to improve data placement. It gathers related data as close together as possible to reduce the network cost of data transformation and to improve database performance. CA_Sqoop also considers table correlation and table size to achieve better data locality and query efficiency. Simulation results show that the data locality of CA_Sqoop is two times better than that of the original Apache Sqoop.
author2 Hsu, Ching-Hsien
author_facet Hsu, Ching-Hsien
Hsu, Jen-Chun
徐仁淳
author Hsu, Jen-Chun
徐仁淳
spellingShingle Hsu, Jen-Chun
徐仁淳
Correlation Aware Technique for SQL to NoSQL Transformation
author_sort Hsu, Jen-Chun
title Correlation Aware Technique for SQL to NoSQL Transformation
title_short Correlation Aware Technique for SQL to NoSQL Transformation
title_full Correlation Aware Technique for SQL to NoSQL Transformation
title_fullStr Correlation Aware Technique for SQL to NoSQL Transformation
title_full_unstemmed Correlation Aware Technique for SQL to NoSQL Transformation
title_sort correlation aware technique for sql to nosql transformation
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/77732462514698260699
work_keys_str_mv AT hsujenchun correlationawaretechniqueforsqltonosqltransformation
AT xúrénchún correlationawaretechniqueforsqltonosqltransformation
AT hsujenchun guānliángǎnzhījìshùyīngyòngyúsqlyǔnosqlzīliàozhuǎnhuàn
AT xúrénchún guānliángǎnzhījìshùyīngyòngyúsqlyǔnosqlzīliàozhuǎnhuàn
_version_ 1718414970369081344
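
Illustrative note: the placement idea summarized in the description field can be pictured with a small toy model. The sketch below is not the thesis's CA_Sqoop algorithm, whose details this record does not give. It assumes that each table is cut into four splits (as the description states for Sqoop), that split i of two correlated tables covers the same key range, and that "data locality" means the correlation-weighted share of split pairs stored on the same node. The table names, sizes, correlation weights, node names, and all function names are hypothetical.

```python
import random

NUM_SPLITS = 4  # the description states that Sqoop separates each table into four parts


def random_placement(sizes, nodes, seed=0):
    """Baseline: every split of every table goes to a randomly chosen node."""
    rng = random.Random(seed)
    return {(t, i): rng.choice(nodes) for t in sizes for i in range(NUM_SPLITS)}


def correlation_aware_placement(sizes, correlation, nodes):
    """Greedy sketch (an assumption, not the thesis's method):
    group tables linked by a positive correlation, then store split i
    of every table in a group on the same, currently least-loaded node."""
    group = {t: t for t in sizes}

    def find(t):
        # Follow parent links up to the representative of t's group.
        while group[t] != t:
            t = group[t]
        return t

    for (a, b), weight in correlation.items():
        if weight > 0:
            group[find(a)] = find(b)

    groups = {}
    for t in sizes:
        groups.setdefault(find(t), []).append(t)

    load = {n: 0.0 for n in nodes}
    placement = {}
    # Table size matters here: the largest groups are placed first,
    # and each split goes to the node holding the least data so far.
    for members in sorted(groups.values(), key=lambda g: -sum(sizes[t] for t in g)):
        for i in range(NUM_SPLITS):
            node = min(nodes, key=lambda n: load[n])
            for t in members:
                placement[(t, i)] = node
                load[node] += sizes[t] / NUM_SPLITS
    return placement


def locality(placement, correlation):
    """Correlation-weighted fraction of aligned split pairs that share a node."""
    total = sum(correlation.values())
    hit = 0.0
    for (a, b), weight in correlation.items():
        same = sum(placement[(a, i)] == placement[(b, i)] for i in range(NUM_SPLITS))
        hit += weight * same / NUM_SPLITS
    return hit / total


# Hypothetical example data.
sizes = {"orders": 400, "customers": 100, "items": 800, "logs": 200}
correlation = {("orders", "customers"): 5, ("orders", "items"): 3}
nodes = ["n1", "n2", "n3", "n4"]

baseline = random_placement(sizes, nodes, seed=1)
aware = correlation_aware_placement(sizes, correlation, nodes)
print("random placement locality:", locality(baseline, correlation))
print("correlation-aware locality:", locality(aware, correlation))
```

In this toy model every correlated pair ends up co-located, so the numbers it prints say nothing about the simulation results reported in the thesis. A real placement policy would also have to bound how much a chain of correlated tables may concentrate on a few nodes.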