Benchmarking Performance for Migrating a Relational Application to a Parallel Implementation

Bibliographic Details
Main Author: Gadiraju, Krishna Karthik
Language: English
Published: University of Cincinnati / OhioLINK 2014
Subjects: SQL
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1409065914
id ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1409065914
record_format oai_dc
datestamp 2021-08-03T06:27:12Z
subject Computer Science; Hive; Hadoop; benchmarking; big data; SQL; queries
description Many organizations rely on relational database platforms for OLAP-style querying (aggregation and filtering) in small to medium-sized applications. We investigate the impact of scaling up the data sizes for such queries, and aim to illustrate the performance an organization could expect if it migrated current applications to a big data environment. This thesis benchmarks the performance of Hive, a parallel data warehouse platform that is part of the Hadoop software stack. We set up a 4-node Hadoop cluster using Hortonworks HDP 1.3.2 and use the data generator provided by the TPC-DS benchmark to generate data at different scales. We take a representative query from the TPC-DS query set and run its SQL and Hive Query Language (HiveQL) versions on a relational database installation (MySQL) and on the Hive cluster, respectively. An analysis of the results shows that, for all the dataset sizes used, Hive executes the query faster than MySQL. Hive also loads the large datasets faster than MySQL, but is marginally slower than MySQL when loading the smaller datasets.
date 2014-10-13
type text
rights unrestricted. This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection NDLTD
language English
sources NDLTD
topic Computer Science
Hive
Hadoop
benchmarking
big data
SQL
queries