Performance Evaluation of Analytical Queries on a Stand-alone and Sharded Document Store

Bibliographic Details
Main Author: Raghavendra, Aarthi
Language: English
Published: University of Cincinnati / OhioLINK 2015
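Subjects: Computer Science; MongoDB; Sharded system; Stand-alone system; Denormalized data model; Normalized data model; Analytical queries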
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447688210
id ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1447688210
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1447688210 2021-08-03T06:33:43Z
Performance Evaluation of Analytical Queries on a Stand-alone and Sharded Document Store
Raghavendra, Aarthi
Computer Science; MongoDB; Sharded system; Stand-alone system; Denormalized data model; Normalized data model; Analytical queries
Numerous organizations perform data analytics on relational databases by executing data mining queries that include complex joins and aggregate functions. However, due to an explosion of data in terms of volume, variety, veracity, velocity, and value, known as Big Data [1], many organizations such as Foursquare, Adobe, and Bosch have migrated to NoSQL databases [2] such as MongoDB [3] and Cassandra [4]. We intend to demonstrate the performance impact an organization can expect for analytical queries on a NoSQL document store.
In this thesis, we benchmark the performance of MongoDB [3], a cross-platform document-oriented database, on datasets of 1 GB and 5 GB in a stand-alone environment and a sharded environment. The stand-alone MongoDB environment is the same for all datasets, whereas the configuration of the MongoDB cluster varies with the dataset size. The TPC-DS benchmark [5] is used to generate data at different scales, and selected data mining queries are executed in both environments.
Our experimental results show that, along with the choice of environment, data modeling in MongoDB has a significant impact on query execution times. MongoDB is an appropriate choice when the data has a flexible structure, and analytical query performance is best when data is stored in a denormalized fashion. When the data is sharded, the multiple query predicates in an analytical query make aggregating data from a few or all nodes an expensive process, so the query performs poorly compared with executing the same query in a stand-alone environment.
2015 English text University of Cincinnati / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447688210 unrestricted
This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection NDLTD
language English
sources NDLTD
topic Computer Science
MongoDB
Sharded system
Stand-alone system
Denormalized data model
Normalized data model
Analytical queries
author Raghavendra, Aarthi
title Performance Evaluation of Analytical Queries on a Stand-alone and Sharded Document Store
publisher University of Cincinnati / OhioLINK
publishDate 2015
url http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447688210