Performance Evaluation of Analytical Queries on a Stand-alone and Sharded Document Store
Main Author: | Raghavendra, Aarthi |
---|---|
Language: | English |
Published: | University of Cincinnati / OhioLINK, 2015 |
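Subjects: | Computer Science; MongoDB; Sharded system; Stand-alone system; Denormalized data model; Normalized data model; Analytical queries |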
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447688210 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1447688210 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1447688210 2021-08-03T06:33:43Z
Performance Evaluation of Analytical Queries on a Stand-alone and Sharded Document Store
Raghavendra, Aarthi
Computer Science; MongoDB; Sharded system; Stand-alone system; Denormalized data model; Normalized data model; Analytical queries
Numerous organizations perform data analytics using relational databases by executing data mining queries. These queries include complex joins and aggregate functions. However, due to an explosion of data in terms of volume, variety, veracity, velocity, and value, known as Big Data [1], many organizations such as Foursquare, Adobe, and Bosch have migrated to NoSQL databases [2] such as MongoDB [3] and Cassandra [4]. We intend to demonstrate the performance impact an organization can expect for analytical queries on a NoSQL document store. In this thesis, we benchmark the performance of MongoDB [3], a cross-platform document-oriented database, for datasets of sizes 1 GB and 5 GB in a stand-alone environment and a sharded environment. The stand-alone MongoDB environment is the same for all datasets, whereas the configuration of the MongoDB cluster varies with the dataset size. The TPC-DS benchmark [5] is used to generate data of different scales, and selected data mining queries are executed in both environments. Our experimental results show that, along with the choice of environment, data modeling in MongoDB also has a significant impact on query execution times. MongoDB is an appropriate choice when the data has a flexible structure, and analytical query performance is best when data is stored in a denormalized fashion. When the data is sharded, an analytical query with multiple predicates must aggregate data from a few or all nodes, which proves to be an expensive process; such queries therefore perform poorly compared with executing the same queries in a stand-alone environment.
2015; English; text; University of Cincinnati / OhioLINK; http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447688210; unrestricted
This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Computer Science; MongoDB; Sharded system; Stand-alone system; Denormalized data model; Normalized data model; Analytical queries |
author |
Raghavendra, Aarthi |
title |
Performance Evaluation of Analytical Queries on a Stand-alone and Sharded Document Store |
publisher |
University of Cincinnati / OhioLINK |
publishDate |
2015 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447688210 |