Monitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental Investigation

Context Light weight process virtualization has been used in the past e.g., Solaris zones, jails in Free BSD and Linux’s containers (LXC). But only since 2013 is there a kernel support for user namespace and process grouping control that make the use of lightweight virtualization interesting to crea...

Full description

Bibliographic Details
Main Author: Chekkilla, Avinash Goud
Format: Others
Language:English
Published: Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling 2017
Subjects:
VM
CQL
Online Access:http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13706
id ndltd-UPSALLA1-oai-DiVA.org-bth-13706
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-bth-137062017-01-04T05:23:03ZMonitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental InvestigationengChekkilla, Avinash GoudBlekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling2017Cassandra-stressNoSQLDockerVMVirtualizationCQLBare-MetalLinuxContext Light weight process virtualization has been used in the past e.g., Solaris zones, jails in Free BSD and Linux’s containers (LXC). But only since 2013 is there a kernel support for user namespace and process grouping control that make the use of lightweight virtualization interesting to create virtual environments comparable to virtual machines. Telecom providers have to handle the massive growth of information due to the growing number of customers and devices. Traditional databases are not designed to handle such massive data ballooning. NoSQL databases were developed for this purpose. Cassandra, with its high read and write throughputs, is a popular NoSQL database to handle this kind of data. Running the database using operating system virtualization or containerization would offer a significant performance gain when compared to that of virtual machines and also gives the benefits of migration, fast boot up and shut down times, lower latency and less use of physical resources of the servers. Objectives This thesis aims to investigate the trade-off in performance while loading a Cassandra cluster in bare-metal and containerized environments. A detailed study of the effect of loading the cluster in each individual node in terms of Latency, CPU and Disk throughput will be analyzed. Method We implement the physical model of the Cassandra cluster based on realistic and commonly used scenarios or database analysis for our experiment. We generate different load cases on the cluster for Bare-Metal and Docker and see the values of CPU utilization, Disk throughput and latency using standard tools like sar and iostat. Statistical analysis (Mean value analysis, higher moment analysis and confidence intervals) are done on measurements on specific interfaces in order to show the reliability of the results. Results Experimental results show a quantitative analysis of measurements consisting Latency, CPU and Disk throughput while running a Cassandra cluster in Bare Metal and Container Environments. A statistical analysis summarizing the performance of Cassandra cluster while running single Cassandra is surveyed. Conclusions With the detailed analysis, the resource utilization of the database was similar in both the bare-metal and container scenarios. From the results the CPU utilization for the bare-metal servers is equivalent in the case of mixed, read and write loads. The latency values inside the container are slightly higher for all the cases. The mean value analysis and higher moment analysis helps us in doing a finer analysis of the results. The confidence intervals calculated show that there is a lot of variation in the disk performance which might be due to compactions happening randomly. Further work can be done by configuring the compaction strategies, memory, read and write rates. Student thesisinfo:eu-repo/semantics/bachelorThesistexthttp://urn.kb.se/resolve?urn=urn:nbn:se:bth-13706application/pdfinfo:eu-repo/semantics/openAccessapplication/pdfinfo:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Cassandra-stress
NoSQL
Docker
VM
Virtualization
CQL
Bare-Metal
Linux
spellingShingle Cassandra-stress
NoSQL
Docker
VM
Virtualization
CQL
Bare-Metal
Linux
Chekkilla, Avinash Goud
Monitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental Investigation
description Context Light weight process virtualization has been used in the past e.g., Solaris zones, jails in Free BSD and Linux’s containers (LXC). But only since 2013 is there a kernel support for user namespace and process grouping control that make the use of lightweight virtualization interesting to create virtual environments comparable to virtual machines. Telecom providers have to handle the massive growth of information due to the growing number of customers and devices. Traditional databases are not designed to handle such massive data ballooning. NoSQL databases were developed for this purpose. Cassandra, with its high read and write throughputs, is a popular NoSQL database to handle this kind of data. Running the database using operating system virtualization or containerization would offer a significant performance gain when compared to that of virtual machines and also gives the benefits of migration, fast boot up and shut down times, lower latency and less use of physical resources of the servers. Objectives This thesis aims to investigate the trade-off in performance while loading a Cassandra cluster in bare-metal and containerized environments. A detailed study of the effect of loading the cluster in each individual node in terms of Latency, CPU and Disk throughput will be analyzed. Method We implement the physical model of the Cassandra cluster based on realistic and commonly used scenarios or database analysis for our experiment. We generate different load cases on the cluster for Bare-Metal and Docker and see the values of CPU utilization, Disk throughput and latency using standard tools like sar and iostat. Statistical analysis (Mean value analysis, higher moment analysis and confidence intervals) are done on measurements on specific interfaces in order to show the reliability of the results. Results Experimental results show a quantitative analysis of measurements consisting Latency, CPU and Disk throughput while running a Cassandra cluster in Bare Metal and Container Environments. A statistical analysis summarizing the performance of Cassandra cluster while running single Cassandra is surveyed. Conclusions With the detailed analysis, the resource utilization of the database was similar in both the bare-metal and container scenarios. From the results the CPU utilization for the bare-metal servers is equivalent in the case of mixed, read and write loads. The latency values inside the container are slightly higher for all the cases. The mean value analysis and higher moment analysis helps us in doing a finer analysis of the results. The confidence intervals calculated show that there is a lot of variation in the disk performance which might be due to compactions happening randomly. Further work can be done by configuring the compaction strategies, memory, read and write rates.
author Chekkilla, Avinash Goud
author_facet Chekkilla, Avinash Goud
author_sort Chekkilla, Avinash Goud
title Monitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental Investigation
title_short Monitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental Investigation
title_full Monitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental Investigation
title_fullStr Monitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental Investigation
title_full_unstemmed Monitoring and Analysis of CPU Utilization, Disk Throughput and Latency in servers running Cassandra database : An Experimental Investigation
title_sort monitoring and analysis of cpu utilization, disk throughput and latency in servers running cassandra database : an experimental investigation
publisher Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling
publishDate 2017
url http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13706
work_keys_str_mv AT chekkillaavinashgoud monitoringandanalysisofcpuutilizationdiskthroughputandlatencyinserversrunningcassandradatabaseanexperimentalinvestigation
_version_ 1718406675124191232