Replica selection in Apache Cassandra : Reducing the tail latency for reads using the C3 algorithm

Bibliographic Details
Main Author:	Thorsen, Sofie
Format:	Others
Type:	Student thesis (bachelor's thesis), PDF, open access
Language:	English
Published:	KTH, Skolan för datavetenskap och kommunikation (CSC; School of Computer Science and Communication), 2015
Subjects:	Cassandra; replica selection; distributed database; tail latency; Computer Sciences / Datavetenskap (datalogi)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170129

Abstract

Keeping response times low is crucial for providing a good user experience. Tail latency in particular is hard to keep low as services grow in size, complexity and overall use. In this thesis we look at reducing the tail latency for reads in the Apache Cassandra database system by implementing a new replica selection algorithm called C3, recently developed by Lalith Suresh, Marco Canini, Stefan Schmid and Anja Feldmann. Through extensive benchmarks with several stress tools, we find that C3 does decrease Cassandra's tail latencies on generated load. However, when C3 is evaluated on production load, the results do not show any particular improvement. We argue that this is mostly due to the variable-size records in the data set and the token awareness of the production client.

We also present a client-side implementation of C3 in the DataStax Java driver in an attempt to remove the caveat of token-aware clients. The client-side implementation did give positive results, but because the benchmark results showed a lot of variance, we deem them too inconclusive to confirm that the implementation works as intended. We conclude that the server-side C3 algorithm will work effectively for systems with homogeneous row sizes where the clients are not token aware.
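The abstract names C3 but does not say how it chooses a replica. As a rough illustration, here is a minimal Python sketch of the replica-ranking step only, assuming the scoring function published by the C3 authors: a client keeps exponentially weighted moving averages (EWMAs) of each replica's response time R, service time 1/mu and queue size q, estimates the replica's queue as q_hat = 1 + os * n + q (os: this client's outstanding requests to that replica, n: number of clients), and prefers the replica with the lowest score Psi = R - 1/mu + q_hat^3 / mu. The class and function names, the EWMA weight, the initial values and the choice to average service time directly (instead of the rate mu) are hypothetical simplifications. C3's rate control and backpressure are omitted, and this is not the thesis's Cassandra or DataStax driver code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReplicaStats:
    """Per-replica state a C3-style client might keep (hypothetical names)."""
    name: str
    response_time: float = 0.0  # EWMA of observed response time (ms), R_bar
    service_time: float = 1.0   # EWMA of server-reported service time (ms), 1/mu_bar; initial value assumed
    queue_size: float = 0.0     # EWMA of server-reported queue size, q_bar
    outstanding: int = 0        # requests this client currently has in flight to the replica, os

    def update(self, response_time: float, service_time: float,
               queue_size: float, alpha: float = 0.9) -> None:
        """Fold one response's feedback into the EWMAs; alpha is an assumed smoothing weight."""
        self.response_time = alpha * self.response_time + (1 - alpha) * response_time
        self.service_time = alpha * self.service_time + (1 - alpha) * service_time
        self.queue_size = alpha * self.queue_size + (1 - alpha) * queue_size


def c3_score(r: ReplicaStats, num_clients: int) -> float:
    """Psi = R_bar - 1/mu_bar + q_hat^3 / mu_bar, with 1/mu_bar taken as the mean service time."""
    # Concurrency compensation: assume every other client has as many requests
    # in flight to this replica as we do.
    q_hat = 1 + r.outstanding * num_clients + r.queue_size
    return r.response_time - r.service_time + (q_hat ** 3) * r.service_time


def rank_replicas(replicas: Iterable[ReplicaStats], num_clients: int) -> List[ReplicaStats]:
    """Return replicas ordered best-first, i.e. lowest score first."""
    return sorted(replicas, key=lambda r: c3_score(r, num_clients))
```

With equal service times of 1 ms and 3 clients, a replica with a 5 ms response time, queue size 2 and nothing outstanding scores 5 - 1 + 3^3 = 31, while one with a 4 ms response time, queue size 6 and one outstanding request scores 4 - 1 + 10^3 = 1003, so the first ranks ahead despite its slower recent responses. The cubic queue term is meant to steer clients away from replicas whose queues look long, rather than letting them all herd onto whichever replica answered fastest most recently.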

Similar Items