SAHA: A String Adaptive Hash Table for Analytical Databases
Hash tables are the fundamental data structure for analytical database workloads such as aggregation, joining, set filtering and record deduplication. Hash table performance varies drastically with the kind of data being processed and with how many inserts, lookups and de...
Main Authors: | Tianqi Zheng, Zhibin Zhang, Xueqi Cheng |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-03-01 |
Series: | Applied Sciences |
Subjects: | hash table, analytical database, string data |
Online Access: | https://www.mdpi.com/2076-3417/10/6/1915 |
id |
doaj-70e3a74e41dc4244b9b86474889a06c7 |
---|---|
record_format |
Article |
spelling |
doaj-70e3a74e41dc4244b9b86474889a06c7; 2020-11-25T02:01:59Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2020-03-01; vol. 10, no. 6, art. 1915; DOI 10.3390/app10061915; SAHA: A String Adaptive Hash Table for Analytical Databases; Tianqi Zheng, Zhibin Zhang, Xueqi Cheng (all three: CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China); https://www.mdpi.com/2076-3417/10/6/1915; hash table; analytical database; string data |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Tianqi Zheng Zhibin Zhang Xueqi Cheng |
title |
SAHA: A String Adaptive Hash Table for Analytical Databases |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2020-03-01 |
description |
Hash tables are the fundamental data structure for analytical database workloads such as aggregation, joining, set filtering and record deduplication. Hash table performance varies drastically with the kind of data being processed and with how many inserts, lookups and deletes are performed. In this paper, we address some common use cases of hash tables: aggregating and joining over arbitrary string data. We designed a new hash table, SAHA, which is tightly integrated with modern analytical databases and optimized for string data, with the following advantages: (1) it inlines short strings and saves hash values for long strings only; (2) it uses special memory loading techniques for quick dispatching and hash computation; and (3) it uses vectorized processing to batch hashing operations. Our evaluation results show that SAHA outperforms state-of-the-art hash tables, including Google’s SwissTable and Facebook’s F14Table, by one to five times in analytical workloads. It has been merged into the ClickHouse database and shows promising results in production. |
topic |
hash table analytical database string data |
url |
https://www.mdpi.com/2076-3417/10/6/1915 |
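
The description above names length-based handling of string keys as SAHA's first design point: short strings are inlined, and hash values are saved for long strings only. The following is a minimal Python sketch of that idea, not the authors' implementation. The 8-byte threshold `INLINE_MAX`, the class name `StringAdaptiveCounter` and the helper `pack_short` are hypothetical names chosen for illustration, and Python's built-in `hash` stands in for whatever hash function the real table uses.

```python
from typing import Dict, List

INLINE_MAX = 8  # hypothetical inline threshold in bytes (an assumption)


def pack_short(key: bytes) -> int:
    # Pack a key of at most INLINE_MAX bytes into a single integer.
    # The length is stored in the bits above the payload so that keys
    # differing only by trailing zero bytes remain distinct.
    return int.from_bytes(key, "little") | (len(key) << (8 * INLINE_MAX))


class StringAdaptiveCounter:
    """Toy aggregation table that counts occurrences of byte-string keys."""

    def __init__(self) -> None:
        # Short keys: the packed integer is the stored key (inlined),
        # so no separate string storage or string hash is needed.
        self.short: Dict[int, int] = {}
        # Long keys: buckets indexed by the precomputed hash value;
        # each bucket holds [key, count] pairs that share that hash.
        self.long: Dict[int, List[list]] = {}

    def add(self, key: bytes) -> None:
        # Dispatch on key length first.
        if len(key) <= INLINE_MAX:
            packed = pack_short(key)
            self.short[packed] = self.short.get(packed, 0) + 1
            return
        h = hash(key)  # computed once, then kept as the bucket index
        bucket = self.long.setdefault(h, [])
        for entry in bucket:
            if entry[0] == key:  # full comparison only within one hash
                entry[1] += 1
                return
        bucket.append([key, 1])

    def count(self, key: bytes) -> int:
        if len(key) <= INLINE_MAX:
            return self.short.get(pack_short(key), 0)
        for stored, n in self.long.get(hash(key), []):
            if stored == key:
                return n
        return 0


table = StringAdaptiveCounter()
for k in [b"ab", b"ab", b"a", b"a\x00",
          b"a-much-longer-key", b"a-much-longer-key"]:
    table.add(k)
```

In this sketch, `table.count(b"ab")` goes through the inlined path and `table.count(b"a-much-longer-key")` goes through the saved-hash path. The memory loading techniques and vectorized batching mentioned in the description are not modeled here.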