Summary: | HBase is a distributed database management system that is increasingly popular for applications needing fast random access to large amounts of data. However, it exposes many performance-critical configuration parameters that can interact in complex ways, which makes manually tuning them for optimal performance extremely difficult. In this paper, we propose Auto-Tuning HBase (ATH), a novel approach that automatically tunes the configuration parameters for a given HBase application. Its key component is an accurate, low-cost performance model that takes configuration parameters as inputs. To build this model, we systematically explore different modeling techniques and choose an ensemble learning algorithm. We then use a genetic algorithm that queries the performance model to search for the optimal configuration parameters for the application. As a result, ATH can quickly and automatically identify a set of configuration parameter values that optimizes the application's performance. We validate ATH on a ten-node cluster using five typical applications from the Yahoo! Cloud Serving Benchmark (YCSB). The experimental results show that, compared with the default configuration, ATH improves throughput by 41% on average and by up to 97%, while reducing the latency of HBase operations by 11.3% on average and by up to 57%.
|
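The model-guided search step described in the summary can be illustrated with a minimal sketch. It does not reproduce ATH's ensemble model. Instead, `predict_throughput` is a hypothetical toy surrogate standing in for a trained model. The three HBase property names are real, but choosing them, along with their search ranges and the surrogate's optimum, is an illustrative assumption and may not match the parameters ATH actually tunes. The genetic-algorithm operators (tournament selection, uniform crossover, Gaussian mutation, elitism) are generic choices, not necessarily those used in the paper.

```python
import random
from typing import Dict, List, Tuple

Config = Dict[str, float]

# Illustrative search space: name -> (low, high, is_integer). Ranges are assumptions.
SPACE: Dict[str, Tuple[float, float, bool]] = {
    "hbase.regionserver.handler.count": (10, 100, True),
    "hfile.block.cache.size": (0.1, 0.5, False),  # fraction of heap
    "hbase.hregion.memstore.flush.size": (64 * 2**20, 512 * 2**20, True),  # bytes
}


def predict_throughput(cfg: Config) -> float:
    """Hypothetical stand-in for a trained performance model (toy quadratic surrogate)."""
    h = cfg["hbase.regionserver.handler.count"]
    c = cfg["hfile.block.cache.size"]
    m = cfg["hbase.hregion.memstore.flush.size"] / 2**20  # bytes -> MB
    return 1000.0 - ((h - 60) / 10) ** 2 - ((c - 0.35) / 0.05) ** 2 - ((m - 256) / 64) ** 2


def clip(name: str, value: float) -> float:
    """Clamp a value into its parameter's range, rounding integer parameters."""
    lo, hi, is_int = SPACE[name]
    value = min(max(value, lo), hi)
    return float(round(value)) if is_int else value


def random_config(rng: random.Random) -> Config:
    return {k: clip(k, rng.uniform(lo, hi)) for k, (lo, hi, _) in SPACE.items()}


def crossover(a: Config, b: Config, rng: random.Random) -> Config:
    # Uniform crossover: each parameter is inherited from either parent with equal odds.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(cfg: Config, rng: random.Random, rate: float = 0.2) -> Config:
    # Gaussian mutation with a step size of 10% of each parameter's range.
    out = dict(cfg)
    for k, (lo, hi, _) in SPACE.items():
        if rng.random() < rate:
            out[k] = clip(k, out[k] + rng.gauss(0, 0.1 * (hi - lo)))
    return out


def tournament(pop: List[Config], scores: List[float], rng: random.Random, k: int = 3) -> Config:
    # Pick k random individuals and return the one with the highest predicted score.
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: scores[i])]


def genetic_search(model, pop_size: int = 40, generations: int = 60,
                   elite: int = 2, seed: int = 0) -> Tuple[Config, float]:
    """Search for the configuration that maximizes the model's predicted throughput."""
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [model(c) for c in pop]  # cheap model queries, no cluster runs
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        next_pop = [pop[i] for i in ranked[:elite]]  # elitism keeps the best configs
        while len(next_pop) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            next_pop.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = next_pop
    best = max(pop, key=model)
    return best, model(best)


if __name__ == "__main__":
    best_cfg, best_score = genetic_search(predict_throughput)
    print(best_cfg, best_score)
```

Because each fitness evaluation queries the model instead of running a benchmark on the cluster, the search can evaluate thousands of candidate configurations cheaply. This is the motivation for pairing a low-cost performance model with a population-based search. In practice, you would compare the returned configuration against a baseline such as the values shipped in your HBase version's `hbase-default.xml`.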