A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result
Cardinality estimation is an important component of query optimization; its accuracy and efficiency directly determine how effective the optimization is. Traditional cardinality estimation strategies collect statistics from the original tables or from samples and then infer cardinality from those statistics...
Format: | Article |
---|---|
Language: | zho |
Published: | The Northwestern Polytechnical University, 2018-08-01 |
Series: | Xibei Gongye Daxue Xuebao |
Subjects: | big data; cardinality estimation; query optimization; query result; efficient; accurate |
Online Access: | https://www.jnwpu.org/articles/jnwpu/pdf/2018/04/jnwpu2018364p768.pdf |
id |
doaj-7ad6fc9355a549e0bc1d7196bf9ebc3c |
---|---|
record_format |
Article |
spelling |
doaj-7ad6fc9355a549e0bc1d7196bf9ebc3c; 2021-05-02T19:21:47Z; zho; The Northwestern Polytechnical University; Xibei Gongye Daxue Xuebao; ISSN 1000-2758, 2609-7125; 2018-08-01; 36(4): 768-777; doi:10.1051/jnwpu/20183640768; jnwpu2018364p768; A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result; School of Computer Science, Northwestern Polytechnical University; https://www.jnwpu.org/articles/jnwpu/pdf/2018/04/jnwpu2018364p768.pdf; big data, cardinality estimation, query optimization, query result, efficient, accurate |
collection |
DOAJ |
language |
zho |
format |
Article |
sources |
DOAJ |
title |
A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result |
publisher |
The Northwestern Polytechnical University |
series |
Xibei Gongye Daxue Xuebao |
issn |
1000-2758 2609-7125 |
publishDate |
2018-08-01 |
description |
Cardinality estimation is an important component of query optimization; its accuracy and efficiency directly determine how effective the optimization is. Traditional cardinality estimation strategies collect statistics from the original tables or from samples and then infer cardinality from those statistics. This approach is inefficient on big data; the statistics lag behind updates and, being inferred, cannot guarantee correctness; and although some strategies obtain the actual cardinality by executing subqueries, they do not keep the results, so fetching statistics remains inefficient. To address these problems, this paper proposes a novel strategy called cardinality estimation based on query result (CEQR). To keep cardinalities correct, CEQR takes statistics directly from query results, which is independent of data size. A cardinality table stores the statistics of base tables and intermediate results under specific predicates and serves cardinalities to subsequent queries; a set of rules maintains the table. To make fetching statistics more efficient, a source-aware strategy hashes each cardinality item to an appropriate cache. The paper analyzes the adaptability and deviation of CEQR and shows experimentally that CEQR is more efficient than traditional cardinality estimation strategies. |
topic |
big data, cardinality estimation, query optimization, query result, efficient, accurate |
url |
https://www.jnwpu.org/articles/jnwpu/pdf/2018/04/jnwpu2018364p768.pdf |
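The description above only outlines CEQR, so the following is a minimal sketch of the general idea rather than the paper's implementation. It assumes a cardinality table keyed by the set of source tables plus the set of predicates, an exact row count recorded after a query runs, and a "source-aware" placement that hashes on the source tables to choose a cache. The class name `CardinalityTable`, its methods, the CRC32 hash, and the invalidate-on-update rule are all hypothetical choices made for illustration.

```python
import zlib
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical key: (source tables, predicates applied to them).
Key = Tuple[FrozenSet[str], FrozenSet[str]]


class CardinalityTable:
    """Stores exact row counts observed in query results (illustrative only)."""

    def __init__(self, num_caches: int = 4) -> None:
        # Several small caches instead of one large map.
        self._caches: List[Dict[Key, int]] = [{} for _ in range(num_caches)]

    @staticmethod
    def _key(tables: Iterable[str], predicates: Iterable[str]) -> Key:
        return frozenset(tables), frozenset(predicates)

    def _cache_for(self, key: Key) -> Dict[Key, int]:
        # Assumed "source-aware" placement: hash only the source tables,
        # so items over the same sources land in the same cache.
        source = ",".join(sorted(key[0])).encode("utf-8")
        return self._caches[zlib.crc32(source) % len(self._caches)]

    def record(self, tables: Iterable[str], predicates: Iterable[str],
               row_count: int) -> None:
        """Store the row count taken from an executed query's result."""
        key = self._key(tables, predicates)
        self._cache_for(key)[key] = row_count

    def lookup(self, tables: Iterable[str],
               predicates: Iterable[str]) -> Optional[int]:
        """Return a stored cardinality, or None if it was never observed."""
        key = self._key(tables, predicates)
        return self._cache_for(key).get(key)

    def invalidate(self, table: str) -> None:
        """Assumed maintenance rule: drop every item that involves `table`."""
        for cache in self._caches:
            stale = [k for k in cache if table in k[0]]
            for k in stale:
                del cache[k]
```

In this sketch an optimizer would call `lookup` before planning and fall back to conventional estimates on `None`; the executor would call `record` after each query, and any update to a base table would trigger `invalidate` for that table.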