Volume 36, Number 4, August 2018
|Page(s)||768 - 777|
|Published online||24 October 2018|
A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result
School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
Cardinality estimation is an important component of query optimization. Its accuracy and efficiency directly decide effect of query optimization. Traditional cardinality estimation strategy is based on original table or sample to collect statistics, then inferring cardinality by collected statistics. It will be low-efficiency when handling big data; Statistics exist update latency and are gotten by inferring, which can not guarantee correctness; Some strategies can get the actual cardinality by executing some subqueries, but they do not keep the result, leading to low efficiency of fetching statistics. Against these problems, this paper proposes a novel cardinality estimation strategy, called cardinality estimation based on query result(CEQR). For keeping correctness of cardinality, CEQR directly gets statistics from query results, which is not related with data size; we build a cardinality table to store the statistics of basic tables and middle results under specific predicates. Cardinality table can provide cardinality services for subsequent queries, and we build a suit of rules to maintain cardinality table; To improve the efficiency of fetching statistics, we introduce the source aware strategy, which hashes cardinality item to appropriate cache. This paper gives the adaptability and deviation analytic of CEQR, and proves that CEQR is more efficient than traditional cardinality estimation strategy by experiments.
基数估计是查询优化的重要组成部分，其高效性、准确性直接影响查询优化效果。传统基数估计策略基于原表或原表样本进行统计信息收集，然后利用收集好的统计信息推导出基数。该策略在数据量大时，统计信息收集效率低；统计信息存在延迟，并且基数通过推导得到，准确度无法保证；一些策略通过子查询的反馈信息得到基数，但结果没有保存，基数获取效率低。为解决这些问题，提出了一种高效准确的基于查询结果的基数估计策略（cardinality estimation based on query result，CEQR），特点是统计信息来源为查询执行结果，不需要进行推导，保证基数的准确度，并且收集效率与原表数据量无关；建立一种基数表，保存基本表和中间结果在某种谓词下的统计信息，为后续查询提供服务，并建立基数维护规则，合理管理基数表；建立资源感知策略，将基数项映射到缓存，加快统计信息获取效率。给出了基于CEQR策略的适应性以及误差分析，并通过实验得出CEQR策略在效率上优于传统基数估计策略。
Key words: big data / cardinality estimation / query optimization / query result / efficient / accurate
关键字 : 大数据 / 基数估计 / 查询优化 / 查询结果 / 高效 / 准确
© 2018 Journal of Northwestern Polytechnical University. All rights reserved.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.