涉及范围时,索引中的基数较高的列首先?

Higher cardinality column first in an index when involving a range?(涉及范围时,索引中的基数较高的列首先?)
本文介绍了涉及范围时,索引中的基数较高的列首先?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

CREATE TABLE `files` (
  `did` int(10) unsigned NOT NULL DEFAULT '0',
  `filename` varbinary(200) NOT NULL,
  `ext` varbinary(5) DEFAULT NULL,
  `fsize` double DEFAULT NULL,
  `filetime` datetime DEFAULT NULL,
  PRIMARY KEY (`did`,`filename`),
  KEY `fe` (`filetime`,`ext`),          -- This?
  KEY `ef` (`ext`,`filetime`)           -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

表中有一百万行.文件时间大多是不同的.ext 值的数量是有限的.因此,filetime 具有高基数,而 ext 具有低得多的基数.

There are a million rows in the table. The filetimes are mostly distinct. There are a finite number of ext values. So, filetimehas a high cardinality and ext has a much lower cardinality.

查询涉及extfiletime:

WHERE ext = '...'
  AND filetime BETWEEN ... AND ...

这两个索引哪个更好?为什么?

Which of those two indexes is better? And why?

推荐答案

首先,让我们尝试 FORCE INDEX 来选择 effe.时间太短了,无法清楚地了解哪个更快,但 `EXPLAIN 显示了差异:

First, let's try FORCE INDEX to pick either ef or fe. The timings are too short to get a clear picture of which is faster, but `EXPLAIN shows a difference:

首先在 filetime 上强制范围.(注意:WHERE中的顺序没有影响.)

Forcing the range on filetime first. (Note: The order in WHERE has no impact.)

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
|  1 | SIMPLE      | files | range | fe            | fe   | 14      | NULL | 16684 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

首先强制低基数ext:

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | files | range | ef            | ef   | 14      | NULL |  538 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

显然,rowsef 更好.但是让我们检查一下优化器跟踪.输出相当庞大;我只会展示有趣的部分.不需要FORCE;跟踪将显示两个选项,然后选择更好的.

Clearly, the rows says ef is better. But let's check with the Optimizer trace. The output is rather bulky; I'll show only the interesting parts. No FORCE is needed; the trace will show both options then pick the better.

             ...
             "potential_range_indices": [
                ...
                {
                  "index": "fe",
                  "usable": true,
                  "key_parts": [
                    "filetime",
                    "ext",
                    "did",
                    "filename"
                  ]
                },
                {
                  "index": "ef",
                  "usable": true,
                  "key_parts": [
                    "ext",
                    "filetime",
                    "did",
                    "filename"
                  ]
                }
              ],

...

              "analyzing_range_alternatives": {
                "range_scan_alternatives": [
                  {
                    "index": "fe",
                    "ranges": [
                      "2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 16684,
                    "cost": 20022,               <-- Here's the critical number
                    "chosen": true
                  },
                  {
                    "index": "ef",
                    "ranges": [
                      "gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 538,
                    "cost": 646.61,               <-- Here's the critical number
                    "chosen": true
                  }
                ],

...

          "attached_conditions_computation": [
            {
              "access_type_changed": {
                "table": "`files`",
                "index": "ef",
                "old_type": "ref",
                "new_type": "range",
                "cause": "uses_more_keyparts"   <-- Also interesting
              }
            }

使用 fe(范围列在前),可以使用范围,但它估计要扫描 16684 行以寻找 ext='gif'.

With fe (range column first), the range could be used, but it estimated scanning through 16684 rows fishing for ext='gif'.

使用ef(低基数ext优先),它可以使用索引的两列并在BTree中更有效地向下钻取.然后它发现了大约 538 行,所有这些都对查询有用——不需要进一步过滤.

With ef (low cardinality ext first), it could use both columns of the index and drill down more efficiently in the BTree. Then it found an estimated 538 rows, all of which are useful for the query -- no further filtering needed.

结论:

  • INDEX(filetime, ext) 只使用了第一列.
  • INDEX(ext, filetime) 使用了这两列.
  • 将涉及= 测试的列first 放在索引中无论基数如何.
  • 查询计划不会超出第一个范围"列.
  • 基数"对于复合索引和这种类型的查询是无关紧要的.
  • INDEX(filetime, ext) used only the first column.
  • INDEX(ext, filetime) used both columns.
  • Put columns involved in = tests first in the index regardless of cardinality.
  • The query plan won't go beyond the first 'range' column.
  • "Cardinality" is irrelevant for composite indexes and this type of query.

(使用索引条件"表示存储引擎 (InnoDB) 将使用索引中用于过滤的列之外的列.)

("Using index condition" means that the Storage Engine (InnoDB) will use columns of the index beyond the one used for filtering.)

这篇关于涉及范围时,索引中的基数较高的列首先?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Hibernate reactive No Vert.x context active in aws rds(AWS RDS中的休眠反应性非Vert.x上下文处于活动状态)
Bulk insert with mysql2 and NodeJs throws 500(使用mysql2和NodeJS的大容量插入抛出500)
Flask + PyMySQL giving error no attribute #39;settimeout#39;(FlASK+PyMySQL给出错误,没有属性#39;setTimeout#39;)
auto_increment column for a group of rows?(一组行的AUTO_INCREMENT列?)
Sort by ID DESC(按ID代码排序)
SQL/MySQL: split a quantity value into multiple rows by date(SQL/MySQL:按日期将数量值拆分为多行)