问题描述
检查问题此 SELECT 查询需要 180 秒才能完成(检查问题本身的评论).
IN只能与一个值进行比较,但时间差异仍然很大.
为什么会这样?
总结:这是一个 MySQL 中的>已知问题,并在 MySQL 5.6.x 中修复.该问题是由于使用 IN 的子查询被错误地识别为依赖子查询而不是独立子查询时缺少优化.
<小时>当您对原始查询运行 EXPLAIN 时,它会返回:
<前>1 'PRIMARY' 'question_law_version' 'ALL' '' '' '' '' 10148 '使用哪里'2 'DEPENDENT SUBQUERY' 'question_law_version' 'ALL' '' '' '' '' 10148 '使用地点'3 'DEPENDENT SUBQUERY' 'question_law' 'ALL' '' '' '' '' 10040 '使用哪里'当您将 IN
更改为 =
时,您会得到:
每个依赖子查询在它所包含的查询中的每一行运行一次,而子查询只运行一次.当存在可以转换为连接的条件时,MySQL 有时可以优化依赖子查询,但这里并非如此.
现在这当然留下了为什么 MySQL 认为 IN 版本需要是依赖子查询的问题.我制作了一个简化版的查询来帮助调查这个问题.我创建了两个表foo"和bar",其中前者只包含一个 id 列,后者包含一个 id 和一个 foo id(尽管我没有创建外键约束).然后我用 1000 行填充了两个表:
CREATE TABLE foo (id INT PRIMARY KEY NOT NULL);CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);-- 在每个表中填充 1000 行选择 ID从 foo在哪里(选择最大(foo_id)发件人栏);
这个简化的查询有和之前一样的问题——内部选择被当作依赖子查询处理,没有进行优化,导致内部查询每行运行一次.查询几乎需要一秒钟才能运行.再次将 IN
更改为 =
几乎可以立即运行查询.
我用来填充表格的代码如下,以防有人希望重现结果.
CREATE TABLE 填充符 (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT) 引擎=内存;分隔符 $$创建程序 prc_filler(cnt INT)开始声明 _cnt INT;设置_cnt = 1;而 _cnt <= cnt 做插入INTO填料选择_cnt;SET _cnt = _cnt + 1;结束时;结尾$$分隔符;呼叫 prc_filler(1000);INSERT foo SELECT id FROM fills;INSERT bar SELECT id, id FROM 填充符;
Check the question This SELECT query takes 180 seconds to finish (check the comments on the question itself).
The IN get to be compared against only one value, but still the time difference is enormous.
Why is it like that?
Summary: This is a known problem in MySQL and was fixed in MySQL 5.6.x. The problem is due to a missing optimization when a subquery using IN is incorrectly indentified as dependent subquery instead of an independent subquery.
When you run EXPLAIN on the original query it returns this:
1 'PRIMARY' 'question_law_version' 'ALL' '' '' '' '' 10148 'Using where' 2 'DEPENDENT SUBQUERY' 'question_law_version' 'ALL' '' '' '' '' 10148 'Using where' 3 'DEPENDENT SUBQUERY' 'question_law' 'ALL' '' '' '' '' 10040 'Using where'
When you change IN
to =
you get this:
1 'PRIMARY' 'question_law_version' 'ALL' '' '' '' '' 10148 'Using where' 2 'SUBQUERY' 'question_law_version' 'ALL' '' '' '' '' 10148 'Using where' 3 'SUBQUERY' 'question_law' 'ALL' '' '' '' '' 10040 'Using where'
Each dependent subquery is run once per row in the query it is contained in, whereas the subquery is run only once. MySQL can sometimes optimize dependent subqueries when there is a condition that can be converted to a join but here that is not the case.
Now this of course leaves the question of why MySQL believes that the IN version needs to be a dependent subquery. I have made a simplified version of the query to help investigate this. I created two tables 'foo' and 'bar' where the former contains only an id column, and the latter contains both an id and a foo id (though I didn't create a foreign key constraint). Then I populated both tables with 1000 rows:
CREATE TABLE foo (id INT PRIMARY KEY NOT NULL);
CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);
-- populate tables with 1000 rows in each
SELECT id
FROM foo
WHERE id IN
(
SELECT MAX(foo_id)
FROM bar
);
This simplified query has the same problem as before - the inner select is treated as a dependent subquery and no optimization is performed, causing the inner query to be run once per row. The query takes almost one second to run. Changing the IN
to =
again allows the query to run almost instantly.
The code I used to populate the tables is below, in case anyone wishes to reproduce the results.
CREATE TABLE filler (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;
DELIMITER $$
CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt <= cnt DO
INSERT
INTO filler
SELECT _cnt;
SET _cnt = _cnt + 1;
END WHILE;
END
$$
DELIMITER ;
CALL prc_filler(1000);
INSERT foo SELECT id FROM filler;
INSERT bar SELECT id, id FROM filler;
这篇关于为什么 IN 条件比“="慢?在 sql 中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!