问题描述
我有一个包含大约 100.000 个博客帖子的表格,通过 1:n 关系链接到一个包含 50 个提要的表格.当我使用 select 语句查询两个表时,按发布表的日期时间字段排序,MySQL 总是使用文件排序,导致查询时间非常慢(> 1 秒).这是 postings 表的架构(简化):
+---------------------+--------------+------+-----+----------+----------------+|领域 |类型 |空 |钥匙 |默认 |额外 |+---------------------+--------------+------+-----+---------+----------------+|身份证 |整数(11) |否 |PRI |空 |自动增量||feed_id |整数(11) |否 |多 |空 |||crawl_date |日期时间 |否 ||空 |||is_active |tinyint(1) |否 |多 |0 |||链接 |varchar(255) |否 |多 |空 |||作者 |varchar(255) |否 ||空 |||标题 |varchar(255) |否 ||空 |||摘录|文字 |否 ||空 |||long_excerpt |文字 |否 ||空 |||user_offtopic_count |整数(11) |否 |多 |0 ||+---------------------+--------------+------+-----+---------+----------------+
这里是 feed
表:
+-------------+--------------+------+-----+---------+----------------+|领域 |类型 |空 |钥匙 |默认 |额外 |+--------------+--------------+------+-------+---------+--+|身份证 |整数(11) |否 |PRI |空 |自动增量||类型 |整数(11) |否 |多 |0 |||标题 |varchar(255) |否 ||空 |||网站 |varchar(255) |否 ||空 |||网址 |varchar(255) |否 ||空 ||+--------------+--------------+------+-------+---------+--+
这是执行时间超过 1 秒的查询.请注意 post_date
字段有一个索引,但 MySQL 没有使用它来对帖子表进行排序:
SELECT`帖子`.`id`,UNIX_TIMESTAMP(postings.post_date) 作为 post_date,`帖子`.`链接`,`postings`.`title`,`帖子`.`作者`,`帖子`.`摘录`,`postings`.`long_excerpt`,`feeds`.`title` AS feed_title,`feeds`.`website` AS feed_website从(`帖子`)加入`饲料`在`feeds`.`id` = `postings`.`feed_id`在哪里`feeds`.`type` = 1 AND`postings`.`user_offtopic_count` <10 与`postings`.`is_active` = 1订购者`postings`.`post_date` desc限制15
这个查询的explain extended
命令的结果表明MySQL正在使用文件排序:
+----+-------------+----------+--------+---------------------------------------+-----------+----------+------------------------------+-------+----------------------+|身份证 |选择类型 |表|类型 |可能的密钥|键 |key_len |参考 |行 |额外 |+----+-------------+---------+--------+---------------------------------------+-----------+---------+----------------------------+-------+----------------------------+|1 |简单 |帖子|参考 |feed_id,is_active,user_offtopic_count |is_active |1 |常量 |30996 |使用哪里;使用文件排序 ||1 |简单 |饲料|eq_ref |主要,类型 |主要 |4 |feedian.postings.feed_id |1 |使用 where |+----+-------------+---------+--------+---------------------------------------+-----------+---------+----------------------------+-------+----------------------------+
当我删除 order by
部分时,MySQL 停止使用文件排序.如果您对如何优化此查询以让 MySQL 使用索引对数据进行排序和选择有任何想法,请告诉我.我已经尝试了一些方法,例如根据一些博客帖子的建议,在所有 where/order by 字段上创建组合索引,但这也不起作用.
在 postings (is_active, post_date)
(按此顺序)上创建复合索引.
它将用于过滤 is_active
和按 post_date
排序.
MySQL
应该在 EXPLAIN EXTENDED
中显示对该索引的 REF
访问方法.
请注意,您在 user_offtopic_count
上有一个 RANGE
过滤条件,这就是为什么您不能在过滤和按其他字段排序时对该字段使用索引.>
根据您的 user_offtopic_count
的选择性(即有多少行满足 user_offtopic_count <10
),在 user_offtopic_count 上创建索引可能更有用
并让 post_dates 排序.
为此,请在 postings (is_active, user_offtopic_count)
上创建一个复合索引,并确保使用对该索引的 RANGE
访问方法.
哪个索引会更快取决于您的数据分布.创建两个索引,FORCE
它们,看看哪个更快:
CREATE INDEX ix_active_offtopic ON 发布 (is_active, user_offtopic_count);在帖子上创建索引 ix_active_date (is_active, post_date);选择`帖子`.`id`,UNIX_TIMESTAMP(postings.post_date) 作为 post_date,`帖子`.`链接`,`postings`.`title`,`帖子`.`作者`,`帖子`.`摘录`,`postings`.`long_excerpt`,`feeds`.`title` AS feed_title,`feeds`.`website` AS feed_website从`postings` FORCE INDEX (ix_active_offtopic)加入`饲料`在`feeds`.`id` = `postings`.`feed_id`在哪里`feeds`.`type` = 1 AND`postings`.`user_offtopic_count` <10 与`postings`.`is_active` = 1订购者`postings`.`post_date` desc限制15/* 这应该显示具有几行的 RANGE 访问并保留 FILESORT */选择`帖子`.`id`,UNIX_TIMESTAMP(postings.post_date) 作为 post_date,`帖子`.`链接`,`postings`.`title`,`帖子`.`作者`,`帖子`.`摘录`,`postings`.`long_excerpt`,`feeds`.`title` AS feed_title,`feeds`.`website` AS feed_website从`postings` FORCE INDEX (ix_active_date)加入`饲料`在`feeds`.`id` = `postings`.`feed_id`在哪里`feeds`.`type` = 1 AND`postings`.`user_offtopic_count` <10 与`postings`.`is_active` = 1订购者`postings`.`post_date` desc限制15/* 这应该显示具有大量行且没有 FILESORT 的 REF 访问 */
I have a table with roughly 100.000 blog postings, linked to a table with 50 feeds via an 1:n relationship. When I query both tables with a select statement, ordered by a datetime field of the postings table, MySQL always uses filesort, resulting in very slow query times (>1 second). Here's the schema of the postings
table (simplified):
+---------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| feed_id | int(11) | NO | MUL | NULL | |
| crawl_date | datetime | NO | | NULL | |
| is_active | tinyint(1) | NO | MUL | 0 | |
| link | varchar(255) | NO | MUL | NULL | |
| author | varchar(255) | NO | | NULL | |
| title | varchar(255) | NO | | NULL | |
| excerpt | text | NO | | NULL | |
| long_excerpt | text | NO | | NULL | |
| user_offtopic_count | int(11) | NO | MUL | 0 | |
+---------------------+--------------+------+-----+---------+----------------+
And here's the feed
table:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| type | int(11) | NO | MUL | 0 | |
| title | varchar(255) | NO | | NULL | |
| website | varchar(255) | NO | | NULL | |
| url | varchar(255) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
And here's the query that takes >1 second to execute. Please note that the post_date
field has an index, but MySQL isn't using it to sort the postings table:
SELECT
`postings`.`id`,
UNIX_TIMESTAMP(postings.post_date) as post_date,
`postings`.`link`,
`postings`.`title`,
`postings`.`author`,
`postings`.`excerpt`,
`postings`.`long_excerpt`,
`feeds`.`title` AS feed_title,
`feeds`.`website` AS feed_website
FROM
(`postings`)
JOIN
`feeds`
ON
`feeds`.`id` = `postings`.`feed_id`
WHERE
`feeds`.`type` = 1 AND
`postings`.`user_offtopic_count` < 10 AND
`postings`.`is_active` = 1
ORDER BY
`postings`.`post_date` desc
LIMIT
15
The result of the explain extended
command on this query shows that MySQL is using filesort:
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| 1 | SIMPLE | postings | ref | feed_id,is_active,user_offtopic_count | is_active | 1 | const | 30996 | Using where; Using filesort |
| 1 | SIMPLE | feeds | eq_ref | PRIMARY,type | PRIMARY | 4 | feedian.postings.feed_id | 1 | Using where |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
When I remove the order by
part, MySQL stops using filesort. Please let me know if you have any ideas on how to optimize this query to get MySQL to sort and select the data by using indexes. I have already tried a few things such as creating a combined index on all where/order by fields, as suggested by a few blog postings, but this didn't work either.
Create a composite index either on postings (is_active, post_date)
(in that order).
It will be used both for filtering on is_active
and ordering by post_date
.
MySQL
should show REF
access method over this index in EXPLAIN EXTENDED
.
Note that you have a RANGE
filtering condition over user_offtopic_count
, that's why you cannot use an index over this field both in filtering and in sorting by other field.
Depending on how selective is your user_offtopic_count
(i. e. how many rows satisfy user_offtopic_count < 10
), it may be more useful to create an index on user_offtopic_count
and let the post_dates be sorted.
To do this, create a composite index on postings (is_active, user_offtopic_count)
and make sure the RANGE
access method over this index is used.
Which index will be faster depends on your data distribuion. Create both indexes, FORCE
them and see which is faster:
CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count);
CREATE INDEX ix_active_date ON postings (is_active, post_date);
SELECT
`postings`.`id`,
UNIX_TIMESTAMP(postings.post_date) as post_date,
`postings`.`link`,
`postings`.`title`,
`postings`.`author`,
`postings`.`excerpt`,
`postings`.`long_excerpt`,
`feeds`.`title` AS feed_title,
`feeds`.`website` AS feed_website
FROM
`postings` FORCE INDEX (ix_active_offtopic)
JOIN
`feeds`
ON
`feeds`.`id` = `postings`.`feed_id`
WHERE
`feeds`.`type` = 1 AND
`postings`.`user_offtopic_count` < 10 AND
`postings`.`is_active` = 1
ORDER BY
`postings`.`post_date` desc
LIMIT
15
/* This should show RANGE access with few rows and keep the FILESORT */
SELECT
`postings`.`id`,
UNIX_TIMESTAMP(postings.post_date) as post_date,
`postings`.`link`,
`postings`.`title`,
`postings`.`author`,
`postings`.`excerpt`,
`postings`.`long_excerpt`,
`feeds`.`title` AS feed_title,
`feeds`.`website` AS feed_website
FROM
`postings` FORCE INDEX (ix_active_date)
JOIN
`feeds`
ON
`feeds`.`id` = `postings`.`feed_id`
WHERE
`feeds`.`type` = 1 AND
`postings`.`user_offtopic_count` < 10 AND
`postings`.`is_active` = 1
ORDER BY
`postings`.`post_date` desc
LIMIT
15
/* This should show REF access with lots of rows and no FILESORT */
这篇关于MySQL 性能优化:按日期时间字段排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!