问题描述
我正在 Java EE 上使用 MySql 数据库开发 Web 项目.我们需要一个视图来汇总 3 个总行超过 300 万行的表中的数据.每个表都是用索引创建的.但是我还没有从我们用[group by]创建的视图中找到在条件选择语句检索中利用索引的方法.
I'm working on a web project with MySql database on Java EE. We needed a view to summarize data from 3 tables with over 3M rows overall. Each table was created with index. But I haven't found out a way to take advantages in the indexes in the conditional select statement retrieval from the view that we created with [group by].
我收到了一些人的建议,他们认为在 MySql 中使用视图不是一个好主意.因为您不能像在 oracle 中那样为 mysql 中的视图创建索引.但是在我进行的一些测试中,可以在视图选择语句中使用索引.也许我以错误的方式创建了这些视图.
I've getting suggestions from people that using views in MySql is not a good idea. Because you can't create index for views in mysql like in oracle. But in some test that I took, indexes can be used in view select statement. Maybe I've created those views in a wrong way.
我会用一个例子来描述我的问题.
I'll use a example to describe my problem.
我们有一个表格记录了 NBA 比赛的高分数据,在列 [happend_in] 上有索引
We have a table that records data for high scores in NBA games, with index on column [happend_in]
CREATE TABLE `highscores` (
`tbl_id` int(11) NOT NULL auto_increment,
`happened_in` int(4) default NULL,
`player` int(3) default NULL,
`score` int(3) default NULL,
PRIMARY KEY (`tbl_id`),
KEY `index_happened_in` (`happened_in`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
插入数据(8行)
INSERT INTO highscores(happened_in, player, score)
VALUES (2006, 24, 61),(2006, 24, 44),(2006, 24, 81),
(1998, 23, 51),(1997, 23, 46),(2006, 3, 55),(2007, 24, 34), (2008, 24, 37);
然后我创建一个视图来查看科比·布莱恩特每年获得的最高分
then I create a view to see the highest score that Kobe Bryant got in each year
CREATE OR REPLACE VIEW v_kobe_highScores
AS
SELECT player, max(score) AS highest_score, happened_in
FROM highscores
WHERE player = 24
GROUP BY happened_in;
我写了一个条件语句来查看kobe在2006获得的最高分;
I wrote a conditional statement to see the highest score that kobe got in 2006;
select * from v_kobe_highscores where happened_in = 2006;
当我在toad for mysql中解释的时候,我发现mysql已经扫描所有行形成视图,然后在其中查找带有条件的数据,而不使用[happened_in]上的索引.
When I explain it in toad for mysql, I found out that mysql have scan all rows to form the view, then find data with condition in it, without using index on [happened_in].
explain select * from v_kobe_highscores where happened_in = 2006;
我们在项目中使用的视图是在具有数百万行的表中构建的.在每个视图数据检索中扫描表中的所有行是不可接受的.请帮忙!谢谢!
The view that we use in our project is built among tables with millions of rows. Scanning all the rows from table in every view data retrieval is unacceptable. Please help! Thanks!
@zerkms 这是我在现实生活中测试的结果.我看不出有多大区别.我认为@spencer7593 的观点是正确的.MySQL 优化器不会在视图查询中下推"该谓词.
@zerkms Here is the result I tested on real-life. I don't see much differences between. I think @spencer7593 has the right point. The MySQL optimizer doesn't "push" that predicate down in the view query.
推荐答案
如何让 MySQL 使用索引进行视图查询?简短的回答,提供一个 MySQL 可以使用的索引.
How do you get MySQL to use an index for a view query? The short answer, provide an index that MySQL can use.
在这种情况下,最佳索引可能是覆盖"索引:
In this case, the optimum index is likely a "covering" index:
... ON highscores (player, happened_in, score)
MySQL 很可能会使用该索引,并且 EXPLAIN 将显示:"Using index"
由于 WHERE player = 24
(前导上的相等谓词索引中的列.GROUP BY changed_id
(索引中的第二列)可能允许 MySQL 使用索引来优化它以避免排序操作.包括 score
索引中的列将允许查询完全从索引中得到满足,而无需访问(查找)索引引用的数据页.
It's likely that MySQL will use that index, and the EXPLAIN will show: "Using index"
due to the WHERE player = 24
(an equality predicate on the leading column in the index. The GROUP BY happened_id
(the second column in the index), may allow MySQL to optimize that using the index to avoid a sort operation. Including the score
column in the index will allow the query to satisfied entirely from the index, without having to visit (lookup) the data pages referenced by the index.
这是一个快速的答案.更长的答案是 MySQL 不太可能使用带有 happened_id
前导列的索引进行视图查询.
That's the quick answer. The longer answer is that MySQL is very unlikely to use an index with leading column of happened_id
for the view query.
为什么视图会导致性能问题
MySQL 视图存在的问题之一是 MySQL 不会将谓词从外部查询下推"到视图查询中.
One of the issues you have with the MySQL view is that MySQL does not "push" the predicate from the outer query down into the view query.
您的外部查询指定 WHERE Haded_in = 2006
.MySQL 优化器在运行内部视图查询"时不考虑谓词.该视图查询在外部查询之前单独执行.执行该查询的结果集被物化";也就是说,结果存储为一个中间 MyISAM 表.(MySQL 称其为派生表",当您了解 MySQL 执行的操作时,他们使用的名称是有道理的.)
Your outer query specifies WHERE happened_in = 2006
. The MySQL optimizer does not consider the predicate when it runs the inner "view query". That query for the view gets executed separately, before the outer query. The resultset from the execution of that query get "materialized"; that is, the results are stored as an intermediate MyISAM table. (MySQL calls it a "derived table", and that name they use makes sense, when you understand the operations that MysQL performs.)
最重要的是,当 MySQL 运行形成视图定义的查询时,您在 happened_in
上定义的索引没有被 MySQL 使用.
The bottom line is that the index you have defined on happened_in
is not being used by MySQL when it rusn the query that forms the view definition.
创建中间派生表"后,然后执行外部查询,使用该派生表"作为行源.当外部查询运行时,happened_in = 2006
谓词被评估.
After the intermediate "derived table" is created, THEN the outer query is executed, using that "derived table" as a rowsource. It's when that outer query runs that the happened_in = 2006
predicate is evaluated.
请注意,视图查询中的所有行都已存储,(在您的情况下)是 happened_in
的每个值的一行,而不仅仅是您在外部查询.
Note that all of the rows from the view query are stored, which (in your case) is a row for EVERY value of happened_in
, not just the one you specify an equality predicate on in the outer query.
视图查询的处理方式可能有些出乎意料",这也是与其他关系数据库处理视图查询的方式相比,在 MySQL 中使用视图"会导致性能问题的原因之一.
The way that view queries are processed may be "unexpected" by some, and this is one reason that using "views" in MySQL can lead to performance problems, as compared to the way view queries are processed by other relational databases.
使用合适的覆盖索引提高视图查询的性能
鉴于您的视图定义和查询,您将获得的最佳视图查询是使用索引"访问方法.为此,您需要一个覆盖索引,例如
Given your view definition and your query, about the best you are going to get would be a "Using index" access method for the view query. To get that, you'd need a covering index, e.g.
... ON highscores (player, happened_in, score).
对于您现有的视图定义和现有查询来说,这可能是最有益的索引(性能方面).player
列是前导列,因为您在视图查询中的该列上有一个相等谓词.happened_in
列是下一个,因为您对该列进行了 GROUP BY 操作,并且 MySQL 将能够使用此索引来优化 GROUP BY 操作.我们还包括 score
列,因为这是您的查询中唯一引用的其他列.这使得索引成为覆盖"索引,因为 MySQL 可以直接从索引页面满足该查询,而无需访问基础表中的任何页面.这与我们将要退出查询计划一样好:使用索引"而不是使用文件排序".
That's likely to be the most beneficial index (performance wise) for your existing view definition and your existing query. The player
column is the leading column because you have an equality predicate on that column in the view query. The happened_in
column is next, because you've got a GROUP BY operation on that column, and MySQL is going to be able to use this index to optimize the GROUP BY operation. We also include the score
column, because that is the only other column referenced in your query. That makes the index a "covering" index, because MySQL can satisfy that query directly from index pages, without a need to visit any pages in the underlying table. And that's as good as we're going to get out of that query plan: "Using index" with no "Using filesort".
比较没有派生表的独立查询的性能
您可以将查询的执行计划与视图与等效的独立查询进行比较:
You could compare the execution plan for your query against the view vs. an equivalent standalone query:
SELECT player
, MAX(score) AS highest_score
, happened_in
FROM highscores
WHERE player = 24
AND happened_in = 2006
GROUP
BY player
, happened_in
独立查询也可以使用覆盖索引,例如
The standalone query can also make use of a covering index e.g.
... ON highscores (player, happened_in, score)
但不需要具体化一个中间 MyISAM 表.
but without a need to materialize an intermediate MyISAM table.
我不确定之前的任何一个都可以直接回答您提出的问题.
I am not sure that any of the previous provides a direct answer to the question you were asking.
问:如何让 MySQL 使用 INDEX 进行视图查询?
A:定义视图查询可以使用的合适的 INDEX.
简短的回答是提供一个覆盖索引"(索引包括视图查询中引用的所有列).该索引中的前导列应该是用等式谓词引用的列(在您的情况下,列 player
将是前导列,因为您有一个 player = 24
查询中的谓词.另外,在 GROUP BY 中引用的列应该是索引中的前导列,这允许 MySQL 通过使用索引而不是使用排序来优化 GROUP BY
操作操作.
The short answer is provide a "covering index" (index includes all columns referenced in the view query). The leading columns in that index should be the columns that are referenced with equality predicates (in your case, the column player
would be a leading column because you have a player = 24
predicate in the query. Also, the columns referenced in the GROUP BY should be leading columns in the index, which allows MySQL to optimize the GROUP BY
operation, by making use of the index rather than using a sort operation.
这里的关键是视图查询基本上是一个独立的查询;该查询的结果存储在一个中间派生"表(一个 MyISAM 表,当针对视图的查询运行时创建.
The key point here is that the view query is basically a standalone query; the results from that query get stored in an intermediate "derived" table (a MyISAM table that gets created when a query against the view gets run.
在 MySQL 中使用视图不一定是坏主意",但我强烈提醒那些选择在 MySQL 中使用视图的人要注意 MySQL 如何处理引用这些视图的查询.并且 MySQL 处理视图查询的方式(显着)不同于其他数据库(例如 Oracle、SQL Server)处理视图查询的方式.
Using views in MySQL is not necessarily a "bad idea", but I would strongly caution those who choose to use views within MySQL to be AWARE of how MySQL processes queries that reference those views. And the way MySQL processes view queries differs (significantly) from the way view queries are handled by other databases (e.g. Oracle, SQL Server).
这篇关于如何让 MySQL 使用 INDEX 进行视图查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!