问题描述
既然 Table Scan
和 Clustered Index Scan
本质上都是扫描表中的所有记录,那么为什么 Clustered Index Scan 应该更好?
Since both a Table Scan
and a Clustered Index Scan
essentially scan all records in the table, why is a Clustered Index Scan supposedly better?
举个例子 - 当有很多记录时,以下之间的性能差异是什么?:
As an example - what's the performance difference between the following when there are many records?:
declare @temp table(
SomeColumn varchar(50)
)
insert into @temp
select 'SomeVal'
select * from @temp
-----------------------------
declare @temp table(
RowID int not null identity(1,1) primary key,
SomeColumn varchar(50)
)
insert into @temp
select 'SomeVal'
select * from @temp
推荐答案
在没有聚集索引的表(堆表)中,数据页没有链接在一起——所以遍历页需要一个查找索引分配图.
In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.
然而,聚簇表将数据页链接在双向链表中 - 使顺序扫描更快一点.当然,作为交换,您有处理在 INSERT
、UPDATE
和 DELETE
上保持数据页顺序的开销.但是,堆表需要对 IAM 进行第二次写入.
A clustered table, however, has it's data pages linked in a doubly linked list - making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on INSERT
, UPDATE
, and DELETE
. A heap table, however, requires a second write to the IAM.
如果您的查询具有 RANGE
运算符(例如:SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100
),则为聚簇表(按保证顺序)会更有效 - 因为它可以使用索引页来查找相关数据页.堆必须扫描所有行,因为它不能依赖排序.
If your query has a RANGE
operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100
), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.
当然,聚集索引可以让您执行 CLUSTERED INDEX SEEK,这对于性能来说非常理想……没有索引的堆总是会导致表扫描.
And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.
所以:
对于您选择所有行的示例查询,唯一的区别是聚集索引维护的双向链表.这应该会使您的聚簇表比具有大量行的堆快一点点.
For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.
对于带有 WHERE
子句的查询,该子句可以(至少部分地)被聚集索引满足,你会因为排序而领先 - 所以你不会必须扫描整个表.
For a query with a WHERE
clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.
对于聚簇索引不满足的查询,您几乎可以......同样,唯一的区别是用于顺序扫描的双向链表.无论哪种情况,你都不是最理想的.
For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.
对于INSERT
、UPDATE
和DELETE
,堆可能会或可能不会获胜.堆不必维护顺序,但确实需要对 IAM 进行第二次写入.我认为相对性能差异可以忽略不计,但也非常依赖于数据.
For INSERT
, UPDATE
, and DELETE
a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.
Microsoft 有一份白皮书,将聚集索引与堆上的等效非聚集索引(与我上面讨论的不完全相同,但很接近).他们的结论基本上是在所有表上放置聚集索引.我会尽力总结他们的结果(同样,请注意,他们在这里确实将非聚集索引与聚集索引进行了比较 - 但我认为它具有相对可比性):
Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):
INSERT
性能:由于堆需要第二次写入,聚集索引的性能提高了大约 3%.UPDATE
性能:由于堆需要第二次查找,聚集索引的性能提高了约 8%.DELETE
性能:由于需要第二次查找以及需要从 IAM 中对堆进行第二次删除,聚集索引的性能提高了约 18%.- 单个
SELECT
性能:由于堆需要第二次查找,聚集索引胜出约 16%. - range
SELECT
性能:由于堆的随机排序,聚集索引胜出约 29%. - 并发
INSERT
:由于聚集索引的页面拆分,堆表在负载下胜出 30%.
INSERT
performance: clustered index wins by about 3% due to the second write needed for a heap.UPDATE
performance: clustered index wins by about 8% due to the second lookup needed for a heap.DELETE
performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.- single
SELECT
performance: clustered index wins by about 16% due to the second lookup needed for a heap. - range
SELECT
performance: clustered index wins by about 29% due to the random ordering for a heap. - concurrent
INSERT
: heap table wins by 30% under load due to page splits for the clustered index.
这篇关于表扫描和聚集索引扫描有什么区别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!