问题描述
正如标题所示,这是一个关于 HashMap#resize
的实现细节的问题 - 即内部数组的大小翻倍.这有点罗嗦,但我真的试图证明我已经尽我所能理解这一点......
这发生在此特定存储桶/bin 中的条目以 Linked
方式存储时 - 因此具有准确的顺序并且在问题的上下文中 这很重要强>.
resize
通常也可以从其他地方调用,但我们只看这种情况.
假设您将这些字符串作为键放在 HashMap
中(右侧是 hashcode
after HashMap#hash代码> - 这是内部重新散列.)是的,这些是精心生成的,不是随机的.
DFHXR - 11111YSXFJ-01111图迪 - 11111AXVUH - 01111RUTWZ - 11111德杜克 - 01111WFCVW-11111ZETCU-01111GCVUR - 11111
这里有一个简单的模式需要注意 - 最后 4 位对于所有这些都是相同的 - 这意味着当我们插入 8 个这些键(总共有 9 个)时,它们最终将在同一个桶中;并且在第 9 天 HashMap#put
,将调用 resize
.
因此,如果当前 HashMap
中有 8 个条目(具有上述键之一) - 这意味着此映射中有 16 个桶,它们的最后 4 位键决定了条目的位置结果.
我们放了第九把钥匙.此时 TREEIFY_THRESHOLD
被命中并且 resize
被调用.bin 加倍为 32
并且键中的另一位决定该条目的去向(所以,现在是 5 位).
最终到达这段代码(当 resize
发生时):
节点<K,V>loHead = null, loTail = null;节点<K,V>hiHead = null, hiTail = null;节点<K,V>下一个;做 {下一个 = e.下一个;if ((e.hash & oldCap) == 0) {如果(loTail == null)低头 = e;别的loTail.next = e;loTail = e;}别的 {如果(hiTail == null)hiHead = e;别的hiTail.next = e;hiTail = e;}} while ((e = next) != null);如果(loTail!= null){loTail.next = null;新标签[j] = loHead;}如果(hiTail!= null){hiTail.next = null;newTab[j + oldCap] = hiHead;}
实际上并没有那么复杂...它的作用是将当前 bin 拆分为 将移动 到其他 bin 的条目和 不会移动 到其他 bin 的条目垃圾箱 - 但肯定会留在这个垃圾箱中.
它实际上是非常聪明的——通过这段代码:
if ((e.hash & oldCap) == 0)
它的作用是检查下一位(在我们的例子中是第 5 位)是否实际上为零 - 如果是,则意味着该条目将保持在原位;如果不是,它将在新 bin 中以 2 次方偏移量移动.
现在最后的问题是:调整大小中的那段代码是经过精心制作的,以便它保持条目的顺序在那个 bin 中.
所以在你将这 9 个键放入 HashMap
之后,顺序将是:
DFHXR ->TUDDY ->RUTWZ ->WFCVW->GCVUR(一箱)YSXFJ->AXVUH ->推理->ZETCU(另一个垃圾箱)
为什么要保留 HashMap
中某些条目的顺序.Map
中的顺序真的 很糟糕,详细这里或这里.
设计考虑已记录在同一源文件中,在 第 211 行
<块引用>* 当 bin 列表被树化、拆分或未树化时,我们保持* 它们以相同的相对访问/遍历顺序(即,字段* Node.next) 以更好地保持局部性,并稍微* 简化调用的拆分和遍历的处理*迭代器.删除.在插入时使用比较器,以保持* 总排序(或此处要求的接近)* 重新平衡,我们将类和 identityHashCodes 比较为* 决胜局.
由于通过迭代器删除映射不会触发调整大小,因此在 resize
中特别保留顺序的原因是更好地保留局部性,并稍微简化拆分处理",以及与政策保持一致.
As the title suggests this is a question about an implementation detail from HashMap#resize
- that's when the inner array is doubled in size.
It's a bit wordy, but I've really tried to prove that I did my best understanding this...
This happens at a point when entries in this particular bucket/bin are stored in a Linked
fashion - thus having an exact order and in the context of the question this is important.
Generally the resize
could be called from other places as well, but let's look at this case only.
Suppose you put these strings as keys in a HashMap
(on the right there's the hashcode
after HashMap#hash
- that's the internal re-hashing.) Yes, these are carefully generated, not random.
DFHXR - 11111
YSXFJ - 01111
TUDDY - 11111
AXVUH - 01111
RUTWZ - 11111
DEDUC - 01111
WFCVW - 11111
ZETCU - 01111
GCVUR - 11111
There's a simple pattern to notice here - the last 4 bits are the same for all of them - which means that when we insert 8 of these keys (there are 9 total), they will end-up in the same bucket; and on the 9-th HashMap#put
, the resize
will be called.
So if currently there are 8 entries (having one of the keys above) in the HashMap
- it means there are 16 buckets in this map and the last 4 bits of they key decided where the entries end-up.
We put the nine-th key. At this point TREEIFY_THRESHOLD
is hit and resize
is called. The bins are doubled to 32
and one more bit from the keys decides where that entry will go (so, 5 bits now).
Ultimately this piece of code is reached (when resize
happens):
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
It's actually not that complicated... what it does it splits the current bin into entries that will move to other bins and to entries that will not move to other bins - but will remain into this one for sure.
And it's actually pretty smart how it does that - it's via this piece of code:
if ((e.hash & oldCap) == 0)
What this does is check if the next bit (the 5-th in our case) is actually zero - if it is, it means that this entry will stay where it is; if it's not it will move with a power of two offset in the new bin.
And now finally the question: that piece of code in the resize is carefully made so that it preserves the order of the entries there was in that bin.
So after you put those 9 keys in the HashMap
the order is going to be :
DFHXR -> TUDDY -> RUTWZ -> WFCVW -> GCVUR (one bin)
YSXFJ -> AXVUH -> DEDUC -> ZETCU (another bin)
Why would you want to preserve order of some entries in the HashMap
. Order in a Map
is really bad as detailed here or here.
The design consideration has been documented within the same source file, in a code comment in line 211
* When bin lists are treeified, split, or untreeified, we keep * them in the same relative access/traversal order (i.e., field * Node.next) to better preserve locality, and to slightly * simplify handling of splits and traversals that invoke * iterator.remove. When using comparators on insertion, to keep a * total ordering (or as close as is required here) across * rebalancings, we compare classes and identityHashCodes as * tie-breakers.
Since removing mappings via an iterator can’t trigger a resize, the reasons to retain the order specifically in resize
are "to better preserve locality, and to slightly simplify handling of splits", as well as being consistent regarding the policy.
这篇关于HashMap resize方法实现细节的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!