问题描述
HashMap
的文档中有这样的短语:
A HashMap
has such a phrase from it's documentation:
如果初始容量大于最大条目数除以负载因子,则不会永远发生重新哈希操作.
If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
注意文档是如何说 rehash,而不是 resize - 即使 rehash 只会在调整大小时发生;那是当桶的内部大小变成两倍大的时候.
Notice how the documentation says rehash, not resize - even if a rehash will only happen when a resize will; that is when the internal size of buckets gets twice as big.
当然HashMap
提供了这样一个构造函数,我们可以在其中定义这个初始容量.
And of course HashMap
provides such a constructor where we could define this initial capacity.
构造一个具有指定初始容量和默认加载因子 (0.75) 的空 HashMap.
Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).
好的,看起来很简单:
// these are NOT chosen randomly...
List<String> list = List.of("DFHXR", "YSXFJ", "TUDDY",
"AXVUH", "RUTWZ", "DEDUC", "WFCVW", "ZETCU", "GCVUR");
int maxNumberOfEntries = list.size(); // 9
double loadFactor = 0.75;
int capacity = (int) (maxNumberOfEntries / loadFactor + 1); // 13
所以容量是 13
(内部是 16
- 二次幂),这样我们保证文档部分不会重复.好的,让我们测试一下,但首先介绍一个方法,该方法将进入 HashMap
并查看值:
So capacity is 13
(internally it is 16
- next power of two), this way we guarantee that documentation part about no rehashes. Ok let's test this, but first introduce a method that will go into a HashMap
and look at the values:
private static <K, V> void debugResize(Map<K, V> map, K key, V value) throws Throwable {
Field table = map.getClass().getDeclaredField("table");
table.setAccessible(true);
Object[] nodes = ((Object[]) table.get(map));
// first put
if (nodes == null) {
// not incrementing currentResizeCalls because
// of lazy init; or the first call to resize is NOT actually a "resize"
map.put(key, value);
return;
}
int previous = nodes.length;
map.put(key, value);
int current = ((Object[]) table.get(map)).length;
if (previous != current) {
++HashMapResize.currentResizeCalls;
System.out.println(nodes.length + " " + current);
}
}
现在让我们测试一下:
static int currentResizeCalls = 0;
public static void main(String[] args) throws Throwable {
List<String> list = List.of("DFHXR", "YSXFJ", "TUDDY",
"AXVUH", "RUTWZ", "DEDUC", "WFCVW", "ZETCU", "GCVUR");
int maxNumberOfEntries = list.size(); // 9
double loadFactor = 0.75;
int capacity = (int) (maxNumberOfEntries / loadFactor + 1);
Map<String, String> map = new HashMap<>(capacity);
list.forEach(x -> {
try {
HashMapResize.debugResize(map, x, x);
} catch (Throwable throwable) {
throwable.printStackTrace();
}
});
System.out.println(HashMapResize.currentResizeCalls);
}
好吧,resize
被调用,因此条目被重新散列,而不是文档所说的.
Well, resize
was called and thus entries where rehashed, not what the documentation says.
如上所述,密钥不是随机选择的.这些设置是为了在存储桶转换为树时触发 static final int TREEIFY_THRESHOLD = 8;
属性.不是真的,因为我们还需要点击 MIN_TREEIFY_CAPACITY = 64
才能出现树;直到 resize
发生,或者桶的大小增加了一倍;因此会发生条目的重新散列.
As said, the keys were not chosen randomly. These were set-up so that they would trigger the static final int TREEIFY_THRESHOLD = 8;
property - when a bucket is converted to a tree. Well not really, since we need to hit also MIN_TREEIFY_CAPACITY = 64
for the tree to appear; until than resize
happens, or a bucket is doubled in size; thus rehashing of entries happens.
我只能提示为什么 HashMap
文档在那句话中是错误的,因为 在 java-8 之前,桶没有被转换为树;因此该属性将成立,从 java-8 开始,这不再是真的.由于我不确定这一点,因此我不会将其添加为答案.
I can only hint to why HashMap
documentation is wrong in that sentence, since before java-8, a bucket was not converted to a Tree; thus the property would hold, from java-8 and onwards that is not true anymore. Since I am not sure about this, I'm not adding this as an answer.
推荐答案
文档中的那一行,
如果初始容量大于最大条目数除以负载因子,则不会发生重新散列操作.
If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
确实可以追溯到在 JDK 8 中添加树箱实现之前(JEP 180).您可以在 JDK 1.6 HashMap 文档中查看此文本.事实上,这篇文章可以追溯到 JDK 1.2,当时引入了 Collections Framework(包括 HashMap).您可以在 Web 上找到 JDK 1.2 文档的非官方版本,也可以从 存档 如果您想亲自查看.
indeed dates from before the tree-bin implementation was added in JDK 8 (JEP 180). You can see this text in the JDK 1.6 HashMap documentation. In fact, this text dates all the way back to JDK 1.2 when the Collections Framework (including HashMap) was introduced. You can find unofficial versions of the JDK 1.2 docs around the web, or you can download a version from the archives if you want to see for yourself.
我相信这个文档在添加树箱实现之前是正确的.但是,正如您所观察到的,现在有些情况下它是不正确的.该策略不仅是如果条目数除以负载因子超过容量(实际上是表长度),则会发生大小调整.正如您所指出的,如果单个存储桶中的条目数超过 TREEIFY_THRESHOLD(当前为 8)但表长度小于 MIN_TREEIFY_CAPACITY(当前为 64),则可能也发生调整大小.
I believe this documentation was correct up until the tree-bin implementation was added. However, as you've observed, there are now cases where it's incorrect. The policy is not only that resizing can occur if the number of entries divided by the load factor exceeds the capacity (really, table length). As you noted, resizes can also occur if the number of entries in a single bucket exceeds TREEIFY_THRESHOLD (currently 8) but the table length is smaller than MIN_TREEIFY_CAPACITY (currently 64).
您可以在 treeifyBin()
HashMap 方法.
You can see this decision in the treeifyBin()
method of HashMap.
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();
else if ((e = tab[index = (n - 1) & hash]) != null) {
当单个存储桶中的条目数超过 TREEIFY_THRESHOLD 时,就会到达代码中的这一点.如果表大小等于或大于 MIN_TREEIFY_CAPACITY,则此 bin 被树化;否则,表格只是调整大小.
This point in the code is reached when there are more than TREEIFY_THRESHOLD entries in a single bucket. If the table size is at or above MIN_TREEIFY_CAPACITY, this bin is treeified; otherwise, the table is simply resized.
请注意,在较小的表大小下,这可能会留下比 TREEIFY_THRESHOLD 更多的条目.这并不难证明.首先,一些反射HashMap-dumping代码:
Note that this can leave bins with rather more entries than TREEIFY_THRESHOLD at small table sizes. This isn't terribly difficult to demonstrate. First, some reflective HashMap-dumping code:
// run with --add-opens java.base/java.util=ALL-UNNAMED
static Class<?> classNode;
static Class<?> classTreeNode;
static Field fieldNodeNext;
static Field fieldHashMapTable;
static void init() throws ReflectiveOperationException {
classNode = Class.forName("java.util.HashMap$Node");
classTreeNode = Class.forName("java.util.HashMap$TreeNode");
fieldNodeNext = classNode.getDeclaredField("next");
fieldNodeNext.setAccessible(true);
fieldHashMapTable = HashMap.class.getDeclaredField("table");
fieldHashMapTable.setAccessible(true);
}
static void dumpMap(HashMap<?, ?> map) throws ReflectiveOperationException {
Object[] table = (Object[])fieldHashMapTable.get(map);
System.out.printf("map size = %d, table length = %d%n", map.size(), table.length);
for (int i = 0; i < table.length; i++) {
Object node = table[i];
if (node == null)
continue;
System.out.printf("table[%d] = %s", i,
classTreeNode.isInstance(node) ? "TreeNode" : "BasicNode");
for (; node != null; node = fieldNodeNext.get(node))
System.out.print(" " + node);
System.out.println();
}
}
现在,让我们添加一堆字符串,它们都落入同一个桶中.选择这些字符串时,它们的哈希值(由 HashMap 计算)均为 0 mod 64.
Now, let's add a bunch of strings that all fall into the same bucket. These strings are chosen such that their hash values, as computed by HashMap, are all 0 mod 64.
public static void main(String[] args) throws ReflectiveOperationException {
init();
List<String> list = List.of(
"LBCDD", "IKBNU", "WZQAG", "MKEAZ", "BBCHF", "KRQHE", "ZZMWH", "FHLVH",
"ZFLXM", "TXXPE", "NSJDQ", "BXDMJ", "OFBCR", "WVSIG", "HQDXY");
HashMap<String, String> map = new HashMap<>(1, 10.0f);
for (String s : list) {
System.out.println("===> put " + s);
map.put(s, s);
dumpMap(map);
}
}
从 1 的初始表大小和荒谬的负载因子开始,这会将 8 个条目放入单独的存储桶中.然后,每次添加另一个条目时,表都会调整大小(加倍),但所有条目最终都在同一个存储桶中.这最终会产生一个大小为 64 的表,其中一个存储桶具有长度为 14 的线性节点链(基本节点"),然后添加下一个条目最终将其转换为树.
Starting from an initial table size of 1 and a ridiculous load factor, this puts 8 entries into the lone bucket. Then, each time another entry is added, the table is resized (doubled) but all the entries end up in the same bucket. This eventually results in a table of size 64 with one bucket having a linear chain of nodes ("basic nodes") of length 14, before adding the next entry finally converts this to a tree.
程序的输出如下:
===> put LBCDD
map size = 1, table length = 1
table[0] = BasicNode LBCDD=LBCDD
===> put IKBNU
map size = 2, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU
===> put WZQAG
map size = 3, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG
===> put MKEAZ
map size = 4, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ
===> put BBCHF
map size = 5, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF
===> put KRQHE
map size = 6, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE
===> put ZZMWH
map size = 7, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH
===> put FHLVH
map size = 8, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH
===> put ZFLXM
map size = 9, table length = 2
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM
===> put TXXPE
map size = 10, table length = 4
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE
===> put NSJDQ
map size = 11, table length = 8
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ
===> put BXDMJ
map size = 12, table length = 16
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ BXDMJ=BXDMJ
===> put OFBCR
map size = 13, table length = 32
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ BXDMJ=BXDMJ OFBCR=OFBCR
===> put WVSIG
map size = 14, table length = 64
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ BXDMJ=BXDMJ OFBCR=OFBCR WVSIG=WVSIG
===> put HQDXY
map size = 15, table length = 64
table[0] = TreeNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ BXDMJ=BXDMJ OFBCR=OFBCR WVSIG=WVSIG HQDXY=HQDXY
这篇关于HashMap rehash/resize容量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!