问题描述
我想知道 java HashMap 与 ArrayList 相比的内存开销是多少?
I am wondering what is the memory overhead of java HashMap compared to ArrayList?
更新:
我想提高搜索一大包(6 百万以上)相同对象的特定值的速度.
I would like to improve the speed for searching for specific values of a big pack (6 Millions+) of identical objects.
因此,我正在考虑使用一个或多个 HashMap 而不是使用 ArrayList.但我想知道 HashMap 的开销是多少.
Thus, I am thinking about using one or several HashMap instead of using ArrayList. But I am wondering what is the overhead of HashMap.
据我了解,key没有被存储,只有key的hash,所以应该是对象的hash大小+一个指针.
As far as i understand, the key is not stored, only the hash of the key, so it should be something like size of the hash of the object + one pointer.
但是使用什么哈希函数呢?是Object 提供的吗? 还是另一个?
But what hash function is used? Is it the one offered by Object or another one?
推荐答案
如果您将 HashMap 与 ArrayList 进行比较,我认为您正在对 ArrayList 进行某种搜索/索引,例如二进制搜索或自定义哈希表...?因为 .get(key) 到 600 万个条目使用线性搜索是不可行的.
If you're comparing HashMap with ArrayList, I presume you're doing some sort of searching/indexing of the ArrayList, such as binary search or custom hash table...? Because a .get(key) thru 6 million entries would be infeasible using a linear search.
使用该假设,我进行了一些实证测试并得出结论:如果您将 ArrayList 与二进制搜索或自定义哈希映射实现一起使用,您可以在相同数量的 RAM 中存储 2.5 倍的小对象,与 HashMap 相比".我的测试是基于只包含3个字段的小对象,其中一个是key,key是一个整数.我使用的是 32 位 jdk 1.6.有关2.5"图的注意事项,请参见下文.
Using that assumption, I've done some empirical tests and come up with the conclusion that "You can store 2.5 times as many small objects in the same amount of RAM if you use ArrayList with binary search or custom hash map implementation, versus HashMap". My test was based on small objects containing only 3 fields, of which one is the key, and the key is an integer. I used a 32bit jdk 1.6. See below for caveats on this figure of "2.5".
需要注意的关键点是:
(a) 杀死你的不是引用或负载因子"所需的空间,而是对象创建所需的开销.如果键是原始类型,或者是 2 个或更多原始值或引用值的组合,则每个键都需要自己的对象,这会带来 8 个字节的开销.
(a) it's not the space required for references or "load factor" that kills you, but rather the overhead required for object creation. If the key is a primitive type, or a combination of 2 or more primitive or reference values, then each key will require its own object, which carries an overhead of 8 bytes.
(b) 根据我的经验,您通常需要将键作为值的一部分(例如,为了存储由客户 ID 索引的客户记录,您仍然希望客户 ID 作为客户对象的一部分).这意味着 HashMap 单独存储对键和值的引用有点浪费.
(b) In my experience you usually need the key as part of the value, (e.g. to store customer records, indexed by customer id, you still want the customer id as part of the Customer object). This means it is IMO somewhat wasteful that a HashMap separately stores references to keys and values.
注意事项:
HashMap 键最常用的类型是字符串.对象创建开销不适用于此处,因此差异会更小.
The most common type used for HashMap keys is String. The object creation overhead doesn't apply here so the difference would be less.
我得到一个 2.8 的数字,即插入 ArrayList 的 8880502 个条目与 -Xmx256M JVM 上的 HashMap 的 3148004 个条目相比,但我的 ArrayList 负载因子是 80%,我的对象非常小 - 12 个字节加 8字节对象开销.
I got a figure of 2.8, being 8880502 entries inserted into the ArrayList compared with 3148004 into the HashMap on -Xmx256M JVM, but my ArrayList load factor was 80% and my objects were quite small - 12 bytes plus 8 byte object overhead.
我的图和我的实现要求键包含在值中,否则我会在对象创建开销方面遇到同样的问题,这只是 HashMap 的另一种实现.
My figure, and my implementation, requires that the key is contained within the value, otherwise I'd have the same problem with object creation overhead and it would be just another implementation of HashMap.
我的代码:
public class Payload {
int key,b,c;
Payload(int _key) { key = _key; }
}
import org.junit.Test;
import java.util.HashMap;
import java.util.Map;
public class Overhead {
@Test
public void useHashMap()
{
int i=0;
try {
Map<Integer, Payload> map = new HashMap<Integer, Payload>();
for (i=0; i < 4000000; i++) {
int key = (int)(Math.random() * Integer.MAX_VALUE);
map.put(key, new Payload(key));
}
}
catch (OutOfMemoryError e) {
System.out.println("Got up to: " + i);
}
}
@Test
public void useArrayList()
{
int i=0;
try {
ArrayListMap map = new ArrayListMap();
for (i=0; i < 9000000; i++) {
int key = (int)(Math.random() * Integer.MAX_VALUE);
map.put(key, new Payload(key));
}
}
catch (OutOfMemoryError e) {
System.out.println("Got up to: " + i);
}
}
}
import java.util.ArrayList;
public class ArrayListMap {
private ArrayList<Payload> map = new ArrayList<Payload>();
private int[] primes = new int[128];
static boolean isPrime(int n)
{
for (int i=(int)Math.sqrt(n); i >= 2; i--) {
if (n % i == 0)
return false;
}
return true;
}
ArrayListMap()
{
for (int i=0; i < 11000000; i++) // this is clumsy, I admit
map.add(null);
int n=31;
for (int i=0; i < 128; i++) {
while (! isPrime(n))
n+=2;
primes[i] = n;
n += 2;
}
System.out.println("Capacity = " + map.size());
}
public void put(int key, Payload value)
{
int hash = key % map.size();
int hash2 = primes[key % primes.length];
if (hash < 0)
hash += map.size();
do {
if (map.get(hash) == null) {
map.set(hash, value);
return;
}
hash += hash2;
if (hash >= map.size())
hash -= map.size();
} while (true);
}
public Payload get(int key)
{
int hash = key % map.size();
int hash2 = primes[key % primes.length];
if (hash < 0)
hash += map.size();
do {
Payload payload = map.get(hash);
if (payload == null)
return null;
if (payload.key == key)
return payload;
hash += hash2;
if (hash >= map.size())
hash -= map.size();
} while (true);
}
}
这篇关于Java HashMap 与 ArrayList 相比的内存开销的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!