Problem description
I am experimenting with loading data bigger than the memory size in h2o.
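The H2o blog mentions: A note on Bigger Data and GC: We do a user-mode swap-to-disk when the Java heap gets too full, i.e., you're using more Big Data than physical DRAM. We won't die with a GC death-spiral, but we will degrade to out-of-core speeds. We'll go as fast as the disk will allow. I've personally tested loading a 12Gb dataset into a 2Gb (32bit) JVM; it took about 5 minutes to load the data, and another 5 minutes to run a Logistic Regression.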
Here is the R code to connect to h2o 3.6.0.8:
h2o.init(max_mem_size = '60m') # allotting 60 MB to h2o; R is running on a machine with 8 GB RAM
which gives
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
.Successfully connected to http://127.0.0.1:54321/
R is connected to the H2O cluster:
H2O cluster uptime: 2 seconds 561 milliseconds
H2O cluster version: 3.6.0.8
H2O cluster name: H2O_started_from_R_RILITS-HWLTP_tkn816
H2O cluster total nodes: 1
H2O cluster total memory: 0.06 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 2
H2O cluster healthy: TRUE
Note: As started, H2O is limited to the CRAN default of 2 CPUs.
Shut down and restart H2O as shown below to use all your CPUs.
> h2o.shutdown()
> h2o.init(nthreads = -1)
IP Address: 127.0.0.1
Port : 54321
Session ID: _sid_b2e0af0f0c62cd64a8fcdee65b244d75
Key Count : 3
I tried to load a 169 MB csv into h2o.
dat.hex <- h2o.importFile('dat.csv')
This throws an error,
Error in .h2o.__checkConnectionHealth() :
H2O connection has been severed. Cannot connect to instance at http://127.0.0.1:54321/
Failed to connect to 127.0.0.1 port 54321: Connection refused
which indicates an out-of-memory error.
Question: If H2o promises loading a data set larger than its memory capacity(swap to disk mechanism as the blog quote above says), is this the correct way to load the data?
Recommended answer
Swap-to-disk was disabled by default a while ago, because performance was so bad. The bleeding-edge (not latest stable) has a flag to enable it: "--cleaner" (for "memory cleaner").
Note that your cluster has an EXTREMELY tiny memory:
H2O cluster total memory: 0.06 GB
That's 60MB! Barely enough to start a JVM with, much less run H2O. I would be surprised if H2O could come up properly there at all, never mind the swap-to-disk. Swapping is limited to swapping the data alone. If you're trying to do a swap-test, up your JVM to 1 or 2 Gigs ram, and then load datasets that sum more than that.
Cliff
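A minimal sketch of the retry the answer suggests, assuming the same `dat.csv` from the question and an illustrative 2 GB heap (the answer only says "1 or 2 Gigs"). It just gives the JVM enough room to start H2O and load the 169 MB file. It does not turn on swap-to-disk, which per the answer is disabled by default.

```r
library(h2o)

# Stop the undersized 60 MB instance first, if it is still running:
# h2o.shutdown(prompt = FALSE)

# Start H2O with a larger heap. "2g" is an assumed value within the
# 1-2 GB range the answer suggests; nthreads = -1 uses all cores, as the
# startup note in the question's output recommends.
h2o.init(max_mem_size = "2g", nthreads = -1)

# Import the same file as in the question and check its dimensions.
dat.hex <- h2o.importFile("dat.csv")
dim(dat.hex)
```

For a swap test of the kind the answer describes, keep loading datasets until their combined size is larger than the heap set in `max_mem_size`.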