Is ndarray faster than recarray access?

Question

I was able to copy my recarray data to an ndarray, do some calculations, and return the ndarray with updated values.
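The copy-to-ndarray approach described above can be sketched with `numpy.lib.recfunctions.structured_to_unstructured` (available since NumPy 1.16). This is a minimal sketch, assuming all fields share one numeric dtype; the field names `x`, `y`, `z` and the calculation are hypothetical, not the original poster's code. `copy=True` is passed so that edits to the plain array never alias the original record array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical record array with three float fields (names are illustrative only).
rec = np.rec.fromarrays(
    [np.arange(5, dtype=float), np.ones(5), np.full(5, 2.0)],
    names="x,y,z",
)

# Drop the recarray subclass, then copy the fields into a plain 2-D float
# ndarray with one column per field: shape (5, 3), dtype float64.
plain = rfn.structured_to_unstructured(rec.view(np.ndarray), copy=True)

# Calculations now run on the plain ndarray by column position
# (hypothetical example: overwrite column "z" with x + y).
plain[:, 2] = plain[:, 0] + plain[:, 1]
```

Because the data was copied, `rec.z` still holds its original values after the update.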

Then I discovered the append_fields() function in numpy.lib.recfunctions and thought it would be much smarter to simply append two fields to my original recarray to hold the calculated values.
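One detail worth knowing here: `append_fields` defaults to `usemask=True` (and `asrecarray=False`), so unless told otherwise it returns a `numpy.ma.MaskedArray`, not a recarray. A minimal sketch with a hypothetical two-field array shows the return type under each setting; the field names `a`, `b`, `total` are illustrative only.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical 3-row structured array; field names are illustrative only.
base = np.array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
                dtype=[("a", "f8"), ("b", "f8")])
total = base["a"] + base["b"]

# Default call: usemask=True, so the result is a numpy.ma.MaskedArray.
masked = rfn.append_fields(base, "total", total)

# usemask=False returns a plain structured ndarray instead;
# adding asrecarray=True returns a recarray.
plain = rfn.append_fields(base, "total", total, usemask=False)
rec = rfn.append_fields(base, "total", total, usemask=False, asrecarray=True)

print(type(masked).__name__, type(plain).__name__, type(rec).__name__)
```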

When I did this, I found the operation was much, much slower. I didn't need to time it: the ndarray-based process takes a few seconds, compared to more than a minute with the recarray, and my test arrays are small (<10,000 rows).

Is this typical? Is ndarray access really that much faster than recarray access? I expected some performance degradation from accessing values by field name, but not this much.

Answer

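Update, November 15, 2018
I extended my timing tests to clarify the performance differences among ndarray, structured array, recarray, and masked array (a type of record array?). Each has subtle differences. See the discussion here:
numpy-discussion: structured-arrays-recarrays-and-record-arrays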
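To make the four array types concrete, here is a minimal sketch that builds each one from the same hypothetical data (fields `x`, `y`, `z` are illustrative only) and reads the same value through each interface. The recarray is a view of the structured array's memory; the masked array wraps the structured data together with a per-field boolean mask.

```python
import numpy as np

# Hypothetical data: 4 rows, 3 float fields (names are illustrative only).
struct = np.zeros(4, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])
struct["x"] = np.arange(4)

# ndarray: plain 2-D float array, values addressed by column position.
plain = np.column_stack([struct[name] for name in struct.dtype.names])

# structured array: `struct` itself, an ndarray whose dtype has named fields.

# recarray: a view of the same memory that adds attribute access (rec.x).
rec = struct.view(np.recarray)

# masked array: the structured data wrapped with a per-field boolean mask.
masked = np.ma.array(struct)

# The same value, reached through each interface.
values = (plain[2, 0], struct["x"][2], rec.x[2], masked["x"][2])
```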

Here are the results of my performance tests. I built a very simple example (using one of my HDF5 data sets) to compare the performance of the same data stored in 4 types of arrays: ndarray, structured array, recarray, and masked array. After the arrays are constructed, each is passed to a function that simply loops through every row and extracts 12 values from it. Each function is called once from timeit (number=1). The test measures only the array read function and excludes all other calculations.
Results for 9,000 rows (elapsed seconds, as returned by timeit):

for ndarray: 0.034137165047070615
for structured array: 0.1306827116913577
for recarray: 0.446010040784266
for masked array: 31.33269560998199
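The original benchmark code isn't shown, so the following is only a hedged reconstruction of the test as described: random data instead of the HDF5 data set, 12 hypothetical fields named `f0`..`f11`, and one timed pass (`number=1`) of a row-by-row read for each type. The plain ndarray is read by column position; the other three are read by field name.

```python
import timeit
from functools import partial

import numpy as np

N_FIELDS = 12  # the answer extracts 12 values per row
NAMES = [f"f{i}" for i in range(N_FIELDS)]  # hypothetical field names


def build_arrays(n_rows):
    """Build the same random data as the four array types compared above."""
    plain = np.random.default_rng(0).random((n_rows, N_FIELDS))
    struct = np.zeros(n_rows, dtype=[(name, "f8") for name in NAMES])
    for i, name in enumerate(NAMES):
        struct[name] = plain[:, i]
    return {
        "ndarray": plain,
        "structured array": struct,
        "recarray": struct.view(np.recarray),
        "masked array": np.ma.array(struct),
    }


def read_rows(arr, by_name):
    """Loop over every row, read all 12 values, and return how many were read."""
    keys = NAMES if by_name else range(N_FIELDS)
    count = 0
    for row in arr:
        for key in keys:
            _ = row[key]
            count += 1
    return count


def time_all(n_rows=9_000):
    """Time one pass (number=1) of read_rows over each array type, in seconds."""
    results = {}
    for label, arr in build_arrays(n_rows).items():
        by_name = label != "ndarray"  # the plain ndarray has positions, not names
        results[label] = timeit.timeit(partial(read_rows, arr, by_name), number=1)
    return results
```

Calling `for label, seconds in time_all().items(): print(f"for {label}: {seconds}")` prints one line per type in the same form as the results above; absolute times will vary with hardware and NumPy version.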

Based on this test, access performance degrades from one type to the next in the order listed. Structured array and recarray access are about 4x and 13x slower than ndarray access, respectively (but all take only a fraction of a second). Masked array access, however, is nearly 1000x slower than ndarray access (about 920x in these numbers). That explains the seconds-versus-minutes difference I see in my complete example. Hopefully this data is useful to others who run into this issue.
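The test above deliberately measures row-by-row reads. When a calculation can be written per column, one alternative (not part of the original test) is to look up each field once and operate on whole columns, which keeps field-name lookups out of the inner loop. A minimal sketch, assuming hypothetical fields `x` and `y`, checks that both styles give the same numbers for the structured, recarray, and masked versions:

```python
import numpy as np

# Hypothetical structured data (field names are illustrative only).
data = np.zeros(1_000, dtype=[("x", "f8"), ("y", "f8")])
data["x"] = np.arange(1_000)
data["y"] = 2.0


def row_loop(arr):
    """Per-row access: one field lookup per row per field."""
    return np.array([row["x"] * row["y"] for row in arr])


def column_wise(arr):
    """Whole-column access: one lookup per field; arithmetic runs on full columns."""
    return np.ma.getdata(arr["x"] * arr["y"])


# Both styles agree for the structured, recarray, and masked versions.
for arr in (data, data.view(np.recarray), np.ma.array(data)):
    assert np.allclose(row_loop(arr), column_wise(arr))
```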
