Set vs. frozenset performance


Question

I was tinkering around with Python's set and frozenset collection types.

Initially, I assumed that frozenset would provide better lookup performance than set, as it is immutable and thus could exploit the structure of the stored items.

However, the following experiment suggests that this is not the case:

import random
import time
import sys

def main(n):
    numbers = []
    for _ in xrange(n):
        numbers.append(random.randint(0, sys.maxint))
    set_ = set(numbers)
    frozenset_ = frozenset(set_)

    start = time.time()
    for number in numbers:
        number in set_
    set_duration = time.time() - start

    start = time.time()
    for number in numbers:
        number in frozenset_
    frozenset_duration = time.time() - start

    print "set      : %.3f" % set_duration
    print "frozenset: %.3f" % frozenset_duration


if __name__ == "__main__":
    n = int(sys.argv[1])
    main(n)

I executed this code using both CPython and PyPy, which gave the following results:

> pypy set.py 100000000
set      : 6.156
frozenset: 6.166

> python set.py 100000000
set      : 16.824
frozenset: 17.248

It seems that frozenset lookups are actually slightly slower, both in CPython and in PyPy. Does anybody have an idea why this is the case? I did not look into the implementations.

Answer

The frozenset and set implementations are largely shared; a set is simply a frozenset with mutating methods added, with the exact same hashtable implementation. See the Objects/setobject.c source file; the top-level PyFrozenSet_Type definition shares functions with the PySet_Type definition.
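The shared design is also visible from Python itself if you compare the attributes of the two types. This is only an illustrative sketch (the name `mutating_only` is mine), and it shows the method lists, not the C internals:

```python
# Attributes that set has and frozenset lacks: these are the mutating methods.
mutating_only = sorted(set(dir(set)) - set(dir(frozenset)))
print(mutating_only)

# A frozenset can be hashed because its contents cannot change.
equal_hashes = hash(frozenset([1, 2])) == hash(frozenset([2, 1]))
print("equal frozensets hash equally:", equal_hashes)

# A plain set refuses to be hashed.
try:
    hash(set([1, 2]))
    set_is_hashable = True
except TypeError as exc:
    set_is_hashable = False
    print("set is unhashable:", exc)
```

The names that show up in `mutating_only` are things like `add`, `remove` and `update`; the read-only operations, including the `in` test, exist on both types.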

There is no optimisation for frozenset here, because a membership test never needs to calculate the hashes of the items already stored in the set. The item that you test against the set still needs to have its own hash calculated, in order to find the right slot in the set hashtable so that an equality test can be done.
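A small experiment can make this concrete. The sketch below uses a hypothetical `CountingKey` class and `hashes_per_lookup` helper (neither is from the question) to count how often the probe item is hashed during a single `in` test. I would expect the count to be the same for both container types:

```python
class CountingKey:
    """Hypothetical probe object that counts how often it is hashed."""
    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


def hashes_per_lookup(container, probe):
    """Return how many times `probe` is hashed by one membership test."""
    before = CountingKey.hash_calls
    _ = probe in container
    return CountingKey.hash_calls - before


items = [CountingKey(i) for i in range(1000)]
probe = CountingKey(500)

# The containers are built before the counter is read, so hashing done
# during construction is not included in the result.
set_calls = hashes_per_lookup(set(items), probe)
frozenset_calls = hashes_per_lookup(frozenset(items), probe)
print("set:", set_calls, "frozenset:", frozenset_calls)
```

Only the probe is hashed during the lookup, and that work is identical whether the container is a set or a frozenset.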

As such, your timing results are probably off due to other processes running on your system; you measured wall-clock time, and did not disable Python garbage collection nor did you repeatedly test the same thing.

Try to run your test using the timeit module, with one value from numbers and one not in the set:

import random
import sys
import timeit

numbers = [random.randrange(sys.maxsize) for _ in range(10000)]
set_ = set(numbers)
fset = frozenset(numbers)
present = random.choice(numbers)
notpresent = -1
test = 'present in s; notpresent in s'

settime = timeit.timeit(
    test,
    'from __main__ import set_ as s, present, notpresent')
fsettime = timeit.timeit(
    test,
    'from __main__ import fset as s, present, notpresent')

print('set      : {:.3f} seconds'.format(settime))
print('frozenset: {:.3f} seconds'.format(fsettime))

This repeats each test 1 million times and produces:

set      : 0.050 seconds
frozenset: 0.050 seconds
