为什么“10000000000000000 在范围内(1000000000000001)"Python 3 这么快?

Why is quot;1000000000000000 in range(1000000000000001)quot; so fast in Python 3?(为什么“10000000000000000 在范围内(1000000000000001)Python 3 这么快?)
本文介绍了为什么“10000000000000000 在范围内(1000000000000001)"Python 3 这么快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

据我了解 range() 函数,其实就是 Python 3 中的一种对象类型,动态生成其内容,类似于生成器.

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

在这种情况下,我预计以下行会花费过多的时间,因为为了确定 1 万亿是否在范围内,必须生成 1 万亿值:

This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

1_000_000_000_000_000 in range(1_000_000_000_000_001)

此外:似乎无论我添加多少个零,计算或多或少都需要相同的时间(基本上是瞬时的).

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

我也尝试过这样的事情,但计算仍然几乎是即时的:

I have also tried things like this, but the calculation is still almost instant:

# count by tens
1_000_000_000_000_000_000_000 in range(0,1_000_000_000_000_000_000_001,10)

如果我尝试实现自己的范围函数,结果就不那么好了!

If I try to implement my own range function, the result is not so nice!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

range() 对象在后台做了什么让它如此之快?

What is the range() object doing under the hood that makes it so fast?

Martijn Pieters 的答案 因其完整性而被选中,但另请参阅 abarnert's first answer 很好地讨论了 range 在 Python 3 中成为一个成熟的 sequence 意味着什么,以及关于 Python 实现中 __contains__ 函数优化的潜在不一致的一些信息/警告.abarnert 的其他答案 更详细地介绍了一些细节,并为那些对 Python 3 优化背后的历史感兴趣的人提供了链接(并且缺乏xrange 在 Python 2 中的优化).回答 通过戳 和 通过 wim 提供相关的C源代码和解释给有兴趣的人.

Martijn Pieters's answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert's other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.

推荐答案

Python 3 range() 对象不会立即产生数字;它是一个智能的序列对象,它产生数字按需.它只包含您的开始、停止和步长值,然后当您迭代对象时,每次迭代都会计算下一个整数.

The Python 3 range() object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

该对象还实现了 object.__contains__ 钩子,如果您的号码在其范围内,则计算.计算是(接近)恒定时间操作*.永远不需要扫描范围内所有可能的整数.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

来自 range() 对象文档:

From the range() object documentation:

range 类型优于常规 listtuple 类型的优点是范围对象将始终采用相同(小)量内存,无论它代表的范围大小(因为它只存储 startstopstep 值,计算单个项目和根据需要添加子范围).

The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

因此,您的 range() 对象至少可以:

So at a minimum, your range() object would do:

class my_range:
    def __init__(self, start, stop=None, step=1, /):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('my_range object index out of range')

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

这仍然缺少一些真正的 range() 支持的东西(例如 .index().count()方法、散列、相等性测试或切片),但应该给你一个想法.

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.

我还简化了 __contains__ 实现,只关注整数测试;如果你给一个真正的 range() 对象一个非整数值(包括 int 的子类),就会启动一个慢速扫描来查看是否有匹配,就像如果您对所有包含值的列表使用包含测试.这样做是为了继续支持其他恰好支持整数相等测试但预计也不支持整数算术的数字类型.请参阅实现遏制测试的原始 Python 问题.

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.

* 接近恒定时间,因为 Python 整数是无界的,因此数学运算也会随着 N 的增长而增长,这使得它成为 O(log N) 运算.由于这一切都是在优化的 C 代码中执行的,并且 Python 将整数值存储在 30 位块中,因此由于此处涉及的整数的大小,您会在看到任何性能影响之前耗尽内存.

* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.

这篇关于为什么“10000000000000000 在范围内(1000000000000001)"Python 3 这么快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Leetcode 234: Palindrome LinkedList(Leetcode 234:回文链接列表)
How do I read an Excel file directly from Dropbox#39;s API using pandas.read_excel()?(如何使用PANDAS.READ_EXCEL()直接从Dropbox的API读取Excel文件?)
subprocess.Popen tries to write to nonexistent pipe(子进程。打开尝试写入不存在的管道)
I want to realize Popen-code from Windows to Linux:(我想实现从Windows到Linux的POpen-code:)
Reading stdout from a subprocess in real time(实时读取子进程中的标准输出)
How to call type safely on a random file in Python?(如何在Python中安全地调用随机文件上的类型?)