Problem description
I have a Django web application that uses the default auto-incremented positive integers as the primary key. This key is used throughout the application and is frequently inserted into URLs. I don't want to expose this number to the public, because it would let them guess the number of users or other entities in my database.
This is a frequent requirement, and I have seen similar questions with answers. Most solutions recommend hashing the original primary key value. However, none of those answers fit my needs exactly. These are my requirements:
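- I want to keep the primary key field type as an integer.
- I would also prefer not to hash/unhash this value every time it is read, written, or compared against the database. That seems wasteful. It would be nice to do it just once: when the record is initially inserted into the database.
- The hashing/encryption function need not be reversible, since I don't need to recover the original sequential key. The hashed value just needs to be unique.
- The hashed value needs to be unique only for that table, not universally unique.
- The hashed value should be as short as possible. I would like to avoid extremely long (20+ characters) URLs.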
What is the best way to achieve this? Would the following work?
import logging

from django.db import models
from django.db.models.signals import post_save

logger = logging.getLogger(__name__)

def hash_function(int):
    return fancy-hash-function  # What function should I use??

def obfuscate_pk(sender, instance, created, **kwargs):
    if created:
        logger.info("MyClass #%s, created with created=%s: %s" % (instance.pk, created, instance))
        instance.pk = hash_function(instance.pk)
        instance.save()
        logger.info("\tNew Pk=%s" % instance.pk)

class MyClass(models.Model):
    blahblah = models.CharField(max_length=50, null=False, blank=False,)

post_save.connect(obfuscate_pk, sender=MyClass)
Recommended answer
The Idea
I would recommend the same approach that Instagram uses. Their requirements seem to closely match yours.
- Generated IDs should be sortable by time (so a list of photo IDs, for example, could be sorted without fetching more information about the photos)
- IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis)
- The system should introduce as few new 'moving parts' as possible; a large part of how we've been able to scale Instagram with very few engineers is by choosing simple, easy-to-understand solutions that we trust.
They came up with a scheme that uses 41 bits for a timestamp, 13 bits for the database shard, and 10 bits for an auto-increment portion. Since you don't appear to be using shards, you can just use 41 bits for a time-based component and 23 bits chosen at random. That leaves a 1 in 8.3 million (1 in 2^23) chance of a conflict between two records inserted in the same millisecond, so in practice you are unlikely ever to hit this. Right, so how about some code:
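import random
import time

# Example value only: any fixed Unix timestamp in milliseconds will do.
START_TIME = 1400000000000

def make_id():
    '''
    inspired by http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
    '''
    t = int(time.time() * 1000) - START_TIME  # milliseconds elapsed since START_TIME
    u = random.SystemRandom().getrandbits(23)  # 23 random low-order bits
    id = (t << 23) | u
    return id

def reverse_id(id):
    # Drop the random bits to recover the creation time (ms since the Unix epoch).
    t = id >> 23
    return t + START_TIME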
Note that START_TIME in the above code is an arbitrary starting time, expressed in milliseconds. You can evaluate int(time.time()*1000) once and hard-code the result as START_TIME.
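To put a number on the collision risk, here is a minimal sketch (not part of the original answer) that estimates the chance that at least two of k IDs generated in the same millisecond share the same 23 random bits. The names RANDOM_BITS and collision_probability are made up for this illustration.

```python
RANDOM_BITS = 23  # width of the random component used by make_id()

def collision_probability(k, bits=RANDOM_BITS):
    """Chance that at least two of k IDs created in the same
    millisecond end up with identical random bits."""
    space = 1 << bits  # number of distinct random values
    p_all_unique = 1.0
    for i in range(k):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique

print(1 << RANDOM_BITS)           # size of the random space
print(collision_probability(2))   # two inserts in the same millisecond
print(collision_probability(100)) # a burst of 100 inserts in one millisecond
```

For two inserts in the same millisecond this works out to 1 in 2^23, the "1 in 8.3 million" figure quoted above. The risk grows with the number of inserts that land in one millisecond, so a very write-heavy table may want to catch the duplicate-key error and retry.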
Notice that the reverse_id method I have posted lets you find out when a record was created. If you need to keep track of that information, you can do so without adding another field for it! So your primary key actually saves storage rather than increasing it!
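As an illustration of reading the creation time back out, the sketch below converts an ID into a timezone-aware datetime. The helper name created_at and the START_TIME value are assumptions for this example; use whatever constant your make_id was deployed with.

```python
import datetime

START_TIME = 1400000000000  # assumed example value, ms since the Unix epoch
RANDOM_BITS = 23

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

def created_at(record_id):
    """Return the UTC creation time encoded in an ID built as
    (milliseconds_since_START_TIME << 23) | random_bits."""
    ms_since_epoch = (record_id >> RANDOM_BITS) + START_TIME
    return EPOCH + datetime.timedelta(milliseconds=ms_since_epoch)

# Hand-built ID: 12345 ms after START_TIME, random part 42.
example_id = (12345 << RANDOM_BITS) | 42
print(created_at(example_id))
```

Using a timedelta from the epoch keeps the conversion in integer milliseconds and avoids floating-point rounding.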
Now this is what your model would look like.
class MyClass(models.Model):
    id = models.BigIntegerField(default=fields.make_id, primary_key=True)
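One thing worth checking: Django's BigIntegerField is a signed 64-bit integer, so the largest storable value is 2^63 - 1. The rough calculation below (my own sketch, not from the original answer) shows how long after START_TIME the generated IDs keep fitting.

```python
RANDOM_BITS = 23
MAX_BIGINT = 2**63 - 1  # upper bound of a signed 64-bit integer column

# Largest millisecond offset whose shifted value still fits in the column.
max_offset_ms = MAX_BIGINT >> RANDOM_BITS

MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25
years_available = max_offset_ms / MS_PER_YEAR

print(max_offset_ms)
print(round(years_available, 1))
```

That works out to a little under 35 years from whatever START_TIME you choose, which is one reason to set START_TIME close to the launch date of the application rather than to 0.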
If you make changes to your database outside Django, you would need to create the equivalent of make_id as an SQL function.
As a footnote, this is somewhat like the approach MongoDB uses to generate its _id for each object.