Problem description
I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short documents, with ID numbers as keys and their text content as values.
To perform an 'AND' search for two terms, I thus need to intersect their postings lists (dictionaries). What is a clear (not necessarily overly clever) way to do this in Python? I started out by trying it the long way with iter:
p1 = index[term1]
p2 = index[term2]
i1 = iter(p1)
i2 = iter(p2)
while ... # not sure of the 'iter != end' syntax in this case
...
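For completeness, here is one way the iterator approach from the question could be finished: a merge-style walk over the sorted doc IDs of both postings. This is a sketch, not the answer's method; the function name intersect_postings and the sample postings are hypothetical, and it assumes all doc IDs are mutually comparable (e.g. all ints) so they can be sorted.

```python
def intersect_postings(p1, p2):
    """Return the doc IDs present in both postings dicts, in ascending order."""
    it1 = iter(sorted(p1))
    it2 = iter(sorted(p2))
    # next(iterator, default) returns default instead of raising StopIteration,
    # so comparing against this sentinel plays the role of "iter != end".
    end = object()
    d1 = next(it1, end)
    d2 = next(it2, end)
    common = []
    while d1 is not end and d2 is not end:
        if d1 == d2:
            # Same doc ID in both postings: keep it and advance both iterators.
            common.append(d1)
            d1 = next(it1, end)
            d2 = next(it2, end)
        elif d1 < d2:
            d1 = next(it1, end)  # advance whichever iterator is behind
        else:
            d2 = next(it2, end)
    return common


# Hypothetical postings: doc ID -> document text.
p1 = {1: "cats and dogs", 4: "cats purr", 9: "dogs chase cats"}
p2 = {1: "cats and dogs", 3: "dogs bark", 9: "dogs chase cats"}
print(intersect_postings(p1, p2))  # [1, 9]
```

The sort costs O(n log n), so for plain dictionaries the key-set intersection is simpler; the merge walk pays off mainly when postings lists are already stored in sorted order.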
Recommended answer
In general, to intersect two dictionaries in Python, you can first use the & operator to compute the intersection of their keys (in Python 3, the views returned by dict.keys() are set-like objects):
dict_a = {"a": 1, "b": 2}
dict_b = {"a": 2, "c": 3}
intersection = dict_a.keys() & dict_b.keys() # {'a'}
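The same idea extends to more than two dictionaries, for example an AND query over several terms. A minimal sketch with a hypothetical helper common_keys: set.intersection accepts any number of iterables, and iterating over a dictionary yields its keys.

```python
def common_keys(*dicts):
    """Return the set of keys present in every dictionary passed in."""
    if not dicts:
        return set()
    # Start from the first dict's keys and intersect with each remaining dict.
    return set(dicts[0]).intersection(*dicts[1:])


print(common_keys({"a": 1, "b": 2}, {"a": 2, "c": 3}, {"a": 0, "b": 5}))  # {'a'}
```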
On Python 2 you have to convert the dictionary keys to sets yourself (Python 2.7 also offers dict.viewkeys(), which returns a set-like view):
keys_a = set(dict_a.keys())
keys_b = set(dict_b.keys())
intersection = keys_a & keys_b
Given the intersection of the keys, you can then build the intersection of the values however you need. You have to make a choice here, because set intersection does not tell you what to do when the values associated with a shared key differ. (This is presumably why the & intersection operator is not defined directly for dictionaries in Python.)
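Two possible policies, shown as a sketch with made-up sample data. If you only want keys whose values agree in both dictionaries, you can intersect the items() views instead of the keys() views; this only works when the values are hashable. The second comprehension is another hypothetical policy that keeps both values side by side as a tuple.

```python
dict_a = {"a": 1, "b": 2, "c": 3}
dict_b = {"a": 1, "b": 20, "d": 4}

# items() views are set-like when the values are hashable; & keeps only
# (key, value) pairs that appear in both dictionaries.
agreeing = dict(dict_a.items() & dict_b.items())
print(agreeing)  # {'a': 1}

# Alternative policy: for every shared key, keep both values as a pair.
paired = {k: (dict_a[k], dict_b[k]) for k in dict_a.keys() & dict_b.keys()}
print(paired)
```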
In this case it sounds like your values for the same key would be equal, so you can just choose the value from one of the dictionaries:
dict_of_dicts_a = {"a": {"x":1}, "b": {"y":3}}
dict_of_dicts_b = {"a": {"x":1}, "c": {"z":4}}
shared_keys = dict_of_dicts_a.keys() & dict_of_dicts_b.keys()
# values equal so choose values from a:
dict_intersection = {k: dict_of_dicts_a[k] for k in shared_keys } # {"a":{"x":1}}
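Applied to the inverted index from the question, the AND search might look like the following sketch. It assumes index maps each term to a {doc_id: text} dictionary as described; the function name and_search and the toy index are hypothetical. A term missing from the index is treated as matching no documents.

```python
def and_search(index, term1, term2):
    """Return {doc_id: text} for documents whose IDs appear under both terms."""
    p1 = index.get(term1, {})
    p2 = index.get(term2, {})
    # A given doc ID maps to the same text in every postings dict,
    # so taking the value from p1 is enough.
    return {doc_id: p1[doc_id] for doc_id in p1.keys() & p2.keys()}


# Hypothetical toy index: term -> {doc_id: text}.
index = {
    "cats": {1: "cats and dogs", 4: "cats purr", 9: "dogs chase cats"},
    "dogs": {1: "cats and dogs", 3: "dogs bark", 9: "dogs chase cats"},
}
print(and_search(index, "cats", "dogs"))
print(and_search(index, "cats", "birds"))  # {}
```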
Other reasonable ways of combining values depend on the types of the values in your dictionaries and what they represent. For example, with dictionaries of dictionaries you might want the union of the inner dictionaries for each shared key. In Python 3.9 and later you can get that with the | operator, which merges two dictionaries; where a key appears in both, the value from the right-hand operand wins:
# union of values for each key in the intersection:
dict_intersection_2 = { k: dict_of_dicts_a[k] | dict_of_dicts_b[k] for k in shared_keys }
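The | operator on dictionaries is only available in Python 3.9 and later. On older Python 3 versions, a sketch of the same merge uses dictionary unpacking, which follows the same right-hand-wins rule. The sample data here adds a hypothetical extra key "w" to the inner dictionary of b so that the merge is visible in the output.

```python
dict_of_dicts_a = {"a": {"x": 1}, "b": {"y": 3}}
dict_of_dicts_b = {"a": {"x": 1, "w": 2}, "c": {"z": 4}}
shared_keys = dict_of_dicts_a.keys() & dict_of_dicts_b.keys()

# {**left, **right} builds the same merged dict as left | right (Python 3.5+):
# for a key present in both, the value from the right-hand dict wins.
dict_intersection_2 = {
    k: {**dict_of_dicts_a[k], **dict_of_dicts_b[k]} for k in shared_keys
}
print(dict_intersection_2)  # {'a': {'x': 1, 'w': 2}}
```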
In this case, because both dictionaries have identical values for key "a", the result is the same as before.