Intersecting two dictionaries

Question

I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short documents, with ID numbers as keys and their text content as values.

To perform an 'AND' search for two terms, I thus need to intersect their postings lists (dictionaries). What is a clear (not necessarily overly clever) way to do this in Python? I started out by trying it the long way with iter:

p1 = index[term1]  
p2 = index[term2]
i1 = iter(p1)
i2 = iter(p2)
while ...  # not sure of the 'iter != end' syntax in this case
...

Recommended answer

In general, to construct the intersection of dictionaries in Python, you can first use the & operator to calculate the intersection of sets of the dictionary keys (dictionary keys are set-like objects in Python 3):

dict_a = {"a": 1, "b": 2}
dict_b = {"a": 2, "c": 3} 

intersection = dict_a.keys() & dict_b.keys()  # {'a'}

On Python 2 you have to convert the dictionary keys to sets yourself:

keys_a = set(dict_a.keys())
keys_b = set(dict_b.keys())
intersection = keys_a & keys_b
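
As a minimal sketch of a version-independent variant: iterating over a dictionary yields its keys, so `set(d)` should build the key set on both Python 2 and Python 3 without calling `.keys()`. The dictionaries are the same illustrative ones as above.

```python
dict_a = {"a": 1, "b": 2}
dict_b = {"a": 2, "c": 3}

# set(d) iterates over the keys of d, so no .keys() call is needed
intersection = set(dict_a) & set(dict_b)

print(sorted(intersection))  # ['a']
```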

Given the intersection of the keys, you can then build the intersection of the values in whatever way suits your data. You have to make a choice here, since the concept of set intersection doesn't tell you what to do if the values associated with a key differ. (This is presumably why the & intersection operator is not defined directly for dictionaries in Python.)

In this case it sounds like your values for the same key would be equal, so you can just choose the value from one of the dictionaries:

dict_of_dicts_a = {"a": {"x":1}, "b": {"y":3}}
dict_of_dicts_b = {"a": {"x":1}, "c": {"z":4}} 

shared_keys = dict_of_dicts_a.keys() & dict_of_dicts_b.keys()

# values equal so choose values from a:
dict_intersection = {k: dict_of_dicts_a[k] for k in shared_keys }  # {"a":{"x":1}}
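
Applied to the inverted index from the question, the same idea might look like the sketch below. The function name `and_search` and the toy index are hypothetical, and the sketch assumes that a term missing from the index should simply match no documents.

```python
def and_search(index, term1, term2):
    """Return {doc_id: text} for documents listed under both terms."""
    # Assumption: a term missing from the index has an empty postings dict.
    p1 = index.get(term1, {})
    p2 = index.get(term2, {})
    shared_ids = p1.keys() & p2.keys()
    # Both postings map a given ID to the same text, so take it from p1.
    return {doc_id: p1[doc_id] for doc_id in shared_ids}


# Hypothetical toy index: term -> {document ID -> text}
index = {
    "python": {1: "python dict tips", 2: "python sets"},
    "dict": {1: "python dict tips", 3: "dict views"},
}

print(and_search(index, "python", "dict"))     # {1: 'python dict tips'}
print(and_search(index, "python", "missing"))  # {}
```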

Other reasonable ways of combining values depend on the types of the values in your dictionaries and on what they represent. For example, for dictionaries of dictionaries you might want the union of the values for each shared key. The union of two dictionaries is well defined in Python: in Python 3.9 and later the | operator merges them, and where both contain the same key the value from the right-hand operand is kept:

# union of values for each key in the intersection:
dict_intersection_2 = { k: dict_of_dicts_a[k] | dict_of_dicts_b[k] for k in shared_keys }

In this case, since the dictionary values for key "a" are identical in both, the result is the same.
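
The choice of how to combine values can also be made explicit by passing it in as a function. This is only a sketch: `intersect_dicts` and its `combine` parameter are hypothetical names, not part of the standard library, and the default keeps the value from the first dictionary.

```python
def intersect_dicts(a, b, combine=lambda va, vb: va):
    """Intersect a and b by key; combine(va, vb) decides each value."""
    return {k: combine(a[k], b[k]) for k in a.keys() & b.keys()}


dict_a = {"a": 1, "b": 2}
dict_b = {"a": 2, "c": 3}

print(intersect_dicts(dict_a, dict_b))                              # {'a': 1}
print(intersect_dicts(dict_a, dict_b, combine=lambda x, y: x + y))  # {'a': 3}
print(intersect_dicts(dict_a, dict_b, combine=max))                 # {'a': 2}
```

Keeping the combining rule as a parameter leaves the key intersection itself unchanged while making the handling of differing values visible at the call site.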
