Intersection of sets as columns in pandas


Question

I have a df such as:

import pandas as pd
df = pd.DataFrame({'i': [{1, 2, 3, 4} for _ in range(4)], 'j': [{2, 3}, {1}, {4}, {3, 4}]})

which looks like:

>>> df
              i       j
0  {1, 2, 3, 4}  {2, 3}
1  {1, 2, 3, 4}     {1}
2  {1, 2, 3, 4}     {4}
3  {1, 2, 3, 4}  {3, 4}

I would like to compute the row-wise intersection of columns i and j (conceptually df.i.intersection(df.j)) and assign the result to a new column k. That is, I want this:

df['k'] = [df.i.iloc[t].intersection(df.j.iloc[t]) for t in range(len(df))]

>>> df.k
0    {2, 3}
1       {1}
2       {4}
3    {3, 4}
Name: k, dtype: object

Is there a df.apply() way to do this? The actual DataFrame has millions of rows, so performance matters.

Answer

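Working with sets, lists, and dicts in pandas is somewhat awkward, because pandas is optimized for scalar values and cannot vectorize operations on Python objects stored in object columns. A plain list comprehension over the zipped columns is a simple and fast option: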

df['k'] = [a & b for a, b in zip(df['i'], df['j'])]
print(df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}
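The same row-wise loop can also be written with the built-in map and the unbound method set.intersection, which pairs the two columns without unpacking tuples by hand. This is a minimal sketch that rebuilds the example frame from the question; it is still a pure-Python loop, so it should perform roughly like the comprehension rather than faster.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'i': [{1, 2, 3, 4} for _ in range(4)],
    'j': [{2, 3}, {1}, {4}, {3, 4}],
})

# map() calls set.intersection(a, b) for each pair (a, b) taken from the
# two columns in lockstep; list() materializes the results for assignment
df['k'] = list(map(set.intersection, df['i'], df['j']))
print(df)
```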

Or, equivalently, with the set.intersection method:

df['k'] = [a.intersection(b) for a, b in zip(df['i'], df['j'])]
print(df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}
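If more than two set columns need to be intersected, set.intersection accepts any number of arguments, so the same pattern extends by zipping all the columns and unpacking each row. A sketch, assuming a hypothetical list `cols` naming the set columns; the third column `m` is invented purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'i': [{1, 2, 3, 4} for _ in range(4)],
    'j': [{2, 3}, {1}, {4}, {3, 4}],
    'm': [{3}, {1, 5}, {2}, {3, 4}],  # hypothetical third set column
})

cols = ['i', 'j', 'm']  # hypothetical: the set columns to intersect

# zip(*...) yields one tuple of sets per row; set.intersection(*row)
# intersects the first set in the tuple with all the others
df['k'] = [set.intersection(*row) for row in zip(*(df[c] for c in cols))]
print(df['k'])
```

Note that `cols` must name at least one column, because set.intersection needs a first set to act on.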

Solution with apply (readable, but usually slower than the comprehensions above on large frames):

df['k'] = df.apply(lambda x: x['i'].intersection(x['j']), axis=1)
print(df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}
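Since the real frame has millions of rows, it is worth timing the options on your own data before choosing. apply(axis=1) builds a Series object for every row before calling the lambda, which is why it tends to lag behind the comprehension. The rough sketch below uses the standard timeit module on a synthetic frame; the row count `n` and the random set contents are arbitrary assumptions, and the printed timings will depend on your machine and pandas version.

```python
import random
import timeit

import pandas as pd

# Synthetic frame: size and contents are arbitrary, chosen only for timing
random.seed(0)
n = 20_000
df = pd.DataFrame({
    'i': [set(random.sample(range(10), 4)) for _ in range(n)],
    'j': [set(random.sample(range(10), 2)) for _ in range(n)],
})

def with_comprehension(frame):
    # Plain Python loop over the paired column values
    return [a & b for a, b in zip(frame['i'], frame['j'])]

def with_apply(frame):
    # apply(axis=1) passes each row to the lambda as a Series
    return frame.apply(lambda row: row['i'] & row['j'], axis=1)

# Check that both approaches agree before comparing their speed
assert with_comprehension(df) == with_apply(df).tolist()

for func in (with_comprehension, with_apply):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f'{func.__name__}: {seconds / 3:.4f} s per run')
```

Increase `n` toward the size of the real data to get a more representative comparison.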
