组内的 Cumsum 并在 pandas 的条件下重置

Cumsum within group and reset on condition in pandas(组内的 Cumsum 并在 pandas 的条件下重置)
本文介绍了组内的 Cumsum 并在 pandas 的条件下重置的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

I have a dataframe with two columns ID and Activity. The activity is either 0 or 1. I want a new column containing a increasing number since the last activity was 1. However, the count should only be within one group (ID). If the activity is 1, the counting column should be reset to 0 and the count starts again.

So, I have a dataframe containing the following:

What is want is this:

Can someone help me?

解决方案

We using a new para 'G' here

df['G']=df.groupby('ID').Activeity.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)

df.groupby([df.ID,df.G.cumsum()]).G.apply(lambda x : (~x).cumsum())

Out[713]: 
0     1
1     2
2     0
3     1
4     2
5     1
6     2
7     0
8     1
9     0
10    1
11    1
12    0
13    0
14    1
15    2
Name: G, dtype: int32

Data input

df=pd.DataFrame({'ID':list('AAAAABBBBBBCCCCC'),'Activeity':[0,0,1,0,0,0,0,1,0,1,0,0,1,1,0,0]})

Explanation :

Here we get the new para 'G'
df['G']=df.groupby('ID').Activeity.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)
df
Out[134]: 
    Activeity ID      G
0           0  A  False
1           0  A  False
2           1  A   True
3           0  A  False
4           0  A  False
5           0  B  False
6           0  B  False
7           1  B   True
8           0  B  False
9           1  B   True
10          0  B  False
11          0  C  False
12          1  C   True
13          1  C   True
14          0  C  False
15          0  C  False

Then we do cumsum for G, is to getting where is the cycle we should set the number to 0

df.G.cumsum()
Out[135]: 
0     0
1     0
2     1
3     1
4     1
5     1
6     1
7     2
8     2
9     3
10    3
11    3
12    4
13    5
14    5
15    5
Name: G, dtype: int32

这篇关于组内的 Cumsum 并在 pandas 的条件下重置的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

How do I read an Excel file directly from Dropbox#39;s API using pandas.read_excel()?(如何使用PANDAS.READ_EXCEL()直接从Dropbox的API读取Excel文件?)
Is there any way to quot;extractquot; the dtype conversion functionality from pandas read_csv?(有没有办法从 pandas Read_CSV中提取数据类型转换功能?)
Highlighting rows based on a condition(根据条件突出显示行)
How to downcast numeric columns in Pandas?(如何在 pandas 中向下转换数字列?)
Sort quot;Datequot; in Multi-Index(在多索引中排序日期(Q))
How to prevent groupby from surclassing index?(如何防止Groupby超越指数?)