Filter Dataframe Based on Local Minima with Increasing Timeline


Problem Description

Edit:

I have the following dataframe of students, showing their exam scores on different dates (already sorted):

import datetime

import pandas as pd

df = pd.DataFrame({'student': 'A A A B B B B C C'.split(),
                   'exam_date': [datetime.datetime(2013, 4, 1), datetime.datetime(2013, 6, 1),
                                 datetime.datetime(2013, 7, 1), datetime.datetime(2013, 9, 2),
                                 datetime.datetime(2013, 10, 1), datetime.datetime(2013, 11, 2),
                                 datetime.datetime(2014, 2, 2), datetime.datetime(2013, 7, 1),
                                 datetime.datetime(2013, 9, 2)],
                   'score': [15, 17, 32, 22, 28, 24, 33, 33, 15]})

print(df)

  student  exam_date  score
0       A 2013-04-01     15
1       A 2013-06-01     17
2       A 2013-07-01     32
3       B 2013-09-02     22
4       B 2013-10-01     28
5       B 2013-11-02     24
6       B 2014-02-02     33
7       C 2013-07-01     33
8       C 2013-09-02     15

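I need to keep only those rows where the score has increased by more than 10 from the local minimum.

For example, for student A, the local minimum is 15 and the score rises to 32 in the most recent record, so we keep that row.

For student B, no score rises more than 10 above a local minimum: 28 - 22 and 33 - 24 are both less than 10.

For student C, the local minimum is 15, but the score does not increase after that, so we drop that student's rows.

I am trying the following script: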

out = df[df['score'] - df.groupby('student', as_index=False)['score'].cummin()['score']>= 10]

print(out)
2   A   2013-07-01  32
6   B   2014-02-02  33 #--Shouldn't capture this as it's increased by `9` from local minima of `24`
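
To see why the cummin attempt keeps row 6, it may help to look at student B alone. The sketch below is only an illustration (the variable names are not from the original post): cummin tracks the lowest score seen so far in the group, so it stays at 22 and never moves up to the later local minimum of 24.

```python
import pandas as pd

# Student B's scores from the question, keeping the original row labels.
scores = pd.Series([22, 28, 24, 33], index=[3, 4, 5, 6], name='score')

# Running minimum: the lowest score seen so far, which stays at 22 throughout.
running_min = scores.cummin()

# Rise measured from the running minimum rather than the latest local minimum.
rise_from_running_min = scores - running_min

print(rise_from_running_min.tolist())  # [0, 6, 2, 11]
print((rise_from_running_min >= 10).tolist())  # [False, False, False, True]
```

Row 6 passes the check because 33 - 22 = 11, even though it is only 9 above the local minimum of 24.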

Desired output:

   student  exam_date  score
2        A  2013-07-01  32

# For A, score of 32 is increased by 17 from local minima of 15  

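What is the smartest way to do this? Any suggestions would be appreciated. Thanks!

Recommended Answer

Assuming your dataframe is already sorted by date: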

highest_score = lambda x: x['score'] - x['score'].mask(x['score'].gt(x['score'].shift())).ffill() > 10
out = df[df.groupby('student').apply(highest_score).droplevel(0)]
print(out)

# Output
  student  exam_date  score
2       A 2013-07-01     32
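
As a possible variation, the same idea can be written without groupby(...).apply by doing the shift and the forward fill per student directly. This is a minimal sketch, not the answer's own code: the helper name rows_rising_from_local_min and its threshold parameter are hypothetical, and the dataframe is rebuilt here with pd.to_datetime only to keep the example self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    'student': 'A A A B B B B C C'.split(),
    'exam_date': pd.to_datetime([
        '2013-04-01', '2013-06-01', '2013-07-01',
        '2013-09-02', '2013-10-01', '2013-11-02', '2014-02-02',
        '2013-07-01', '2013-09-02',
    ]),
    'score': [15, 17, 32, 22, 28, 24, 33, 33, 15],
})


def rows_rising_from_local_min(frame, threshold=10):
    """Hypothetical helper: keep rows whose score exceeds the latest
    local minimum of the same student by more than `threshold`."""
    # Previous score of the same student (NaN on each student's first row).
    previous = frame.groupby('student')['score'].shift()
    # True where the score went up compared with the previous exam.
    is_rising = frame['score'].gt(previous)
    # Hide rising scores, then carry the last remaining value forward per student.
    local_min = frame['score'].mask(is_rising).groupby(frame['student']).ffill()
    return frame[frame['score'] - local_min > threshold]


out = rows_rising_from_local_min(df)
print(out)
```

With the sample data this should leave only row 2 (student A, score 32). Like the answer above, it assumes the rows are already sorted by date within each student.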

Focus on the lambda function

Let's modify your dataframe and extract a single student to avoid the groupby:

>>> df = df[df['student'] == 'B']
>>> df
  student  exam_date  score
3       B 2013-09-02     22
4       B 2013-10-01     28
5       B 2013-11-02     24
6       B 2014-02-02     33

# Step-1: find rows where the value is not a local minimum
>>> df['score'].gt(df['score'].shift())
3    False
4     True
5    False
6     True
Name: score, dtype: bool

# Step-2: hide values that are not local minima
>>> df['score'].mask(df['score'].gt(df['score'].shift()))
3    22.0
4     NaN
5    24.0
6     NaN
Name: score, dtype: float64

# Step-3: forward fill the local minima values
>>> df['score'].mask(df['score'].gt(df['score'].shift())).ffill()
3    22.0
4    22.0
5    24.0
6    24.0
Name: score, dtype: float64

# Step-4: check if the condition is True
>>> df['score'] - df['score'].mask(df['score'].gt(df['score'].shift())).ffill() > 10
3    False
4    False
5    False
6    False
Name: score, dtype: bool
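
The four steps can also be run together to look at the actual rise behind the final booleans. The snippet below is a small sketch for student B only, with illustrative variable names that are not part of the answer.

```python
import pandas as pd

# Student B's scores, with the row labels used in the walkthrough.
scores = pd.Series([22, 28, 24, 33], index=[3, 4, 5, 6], name='score')

# Steps 1-3: flag rising scores, hide them, and forward fill the local minima.
is_rising = scores.gt(scores.shift())
local_min = scores.mask(is_rising).ffill()

# Step 4, split in two: the rise from the latest local minimum, then the check.
rise = scores - local_min
print(rise.tolist())  # [0.0, 6.0, 0.0, 9.0]
print((rise > 10).tolist())  # [False, False, False, False]
```

The largest rise for B is 9 (33 - 24), which is why no row for this student passes the > 10 condition.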
