How to merge two large pandas DataFrames faster based on multiple conditions


Problem description

I have two DataFrames:

df1:

    RB  BeginDate   EndDate    Valindex0
0   00  19000100    19811231    45
1   00  19820100    19841299    47
2   00  19850100    20010699    50
3   00  20010700    99999999    39

df2:

    RB  IssueDate   gs
0   L3  19990201    8
1   00  19820101    G
2   48  19820101    G
3   50  19820101    G
4   50  19820101    G

How can I merge these two DataFrames under the following condition:

if df1['BeginDate'] <= df2['IssueDate'] <= df1['EndDate'] and df1['RB']==df2['RB']:
    merge the value of df1['Valindex0'] to df2

The expected output is:

df2:

    RB  IssueDate   gs  Valindex0
0   L3  19990201    8   None
1   00  19820101    G   47    # df2['RB']==df1['RB'] and df2['IssueDate'] between df1['BeginDate'] and df1['EndDate'] of this row
2   48  19820101    G   None
3   50  19820101    G   None
4   50  19820101    G   None

I know one way to do this, but it is very slow:

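import numpy as np

conditions = []

# Build one boolean mask over df2 for each row (interval) of df1
for index, row in df1.iterrows():
    conditions.append((df2['IssueDate'] >= row['BeginDate']) &
                      (df2['IssueDate'] <= row['EndDate']) &
                      (df2['RB'] == row['RB']))

# Pick the Valindex0 of the first matching df1 row; None where nothing matches
df2['Valindex0'] = np.select(conditions, df1['Valindex0'], default=None)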

Is there a faster solution?

Recommended answer

You can try using SQL, since a range-based join like this is more awkward to express in pandas:

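import pandas as pd
import sqlite3

# In-memory SQLite database, discarded when the connection closes
conn = sqlite3.connect(':memory:')

# Load both DataFrames as tables A (df1) and B (df2)
df1.to_sql('A', conn, index=False)
df2.to_sql('B', conn, index=False)

# Left join keeps every row of B; Valindex0 is NULL where no interval matches
qry = '''
    select
        B.RB, B.IssueDate, B.gs, A.Valindex0
    from
        B left join A on
        (B.IssueDate between A.BeginDate and A.EndDate and B.RB = A.RB)
    '''
df = pd.read_sql_query(qry, conn)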

#    RB  IssueDate gs  Valindex0
# 0  L3   19990201  8        NaN
# 1  00   19820101  G       47.0
# 2  48   19820101  G        NaN
# 3  50   19820101  G        NaN
# 4  50   19820101  G        NaN
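For comparison, the same left join can be sketched in pandas alone by merging on RB first and then filtering on the date range. This is not part of the original answer; it is a minimal sketch that assumes the dates are stored as integers and that each df2 row falls into at most one df1 interval. The names candidates, in_range, matches and result are illustrative only.

```python
import pandas as pd

# Sample frames copied from the question; integer dates are an assumption.
df1 = pd.DataFrame({
    "RB": ["00", "00", "00", "00"],
    "BeginDate": [19000100, 19820100, 19850100, 20010700],
    "EndDate": [19811231, 19841299, 20010699, 99999999],
    "Valindex0": [45, 47, 50, 39],
})
df2 = pd.DataFrame({
    "RB": ["L3", "00", "48", "50", "50"],
    "IssueDate": [19990201, 19820101, 19820101, 19820101, 19820101],
    "gs": ["8", "G", "G", "G", "G"],
})

# Pair every df2 row with all df1 rows sharing its RB; reset_index keeps
# the original df2 row label in a column named "index".
candidates = df2.reset_index().merge(df1, on="RB", how="inner")

# Keep only the pairs whose IssueDate lies inside [BeginDate, EndDate].
in_range = candidates["IssueDate"].between(
    candidates["BeginDate"], candidates["EndDate"]
)
matches = (
    candidates.loc[in_range]
    .drop_duplicates("index")      # guard against overlapping intervals
    .set_index("index")["Valindex0"]
)

# Assignment aligns on the df2 index; rows without a match become NaN.
result = df2.assign(Valindex0=matches)
print(result)
```

The intermediate merge holds one row per (df2 row, df1 interval) pair with the same RB, so it can grow large when a single RB value has many intervals. In that case the SQL approach above, or pd.merge_asof with by='RB' for non-overlapping intervals, may be a better fit.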
