Python Pandas - 使用 to_sql 以块的形式写入大型数据帧

Python Pandas - Using to_sql to write large data frames in chunks(Python Pandas - 使用 to_sql 以块的形式写入大型数据帧)
本文介绍了Python Pandas - 使用 to_sql 以块的形式写入大型数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 Pandas 的 to_sql 函数写入 MySQL,由于大帧大小(1M 行,20 列)导致超时.

I'm using Pandas' to_sql function to write to MySQL, which is timing out due to large frame size (1M rows, 20 columns).

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html

有没有更正式的方法来分块数据并在块中写入行?我已经编写了自己的代码,这似乎有效.不过,我更喜欢官方解决方案.谢谢!

Is there a more official way to chunk through the data and write rows in blocks? I've written my own code, which seems to work. I'd prefer an official solution though. Thanks!

def write_to_db(engine, frame, table_name, chunk_size):

    start_index = 0
    end_index = chunk_size if chunk_size < len(frame) else len(frame)

    frame = frame.where(pd.notnull(frame), None)
    if_exists_param = 'replace'

    while start_index != end_index:
        print "Writing rows %s through %s" % (start_index, end_index)
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name, if_exists=if_exists_param)
        if_exists_param = 'append'

        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))

engine = sqlalchemy.create_engine('mysql://...') #database details omited
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)

推荐答案

更新:此功能已合并到 pandas master 中,并将在 0.15(可能在 9 月底)发布,感谢 @artemyk!请参阅 https://github.com/pydata/pandas/pull/8062

Update: this functionality has been merged in pandas master and will be released in 0.15 (probably end of september), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062

所以从 0.15 开始,您可以指定 chunksize 参数,例如简单地做:

So starting from 0.15, you can specify the chunksize argument and e.g. simply do:

df.to_sql('table', engine, chunksize=20000)

这篇关于Python Pandas - 使用 to_sql 以块的形式写入大型数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Execute complex raw SQL query in EF6(在EF6中执行复杂的原始SQL查询)
Hibernate reactive No Vert.x context active in aws rds(AWS RDS中的休眠反应性非Vert.x上下文处于活动状态)
Bulk insert with mysql2 and NodeJs throws 500(使用mysql2和NodeJS的大容量插入抛出500)
Flask + PyMySQL giving error no attribute #39;settimeout#39;(FlASK+PyMySQL给出错误,没有属性#39;setTimeout#39;)
auto_increment column for a group of rows?(一组行的AUTO_INCREMENT列?)
Sort by ID DESC(按ID代码排序)