分块更新大型 sqlite 数据库

Update large sqlite database in chunks(分块更新大型 sqlite 数据库)
本文介绍了分块更新大型 sqlite 数据库的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 sqlite 数据库(大约 11 GB),它有多个表,包括表 distancevertices.distance 表很大(120 mio 行),vertices 较小(15 000 行).我想在 python 中使用 sqlite3 通过 vertices 中另一列的值来更新 distance 的一列.表顶点在列 cat 上有一个索引,在 orig_cat 上有另一个索引.

I have a sqlite database (appr. 11 GB) that has multiple tables including the tables distance and vertices. The table distance is pretty large (120 mio rows), vertices is smaller (15 000 rows). I want to use sqlite3 in python to update one column of distance by values of another column in vertices. The table vertices has an index on column cat and another index on orig_cat.

我在做什么:

import sqlite3
db_path='path/to/db.db'

conn = sqlite3.connect(db_path)
cur = conn.cursor()

cur.execute('''UPDATE distance SET 
                from_orig_v = (SELECT orig_cat FROM vertices WHERE cat=distance.source)''')

但是,在如此大的数据库上运行该更新语句会导致内存错误.内存使用量稳步增加,直到崩溃.我正在寻找建议在不耗尽内存的情况下执行如此大的更新语句?也许以块的形式处理更新(即 distance 表的行)并在例如之后提交1000 次更新释放内存?这将如何在 python/sqlite 中完成?

However running that update statement on such a large database, causes a memory error. The memory usage is increasing steadily until it crashes. I am looking for advise to perform such a large update statement without running out of memory? Maybe processing the update in chunks (i.e. rows of distance table) and committing after e.g. 1000 updates to free memory? How would that be done in python/sqlite?

推荐答案

应该可以用这样的语句更新块:

It should be possible to update chunks with statements like this:

UPDATE distance SET ... WHERE rowid BETWEEN 100000 AND 200000;

您不需要使用多个事务;实际上必须保存在内存中的唯一内容是要在单个语句中更新的行列表.(理论上,当内存耗尽时,您应该会收到相应的错误消息.在实践中,某些操作系统 overcommit memory 并且在为时已晚之前不要告诉应用程序.)

You don't need to use multiple transactions; the only thing that actually must be kept in memory is the list of rows to be updated in a single statement. (In theory, when memory runs out, you should get an appropriate error message. In practice, some OSes overcommit memory and don't tell the application about it until it's too late.)

这篇关于分块更新大型 sqlite 数据库的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Flask + PyMySQL giving error no attribute #39;settimeout#39;(FlASK+PyMySQL给出错误,没有属性#39;setTimeout#39;)
FastAPI + Tortoise ORM + FastAPI Users (Python) - Relationship - Many To Many(FastAPI+Tortoise ORM+FastAPI用户(Python)-关系-多对多)
Inserting NaN value into MySQL Database(将NaN值插入MySQL数据库)
Window functions not working in pd.read_sql; Its shows error(窗口函数在pd.read_sql中不起作用;它显示错误)
(Closed) Leaflet.js: How I can Do Editing Geometry On Specific Object I Select Only?((已关闭)Leaflet.js:如何仅在我选择的特定对象上编辑几何图形?)
in sqlite update trigger with multiple if/Case Conditions(在具有多个IF/CASE条件的SQLite UPDATE触发器中)