Problem description
I need to insert six hundred fifty million (650,000,000) records into a single table. I have tried the extended INSERT and LOAD DATA LOCAL INFILE approaches.
Extended insert results:
I split the data so that each extended INSERT carries 1,250,000 rows. The problem I'm facing is that each batch takes progressively longer to execute:
Batch 1 completed in 40s
Batch 2 completed in 50s
Batch 3 completed in 60s
Batch 4 completed in 80s
Batch 5 completed in 100s
At this rate it will take 15-20 hours to complete.
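(By "extended insert" I mean a single INSERT statement carrying many rows, along these lines; the table and column names below are made up for illustration:)

    -- one multi-row ("extended") INSERT; names are hypothetical
    INSERT INTO my_table (id, amount) VALUES
        (1, 10.500000),
        (2, 11.250000),
        (3, 12.000000);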
LOAD DATA LOCAL INFILE results:
I have a 40 GB CSV file, which I split into 500 files and then loaded into MySQL. Here I hit the same issue: each file load takes longer than the one before.
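For reference, each file is loaded with a statement along these lines (the table name and delimiters are placeholders):

    LOAD DATA LOCAL INFILE 'part_001.csv'
    INTO TABLE my_table
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';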
Table structure for your reference:
If this is expected behavior, is there another way to import the data quickly?
Recommended answer
Sort the data by the PRIMARY KEY. It is likely to be faster to sort a 40 GB file with the sort command than to have MySQL do it.
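A minimal sketch with GNU sort, assuming a comma-separated file whose first column is a numeric primary key and which has no header row (all assumptions, since the table structure isn't shown):

    # numeric sort on column 1; -S caps sort's in-memory buffer
    sort -t, -k1,1n -S 4G -o sorted.csv data.csv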
Set innodb_buffer_pool_size to about 70% of available RAM. How much RAM do you have?
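For example, in my.cnf on a server with 32 GB of RAM (the RAM size here is only an assumption):

    [mysqld]
    # roughly 70% of 32 GB; adjust to your actual RAM
    innodb_buffer_pool_size = 22G

On MariaDB 10.2+ (and MySQL 5.7+) the buffer pool can also be resized at runtime via SET GLOBAL innodb_buffer_pool_size.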
Do have the PRIMARY KEY already defined on the table.
Don't have any secondary indexes, FOREIGN KEYs, or triggers during the load. Add them later. Yes, that takes time.
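Once the load finishes, the indexes can be added in a single ALTER so the table is rebuilt only once; the index and column names below are hypothetical:

    -- add secondary indexes and constraints after the bulk load (names are hypothetical)
    ALTER TABLE my_table
        ADD INDEX idx_created_at (created_at),
        ADD FOREIGN KEY (user_id) REFERENCES users (id);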
Confirm that you will really need all of those indexes.
The data could be shrunk slightly by picking something smaller than the 10-byte DECIMAL(20,6).
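For example, if the values never need more than 12 integer digits, DECIMAL(18,6) fits in 9 bytes instead of 10, saving roughly 650 MB across 650 million rows (the precision is an assumption about your data; the column name is hypothetical):

    -- DECIMAL(18,6): 12 integer digits + 6 fraction digits = 9 bytes per value
    ALTER TABLE my_table MODIFY amount DECIMAL(18,6) NOT NULL;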