How to improve INSERT performance on a very large MySQL table

Question

I am working on a large MySQL database and I need to improve INSERT performance on a specific table. The table contains about 200 million rows, and its structure is as follows:

(A small premise: I am not a database expert, so the code I've written may rest on wrong foundations. Please help me understand my mistakes. :) )

CREATE TABLE IF NOT EXISTS items (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL,
    -- KEY is a reserved word in MySQL, so the column name must be quoted
    `key` VARCHAR(10) NOT NULL,
    busy TINYINT(1) NOT NULL DEFAULT 1,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,

    PRIMARY KEY (id, name),
    UNIQUE KEY name_key_unique_key (name, `key`),
    INDEX name_index (name)
) ENGINE=MyISAM
PARTITION BY LINEAR KEY(name)
PARTITIONS 25;

Every day I receive many CSV files in which each line consists of a "name;key" pair, so I have to parse these files (adding created_at and updated_at values to each row) and insert the values into my table. In this table, the combination of "name" and "key" MUST be UNIQUE, so I implemented the insert procedure as follows:
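
The parsing step described above is not shown in the question. A minimal Python sketch of what it might look like follows; the function names `prepare_rows` and `write_csv` are hypothetical, and it assumes that the first ";" on each line separates name from key and that one timestamp is used for the whole file. The output is comma-separated with double-quoted fields, which matches the FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' clause used in the procedure.

```python
import csv
import io
from datetime import datetime


def prepare_rows(lines, now=None):
    """Turn raw "name;key" lines into (name, key, created_at, updated_at) rows."""
    # One timestamp for the whole file, in the format MySQL DATETIME accepts.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the first ";" separates name from key.
        # A line without ";" raises ValueError here.
        name, key = line.split(";", 1)
        rows.append((name, key, stamp, stamp))
    return rows


def write_csv(rows, out):
    """Write rows as comma-separated fields, each enclosed in double quotes."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(prepare_rows(["foo;k1", "bar;k2"]), buffer)
    print(buffer.getvalue(), end="")
```

Note that LOAD DATA treats the backslash as an escape character by default, so names containing backslashes would need extra handling that this sketch does not attempt.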

-- Staging table for one CSV file
CREATE TEMPORARY TABLE temp_items (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL,
    `key` VARCHAR(10) NOT NULL,  -- KEY is a reserved word, so it is quoted
    busy TINYINT(1) NOT NULL DEFAULT 1,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,
    PRIMARY KEY (id)
)
ENGINE=MyISAM;

-- Bulk-load the parsed file into the staging table
LOAD DATA LOCAL INFILE 'file_to_process.csv'
INTO TABLE temp_items
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
(name, `key`, created_at, updated_at);

-- Copy into the main table; existing (name, key) pairs are updated instead
INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
    SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
    FROM temp_items
)
ON DUPLICATE KEY UPDATE busy=1, updated_at=NOW();

DROP TEMPORARY TABLE temp_items;

The code just shown lets me reach my goal, but it takes about 48 hours to complete, and this is a problem. I think this poor performance is caused by the fact that, for each insertion, the script must check against a very large table (200 million rows) that the "name;key" pair is unique.

How can I improve the performance of the script?

Thanks in advance to everyone.

Recommended answer

You can use the following methods to speed up inserts:

  1. If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.

  2. When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

  3. Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
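
As a rough illustration of the first and third points, the following Python sketch groups rows into batches and builds one multi-row INSERT per batch, leaving out the busy column because the table already defaults it to 1. The helper names `chunked` and `build_insert` are hypothetical, and the `%s` placeholders assume a DB-API driver that uses the "format" parameter style; adjust them for the driver you actually use. The `key` column is backtick-quoted because KEY is a reserved word in MySQL.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_insert(batch):
    """Return (sql, params) for one multi-row INSERT covering `batch`."""
    # `busy` is omitted on purpose: the table definition defaults it to 1.
    columns = "(name, `key`, created_at, updated_at)"
    one_row = "(%s, %s, %s, %s)"
    values = ", ".join([one_row] * len(batch))
    sql = (
        "INSERT INTO items " + columns + " VALUES " + values
        + " ON DUPLICATE KEY UPDATE busy=1, updated_at=NOW()"
    )
    # Flatten the row tuples into one parameter list, in placeholder order.
    params = [value for row in batch for value in row]
    return sql, params


if __name__ == "__main__":
    rows = [
        ("foo", "k1", "2020-01-01 00:00:00", "2020-01-01 00:00:00"),
        ("bar", "k2", "2020-01-01 00:00:00", "2020-01-01 00:00:00"),
        ("baz", "k3", "2020-01-01 00:00:00", "2020-01-01 00:00:00"),
    ]
    for batch in chunked(rows, 2):
        sql, params = build_insert(batch)
        print(sql)
        print(params)
```

Each (sql, params) pair would then be passed to the driver's cursor.execute call. The batch size is a tuning choice: larger batches mean fewer round trips, but each statement must stay under the server's max_allowed_packet limit.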

Reference: MySQL.com: 8.2.4.1 Optimizing INSERT Statements
