MySQL 8.0 多线程 load data infile

load data infile 导入一个大的文件到MySQL表中,通常是一个非常耗时的过程,因为处理过程是单线程并且是单事务的,而目前的硬件条件在执行load data infile时,远远没有达到瓶颈。在MySQL 8.0.17版本,引入了一个多线程load data infile的功能,该功能实现在MySQL Shell工具中,因此需要先安装MySQL Shell。下面来对比一下传统单线程load data infile与多线程load data infile两种方式的性能差异有多大。

测试环境:

  • 4 core CPU
  • 8 GB RAM
  • SSD Disk
  • Centos 7

导入的文件load_data.csv,大约8G左右。

单线程测试:

mysql > load data infile '/usr/local/mysql/mysql-files/load_data.csv' 
into table single.single_load fields terminated by ',' optionally 
enclosed by '"' lines terminated by '\n' 
(user_id,visible,name,size,latitude,longitude,timestamp,public,description,inserted);

Query OK, 14244516 rows affected, 0 warnings (39 min 35.5036 sec)
Records: 14244516 Deleted: 0 Skipped: 0 Warnings: 0

一共耗时 39 min 35.5036 sec。

多线程测试:

mysql shell > util.importTable("/usr/local/mysql/mysql-files/load_data.csv", 
{schema: "parallel", table: "parallel_load", columns: 
["user_id","visible","name","size","latitude","longitude","timestamp","public","description"
,"inserted"], dialect: "csv-unix", skipRows: 0, showProgress: true, 
fieldsOptionallyEnclosed: true, fieldsTerminatedBy: ",", linesTerminatedBy: 
"\n",fieldsEnclosedBy: '"',threads: 8, bytesPerChunk: "1G", maxRate: "2G"})

Importing from file '/usr/local/mysql/mysql-files/load_data.csv' to table 
`parallel`.`parallel_load` in MySQL Server at /tmp%2Fmysql.sock using 8 threads
[Worker01] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker02] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker03] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker04] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker05] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker06] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker07] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
[Worker08] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0

100% (8.06 GB / 8.06 GB), 535.17 KB/s
File '/usr/local/mysql/mysql-files/load_data.csv' (8.06 GB) was imported in 6 min 
30.0411 sec at 18.81 MB/s
Total rows affected in parallel.parallel_load: Records: 14244516 Deleted: 0 Skipped: 0 
Warnings: 0

一共耗时:6 min 30.0411 sec

结果对比:

从测试结果上看,单线程耗时 39 min 35.5036 sec,多线程耗时6 min 30.0411 sec,多线程效率大约是单线程的6倍。

多线程load data infile的几个参数:

  • threads,指定线程数
  • bytesPerChunk,每个chunk的大小
  • maxRate,线程每秒能够处理的数据最大限制

多线程load data infile内部处理过程:

  • 分析要导入的数据文件
  • 将大的数据文件分割成数据块(chunk)
  • 多线程将数据块导入到MySQL中

发表评论