Fast, parallel restore from SQL dumps (mysqldump) for MySQL

I have used the technique described below on several occasions to restore from a SQL dump generated with mysqldump. In some cases, a regular restore that had run for 2 days without finishing successfully completed in less than 12 hours with this simple technique. Please keep in mind that this is not a purely parallel method, but it is close, and it has proven faster than a regular single-threaded restore (how much faster depends on your hardware).

I am going to use 9 InnoDB tables, some with millions of rows, to demonstrate the improved speed. The tables in my case are very simple, so you might see an even bigger improvement in yours. All tables have at most four columns, along the lines of the following structure:

+--------+-------------------+------+-----+---------+----------------+
| Field  | Type              | Null | Key | Default | Extra          |
+--------+-------------------+------+-----+---------+----------------+
| id     | int(11)           | NO   | PRI | NULL    | auto_increment |
| name   | varchar(100)      | YES  |     | NULL    |                |
| age    | float(4,1)        | YES  |     | NULL    |                |
| gender | enum('M','F','U') | YES  |     | NULL    |                |
+--------+-------------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)

The approximate number of rows in each table is as follows:

+------------+------------+
| TABLE_NAME | TABLE_ROWS |
+------------+------------+
| t1         |   12583209 |
| t2         |    6291945 |
| t3         |     787046 |
| t4         |   19010057 |
| t5         |    6291981 |
| t6         |     393881 |
| t7         |   12583209 |
| t8         |    4000078 |
| t9         |     200589 |
+------------+------------+
9 rows in set (0.20 sec)
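Counts like these can be pulled from information_schema (a minimal sketch, assuming the source schema is named test2; TABLE_ROWS is only an estimate for InnoDB tables, but it is accurate enough for splitting tables into groups later):

    SELECT TABLE_NAME, TABLE_ROWS
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'test2'
    ORDER BY TABLE_ROWS DESC;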

I created two separate dump files: (i) schema only and (ii) schema plus data.

mysqldump -uroot -h127.0.0.1 --no-data test2 > test2_schema.sql
mysqldump -uroot -h127.0.0.1 test2 > test2.sql

I first restored using the regular, single-threaded method:

$ time mysql -uroot -h127.0.0.1 test3 < test2.sql
real    13m57.107s
user    0m50.160s
sys     0m3.106s

It took around 14 minutes for the restore to finish successfully. Now, I am going to explain the technique and show you the results on the exact same hardware (everything else remains the same).

First, let me explain a few concepts and some background:

  • The mysql CLI program has a --force option that lets the restore continue even when an SQL error occurs.
  • I created a skeleton schema (test4) using the schema-only dump file.

    mysql -uroot -h127.0.0.1 test4 < test2_schema.sql

  • Also, using the table row counts (table file sizes work too), I divided the tables into three groups and created a separate user for each group (this is why the tables needed to be created first).

    Group 1 contains t1, t2, t3

    Group 2 contains t4, t5, t6

    Group 3 contains t7, t8, t9

  • I used the following GRANT commands to create three separate users, one for each group (see above), each limited to INSERT privileges on its group's tables. (On newer MySQL versions the users must exist first; see the note after this list.)

    GRANT INSERT ON test4.t1 TO 'u1'@'127.0.0.1';
    GRANT INSERT ON test4.t2 TO 'u1'@'127.0.0.1';
    GRANT INSERT ON test4.t3 TO 'u1'@'127.0.0.1';
    GRANT INSERT ON test4.t4 TO 'u2'@'127.0.0.1';
    GRANT INSERT ON test4.t5 TO 'u2'@'127.0.0.1';
    GRANT INSERT ON test4.t6 TO 'u2'@'127.0.0.1';
    GRANT INSERT ON test4.t7 TO 'u3'@'127.0.0.1';
    GRANT INSERT ON test4.t8 TO 'u3'@'127.0.0.1';
    GRANT INSERT ON test4.t9 TO 'u3'@'127.0.0.1';
    

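A quick compatibility note: on MySQL 8.0 and later, GRANT no longer creates missing accounts implicitly, so the three users have to be created before running the grants above. A minimal sketch (passwordless accounts, to match the restore commands below; set real passwords on anything but a throwaway test box):

    CREATE USER 'u1'@'127.0.0.1';
    CREATE USER 'u2'@'127.0.0.1';
    CREATE USER 'u3'@'127.0.0.1';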
Now, start three separate OS sessions, one per group (or run the restores as background processes), and restore each group of tables in parallel; every session feeds the same full dump file to a different user. (A single-script variant follows the timings below.)

$ time mysql -uu1 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1
real    7m56.070s
user    0m50.687s
sys     0m3.995s

$ time mysql -uu2 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1
real    9m27.720s
user    0m50.594s
sys     0m4.035s

$ time mysql -uu3 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1
real    8m0.634s
user    0m50.734s
sys     0m5.031s
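If you would rather not juggle three terminal sessions, here is a minimal shell sketch that launches the same three clients as background jobs and waits for all of them (user and file names as in the example above):

    #!/bin/bash
    # Start one mysql client per group user; all read the same full dump.
    for u in u1 u2 u3; do
        mysql -u"$u" -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1 &
    done
    # Block until every background restore has finished.
    wait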

Because these commands run in parallel, the wall-clock time for the whole restore is the time of the slowest session. In the example above that is ~9.5 minutes, roughly a 30% improvement over the 14-minute single-threaded restore.

I also ran another test with 10 tables split into 5 groups. The regular restore took 16m50s, whereas the parallel restore took 11m9s, roughly a 34% improvement.

This technique works because MySQL keeps access-privilege data in memory, so it can decide very quickly whether a user is allowed to insert rows into a given table. Since each session runs with --force, the INSERT statements that fail the privilege check are simply skipped, and each session ends up loading only its own group of tables. I am redirecting the error and output streams to /dev/null in this example, but you can log them to a file instead. Also, you do not have to repeat the setup for every restore: create the users with the appropriate privileges once and reuse the same restore commands afterwards.
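For example, to keep a per-group error log instead of discarding everything, redirect stderr to a file (the name below is my own choosing); the statements skipped by --force show up there as "command denied" privilege errors:

    $ mysql -uu1 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2> group1_errors.log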

I hope this helps people who still depend on SQL dumps for backups. Please keep in mind that you need multiple CPU cores (and I/O capacity to spare) to benefit from running multiple processes.


Please feel free to use the comments form below if you have any questions or need more explanation on anything. I do not guarantee a response.

IMPORTANT: You must thoroughly test any instructions in a production-like test environment before trying anything on production systems. And, make sure it is tested for security, privacy, and safety. See our terms here.

Comments:

I wonder how that would compare to using sed to find the relevant rows for each group and piping that into mysql. Posted by: Scott N on Jun 29, 2012

i'd use --no-beep option if on windows, since the beeping on errors slows it down! Posted by: sbester on Jun 29, 2012

Pretty intriguing, IMHO overcomplicated though. Why don't you just make separate dumps for each table and then restore them in as many threads as the number of tables/CPU cores/whatever? Also tools like http://www.mydumper.org/ will make your life easier. Posted by: Przemek on Jun 29, 2012

How do you manage foreign keys?
As said above, mydumper can do the job Posted by: Cédric on Jun 29, 2012

I developed simple tools to dump and restore tables in parallel using GNU Parallel:

https://github.com/tart/SimpleMySQLParallelDumpRestore Posted by: Emre H on Jun 30, 2012

@Przemek: what if you have to take a consistent snapshot of the database using table locks or --single-transaction?

multiple dump files won't work in such cases. Posted by: Shahryar on Jan 06, 2013

@Cédric: for foreign keys, you can load the tables that others reference first, and the dependent tables after them. Also, you can simply turn foreign key checks OFF while the restore(s) are running.

Read: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_foreign_key_checks Posted by: Shahryar on Jan 06, 2013
