Fast, parallel restore from SQL dumps (mysqldump) for MySQL
I have used the technique described below on several occasions to restore from a SQL dump generated with mysqldump. In some cases, a regular restore that had run for 2 days and still could not finish successfully completed in less than 12 hours using this simple technique. Please keep in mind that this is not a purely parallel method; I'd call it almost parallel, and it has proven to be faster than a regular single-threaded restore (depending on your hardware).
I am going to use 9 InnoDB tables, some with millions of rows, to show you an example of the improved speed. The tables are very simple in my case, so you might see more improvement in yours. All tables have at most four columns and share the following structure:
+--------+-------------------+------+-----+---------+----------------+
| Field  | Type              | Null | Key | Default | Extra          |
+--------+-------------------+------+-----+---------+----------------+
| id     | int(11)           | NO   | PRI | NULL    | auto_increment |
| name   | varchar(100)      | YES  |     | NULL    |                |
| age    | float(4,1)        | YES  |     | NULL    |                |
| gender | enum('M','F','U') | YES  |     | NULL    |                |
+--------+-------------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
The approximate number of rows in each table is as follows:
+------------+------------+
| TABLE_NAME | TABLE_ROWS |
+------------+------------+
| t1         |   12583209 |
| t2         |    6291945 |
| t3         |     787046 |
| t4         |   19010057 |
| t5         |    6291981 |
| t6         |     393881 |
| t7         |   12583209 |
| t8         |    4000078 |
| t9         |     200589 |
+------------+------------+
9 rows in set (0.20 sec)
I created two separate dump files (i) with schema only and (ii) schema and data.
mysqldump -uroot --no-data test2 > test2_schema.sql
mysqldump -uroot -h127.0.0.1 -p test2 > test2.sql
I restored using the regular restore method below:
$ time mysql -uroot -h127.0.0.1 test3 < test2.sql

real    13m57.107s
user    0m50.160s
sys     0m3.106s
It took around 14 minutes for the restore to finish successfully. Now, I am going to explain the technique and show you the results running on the exact same hardware (basically, everything else remains the same).
First, I am going to explain a few concepts and some background:
- The mysql CLI program has a --force option that allows it to continue restoring even if an SQL error occurs.
- I created a skeleton schema (test4) using the schema-only dump file.
mysql -uroot -h127.0.0.1 test4 < test2_schema.sql
- Also, using the table row counts (you could use table file sizes instead), I divided the tables into three groups and created a separate user for each group (this is why I needed the tables created first).
Group 1 contains t1, t2, t3
Group 2 contains t4, t5, t6
Group 3 contains t7, t8, t9
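With only nine tables, grouping by hand is easy, but the step can be automated for larger schemas. A rough sketch of my own (not part of the original workflow): sort tables by row count descending and deal them out round-robin, which keeps per-group totals roughly balanced. The table/row pairs are hard-coded here from the counts shown earlier; in practice you would pull them from information_schema.TABLES.

```shell
# Assign tables to N groups, round-robin by descending row count.
N=3
printf '%s\n' \
  't1 12583209' 't2 6291945' 't3 787046' \
  't4 19010057' 't5 6291981' 't6 393881' \
  't7 12583209' 't8 4000078' 't9 200589' |
sort -k2,2nr |
awk -v n="$N" '{ print "group" ((NR - 1) % n + 1) ": " $1 }'
```

Note that this heuristic produces a slightly different grouping than the hand-picked one above; either works, since the goal is only to spread the heavy tables across sessions.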
- I used the following GRANT commands to create three separate users, one for each group (see above) with INSERT privileges.
GRANT INSERT ON test4.t1 TO 'u1'@'127.0.0.1';
GRANT INSERT ON test4.t2 TO 'u1'@'127.0.0.1';
GRANT INSERT ON test4.t3 TO 'u1'@'127.0.0.1';
GRANT INSERT ON test4.t4 TO 'u2'@'127.0.0.1';
GRANT INSERT ON test4.t5 TO 'u2'@'127.0.0.1';
GRANT INSERT ON test4.t6 TO 'u2'@'127.0.0.1';
GRANT INSERT ON test4.t7 TO 'u3'@'127.0.0.1';
GRANT INSERT ON test4.t8 TO 'u3'@'127.0.0.1';
GRANT INSERT ON test4.t9 TO 'u3'@'127.0.0.1';
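One caveat: on modern MySQL (8.0, and 5.7 with its default SQL mode), GRANT no longer implicitly creates missing accounts, so the users must exist before the GRANT statements above will succeed. Something like the following, where the passwords are placeholders of my own:

```sql
-- Assumption: MySQL 5.7+/8.0, where GRANT does not create missing accounts.
CREATE USER 'u1'@'127.0.0.1' IDENTIFIED BY 'changeme1';
CREATE USER 'u2'@'127.0.0.1' IDENTIFIED BY 'changeme2';
CREATE USER 'u3'@'127.0.0.1' IDENTIFIED BY 'changeme3';
```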
Now, open three separate OS sessions (one per group), or run the commands as background processes, so that each group of tables is restored in parallel.
$ time mysql -uu1 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1

real    7m56.070s
user    0m50.687s
sys     0m3.995s

$ time mysql -uu2 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1

real    9m27.720s
user    0m50.594s
sys     0m4.035s

$ time mysql -uu3 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1

real    8m0.634s
user    0m50.734s
sys     0m5.031s
Because these commands run in parallel, the wall-clock time for the whole restore is the time taken by the slowest session. In the example above, that is roughly 9.5 minutes, an improvement of about 32% over the 14-minute single-threaded restore.
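Instead of three interactive sessions, the restores can be launched from one script with background jobs and wait. This is a sketch under assumptions of my own: the dump is test2.sql, the users are u1-u3 as above, and MYSQL defaults to a dry-run echo so the script runs without a server; set MYSQL=mysql (with real credentials) for an actual restore.

```shell
#!/bin/sh
# Launch one restore per privilege group in the background, log each one
# separately, and wait for all of them to finish.
# MYSQL defaults to "echo mysql" (a dry run); set MYSQL=mysql for real use.
MYSQL="${MYSQL:-echo mysql}"
DUMP="${DUMP:-test2.sql}"
[ -f "$DUMP" ] || : > "$DUMP"   # dry-run convenience: empty dump if missing

for user in u1 u2 u3; do
  $MYSQL -u"$user" -h127.0.0.1 --force test4 < "$DUMP" > "restore_$user.log" 2>&1 &
done
wait   # total restore time is the time taken by the slowest job
echo "all restores finished"
```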
I also ran another test with 10 tables and 5 groups. The regular restore took 16m and 50s, whereas the parallel restore took 11m and 9s.
The reason this technique works is that MySQL keeps access-privilege (authentication) data in memory, so it can very quickly determine whether a user is allowed to insert rows into a given table. Because we are using the --force option, the INSERT statements that fail with a privilege error are simply skipped, so each session effectively restores only the tables its user can write to. I am redirecting the error and output streams to /dev/null in this example, but you can log them to a file instead. Also, you do not have to repeat all of this setup every time you restore: create the users with the appropriate privileges once, and then just rerun the restore commands.
I hope this helps people who still depend on SQL dumps for taking backups. Please keep in mind that you need multiple CPU cores for the processes to actually run in parallel.
Did this tutorial help a little? How about buying me a cup of coffee?
Please feel free to use the comments form below if you have any questions or need more explanation on anything. I recommend testing thoroughly on a production-like test system before moving to production.