Fast, parallel restore from SQL dumps (mysqldump) for MySQL
I have used the technique described below on several occasions to restore from a SQL dump generated with mysqldump. In some cases, a regular restore that had run for 2 days without finishing successfully completed in less than 12 hours using this simple technique. Please keep in mind that this is not a purely parallel method, but it is close to parallel and has proven to be faster than a regular single-threaded restore (depending on your hardware).
I am going to use 9 InnoDB tables, some with millions of rows, to demonstrate the speed improvement. The tables in my case are very simple, so you might see an even bigger improvement in yours. Each table has at most four columns and looks like the following:
+--------+-------------------+------+-----+---------+----------------+
| Field  | Type              | Null | Key | Default | Extra          |
+--------+-------------------+------+-----+---------+----------------+
| id     | int(11)           | NO   | PRI | NULL    | auto_increment |
| name   | varchar(100)      | YES  |     | NULL    |                |
| age    | float(4,1)        | YES  |     | NULL    |                |
| gender | enum('M','F','U') | YES  |     | NULL    |                |
+--------+-------------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
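For reference, the DESCRIBE output above corresponds to a table definition along these lines (my reconstruction from the output, not the original DDL):

CREATE TABLE t1 (
  id     INT(11) NOT NULL AUTO_INCREMENT,  -- surrogate primary key
  name   VARCHAR(100),
  age    FLOAT(4,1),
  gender ENUM('M','F','U'),
  PRIMARY KEY (id)
) ENGINE=InnoDB;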
The approximate number of rows in each table is as follows:
+------------+------------+
| TABLE_NAME | TABLE_ROWS |
+------------+------------+
| t1         |   12583209 |
| t2         |    6291945 |
| t3         |     787046 |
| t4         |   19010057 |
| t5         |    6291981 |
| t6         |     393881 |
| t7         |   12583209 |
| t8         |    4000078 |
| t9         |     200589 |
+------------+------------+
9 rows in set (0.20 sec)
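If you want to pull these counts yourself, a query like this against information_schema works (note that TABLE_ROWS is only an estimate for InnoDB, which is accurate enough for grouping):

SELECT TABLE_NAME, TABLE_ROWS
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'test2'
 ORDER BY TABLE_NAME;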
I created two separate dump files: (i) schema only, and (ii) schema and data.
mysqldump -uroot --no-data test2 > test2_schema.sql
mysqldump -uroot -h127.0.0.1 -p test2 > test2.sql
I restored using the regular restore method below:
$ time mysql -uroot -h127.0.0.1 test3 < test2.sql

real    13m57.107s
user    0m50.160s
sys     0m3.106s
It took around 14 minutes for the restore to finish successfully. Now I am going to explain the technique and show you the results on the exact same hardware (everything else remains the same).
First, I am going to explain a few concepts and some background:
- The mysql CLI program has a --force option that makes it continue the restore even if an SQL error occurs.
- I created a skeleton schema (test4) using the schema-only dump file.
mysql -uroot -h127.0.0.1 test4 < test2_schema.sql
- Also, using the table row counts (you can use table file sizes too), I divided the tables into three groups and created a separate user for each group (this is why I needed the tables created first).
Group 1 contains t1, t2, t3
Group 2 contains t4, t5, t6
Group 3 contains t7, t8, t9
- I used the following GRANT commands to create three separate users, one for each group (see above), with INSERT privileges (if you are on a newer MySQL version, see the note right after these statements).
GRANT INSERT ON test4.t1 TO 'u1'@'127.0.0.1';
GRANT INSERT ON test4.t2 TO 'u1'@'127.0.0.1';
GRANT INSERT ON test4.t3 TO 'u1'@'127.0.0.1';
GRANT INSERT ON test4.t4 TO 'u2'@'127.0.0.1';
GRANT INSERT ON test4.t5 TO 'u2'@'127.0.0.1';
GRANT INSERT ON test4.t6 TO 'u2'@'127.0.0.1';
GRANT INSERT ON test4.t7 TO 'u3'@'127.0.0.1';
GRANT INSERT ON test4.t8 TO 'u3'@'127.0.0.1';
GRANT INSERT ON test4.t9 TO 'u3'@'127.0.0.1';
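A note for newer servers: the implicit creation of accounts via GRANT shown above was deprecated in MySQL 5.7 and removed in 8.0, so on recent versions you have to create the users explicitly before granting. A minimal sketch; the passwords are placeholders of my own:

CREATE USER 'u1'@'127.0.0.1' IDENTIFIED BY 'pw1_changeme';  -- placeholder password
CREATE USER 'u2'@'127.0.0.1' IDENTIFIED BY 'pw2_changeme';
CREATE USER 'u3'@'127.0.0.1' IDENTIFIED BY 'pw3_changeme';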
Now, open three separate OS sessions (one per group), or run the commands as background processes, and restore each group of tables in parallel.
$ time mysql -uu1 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1

real    7m56.070s
user    0m50.687s
sys     0m3.995s

$ time mysql -uu2 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1

real    9m27.720s
user    0m50.594s
sys     0m4.035s

$ time mysql -uu3 -h127.0.0.1 --force test4 < test2.sql > /dev/null 2>&1

real    8m0.634s
user    0m50.734s
sys     0m5.031s
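If you prefer background processes to separate sessions, a small shell wrapper like the one below does the same thing. This is a sketch assuming the test4 schema and test2.sql dump from above; the log file names are my own choice:

#!/bin/bash
# Start one mysql client per group user; each replays the full dump,
# but only the INSERTs its user is allowed to run actually succeed.
for u in u1 u2 u3; do
    mysql -u"$u" -h127.0.0.1 --force test4 < test2.sql > "restore_$u.log" 2>&1 &
done
wait    # block until all three background restores finish
echo "All restores finished."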
Since these commands run in parallel, the longest-running session determines the total time for the whole restore. In the example above that is ~9.5 minutes, roughly a 30% improvement over the 14-minute regular restore.
I also ran another test with 10 tables and 5 groups. The regular restore took 16m 50s, whereas the parallel restore took 11m 9s.
The reason this technique works is that MySQL keeps access-privilege (authentication) data in memory, so it can very quickly determine whether a user is allowed to insert rows into a table. Because we use the --force option, the INSERT statements that throw privilege errors are simply skipped, so each client effectively loads only its own group of tables. I am redirecting the error and output streams to /dev/null in this example, but you can log them to a file instead. Also, you do not have to repeat all of this setup for every restore: create the users with the appropriate privileges once, and from then on just run the restore commands.
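For illustration, the statements each client skips fail with a privilege error along these lines (the line number here is made up; u1 has no INSERT privilege on t4, so its client skips all of t4's rows):

ERROR 1142 (42000) at line 123: INSERT command denied to user 'u1'@'127.0.0.1' for table 't4'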
I hope this helps people who still depend on SQL dumps for backups. Please keep in mind that you need multiple processor cores (and enough I/O capacity) to benefit from running multiple processes.
Please feel free to use the comments form below if you have any questions or need more explanation on anything. I do not guarantee a response.
IMPORTANT: You must thoroughly test any instructions in a production-like test environment before trying anything on production systems. And make sure it is tested for security, privacy, and safety. See our terms here.
Comments:
I wonder how that would compare to using sed to find the relevant rows for each group and piping that into mysql. Posted by: Scott N on Jun 29, 2012
i'd use --no-beep option if on windows, since the beeping on errors slows it down! Posted by: sbester on Jun 29, 2012
Pretty intriguing, IMHO overcomplicated though. Why don't you just make separate dumps for each table and then restore them in as many threads as the number of tables/CPU cores/whatever? Also tools like http://www.mydumper.org/ will make your life easier. Posted by: Przemek on Jun 29, 2012
How do you manage foreign keys?
As said above, mydumper can do the job Posted by: Cédric on Jun 29, 2012
I developed simple tools to dump and restore tables in parallel using GNU Parallel:
https://github.com/tart/SimpleMySQLParallelDumpRestore Posted by: Emre H on Jun 30, 2012
@Przemek: what if you have to take a consistent snapshot of the database using table locks or --single-transaction?
Multiple dump files won't work in such cases. Posted by: Shahryar on Jan 06, 2013
@Cédric: for foreign keys, you can restore the parent tables (the ones being referenced) first and the dependent tables after. Also, you can simply turn foreign key checks OFF while the restore(s) are running.
Read: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_foreign_key_checks Posted by: Shahryar on Jan 06, 2013