MySQL table_cache, information_schema, and open files limit
Recently, on our production servers we started seeing strange behavior by MySQL server while running queries on information_schema. Before getting into the gory details, I'd first explain the environment then the problem and its solution.
We have MySQL version 5.0.67 instances with 81 databases in master-master replication setting. We are only using MyISAM and MyISAM merge tables. Here are more details about the database and table counts.
mysql> use information_schema; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> SELECT COUNT(*) FROM tables; +----------+ | COUNT(*) | +----------+ | 23687 | +----------+ 1 row in set (4 min 37.46 sec) mysql> SELECT COUNT(*) FROM schemata; +----------+ | COUNT(*) | +----------+ | 81 | +----------+ 1 row in set (0.00 sec)
As you can see that we have 23,687 tables in our database. Please note the time it took to get just the count of tables (almost 5 minutes). The problem that we started seeing recently appeared when we tried querying some of the tables in information_schema database. Our queries had joins on TABLES and COLUMNS tables in information_schema. One of the errors we got was:
..Can't read dir of '.'
ERROR 1 (HY000): Can't create/write to file '/data1/tmp/#sql_103a_1.MYD' (Errcode: 24)
I had a slight clue from the error messages that it could be a filesystem related issue. So, I looked at database user permissions on tmp folder, tmp folder usage %, and permissions on data dir etc. Nothing solved the problem for me, then I looked at the values of following variables.
table_cache=32768 and open_files_limit=65535
table_cache seemed quite high to me, so I decided to run a small test to find out what happens when I query tables in information_schema database.
First of all, I used FLUSH TABLES WITH READ LOCK; to flush all the open tables to disk. Here is the output of a few relevant status variables after running the command.
[email protected] (none)>SHOW STATUS LIKE 'open%'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Open_files | 9 | | Open_streams | 0 | | Open_tables | 0 | | Opened_tables | 0 | +---------------+-------+ 4 rows in set (0.00 sec)
Then I unlocked the tables and tried running the following SQL:
use information_schema; SELECT COUNT(*) FROM COLUMNS;
And in a different MySQL session on the same server instance I started monitoring the output of the SHOW STATUS command mentioned above. In about a minute the output of SHOW STATUS command looked like below:
[email protected] (none)>SHOW STATUS LIKE 'open%'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Open_files | 22696 | | Open_streams | 0 | | Open_tables | 8095 | | Opened_tables | 0 | +---------------+-------+ 4 rows in set (0.00 sec)
A similar test was performed on TABLES table and the output (after starting from scratch) of SHOW STATUS looked like below. In under a minute the output looked like below.
[email protected] (none)>SHOW STATUS LIKE 'open%'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Open_files | 32848 | | Open_streams | 0 | | Open_tables | 12013 | | Opened_tables | 0 | +---------------+-------+ 4 rows in set (0.00 sec)
Now, that kind of explained why running a query that has a join on TABLES and COLUMNS tables could cause so many files to be opened. In my case, where I did not let the queries finish (since I did not want to break the system), I had already opened almost the max no. of allowed files and if I had let the query execute I'd have hit the max.
One obvious solution was to increase the value of open_files_limit system variable but that required a MySQL instance restart, so I instead changed the value of table_cache to an arbitrary value ie. 5000, since I don't know much about the queries we execute.
Now, you might be thinking what about performance after decreasing the cache size almost 7 times? You'd be amazed to hear that performance actually improves by setting table_cache to a reasonable no.(will talk about reasonable nos to choose for the variable some other time). The reason for such behaviour is that, if you have large no. of tables with complicated queries joining several tables and multiple connections running those complicated queries, you might end up using all your file descriptors' cache (table_cache) in that case MySQL uses an algorithm to find the least recently used descriptor, closes it, and replaces it with a new descriptor. But, the issue is probably in the algorithm that chooses the next file to close. Apparently, it takes too long to find the file to close. Hence, the file open time equals the file close time causing performance degradation in most cases (almost O(n), where n is no. of items in cache).
We can now run the same queries on information_schema without affecting other sessions. Here is a MySQL bug related to information_schema's slow performance and my article explains the reason :)
Did this tutorial help a little? How about buy me a cup of coffee?
Please feel free to use the comments form below if you have any questions or need more explanation on anything. I recommend thoroughy testing on a production-like test system first before moving to production.