Thursday, August 1, 2013

InnoDB Full-text Search in MySQL 5.6: Part 3, Performance [feedly]

InnoDB Full-text Search in MySQL 5.6: Part 3, Performance
http://www.mysqlperformanceblog.com/2013/07/31/innodb-full-text-search-in-mysql-5-6-part-3/

This is part 3 of a 3 part series covering the new InnoDB full-text search features in MySQL 5.6. To catch up on the previous parts, see part 1 or part 2

Some of you may recall a few months ago that I promised a third part in my InnoDB full-text search (FTS) series, in which I'd actually take a look at the performance of InnoDB FTS in MySQL 5.6 versus traditional MyISAM FTS. I hadn't planned on quite such a gap between part 2 and part 3, but as they say, better late than never. Recall that we have been working with two data sets, one which I call SEO (8000-keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I'm using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM's default ft_min_word_len.

Also, recall that the table definition for the DIR data set is:

CREATE TABLE dir_test (    id INT UNSIGNED NOT NULL PRIMARY KEY,    full_name VARCHAR(100),    details TEXT  );

The table definition for the SEO data set is:

CREATE TABLE seo_test (   id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,   title VARCHAR(255),   body MEDIUMTEXT  );

Table Load / Index Creation

First, let's try loading data and creating our FT indexes in one pass – i.e., we'll create the FT indexes as part of the original table definition itself. In particular, this means adding "FULLTEXT KEY (full_name, details)" to our DIR tables and adding "FULLTEXT KEY (title, body)" to the SEO tables. We'll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we'll load the table, and then we'll do an ALTER to add the FT indexes. All times in seconds.

EngineData Setone-pass (load)two-pass (load, alter)
MyISAMSEO3.913.96 (0.76, 3.20)
InnoDBSEO3.7777.32 (1.53, 5.79)
MyISAMDIR43.15944.93 (6.99, 37.94)
InnoDBDIR330.7656.99 (12.70, 44.29)

Interesting. For MyISAM, we might say that it really doesn't make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set. The SEO data set fits completely into the buffer pool, the DIR data set does not. That also suggests that it's worth comparing the time required to add a second FT index (this time we will just index each table's TEXT/MEDIUMTEXT field). While we're at it, let's look at the time required to drop the second FT index as well. Again, all times in seconds.

EngineData SetFT Index Create TimeFT Index Drop Time
MyISAMSEO6.343.17
InnoDBSEO3.260.01
MyISAMDIR74.9637.82
InnoDBDIR24.590.01

InnoDB wins this second test all around. I'd attribute InnoDB's win here partially to not having to rebuild the whole table with second (and subsequent) indexes, but also to the fact that at least some the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB's win on the drop test isn't surprising.

Query Performance

Recall the queries that were used in the previous post from this series:

1. SELECT id, title, MATCH(title, body) AGAINST ('arizona business records'     IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3     DESC LIMIT 5;  2. SELECT id, title, MATCH(title, body) AGAINST ('corporation commission forms'     IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3 DESC     LIMIT 5;  3. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson +arizona'     IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;  4. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson arizona'     IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;  5. SELECT id, full_name, MATCH(full_name, details) AGAINST ('"Thomas B Smith"'     IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 1;

The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Query #EngineMin. Execution TimeAvg. Execution TimeMax. Execution Time
1MyISAM0.0079530.0081020.008409
1InnoDB0.0149860.0153310.016243
2MyISAM0.0018150.0018930.001998
2InnoDB0.0019870.0020770.002156
3MyISAM0.0007480.0008170.000871
3InnoDB0.6701100.6765400.684837
4MyISAM0.0011990.0012830.001372
4InnoDB0.0554790.0562560.060985
5MyISAM0.0084710.0085970.008817
5InnoDB0.6243050.6309590.641415

Not a lot of variance in execution times for a given query, so that's good, but InnoDB is always coming back slower than MyISAM. In general, I'm not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here, but I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That's where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. Ok, sure, that makes sense, but it doesn't really explain why InnoDB is taking so much longer, and here's where things get a bit interesting. If you recall part 2 in this series, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial "B", and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and 'Creating sort index' didn't even appear in the query profile:

mysql [localhost] {msandbox} (test): SELECT id, full_name, MATCH(full_name, details) AGAINST  ('"Thomas B Smith"' IN BOOLEAN MODE) AS score FROM dir_test_innodb ORDER BY 3 DESC LIMIT 1;  +-------+----------------+-------------------+  | id    | full_name      | score             |  +-------+----------------+-------------------+  | 62633 | Thomas B Smith | 32.89915466308594 |  +-------+----------------+-------------------+  1 row in set (0.31 sec)  mysql [localhost] {msandbox} (test): show profile;  +-------------------------+----------+  | Status                  | Duration |  +-------------------------+----------+  | starting                | 0.000090 |  | checking permissions    | 0.000007 |  | Opening tables          | 0.000017 |  | init                    | 0.000034 |  | System lock             | 0.000012 |  | optimizing              | 0.000008 |  | statistics              | 0.000027 |  | preparing               | 0.000012 |  | FULLTEXT initialization | 0.304933 |  | executing               | 0.000008 |  | Sending data            | 0.000684 |  | end                     | 0.000006 |  | query end               | 0.000006 |  | closing tables          | 0.000011 |  | freeing items           | 0.000019 |  | cleaning up             | 0.000003 |  +-------------------------+----------+

Hm. It's still slower than MyISAM by quite a bit, but much faster than before. The reason it's faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then 'Creating sort index' returns to the tune of roughly 0.5 to 0.6 seconds, and 'FULLTEXT initialization' remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you're searching for. The time spent in "Creating sort index" doesn't vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.

Three blog posts later, then, what's my overall take on InnoDB FTS in MySQL 5.6? I don't think it's great, but it's serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can't modify their existing applications or who just don't have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.

The post InnoDB Full-text Search in MySQL 5.6: Part 3, Performance appeared first on MySQL Performance Blog.  





No comments:

Post a Comment