Friday, May 31, 2013

The small improvements of MySQL 5.6: Duplicate Index Detection

http://www.mysqlperformanceblog.com/2013/05/31/the-small-improvements-of-mysql-5-6-duplicate-index-detection/

Friday, May 31, 2013 10:41 AMThe small improvements of MySQL 5.6: Duplicate Index DetectionMySQL Performance BlogJaime Crespo

Here at the MySQL Performance Blog, we've been discussing the several new features that MySQL 5.6 brought: GTID-based replicationInnoDB FulltextMemcached integration, a more complete performance schema, online DDL and several other InnoDB and query optimizer improvements. However, I plan to focus on a series of posts on the small but handy improvements – changes and bug corrections – in MySQL 5.6 that can make our lives easier and have passed almost unnoticed by most (not all) DBAs.

Duplicate Index Detection

I commented about this on my last webinar, but did not have time to analyze it in-depth.  If you try to do something like this in MySQL 5.5, you will succeed without errors or warnings:

mysql> ALTER TABLE test ADD INDEX (col2);  Query OK, 0 rows affected (0.22 sec)  Records: 0  Duplicates: 0  Warnings: 0  mysql> ALTER TABLE test ADD INDEX (col2);  Query OK, 0 rows affected (0.25 sec)  Records: 0  Duplicates: 0  Warnings: 0  mysql> SHOW CREATE TABLE test\G  *************************** 1. row ***************************         Table: test  Create Table: CREATE TABLE `test` (    `id` int(11) NOT NULL AUTO_INCREMENT,    `col2` int(11) DEFAULT NULL,    `col3` varchar(200) DEFAULT NULL,    PRIMARY KEY (`id`),    KEY `col2` (`col2`),    KEY `col2_2` (`col2`)  ) ENGINE=InnoDB DEFAULT CHARSET=latin1  1 row in set (0.00 sec)

In previous versions of MySQL, you can create two indexes with the same columns (in the same order) and the server will not complain.

If we execute the same sentences in MySQL 5.6, the second ALTER will also succeed -and the index will be created-, but we will get a warning(note severity, to be exact):

mysql> ALTER TABLE test ADD INDEX (col2);  Query OK, 0 rows affected (0.36 sec)  Records: 0  Duplicates: 0  Warnings: 0  mysql> ALTER TABLE test ADD INDEX (col2);  Query OK, 0 rows affected, 1 warning (0.56 sec)  Records: 0  Duplicates: 0  Warnings: 1  mysql> SHOW WARNINGS\G  *************************** 1. row ***************************    Level: Note     Code: 1831  Message: Duplicate index 'col2_2' defined on the table 'test.test'. This is deprecated and will be disallowed in a future release.  1 row in set (0.00 sec)

As the message points correctly, this is a human mistake, as it is a waste of resources that could potentially impact our performance, and should be avoided. By the way, a good practice to avoid this is always naming your keys with a consistent pattern. This new behavior was introduced in 5.6.7 with the closing of this bug (although this was initially accepted as a bug as early as 2005!).

The report explains more in detail what the "will be disallowed in a future release" means. In MySQL 5.7 the checks will be stricter: in the default SQL mode, a duplicate index will throw a warning instead of a note. In strict mode, it will throw an error and the second ALTER will fail, preventing the creation of the duplicate index.

Does it mean that tools like pt-duplicate-key-checker will not be necessary for MySQL 5.6? Let's have a look at the code implementing this feature. The warning will only be thrown if the index has not been created automatically, it is not a foreign key, and it has the exact column definition in the same order. In other words, it checks for duplicate keys, but not redundant ones. What is the difference? Let's see an example. If we execute:

mysql> ALTER TABLE test ADD INDEX (col2);  Query OK, 0 rows affected (0.34 sec)  Records: 0  Duplicates: 0  Warnings: 0  mysql> ALTER TABLE test ADD INDEX (col2, col3);  Query OK, 0 rows affected (0.39 sec)  Records: 0  Duplicates: 0  Warnings: 0

We get no warnings and no errors in 5.6, as the indexes are different. But as you may know, we can use the second index not only for filtering on both columns, but also for filtering by just the first one. Why can't MySQL enforce this kind of constraints? For many reasons: the first one because it would break 99% of all applications out there that use MySQL, for which we at Percona tend to find redundant indexes. And second, because in some cases, we may need to have what at first seems redundant indexes but effectively they are not -for example, if one of the two indexes was unique or a foreign key.

This is the output of our tool when run on the same table, correctly identifying the redundancy:

$ pt-duplicate-key-checker --tables test.test  # ########################################################################  # test.test  # ########################################################################  # col2 is a left-prefix of col2_2  # Key definitions:  #   KEY `col2` (`col2`),  #   KEY `col2_2` (`col2`,`col3`)  # Column types:  #         `col2` int(11) default null  #         `col3` varchar(200) default null  # To remove this duplicate index, execute:  ALTER TABLE `test`.`test` DROP INDEX `col2`;  # ########################################################################  # Summary of indexes  # ########################################################################  # Size Duplicate Indexes   5  # Total Duplicate Indexes  1  # Total Indexes            3

Additionally, pt-duplicate-key-checker will detect subtle redundancies that are engine-dependent, like redundant suffixes for secondary keys in InnoDB. As some of this redundancies could be necessary, depending on the query optimizer and the MySQL version, we always recommend to check manually the optimizations proposed by Percona Toolkit. The MySQL server, of course, cannot risk to block directly all cases.

A set of MySQL utilities were introduced by Oracle recently, which includesmysqlindexcheck, similar to pt-duplicate-key-checker, but it does not detect all cases. For example:

mysql> alter table test add index redundant (col2, id);  Query OK, 0 rows affected (1.57 sec)  Records: 0  Duplicates: 0  Warnings: 0  $ mysqlindexcheck --server=user:pass@localhost test.test  # Source on localhost: ... connected.  $ pt-duplicate-key-checker --tables test.test  # ########################################################################  # test.test  # ########################################################################  # Key redundant ends with a prefix of the clustered index  # Key definitions:  #   KEY `redundant` (`col2`,`id`)  #   PRIMARY KEY (`id`),  # Column types:  #         `col2` int(11) default null  #         `id` int(11) not null auto_increment  # To shorten this duplicate clustered index, execute:  ALTER TABLE `test`.`test` DROP INDEX `redundant`, ADD INDEX `redundant` (`col2`);  # ########################################################################  # Summary of indexes  # ########################################################################  # Size Duplicate Indexes   9  # Total Duplicate Indexes  1  # Total Indexes            2

By the way, if you want to get more familiar with this and other particularities of the latest MySQL GA release, have a look at our upcoming sessions for the"Moving to 5.6″ training course in America (AustinSan Jose) and Europe (ManchesterUtrecht).

The post The small improvements of MySQL 5.6: Duplicate Index Detection appeared first on MySQL Performance Blog.




No comments:

Post a Comment