Tuesday, September 27, 2016

If enforcing data retention via DROP PARTITION, better watch your auto_increment counter

One of the main tasks that any DBA has to deal with is certainly data retention. Especially when your developers like to store logging and tracking information inside the database. This is not something that I would recommend (don't try it at home!) but when you're late to the party (read: someone had taken this path before you joined) , then you'll have to live with it.

Data retention in MySQL is usually applied by partitioning the table by RANGE on a selected column, usually containing the row creation timestamp. New partitions are created in advance for the coming days or weeks or months, and a script is used that will run nightly and drop partitions that are older than a set retention.

There are many tools that can be used to automate data retention; I have chosen the excellent pdb-parted, a nice and cozy perl script that you can find in the PalominoDB repository on GitHub (since PalominoDB is no longer in existence, I have forked the repository under my own account, you can also get the script here).

It goes without saying that when I first joined, although there were many tables already containing millions of rows of logging, statistical and tracking data, there were no pruners in effect, i.e. no retention at all was applied. Over the next weeks and months I managed to get the required information from the various dev teams and started pruning the bigger tables.  After a couple of months of strict diet, I had recovered about 250 GB of space out of our main cluster, or about 30% of total dataset size at that time. Not bad!  Backup and restore times improved, and also their footprint was much smaller.

Over the next several months the company kept growing very fast. A DWH Team was set up and they started ingesting data from MySQL inside their own systems (Vertica based). Some of the data that was ingested was coming from these pruned tables.

Everything went well for some time, then I received a visit from one of the DWH guys with a strange issue: they were seeing duplicate keys in the data ingested from one of the partitioned tables.  I knew that, in theory, this should not be possible, as each and every table of that type has an auto increment primary key, and we had no reports of duplicate keys in production when data was first inserted.

At first I tried to blame the ETLs that they use to import data from MySQL to Vertica, but soon I realised that the table that they were having the issue with had a notable difference:  it was really small in size, only few hundreds rows, while all other tables contained hundreds of millions.  So I asked them for the MAX value of the auto increment key they had in their systems, and to my surprise, it was higher than the MAX value we had in MySQL!!! D'oh.

After some investigation it turned out that this particular table was only used for debugging, therefore the application  was not writing to it  unless the debug mode was activated. Usually, debug was only turned on for small amount of time (e.g. hours) and then turned off again, and the saved data was used for troubleshooting.

Anyways, it was clear that somehow, the auto_increment in this table was getting reset at some point; it was the only possibility for having an higher value in DWH than what we had in MySQL, as DWH was basically ingesting our rows directly, without any modification whatsoever.

It was easy enough for me to point the finger here, as the only statement that was issued against this table, apart from INSERT, was.... DROP PARTITION.

You will object that DROP PARTITION does not reset the auto_increment counter of a table. Of course I agree, but there was no other possibility here, so it had to be some kind of bug.

It was easy enough to reproduce the behaviour and to confirm that, if the partition you are dropping leaves no other rows in the table, the auto_increment counter is, in fact, reset to 1. You can find the steps in the bug I have submitted to Oracle (please see below).

In my opinion, the behaviour of auto_increment in this situation is not consistent:

  • TRUNCATE TABLE (DDL) empties the table, counter is reset => OK
  • DELETE * FROM TABLE (DML) empties the table, counter is NOT reset => OK
  • DROP PARTITION (DDL) does NOT empty the table, merely drops some data - counter is reset or not depending on content of other partitions  => WEIRD
I really feel that the auto_increment should be left alone when DROP PARTITION is issued, regardless what the content of the other partitions is.

I invite you to check the bug content for some discussion that I had with the Oracle bugs team, where they tried to convince me that this behaviour was correct at first, then acknowledged, but there was not much activity after that.



No comments:

Post a Comment