Wednesday, December 14, 2016

In place transparent compression of MySQL binary logs

Any DBA who has administered a busy master knows how fast the disk space occupied by binary logs may grow. DBAs have no control over this: the growth depends on the workload, and the workload depends on many factors, e.g.:

- application changes (the applications start writing more due to code changes)
- traffic changes (the peak season arrives and your workload doubles in size)
- infrastructure changes (the devops add more servers)
- business changes (new business flows add to the existing workload)

So either you have been thoughtful and have planned in advance for a large enough storage space (to handle the increase in the number of binary logs), or, sooner or later, you will face the usual dilemma - how many retention days do you dare give up to accommodate the binlog growth?

In my case, I was very thoughtful, but the boss didn't listen and gave me servers with very limited binlog storage space and, more importantly, using internal disks, so there was no way to add more disk space on demand. Pity, as I had previously found a way to move binlogs to a different filesystem without downtime. But unfortunately, there is no "different filesystem" this time. I'm stuck with what I have.

So the obvious answer at first was to reduce the retention of binary logs via the expire_logs_days variable. Not the ideal option: in my situation, I had to reduce it to a mere 2 days, not even enough to survive a weekend. At the same time, I didn't like the idea of keeping more days and being woken up at 3am by a filesystem full alert.
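
For the record, expire_logs_days is a dynamic variable in 5.6, so it can be lowered on the fly (just remember to also persist the value in my.cnf so it survives a restart):

mysql -e 'SET GLOBAL expire_logs_days = 2;'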

So the idea of compressing the binary logs came to my mind, but I wanted to do it in a transparent way, preserving the existing binlog retention mechanisms (the expire_logs_days variable and the PURGE BINARY LOGS command should continue to work as before) and without breaking the MySQL server and/or replication. Ideally, the timestamps of the files themselves should also be preserved.

I decided to try and compress the binary logs in place. In place means that when you look at your binlogs folder, you don't notice anything unusual. Here's part of the listing of one of our servers with compression active:

-rw-rw---- 1 mysql mysql  13631245 Dec 14 14:17 dbmd01.079826
-rw-rw---- 1 mysql mysql  13909321 Dec 14 14:23 dbmd01.079827
-rw-rw---- 1 mysql mysql  13656948 Dec 14 14:30 dbmd01.079828
-rw-rw---- 1 mysql mysql  13915222 Dec 14 14:37 dbmd01.079829
-rw-rw---- 1 mysql mysql 104883640 Dec 14 14:44 dbmd01.079830
-rw-rw---- 1 mysql mysql 104898032 Dec 14 14:50 dbmd01.079831
-rw-rw---- 1 mysql mysql 104861122 Dec 14 14:57 dbmd01.079832

You can spot which files are compressed by looking at the file size, but everything else is preserved. Looking at them with the file utility, here's what's in each:

dbmd01.079826: gzip compressed data
dbmd01.079827: gzip compressed data
dbmd01.079828: gzip compressed data
dbmd01.079829: gzip compressed data
dbmd01.079830: MySQL replication log
dbmd01.079831: MySQL replication log
dbmd01.079832: MySQL replication log

Basically, after compression, the compressed file retains the original file name. This is to keep MySQL happy - you know, these files are tracked in the index file and read when mysqld starts.
Speaking of which, here's what happens when you start mysqld with in-place compression:

2016-10-26 03:06:13 0 [Note] /usr/sbin/mysqld (mysqld 5.6.34-79.1-log) starting as process 1407 ...
[ ... ]
2016-10-26 03:06:13 1407 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.34-79.1 started; log sequence number 2722915
2016-10-26 03:06:13 1407 [ERROR] Binlog has bad magic number;  It's not a binary log file that can be used by this version of MySQL
2016-10-26 03:06:13 1407 [ERROR] Binlog has bad magic number;  It's not a binary log file that can be used by this version of MySQL
2016-10-26 03:06:13 1407 [ERROR] Binlog has bad magic number;  It's not a binary log file that can be used by this version of MySQL
2016-10-26 03:06:13 1407 [ERROR] Binlog has bad magic number;  It's not a binary log file that can be used by this version of MySQL

Although these entries are flagged as ERROR, mysqld actually just ignores the compressed files and, more importantly, does not touch them.

This way, compressed binlog files can be managed exactly like standard, uncompressed ones. Of course, I need to remember that if I want to peruse their content, I have to uncompress them before feeding them to the mysqlbinlog utility.
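
For example, something along these lines works (the file name is just one from the listing above):

zcat dbmd01.079826 > /tmp/dbmd01.079826    # zcat doesn't mind the missing .gz suffix
mysqlbinlog /tmp/dbmd01.079826 | less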

Last but not least, in order to ensure that we do not inadvertently compress binlogs that haven't been downloaded by all slaves yet, the following extra safety rules are enforced:

- do not compress any binlog that is less than 1 hour old
- only compress binlogs with a sequence number lower than that of the binlog currently being read by the most lagging slave

The script runs every 10 minutes: it extracts the list of binary logs that aren't compressed yet, applies the above rules, then proceeds to compress them.
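
To give an idea, here is a minimal sketch of that logic. This is not the actual script: the path, base name and hardcoded slave position below are placeholders, and the real script derives the latter from the slaves themselves.

#!/bin/bash
# minimal sketch - NOT the real compress_binlog.sh
BINLOG_DIR=/var/lib/mysql    # placeholder: binlog location
BASENAME=dbmd01              # placeholder: binlog base name

# sequence number of the binlog currently being read by the most lagging
# slave; the real script obtains this from the slaves (SHOW SLAVE STATUS)
OLDEST_NEEDED=79830          # placeholder value

for f in "$BINLOG_DIR/$BASENAME".[0-9]*; do
    # skip files that are already compressed
    file "$f" | grep -q 'gzip compressed' && continue
    # rule 1: skip anything modified less than 1 hour ago
    [ -n "$(find "$f" -mmin -60)" ] && continue
    # rule 2: skip binlogs the most lagging slave may still need
    seq=$((10#${f##*.}))     # sequence number, leading zeros stripped
    [ "$seq" -ge "$OLDEST_NEEDED" ] && continue
    # compress to a temp file, copy the original timestamp onto it,
    # then swap it in under the original name
    gzip -c "$f" > "$f.tmp" && touch -r "$f" "$f.tmp" && mv "$f.tmp" "$f"
done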

I have set expire_logs_days back to 7 days and now have plenty of disk space for extra growth.



You can find the compress_binlog.sh script which I crafted for the above on my GitHub page.

One caveat - the script hasn't been tested with binlog sequence numbers greater than 999,999. I am actually curious to see what happens to the binlog file name when we reach one million; I'd need to either be very patient, or have a look at the source code... :-)

Enjoy!

Rick

9 comments:

  1. I wouldn't call this transparent. However, I think this can be useful. Note that you might have to edit the binlog index file if you want to make the binlogs available again after uncompressing them. Another option is to use mysqlbinlog on the decompressed files.

    Note that Tencent has a patch to get MySQL to compress the binlog files.
    https://bugs.mysql.com/bug.php?id=48396
    https://github.com/TencentDBA/TMySQL

    1. Hi Daniel,

      thank you for your comment.

      By "transparent" I meant to say that MySQL has no idea that we have compressed the binary logs and therefore handles them, well, transparently, including automatic retention.

      I saw the bug report and the patch from Tencent, but I am using binary distributions and am not planning on recompiling from source and/or adapting this patch to new releases at this time.

      My simple script actually does the trick and follows the KISS principle, which is something I strongly believe in.

      If I were to uncompress the binary logs I would do it in place again - no need to edit the index file.
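
      Something like this, I suppose (same trick in reverse; the file name is just an example):

      zcat dbmd01.079826 > dbmd01.079826.tmp    # decompress to a temp file
      touch -r dbmd01.079826 dbmd01.079826.tmp  # copy the original timestamp
      mv dbmd01.079826.tmp dbmd01.079826        # swap it back under the original name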

      Thanks

      Rick

  2. It'd be cool if you could store the binlogs on a compressed filesystem so they're still readable by MySQL. Even cooler if you could somehow set up tiered storage so recent binlogs are on fast storage and older ones get rotated out to the compressed filesystem.

    I am curious: with your current gzip design, what's the benefit of keeping gzipped binlogs on the master if MySQL can't read them? Archiving, perhaps? I'd love to hear your use case.

    1. Hi Dane,

      a compressed filesystem would be nice, but it could also be a burden if you have many slaves competing to read a binlog file and a busy master with a write-intensive workload. Remember that the current binlog is kept open and has one writer and many readers; I feel that the overhead of a compressed filesystem wouldn't play nice with the contention on this single binlog file. And unfortunately these servers aren't connected to a Storage Area Network, so we don't have fancy options here; hence we built this simple compression setup, which actually works well for this particular situation.

      Your question about what benefit we get makes sense: why don't we just delete them if we are short on space, you say.

      Well, to us, the reason for keeping one full week of binlogs is twofold: first, to have a last-resort backup if for some reason the "official" daily backup comes up corrupt and we are hit by a disaster; and second, to be able to inspect statements that have been issued against the master in the last week, in order to troubleshoot application issues (it happens quite often that we are asked questions like "who put that value in that column?"). For this last purpose, we just have to uncompress them before feeding them to the mysqlbinlog utility.

      Thanks,
      Rick

    2. I forgot to mention one more reason (maybe the most important one) for keeping one week of binlogs (although I mentioned it briefly in the post): if replication breaks, e.g. during the weekend, and you don't notice (alerts fail, for example), you may find yourself on Monday morning with an unrecoverable slave because the needed binlogs are no longer on the master.

    3. I'm not sure that would happen in practice with, say, ZFS. If a file (actually a block) is hot, it will be in the ARC cache, or at worst the L2ARC. Depending on the ZFS version, this may be kept compressed in RAM or not (older versions not). However, it has been shown that in most cases it's still quicker to read compressed data from RAM than from disk, and compressed from disk than uncompressed. There are obviously edge cases, so it's always vital to benchmark it yourself to verify your situation. Also bear in mind that you will never get the same compression ratios as, say, gzip on the file itself, even if you use the same settings. This is due to the filesystem compressing at the block level as opposed to the whole-file level, which means the compression window is much smaller, and hence there are fewer opportunities to compress.

      One good thing about zfs compression is you can toggle it on and off seamlessly, so if you start to notice problems you can revert it.
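
      For example (dataset name made up):

      zfs set compression=lz4 tank/mysql   # turn it on
      zfs set compression=off tank/mysql   # revert; only newly written blocks are affected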

      In theory the same should hold on btrfs, but I'm not as confident there, as it's not as mature yet.

      https://blogs.oracle.com/observatory/entry/zfs_compression_a_win_win
      https://wiki.illumos.org/display/illumos/LZ4+Compression

  3. Thanks for taking the time to respond! It sounds like gzip meets your needs. Nice!

    1. You're welcome.
      Actually, you can use whatever you want. I used bzip2 at first, but it was taking too much time to compress each file (we have them set to grow to 100MB before they rotate), so I switched to pigz (parallel gzip).
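
      pigz is a drop-in replacement for gzip, so it's just a matter of swapping the command, e.g. (illustrative):

      pigz -p4 -c dbmd01.079826 > dbmd01.079826.tmp   # -p4 = use 4 compression threads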

  4. My approach would use one of my favorite tools: do not keep the binlogs on the master, but on a Binlog Server. A smart Binlog Server should be able to transparently manage compressed and uncompressed binlogs. And an even smarter Binlog Server should be able to store recent binlogs on local disks, and older ones in S3 or HDFS.
