Friday, September 23, 2016

Moving binary logs to a different filesystem without downtime

Another day at the office...

"Whoa, the write workload on our statistical cluster has  suddendly increased by 20% and the filesystem that holds the binary logs is no longer large enough".

Of course, I had warned the boss about this possibility when I received those servers with that tiny 250G filesystem for binlogs, but my red flag was just ignored as usual.

So here we are, presto I get this new shiny 600G LUN, but we need to stop the damn MySQL server in order to repoint the log_bin variable to the new storage area.

Dunno about you, but the idea of waking up at 2am to just perform a variable change is not something that makes me particularly happy. Not to mention the maintenance period that is needed around it....

So, I decided to investigate a bit about the possibilities to do such change without stopping the service.

As we all know, the log_bin variable is read-only and to change it, you need to restart mysqld.  So I wanted to check whether there was a way to actually swap the current binlog directory with another without making the server mad and/or lose data and/or causing downtime.

First and foremost I wanted  to check how the server would react with a rename of the binary logs folder. So I just issued the following:

mv /db/binlog /db/binlog_orig

I was very happy to find out that mysqld kept  running, and writing to the currently open file without any complaints.  I decided then to push the test further by actually linking a new folder to the old path and then switching to the next binlog.

So I created a symlink to the new binlog area first:

mkdir /db/binlog2
ln -s /db/binlog2 /db/binlog

I then connected to MySQL and issued a FLUSH BINARY LOGS. To my surprise, MySQL happily closed the current binary log (still being written in the original folder) and started a new one in the new folder. Oh,  and the index file was recreated there automatically too.  Cool :-)

Still something was not good, as the operation resulted in the binary log file number being reset to 1. Not cool if you have slaves (which you likely have, since you have binary logging enabled).

So I had to  guess how mysqld decides what the  number of the next binlog file should be.  My  first guess was that it should be looking at other files already in place. So  I repeated the steps above, but before actually issuing the flush logs command, I created an empty file in the new binary log folder with the same number as the one currently being written to in the old folder.
Voil√†, that did the trick!  The new binlog file was now correctly numbered :-)

But what about slaves?  I did all these tests with no slave connected.
So I quickly set up a slave instance and set up replication, then I retried my binary logs path exchange trick.

Right after the binary log file switch, the slave stopped, complaining that it couldn't read the next file. Basically, it had finished reading the log written in the old path, but could not "see" the next one written in the new path. I was able to verify that the old log was entirely read, and just the switch to the next one had failed:

[ERROR] Error reading packet from server: could not find next log; the first event 'dbstat01.014676' at 103351600, the last event read from '/storage/binlog/dbstat01.016653' at 136744026, the last byte read from '/storage/binlog/dbstat01.016653' at 136744026. (server_errno=1236)

That position was exactly where the file, in the old path, was closed due to the flush:

-rw-rw---- 1 mysql mysql 136744026 Sep 23 14:30 dbstat01.016653

Okay I thought, I can just repoint the slave to next binlog, position 4. 
But if the slave complained, that means the IO_Thread is probably just still looking in the old path.  What if I just stop the slave and start it again?  
Pronto, STOP SLAVE; START SLAVE ...and the slave resumed replication from the new binlog  file in the new path just fine!

So happy that I have saved myself a nightly maint!
This was tested on Percona 5.6.31-77.0-log and previous versions, but I don't see why it shouldn't work on stock Oracle's or even on Maria.

If you are lazy like me, you can use the simple script that I have crafted for this test, you can find it on my GitHub page .

1 comment:

  1. Cool, just what I've been looking for. Thanks for posting your solution!