Tuesday, April 01, 2014

Auto failover of mysql master in mysql multi-master multi-slave cluster

This post is an extension to my earlier posts about multi-master replication clusters: "multi master replication in mysql" and "mysql multi master replication act II". The problem I had encountered and discussed there was automatic failover. What happens when the master goes down? How can a slave or another master be promoted to become the new master? And once the settings are changed on all the mysql dbs in the cluster to identify the new master, we will also have to change our app to point to it.

Let us look at the auto failover part from the mysql cluster point of view. The tool that we are going to use is mysql-master-ha (mha). It was written by Yoshinori Matsunobu (http://yoshinorimatsunobu.blogspot.com/) and has been around for quite some time now. The tool supports failover in a master-master scenario, but the master-master setup has to be active-passive: one of the master mysql servers has to be in read-only mode. It does not support an active-active mysql master-master setup where both masters handle writes. To make a mysql server read-only, simply add the "read_only" parameter to its my.cnf file.

The mysql version I am using here is mysql 5.6.17. I have created a mysql cluster with 4 nodes.

M1(RW)--->S1
 |
 |
M2(RO)--->S2

M1 and M2 are 2 masters running on ports 3311 & 3312. S1 is a slave running on port 3313 and replicates from master M1. S2 is another slave running on port 3314 and replicates from master M2. Remember that mysql 5.6 introduced an additional variable known as server-uuid, which has to be different for every mysql server in the cluster. So if you set up replicas by copying the data directory, simply remove the auto.cnf file, which contains the server-uuid; mysql will generate a new uuid when it starts. In order to make M2 the read-only (passive) master, I have to put the following parameter in its my.cnf file.

read_only
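
To double check the setup (a minimal sketch, assuming passwordless root and the ports above), the relevant variables can be queried on each instance; server_uuid must be unique per node, and read_only should be 1 only on M2:

mysql -h 127.0.0.1 -P 3312 -u root -e "SELECT @@server_uuid, @@global.read_only;"
# repeat for ports 3311, 3313 and 3314; read_only can also be
# enabled at runtime with: SET GLOBAL read_only = 1;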

Download the mysql-master-ha tool from https://code.google.com/p/mysql-master-ha/downloads/list. I am using the following versions.

mha4mysql-manager-0.55.tar.gz
mha4mysql-node-0.54.tar.gz


Untar the mha4mysql-node archive first and install it.

mha4mysql-node-0.54$ perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.63)
- DBD::mysql ...loaded. (4.025)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::node
Writing MYMETA.yml and MYMETA.json

mha4mysql-node-0.54$ make
mha4mysql-node-0.54$ sudo make install


Next install the mha4mysql-manager.

mha4mysql-manager-0.55$ perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.63)
- DBD::mysql            ...loaded. (4.025)
- Time::HiRes           ...loaded. (1.9725)
- Config::Tiny          ...loaded. (2.20)
- Log::Dispatch         ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.06)
- MHA::NodeConst        ...loaded. (0.54)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::manager
Writing MYMETA.yml and MYMETA.json

mha4mysql-manager-0.55$ make
mha4mysql-manager-0.55$ sudo make install


Watch out for errors. Once the tool is installed, we need to create a configuration file for the cluster. Here is the configuration file - app1.cnf.

[server default]
multi_tier_slave=1
manager_workdir=/home/jayantk/log/masterha/app1
manager_log=/home/jayantk/log/masterha/app1/manager.log

[server1]
hostname=127.0.0.1
candidate_master=1
port=3311
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data1

[server2]
hostname=127.0.0.1
candidate_master=1
port=3312
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data2

[server3]
hostname=127.0.0.1
port=3313
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data3

[server4]
hostname=127.0.0.1
port=3314
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data4



Let us go through it and understand what is happening. "multi_tier_slave" specifies that the mysql cluster is a multi-master and multi-slave cluster where each master can have its own slave; if the parameter is not specified, mha will give an error. Next we specify the working directory and log file for the mha manager. Then we specify each of our servers with its hostname and port. We have to provide the root username and password for each of our mysql servers through the "user" and "password" parameters. I have not set a password on my mysql servers and so do not have that parameter in my configuration file. The parameter "candidate_master" is used to prioritize a certain mysql server as a master candidate during failover. In our case, we are prioritizing server2 (M2) as the new master.

Finally, mha needs ssh access to the machines running our mysql servers. The ssh username is specified by "ssh_user", and I have enabled ssh public key authentication without a passphrase for the machines (127.0.0.1 in my case). The parameter "master_binlog_dir" is used when the dead master is still reachable over ssh, to copy any pending binary log events from it. This is required because the slaves have no information about where the master's binary log files are located.
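
Key based ssh authentication can be set up with the usual tools, and the manager ships a masterha_check_ssh helper for verifying it. A minimal sketch, assuming the user jayantk and everything running on 127.0.0.1:

ssh-keygen -t rsa                    # accept the defaults and an empty passphrase
ssh-copy-id jayantk@127.0.0.1        # authorize the key on the target machine
masterha_check_ssh --conf=app1.cnf   # verifies ssh connectivity between all nodes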

Another thing to remember is to grant the "replication slave" privilege on each mysql server in the cluster - irrespective of whether it is a master or a slave - so that every other mysql server on the network can replicate from it. In my case, I had to run the following grant statement on all my mysql servers - M1, M2, S1 and S2.

grant replication slave on *.* to  slave_user@localhost identified by 'slave_pwd';
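
Since all four instances in this setup listen on 127.0.0.1, a small shell loop can apply the grant everywhere (a sketch; adjust the user, host and ports to your layout):

for port in 3311 3312 3313 3314; do
  mysql -h 127.0.0.1 -P $port -u root -e "grant replication slave on *.* to slave_user@localhost identified by 'slave_pwd';"
done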

Once all the settings are in place, run the command

masterha_check_repl --conf=app1.cnf

It will output a lot of messages ending with

"MySQL Replication Health is OK."


That means the configuration is a success. In case of errors, check the wiki at https://code.google.com/p/mysql-master-ha/wiki/TableOfContents for solutions and missing parameters.

To start the masterha manager, run the following command

masterha_manager --conf=app1.cnf
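
Note that masterha_manager runs in the foreground and, as discussed later, exits after it performs a failover. To keep it running unattended, one common pattern (a sketch; the console log name is arbitrary) is:

nohup masterha_manager --conf=app1.cnf > /home/jayantk/log/masterha/app1/manager_console.log 2>&1 &
masterha_check_status --conf=app1.cnf   # reports PING_OK while the master is healthy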

In order to check the status of the mha manager script, go to the log directory - in our case - /home/jayantk/log/masterha/app1. It contains a file recording the health status, which you can cat to inspect.

~/log/masterha/app1$ cat app1.master_status.health
9100    0:PING_OK    master:127.0.0.1


The other file is the log file - manager.log - whose tail shows the following output.

Tue Apr  1 12:20:50 2014 - [warning] master_ip_failover_script is not defined.
Tue Apr  1 12:20:50 2014 - [warning] shutdown_script is not defined.
Tue Apr  1 12:20:50 2014 - [info] Set master ping interval 3 seconds.
Tue Apr  1 12:20:50 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Apr  1 12:20:50 2014 - [info] Starting ping health check on 127.0.0.1(127.0.0.1:3311)..
Tue Apr  1 12:20:50 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..


The mha manager will keep on pinging the master every 3 seconds to see whether it is alive or not. The ping interval can be changed by specifying the parameter "ping_interval" in the application configuration. Failover is triggered after three consecutive missed pings to the master. So if ping_interval is 3 seconds, the maximum time to discover that the master mysql is down is 12 seconds - up to one interval for the first ping to fail, plus three more retries at 3 second gaps.
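
For example, to detect a dead master faster, the interval can be lowered in the [server default] section of app1.cnf (3 seconds is the default):

[server default]
ping_interval=1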

Now, let's kill the master and see what happens. Find out the pid of the mysqld process and kill it. It is automatically restarted by its parent process "mysqld_safe". Here is what we get in our manager.log file.

140401 12:34:12 mysqld_safe Number of processes running now: 0
140401 12:34:12 mysqld_safe mysqld restarted
Tue Apr  1 12:34:14 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Apr  1 12:34:14 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/home/jayantk/mysql-5.6.17/data1 --output_file=/var/tmp/save_binary_logs_test --manager_version=0.55 --binlog_prefix=zdev-bin
Tue Apr  1 12:34:15 2014 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Apr  1 12:34:17 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..


It tries to do an ssh and check if the machine is reachable. But in the meantime the machine is back up and the ping succeeds again, so nothing happens. As a next step, let's kill both the parent "mysqld_safe" process and the "mysqld" process associated with the master mysql server, in that sequence, so that the master stays down. Let us see what happens in the manager.log file. There are a lot of log entries here.
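
For reference, the kill sequence was roughly the following (a sketch; the pgrep patterns assume that M1's data directory, data1, appears on its command line):

# kill the watchdog first so that it cannot respawn mysqld
kill $(pgrep -f "mysqld_safe.*data1")
# then kill M1's mysqld itself
kill $(pgrep -f "mysqld.*data1")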


Tue Apr  1 12:44:23 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
...
Tue Apr  1 12:44:26 2014 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '127.0.0.1' (111))
Tue Apr  1 12:44:26 2014 - [warning] Connection failed 1 time(s)..
Tue Apr  1 12:44:29 2014 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '127.0.0.1' (111))
Tue Apr  1 12:44:29 2014 - [warning] Connection failed 2 time(s)..
Tue Apr  1 12:44:32 2014 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '127.0.0.1' (111))
Tue Apr  1 12:44:32 2014 - [warning] Connection failed 3 time(s)..
Tue Apr  1 12:44:32 2014 - [warning] Master is not reachable from health checker!
...
Tue Apr  1 12:44:32 2014 - [info] Multi-master configuration is detected. Current primary(writable) master is 127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:32 2014 - [info] Master configurations are as below:
Master 127.0.0.1(127.0.0.1:3312), replicating from localhost(127.0.0.1:3311), read-only
Master 127.0.0.1(127.0.0.1:3311), dead
...
...
Tue Apr  1 12:44:32 2014 - [info] * Phase 1: Configuration Check Phase..
Tue Apr  1 12:44:32 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Multi-master configuration is detected. Current primary(writable) master is 127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info] Master configurations are as below:
Master 127.0.0.1(127.0.0.1:3311), dead
Master 127.0.0.1(127.0.0.1:3312), replicating from localhost(127.0.0.1:3311), read-only

Tue Apr  1 12:44:33 2014 - [info] Dead Servers:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info] Checking master reachability via mysql(double check)..
Tue Apr  1 12:44:33 2014 - [info]  ok.
Tue Apr  1 12:44:33 2014 - [info] Alive Servers:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3313)
Tue Apr  1 12:44:33 2014 - [info] Alive Slaves:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info] Unmanaged Servers:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3314)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Forcing shutdown so that applications never connect to the current master..
...
Tue Apr  1 12:44:33 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3: Master Recovery Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] The latest binary log file/position on all slaves is zdev-bin.000009:120
Tue Apr  1 12:44:33 2014 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info] The oldest binary log file/position on all slaves is zdev-bin.000009:120
Tue Apr  1 12:44:33 2014 - [info] Oldest slaves:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Fetching dead master's binary logs..
...
Tue Apr  1 12:44:33 2014 - [info] Additional events were not found from the orig master. No need to save.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.3: Determining New Master Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Apr  1 12:44:33 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Apr  1 12:44:33 2014 - [info] Searching new master from slaves..
Tue Apr  1 12:44:33 2014 - [info]  Candidate masters from the configuration file:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
...
Tue Apr  1 12:44:33 2014 - [info] New master is 127.0.0.1(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 - [info] Starting master failover..
Tue Apr  1 12:44:33 2014 - [info]
From:
127.0.0.1 (current master)
 +--127.0.0.1
 +--127.0.0.1

To:
127.0.0.1 (new master)
 +--127.0.0.1
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Apr  1 12:44:33 2014 - [info] Starting recovery on 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Tue Apr  1 12:44:33 2014 - [info]   done.
Tue Apr  1 12:44:33 2014 - [info]  All relay logs were successfully applied.
Tue Apr  1 12:44:33 2014 - [info] Getting new master's binlog name and position..
Tue Apr  1 12:44:33 2014 - [info]  zdev-bin.000009:797
Tue Apr  1 12:44:33 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3312, MASTER_LOG_FILE='zdev-bin.000009', MASTER_LOG_POS=797, MASTER_USER='slave_user', MASTER_PASSWORD='xxx';
Tue Apr  1 12:44:33 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Tue Apr  1 12:44:33 2014 - [info] Setting read_only=0 on 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 - [info]  ok.
Tue Apr  1 12:44:33 2014 - [info] ** Finished master recovery successfully.
Tue Apr  1 12:44:33 2014 - [info] * Phase 3: Master Recovery Phase completed.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 4: Slaves Recovery Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] -- Slave diff file generation on host 127.0.0.1(127.0.0.1:3313) started, pid: 10586. Check tmp log /home/jayantk/log/masterha/app1/127.0.0.1_3313_20140401124432.log if it takes time..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Log messages from 127.0.0.1 ...
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Apr  1 12:44:33 2014 - [info] End of log messages from 127.0.0.1.
Tue Apr  1 12:44:33 2014 - [info] -- 127.0.0.1(127.0.0.1:3313) has the latest relay log events.
Tue Apr  1 12:44:33 2014 - [info] Generating relay diff files from the latest slave succeeded.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] -- Slave recovery on host 127.0.0.1(127.0.0.1:3313) started, pid: 10588. Check tmp log /home/jayantk/log/masterha/app1/127.0.0.1_3313_20140401124432.log if it takes time..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Log messages from 127.0.0.1 ...
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Starting recovery on 127.0.0.1(127.0.0.1:3313)..
Tue Apr  1 12:44:33 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Tue Apr  1 12:44:33 2014 - [info]   done.
Tue Apr  1 12:44:33 2014 - [info]  All relay logs were successfully applied.
Tue Apr  1 12:44:33 2014 - [info]  Resetting slave 127.0.0.1(127.0.0.1:3313) and starting replication from the new master 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 - [info]  Executed CHANGE MASTER.
Tue Apr  1 12:44:33 2014 - [info]  Slave started.
Tue Apr  1 12:44:33 2014 - [info] End of log messages from 127.0.0.1.
Tue Apr  1 12:44:33 2014 - [info] -- Slave recovery on host 127.0.0.1(127.0.0.1:3313) succeeded.
Tue Apr  1 12:44:33 2014 - [info] All new slave servers recovered successfully.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 5: New master cleanup phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Resetting slave info on the new master..
Tue Apr  1 12:44:33 2014 - [info]  127.0.0.1: Resetting slave info succeeded.
Tue Apr  1 12:44:33 2014 - [info] Master failover to 127.0.0.1(127.0.0.1:3312) completed successfully.
Tue Apr  1 12:44:33 2014 - [info]

----- Failover Report -----

app1: MySQL Master failover 127.0.0.1 to 127.0.0.1 succeeded

Master 127.0.0.1 is down!

Check MHA Manager logs at zdev.net:/home/jayantk/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 127.0.0.1(127.0.0.1:3312) has all relay logs for recovery.
Selected 127.0.0.1 as a new master.
127.0.0.1: OK: Applying all logs succeeded.
127.0.0.1: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
127.0.0.1: OK: Applying all logs succeeded. Slave started, replicating from 127.0.0.1.
127.0.0.1: Resetting slave info succeeded.
Master failover to 127.0.0.1(127.0.0.1:3312) completed successfully.



So, after the failover, the following mysql cluster is now in place

M2(RW) ---> S2
   |
   |
   v
   S1

We can log into each of the mysql servers and verify the same; a quick check is sketched below.
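
A minimal verification (a sketch, assuming the same ports and passwordless root as above) - S1 should now report 3312 as its master port, and M2 should be writable:

mysql -h 127.0.0.1 -P 3313 -u root -e "SHOW SLAVE STATUS\G" | grep -E "Master_Port|Slave_IO_Running|Slave_SQL_Running"
mysql -h 127.0.0.1 -P 3312 -u root -e "SELECT @@global.read_only;"   # 0 now that M2 is the active master

So, the failover happened. But there are a few problems with this solution.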

  1. Now that M2 is the master, we will have to change our application to use M2 as the master instead of M1, so that all inserts happen on M2. An easier way to do this is to use an HA solution such as Pacemaker, which can take over the virtual ip of the old master M1 and assign it to the new master M2. Another way is to change the configuration on all app servers and swap the ip of M1 with that of M2. Both scenarios can be handled by the "master_ip_failover_script" parameter (a rough sketch is given after this list). More details on configuring it are available here - https://code.google.com/p/mysql-master-ha/wiki/Parameters#master_ip_failover_script
  2. The purpose of master-master mysql replication is lost when failover happens. Master-master replication allows easy switching of the app between two masters; it is even possible to write to certain databases on M1 and other databases on M2. But with this solution, firstly, M2 has to be read-only and is effectively acting only as a slave. Secondly, during the failover, M2's slave info - the information that made it a slave of M1 - is reset. So, after M1 comes back up, it cannot be put back into circular replication as before without manual reconfiguration.
  3. The script does not handle slave failover. It is expected that each master will have multiple slaves and that failover of slaves is handled separately. If a mysql slave server goes down, the application should be able to identify it and stop sending queries to that particular slave. Also, if mha is monitoring the mysql servers and one of the slaves goes down (undetected), and then the master fails, mha may not be able to carry out the failover.
  4. And finally, after the failover happens, the script simply exits. It does not update the mha configuration file, so multiple failovers cannot be handled. In order to arm it again, we have to manually change the configuration file and start the mha manager script again.
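
For illustration, here is a rough, hypothetical skeleton of a master_ip_failover_script that moves a virtual ip between the masters, as mentioned in point 1. The flag names follow the samples in the mha documentation; the VIP, the network interface and the passwordless sudo/ssh setup are my assumptions, and the sample script shipped with mha is the proper starting point:

#!/bin/bash
# Hypothetical sketch of an mha master_ip_failover_script.
# mha invokes it with --command=status|stop|stopssh|start plus
# details of the old and the new master.
VIP="192.168.0.100/24"   # assumption: the virtual ip the application connects to
IFACE="eth0"             # assumption: the interface that carries the VIP

for arg in "$@"; do
  case "$arg" in
    --command=*)          cmd="${arg#*=}" ;;
    --orig_master_host=*) orig_master="${arg#*=}" ;;
    --new_master_host=*)  new_master="${arg#*=}" ;;
  esac
done

case "$cmd" in
  stop|stopssh)
    # release the VIP on the old (dead) master, if it is reachable
    ssh "$orig_master" "sudo ip addr del $VIP dev $IFACE" || true ;;
  start)
    # bring the VIP up on the newly promoted master
    ssh "$new_master" "sudo ip addr add $VIP dev $IFACE" ;;
  status)
    exit 0 ;;
esac

With a script like this in place, the application always connects to the VIP and never needs to know which physical master is currently active.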
But, in spite of the limitations of the script, I believe it can be used in the majority of cases, and it provides a good solution for mysql master failover.