Tuesday, September 09, 2014

gitlab hooks and disabling forced push

For people who do not know much about gitlab: it is a "github" equivalent that can be deployed on a local network. This means that if your organization is not comfortable pushing code to an external website like github, you can get all the features that github provides in your local environment.

The installation of gitlab is not an easy process. If you have an ubuntu-type system, you can get a .deb file and simply install it. Otherwise, installation is a very tedious task.

After installation and deployment, the first task that you would think about is writing some hooks to enforce restrictions and sanitize the code moving into the different git repos. The trick to enabling hooks in gitlab is to modify the "gitlab-shell/hooks/update" file.

On an ubuntu installation, the file can be found in the following folder

/opt/gitlab/embedded/service/gitlab-shell/hooks

But if you have followed the installation notes - https://github.com/gitlabhq/gitlabhq/blob/master/doc/install/installation.md - the update file is found in

/home/git/gitlab-shell/hooks

The "update" file is written in ruby. In order to abort a push, we need to exit with a "1" status. Let us create a very simple hook to disable all deletes. Add the following lines to the "update" after the line "require_relative"

require_relative '../lib/gitlab_update'

oldhash = ARGV[1]
newhash = ARGV[2]

cmd = "git cat-file -t #{newhash}"
action = IO.popen(cmd).read

if action == "delete"
    puts "deletes are not allowed"
    exit 1
end

For each push, gitlab's update script receives the old commit hash and the new commit hash. using the "git cat-file" command we are getting the action / method performed. And then we are comparing it with the "delete" string to identify if it is a delete. We are exiting with a "1" to disable a push.
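On a delete, git invokes the update hook with an all-zero new hash (and on a branch creation, with an all-zero old hash). You can verify this with a throwaway bare repository and a recording update hook - the paths below are scratch locations created by the script, not a gitlab install:

```shell
set -e
TMP=$(mktemp -d)
git init -q --bare "$TMP/remote.git"
# Install an update hook that records its three arguments:
# refname, old hash, new hash.
cat > "$TMP/remote.git/hooks/update" <<'EOF'
#!/bin/sh
echo "$1 $2 $3" >> "$GIT_DIR/update-args.log"
exit 0
EOF
chmod +x "$TMP/remote.git/hooks/update"
git clone -q "$TMP/remote.git" "$TMP/work" 2>/dev/null
cd "$TMP/work"
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m init
git push -q origin HEAD:refs/heads/feature    # create the branch
git push -q origin :refs/heads/feature        # delete the branch
tail -n 1 "$TMP/remote.git/update-args.log"
# last line: refs/heads/feature <old hash> 0000000000000000000000000000000000000000
```

The log file ends up with two lines: the creation (zero old hash) and the deletion (zero new hash), which is exactly what the hook above keys on.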

We can also write another hook to disable forced push, but it is easier to modify "gitlab-shell/lib/gitlab_update.rb" to do it. There is a function there which identifies whether the push is a forced push. To check for forced push, simply call this function inside the "exec" function. Here is the code that I added to the "exec" function to disable forced push.

 def exec
    # reset GL_ID env since we already
    # get value from it
    ENV['GL_ID'] = nil

    if forced_push?
      puts "Forced push now allowed!"
      exit 1
    end
......
end

Tuesday, April 01, 2014

Auto failover of mysql master in mysql multi-master multi-slave cluster

This post is an extension to my earlier posts about multi-master replication clusters: multi master replication in mysql and mysql multi master replication act II. The problem I had encountered and discussed was automatic failover. What happens when the master goes down? How can a slave or another master be promoted to become the new master? And once the settings are done on all the mysql dbs in the cluster to identify the new master, we also have to change our app to point to it.

Let us look at the auto failover part from the mysql cluster point of view. The tool that we are going to use is mysql master ha. It is written by Yoshinori Matsunobu (http://yoshinorimatsunobu.blogspot.com/) and has been around for quite some time now. The tool supports failover in a master-master scenario, but the master-master setup has to be active-passive: one of the masters has to be in read-only mode. It does not support an active-active mysql master-master setup where both masters handle writes. To put a mysql server in read-only mode, simply add "read_only" to its my.cnf file.

The mysql version I am using here is mysql 5.6.17. I have created a mysql cluster with 4 nodes.

M1(RW)--->S1
 |
 |
M2(RO)--->S2

M1 and M2 are two masters running on ports 3311 and 3312. S1 is a slave running on port 3313 that replicates from master M1. S2 is another slave running on port 3314 that replicates from master M2. Remember that mysql 5.6 introduced an additional variable known as the server-uuid, which has to be different for every mysql server in the cluster. So if you are setting up replication by copying the data directory, simply remove the auto.cnf file, which contains the server-uuid; mysql will create a new uuid when it starts. To make M2 a read-only master, I put the following parameter in its my.cnf file.

read_only

Download the mysql-master-ha tool from https://code.google.com/p/mysql-master-ha/downloads/list. I am using the following version.

mha4mysql-manager-0.55.tar.gz
mha4mysql-node-0.54.tar.gz


Untar the mha4mysql-node archive first and install it.

mha4mysql-node-0.54$ perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.63)
- DBD::mysql ...loaded. (4.025)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::node
Writing MYMETA.yml and MYMETA.json

mha4mysql-node-0.54$ make
mha4mysql-node-0.54$ sudo make install


Next install the mha4mysql-manager.

mha4mysql-manager-0.55$ perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.63)
- DBD::mysql            ...loaded. (4.025)
- Time::HiRes           ...loaded. (1.9725)
- Config::Tiny          ...loaded. (2.20)
- Log::Dispatch         ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.06)
- MHA::NodeConst        ...loaded. (0.54)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::manager
Writing MYMETA.yml and MYMETA.json

mha4mysql-manager-0.55$ make
mha4mysql-manager-0.55$ sudo make install


Watch out for errors. Once the tool is installed, we need to create a configuration file for the cluster. Here is the configuration file - app1.cnf.

[server default]
multi_tier_slave=1
manager_workdir=/home/jayantk/log/masterha/app1
manager_log=/home/jayantk/log/masterha/app1/manager.log

[server1]
hostname=127.0.0.1
candidate_master=1
port=3311
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data1

[server2]
hostname=127.0.0.1
candidate_master=1
port=3312
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data2

[server3]
hostname=127.0.0.1
port=3313
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data3

[server4]
hostname=127.0.0.1
port=3314
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data4



Let us go through it and understand what is happening. "multi_tier_slave" specifies that this is a multi-master, multi-slave cluster where each master can have its own slave; if the parameter is not specified, mha will give an error. Next we specify the working directory and log file for the mha manager. Then we define each of our servers with its hostname and port. We have to specify the root username and password for each mysql server through the "user" and "password" parameters; I have not set a password on my mysql servers, so the password parameter is absent from my configuration file. The "candidate_master" parameter prioritizes a particular mysql server as the master candidate during failover - in our case server2 (M2). Finally, mha needs ssh access to the machines running our mysql servers: "ssh_user" is the username to use for ssh, and I have enabled ssh public key authentication without a passphrase for the machines (127.0.0.1 in my case). The "master_binlog_dir" parameter is used when the dead master is still reachable via ssh, to copy any pending binary log events; this is required because the slave has no information about where the master's binary log files are located.
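Setting up passphrase-less public key authentication for the ssh_user can be sketched as below. The key path is generated on the fly here; the ssh-copy-id target is this setup's hypothetical user and host:

```shell
# Generate a key pair with an empty passphrase at a fresh path.
KEY=$(mktemp -u)
ssh-keygen -t rsa -N "" -q -f "$KEY"
ls "$KEY" "$KEY.pub"
# Then copy the public key to each machine running a mysql server, e.g.:
# ssh-copy-id -i "$KEY.pub" jayantk@127.0.0.1
```

After copying the key, verify that `ssh jayantk@127.0.0.1` logs in without prompting, since mha runs its checks non-interactively.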

Another thing to remember is to grant "replication slave" to all other mysql servers on the network from each mysql server in the cluster - irrespective of whether it is a master or a slave. In my case, I had to run the following grant statement on all my mysql servers - M1, M2, S1 and S2.

grant replication slave on *.* to  slave_user@localhost identified by 'slave_pwd';

Once all the settings are in place, run the command

masterha_check_repl --conf=app1.cnf

It will output a lot of messages ending with

"MySQL Replication Health is OK."


This means that the configuration is a success. In case of errors, check the wiki at https://code.google.com/p/mysql-master-ha/wiki/TableOfContents for solutions and missing parameters.

To start the masterha manager, run the following command

masterha_manager --conf=app1.cnf

In order to check the status of the mha manager script, go to the log directory - in our case /home/jayantk/log/masterha/app1. It contains a file recording the health status, which you can cat to check.

~/log/masterha/app1$ cat app1.master_status.health
9100    0:PING_OK    master:127.0.0.1


Another file is the log file - manager.log - which will give the following output as tail.

Tue Apr  1 12:20:50 2014 - [warning] master_ip_failover_script is not defined.
Tue Apr  1 12:20:50 2014 - [warning] shutdown_script is not defined.
Tue Apr  1 12:20:50 2014 - [info] Set master ping interval 3 seconds.
Tue Apr  1 12:20:50 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Apr  1 12:20:50 2014 - [info] Starting ping health check on 127.0.0.1(127.0.0.1:3311)..
Tue Apr  1 12:20:50 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..


The mha manager will keep on pinging the master every 3 seconds to see if it is alive or not. The ping interval time can be changed by specifying the parameter "ping_interval" in the application configuration. Failover is triggered after missing 3 pings to the master. So if ping_interval is 3 seconds, the maximum time to discover that the master mysql is down is 12 seconds.
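For example, to set the interval explicitly, the parameter goes in the [server default] section of app1.cnf (value in seconds):

```
[server default]
ping_interval=3
```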

Now, let's kill the master and see what happens. Find out the pid of the mysqld process and kill it. It is automatically restarted by the parent process "mysqld_safe". Here is what we get in our manager.log file.

140401 12:34:12 mysqld_safe Number of processes running now: 0
140401 12:34:12 mysqld_safe mysqld restarted
Tue Apr  1 12:34:14 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Apr  1 12:34:14 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/home/jayantk/mysql-5.6.17/data1 --output_file=/var/tmp/save_binary_logs_test --manager_version=0.55 --binlog_prefix=zdev-bin
Tue Apr  1 12:34:15 2014 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Apr  1 12:34:17 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..


It tries to do an ssh and check if the machine is reachable. But in the meantime the mysql server is back up and ping succeeds, so nothing happens. As a next step, let us kill both the parent "mysqld_safe" and the "mysqld" processes associated with the master mysql server, in that sequence, and see what happens in the manager.log file. There are a lot of log entries here.


Tue Apr  1 12:44:23 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
...
Tue Apr  1 12:44:26 2014 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '127.0.0.1' (111))
Tue Apr  1 12:44:26 2014 - [warning] Connection failed 1 time(s)..
Tue Apr  1 12:44:29 2014 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '127.0.0.1' (111))
Tue Apr  1 12:44:29 2014 - [warning] Connection failed 2 time(s)..
Tue Apr  1 12:44:32 2014 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '127.0.0.1' (111))
Tue Apr  1 12:44:32 2014 - [warning] Connection failed 3 time(s)..
Tue Apr  1 12:44:32 2014 - [warning] Master is not reachable from health checker!
...
Tue Apr  1 12:44:32 2014 - [info] Multi-master configuration is detected. Current primary(writable) master is 127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:32 2014 - [info] Master configurations are as below:
Master 127.0.0.1(127.0.0.1:3312), replicating from localhost(127.0.0.1:3311), read-only
Master 127.0.0.1(127.0.0.1:3311), dead
...
...
Tue Apr  1 12:44:32 2014 - [info] * Phase 1: Configuration Check Phase..
Tue Apr  1 12:44:32 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Multi-master configuration is detected. Current primary(writable) master is 127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info] Master configurations are as below:
Master 127.0.0.1(127.0.0.1:3311), dead
Master 127.0.0.1(127.0.0.1:3312), replicating from localhost(127.0.0.1:3311), read-only

Tue Apr  1 12:44:33 2014 - [info] Dead Servers:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info] Checking master reachability via mysql(double check)..
Tue Apr  1 12:44:33 2014 - [info]  ok.
Tue Apr  1 12:44:33 2014 - [info] Alive Servers:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3313)
Tue Apr  1 12:44:33 2014 - [info] Alive Slaves:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info] Unmanaged Servers:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3314)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Forcing shutdown so that applications never connect to the current master..
...
Tue Apr  1 12:44:33 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3: Master Recovery Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] The latest binary log file/position on all slaves is zdev-bin.000009:120
Tue Apr  1 12:44:33 2014 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info] The oldest binary log file/position on all slaves is zdev-bin.000009:120
Tue Apr  1 12:44:33 2014 - [info] Oldest slaves:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Fetching dead master's binary logs..
...
Tue Apr  1 12:44:33 2014 - [info] Additional events were not found from the orig master. No need to save.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.3: Determining New Master Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Apr  1 12:44:33 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Apr  1 12:44:33 2014 - [info] Searching new master from slaves..
Tue Apr  1 12:44:33 2014 - [info]  Candidate masters from the configuration file:
Tue Apr  1 12:44:33 2014 - [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 - [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
...
Tue Apr  1 12:44:33 2014 - [info] New master is 127.0.0.1(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 - [info] Starting master failover..
Tue Apr  1 12:44:33 2014 - [info]
From:
127.0.0.1 (current master)
 +--127.0.0.1
 +--127.0.0.1

To:
127.0.0.1 (new master)
 +--127.0.0.1
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Apr  1 12:44:33 2014 - [info] Starting recovery on 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Tue Apr  1 12:44:33 2014 - [info]   done.
Tue Apr  1 12:44:33 2014 - [info]  All relay logs were successfully applied.
Tue Apr  1 12:44:33 2014 - [info] Getting new master's binlog name and position..
Tue Apr  1 12:44:33 2014 - [info]  zdev-bin.000009:797
Tue Apr  1 12:44:33 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3312, MASTER_LOG_FILE='zdev-bin.000009', MASTER_LOG_POS=797, MASTER_USER='slave_user', MASTER_PASSWORD='xxx';
Tue Apr  1 12:44:33 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Tue Apr  1 12:44:33 2014 - [info] Setting read_only=0 on 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 - [info]  ok.
Tue Apr  1 12:44:33 2014 - [info] ** Finished master recovery successfully.
Tue Apr  1 12:44:33 2014 - [info] * Phase 3: Master Recovery Phase completed.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 4: Slaves Recovery Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] -- Slave diff file generation on host 127.0.0.1(127.0.0.1:3313) started, pid: 10586. Check tmp log /home/jayantk/log/masterha/app1/127.0.0.1_3313_20140401124432.log if it takes time..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Log messages from 127.0.0.1 ...
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Apr  1 12:44:33 2014 - [info] End of log messages from 127.0.0.1.
Tue Apr  1 12:44:33 2014 - [info] -- 127.0.0.1(127.0.0.1:3313) has the latest relay log events.
Tue Apr  1 12:44:33 2014 - [info] Generating relay diff files from the latest slave succeeded.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] -- Slave recovery on host 127.0.0.1(127.0.0.1:3313) started, pid: 10588. Check tmp log /home/jayantk/log/masterha/app1/127.0.0.1_3313_20140401124432.log if it takes time..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Log messages from 127.0.0.1 ...
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Starting recovery on 127.0.0.1(127.0.0.1:3313)..
Tue Apr  1 12:44:33 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Tue Apr  1 12:44:33 2014 - [info]   done.
Tue Apr  1 12:44:33 2014 - [info]  All relay logs were successfully applied.
Tue Apr  1 12:44:33 2014 - [info]  Resetting slave 127.0.0.1(127.0.0.1:3313) and starting replication from the new master 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 - [info]  Executed CHANGE MASTER.
Tue Apr  1 12:44:33 2014 - [info]  Slave started.
Tue Apr  1 12:44:33 2014 - [info] End of log messages from 127.0.0.1.
Tue Apr  1 12:44:33 2014 - [info] -- Slave recovery on host 127.0.0.1(127.0.0.1:3313) succeeded.
Tue Apr  1 12:44:33 2014 - [info] All new slave servers recovered successfully.
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] * Phase 5: New master cleanup phase..
Tue Apr  1 12:44:33 2014 - [info]
Tue Apr  1 12:44:33 2014 - [info] Resetting slave info on the new master..
Tue Apr  1 12:44:33 2014 - [info]  127.0.0.1: Resetting slave info succeeded.
Tue Apr  1 12:44:33 2014 - [info] Master failover to 127.0.0.1(127.0.0.1:3312) completed successfully.
Tue Apr  1 12:44:33 2014 - [info]

----- Failover Report -----

app1: MySQL Master failover 127.0.0.1 to 127.0.0.1 succeeded

Master 127.0.0.1 is down!

Check MHA Manager logs at zdev.net:/home/jayantk/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 127.0.0.1(127.0.0.1:3312) has all relay logs for recovery.
Selected 127.0.0.1 as a new master.
127.0.0.1: OK: Applying all logs succeeded.
127.0.0.1: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
127.0.0.1: OK: Applying all logs succeeded. Slave started, replicating from 127.0.0.1.
127.0.0.1: Resetting slave info succeeded.
Master failover to 127.0.0.1(127.0.0.1:3312) completed successfully.



So eventually, the following mysql cluster is now in place

M2(RW) ---> S2
   |
   |
  V
  S1

We can log into each of the mysql servers and verify this. So, the failover happened. There are, however, a few problems with this solution.

  1. Now that M2 is the master, we will have to change our application to use M2 as the master instead of M1, so that all inserts happen on M2. An easy way to do this is to use an HA solution such as Pacemaker, which takes over the virtual ip of the old master M1 and assigns it to the new master M2. Another way is to change the configuration on all app servers, swapping the ip of M1 with that of M2. Both scenarios are handled by the "master_ip_failover_script", which can be configured accordingly. More details on configuring it are available here - https://code.google.com/p/mysql-master-ha/wiki/Parameters#master_ip_failover_script
  2. The purpose of master-master mysql replication is lost when failover happens. Master-master replication allows easy switching of the app between two masters, and even makes it possible to write to certain databases on M1 and other databases on M2. But with this solution, firstly, M2 has to be read-only and is effectively acting only as a slave. Secondly, after the failover M2 loses the replication settings that made it a slave of M1, so when M1 comes back up it cannot be put back into circular replication as before.
  3. The script does not handle slave failover. It is expected that each master will have multiple slaves, and failover of slaves has to be handled separately: if a mysql slave server goes down, the application should be able to identify it and stop sending queries to that particular slave. Also, if mha is monitoring the mysql servers and one of the slaves goes down undetected, and then the master fails, mha may not be able to carry out the failover.
  4. And finally, after the failover happens the script simply exits. It does not update the mha configuration file, so multiple failovers cannot be handled. To arm it again, we have to manually change the configuration file and restart the mha manager script.
Despite these limitations, I believe the script can be used in the majority of cases and provides a good solution for mysql master failover.
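As a sketch of the first point, a master_ip_failover_script moves a virtual ip from the dead master to the new one. The vip address below and the reduced set of arguments parsed here are assumptions for illustration only - see the mysql-master-ha wiki for the full script interface before using anything like this:

```shell
# Hedged sketch of a vip-moving failover handler (not the real MHA sample script).
vip_failover() {
  VIP=192.168.0.100            # hypothetical virtual ip used by the app servers
  local cmd= host=
  for arg in "$@"; do
    case "$arg" in
      --command=*)         cmd=${arg#--command=} ;;
      --new_master_host=*) host=${arg#--new_master_host=} ;;
    esac
  done
  case "$cmd" in
    stop|stopssh) echo "release vip $VIP from the dead master" ;;  # e.g. ip addr del
    start)        echo "assign vip $VIP to new master $host" ;;    # e.g. ip addr add
    status)       echo "check vip $VIP on the current master" ;;
  esac
}

vip_failover --command=start --new_master_host=127.0.0.1
# prints: assign vip 192.168.0.100 to new master 127.0.0.1
```

A real script would run the ip commands over ssh on the old and new masters; the echo statements here just mark where those calls would go.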

Wednesday, March 19, 2014

An interesting suggester in Solr

Auto suggestion has evolved over time. Lucene has a number of implementations for type-ahead suggestions. Existing suggesters generally find suggestions whose whole prefix matches the current user input. Recently a new AnalyzingInfixSuggester has been developed, which finds matches of tokens anywhere in the user input and in the suggestion.




Let us see how to implement this in Solr. First, we will need solr 4.7, which has the capability to utilize the Lucene suggester module. For more details check out SOLR-5378 and SOLR-5528. To implement this, look at the searchComponent named "suggest" in solrconfig.xml and make the following changes.

<searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="suggestAnalyzerFieldType">text_ws</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>     <!-- org.apache.solr.spelling.suggest.HighFrequencyDictionaryFactory -->
      <str name="field">cat</str>
      <str name="weightField">price</str>
      <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>

Here we have changed the lookup type - lookupImpl - to AnalyzingInfixLookupFactory, and defined the analyzer for building the dictionary as text_ws, a simple whitespace tokenizer factory implementation. The field used for providing suggestions is "cat", and the "price" field is used as the weight for sorting the suggestions.

Also change the suggester's default dictionary to "mySuggester" by adding the following line to the requestHandler named "/suggest".

<str name="suggest.dictionary">mySuggester</str>

Once these configurations are in place, simply restart the solr server. To try out the suggester, index all the documents in the exampleDocs folder; the suggester index is built when the documents are committed. Then check the implementation with the following URL.

http://localhost:8983/solr/suggest/?suggest=true&suggest.q=and&suggest.count=2

The output will be somewhat similar to this...


<response>
  <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  </lst>
  <lst name="suggest">
    <lst name="mySuggester">
      <lst name="and">
        <int name="numFound">2</int>
        <arr name="suggestions">
        <lst>
          <str name="term">electronics <b>and</b> computer1</str>
          <long name="weight">2199</long>
          <str name="payload"/>
        </lst>
        <lst>
          <str name="term">electronics <b>and</b> stuff2</str>
          <long name="weight">279</long>
          <str name="payload"/>
        </lst>
        </arr>
      </lst>
    </lst>
  </lst>
</response>

We are already getting the suggestions with the matched term highlighted. Try getting suggestions for some other partial words like "elec" or "comp" and see the output. Let me note down some limitations that I came across while implementing this.

Check out the type of the field "cat", which is being used for providing suggestions, in schema.xml. It is of type "string" and is both indexed and stored. We can change the field name in solrconfig.xml to provide suggestions based on some other field, but that field has to be both indexed and stored. We would not want to tokenize the field, as that may mess up the suggestions - so it is recommended to use "string" fields for providing suggestions.

Another limitation I came across is that suggestions do not work on multiValued fields. "cat", for example, is multivalued. Do a search for "electronics" and fetch the "cat" field.

http://localhost:8983/solr/collection1/select/?q=electronics&fl=cat

We can see that in addition to "electronics", the cat field also contains "connector", "hard drive" and "memory". But a search on those strings does not give any suggestions.

http://localhost:8983/solr/suggest/?suggest=true&suggest.q=hard&suggest.count=2

So it is recommended that the field be of type "string" and not multivalued. If suggestions are to be provided on multiple fields, it is recommended to merge them into a single "string" field in the index.
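As a sketch, such a dedicated suggestion field could be declared in schema.xml as below. The field name "suggest_text" is a hypothetical example, and the application would concatenate the source fields into it at index time (a copyField from several sources would force the destination to be multivalued, which is exactly what we want to avoid):

```xml
<!-- hypothetical single-valued string field for suggestions -->
<field name="suggest_text" type="string" indexed="true" stored="true" multiValued="false"/>
```

The "field" entry in the suggest searchComponent would then point at suggest_text instead of cat.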

Monday, February 24, 2014

Win Free e-copies of Apache Solr PHP Integration

Readers would be pleased to know that I have teamed up with Packt Publishing to organize a giveaway of Apache Solr PHP Integration.

And 3 lucky winners stand a chance to win e-copies of their new book. Keep reading to find out how you can be one of the Lucky Winners.



Overview

• Understand the tools that can be used to communicate between PHP and Solr, and how they work internally
• Explore the essential search functions of Solr such as sorting, boosting, faceting, and highlighting using your PHP code
• Take a look at some advanced features of Solr such as spell checking, grouping, and auto complete with implementations using PHP code

How to enter ?

All you need to do is head on over to the book page and look through the product description of the book and drop a line via the comments below this post to let us know what interests you the most about this book. It’s that simple.

Deadline

The contest will close on 5th March 2014. Winners will be contacted by email, so be sure to use your real email address when you comment!