Thursday, April 12, 2012

Mysql HA solutions

Lets see what HA solutions can be designed in mysql and where are they suited.

1. Single master - single slave.

M(RW)
|
S(R)

A simple master slave solution can be used for a small site - where all the inserts go into the master and some (non-critical) requests are served from the slave. In case if the master crashes, the slave can be simply promoted as the master - once it has replicated the "available" logs from the master. You will need to create another slave of the now "new" master to make your mysql highly available again. In case the slave crashes, you will have to switch your read queries on master and create a new slave.

As mentioned earlier this is for a very "small" site. There are multiple scenarios where single master - single slave solution is not suitable. You will not be able to perform read scalability or run heavy queries to generate reports without affecting your site performance. Also for creating a new slave after failure, you will need to lock and take backup from the available mysql server. This will affect your site.


2. Single master - multiple slave.

          M1(RW)
            |
      -------------------------------
      |                |                         |
    S1(R)       S2(R)              Sn(R)

A single master multiple slave scenario is the most suitable architecture for many web sites. It provides read scalability across multiple slaves. Creation of new slaves are much easier. You can easily allocate a slave for backups and another for running heavy reports without affecting the site performance. You can create new slaves to scale reads as and when needed. But all inserts go into the only master. This architecture is not suitable for write scalability.

When any of the slave crashes, you can simply remove that slave, create another slave and put it back into the system. In case the master fails, you will need to wait for the slaves to be in sync with the master - all replication binary logs have been executed and then make one of them as the master. Other slaves then become the slave of the new master. You will need to be very careful in defining the exact position from where the new slaves start replication. Else you will end up with lots of duplicate records and may lose data sanity on some of the slaves.


3. Single master - standby master - multiple slaves.

         M1(RW) ------ M(R)
           |
      --------------------------------
      |                   |                       |
    S1(R)       S2(R)               Sn(R)

This architecture is very much similar to the previous single master - multiple slave. The standby master is identified and kept for failover. The benefit of this architecture is that the standby master can be of the same configuration as the original master. This architecture is suitable for medium to high traffic websites where master is of a much higher configuration than the slaves - maybe having RAID 1+0 or SSD drives. The standby master is kept close to the original master so that there is hardly any lag between the two. Standby master can be used for reads also, but care should be taken that there is not much lag between the master and the standby - so that in case of failure, switching can be done with minimum downtime.

When the master fails, you need to wait for the slaves to catch up with the old master and the simply switch them and the app to the standby master.


4. Single master - candidate master - multiple slaves.

         M1(RW) -------- M(R)
                                        |
              -----------------------------------
              |                   |                           |
            S1(R)         S2(R)                  Sn(R)

    This is an architecture very similar to the earlier one. The only difference being that all slaves are replicating from the candidate master instead of the original master. The benefit of this is that in case the master goes down, there is no switching required in the slaves. The old master can be removed from the system and the new master will automatically take over. Afterwards, in order to get the architecure back in place a new candidate master needs to be identified and the slaves can be moved one by one to the new master. The downtime here is minimal. The catch here is that there would be a definite lag between the master and the slaves, since replication on slaves happen through the candidate. This lag can be quite annoying in some cases. Also if the standby fails, all slaves will stop replication and will need to be moves to either the old master or a new standby server needs to be identified and all slaves be pointed to it.


5. Single master - multiple slaves - candidate master - multiple slaves

       M1(RW) ----------------------- M(R)
           |                                               |
   ---------------                      ---------------------  
   |                  |                       |                           |
S1(R)      S1n(R)            S2(R)                  S2n(R)


  This architecture is again similar to the earlier one with the fact that there is a complete failover setup for the current master. If either the master of the candidate master goes down, there are still slaves which are replicating and can be used. This is suitable for a high traffic website which require read scalability. The only drawback of this architecture is that writes cannot be scaled.


5. Multiple master - multiple slaves

        M1(RW) ----------------------- M2(RW)
          |                                                 |
  ----------------                       ----------------
  |                    |                       |                      |
S1(R)         S2(R)               S3(R)             S4(R)

This is "the" solution for high traffic websites. It provides read and write scalability as well as high availability. M1 and M2 are two masters in circular replication - both replicate each other. All slaves either point to M1 or M2. In case if one of the masters go down, it can be removed from the system, a new master can be created and put back in the system without affecting the site. If you are worried about performance issues when a master goes down and all queries are redirected to another master, you can have even 3 or more Masters in circular replication.

It is necessary to decide beforehand how many masters you would like to have in circular replication because adding more masters - though possible, is not easy. Having 2 masters does not mean that you will be able to do 2X writes. Writes also happen due to replication on the masters, so it depends entirely on the system resources how many writes can the complete system handle. Your application has to handle unique key generation in a fashion that does not result in duplication between the masters. Your application also needs to handle scenarios where the lag between M1 and M2 becomes extensite or annoying. But with proper thought to this architecture, it could be scaled and managed very well.

No comments: