Wednesday, May 01, 2013

how to create a 3 node riak cluster ?

A very brief intro to riak - http://basho.com/riak/. Riak is a distributed database written in erlang. Each node in a riak cluster contains a complete, independent copy of the riak package. A riak cluster does not have any "master". Data is distributed across nodes using consistent hashing - which ensures that the data is evenly distributed and that a new node can be added with minimum reshuffling. Each object in a riak cluster has multiple copies distributed across multiple nodes. Hence failure of a node does not necessarily result in data loss.
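To make the consistent hashing idea concrete, here is a minimal sketch in Python. It is not riak's actual code: it assumes a ring of 64 equal partitions over the SHA-1 space, 3 replicas per object, a simple round-robin assignment of partitions to nodes, and a made-up bucket/key pair. Riak's real key encoding and claim algorithm are more involved.

```python
import hashlib

RING_SIZE = 2 ** 160      # size of the SHA-1 output space
NUM_PARTITIONS = 64       # assumed ring size
N_VAL = 3                 # assumed number of replicas per object

def partition_for(bucket, key):
    """Map a bucket/key pair to a partition index on the ring."""
    digest = hashlib.sha1(("%s/%s" % (bucket, key)).encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * NUM_PARTITIONS // RING_SIZE

def owners(nodes):
    """Assign partitions to nodes round-robin (a simplification, not riak's claim algorithm)."""
    return [nodes[i % len(nodes)] for i in range(NUM_PARTITIONS)]

def preference_list(bucket, key, ring):
    """Nodes holding the replicas: the owner of the key's partition and of the ones after it."""
    first = partition_for(bucket, key)
    return [ring[(first + i) % NUM_PARTITIONS] for i in range(N_VAL)]

ring = owners(["riak@10.20.220.2", "riak@10.20.220.3", "riak@10.20.220.4"])
# "users" / "user1" is a hypothetical bucket and key
replicas = preference_list("users", "user1", ring)
print(partition_for("users", "user1"), replicas)
```

Because a key only maps to a partition, adding a node means handing over some partitions rather than rehashing every key.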

To set up a 3 node riak cluster, we first set up 3 machines with riak installed. To install riak on ubuntu machines, all that needs to be done is to download the "deb" package and run dpkg -i "riak_x.x.x_amd64.deb". The version I used here was 1.3.1. 3 machines with ips 10.20.220.2, 10.20.220.3 & 10.20.220.4 were set up.

To set up riak on the 1st node, there are 3 config changes that need to be made

1. replace http ip: in /etc/riak/app.config replace the ip in {http, [ {"127.0.0.1", 8098 } ]} with 10.20.220.2
2. replace pb_ip: in /etc/riak/app.config replace the ip in {pb_ip,   "127.0.0.1" } with 10.20.220.2
3. change the name of the riak node to match your ip: in /etc/riak/vm.args change name to riak@10.20.220.2
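The three edits can also be scripted. The sketch below works on the file contents as strings and assumes the stock entries look like the ones quoted in the list, with the node name set through a "-name" line in vm.args. The function names are my own. On a real node you would read and write /etc/riak/app.config and /etc/riak/vm.args around these calls.

```python
import re

def bind_app_config(text, ip):
    """Point the http listener and pb_ip entries at the node's own ip."""
    text = re.sub(r'(\{http,\s*\[\s*\{\s*)"127\.0\.0\.1"', r'\g<1>"%s"' % ip, text)
    text = re.sub(r'(\{pb_ip,\s*)"127\.0\.0\.1"', r'\g<1>"%s"' % ip, text)
    return text

def name_vm_args(text, ip):
    """Set the erlang node name to riak@<ip>."""
    return re.sub(r'^-name\s+\S+', '-name riak@%s' % ip, text, flags=re.M)

# Sample fragments standing in for the real files
app_config = '{http, [ {"127.0.0.1", 8098 } ]},\n{pb_ip,   "127.0.0.1" },'
vm_args = '## Name of the riak node\n-name riak@127.0.0.1\n'

print(bind_app_config(app_config, "10.20.220.2"))
print(name_vm_args(vm_args, "10.20.220.2"))
```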



If you had started the riak cluster earlier - before making the ip related changes, you will need to clear the ring and backend db. Do the following.

rm -rf /var/lib/riak/bitcask/
rm -rf /var/lib/riak/ring/



To start the first node, run riak start.

To prepare the second node, make the same three config changes using 10.20.220.3 as the ip. Once done, do a "riak start". To join this node to the cluster, do the following

root@riak2# riak-admin cluster join riak@10.20.220.2
Attempting to restart script through sudo -H -u riak
Success: staged join request for 'riak@10.20.220.3' to 'riak@10.20.220.2'

check out the cluster plan

root@riak2# riak-admin cluster plan
Attempting to restart script through sudo -H -u riak
===============================Staged Changes================================
Action         Nodes(s)
-------------------------------------------------------------------------------
join           'riak@10.20.220.3'
-------------------------------------------------------------------------------

NOTE: Applying these changes will result in 1 cluster transition

###############################################################################
                         After cluster transition 1/1
###############################################################################

=================================Membership==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid     100.0%     50.0%    'riak@10.20.220.2'
valid       0.0%     50.0%    'riak@10.20.220.3'
-------------------------------------------------------------------------------
Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

WARNING: Not all replicas will be on distinct nodes

Transfers resulting from cluster changes: 32
  32 transfers from 'riak@10.20.220.2' to 'riak@10.20.220.3'


Save the cluster

root@riak2# riak-admin cluster commit
Attempting to restart script through sudo -H -u riak
Cluster changes committed

Add 1 more node

Prepare the 3rd node by replacing the ip with 10.20.220.4. And add this node to the riak cluster.

root@riak3# riak-admin cluster join riak@10.20.220.2
Attempting to restart script through sudo -H -u riak
Success: staged join request for 'riak@10.20.220.4' to 'riak@10.20.220.2'

check and commit the new node to the cluster.

root@riak3# riak-admin cluster plan
Attempting to restart script through sudo -H -u riak
=============================== Staged Changes ================================
Action         Nodes(s)
-------------------------------------------------------------------------------
join           'riak@10.20.220.4'
-------------------------------------------------------------------------------

NOTE: Applying these changes will result in 1 cluster transition

###############################################################################
                         After cluster transition 1/1
###############################################################################

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      50.0%     34.4%    'riak@10.20.220.2'
valid      50.0%     32.8%    'riak@10.20.220.3'
valid       0.0%     32.8%    'riak@10.20.220.4'
-------------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

WARNING: Not all replicas will be on distinct nodes

Transfers resulting from cluster changes: 21
  10 transfers from 'riak@10.20.220.2' to 'riak@10.20.220.4'
  11 transfers from 'riak@10.20.220.3' to 'riak@10.20.220.4'

root@riak3# riak-admin cluster commit
Attempting to restart script through sudo -H -u riak
Cluster changes committed

Check status

root@riak3# riak-admin status | grep ring
Attempting to restart script through sudo -H -u riak
ring_members : ['riak@10.20.220.2','riak@10.20.220.3','riak@10.20.220.4']
ring_num_partitions : 64
ring_ownership : <<"[{'riak@10.20.220.2',22},{'riak@10.20.220.3',21},{'riak@10.20.220.4',21}]">>
ring_creation_size : 64
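If you want to check the balance from a script, the ring_ownership line is easy to parse. This is a small sketch that assumes the line has exactly the format shown in the output above.

```python
import re

def parse_ring_ownership(line):
    """Extract {node: partition_count} from a ring_ownership status line."""
    pairs = re.findall(r"\{'([^']+)',(\d+)\}", line)
    return {node: int(count) for node, count in pairs}

line = """ring_ownership : <<"[{'riak@10.20.220.2',22},{'riak@10.20.220.3',21},{'riak@10.20.220.4',21}]">>"""
ownership = parse_ring_ownership(line)
total = sum(ownership.values())
for node, count in sorted(ownership.items()):
    print("%s owns %d of %d partitions (%.1f%%)" % (node, count, total, 100.0 * count / total))
```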


For Advanced configuration refer:

http://docs.basho.com/riak/latest/cookbooks/Adding-and-Removing-Nodes/

Tuesday, March 26, 2013

how to cleanup a huge mongodb collection ?

As most mongodb users probably know, mongodb works best when its data fits in RAM. The more RAM you give the DB server, the happier mongodb is. But if the data/index size exceeds the available RAM, you see increasing response times for all your queries.

Recently we had an issue where the db size exceeded the RAM we had on our machine. Suddenly we saw the query response time increase to 10-20 times its original value. Luckily we had a cleanup strategy in place, but we had never got the chance to execute it.

We were dealing with around 110 million entries and were expecting that after cleanup around 50% of entries would be removed. The problem was our setup.

We had multiple slaves in our replica set. So running a simple delete query on the master would send the deletes to the slaves as well. What we wanted to do was remove all entries which are more than "n" days old - say 6 months. The delete query for this would be

db.test1.remove( { ts : { $lt : ISODate("2012-09-27T00:00:00.000Z")  } } )

This will fire 1 query on the master, but for each record deleted on the master a delete operation is written to the oplog, which then replicates to the slaves. So if this query is run on the master and we intend to remove 50 million entries from our existing 110 million entries, we would end up with 50 million entries in the oplog, which is a lot of IO.

Another solution that crossed our mind was to avoid the oplog by running mongodb as a stand alone instance and running our delete query there. This should theoretically have worked. But even with the oplog disabled, the deletions were terribly slow. After firing the query and waiting for around 3 hours, we knew that this would not work.

With this plan aborted, another small beam of light came through. Remember mysql and how we used to move data across tables?

insert into table1 select * from table2 where ...

We tried replicating this statement in mongo and were successful.

db.test1.find( { ts : { $gt : ISODate("2012-09-27T00:00:00.000Z")  } } ).forEach( function(c){db.test2.insert(c)} )

This query took approximately 30 minutes to execute. And we had a new collection test2 ready with only the data from the last 6 months. Now all we needed to do was to rename the collections. We preferred swapping over dropping, so that the existing data stayed around as a backup in case something went wrong.

db.test1.renameCollection("temp");
db.test2.renameCollection("test1");
db.temp.renameCollection("test2");
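The same copy step can be driven from a script that inserts in batches instead of one document at a time. This is a rough sketch, not what we ran: it assumes the two arguments behave like pymongo Collection objects (find and insert_many), and the function name and batch size are my own choices.

```python
from datetime import datetime

def copy_recent(src, dst, cutoff, batch_size=1000):
    """Copy documents whose ts is newer than cutoff from src to dst; return how many were copied."""
    copied = 0
    batch = []
    for doc in src.find({"ts": {"$gt": cutoff}}):
        batch.append(doc)
        if len(batch) >= batch_size:
            dst.insert_many(batch)
            copied += len(batch)
            batch = []
    if batch:  # flush the last partial batch
        dst.insert_many(batch)
        copied += len(batch)
    return copied

# Usage against a real server (assumes pymongo is installed; "mydb" is a placeholder name):
# from pymongo import MongoClient
# db = MongoClient()["mydb"]
# copy_recent(db["test1"], db["test2"], datetime(2012, 9, 27))
```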

In order to maintain the data, we converted the collection to a ttl collection.

db.test1.ensureIndex( { "ts" : 1 }, { expireAfterSeconds : 15552000 } )

So any entry which exceeds 6 months = 15552000 seconds will be automatically deleted.
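The number is just 180 days expressed in seconds. A tiny helper (the name is my own) makes it easy to work out the value for a different retention period:

```python
from datetime import timedelta

def ttl_seconds(days):
    """Convert a retention period in days to a value for expireAfterSeconds."""
    return int(timedelta(days=days).total_seconds())

print(ttl_seconds(180))  # 15552000
```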





Saturday, March 02, 2013

G-Shock

I had been a fan of G-Shock watches for quite some time. But this was my first experience of owning one. After a lot of dilemma over whether to get one, I went ahead and bought the Mudman 9300.




Features :
Thermometer
Compass
moon date
dual time
5 alarms with snooze
stop watch
count down timer
auto-backlight
power saving feature
solar powered
Battery level indicator
water resistant till 200 meters
Shock resistant
world clock
hourly chime


And really good looking. Worth the money spent...

Microsoft Licences

Recently I got the opportunity to be a part of the windows team. We are (yes still are) using a microsoft (yes the same microsoft) product to handle one of our websites due to legacy bindings - user base, existing technology, backend team.

My first encounter with microsoft on the enterprise end was when we were trying to use Microsoft Navision - a supply chain management solution - in one of my previous companies. The reason why I say that we were "trying" to use it is that it took us more than 6-8 months to put it into production. And we spent another 3 months in training. Microsoft sucks the user dry. I saw that if I purchase 1 product from microsoft, the dependencies are so well built in that I eventually end up purchasing a lot of other microsoft products.

Microsoft NAV cost us around 1 million INR. Now I cannot use NAV as it is, it needs to be customized. And it cannot be customized by just any developer. NAV can be customized only by companies / developers who have the licence to do so. The licence for customization is extremely expensive - maybe even more expensive than the licence for selling liquor in india. Once I pay for customization, I have to go ahead and deploy the software somewhere. For which I need microsoft licences - OS, web server, database server. And then of course plan for HA (high availability) - which means at least 2 of each. So the strategy here is that once you purchase a product licence, you need the complete platform licence and eventually you end up paying many times more than the actual product cost.

Another concept that I became aware of recently was "software assurance". What is that ?? Well, have you heard of life insurance ? Software Assurance (SA) is somewhat similar to that. It ensures that you get all the patches and version upgrades - (may or may not be free) as and when they are released. So if you purchase windows 2012 and plan to shift to windows 2014 when it is released, it is possible. There may be some cost involved.

Among all microsoft licences, I believe that the DB licence is the killer. The standard licence costs 1/4th of the enterprise licence. The difference between the two is that the standard edition can utilize only up to 2 cores in a machine, while the enterprise edition can utilize any number of cores - and its licence cost is in multiples of "dual cores". So if you have a dual quad core machine (8 cores), you end up purchasing 4 enterprise licences, which is 16 times the standard licence cost.

And why should I pay for microsoft? when there are so many technologies which are better and available for free of cost. If I have to pay for support, why should i pay for the product and then for the support. Why not get the product for free and then pay for support ?

My final assessment was that microsoft is like a spider's web: once you get entangled, you keep on getting more and more entangled. And there is no getting out without losing your own investment. Beware!!!

Saturday, December 29, 2012

Does this work ??

Recently had a very unique experience with the tata safari. I own a 1 year old tata safari dicor 2.2

It was parked for about 2-3 hours near sector 30 noida. When i came back to the car, i heard the alarm blaring. And there was no one near the car. Assuming that there was some malfunction, i locked and unlocked the car. But the alarm kept on blaring.

Finally when i was near the car, i saw that the driver side door was open. Looked around and saw no one interested in either the car or the alarm. Climbed inside and saw my stereo still in place. Thought that i had left the door open "by mistake". So started the engine and headed home. Then i noticed that my door - the driver side door - was unlocked. Tried pushing the lock, but it was stuck. It was then i realized that some "not so smart" thief had tried to break into my car and was unsuccessful.

This is when the story starts.

Had a sleepless night. Cause my car would not lock. I woke up 3-4 times just to check if my car was still standing. To add fuel to the fire, googled about tata safari theft and found that both safari & scorpio are toppers in the list of vehicles which are stolen. My Tata safari has engine immobilizer, gear lock and the now broken central lock. But i read cases where a tata safari with gps also was stolen.

Next morning i went to a nearby service center of tata safari and told him the complete tale. People were awed. But they told me a different tale. That of replacing the complete lock set. And get a new set of keys. I was like "i wud almost never use the key to open the door. Why spend almost 7000 to replace it? I am ok with the key not working as long as the central locking works.".

The driver side door lock was examined. It was assumed that the thief had tried inserting some sort of screw-driver to force the lock open. As a result the key channel was damaged and the key would not go into the lock. The lock was in a tilted position due to which the central locking was open and "not movable". My simple idea was to bring the lock to its original vertical position so that the central locking becomes possible. But sad to say the TASS guys were not willing to comply.

I took it to another TASS and then to a local mechanic. Made both of them understand that it was a simple matter of "bringing the lock back to its vertical position". Everyone was of the opinion of a complete new lock set.

Even i was convinced that spending 7k was the only option. But as a last try i approached another local mechanic and explained the complete story to him. This guy said that he would try but could not promise anything. He opened up the door from inside and took a look at the lock. And turned it anti-clockwise to make it vertical. And bingo, my central locking was back in place. He took rs.200 but i was ok with it. He had saved me a lot.

If i had gone by the book it would have cost me much more to put a solution in place which i would not be using much.

Sunday, November 04, 2012

Ecommerce - an experience worth mentioning

I have just realized that I have started purchasing a lot of stuff online. Starting from electronics, clothes to even groceries and vegetables. The fact is that not only they are heavily discounted - but all i have to do is "a few clicks" and some one else does the effort of getting it home for me.

I remember the first purchase I made online - where else but flipkart and what else but books. The experience was ok. I have used www.flipkart.com many times since then, but my recent experience was terrible. I ordered something for myself and after 10 days, when I followed up, I came to know that the order will "not" be delivered because they do not have the stuff. I was like "hello, did you have to wait for 10 days to tell me - and that too after I give a reminder". Don't you have an escalation system for missed orders? Thank god the order was for myself - what if I had sent it as a gift to someone else? And I got no response back - except that my money will be credited back in another 7-8 days.

My experience with homeshop18 has been much better. Timely deliveries and heavily discounted products. It is important to remember that for an Indian customer - price matters. Remember the "kitna deti hai" ad ?

And my experience with jabong has been a lot better. Same day delivery - wow. And that too at prices which are way-way below. Though the site is terrible. They hardly have any content for any product. You have to look at the pic and figure out what it is. I had to purchase a watch and i had to do a lot of googling to figure out the specs for the watch I liked. But eventually i went to jabong to get the "extra" discount. As long as they are burning money, I am happy.

I have used www.watchkart.com and www.bagskart.com to order stuff and the experience has been good. Watchkart guys played a trick where they gave me a huge discount coupon immediately after making my first order. It made me feel guilty. I was double minded on cancelling my first order and using the coupon to order the product again. Maybe I should have.

Recently, I have ordered a number of things at www.firstcry.com and www.hushbabies.com. Firstcry has huge delivery times but a better range of products. Hushbabies is much better. Delivery times are shorter and discounts are easily available. Also purchased a lot of stuff through www.shersingh.com. The single page purchase is a very unique experience. And the packaging for www.shersingh.com is wonderful. It makes you feel a level higher.

Have been doing some shopping at www.shopclues.com. They have the "jaw dropping deals" option where you get extremely cheap stuff. Have tried it twice and both times have been extremely satisfactory. www.letsbuy.com is a site that i miss - after it was killed by flipkart. It had much better deals than flipkart.

My first attempt to shop for vegetables online at www.sabjiwala.com went unsuccessful, as they have next day delivery option and we needed vegetables - on an urgent basis.

But the best experience so far was with www.gopeppers.com - a site which sells groceries. The interface is very intuitive. I sat with a list given to me and opened up two sites offering groceries - www.mygrahak.com & www.gopeppers.com placing orders side by side. On mygrahak, firstly the site was terribly slow, and searching for stuff was so difficult. But gopeppers is much better. It took me less than 10 minutes to get the complete list of around 50+ items in my cart.

When I enquired on the chat option about payment and delivery - i came to know that the stuff can be delivered in 1 hour - flat. I was like "what!!! Are you sure??" And the response was "Yes". So, I placed the order and selected COD (Card on delivery) option - ever heard of that... Sat back and started watching a movie - thinking 1 hour is impossible - it will take at least 2 hours. In 30 minutes, the guy was at my place with all the stuff. He unpacked it and made me tick the list he had brought with him. And then he told me that I was his "first" customer. I was both shocked and surprised. Shocked because I could have been duped if I had paid online, and surprised at the efficiency with which he delivered his first order. I will look forward to ordering my next month's groceries through the same site.

Wednesday, June 27, 2012

SQL Joins

Reference pic for SQL joins

Courtesy : someone who posted it on facebook..

Tuesday, May 29, 2012

Petrol car versus diesel car (part 2)

For the earlier part please refer to http://jayant7k.blogspot.in/2006/10/petrol-engine-versus-diesel-engine.html

In this blog i am trying to create a calculation sheet which can be used to figure out - if you go for a diesel car, when would you be able to recover the extra cost of purchasing the car - compared to what you would spend on petrol.

Government in india loves petrol. It is used generally by the middle class with small cars and 2 wheelers and who rarely vote. People who fall below the middle class mostly travel using public transport. And people who are above the middle class generally have long cars powered by diesel engines. The government finds it difficult to increase the price of diesel - due to the ripple effect it will cause in the price of food commodities and the overall inflation. So raising the price of petrol seems the only viable option - to cover the loss due to the subsidy on diesel and cooking gas - without affecting the government's vote bank.

And of course the amount of tax the government is able to make out of fuel is really good. How the tax amount is used - is a blind spot for the residents of india. We do not see any improvement in roads - jams are always there. It has been more than 60 years since independence and we are still struggling to provide basic necessities like electricity and water to the general public. It seems valid that there should be a fine on the "government" for being unable to accomplish what it has been promising for ages - food, clothes, electricity, water and home.

But let's focus on diesel cars. The new diesel cars from tata and fiat have really long maintenance schedules. Older diesel engines needed frequent maintenance. Most petrol engines need a service every 5K kms - or at max 10K kms. On the other hand diesel engines in tata and fiat need maintenance every 15K kms or once a year. This brings down the cost of ownership of diesel engines a lot.

Maintenance set aside, a simple table can be used to figure out how long it will take to recover the "extra" cost of diesel.



     A                      B (Petrol)                C (Diesel)                D (Difference)
1    Price per litre        71                        41
2    average                14                        12
3    rs per km              5.0714285714 (=B1/B2)     3.4166666667 (=C1/C2)
4    kms per day            100                       100
5    car price              500000                    600000                    100000 (=C5-B5)
6    cost per day           507.1428571429 (=B3*B4)   341.6666666667 (=C3*C4)   165.4761904762 (=B6-C6)
7    Days to recover cost                                                       604.3165467626 (=D5/D6)
8    Months                                                                     20.1438848921 (=D7/30)




As you see, with petrol and diesel at 71 and 41 respectively, and giving an average of 14 and 12 respectively - the extra cost of owning a diesel car will be recovered in around 20 months (roughly 1.7 years), provided the car runs 100 kms every day and the difference in price is 1 Lakh. Feel free to copy the info into an excel sheet and try altering the numbers to suit your needs and see when you would be able to recover the cost - if you intend to purchase a diesel car.
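If you prefer a script over a spreadsheet, the same sheet can be written as a small Python function. The function and parameter names are my own, and a month is taken as 30 days.

```python
def breakeven_days(petrol_price, diesel_price, petrol_kmpl, diesel_kmpl,
                   km_per_day, extra_car_cost):
    """Days needed for the fuel saving to pay back the extra price of the diesel car."""
    petrol_per_km = float(petrol_price) / petrol_kmpl
    diesel_per_km = float(diesel_price) / diesel_kmpl
    saving_per_day = (petrol_per_km - diesel_per_km) * km_per_day
    if saving_per_day <= 0:
        return None  # diesel never pays back at these numbers
    return extra_car_cost / saving_per_day

days = breakeven_days(71, 41, 14, 12, 100, 100000)
print("days: %.1f, months: %.1f" % (days, days / 30))
```

Halving the daily run to 50 kms doubles the payback period, which is why the daily distance matters as much as the fuel price gap.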

Tuesday, May 01, 2012

Deployments

Deployments are a critical phase of any project. In the "stone age", developers used to simply scp the required files into production. And there used to be issues when you were dealing with multiple http servers. Keeping all the servers in sync was always the issue.

Then capistrano came into the picture. It made deployment of ruby/rails apps very easy. A few other people and I went ahead and modified it to deploy code into production for php apps as well. But it was a tricky job since capistrano was originally designed to work for ruby/rails apps. Here is a sample capistrano script that pulls code out of svn and pushes it to multiple web servers - and then runs some specific post deployment tasks on them

deploy.rb

set :application, "app"
set :repository,  "http:///tags/TAG102"
set :imgpath, "/var/images"

# svn settings
set :deploy_via, :copy
set :scm, :subversion
set :scm_username, "svn_username"
set :scm_password, "svn_password"
set :scm_checkout, "export"
set :copy_cache, true
set :copy_exclude, [".svn", "**/.svn"]

# ssh settings
set :user, "server_username"
set :use_sudo, true
default_run_options[:pty] = true

#deployment settings
set :current_dir, "html"
set :deploy_to, ""
set :site_root, "/var/www/#{current_dir}"
set :keep_releases, 3

#web servers
role :web, "192.168.1.1","192.168.1.2","192.168.1.3"

#the actual script
namespace :deploy do
  desc <<-DESC
    deploy the app
  DESC
  task :update do
    transaction do
      update_code
      symlink
    end
  end

  task :finalize_update do
    transaction do
      sudo "chown -R apache.apache #{release_path}"
      sudo "ln -nfs #{imgpath}/images #{release_path}/images"
    end
  end

  task :symlink do
    transaction do
      puts "Symlinking #{current_path} to #{site_root}."
      sudo "ln -nfs #{release_path} #{site_root}"
    end
  end

  task :migrate do
    #do nothing
  end

  task :restart do
    #do nothing
  end
end

This pulls the code out of the svn repository, creates a tar on the local machine, scps it to the production web servers and untars it at the specified location. It then runs all the tasks specified in finalize_update and finally changes the symlink of the "html" directory to the newly deployed path. The good point about capistrano is that you are almost blind as to what happens in the backend. The bad point is that since you are blind, you do not know how to do what you want to do. It needs a bit of digging and a bit of tweaking to get your requirements fulfilled by this script.

Now lets check fabric.

Installation is quite easy using pip.

sudo pip install fabric

In case you like the old fashioned way, you can go ahead and download the source code and do a

sudo python setup.py install

To create a fabric script, you need to create a simple fab file with whatever you require. For example, if you need to run a simple command like 'uname -a' on all your servers, just create a simple script fabfile.py with the following code

from fabric.api import run

def host_type():
        run('uname -a')

And run the script using the following command

$ fab -f fabfile.py -H localhost,127.0.0.1 host_type

[localhost] Executing task 'host_type'
[localhost] run: uname -a
[localhost] Login password:
[localhost] out: Linux gamegeek 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[127.0.0.1] Executing task 'host_type'
[127.0.0.1] run: uname -a
[127.0.0.1] out: Linux gamegeek 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Done.
Disconnecting from localhost... done.
Disconnecting from 127.0.0.1... done.

A simple fabric script which can do whatever the earlier capistrano script was doing is here.

fabfile.py

from __future__ import with_statement
from fabric.api import *

def production():
  # connection and deployment settings for the production web servers
  env.user = 'server_username'
  env.hosts = ['192.168.1.1','192.168.1.2','192.168.1.3']
  env.deploy_to = ''
  env.site_root = '/var/www/html'
  env.tag_name = 'tag101'
  env.repository = {  'url':'http:///tags/TAG101',
            'username': 'svn_username',
            'password': 'svn_password',
            'command': 'svn export --force',
          }
  env.image_path = '/var/images'

def deploy():
  checkout()
  pack()
  unpack()
  symlinks()
  makelive()

def checkout():
  # export the tag from svn into /tmp/<tag_name> on the local machine
  local('%s --username %s --password %s --no-auth-cache %s /tmp/%s' % \
    (env.repository['command'], env.repository['username'], env.repository['password'], env.repository['url'], env.tag_name))

def pack():
  # -C /tmp keeps the paths in the archive relative, so it unpacks as <tag_name>/ and not tmp/<tag_name>/
  local('tar -czf /tmp/%s.tar.gz -C /tmp %s' % (env.tag_name, env.tag_name))

def unpack():
  # copy the archive to each server and extract it inside deploy_to
  put('/tmp/%s.tar.gz' % (env.tag_name), '/tmp/')
  with cd('%s' % (env.deploy_to)):
    run('tar -xzf /tmp/%s.tar.gz' % (env.tag_name))

def symlinks():
  # link the shared images directory into the new release
  run('ln -nfs %s/images %s/%s/images' % (env.image_path, env.deploy_to, env.tag_name))

def makelive():
  # point the site root at the new release
  run('ln -nfs %s/%s %s' % (env.deploy_to, env.tag_name, env.site_root))


The good point is that I have more control over what I want to do using fabric as compared to capistrano. And it took me a lot less time to cook the fabric recipe as compared to capistrano.

To run this script simply do

fab production deploy

This will execute the tasks production and deploy in that order. You can have separate settings for staging and local in the same script. You can even go ahead and create your own deployment infrastructure and process to do whatever you want without running into any restrictions.
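One possible way to keep several environments in the same script is a plain dictionary of settings that each environment task copies onto env. This is only a sketch: the helper name is mine, the staging and local hosts are made-up placeholders, and a SimpleNamespace stands in for fabric's env so the snippet runs on its own.

```python
from types import SimpleNamespace

# Only the "production" values come from the fabfile above;
# the "staging" and "local" entries are placeholders.
ENVIRONMENTS = {
    "production": {"hosts": ["192.168.1.1", "192.168.1.2", "192.168.1.3"],
                   "site_root": "/var/www/html"},
    "staging": {"hosts": ["192.168.2.1"], "site_root": "/var/www/html"},
    "local": {"hosts": ["localhost"], "site_root": "/var/www/html"},
}

def use(env, name):
    """Copy the settings of one environment onto an env-like object."""
    for key, value in ENVIRONMENTS[name].items():
        setattr(env, key, value)
    return env

# In a fabfile, env would be fabric.api.env and each environment a one-line task:
#   def staging(): use(env, "staging")
env = use(SimpleNamespace(), "staging")
print(env.hosts)
```

With tasks defined that way, "fab staging deploy" would run the same deploy steps against the staging hosts.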

Thursday, April 12, 2012

Mysql HA solutions

Let's see what HA solutions can be designed in mysql and where they are suited.

1. Single master - single slave.

M(RW)
|
S(R)

A simple master slave solution can be used for a small site - where all the inserts go into the master and some (non-critical) requests are served from the slave. In case the master crashes, the slave can simply be promoted to master - once it has replicated the "available" logs from the master. You will need to create another slave of the now "new" master to make your mysql highly available again. In case the slave crashes, you will have to switch your read queries to the master and create a new slave.

As mentioned earlier this is for a very "small" site. There are multiple scenarios where a single master - single slave solution is not suitable. You will not be able to scale reads or run heavy queries to generate reports without affecting your site performance. Also, for creating a new slave after a failure, you will need to lock the available mysql server and take a backup from it. This will affect your site.


2. Single master - multiple slave.

             M1(RW)
                |
      ---------------------
      |         |         |
    S1(R)     S2(R)     Sn(R)

A single master multiple slave scenario is the most suitable architecture for many web sites. It provides read scalability across multiple slaves. Creation of new slaves is much easier. You can easily allocate a slave for backups and another for running heavy reports without affecting the site performance. You can create new slaves to scale reads as and when needed. But all inserts go into the only master. This architecture is not suitable for write scalability.

When any of the slaves crashes, you can simply remove that slave, create another slave and put it back into the system. In case the master fails, you will need to wait for the slaves to be in sync with the master - that is, until all replicated binary logs have been executed - and then make one of them the master. The other slaves then become slaves of the new master. You will need to be very careful in defining the exact position from where the new slaves start replication. Else you will end up with lots of duplicate records and may lose data sanity on some of the slaves.
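As a rough illustration of picking the slave to promote, the sketch below compares how far each slave has executed the old master's binary log, using the Relay_Master_Log_File and Exec_Master_Log_Pos fields from SHOW SLAVE STATUS. The slave names and values are made-up sample data, and the function name is my own.

```python
def most_advanced_slave(statuses):
    """Return the name of the slave that has executed the furthest master binlog position."""
    def progress(item):
        name, status = item
        # binlog files carry a numeric suffix (e.g. mysql-bin.000042), so compare that first
        file_number = int(status["Relay_Master_Log_File"].rsplit(".", 1)[1])
        return (file_number, status["Exec_Master_Log_Pos"])
    return max(statuses.items(), key=progress)[0]

# Hypothetical values, as they might be collected from each slave
statuses = {
    "S1": {"Relay_Master_Log_File": "mysql-bin.000042", "Exec_Master_Log_Pos": 1500},
    "S2": {"Relay_Master_Log_File": "mysql-bin.000042", "Exec_Master_Log_Pos": 9800},
    "S3": {"Relay_Master_Log_File": "mysql-bin.000041", "Exec_Master_Log_Pos": 99000},
}
print(most_advanced_slave(statuses))  # S2
```

The slave that is furthest ahead is the safest candidate, since the others only need to catch up to it rather than lose anything.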


3. Single master - standby master - multiple slaves.

             M1(RW) ------ M(R)
                |
      ---------------------
      |         |         |
    S1(R)     S2(R)     Sn(R)

This architecture is very similar to the previous single master - multiple slave. The standby master is identified and kept for failover. The benefit of this architecture is that the standby master can be of the same configuration as the original master. This architecture is suitable for medium to high traffic websites where the master is of a much higher configuration than the slaves - maybe having RAID 1+0 or SSD drives. The standby master is kept close to the original master so that there is hardly any lag between the two. The standby master can be used for reads also, but care should be taken that there is not much lag between the master and the standby - so that in case of failure, switching can be done with minimum downtime.

When the master fails, you need to wait for the slaves to catch up with the old master and then simply switch them and the app to the standby master.


4. Single master - candidate master - multiple slaves.

    M1(RW) -------- M(R)
                     |
           ---------------------
           |         |         |
         S1(R)     S2(R)     Sn(R)

This is an architecture very similar to the earlier one. The only difference is that all slaves replicate from the candidate master instead of the original master. The benefit of this is that in case the master goes down, there is no switching required on the slaves. The old master can be removed from the system and the candidate master takes over as the new master. Afterwards, in order to get the architecture back in place, a new candidate master needs to be identified and the slaves can be moved one by one to it. The downtime here is minimal. The catch here is that there would be a definite lag between the master and the slaves, since replication on the slaves happens through the candidate. This lag can be quite annoying in some cases. Also, if the candidate master fails, all slaves will stop replicating and will need to be moved either to the old master or to a newly identified candidate master.


5. Single master - multiple slaves - candidate master - multiple slaves

      M1(RW) ------------------- M(R)
        |                         |
   -----------               -----------
   |         |               |         |
 S1(R)     S1n(R)          S2(R)     S2n(R)


This architecture is again similar to the earlier one, with the difference that there is a complete failover setup for the current master. If either the master or the candidate master goes down, there are still slaves which are replicating and can be used. This is suitable for a high traffic website which requires read scalability. The only drawback of this architecture is that writes cannot be scaled.


6. Multiple master - multiple slaves

      M1(RW) ------------------ M2(RW)
        |                         |
   -----------               -----------
   |         |               |         |
 S1(R)     S2(R)           S3(R)     S4(R)

This is "the" solution for high traffic websites. It provides read and write scalability as well as high availability. M1 and M2 are two masters in circular replication - each replicates from the other. All slaves point to either M1 or M2. In case one of the masters goes down, it can be removed from the system, and a new master can be created and put back in the system without affecting the site. If you are worried about performance issues when a master goes down and all queries are redirected to the other master, you can even have 3 or more masters in circular replication.

It is necessary to decide beforehand how many masters you would like to have in circular replication, because adding more masters - though possible - is not easy. Having 2 masters does not mean that you will be able to do 2X writes. Writes also happen on each master due to replication, so how many writes the complete system can handle depends entirely on the system resources. Your application has to handle unique key generation in a fashion that does not result in duplication between the masters. Your application also needs to handle scenarios where the lag between M1 and M2 becomes extensive or annoying. But with proper thought given to this architecture, it can be scaled and managed very well.
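For auto increment keys, one common way to avoid collisions between masters is to set auto_increment_increment to the number of masters and give each master its own auto_increment_offset. The little Python sketch below only simulates the ids each master would hand out under that setup - the function is illustrative and not part of mysql.

```python
def id_sequence(offset, increment, count):
    """Ids a master would generate for a given auto_increment_offset and auto_increment_increment."""
    return [offset + i * increment for i in range(count)]

# Two masters: increment = 2, offsets 1 and 2
m1_ids = id_sequence(offset=1, increment=2, count=5)   # [1, 3, 5, 7, 9]
m2_ids = id_sequence(offset=2, increment=2, count=5)   # [2, 4, 6, 8, 10]
print(m1_ids, m2_ids)
```

This is also one reason the number of masters should be decided beforehand: changing the increment later means reconfiguring every master.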