Tuesday, December 06, 2011

Postgresql replication

There are many solutions to PostgreSQL replication available in the market. Almost all of them are third party solutions, since there was no in-built replication in PostgreSQL. PostgreSQL 9.0 introduced replication into the core database - it is known as streaming replication. It can be used only for master-slave replication; there is no master-master or clustering feature available with PostgreSQL SR (streaming replication).

The way SR works is that transaction log records (known as XLOG records) are shipped to the standby or slave server over the network. Multiple slave servers can connect to the master over the network. The standby servers continuously replay the XLOG records shipped to them in continuous recovery mode. As soon as XLOG records are shipped, they are replayed on the slave, which makes the latest data available on the slave almost immediately. Log shipping does not interfere with query execution on the master. In case the primary goes offline, the standby server will keep waiting for the primary to become active again.

Here is how I did a test setup of master-slave replication using PostgreSQL.

I had 2 machines 241 and 242. I downloaded postgresql-9.1.1.tar.bz2 on both.

Steps to setup replication :

1. untar, compile and install

241/242 ]#  tar -xvjf postgresql-9.1.1.tar.bz2
241/242 ]#  cd postgresql-9.1.1
241/242 postgresql-9.1.1]#  ./configure
241/242 postgresql-9.1.1]#  make
241/242 postgresql-9.1.1]#  sudo make install

This will install postgresql in /usr/local/pgsql folder

2. Setup 241 as master. Initialize the database cluster on 241

241 ]# adduser postgres
241 ]# mkdir /usr/local/pgsql/data
241 ]# chown postgres /usr/local/pgsql/data
241 ]# su - postgres
241 ]# /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

Do not start the postgres database server now.

3. configure master server to listen on all ip addresses.

241 ]# vim /usr/local/pgsql/data/postgresql.conf
    listen_addresses = '*'
4. Allow standby server to connect to postgresql on master with replication privilege

241 ]# vim /usr/local/pgsql/data/pg_hba.conf
    host   replication   postgres   <standby_ip>/32   trust
5. Setup replication related parameters in the master server

241 ]# vim /usr/local/pgsql/data/postgresql.conf

    # To enable read-only queries on a standby server, wal_level must be set to
    # "hot_standby". But you can choose "archive" if you never connect to the
    # server in standby mode.
    wal_level = hot_standby

    # Set the maximum number of concurrent connections from the standby servers.
    max_wal_senders = 5

    # To prevent the primary server from removing the WAL segments required for
    # the standby server before shipping them, set the minimum number of segments
    # retained in the pg_xlog directory. At least wal_keep_segments should be
    # larger than the number of segments generated between the beginning of
    # online-backup and the startup of streaming replication. If you enable WAL
    # archiving to an archive directory accessible from the standby, this may
    # not be necessary.
    wal_keep_segments = 128
    # Enable WAL archiving on the primary to an archive directory accessible from
    # the standby. If wal_keep_segments is a high enough number to retain the WAL
    # segments required for the standby server, this is not necessary.
    archive_mode    = on
    archive_command = 'cp %p /usr/local/pgsql/data/pg_archive/%f'

6. start postgresql on master

241 ]# /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &

7. copy the master server's data to standby server

241 ]# /usr/local/pgsql/bin/psql -c "SELECT pg_start_backup('label', true)"
(1 row)

241 ]# rsync -a /usr/local/pgsql/data/ root@<standby_ip>:/usr/local/pgsql/data/ --exclude postmaster.pid

241 ]# /usr/local/pgsql/bin/psql -c "SELECT pg_stop_backup()"
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
(1 row)

This will also copy all the configuration parameters and authentication related settings from the primary to the standby, ensuring that the slave can be converted to a master/primary in case of a failover.

8. Change postgresql.conf to enable readonly queries on standby server

242 ]# vim /usr/local/pgsql/data/postgresql.conf

    hot_standby = on
9. Enable recovery on the standby server and change configuration.

242 ]# cp /usr/local/pgsql/share/recovery.conf.sample /usr/local/pgsql/data/recovery.conf
242 ]# vim /usr/local/pgsql/data/recovery.conf

    # Specifies whether to start the server as a standby. In streaming replication,
    # this parameter must be set to on.
    standby_mode          = 'on'

    # Specifies a connection string which is used for the standby server to connect
    # with the primary.
    primary_conninfo      = 'host=<master_ip> port=5432 user=postgres'

    # Specifies a trigger file whose presence should cause streaming replication to
    # end (i.e., failover). Once the trigger file is found the server acts as a primary server.
    trigger_file = '/home/postgres/failover'

    # Specifies a command to load archive segments from the WAL archive. If
    # wal_keep_segments is a high enough number to retain the WAL segments
    # required for the standby server, this may not be necessary. But
    # a large workload can cause segments to be recycled before the standby
    # is fully synchronized, requiring you to start again from a new base backup.
    restore_command = 'cp /usr/local/pgsql/data/pg_archive/%f "%p"'

10. Start postgres on standby server. This will start streaming replication on the standby server.

242 ]# /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &

11. You can check the status of streaming replication using either the ps command or through psql - the postgresql command prompt.

241 (primary) ]# /usr/local/pgsql/bin/psql -c "SELECT pg_current_xlog_location()"
(1 row)

242 (standby) ]# /usr/local/pgsql/bin/psql -c "select pg_last_xlog_receive_location()"
(1 row)

242 (standby) ]$ /usr/local/pgsql/bin/psql -c "select pg_last_xlog_replay_location()"
(1 row)

To check using ps, use the following commands:

241 (master)]# ps ax | grep sender
 2728 ?        Ss     0:00 postgres: wal sender process postgres streaming 0/5000EC0
 2768 pts/1    R+     0:00 grep sender

242 (standby)]# ps ax| grep receiver
 28125 ?        Ss     0:00 postgres: wal receiver process   streaming 0/5000EC0
 28154 pts/1    S+     0:00 grep receiver
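The xlog positions printed above (like 0/5000EC0) are a pair of hex numbers that can be treated as a 64-bit byte offset, which makes it easy to compute replication lag in bytes by subtracting the standby's receive location from the master's current location. Here is a small sketch in python - the helper functions are my own, not part of postgresql:

```python
def xlog_to_bytes(location):
    """Convert an xlog location like '0/5000EC0' into a byte position.

    The part before the slash is the high 32 bits and the part after
    is the low 32 bits of the WAL write position.
    """
    hi, lo = location.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def replication_lag(master_location, standby_location):
    """Bytes of WAL the standby has yet to receive from the master."""
    return xlog_to_bytes(master_location) - xlog_to_bytes(standby_location)

# master and standby at the same position -> no lag
print(replication_lag("0/5000EC0", "0/5000EC0"))  # 0
# master 0x100 = 256 bytes ahead of the standby
print(replication_lag("0/5000FC0", "0/5000EC0"))  # 256
```

Feed it the outputs of pg_current_xlog_location() on the master and pg_last_xlog_receive_location() on the standby to see how far behind the slave is.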

To do a failover, all that needs to be done is to create the 'trigger' file at the specified location. This will automatically turn off standby mode and the postgres server will start acting as a primary or master.

Do remember to use the "pg_ctl stop" command to stop either the primary or the standby server. This ensures a graceful shutdown and that no records are missed from replication.

In order to create another standby server, repeat the steps from 7 onwards - after adding the IP of the new standby server to the master's configuration as in step 4.

Saturday, October 15, 2011

Nginx with php-fpm versus apache with modphp

I have been using apache with mod_php for ages now. It works like a charm. But as concurrency increases, apache chews up a lot of resources. I came across nginx some time back. It is a very lightweight server which I had earlier used to serve static content. I thought why not try it to serve dynamic pages as well. After some searching, I found that nginx with php-fpm is a lethal combination as a dynamic web server.

php-fpm stands for php fastcgi process manager. It is a fastcgi implementation of php with some additional features. Have a look here http://php-fpm.org/

So without getting too much into the technicalities, let's focus on the benchmark. I ran the benchmark on my laptop, which has an i5 processor and 4 GB of RAM, running ubuntu 11.10 with a 64 bit kernel. The software versions I used were:

apache 2.2.19
nginx 1.1.5
XCache v1.3.2

Benchmark process :
First I compiled php with apache using the following configure command.

./configure --with-apxs2=/usr/local/apache2/bin/apxs --with-gd --with-curl --with-mysql=mysqlnd --with-mysqli=mysqlnd --with-pdo-mysql=mysqlnd --enable-mbstring

I ran my tests, then compiled php again with fpm, configured nginx and ran the benchmark again.

To compile php with fpm do

./configure --enable-fpm --with-fpm-user=jayant --with-fpm-group=jayant --with-gd --with-curl --with-mysql=mysqlnd --with-mysqli=mysqlnd --with-pdo-mysql=mysqlnd --enable-mbstring

I ran tests at 3 levels of concurrency - 100, 500 and 1000 - for 10 minutes each. The code I was benchmarking was a simple phpinfo with a random number display. I used siege for benchmarking.

<?php
echo "Random : ".rand(1,100000);
phpinfo();
?>

The results below alternate between nginx with php-fpm and apache with mod php for each concurrency level.

nginx with php-fpm

concurrency : 100
Time : 10 min
siege -i -c 100 -t 10m http://localhost/x.php
Load : 0.10

Lifting the server siege... done.
Transactions: 118171 hits
Availability: 100.00 %
Elapsed time: 599.56 secs
Data transferred: 6611.79 MB
Response time: 0.00 secs
Transaction rate: 197.10 trans/sec
Throughput: 11.03 MB/sec
Concurrency: 0.96
Successful transactions: 118171
Failed transactions: 0
Longest transaction: 0.07
Shortest transaction: 0.00

apache with mod php

concurrency : 100
Time : 10 min
siege -i -c 100 -t 10m http://localhost/x.php
Load : 0.25

Lifting the server siege... done.
Transactions: 118688 hits
Availability: 100.00 %
Elapsed time: 599.55 secs
Data transferred: 7278.54 MB
Response time: 0.01 secs
Transaction rate: 197.96 trans/sec
Throughput: 12.14 MB/sec
Concurrency: 0.99
Successful transactions: 118688
Failed transactions: 0
Longest transaction: 0.09
Shortest transaction: 0.00

nginx with php-fpm

siege -i -c 500 -t 10m http://localhost/x.php
concurrency : 500
Time : 10 min
Load : 2.0

Lifting the server siege... done.
Transactions: 589098 hits
Availability: 100.00 %
Elapsed time: 599.44 secs
Data transferred: 32960.63 MB
Response time: 0.01 secs
Transaction rate: 982.75 trans/sec
Throughput: 54.99 MB/sec
Concurrency: 7.59
Successful transactions: 589098
Failed transactions: 0
Longest transaction: 3.23
Shortest transaction: 0.00

apache with mod php

siege -i -c 500 -t 10m http://localhost/x.php
concurrency : 500
Time : 10 min
Load : 20

siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc
Transactions: 45954 hits
Availability: 97.36 %
Elapsed time: 50.84 secs
Data transferred: 2818.13 MB
Response time: 0.02 secs
Transaction rate: 903.89 trans/sec
Throughput: 55.43 MB/sec
Concurrency: 14.47
Successful transactions: 45954
Failed transactions: 1248
Longest transaction: 3.30
Shortest transaction: 0.00

nginx with php-fpm

siege -i -c 1000 -t 10m http://localhost/x.php
concurrency : 1000
Time : 10 min
Load : 48

Lifting the server siege... done.
Transactions: 941105 hits
Availability: 99.98 %
Elapsed time: 599.43 secs
Data transferred: 52655.81 MB
Response time: 0.14 secs
Transaction rate: 1570.00 trans/sec
Throughput: 87.84 MB/sec
Concurrency: 213.57
Successful transactions: 941105
Failed transactions: 167
Longest transaction: 21.17
Shortest transaction: 0.00

apache with mod php

siege -i -c 1000 -t 10m http://localhost/x.php
concurrency : 1000
Time : 10 min
Load : 58

siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc
Transactions: 45454 hits
Availability: 96.86 %
Elapsed time: 36.27 secs
Data transferred: 2787.47 MB
Response time: 0.19 secs
Transaction rate: 1253.21 trans/sec
Throughput: 76.85 MB/sec
Concurrency: 240.04
Successful transactions: 45454
Failed transactions: 1475
Longest transaction: 9.37
Shortest transaction: 0.00

As you can see, apache buckles at the knees and stops responding at a concurrency of 500. The load shot up to 20 in just 50 seconds and there were lots of socket errors. Siege gave up the benchmark stating that there were too many errors. Whereas nginx+php-fpm ran well at a concurrency of 500 with 0 failed transactions. At a concurrency of 1000, apache+mod php was aborted by siege in just 36 seconds due to excessive errors, while nginx ran for the whole 10 minutes with a success rate of 99.98%.

Without any doubt I can conclude that nginx with php-fpm is the web server of choice for large scale websites.

Sunday, September 04, 2011

why and how to overclock your android phone

Overclocking is a geek's dream come true. Overclocking your pc is old hat. Overclocking your phone for better performance is something really exciting. The only problem is that you could either brick your phone or fry it. So if you plan to go ahead, be very very sure of the risk that you might lose your phone.

I did a lot of reading before going ahead with the overclocking. The first thing you should be aware of is that for any type of overclocking or tweaking at the system level, you need administrator privileges. The term for this is "rooting" your phone.

By default all phone providers lock their users out of their phones. It is somewhat similar to ubuntu where root is not enabled by default. Gaining root access allows you to do whatever you want with your phone. Which means that you can very easily brick or fry your phone.

The Motorola Defy is a good handset costing around 15K Indian Rupees. It has an 800 MHz processor. The factory settings were overly safe: the phone was underclocked and overvolted, so it runs slower than it actually should and drinks more battery than it should.

Let's see how to first root and then tune your phone for better performance.

Rooting the phone

Rooting ideally is a very simple task. All you need to do is select an app and run it with your phone connected to the pc. There are multiple apps available to root your android phone. Some of them are:

1. SuperOneClick
2. Universal Androot
3. Z4Root

Please go through the supported devices list and see if your handset model is listed there. I used SuperOneClick, version 2.1.1.

Rooting is simple

1. Extract the zip file on your windows desktop.
2. Run the exe with admin privileges.
3. Reboot your phone.
4. Unmount the SD card.
5. Enable usb debugging.
6. Click on "Root" - you will get a prompt "waiting for device".
7. Connect your phone via usb.
8. The app will automatically detect the OS version and device model and root the device. It will run a "su" test before giving an OK.

I ran into a small problem. SuperOneClick froze at around step #5, causing my heart to stop for a second. I waited for 10 minutes before rebooting the phone and connecting it to the pc to run SuperOneClick again.

Once your phone is rooted, if an application tries to gain root privileges, you will get a prompt with the option to allow or deny.

Overclocking and undervolting

After searching around a lot, I finally figured out that my phone can be overclocked to 1.2 GHz without any issues. Overclocking or undervolting needs just a simple app, available freely in the android market. The two main apps available for overclocking are:

1. setvsel
2. milestone overclock.

Again I chose setvsel.

The default speed and voltage combination on the Motorola Defy was:

1) 300 MHz at VSel1 = 33;
2) 600 MHz at VSel2 = 48;
3) 800 MHz at VSel3 = 58;
at a threshold of 86%.

The threshold says that if the processor runs at 86% capacity, it will step up to the next higher speed available.

Here are the ranges where people find that the phone is still stable:

- 300 MHz, VSel1 24-33
(some people report stable systems as low as VSel1 14);
- 600 MHz, VSel2 31-48;
(some people report stable systems as low as VSel2 27);
- 800 MHz, VSel3 41-58.
(some people report stable systems as low as VSel3 39);
- 1000 MHz, VSel3 45-74.
(above VSel3 58 your phone may get really warm).

- 1100 MHz, VSel3 55-66;
- 1200 MHz, VSel3 60-75.

I used the following settings:

300/24, 600/34, 1000/52

Most of the time, when the phone is idle, its processor stays at around 300 MHz. Normal browsing and calling take the processor up to 600 MHz. But it is games and graphics that take the processor up to 1 GHz.

I will try reducing the voltages further to see at which levels the phone remains stable.

Reference : http://androidunderground.blogspot.com/2011/05/setvsel-overclock-and-undervolt-your.html

Here is the benchmark result after overclocking. Earlier my device was listed close to the HTC Desire. Now it has moved to the top.

Monday, June 27, 2011

opcode cache comparison - xcache versus apc versus no-cache

Opcode caching tools like XCache and APC are also known as php accelerators. They work by storing the bytecode of interpreted php in shared memory.

To explain in plain English: php is an interpreted language. Every time an http request comes in, it picks up all the php files required for its processing and interprets them - compiles them into machine readable code, known as opcode. The opcode is then run on the machine. Php accelerators speed up php execution by lazily compiling php (compiling on demand) and storing the generated opcode in shared memory, thus avoiding file io and the overhead of interpreting the code again and again.
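The idea can be sketched in a few lines of python - purely an analogy, not how APC or XCache are actually implemented inside the PHP engine: compile a source string to bytecode once, cache the code object in memory, and reuse it for every subsequent request.

```python
# Opcode-cache analogy: compile once, execute many times.
# Illustrative sketch only; the cache dict stands in for shared memory.

bytecode_cache = {}  # maps "filename" -> compiled code object

def run_script(name, source):
    """Compile `source` on first use, then reuse the cached bytecode."""
    code = bytecode_cache.get(name)
    if code is None:                        # cache miss: pay the compile cost once
        code = compile(source, name, "exec")
        bytecode_cache[name] = code
    namespace = {}
    exec(code, namespace)                   # run the cached bytecode
    return namespace

# the first call compiles, later calls skip compilation entirely
result = run_script("x.php", "answer = 2 + 3")
print(result["answer"])  # 5
```

Every request after the first skips the compile step, which is exactly the file io and interpretation overhead the accelerators avoid.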

There are multiple solutions available for opcode caching in php. Here is the list available at wikipedia: http://en.wikipedia.org/wiki/List_of_PHP_accelerators. APC and XCache are two of these which are fairly well known.

I did a benchmark of both and tried to figure out which one is better. Here is the benchmark result, along with an analysis.

I ran the benchmark on my system

Intel(R) Core(TM)2 Duo CPU T6570 @ 2.10GHz
L2 cache : 2048 Kb
RAM : 2 GB
php version 5.2.17
apache version 2.2.19
apc version 3.1.9
xcache version 1.3.2
Ubuntu 11.04 - 32 bit
siege was used to run the benchmarks

APC : sudo pecl install apc
xcache : download xcache, untar, ./configure; make; make install
Load the respective caching module in php.ini
extension = apc.so OR extension = xcache.so

Cache Configuration
For both xcache and apc, I accepted the default settings and changed only these variables:
gc_interval = 3600;
ttl = 3600;
size = 32M;

I benchmarked a page deployed on my local system with images, tons of php code and lots of mysql queries.

Run 1 : 5 minutes with concurrency = 10

without cache

Transactions: 422 hits
Availability: 100.00 %
Elapsed time: 299.20 secs
Data transferred: 39.65 MB
Response time: 6.50 secs
Transaction rate: 1.41 trans/sec
Throughput: 0.13 MB/sec
Concurrency: 9.16
Successful transactions: 422
Failed transactions: 0
Longest transaction: 8.17
Shortest transaction: 3.44


with apc

Transactions: 465 hits
Availability: 100.00 %
Elapsed time: 299.74 secs
Data transferred: 43.66 MB
Response time: 5.86 secs
Transaction rate: 1.55 trans/sec
Throughput: 0.15 MB/sec
Concurrency: 9.09
Successful transactions: 465
Failed transactions: 0
Longest transaction: 9.20
Shortest transaction: 3.91
Total Hits in cache: 85,773
Total Misses in cache: 223


with xcache

Transactions: 479 hits
Availability: 100.00 %
Elapsed time: 299.11 secs
Data transferred: 44.99 MB
Response time: 5.67 secs
Transaction rate: 1.60 trans/sec
Throughput: 0.15 MB/sec
Concurrency: 9.07
Successful transactions: 479
Failed transactions: 0
Longest transaction: 7.39
Shortest transaction: 3.80
Total Hits on cache: 87,884
Total Misses in cache: 158

As you can see, with a concurrency of 10, xcache gives a transaction rate of 1.60, apc gives 1.55 and no-cache gives 1.41 transactions per second. That is a 10% improvement with apc and a 14% improvement with xcache.

The shortest transaction with no-cache was 3.44, whereas that with xcache was 3.80 and that with apc was 3.91. This shows that xcache took less time than apc to handle a page miss. The longest transaction of apc was higher than that of no-cache, but xcache's longest transaction was better than no-cache's. Ideally the longest transaction of a cached page should be less than that of a no-cache page; why APC had a higher longest transaction, I was unable to figure out.

Run 2 : 5 minutes with concurrency = 20

No cache

Transactions: 373 hits
Availability: 100.00 %
Elapsed time: 299.70 secs
Data transferred: 35.11 MB
Response time: 15.10 secs
Transaction rate: 1.24 trans/sec
Throughput: 0.12 MB/sec
Concurrency: 18.79
Successful transactions: 373
Failed transactions: 0
Longest transaction: 20.58
Shortest transaction: 5.41


with apc

Transactions: 458 hits
Availability: 100.00 %
Elapsed time: 299.93 secs
Data transferred: 43.09 MB
Response time: 12.28 secs
Transaction rate: 1.53 trans/sec
Throughput: 0.14 MB/sec
Concurrency: 18.75
Successful transactions: 458
Failed transactions: 0
Longest transaction: 19.19
Shortest transaction: 9.73


with xcache

Transactions: 459 hits
Availability: 100.00 %
Elapsed time: 299.85 secs
Data transferred: 43.18 MB
Response time: 12.30 secs
Transaction rate: 1.53 trans/sec
Throughput: 0.14 MB/sec
Concurrency: 18.82
Successful transactions: 459
Failed transactions: 0
Longest transaction: 15.12
Shortest transaction: 6.60

In the second run, though both xcache and apc have the same transaction rate of 1.53 - 23% higher than the 1.24 of no-cache - the longest and shortest transactions of xcache were better than those of apc. It shows that xcache handles caching better than apc.

Also, there were some bugs in apc which caused crashes when I tried to implement it, whereas xcache ran without any issues. This too suggests that xcache is better.

Wednesday, June 08, 2011

Data structures - TRIE

A trie is a data structure used for data retrieval. It is a multiway tree structure used to store strings over an alphabet. It is mainly used for large dictionaries of english words in spell-checking and natural language understanding programs. The idea is that all strings with a common stem or prefix hang off a common node. The elements of a string can be recovered by scanning from the root to the leaf that ends the string. All strings in a trie can be recovered by doing a depth first scan of the tree.

The time to insert, delete or find is almost identical because the code paths followed for each are almost identical. As a result, for inserting, deleting or searching, tries can easily beat binary search trees or even hash tables.

Advantages of Trie over Binary Search Tree
1. Looking up keys is faster. In the worst case, a key of length m takes O(m) time. A BST performs search by doing O(log n) comparisons, and each comparison of string keys can take O(m) time, so in the worst case a BST takes O(m log n) time. Simple operations that tries use during lookup, such as array indexing by character, are fast on real machines.
2. Tries are more space efficient when they contain a large number of short keys, because nodes are shared between keys with common initial subsequences.
3. Tries facilitate longest-prefix matching - finding the key that shares the longest possible prefix with a given search key.
4. The number of internal nodes from root to leaf equals the length of the key, so balancing the tree is not a concern.

Advantages of Trie over Hash Tables
1. Tries support ordered iteration, whereas iteration over a hash table results in a pseudorandom order given by the hash function (also, the order of hash collisions is implementation defined), which is usually meaningless.
2. Tries facilitate longest-prefix matching, but hashing does not, as a consequence of the above. Performing such a "closest fit" find can, depending on implementation, be as quick as an exact find.
3. Tries tend to be faster on average at insertion than hash tables because hash tables must rebuild their index when it becomes full - a very expensive operation. Tries therefore have much better bounded worst case time costs, which is important for latency sensitive programs.
4. By avoiding the hash function, tries are generally faster than hash tables for small keys like integers and pointers.

Some disadvantages of Trie
1. Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random access time is high compared to main memory
2. Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless a bitwise trie can handle standard IEEE single and double format floating point numbers.

Code for implementing trie in java

import java.util.HashMap;
import java.util.Map;

public class Trie {
    private static class Node {
        private boolean isWord = false; // indicates a complete word
        private int prefixes = 0; // indicates how many words have this prefix
        private Map<Character, Node> children = new HashMap<Character, Node>(); // references to all possible children
    }

    private Node root = new Node();

    /**
     * Inserts a new word into the trie
     * @param word
     */
    public void insertWord(String word) {
        if (searchWord(word) == true) return;

        Node current = root;
        for (char c : word.toCharArray()) {
            Node child = current.children.get(Character.valueOf(c));
            if (child != null) {
                child.prefixes++;
                current = child;
            } else {
                child = new Node();
                child.prefixes = 1;
                current.children.put(Character.valueOf(c), child);
                current = child;
            }
        }
        // we have reached the end of the word, hence mark it true.
        // if during a search we reach the end of the search string and this
        // flag is still false, then the search string is not registered as a
        // valid word in the trie but is only a prefix
        current.isWord = true;
    }

    /**
     * Searches for a word in the trie
     * @param word
     */
    public boolean searchWord(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.get(Character.valueOf(c));
            if (current == null) return false;
        }
        // if at the end of the entire word this flag is still false, the
        // given word is not registered as a valid word in the trie, but
        // is a prefix
        return current.isWord;
    }

    /**
     * Deletes a word from the trie
     * @param word
     */
    public void deleteWord(String word) {
        if (searchWord(word) == false) return;

        Node current = root;
        for (char c : word.toCharArray()) {
            Node child = current.children.get(Character.valueOf(c));
            if (child.prefixes == 1) {
                // no other word passes through this node, prune the branch
                current.children.remove(Character.valueOf(c));
                return;
            } else {
                child.prefixes--;
                current = child;
            }
        }
        // since the word is removed now, set the flag to false
        current.isWord = false;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();

        // testing insertion (with some sample words)
        System.out.println("Testing insertion : ");
        trie.insertWord("ball");
        trie.insertWord("balls");
        trie.insertWord("bat");
        System.out.println("search ball : " + trie.searchWord("ball"));
        System.out.println("search bal : " + trie.searchWord("bal"));

        // testing deletion
        System.out.println("Testing deletion : ");
        trie.deleteWord("balls");
        System.out.println("search balls : " + trie.searchWord("balls"));
        System.out.println("search ball : " + trie.searchWord("ball"));
    }
}

OUTPUT : java Trie

Testing insertion : 
search ball : true
search bal : false
Testing deletion : 
search balls : false
search ball : true

Source : http://en.wikipedia.org/wiki/Trie

Wednesday, June 01, 2011

Strategies for porting data

In every application there always comes a time when data needs to be ported - either from an old application to a new one, or from one data store to another. You might have changed the database structure to implement some new functionality. Maybe you are moving your data from an SQL store to a NoSQL store.

The most important tool for moving data from one data-set to another is a porting script. The porting script maps the data from fields of old data-set to the fields of the new data-set. The porting script can contain logic or simple sql.

In a live system where data keeps coming in, it becomes difficult to port data; if it is not handled properly, the data might lose its sanity. There is a catch while dealing with live systems: you should consider whether the porting of data leads to a downtime or not. Ideally there should be as little downtime as possible. When I refer to downtime here, it is downtime for both internal and external users.

There are different ways for handling the porting of data.

1. The sure-shot way of porting data without losing any sanity is to bring down the application. Freeze the data set and run your scripts. This technique is ok when you are dealing with small amounts of data and the porting takes no more than a few minutes - if you can afford a downtime of a few minutes. The good thing about this type of porting is that you do not have to worry about the sanity of the data. It is difficult to mess up your data when your application is completely offline.

2. Another technique is to move to your new application, using the new data-set to insert new data and selecting old data from the old data-set for display. This is a very effective way of porting data. All that needs to be done is to put in the adapter design pattern (wrapper design pattern) at the point where you select data. Make a note of the D-Day when the new application goes live, and use the adapter to fire selects on the old data-set if you need data from before the D-Day, else fetch it from the new data-set. All inserts happen on the new data-set. This is very effective because ideally all data slowly moves on its own from the old data-set to the new one. If you want to expedite the process, you can have a separate script for porting data older than the D-Day.

3. The third technique is an alternate to the second technique. Instead of putting the adapter pattern at the point of select, you put an adapter pattern at the point of insert. So in addition to inserting into the old data-set, you also insert into the new data-set. All your selects are still fired on the old data-set. Eventually when you are satisfied that data has moved from old data-set to new, you can shift to the new application and start using the new data-set. Here also a script can be run to explicitly port data from old data-set to new data-set.
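Technique 2 can be sketched roughly as follows - a minimal in-memory sketch in python, where the store classes, the d_day cutoff and the field names are all hypothetical: reads are routed through one adapter that picks the old or new data-set based on the record's creation date, while all inserts go to the new data-set.

```python
from datetime import date

# Hypothetical stand-ins for the old and new data-sets.
class OldStore:
    def __init__(self):
        self.rows = {1: {"id": 1, "created": date(2011, 1, 10), "name": "old-record"}}
    def select(self, row_id):
        return self.rows.get(row_id)

class NewStore:
    def __init__(self):
        self.rows = {}
    def insert(self, row):
        self.rows[row["id"]] = row
    def select(self, row_id):
        return self.rows.get(row_id)

class DataSetAdapter:
    """Routes reads to the old or new data-set based on the D-Day cutoff;
    all writes go to the new data-set."""
    def __init__(self, old, new, d_day):
        self.old, self.new, self.d_day = old, new, d_day
    def insert(self, row):
        self.new.insert(row)        # inserts always hit the new data-set
    def select(self, row_id, created):
        if created < self.d_day:    # data older than D-Day still lives in the old data-set
            return self.old.select(row_id) or self.new.select(row_id)
        return self.new.select(row_id)

adapter = DataSetAdapter(OldStore(), NewStore(), d_day=date(2011, 6, 1))
adapter.insert({"id": 2, "created": date(2011, 6, 5), "name": "new-record"})
print(adapter.select(1, date(2011, 1, 10))["name"])  # old-record
print(adapter.select(2, date(2011, 6, 5))["name"])   # new-record
```

The same adapter class could also be used for technique 3 by making insert() write to both stores while select() keeps reading the old one.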

4. Another variant of the above techniques is using user access requests to port data. No matter how many tables there are in a system, there is a master table, and almost all the tables in the system have their foreign key referring to the master. When data is ported, the primary key of the master is kept track of. This technique ports data when a primary key is accessed. So, for example, when a user logs into the system, you check if his data has been ported to the new data-set. If not, you port all data related to the user at that instant and move the user to the new system. The bad point about this type of data porting is that if a large number of users suddenly comes online, it may become difficult to handle the spike and the users may experience slowness or errors.

Techniques 2, 3 and 4 require some planning and care in execution. But when you are dealing with GBs of data which cannot be worked upon in a few minutes or hours, and you cannot afford a downtime, they are the ideal way to move data. It is really important to remember that even a small issue in moving data can result in losing its sanity. Hence utmost caution and thorough testing are required before you go ahead and implement these techniques.

In case you come across any other interesting techniques or have used any other technique, do share your knowledge.

Saturday, April 30, 2011

Hadoop 0.21 update

This is an update for setting up hadoop. There have been some changes in the configuration files and the startup/shutdown scripts.

The following configuration files are to be created in the <hadoop_directory>/conf folder:

  • hdfs-site.xml

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-${user.name}</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
      </property>
    </configuration>


  • core-site.xml


    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation. The
        uri's scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class. The uri's authority is used to
        determine the host, port, etc. for a filesystem.</description>
      </property>
    </configuration>


  • mapred-site.xml


    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
      </property>
    </configuration>


Earlier all these settings were in a single file hadoop-site.xml
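Put together, the three files might look roughly like this (the path, the host:port values and the replication factor below are placeholder assumptions, not values from the original setup):

```xml
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value> <!-- placeholder path -->
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value> <!-- 1 assumes a single-node setup -->
    <description>Default block replication.</description>
  </property>
</configuration>

<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value> <!-- placeholder host:port -->
    <description>The name of the default file system.</description>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value> <!-- placeholder host:port -->
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>
```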

Similar to this breakup of configuration files, the scripts to start different services have also been separated. Now there are separate scripts to start and stop dfs, trackers and balancers.

in <hadoop_directory>/bin :

    start-dfs.sh / stop-dfs.sh
    start-mapred.sh / stop-mapred.sh
    start-balancer.sh / stop-balancer.sh


The rest of the configuration remains the same.

For those who get this error : java.io.IOException: Incompatible namespaceIDs
The solution is to change the namespaceID in

<hadoop data directory>/dfs/data/current/VERSION to match the namespaceID in
<hadoop data directory>/dfs/name/current/VERSION
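Since VERSION is a plain key=value file, the edit can also be scripted; a minimal sketch (reading the two files and writing the result back is left to the caller):

```python
import re

def sync_namespace_id(name_version_text, data_version_text):
    """Return the datanode VERSION text with its namespaceID replaced by
    the namenode's. Both arguments are the contents of a
    .../current/VERSION file (simple key=value lines)."""
    match = re.search(r"^namespaceID=(\d+)$", name_version_text, re.MULTILINE)
    if match is None:
        raise ValueError("no namespaceID line in namenode VERSION")
    return re.sub(r"^namespaceID=\d+$",
                  "namespaceID=" + match.group(1),
                  data_version_text, flags=re.MULTILINE)
```

Stop the hadoop daemons before rewriting the datanode's VERSION file, then start them again.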

Monday, March 07, 2011

An encounter with Dominos

I have been ordering Dominos pizza for ages - my favourite: mexican green wave - cheese burst. This is the first time that Dominos created a situation where I needed to write a blog post to express myself.

At 8:37 pm today (7-3-2011) I placed an order for a mexican green wave - cheese burst, order no 140, at the Dominos outlet in sector 12, Vasundra, Ghaziabad. The pizza was delivered in exactly 30 minutes at 21:15 - as expected.

What I did not expect was the quality of the pizza - a cheese burst without any cheese. The pizza was supposed to be the "finishing up" of our dinner. Everyone took a slice. As soon as I took a bite I realized that there was something missing - CHEESE. The pizza did not have any cheese in it.

Immediately I called up Dominos and told them that they had messed up my order by delivering a pizza without any cheese. Dominos promptly apologised and said they would send a replacement pizza. We waited.

The second pizza arrived at 21:45. I handed over the earlier pizza and asked the delivery boy to check the cheese. His judgement was "there is a little less cheese than expected". I told him I had been eating cheese burst for the last 3 years. Cheese burst means that cheese should be dripping out of the pizza. If there is no cheese dripping out of the pizza - it is not a cheese burst.

At this juncture, I opened up the replacement pizza to check the quantity of cheese in it. And I was surprised. The replacement order was not "mexican green wave" but "spicy chicken". It seems they had some left-over pizza which they wanted to hand me as a replacement. A non-veg pizza as a replacement for a veg pizza.

The delivery boy asked - "sir, do you eat non-veg?" What is the point of going to a restaurant and ordering your choice of food - if you do not get it? If you order "kadhai paneer" and are served "kadhai chicken" instead - would you eat it? What happened to my order? Or am I supposed to eat left-overs from Dominos - even after paying for them?

Finally the delivery boy himself placed a replacement order for "mexican green wave - cheese burst". Another replacement order. This time I had resolved to eat whatever I got.

Dominos can keep on making replacement orders and delivering them, but my stomach cannot sustain its hunger till Dominos gets my order right. Finally the 3rd pizza arrived at 10:10 pm. This time a cheese burst - with dripping cheese.

The moral of the story is that you should:

1. Always check the order that you get from Dominos.
2. Hand over the money only if the order is right and complete.
3. In case the order is wrong, cancel it and cook some maggi - instead of waiting for Dominos to get your order right.

Tuesday, January 25, 2011

Theory : database sharding strategies

There are a number of database sharding strategies that meet the diverse requirements of different categories of applications.

Shard by Modulus
For many applications, it is appropriate to shard based on a shard key such as a User ID. Using a modulus of a numeric ID, especially an auto increment primary key, ensures even distribution of data between shards.
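For example, a minimal sketch (the shard count of 8 is an arbitrary assumption):

```python
NUM_SHARDS = 8  # assumed fixed number of shards

def shard_for(user_id):
    """Map a numeric user ID to its shard by modulus."""
    return user_id % NUM_SHARDS
```

The trade-off is that changing NUM_SHARDS later remaps almost every ID, so the shard count is usually fixed up front, or the data is rebalanced whenever it changes.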

Shard by Date/Time Range
For time-based data such as feeds or blogs where data is accumulating over time, it may make sense to shard by date range. For example, each shard could contain data for a single month. New shards can be added each month and old shards can be dropped once historic data is no longer needed.
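A sketch of such a monthly mapping (the "feed" table-name prefix is hypothetical):

```python
from datetime import date

def month_shard(created):
    """Map a row's creation date to a per-month shard name."""
    return "feed_%04d_%02d" % (created.year, created.month)
```

Dropping a month of historic data then amounts to dropping one shard, with no large DELETE to run.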

Master Lookup
It is sometimes a requirement to control the sharding manually or in an application specific manner. One example would be a requirement to host key customer accounts on shards hosted on higher specification hardware. To support this requirement, a master shard can be created which contains lookup tables to map customer IDs to a specific shard number.
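A minimal sketch (the customer IDs and shard numbers are made up; in practice the map lives in a lookup table on the master shard):

```python
DEFAULT_SHARD = 1  # where ordinary customers live

# Key accounts pinned to shard 7, assumed to sit on higher-spec hardware.
CUSTOMER_SHARD = {"acme": 7, "globex": 7}

def shard_for_customer(customer_id):
    """Resolve a customer's shard via the master lookup, falling back to the default."""
    return CUSTOMER_SHARD.get(customer_id, DEFAULT_SHARD)
```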

Session-based Sharding
Some categories of application, particularly user-centric web applications, can choose a shard when a customer logs in and then direct all queries to the same shard for the duration of that user session.
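A rough sketch, assuming the session is a plain dict and a shard is picked at random on login:

```python
import random

NUM_SHARDS = 8  # assumed shard count

def session_shard(session):
    """Pick a shard once at login and pin the whole session to it."""
    if "shard" not in session:
        session["shard"] = random.randrange(NUM_SHARDS)
    return session["shard"]
```

Every query issued during the session calls session_shard() and therefore lands on the same shard.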

Fixed Shard
Tables are mapped to specific fixed shards. This is also known as table-based sharding.
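A sketch with a hypothetical table-to-shard map:

```python
# Hypothetical fixed mapping; every query against a table goes to its shard.
TABLE_SHARD = {"users": 0, "orders": 1, "audit_log": 2}

def shard_for_table(table):
    """Look up the fixed shard that hosts the given table."""
    return TABLE_SHARD[table]
```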

Custom Sharding
If there is any specific logic for sharding, a piece of code can be used to shard data based on that logic.

Global Tables
Global tables are tables which are hosted in all shards, with data automatically replicated across all of them. The benefit is that these tables can be used in joins with the sharded tables on each shard. Global tables are typically fairly static or have a low write volume - product codes, countries, and other reference data.