Friday, December 25, 2009

Gearman with php and mysql

In the earlier post we learned how to push jobs from the client to the worker. What we will try to do here is to use the same php worker we developed last time to process requests posted from inside the mysql client - that is, to use gearman to process queries issued on the mysql server.

We would need the mysql UDFs for gearman, which can be downloaded from the

Once you have downloaded the gearman mysql UDFs, untar it and compile it. I am using gearman-mysql-udf version 0.4 and mysql version 5.1.30 installed in path /usr/local/mysql here.

tar -xvzf gearman-mysql-udf-0.4.tar.gz
cd gearman-mysql-udf-0.4/
./configure --with-mysql=/usr/local/mysql/bin/mysql_config --libdir=/usr/local/mysql/lib/plugin/
sudo make install

check if the .so files have been installed properly

jayant@gamegeek:~$ ls /usr/local/mysql/lib/plugin/
libdaemon_example.a mypluglib.a libgearman_mysql_udf.a

You will see the libgearman_mysql_udf.* files here.

Now load the gearman mysql UDFs into mysql. There is one CREATE FUNCTION statement per UDF, each loading from the shared library we just installed, for example

CREATE FUNCTION gman_do_high_background RETURNS STRING SONAME "libgearman_mysql_udf.so";

check whether they have been properly loaded. log into mysql and run.

mysql> select * from mysql.func;
| name | ret | dl | type |
| gman_do | 0 | | function |
| gman_do_high | 0 | | function |
| gman_do_low | 0 | | function |
| gman_do_background | 0 | | function |
| gman_do_high_background | 0 | | function |
| gman_do_low_background | 0 | | function |
| gman_sum | 2 | | aggregate |
| gman_servers_set | 0 | | function |
8 rows in set (0.00 sec)

Now lets try calling the reverse function we developed in php to process some work from the mysql client. For this:

start gearmand if it is not running

start the worker
php -q gearmanWorker.php

log into mysql and run the following queries.

select gman_servers_set('');
SELECT gman_do("reverse", Host) AS test FROM mysql.user;

The output is as below....

Here we have simply said that all functions would be processed by worker running on
We could also set function wise servers - different servers for different functions
SELECT gman_servers_set("", "sortme");
SELECT gman_servers_set("", "reverseme");

And multiple servers for the same function.
SELECT gman_servers_set(",", "factorialme");
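Conceptually, gman_servers_set maintains a mapping from function names to server lists, with a default list used for any function that has no entry of its own. A minimal Python sketch of that lookup logic (the host:port values here are made-up examples, since the real servers were set above):

```python
# Toy model of gearman's per-function server registry, as set up by
# gman_servers_set(servers) and gman_servers_set(servers, function).
# The host:port values below are hypothetical.

class ServerRegistry:
    def __init__(self):
        self.default = []          # servers used when no per-function entry exists
        self.per_function = {}     # function name -> list of servers

    def set_servers(self, servers, function=None):
        server_list = servers.split(",")
        if function is None:
            self.default = server_list
        else:
            self.per_function[function] = server_list

    def servers_for(self, function):
        return self.per_function.get(function, self.default)

registry = ServerRegistry()
registry.set_servers("192.168.0.1:4730")                                  # default
registry.set_servers("192.168.0.2:4730", "sortme")                        # one server
registry.set_servers("192.168.0.3:4730,192.168.0.4:4730", "factorialme")  # two servers

print(registry.servers_for("reverse"))      # falls back to the default list
print(registry.servers_for("factorialme"))  # two servers share this function
```

Functions with no explicit entry ("reverse" here) fall back to whatever the default gman_servers_set call configured.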

Interesting, right??

Using gearman to distribute your work...

Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.

How does gearman work? Well, a gearman powered app consists of a client, a worker and a job server. The client creates a job and sends it to the job server. The job server finds a suitable worker and sends the job to the worker. Once the job is done, the worker sends the response back to the client via the job server. There are client and worker APIs available for gearman for different languages which allow the app to communicate with the job server.
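The round trip described above (client → job server → worker → job server → client) can be modeled in a few lines of plain Python. This toy in-process "job server" just dispatches by function name, standing in for the real network protocol:

```python
# Toy model of the gearman round trip: the client never talks to the
# worker directly; it hands the job server a function name and a
# workload, and the job server routes it to a registered worker.

class JobServer:
    def __init__(self):
        self.workers = {}  # function name -> callable registered by a worker

    def register_worker(self, function, callback):
        self.workers[function] = callback

    def submit_job(self, function, workload):
        worker = self.workers[function]  # find a suitable worker
        return worker(workload)          # response travels back via the server

server = JobServer()

# A worker registers the functions it can perform.
server.register_worker("reverse", lambda workload: workload[::-1])

# A client creates a job and sends it to the job server.
result = server.submit_job("reverse", "Hello World")
print(result)  # -> dlroW olleH
```

In real gearman the three roles usually live on different machines, and the job server can queue jobs and pick among many workers; the routing idea is the same.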

Too heavy, is it? Lets check how this thing actually runs.

Download gearman daemon from

You would need the "Gearman server and library" - the one written in c. And a php extension which we will use to create and communicate with our workers.

I am using gearmand version 0.11, gearman extension for php version 0.60 and php version 5.3 here.

extract the gearmand server and install it using the usual steps
./configure
make
sudo make install

extract the php extension for gearman and install it using
phpize
./configure
make
sudo make install

Enable the extension in php.ini. Add the following line in php.ini

extension="gearman.so"

To check if the extension has been enabled run

php -i | grep -i gearman

And you will see something like

gearman support => enabled
libgearman version => 0.11

Now lets write some scripts and check how this works

Create a php client (gearmanClient.php) :

<?php
# Create our client object.
$client= new GearmanClient();

# Add default server (localhost).
$client->addServer();

echo "Sending job\n";

# Send reverse job
$result = $client->do("reverse", "Hello World");
if ($result)
  echo "Success: $result\n";
?>

Create a php worker (gearmanWorker.php) :

<?php
# Create our worker object.
$worker= new GearmanWorker();

# Add default server (localhost).
$worker->addServer();

# Register function "reverse" with the server.
$worker->addFunction("reverse", "reverse_fn");

while (1)
{
  print "Waiting for job...\n";

  $ret= $worker->work();
  if ($worker->returnCode() != GEARMAN_SUCCESS)
    break;
}

# A simple reverse function
function reverse_fn($job)
{
  $workload= $job->workload();
  echo "Received job: " . $job->handle() . "\n";
  echo "Workload: $workload\n";
  $result= strrev($workload);
  echo "Result: $result\n";
  return $result;
}
?>
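For readers outside php, the heart of the worker boils down to a one-line function: take the job's workload string and return it reversed (php's strrev corresponds to slicing with step -1 in Python):

```python
def reverse_fn(workload):
    # Equivalent of PHP's strrev(): reverse the job's workload string.
    return workload[::-1]

print(reverse_fn("Hello World"))  # -> dlroW olleH
```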

To test the process

start the gearmand server
gearmand -d

start the worker
php -q gearmanWorker.php

And send jobs to the worker
php -q gearmanClient.php

The output is as below

Wednesday, December 09, 2009

Using ssh tunneling to tunnel your browser requests.

You will need :

  • A ssh server - a machine with access to the net which has ssh server installed on it. We will use this server to forward our requests. You should have access to the server.

  • A ssh client - installed on your machine.

  • A web browser - preferably firefox or internet explorer. Most basic browsers do not support socks proxy.

creating the tunnel :

The way to create the tunnel is

ssh -D 8888 <your_username>@<your_machine_ip_or_host>

What this command does is open a SOCKS proxy on localhost port 8888 and hand over requests arriving there to the server that you have specified. Of course, you will have to log in and authenticate yourself for the requests to reach the server. You can do this with whichever server you have access to. Using localhost would be the best.

Configure your browser :

To set socks proxy on firefox go to

Enter the socks host as localhost and the socks port as 8888. And bingo - now your requests are being tunneled from your browser to the ssh server through an ssh tunnel, and from there to the web server.

You can check the external ip address of your machine to confirm that the tunnel is in use.

You can use foxy proxy extension in firefox to have different proxy settings for different sites.

Sunday, November 29, 2009

A brief on using amazon web services.

Recently we decided to use aws for doing some benchmarks. Why? Because it is cheaper than getting systems, setting them up and configuring them. But using aws for a long time may prove costly.

So, i signed up with the service. You need to provide a valid credit card number and your phone number before the services are activated. The UI is quite confusing and at times you are not aware whether you are logged in or not. There might also be a case where you think that you are logged in but you would still be asked your password to access some other service.

The best place to start is the "Your Account" section on the top right hand corner. There are two services which are of interest.

Amazon s3 : Simple storage service
Amazon ec2 : Elastic compute cloud

There are two ways to access these services. Both are available in the "security credentials" tab on the left hand panel after logging in. One way is to create an access key - which provides an access key id and a secret access key - similar to a username & password. Both need to be passed while communicating with the machines. Another way is to generate an X.509 certificate - it also generates a public and a private key.

S3 is a storage space on amazon. The easiest way to access it is using the s3cmd tool. You can create your own directory and put and get files from the s3 store. Another way of accessing S3 is by using the firefox extension - elasticfox. Once the keys are in place in elasticfox, it provides an interface similar to an ftp client with drag-drop features. It looks cool. S3 can only be used to store and retrieve data. If you are thinking about mounting an s3 drive on an ec2 instance to transfer data - better forget it. It would require you to mount the s3 device using fuse and it might take a lot of time to set it up. And there is another way of storing data for EC2.

EC2 is a virtual machine available for computing. You can start an instance of the virtual machine and run your processing on it. The good thing is that there is no need to install the OS of your choice. All you need to do is simply choose the type of machine and the OS and within seconds, the machine is available. You can simply ssh into the machine and start running your processing on it. The bad thing about ec2 is that it does not store your data. Once your processing is done and you turn off or terminate the machine, there is no way of getting back the machine or the data that was there on the machine. The data is lost in the cloud. Like discussed earlier storing data in S3 and mounting it on EC2 might be an option. But a better option is to create an EBS(Elastic Block Storage) volume and attach it to the EC2 instance.

The way to go about all this is logging in to the AWS management console - which is again on the left hand panel. It gives you a dashboard. First of all try launching an instance. It would ask you for the type of instance you want to launch - so you can launch a 32 bit centos instance or a 64 bit ubuntu instance and all you have to do is choose the related AMI (Amazon Machine Image). But be careful, there are some paid AMIs as well - so something like the RHEL instance might be paid and you will have to pay to launch it.

For creating an EBS, you will have to go to the left hand panel again and create a volume. You have to specify the size and the availability zone. An EBS can only be mounted on an instance if it is in the same availability zone, so be careful about it. Select the zone where your instance is running. You can create as large a volume as you want because you are charged for the data in the volume and not for the volume size. So, if you create a 100 GB volume and put data of only 5 GB in it, you will be charged only for 5GB. Once the volume is created - attach it to the instance - the option is available on the web interface.

Now to access the volume you will need to create a filesystem on it and mount it. So, for example if you have attached the EBS volume at /dev/sdf on the instance currently running simply create a filesystem on it.

ssh -i private_key.pem [login to your ec2 instance]
mke2fs -j /dev/sdf [create ext3 filesystem]
mkdir /mnt/vol
mount /dev/sdf /mnt/vol [mount the volume]

Now you can work on your ec2 machine and store data on /mnt/vol. When you are done, it is better to take a snapshot of your volume using the tools in the amazon web console and then turn off your instance. Next time you need to work, simply mount the EBS volume on a new instance that you have started and all your data is readily available.

Another way of going about it is creating an amazon AMI after you have set up your machine - and using the AMI to boot further machines that you would need. You can download the amazon ec2 api tools and the amazon ec2 ami tools, which can help in creating AMIs and running them.

If you want to set up networking between multiple instances, you need to put in some extra effort. Of course, by default all running amazon instances can ssh to each other using their private IPs and public DNS names. But to communicate on other ports, the ports need to be opened up. To do this simply use the "security groups" link on the left hand panel of the AWS management console. Select a group - default if you did not create any other group - and open up the ports that you need.

And the most important advice - do not forget to turn down your instance after you are through with it - remember, you are billed by the hour.

Wednesday, November 25, 2009

permutation & combination


An arrangement is called a Permutation. It is the rearrangement of objects or symbols into distinguishable sequences. When we set things in order, we say we have made an arrangement. When we change the order, we say we have changed the arrangement. So each of the arrangements that can be made by taking some or all of a number of things is known as a Permutation.

A Combination is a selection of some or all of a number of different objects. It is an un-ordered collection of unique elements. In a permutation the order of occurrence of the objects or the arrangement is important, but in a combination the order of occurrence of the objects is not important.


Permutation = nPr = n! / (n-r)!
Combination = nCr = nPr / r!
n, r are non negative integers and r<=n.
r is the size of each permutation.
n is the size of the set from which elements are permuted.
! is the factorial operator.

Example:Find the number of permutations and combinations: n=6; r=4.

Step 1: Find the factorial of 6.
6! = 6×5×4×3×2×1 = 720

Step 2: Find the factorial of 6-4.
(6-4)! = 2! = 2

Step 3: Divide 720 by 2.
Permutation = 720/2 = 360

Step 4: Find the factorial of 4.
4! = 4×3×2×1 = 24

Step 5: Divide 360 by 24.
Combination = 360/24 = 15
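The same computation takes a few lines of Python using math.factorial; the function names nPr and nCr mirror the formulas above:

```python
from math import factorial

def nPr(n, r):
    # number of ordered arrangements of r items chosen from n
    return factorial(n) // factorial(n - r)

def nCr(n, r):
    # number of unordered selections: divide out the r! orderings
    return nPr(n, r) // factorial(r)

print(nPr(6, 4))  # -> 360
print(nCr(6, 4))  # -> 15
```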

Sunday, November 22, 2009

GPG Error: ... : NO_PUBKEY D739676F7613768D

You run an apt-get update and it gives some errors which are difficult to comprehend. The output looks something like this

Fetched 924B in 2s (352B/s)
W: GPG error: karmic Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY D739676F7613768D
W: GPG error: karmic Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 2836CB0A8AC93F7A
W: GPG error: karmic Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 2836CB0A8AC93F7A

What to do... How to remove these errors. Well, some public keys are unavailable due to which these errors are happening. Ok, ok, but how to go about it? How do i fix it?

The easiest way is to get this script. The script does not enable PPAs that are disabled, but for all enabled PPAs it fetches their keys and installs them.

jayant@gamegeek:~/bin$ cat launchpad-update 
#! /bin/sh

# Simple script to check for all PPAs referenced in your apt sources and
# to grab any signing keys you are missing from keyserver.ubuntu.com.
# Additionally copes with users on launchpad with multiple PPAs
# (e.g., ~asac)
# Author: Dominic Evans
# License: LGPL v2

for APT in `find /etc/apt/ -name "*.list"`; do
    grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" $APT | while read ENTRY ; do
        # work out the referenced user and their ppa
        USER=`echo $ENTRY | cut -d/ -f4`
        PPA=`echo $ENTRY | cut -d/ -f5`
        # some legacy PPAs say 'ubuntu' when they really mean 'ppa', fix that up
        if [ "ubuntu" = "$PPA" ]; then
            PPA=ppa
        fi
        # scrape the ppa page to get the keyid
        KEYID=`wget -q --no-check-certificate https://launchpad.net/~$USER/+archive/$PPA -O- | grep -o "1024R/[A-Z0-9]\+" | cut -d/ -f2`
        sudo apt-key adv --list-keys $KEYID >/dev/null 2>&1
        if [ $? != 0 ]; then
            echo Grabbing key $KEYID for archive $PPA by ~$USER
            sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com $KEYID
        else
            echo Already have key $KEYID for archive $PPA by ~$USER
        fi
    done
done

echo DONE

Make the script executable

$ chmod a+x launchpad-update

And run the script

jayant@gamegeek:~/bin$ sudo ./launchpad-update 
Grabbing key 7613768D for archive vlc by ~c-korn
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --secret-keyring /etc/apt/secring.gpg --trustdb-name /etc/apt/trustdb.gpg --keyring /etc/apt/trusted.gpg --recv-keys --keyserver 7613768D
gpg: requesting key 7613768D from hkp server
gpg: key 7613768D: public key "Launchpad PPA named vlc for Christoph Korn" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
Already have key 4E5E17B5 for archive ppa by ~chromium-daily
Grabbing key 8AC93F7A for archive backports by ~kubuntu-ppa
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --secret-keyring /etc/apt/secring.gpg --trustdb-name /etc/apt/trustdb.gpg --keyring /etc/apt/trusted.gpg --recv-keys --keyserver 8AC93F7A
gpg: requesting key 8AC93F7A from hkp server
gpg: key 8AC93F7A: public key "Launchpad Kubuntu Updates" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
Already have key 8AC93F7A for archive staging by ~kubuntu-ppa

That's it... Done...
Now if you run sudo apt-get update, you should not get any errors about PUBKEYs...

Tuesday, November 10, 2009

Bombay & Delhi - very precise article describing both...

He is a large amiable boy who smiles at breasts as if they are acquaintances. He suspects he is good looking, and he probably is, especially when he is quiet. A pretty white girl walks into this London pub and he nudges the elbow of a friend. As his eyes follow her to the far corner, his face assumes a sudden seriousness. He then takes her pictures with his phone camera. He tries to meet the eyes of any woman in the pub so that he can flash a smile. He has heard that white women are broadminded. “I like to hear white women scream under me,” he says. A Bengali sitting beside him says he must become a midwife then. Everybody laughs. He stares at the Bengali who is a much smaller man, and he slaps him a few times. Others now rise and try to drag him away. He growls, not metaphorically but really growls. And he says, “I’m a baaad guy, I’m a baaad guy.” He is, of course, a jat from Delhi whose matrimonial ad had once said, accurately, that he is from a good family. He has travelled the world. He studied briefly in the First World, even. There are thousands like him in Delhi, the natural habitat of a kind.

Delhi is a vast medieval town of indisputable botanical beauty, spectacular red ruins, Sheila Dixit, and other charms. Its women, rumoured to be high maintenance as if there is another kind, take so much care of themselves that one would think the men are worth it (but they make a gesture that suggests puking when asked to confirm). Space is not compressed here. Everything is far from everything else. There are real gardens where you do not see the exit when you stand at the entrance. It has sudden small parks that in Bombay would have been called, ‘Chhatrapati Shivaji Mini Forest’. Homes have corridors, and they are called corridors, not half-bedrooms. Yet, Delhi has a bestial smallness of purpose.

Those men there who drive the long phallic cars, sometimes holding a beer bottle in one hand, there is something uncontrollable about them. Even for a man, it is hard to understand their mutation. What is the swagger about? What is the great pride in driving your father’s BMW, what is the glory in being a sperm? And what is the great achievement in stepping on the accelerator? It is merely automobile engineering—press harder on the pedal and the car will move faster. Why do you think a girl will mate with you for that? It is somehow natural that the contemporary version of Devdas, Anurag Kashyap’s Dev D, would be set in Delhi, where a man can debase himself because life does not challenge him, he has no purpose, whose happiness is a type of sorrow. This motiveless Delhi male, you can argue, can be found in Bombay too, where not all BMWs are hard earned. But that’s not very different from saying Bombay, too, has bungalows.

Like a rich man’s son, Delhi is a beneficiary of undeserved privileges. That is at the heart of Bombay’s contempt for Delhi. Bombay is a natural city, like all great port cities of the world. It was not created. It had to arrive at a particular moment in time, it was an inevitability caused by geography, shipping and shallow waters. Bombay eventually earned its right to be a financial force through the power of enterprise, which created a system that allowed, to some extent, anyone to stake a claim at wealth through hard work. That culture still exists. It is the very basis of Bombay. That is the whole point of Bombay.

But Delhi as a centre of power is an inheritance, a historical habit. An unbearable consequence of this is the proximity of easy funds for various alleged intellectual pursuits which has enabled it to appropriate the status of intellectual centre. It is a scholarship city, a city of think tanks, of men who deal in discourse, debates and policies. And of fake serious women who wear the sari the other way and become leftists, nature lovers and diurnal feminists.

Delhi, often, confuses seriousness with intelligence and humour with flippancy. People will not be taken seriously here if they are not, well, serious. There is much weight attached to the imagined sophistication of talk, of gas. It is a city of talkers. There is always The Discussion. When you are in Delhi, you have to talk, and then they talk, and they appear to be solving an enigma, they seem headed towards achieving some revelation. But then, you realise, they were peeling an onion, an act that leads to more peels and at the heart of it all, there is nothing. Delhi is an onion. It is a void-delivery device.

Of course, all this is a generalisation, but then generalisation is a form of truth. One of the most repulsive images I bear in mind of Delhi is a scene in JNU, when Venezuelan president Hugo Chavez delivered a special lecture. It was like a rock concert and Chavez, who is a scion of the same imbecilic philosophy that once destroyed the great economies of South America, was the star. As students pumped their hands in the air and cheered him for his anti-capitalist calling, I looked at their faces. I knew those faces. They were from homes that once profited from India’s socialist corruption, and then from Manmohan’s revolution. They were hip. They would, of course, later join MNCs and chuckle at their youthful romanticism. That moment in JNU was despicable because it captured a meaningless aspect of Delhi’s fiery intellectuality, and also laid bare the crucial difference between intellectuality, which is borrowed conviction, and intelligence, which is creativity, innovation and original analysis.

It is for the same reason that the greatest misfortune of Indian journalism is that it is headquartered in Delhi. Needless to say, like in any other city, Delhi has astonishingly talented editors, journalists and writers, but there is a Delhi mental condition which is incurable—a fake intensity, a fraudulent concern for ‘issues’, the grand stand. Readers, on the other hand, have many interests today apart from democracy, policies and the perpetual misery of the poor. But the Indian media, based in Delhi, refused to see it until recently and very grudgingly, when The Times of India proved it. It is not a coincidence that The Times Group, the most profitable media organisation in India, is based in Bombay. It is not a coincidence that the game changer came from here. In Bombay it is hard to convert air from either side of your alimentary canal into cash. You have to do something here. You have to work. It is appropriate that the National School of Drama, with its phoney distaste for money, is in Delhi. And commercial cinema is in Bombay.

It must be said though that in recent times Delhi has become somewhat more endearing. This is partly because of Bombay’s own degradation and its loss of modernity, and partly because of a remarkable cultural irony. Bombay’s films were increasingly becoming pointless because, like Delhi has those silver sperms in BMWs, Bombay’s film industry, too, suffers the curse of the privileged lads whose fathers were something. As actors with no real talent they could still survive, but some who did not look so good could do nothing more than remaking movies about love and parental objection. Then two things happened. The flops of the brainless boys from the film families gave opportunities to talent that had arrived from all over the country, including what is called North India. They were waiting, and when they got a chance they created a new kind of commercial cinema in which Bombay was not necessarily the focus. That resulted in the startling revelation here that Bombay is a culturally impoverished, rootless setting compared to Delhi. What films like Oye Lucky! Lucky Oye! and Dev D have achieved as hilarious, poignant and self deprecatory narrations of the North Indian way of life, has changed Hindi cinema, probably forever. So Delhi is being seen a bit differently in Bombay, with some affection too. Though, the best thing about Delhi will always be its winter. When there is this mist. And you do not see its people.

Source :

Saturday, October 31, 2009

Installing mogilefs for dummies

MogileFS is an open source distributed filesystem created by Danga Interactive to be used by the LiveJournal project. Its features include

  • No single point of failure

  • Automatic file replication - satisfying the number of replica counts specified in the configuration

  • Flat namespace - Files are identified by named keys in a flat, global namespace. You can create as many namespaces as you'd like, so multiple applications with potentially conflicting keys can run on the same MogileFS installation.

  • Shared-Nothing - MogileFS doesn't depend on a pricey SAN with shared disks. Every machine maintains its own local disks.

  • No RAID required - RAID doesn't buy you any safety that MogileFS doesn't already provide.

  • Local filesystem agnostic - local disks for mogilefs storage nodes can be formatted with the filesystem of choice (ext3, xfs, etc...)
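The automatic replication in the list above is simple bookkeeping at heart: the tracker keeps placing copies of each file on distinct devices until the configured replica count is satisfied. A toy Python sketch of that logic (the device names are made up for illustration):

```python
# Toy model of MogileFS-style replication bookkeeping: keep copies of
# each key on distinct devices until the desired replica count is met.
# Device names here are hypothetical.

def replicate(locations, devices, replica_count):
    """Return the devices a file should be copied to so that it ends
    up on `replica_count` distinct devices overall."""
    targets = []
    for dev in devices:
        if len(locations) + len(targets) >= replica_count:
            break  # replica count already satisfied
        if dev not in locations:
            targets.append(dev)
    return targets

devices = ["dev1", "dev2", "dev3", "dev4"]

# A file currently lives only on dev1; its class asks for 3 replicas.
todo = replicate(["dev1"], devices, replica_count=3)
print(todo)  # -> ['dev2', 'dev3']
```

A real tracker also weighs device free space and host placement; the point here is only that replication converges toward the per-class count with no shared disks involved.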

Files inside mogilefs cannot be accessed directly. You need specific APIs to access the files, and there are client implementations for several languages

Perl -
Java -
Ruby -
Python -

We will skip directly to the installation. You will have to install the following modules. You can use the cpan shell to install these modules. For novice perl users, simply type cpan from the command prompt (sudo cpan) and type install <module_name> at the prompt. Install the following modules.


Cpan would automatically check the dependencies and install the required modules. If you are not interested in the latest release of mogilefs, you can go ahead and install the mogilefs from cpan itself. Install these modules as well.


If you prefer the latest versions, just get the files and install them individually. With the latest versions, your earlier installations of dependencies using cpan may not work - in which case you will need to get those dependency files as well and install them.

To install mogilefs do the following

  • Checkout the latest code from svn
    $ mkdir mogilefs-src
    $ cd mogilefs-src
    $ svn checkout

  • Create database mogilefs and grant necessary permissions
    $ mysql
    mysql> CREATE DATABASE mogilefs;
    mysql> GRANT ALL ON mogilefs.* TO 'mogile'@'%';
    mysql> SET PASSWORD FOR 'mogile'@'%' = OLD_PASSWORD( 'sekrit' );
    mysql> quit

  • Set up trackers and storage servers. Install the mogile server.
    $ cd <path to mogilefs-src>trunk/server/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install
    If during make test, you get a mysql related connection error, it could be safely ignored - assuming that you have mysql installed and perl-mysql connection (DBD::MySQL) configured.
    Now lets install some utilities:
    $ cd <path to mogilefs-src>trunk/utils/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install
    And the perl api:
    $ cd <path to mogilefs-src>trunk/api/perl/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install

  • Configure the database
    $ ./mogdbsetup --dbname=mogilefs --dbuser=mogile --dbpass=sekrit

  • Create the configuration files
    $ mkdir /etc/mogilefs/
    $ cp <path to mogilefs-src>trunk/server/conf/*.conf /etc/mogilefs/
    Edit the configuration files.
    $ vim /etc/mogilefs/mogilefsd.conf

    #Configuration for MogileFS daemon
    db_dsn = DBI:mysql:mogilefs
    db_user = mogile
    db_pass = sekrit
    listen = 0.0.0.0:7001 # IP:PORT to listen on for mogilefs client requests

    $ vim /etc/mogilefs/mogstored.conf

    #Configuration for storage nodes
    maxconns = 10000
    httplisten = 0.0.0.0:7500
    mgmtlisten = 0.0.0.0:7501
    docroot = /home/mogile/mogdata #where data will be stored

  • Create the user mogile.
    $ adduser mogile

  • Start the storage node
    $ mogstored --daemon

  • Start the tracker node as mogile user
    $ su - mogile
    $ mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon

  • Now that we have the trackers & storage nodes up lets tell the tracker that a storage node is available.
    $ mogadm --trackers=<tracker_ip>:7001 host add <storage_node_name> --ip= --port=7500 --status=alive
    And check that the host is being recognized.
    $ mogadm --trackers=<tracker_ip>:7001 host list

  • Add a device to the storage node where files would be kept.
    $ mogadm --trackers=<tracker_ip>:7001 device add <storage_node_name> 1
    And create the directory for the device (dev1 in our case).
    $ mkdir -p /home/mogile/mogdata/dev1
    Check that the device information is being displayed
    $ mogadm --trackers=<tracker_ip>:7001 device list

  • Now we are up and running. Next step would be creating namespaces and adding files.
We could create domains and classes within the domains using the mogadm utility and then use some api to add files to the class.
    create domain
    $ mogadm --trackers=<tracker_ip>:7001 domain add <domain_name>
    check if domain has been added
    $ mogadm --trackers=<tracker_ip>:7001 domain list
    create a class in the domain
    $ mogadm --trackers=<tracker_ip>:7001 class add <domain_name> <class_name>
    check if class has been added
    $ mogadm --trackers=<tracker_ip>:7001 class list

You can use the stats command to see a summary of the status of mogilefs.

$ mogadm --trackers=<tracker_ip>:7001 stats

That finishes the tutorial for installing mogilefs on a single node. You can easily replicate these steps on multiple servers, creating a number of trackers and storage nodes. All that is needed is to add all the storage nodes to the trackers and to use the same db for storing all the information.

If you have built in enough redundancy with 3 or more storage nodes & trackers, there should not be any single point of failure. The only single point of failure that i could figure out was the database. You should create a slave to use for failover scenarios.

Tuesday, October 06, 2009

intro to lucene 2.9

What crap!!! Why do they have to come out with a new version every now and then, and make people rewrite their code to upgrade to a new version? How much do they still have to improve their code? Just because of their frequent upgrades, i have to change my code every now and then. Why should i upgrade to lucene 2.9?

To answer this question - it could be said that you build something and then you figure out that - oh no, if this had been done that way then it would have been better. So for example you make an omelette and you figure out that putting a bit of cheese and pepper would have improved its taste. So next time you try that, but then you figure out that making it in butter would bring out more taste. Or you buy a Pentium 3 pc with 1 GB ram and after 2 years you see that it is outdated - the software has grown and so has the processing power. To run the currently available software, you would need to upgrade your pc to a Pentium 4 - core 2 duo and maybe upgrade your graphics card to an ATI Radeon 4870 X2 from the previous nvidia 9800 GT to play recent games more effectively. And maybe upgrade your 20 inch CRT television to a 42 inch HD LCD for better graphics display.

It is the same reason that lucene keeps on optimizing its code and improving the features - they realize that better code leads to faster indexing and searching on the same machine.

The reason why you should upgrade your lucene version is defined by the list of features that lucene 2.9 provides:

  • Per segment searching and caching (can lead to much faster reopen among other things). FieldCache - takes advantage of the fact that most segments of the index are static, only processes the parts that change, save on time and memory. Also faster searching among multiple segments.

  • Near real-time search capabilities added to IndexWriter - new way to search the current in-memory segment before index is written to disk.

  • New Query types

  • Smarter, more scalable multi-term queries (wildcard, range, etc)

  • A freshly optimized Collector/Scorer API

  • Improved Unicode support and the addition of Collation contrib

  • A new Attribute based TokenStream API

  • A new QueryParser framework in contrib with a core QueryParser replacement impl included.

  • Scoring is now optional when sorting by Field, or using a custom Collector, gaining sizable performance when scores are not required.

  • New analyzers (PersianAnalyzer, ArabicAnalyzer, SmartChineseAnalyzer)

  • New fast-vector-highlighter for large documents

  • Lucene now includes high-performance handling of numeric fields (NumericField & NumericRangeQuery). Such fields are indexed with a trie structure, enabling simple to use and much faster numeric range searching without having to externally pre-process numeric values into textual values. This improves the Lucene number indexing, and is faster for searching numbers, geo-locations, and dates, faster for sorting, and hugely faster for range searching.

For the newbies, all this techno-rant is just there to make you feel good about upgrading. In brief - faster search and more features.

Let's take a look at how you would go about indexing and searching using lucene 2.9.

Here is a very rough example. What I have done is use the twitter api to search for keywords in twitter, fetch the micro-blogs and create an index using lucene 2.9. And then use the same program to open the index and run a search - displaying only the top n results. You can fetch the twitter api from

import twitter4j.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.document.*;
import java.io.*;
import java.util.List;

public class lucene
{
    File index_dir;
    String action;

    public static void main(String[] args) throws Exception
    {
        if(args.length != 3)
        {
            System.out.println("Usage : java lucene <index/search> <dirname> <string>");
            System.exit(1);
        }

        if(!args[0].equalsIgnoreCase("index") && !args[0].equalsIgnoreCase("search"))
        {
            System.out.println("Usage : java lucene <index/search> <dirname> <string>");
            System.exit(1);
        }

        lucene lu = new lucene(args[0], args[1]);
        if(args[0].equalsIgnoreCase("index"))
            lu.indexFiles(args[2]);
        else if(args[0].equalsIgnoreCase("search"))
            lu.searchFiles(args[2]);
    }

    public lucene(String action, String dirname) throws Exception
    {
        this.index_dir = new File(dirname);
        this.action = action;

        if(index_dir.exists() && action.equalsIgnoreCase("index"))
        {
            System.out.println("Index already exists... enter another directory for indexing...");
            System.exit(1);
        }
    }

    public void indexFiles(String searchstr) throws Exception
    {
        Twitter tw = new Twitter();
        System.out.println("Getting tweets for "+searchstr);
        twitter4j.Query qry = new twitter4j.Query("source:twitter4j "+searchstr);

        QueryResult res =;
        List<Tweet> tweets = res.getTweets();
        System.out.println("Got "+tweets.size()+" tweets in "+res.getCompletedIn()+" : "+res.getMaxId());

        // constructor changed from lucene 2.4.1
        IndexWriter iw = new IndexWriter(index_dir, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);

        int docs = 0;
        for(int z=0; z<tweets.size(); z++)
        {
            Tweet twt = (Tweet)(tweets.get(z));
            String user = twt.getFromUser();
            String usrTwt = twt.getText();
            System.out.println("Got : "+user+" => "+usrTwt);

            Document d = new Document();
            // constructor for Field changed - introduced new constants ANALYZED & NOT_ANALYZED. Not storing NORMS improves performance.
            d.add(new Field("user", user, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS, Field.TermVector.YES));
            d.add(new Field("tweet", usrTwt, Field.Store.YES, Field.Index.ANALYZED_NO_NORMS, Field.TermVector.WITH_POSITIONS_OFFSETS));
            iw.addDocument(d);
            docs++;
        }

        System.out.println("optimizing..."+docs+" docs");
        iw.optimize();
        iw.close();
    }

    public void searchFiles(String searchstr) throws Exception
    {
        BufferedReader br = new BufferedReader(new InputStreamReader(, "UTF-8"));
        QueryParser parser = new QueryParser("tweet", new WhitespaceAnalyzer());
        // New constructor in 2.9 - pass true to open the index in readonly mode.
        IndexReader ir =, true);
        Searcher searcher = new IndexSearcher(ir);
        int ResultsPerPage = 5;

        while(true)
        {
            org.apache.lucene.search.Query qry = parser.parse(searchstr);
            System.out.println("Searching for : "+searchstr);

            // use TopScoreDocCollector to get results and do paging. Get 2 pages in a go. Do not sort on score.
            TopScoreDocCollector collector = TopScoreDocCollector.create(2*ResultsPerPage, false);
  , collector);
            // get total no of hits found
            int totalResults = collector.getTotalHits();
            int start = 0;
            int end = Math.min(totalResults, ResultsPerPage);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;

            System.out.println("Total hits : "+totalResults+", end : "+end);

            for(int i=start; i<end; i++)
            {
                Document doc = searcher.doc(hits[i].doc);
                System.out.println(i+"] "+doc.get("user")+" => "+doc.get("tweet"));
            }

            System.out.print("\nQuery (enter \"quit\" to exit): ");
            searchstr = br.readLine();
            if(searchstr == null || searchstr.length() == 0 || searchstr.equalsIgnoreCase("quit"))
                break;
        }
        ir.close();
    }
}


Friday, October 02, 2009


Innodb monitors show information about innodb internal status - which can be used for performance tuning. Let's break down the output of "show engine innodb status", take a look at what is happening, and see how it can be improved. Just fire the "show engine innodb status" command and check the output.

mysql> show engine innodb status\G
*************************** 1. row ***************************
Type: InnoDB
Per second averages calculated from the last 45 seconds

To get a better picture, ensure that the output is sampled over some time (say more than 15 seconds - the more the better). If the sampling time is too short, just run the same command again to get another sample.

SEMAPHORES ----------
OS WAIT ARRAY INFO: reservation count 20636234, signal count 20151488
Mutex spin waits 0, rounds 486769929, OS waits 6568865
RW-shared spins 19231656, OS waits 8487916; RW-excl spins 13940968, OS waits 4652674

This section reports threads waiting for a semaphore and statistics on how many times threads have needed a spin or a wait on a mutex or a rw-lock semaphore. A large number of threads waiting for semaphores may be a result of disk I/O, or contention problems inside InnoDB. Contention can be due to heavy parallelism of queries or problems in operating system thread scheduling. Setting the innodb_thread_concurrency system variable smaller than the default value might help in such situations.

A more explicit output could be (obtained from

OS WAIT ARRAY INFO: reservation count 413452, signal count 378357
--Thread 32782 has waited at btr0sea.c line 1477 for 0.00 seconds the
semaphore: X-lock on RW-latch at 41a28668 created in file btr0sea.c line 135
a writer (thread id 32782) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 731
Last time write locked in file btr0sea.c line 1347
Mutex spin waits 0, rounds 0, OS waits 0
RW-shared spins 108462, OS waits 37964; RW-excl spins 681824, OS waits 375485

Thread wait information appears only when MySQL is running in a very high concurrency environment, so that innodb has to fall back to OS waits. It requires a bit of knowledge of the innodb source code to figure out where you are getting stuck - in this piece of output we are stuck at btr0sea.c lines 731 & 1347.

The reservation count and signal count show how actively innodb uses the internal sync array - how frequently slots are allocated in it and how frequently threads are signaled using it. High OS waits are bad - they mean a lot of context switching in the OS. Spin waits and spin rounds are cheaper than OS waits, but they also waste CPU cycles, so a large number of spin waits and spin rounds means CPU resources are being wasted. innodb_sync_spin_loops can be used to balance and improve these status variables.
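
To put rough numbers on this, here is a minimal Python sketch that computes OS waits per spin from the RW-lock line of the SEMAPHORES section. It assumes the exact line format shown in the sample above; a real monitor would read the full output of "show engine innodb status".

```python
import re

# RW-lock line from the SEMAPHORES sample above
line = "RW-shared spins 19231656, OS waits 8487916; RW-excl spins 13940968, OS waits 4652674"

nums = [int(n) for n in re.findall(r"\d+", line)]
shared_spins, shared_waits, excl_spins, excl_waits = nums

# A high waits-per-spin ratio means spinning often fails and threads fall
# back to expensive OS context switches - a sign of heavy contention.
shared_ratio = shared_waits / float(shared_spins)
excl_ratio = excl_waits / float(excl_spins)
print("RW-shared OS waits per spin: %.2f" % shared_ratio)
print("RW-excl OS waits per spin: %.2f" % excl_ratio)
```

If these ratios climb over time, tuning innodb_sync_spin_loops (or reducing concurrency) is worth a try.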

InnoDB uses a spin-loop in the hope that the thread holding the mutex will release it very quickly, while the current thread spins, thus saving an expensive context switch. However it can't spin for very long, as that eats CPU resources, so after some loops it is reasonable to give the CPU to concurrent threads.

LATEST FOREIGN KEY ERROR ------------------------
030709 13:00:59 Transaction:
TRANSACTION 0 290328284, ACTIVE 0 sec, process no 3195, OS thread id 34831 inserting
15 lock struct(s), heap size 2496, undo log entries 9
MySQL thread id 25, query id 4668733 localhost heikki update
insert into ibtest11a (D, B, C) values (5, 'khDk' ,'khDk')
Foreign key constraint fails for table test/ibtest11a:,
CONSTRAINT `0_219242` FOREIGN KEY (`A`, `D`) REFERENCES `ibtest11b` (`A`,
Trying to add in child table, in index PRIMARY tuple:
0: len 4; hex 80000101; asc ....;; 1: len 4; hex 80000005; asc ....;; 2:
len 4; hex 6b68446b; asc khDk;; 3: len 6; hex 0000114e0edc; asc ...N..;; 4:
len 7; hex 00000000c3e0a7; asc .......;; 5: len 4; hex 6b68446b; asc khDk;;
But in parent table test/ibtest11b, in index PRIMARY,
the closest match we can find is record:
RECORD: info bits 0 0: len 4; hex 8000015b; asc ...[;; 1: len 4; hex
80000005; asc ....;; 2: len 3; hex 6b6864; asc khd;; 3: len 6; hex
0000111ef3eb; asc ......;; 4: len 7; hex 800001001e0084; asc .......;; 5:
len 3; hex 6b6864; asc khd;;

If you are using foreign keys and there has been an error, you would also have a Foreign key error section. This section provides information about the most recent foreign key constraint error.

LATEST DETECTED DEADLOCK ------------------------
091001 14:43:14
*** (1) TRANSACTION:
TRANSACTION 0 611308122, ACTIVE 0 sec, process no 13095, OS thread id 459504560 setting auto-inc lock
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 320, 1 row lock(s)
MySQL thread id 675137455, query id 2100136547 update
REPLACE INTO TEST (ID, CONTENT) VALUES('2012258', 'xina0157')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `TEST` trx id 0 611308122 lock mode AUTO-INC waiting
*** (2) TRANSACTION:
TRANSACTION 0 611308121, ACTIVE 0 sec, process no 13095, OS thread id 458902448 updating or deleting, thread declared inside InnoDB 492
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1024, 11 row lock(s), undo log entries 6
MySQL thread id 791833830, query id 2100136545 rx update
REPLACE INTO TEST (ID, CONTENT) VALUES('2012257', 'bhuvneshahuja'), ('2012257', 'lalitgarg007'), ('2012257', 'yatin1904'), ('2012257', 'vinay168'), ('2012257', 'Bhupennsnghn'), ('2012257'
, 'vulnerabl003'), ('2012257', 'manollom'), ('2012257', 'Solankiccoutantom'), ('2012257', 'gprashad@rediffmaiom'), ('2012257', 'deep1_1'), ('2012257', 'ankur20584'), ('2012257', 'amarrock'), ('2012257',
*** (2) HOLDS THE LOCK(S):

TABLE LOCK table `TEST` trx id 0 611308121 lock mode AUTO-INC

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 1822478 n bits 480 index `ID` of table `TEST` trx id 0 611308121 lock_mode X waiting
Record lock, heap no 314 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

0: len 4; hex 801eb462; asc b;; 1: len 8; hex 78696e6130313537; asc xina0157;; 2: len 4; hex ee5c72a8; asc \r ;;


This section provides information about the most recent deadlock. It is not present if no deadlock has occurred. The contents show which transactions are involved, the statement each was attempting to execute, the locks they have and need, and which transaction InnoDB decided to roll back to break the deadlock. Innodb only prints information about a few of the locks each transaction is holding. Also, only the last statement from each transaction is displayed, while the locked rows could have been locked by one of the previous statements.

Here we can see that query 1 was waiting for the lock to obtain the auto-increment id for the table TEST - which had been locked by query 2. So the auto-increment lock is the one creating problems here.
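
One way to reduce this kind of AUTO-INC contention on MySQL 5.1 and later is to relax the auto-increment locking. The fragment below is an illustrative sketch, not a universal recommendation - mode 2 is fastest but unsafe with statement-based replication:

```ini
# my.cnf - relax InnoDB auto-increment locking (MySQL 5.1+)
# 0 = traditional: table-level AUTO-INC lock held for the whole statement
# 1 = consecutive: plain INSERTs use a lightweight mutex instead
# 2 = interleaved: fastest and most scalable, but values may be
#     non-consecutive and statement-based replication becomes unsafe
innodb_autoinc_lock_mode = 1
```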

TRANSACTIONS ------------
Trx id counter 0 612244468
Purge done for trx's n:o < 0 612244461 undo n:o < 0 0
History list length 11
Total number of lock structs in row lock hash table 0
---TRANSACTION 0 0, not started, process no 13095, OS thread id 524331952
MySQL thread id 793300751, query id 2104652418 localhost root
show engine innodb status
---TRANSACTION 0 612244464, not started, process no 13095, OS thread id 422017968
MySQL thread id 793301055, query id 2104652413 rx
---TRANSACTION 0 612244463, not started, process no 13095, OS thread id 288115632
MySQL thread id 793301051, query id 2104652405 rx
---TRANSACTION 0 612244465, not started, process no 13095, OS thread id 290991024
MySQL thread id 793301050, query id 2104652414 rx
---TRANSACTION 0 612244457, not started, process no 13095, OS thread id 54217648
MySQL thread id 793301046, query id 2104652381 rx
---TRANSACTION 0 612244467, not started, process no 13095, OS thread id 521587632
MySQL thread id 793301039, query id 2104652417 rx
---TRANSACTION 0 612244453, not started, process no 13095, OS thread id 469560240
MySQL thread id 793301038, query id 2104652371 rx
---TRANSACTION 0 612244444, not started, process no 13095, OS thread id 502590384
MySQL thread id 793301028, query id 2104652341 rx
---TRANSACTION 0 612244438, not started, process no 13095, OS thread id 220674992
MySQL thread id 793301027, query id 2104652325 rx
---TRANSACTION 0 612244423, not started, process no 13095, OS thread id 498207664
MySQL thread id 793301014, query id 2104652276 rx
---TRANSACTION 0 612244417, not started, process no 13095, OS thread id 291793840
MySQL thread id 793301013, query id 2104652269 rx
---TRANSACTION 0 612244397, not started, process no 13095, OS thread id 450493360
MySQL thread id 793300979, query id 2104652179 rx
---TRANSACTION 0 612244426, not started, process no 13095, OS thread id 459504560
MySQL thread id 675137455, query id 2104652279 Has read all relay log; waiting for the slave I/O thread to update it

If this section reports lock waits, your applications might have lock contention. If you have a small number of connections, all connections will be printed in the transaction list; if you have a large number of connections, Innodb will only print some of them, so that the output does not grow too large. The transaction id is the current transaction identifier - it is incremented for each transaction.

"Purge done for trx's n:o" is the number of the transaction up to which purge is done. Innodb can only purge old row versions if there are no running transactions potentially needing them. Old stale uncommitted transactions may block the purge process, eating up resources. By looking at the difference between the current and last purged transaction counters, you will be able to spot it. In some rare cases purge may also have a hard time keeping up with the update rate, in which case the difference between these values will also grow.
"undo n:o" shows the undo log record number which purge is currently processing, if it is active; otherwise it is zero.
"History list length 11" is the number of unpurged transactions in the undo space. It increases as transactions which have done updates are committed, and decreases as purge runs.
Each MySQL connection is listed with either a "not started" state, if there is no active Innodb transaction for that connection, or "ACTIVE" if a transaction is active.
The transaction status is also reported - basically what the transaction is doing; it can be "fetching rows", "updating" and a couple of other values.
"Total number of lock structs in row lock hash table 0" is the number of row lock structures allocated by all transactions. Note this is not the same as the number of locked rows - there are normally many rows for each lock structure.
"Thread declared inside InnoDB 400" sometimes appears. It means the thread is running inside the Innodb kernel and still has 400 tickets to use. Innodb tries to limit thread concurrency, allowing only innodb_thread_concurrency threads to run inside the Innodb kernel at the same time. If a thread is not running inside the innodb kernel, its status could be "waiting in InnoDB queue" or "sleeping before joining InnoDB queue".
To avoid too many threads competing to enter the innodb queue at the same time, Innodb makes a thread sleep for some time before trying to wait (if no free slot was available). This may cause the number of threads active inside the kernel to be less than the number allowed by innodb_thread_concurrency. For certain workloads it may help to decrease the time a thread waits before it enters the queue. This is done by adjusting the innodb_thread_sleep_delay variable. The value is specified in microseconds.
"mysql tables in use 1, locked 0" is the number of tables used by the transaction in question and the number of tables locked by it. Innodb does not lock tables for normal operation, so the number of tables locked normally stays 0, unless the statement is ALTER TABLE or similar, or LOCK TABLES was used.
In addition to Innodb specific information, generic statement information which is visible in SHOW PROCESSLIST is shown in SHOW INNODB STATUS as well, such as the statement being executed, query id, query status etc.

FILE I/O --------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
35472867 OS file reads, 512713655 OS file writes, 33949612 OS fsyncs
0.29 reads/s, 16384 avg bytes/read, 20.76 writes/s, 1.33 fsyncs/s

This section shows the state of the file I/O helper threads - insert buffer thread, log thread, read thread and write thread. These are responsible, respectively, for insert buffer merges, asynchronous log flushes, read-ahead and flushing of dirty buffers. Normal reads originating from query execution are done by the threads running the queries. On Unix/Linux you will always see 4 helper threads; on Windows the number can be adjusted by the innodb_file_io_threads variable. For each helper thread you can see the thread state - whether the thread is ready (waiting for i/o request) or executing a certain operation.
The number of pending operations is shown for each helper thread - the operations queued for execution or being executed at the same time. The number of pending fsync operations is also displayed. For writes, Innodb has to ensure the data makes it to the disk - just passing it to the OS cache is not enough. This is typically done by calling fsync() on the modified files. Constantly high values for any of these variables are an indication of an IO bound workload.
Next we see the number of IO operations and their averages, which can be used for graphing/monitoring purposes.

INSERT BUFFER AND ADAPTIVE HASH INDEX -------------------------------------
Ibuf: size 1, free list len 5, seg size 7,
7567365 inserts, 7567365 merged recs, 4848077 merges
Hash table size 4425293, used cells 1294280, node heap has 1502 buffer(s)
14.07 hash searches/s, 78.40 non-hash searches/s

This section shows the insert buffer and adaptive hash index status. The first line shows the status of the insert buffer - segment size and free list, as well as whether there are any records in the insert buffer. Next it shows how many inserts were done into the insert buffer, how many records were merged and how many merges it took. The ratio of merges to inserts is pretty much the insert buffer efficiency.

The adaptive hash index is a hash index Innodb builds for some pages to speed up row lookup, replacing btree searches with hash searches. This section shows the hash table size, the number of used cells and the number of buffers used by the adaptive hash index. You can also see the number of hash index lookups and non-hash index lookups, which is an indication of hash index efficiency.

LOG ---
Log sequence number 263 105379218
Log flushed up to 263 105373095
Last checkpoint at 263 102251451
0 pending log writes, 0 pending chkp writes
391975179 log i/o's done, 20.00 log i/o's/second

The log section provides information about the log subsystem of Innodb. You can see the current log sequence number - the number of bytes Innodb has written to the log files since system tablespace creation. You can also see up to which point the logs have been flushed (so how much data is unflushed in the log buffer), as well as when the last checkpoint was performed. Innodb uses fuzzy checkpointing, so this line holds the log sequence number up to which all changes have been flushed from the buffer pool. Changes with higher log sequence numbers may still be recorded only in the logs and not flushed from the buffer pool, so such log sequences cannot be overwritten in the log files. By monitoring the log sequence number and the value up to which logs have been flushed, you can check whether your innodb_log_buffer_size is optimal - if you see more than 30% of the log buffer size unflushed, you may want to increase it.

You can also see the number of pending normal log writes and pending checkpoint log writes. The number of log i/o operations allows you to separate tablespace related IO from log related IO, so you can see how much IO your log file requires. Note that depending on your innodb_flush_log_at_trx_commit value, your log writes may be more or less expensive. If innodb_flush_log_at_trx_commit=2, log writes are done to the OS cache, and being sequential, these writes are pretty fast.

BUFFER POOL AND MEMORY ----------------------
Total memory allocated 1177422796; in additional pool allocated 5829888
Dictionary memory allocated 94872
Buffer pool size 65536
Free buffers 1
Database pages 64033
Modified db pages 476
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 58886259, created 19378118, written 210440562
0.29 reads/s, 0.42 creates/s, 0.96 writes/s
Buffer pool hit rate 1000 / 1000

This section shows buffer pool activity and memory usage. You can see the total memory allocated by Innodb (sometimes higher than you anticipated), the amount of memory allocated in the additional memory pool (so you can check if it is sized right), the total number of pages in the buffer pool, the number of free pages, pages allocated as database pages, and dirty pages. From these values you can learn whether your buffer pool is sized well - if you constantly have a lot of free pages, it probably means your active database is smaller than the allocated buffer pool, so you can tune the buffer pool down. Even if free pages is zero, as in this case, database pages will not equal the total size of the buffer pool, because the buffer pool also stores lock information, adaptive hash indexes and some other system structures.

Pending reads and writes are pending requests at the buffer pool level. Innodb may merge multiple requests into one at the file level, so these are different. We can also see the different types of IO submitted by Innodb - pages to be flushed via the LRU list (dirty pages which have not been accessed for a long time), the flush list (old pages which need to be flushed by the checkpointing process) and single page writes (independent page writes).

We can also see the number of pages being read and written. Created pages are empty pages created in the buffer pool for new data - when the previous page content was not read into the buffer pool.

Finally you can see the buffer pool hit ratio, which measures buffer pool efficiency. 1000/1000 corresponds to a 100% hit rate. It is hard to tell what hit rate is good enough - it is very workload dependent. Sometimes 950/1000 will be enough; sometimes you can see an IO bound workload with a hit rate of 995/1000.

ROW OPERATIONS --------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 13095, id 538459056, state: sleeping
Number of rows inserted 748462338, updated 421356848, deleted 323128079, read 3220854064
5.13 inserts/s, 8.00 updates/s, 0.00 deletes/s, 43.71 reads/s

Row operations show activity on a per-row basis plus some system information.

It shows the innodb thread queue status - how many threads are waiting or active; how many read views are open inside Innodb (a read view exists when a transaction has started but no statement is currently active); and the state of the Innodb main thread, which controls the scheduling of a number of system operations - flushing dirty pages, checkpointing, purging, flushing logs and doing insert buffer merges. The values for the "state" field are rather self explanatory.

You can also see the number of row operations since system startup as well as average values. Row operations are a very good measure of Innodb load. Not all row operations are created equal, of course - accessing 10 byte rows is much cheaper than accessing a 10MB blob - but they are still much more meaningful than the number of queries, which differ from each other even more.

Saturday, September 19, 2009

innodb configuration and optimization

Just thought I should put up a quick guide for configuring an innodb-optimized my.cnf.

use innodb_data_home_dir & innodb_data_file_path to specify the location and size of the tablespace.

innodb_data_home_dir = /path/to/data
innodb_data_file_path = ibdata1:50M;ibdata2:50M:autoextend:max:10G
innodb_autoextend_increment = 10

This asks innodb to create two files, ibdata1 & ibdata2 (as the tablespace), and keep them in the /path/to/data directory. Both start at 50 MB, but ibdata2 will expand automatically in increments of innodb_autoextend_increment MB (yes, the value is in MB) up to a max size of 10 GB. InnoDB forms the directory path for each data file by textually concatenating the value of innodb_data_home_dir to the data file name, adding a path name separator (slash or backslash) between values if necessary.

By default innodb creates all its tables in the single tablespace specified by innodb_data_file_path. But there is another option using which you can ask innodb to create a tablespace for each table.

innodb_file_per_table

Innodb will now store each newly created table in its own table_name.ibd file - which contains both the data and the index.

Let's look at some optimization and performance settings for innodb.

innodb_additional_mem_pool_size

If you have a large number of innodb tables, you will need to increase this value. It is used to store data dictionary information and other internal data structures. If the value is too low, you might see warning messages in the mysql error log.

innodb_autoinc_lock_mode

Innodb uses some table level locking for generating auto increment values - which is configurable. If the value is set to 0 (traditional mode), for all insert statements, a table level auto_inc lock is obtained and maintained till the end of the insert statement. If the value is set to 1 (consecutive lock mode), all bulk insert-like statements obtain and maintain the auto_inc lock on the whole table. But simple inserts use a new locking method where a lightweight mutex is used during the allocation of auto-increment values, and no table-level AUTO-INC lock is used. If the value is set to 2 (interleaved lock mode), no insert-like statements use the table-level AUTO-INC lock, and multiple statements can execute at the same time. This is the fastest and most scalable lock mode, but it is not safe when using statement-based replication or recovery scenarios when SQL statements are replayed from the binary log. Here auto-increment values are guaranteed to be unique and monotonically increasing across all concurrently executing insert-like statements. However, because multiple statements can be generating numbers at the same time (that is, allocation of numbers is interleaved across statements), the values generated for the rows inserted by any given statement may not be consecutive.

innodb_buffer_pool_size

The larger the buffer pool, the less disk I/O is needed to access data in tables. Generally the buffer pool should be around 50% of available memory. Very large values can result in paging and slow down the system.

innodb_commit_concurrency

The number of threads that can commit at the same time. Should be equal to the number of available cpus or cores.

innodb_flush_log_at_trx_commit

If the value is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit. When the value is 1 (the default), the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. When the value is 2, the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing of the log file still takes place once per second when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues.

The default value of 1 is the value required for ACID compliance. Better performance can be achieved by setting the value different from 1, but then you can lose at most one second worth of transactions in a crash. With a value of 0, any mysqld process crash can erase the last second of transactions. With a value of 2, then only an operating system crash or a power outage can erase the last second of transactions.

innodb_log_buffer_size

The size of the buffer that innodb uses to write log files on disk. A large buffer allows large transactions to run without a need to write the log to disk before the transactions commit.

innodb_log_file_size

The size of each log file in a log group. The combined size of log files must be less than 4GB. Sensible values range from 1MB to 1/N-th of the size of the buffer pool, where N is the number of log files in the group. The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But larger log files also mean that recovery is slower in case of a crash.

innodb_log_files_in_group

The number of log files in the log group.

innodb_open_files

The maximum number of tablespaces (.ibd files) that innodb can keep open at one time.

innodb_support_xa

If this is enabled, InnoDB support for two-phase commit in XA transactions is enabled, which causes an extra disk flush for transaction preparation. If you do not wish to use XA transactions, you can disable this variable to reduce the number of disk flushes and get better InnoDB performance.

innodb_thread_concurrency

InnoDB tries to keep the number of operating system threads concurrently inside the engine less than or equal to the limit given by this variable. Once the number of threads reaches this limit, additional threads are placed into a wait state within a FIFO queue for execution. Threads waiting for locks are not counted in the number of concurrently executing threads. By default this value should be twice the number of available cpus/cores.

Using some of the information here, you can easily set up a machine to perform better with innodb. You will always have to choose between performance and reliability, as the two pull in opposite directions: as performance increases, the reliability of data decreases. The variables I have set here are for high performance only.
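
To pull the settings discussed above together, here is an illustrative my.cnf fragment. The concrete values are assumptions for an example machine with 4 cores and 8 GB of RAM, not recommendations - tune them against your own workload, keeping the performance vs reliability trade-off in mind:

```ini
[mysqld]
# tablespace layout
innodb_data_home_dir            = /path/to/data
innodb_data_file_path           = ibdata1:50M;ibdata2:50M:autoextend:max:10G
innodb_autoextend_increment     = 10
innodb_file_per_table

# memory (assumes ~8 GB RAM; buffer pool ~50% of it)
innodb_buffer_pool_size         = 4G
innodb_additional_mem_pool_size = 16M

# logs - larger logs mean less checkpoint I/O but slower crash recovery
innodb_log_buffer_size          = 8M
innodb_log_file_size            = 256M
innodb_log_files_in_group       = 2

# durability vs speed: 2 can lose ~1s of transactions on an OS crash
innodb_flush_log_at_trx_commit  = 2
innodb_support_xa               = 0

# concurrency (assumes 4 cores: thread concurrency ~2x cores)
innodb_thread_concurrency       = 8
innodb_commit_concurrency       = 4
```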

Tuesday, September 08, 2009

mad world - tears for fears

All around me are familiar faces
Worn out places, worn out faces
Bright and early for their daily races
Going nowhere, going nowhere

Their tears are filling up their glasses
No expression, no expression
Hide my head I want to drown my sorrow
No tommorow, no tommorow

And I find it kind of funny, I find it kind of sad
The dreams in which Im dying are the best I've ever had
I find it hard to tell you, I find it hard to take
When people run in circles its a very, very
Mad world, Mad World

Children waiting for the day they feel good
Happy birthday, happy birthday
And I feel the way that every child should
Sit and listen, sit and listen

Went to school and I was very nervous
No one knew me, no one knew me
Hello teacher tell me whats my lesson
Look right through me, look right through me

And I find it kind of funny, I find it kind of sad
The dreams in which I'm dying are the best I've ever had
I find it hard to tell you, I find it hard to take
When people run in circles its a very, very
Mad world, Mad World, enlarging your world
Mad World

Monday, September 07, 2009

monitor and improve lucene search

How ?
Simple - use lucidgaze available at

With lucidgaze you can analyze:

  • Read/write stats on index utilization, distribution and throughput

  • Query efficiency stats - show how effectively user input is analyzed and decomposed for processing by the index

  • Mapping of tokenizers, token-streams and analyzers - makes transparent how text is processed and indexed

So, we went ahead and downloaded the api, but were unable to find any example program or source code. After some tinkering around we finally figured out how to use it.

You could get the code here :

Wednesday, September 02, 2009

DB Basics : Types of joins

Join Types

So, you have designed a normalized database for your application, maybe up to the 3rd normal form. Now, when you run queries, you need to join the tables to get the required information. There has to be some common data that allows those tables to be connected in a meaningful way. Although it's possible to have more than one common column between two tables, most often the join key will be the primary key of one table and a foreign key in the other.

Let's perform queries against the following tables.

company : company_id (pk), company_name
model : model_id(pk), company_id(fk), model_name

Here, we have a company table and a model table in a 1:n relationship - a car company can have multiple models in the market. Let us take some sample data.
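The sample-data table itself did not survive in this copy of the post, so here is a hypothetical reconstruction, inferred from the query outputs below (the ids and row contents are assumptions). Python's sqlite3 module stands in for MySQL purely to keep the snippet self-contained and runnable:

```python
import sqlite3

# Hypothetical sample data, inferred from the query outputs in this post.
# SQLite stands in for MySQL here only to make the example runnable as-is.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE model   (model_id   INTEGER PRIMARY KEY,
                      company_id INTEGER REFERENCES company(company_id),
                      model_name TEXT);
INSERT INTO company VALUES (1,'Audi'),(2,'GM'),(3,'Ford'),
                           (4,'Toyota'),(5,'Tata'),(6,'BMW');
INSERT INTO model VALUES (1,1,'A4'),(2,1,'A6'),(3,1,'A8'),(4,1,'Q7'),
                         (5,4,'Corola'),(6,4,'Camry'),(7,4,'Innova'),
                         (8,5,'Nano'),(9,NULL,'Custom');
""")

# Sanity check: Audi's models, matching the inner-join output shown below.
rows = conn.execute("""
    SELECT t1.company_name, t2.model_name
    FROM company t1 INNER JOIN model t2 ON t1.company_id = t2.company_id
    WHERE t1.company_name = 'Audi'
    ORDER BY t2.model_name
""").fetchall()
print(rows)  # [('Audi', 'A4'), ('Audi', 'A6'), ('Audi', 'A8'), ('Audi', 'Q7')]
```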



Inner join : An inner join is defined as a join in which rows must match in both tables in order to be included in the result set.

mysql> select t1.company_name as 'Manufacturer', t2.model_name as 'Model' from company t1 INNER JOIN model t2 on t1.company_id = t2.company_id where t1.company_name = 'Audi';

+--------------+-------+
| Manufacturer | Model |
+--------------+-------+
| Audi         | A4    |
| Audi         | A6    |
| Audi         | A8    |
| Audi         | Q7    |
+--------------+-------+

Outer Join : Outer joins will return records in one table that aren’t matched in another. Outer joins can be further divided into the two types of left and right. In a left outer join, all records from the first (left-hand) table in a join that meet any conditions set in the WHERE clause are returned, whether or not there’s a match in the second (right-hand) table.

mysql> select t1.company_name as 'Manufacturer', t2.model_name as 'Model' from company t1 left join model t2 on t1.company_id = t2.company_id where t1.company_name in ('Toyota','BMW');
+--------------+--------+
| Manufacturer | Model  |
+--------------+--------+
| Toyota       | Corola |
| Toyota       | Camry  |
| Toyota       | Innova |
| BMW          | NULL   |
+--------------+--------+

Here 'BMW' is returned even though it does not have any entry in the model table.
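This NULL behaviour can be reproduced in a quick self-contained sketch (again using Python's sqlite3 in place of MySQL, with minimal data assumed from the output above):

```python
import sqlite3

# Assumed minimal data: Toyota has three models, BMW has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE model (model_id INTEGER PRIMARY KEY, company_id INTEGER, model_name TEXT);
INSERT INTO company VALUES (4,'Toyota'),(6,'BMW');
INSERT INTO model VALUES (5,4,'Corola'),(6,4,'Camry'),(7,4,'Innova');
""")

# Unmatched left-hand rows (BMW) survive the join with NULL (None) columns.
rows = conn.execute("""
    SELECT t1.company_name, t2.model_name
    FROM company t1 LEFT JOIN model t2 ON t1.company_id = t2.company_id
    ORDER BY t1.company_name, t2.model_name
""").fetchall()
print(rows)
# [('BMW', None), ('Toyota', 'Camry'), ('Toyota', 'Corola'), ('Toyota', 'Innova')]
```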

Similar to the left outer join, a right outer join returns all records from the second (right-hand) table in a join that meet any conditions set in the WHERE clause, whether or not there’s a match in the first (left-hand) table.

mysql> select t1.company_name as 'Manufacturer', t2.model_name as 'Model' from company t1 right join model t2 on t1.company_id = t2.company_id where t2.model_name in ('Custom','Nano');
+--------------+--------+
| Manufacturer | Model  |
+--------------+--------+
| Tata         | Nano   |
| NULL         | Custom |
+--------------+--------+

Cross-join :

The cross-join, also referred to as a Cartesian product, returns all the rows in all the tables listed in the query: each row in the first table is paired with every row in the second table. This happens when there is no relationship defined between the two tables. We rarely need a cross join in ordinary applications, so we should try to avoid it; an accidental cross join happens when we fail to provide a filter (join condition) for the join in the query.

mysql> select t1.company_name as 'Manufacturer', t2.model_name as 'Model' from company t1, model t2;
+--------------+--------+
| Manufacturer | Model  |
+--------------+--------+
| Audi         | A4     |
| GM           | A4     |
| Ford         | A4     |
| Toyota       | A4     |
| Tata         | A4     |
| BMW          | A4     |
| Audi         | A6     |
| GM           | A6     |
| Ford         | Custom |
| Toyota       | Custom |
| Tata         | Custom |
| BMW          | Custom |
+--------------+--------+

Monday, August 24, 2009

ext4 filesystem

ext4 is the next "version" of the filesystem after ext3. It was released with linux kernel version 2.6.28, which ships with ubuntu 9.04 (Jaunty).

Benefits of ext4 over ext3

  • Bigger filesystem and file sizes : ext3 supports a maximum filesystem size of 16TB and a maximum file size of 2TB, while ext4 supports filesystems of up to 1EB (10^18 bytes = 1024*1024 TB) and files of up to 16TB. You are unlikely to come across such huge storage on desktop computers, though.

  • subdirectory limitations : ext3 allows "only" 32000 subdirectories/files in a directory. ext4 allows an unlimited number of subdirectories.

  • Multiblock allocation : ext3 allocates 1 block (4KB) at a time, so writing a 100 MB file calls the ext3 block allocator 25600 times. This also prevents the allocator from optimizing its allocation policy, because it never knows the total amount of data being allocated. ext4, on the other hand, uses a multiblock allocator to allocate multiple blocks in a single call, which improves performance to a great extent.

  • Extents : ext3 uses an indirect block mapping scheme that tracks each individual block holding a file's data. ext4 uses extents (ranges of contiguous physical blocks), which in effect say "the data lies in the next n blocks". This improves performance and reduces file fragmentation.

  • Delayed allocation : ext3 allocates blocks as soon as possible. ext4 delays the allocation of physical blocks for as long as possible; until then the data is kept in cache. This gives the block allocator the opportunity to optimize the allocation of blocks.

  • fast fsck : The total fsck time improves by a factor of 2 to 20.

  • journal checksumming : Ext4 checksums the journal data to know if the journal blocks are corrupted. Journal checksumming allows one to convert the two-phase commit system of Ext3's journaling to a single phase, speeding the filesystem operation up to 20% in some cases - so reliability and performance are improved at the same time.

  • "No Journaling" mode : in Ext4 depending on your requirements, the journaling feature can be disabled.

  • Online defragmentation : ext4 supports online defragmentation. There is an e4defrag tool which can defrag individual files or even complete filesystems.

  • Inode related features : Larger inodes, nanosecond timestamps, fast extended attributes, inodes reservation

  • Persistent preallocation : Applications tell the filesystem to preallocate the space, and the filesystem preallocates the necessary blocks and data structures, but there is no data on it until the application really needs to write the data in the future.

  • Barriers on by default : This option improves the integrity of the filesystem at the cost of some performance. A barrier forbids the writing of any blocks after the barrier until all blocks written before the barrier are committed to the media. By using barriers, filesystems can make sure that their on-disk structures remain consistent at all times.
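As a small illustration of the persistent preallocation feature above: applications can request it via posix_fallocate(3), which on ext4 is backed by the efficient fallocate() system call. A minimal Python sketch follows (the path and size are arbitrary choices for illustration):

```python
import os

path = "/tmp/prealloc_demo"   # hypothetical scratch file
size = 10 * 1024 * 1024       # reserve 10 MB up front

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
try:
    # Reserve the blocks without writing any data. On ext4 this is a cheap
    # metadata-only operation; on filesystems without fallocate() support,
    # the C library falls back to writing zeroes.
    os.posix_fallocate(fd, 0, size)
finally:
    os.close(fd)

# The file size reflects the reservation even though nothing was written yet.
print(os.path.getsize(path))  # 10485760
```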

Now let's see the steps needed to convert your desktop filesystem from ext3 to ext4.

  • check the version of linux kernel. It should be > 2.6.28-11.
    jayant@gamegeek:~$ uname -a
    Linux gamegeek 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC 2009 x86_64 GNU/Linux

  • Just in case - take a backup of your important data

  • boot from live cd and run the following commands for converting the partition /dev/sda6 to ext4 from ext3.
    $ sudo bash
    $ tune2fs -O extents,uninit_bg,dir_index /dev/sda6
    $ e2fsck -pf /dev/sda6

  • Mount the partition and change its type entry in /etc/fstab
    $ mount -t ext4 /dev/sda6 /mnt
    $ vim /mnt/etc/fstab

    # /dev/sda6
    UUID=XXXXXX / ext3 relatime,errors=remount-ro 0 1
    # /dev/sda6
    UUID=XXXXXX / ext4 relatime,errors=remount-ro 0 1

    Save the changes

  • Reinstall grub - this is optional. If you skip it and get a fatal error 13 while booting the machine, just boot using the live cd and run these commands. Note that the converted root partition is mounted at /mnt so that grub-install can find /mnt/boot/grub.
    $ sudo bash
    $ mount /dev/sda6 /mnt
    $ grub-install /dev/sda --root-directory=/mnt --recheck

After the reboot you would be using the ext4 filesystem.

Important note : Your old files have not been converted to the new ext4 format. Only files written after the conversion will use extents and the other ext4 features. But since ext3 and ext4 are compatible, you won't face any problems accessing the older ext3-style files on disk. As files get rewritten during normal usage, they will gradually be stored in the ext4 format.