Sunday, December 12, 2010

how to go about apache-solr

I had explored lucene for building full text search engines, and had even gone into the depth of modifying lucene's core classes to change and add new functionality. Being good at lucene, I never looked at solr. I had the notion that Solr was a simple web interface on top of lucene, so it would not be very customizable. But recently that belief was broken. I have been going through solr 1.4.1. Here is a list of features that solr provides by default: http://lucene.apache.org/solr/features.html.

Solr 1.4.1 combines
  • Lucene 2.9.3 - an older version of lucene. The latest version, lucene 3.0.2, is based on java 5 and has some really good performance improvements over the 2.x versions. I wish we had lucene 3.x on solr. Lucene powers the core full-text search capability of solr.
  • Tika 0.4 - again, the latest version here is 0.8. Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
  • Carrot2 3.1.0 - here also the latest version is 3.4.2. Carrot2 is an open source search result clustering engine. It readily integrates with Lucene as well.

To install solr, simply download it from http://lucene.apache.org/solr/index.html and untar it. You can launch solr by navigating to the <solr_directory>/example directory and running java -jar start.jar. This starts the sample solr server without any data. You can go to http://localhost:8983/solr/admin to see the admin page for solr. To post some sample files to solr, run java -jar post.jar *.xml from the <solr_dir>/example/exampledocs directory. This loads the example documents into the solr server and creates an index on them.

Now the real fun begins when you want to index your own site on solr. First of all, you need to define the schema and identify how data will be ported into the index.

For starters
-- copy example directory : cp example myproject
-- go to solr/conf directory : cd myproject/solr/conf
-- ls will show you the directory contents : $ ls
admin-extra.html dataimport.properties mapping-ISOLatin1Accent.txt schema.xml solrconfig.xml stopwords.txt xslt
data-config.xml elevate.xml protwords.txt scripts.conf spellings.txt synonyms.txt

These are the only files in solr which need to be tweaked to get solr working...
We will go through the files one by one.

schema.xml
-- you need to define the fieldTypes such as string, boolean, binary, int, float, double etc.
-- Each field type has a class and certain properties associated with it.
-- You can also specify how a fieldType is analyzed/tokenized/stored in the schema.
-- Any filters related to a fieldType can also be specified here.
-- Let's take the example of a text field
<!-- the text field is of type solr.TextField -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <!-- this analyzer is applied while indexing -->
    <analyzer type="index">
        <!-- pass the text through the following tokenizer -->
        <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
        <!-- and then apply these filters -->
        <!-- the stop filter uses the stop words listed in stopwords.txt -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- avoid stemming words which are listed in the protwords.txt file -->
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
    <!-- the following analyzer and filters are applied to queries -->
    <!-- generally the analyzers/filters for indexing and querying are the same -->
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
</fieldType>
-- Once we are done defining the field types, we can go ahead and define the fields that will be indexed (a sample set of field definitions follows this list).
-- Each field can have additional attributes like type=fieldType, stored=true/false, indexed=true/false, omitNorms=true/false.
-- If you want to skip indexing some fields, you can create an "ignored" field type and set the type of those fields to ignored:
<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
<dynamicField name="*" type="ignored" multiValued="true" />
-- In addition to these, you also have to specify a unique key to enforce uniqueness among multiple documents:
<uniqueKey>table_id</uniqueKey>
-- The default search field and the default search operator are among the other things that can be specified.
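For example, field definitions matching the fields used later in this post (in the data-config and the search query below) might look like the following - a sketch, with illustrative attribute choices rather than prescriptions:

<field name="table_id" type="string" indexed="true" stored="true"/>
<field name="rowname" type="text" indexed="true" stored="true"/>
<field name="tags" type="text" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="false"/>
<field name="date" type="date" indexed="true" stored="true"/>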

solrconfig.xml
- used to define the configuration for solr
- parameters like dataDir (where the index is to be stored) and indexing parameters can be specified here.
- You can also configure caches like queryResultCache, documentCache or fieldValueCache and their caching parameters (see the sketch after this list).
- It also handles warming of caches.
- There are request handlers for performing various tasks.
- Replication and partitioning are configured as sections in request handling.
- Various search components are also available to handle advanced features like faceted search, moreLikeThis and highlighting.
- All you have to do is put the appropriate settings in the xml file and solr will handle the rest.
- Spellchecking is available, which can be used to generate a list of alternate spelling suggestions.
- Clustering is also a search component; it integrates search with Carrot2 for clustering. You can select which algorithm you want for clustering - out of those provided by carrot2.
- Porting of data can be done using multiple formats like xml and csv. There are request handlers available for all these formats.
- A more interesting way of porting data is by using the DataImportHandler - it ports data directly from mysql to lucene. We will go into detail on this below.
- There is an inbuilt dedup mechanism as well. All you have to do is set it up by telling it which fields to monitor, and it will automatically deduplicate documents.
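To give an idea of what such settings look like, here is a sketch of a cache entry of the kind found in solrconfig.xml - the class is the standard solr.LRUCache, but the numbers are illustrative values, not recommendations:

<!-- caches the ordered ids of the documents matching recent queries -->
<queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="128"/>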

DataImportHandler

For people who use sphinx for search, a major benefit is that they do not have to write any code for porting data. You provide a query in sphinx and it automatically pulls data out of mysql and pushes it to the sphinx engine. The DataImportHandler is a similar tool available in solr. You can register a DataImportHandler as a requestHandler:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <!-- specify the file which has the config for the database connection and the query to be fired for getting data -->
        <!-- you can also specify parameters to handle incremental porting -->
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

data-config.xml
<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/databaseName?zeroDateTimeBehavior=convertToNull"
                user="root" password="jayant" />
    <document>
        <entity name="outer" pk="table_id"
        query="SELECT table_id, data1, field1, tags, rowname, date FROM mytable"
        deltaImportQuery="SELECT table_id, data1, field1, tags, rowname, date FROM mytable where table_id='${dataimporter.delta.table_id}'"
        deltaQuery="SELECT table_id from mytable where last_modified_time > '${dataimporter.last_index_time}'">

            <!-- this map says which column goes into which field name in the index -->
            <field column="table_id" name="table_id" />
            <field column="data1" name="data1" />
            <field column="field1" name="field1" />
            <field column="tags" name="tags" />
            <field column="rowname" name="rowname" />
            <field column="date" name="date" />

            <!-- get content from another table for this table_id -->
            <entity name="inner"
            query="SELECT content FROM childtable where table_id = '${outer.table_id}'" >
                <field column="content" name="content" />
            </entity>
        </entity>
    </document>
</dataConfig>
Importing data

Once you start the solr server using java -jar start.jar, you can see the server working at
http://localhost:8983/solr/

It will show you a "welcome message"

To import data using dataimporthandler use
http://localhost:8983/solr/dataimport?command=full-import (for full import)
http://localhost:8983/solr/dataimport?command=delta-import (for delta import)

To check the status of dataimporthandler use
http://localhost:8983/solr/dataimport?command=status

Searching

The ultimate aim of solr is searching. So let's see how we can search in solr and how to get results from it.
http://localhost:8983/solr/select/?q=solr&start=0&rows=10&fl=rowname,table_id,score&sort=date desc&hl=true&wt=json

Now this says that
- search for the string "solr"
- start from 0 and get 10 results
- return only fields : rowname, table_id and score
- sort by date descending
- highlight the results
- return output as json

All you need to do is process the output and render the results, as in the sketch below.
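Here is a minimal php sketch of firing this query and processing the json output. It assumes the json response writer (wt=json) as used above; the host, fields and query string are the ones from the example, and error handling is omitted:

<?php
// build the select url with the same parameters as the example above
$params = array(
    'q'     => 'solr',
    'start' => 0,
    'rows'  => 10,
    'fl'    => 'rowname,table_id,score',
    'sort'  => 'date desc',
    'hl'    => 'true',
    'wt'    => 'json',
);
$url = 'http://localhost:8983/solr/select/?'.http_build_query($params);

// fire the query over http and decode the json response
$response = json_decode(file_get_contents($url), true);

echo 'Found '.$response['response']['numFound']." documents\n";
foreach($response['response']['docs'] as $doc)
{
    echo $doc['table_id'].' : '.$doc['rowname'].' (score '.$doc['score'].")\n";
}
?>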

There is a major apprehension about using solr because it provides an http interface for communication with the engine. But I don't think that is a flaw. Of course you can go ahead and create your own layer on top of lucene for search, but solr follows certain standards for search and it would be difficult to replicate all of them. Another option is to create a wrapper around the http interface with just the limited functionality that you need. Http is an easier way of communicating than defining your own server and your own protocols.

Definitely, solr provides an easy to use, ready made solution for search on lucene - which is also scalable (remember replication, caching and partitioning). And in case the solr guys missed something, you can pick up their classes and modify them or create your own to cater to your needs.

Carnation - is it really worth getting your car serviced there ??

Before I go ahead with a walk through of my experience, I would like you to answer some questions
1. Was the cost of your service at carnation higher than the cost of your normal service at your earlier service center?
2. Were some parts found to be faulty? Are you sure they were faulty?
3. Did you ever get a call asking for a feedback?
4. If you were offered tea/coffee and you said yes - did you actually get your tea/coffee?

Now for my experience.

It was one fine day when I decided that instead of going to my gm service center, I would go to carnation. I could say that it was a bad day because I did not know that I would be duped. So on the 20th of November, I took my aveo for service at the Noida sector 63 service station of carnation.

Upon entry I saw that there were not many cars there. Generally on a saturday or sunday, service centers are heavily loaded. I thought that it might be better, because I would get better attention. I did not realize that this was due to the fact that most of the people there are either too rich to care or first-timers like me. My car was taken to the workshop and examined.

They opened the bonnet and showed me that the car had hit something, due to which the fan was rubbing against some "lamda". I would have to get body work done. I was like, hell. Can't you move the fan or twist the lamda? They told me that neither could be done without hurting the car. I was like... ok... you are the ones who know more about the car. Upon delivery, I saw that in addition to the body work (pulling the front grill out), they had also moved the fan to the left --- what? But they had told me that this was not possible. So a simple "moving the fan" cost me 3000/-. I could definitely have avoided the body work.

Another mechanic came hopping around with his "light in mobile" and told me that the drive belt was faulty. It was totally dark and I was unable to see. I thought that he might have "night eyes" like an owl to be able to see a fault in such a dark place. But then they showed it to me from below and it was really cracked up. I think that was what gave me the confidence to trust (wrongly) in them.

Another mechanic hopped over and told me that the suspension was leaking. The supervisor who was advising me first told me that the suspension was fine. But then he looked at the mechanic, then back at the suspension, and then according to him the suspension was faulty. It was leaking and should be changed. I should have said no. The new suspension which they have put in creaks like anything. I am not an expert and I do not know whether they put in genuine parts or duplicates. They charged me around 2000/- for the suspension. I could have saved that.

And eventually the most absurd thing happened when the supervisor told me that the disk brake pads were totally worn out. I was like - what? How could he see that without touching or opening them? Maybe they have a way I am not aware of. I was like, ok - replace them. They charged me 2000/- something for the brake pads and 500/- something for the labour. Now, by luck, I saw them replace the brake pads. First of all, they were not even 50% worn. And secondly, it was a 5 minute job - just removing some screws, changing the brake pads and putting them back. I paid 500/- for this 5 minute labour job. And I am not even sure whether the new ones are genuine or not.

Another thing that they replaced was a bearing. The supervisor called me and told me that it was faulty and would not last even 300 kms. I was like - ok, change it. When I went to the shop and took the spare, he told me that it was faulty and that I could feel the vibrations if I rotated the bearing. I felt nothing. Except for monetary gain, I am not sure why he changed it.

So, had I gone to my normal gm service center, this service would have cost me 3000 (body work) + 2500 (brake pads) + 2000 (suspension) + 1500 (bearing) = 9000/- less. The service which was 14000/- would then have been only 5000/-.

Now the question is why I did not feel that something was wrong - all the while they were duping me. The reason is that I am not a mechanic. I do not know which part will run and which will not. Hell, I am not even aware of the location of all the parts. So, it is easy to dupe me. The second reason was that I wanted to get my car serviced by carnation - to at least once experience what they have to say. In fact, I should have driven out when they told me that the drive belt was faulty without even looking at it.

So, when did I realize that I had been duped? A week after the service, the car was as rough as it was before the service. That was when my doubt was confirmed that the service was not worth it. I took a look at all the parts, and then I knew for sure that I had been duped.

What could I do? I sent a mail to j.khattar@carnation.in regarding the fact that I was duped. There was one call where a lady called at around 11 am and started explaining that the parts that they had changed were supposed to be changed. I told her that I was in a meeting and asked her to call back in an hour. I never got a call back.

When I get my car serviced at GM, I get a call after a week asking how the car is performing and whether I am facing any issues. But I never got any such call from carnation. I believe the business model that Mr. Khattar is following is one of duping first-timers. They do not expect people to return. If by mistake anyone does, he is worth duping again.

Here is the mail that I sent to j.khattar@carnation.in. I did not get any response though.

Hi,

This is the first time I visited Carnation for servicing my car - an Aveo 1.6 LT. For the past 4 years, the car was being serviced by the GM authorized service center in sector 8 noida. The average cost of my service for the past 4 years was around 3000/- per service. At carnation the cost of 1 service was 13800 (something). It has been 10 days since the service and the car feels very rough. My previous experience is that the car becomes rough only after almost 2 months - when I used to get my car serviced at GM.

5 parts were replaced and a body repairing work was done.

1. brake pads
2. shocker
3. spark plugs
4. drive belt
5. some bearing

Out of the 5 parts,
- the brake pads were fine - I have the old ones and they could still run for another 10,000 kms.
- the new shocker creaks a lot - I am not sure if they put in an original or a duplicate.
- the bearing looked fine to me - I am not sure why it was changed.

In addition to this, body repair for 3000/- was done, which was not required.
I was told that body repair was the only solution - I should have consulted a local mechanic as well.

The personal experience was good - but I get an almost similar personal touch at GM as well.

Would i get my car serviced at carnation again ?
No

Reason :
I am not a car mechanic. I cannot tell whether a part is defective or not, genuine or duplicate. The service is not good.
Parts are unnecessarily replaced - even when the current parts are in working condition and doing fine. The service is very expensive - and not worth it.

Service location:
Sector 63, Noida.

2 of my friends who have got their cars serviced at carnation have similar feedback.

Regards
Jayant Kumar


This review is intended to prevent people from getting duped by carnation. The people at carnation offer you tea/coffee. But if you say yes, you will never get your tea/coffee. Have you noticed that?

Samsung and me

Samsung and I do not have a very long relationship, but whatever relationship we have is very flawed. I have been trying to "believe" in samsung products, in their "limitless" support. But samsung makes it difficult for me to "believe" in them.

Let's take, case by case, what has happened in the past 4 years. I got a samsung refrigerator and a washing machine around 3 years back. The life of the refrigerator was not even 2 years. Within 1.5 years the refrigerator was out of gas. For those who are new to the refrigerator business, I would like to make it clear that a refrigerator does use gas for cooling purposes. Now, I have used 2 refrigerators in my lifetime of 25 years. The first one was a really old "leonard". It ran for around 15 years - after which it lost its color and its gas. We sold it at a profit after using it for 15 years. The next one was a "kelvinator". It is still running. It needed a gas change after around 6 years - that too because I moved to delhi and there was no one at home for around 6 months. But my dear Samsung refrigerator lost its gas in only 1.5 years. I had the notion that it would be a one in a million case. But 6 months after the gas change, the compressor went for a toss. Thanks to Samsung, for even if it does not provide reliable products, it does provide a guarantee - due to which I could get the compressor changed. But both times I had to pay for the gas change. If it goes on like this, by the time the refrigerator is 10 years old (if it does run for 10 years), the cost of maintenance will exceed the cost of the refrigerator. Bang - that's where samsung is making money...

My second experience was with the washing machine I had bought. I still have an LG washing machine at home - and it still runs like a charm. In fact, I enjoy washing clothes in it. It is still as noiseless as it was when we bought it around 10 years ago. But the Samsung washing machine - it sounds like an electric drill. We have to wash clothes only in the afternoon. If we try washing clothes in the morning, it disturbs the sleep of the people in this home as well as the neighbours. The dryer keeps running at full speed even if you shut off the machine and open the dryer lid. It seems that its brake pads are worn off. What happened to the "high" safety standards (if any) of Samsung?

I had got a Samsung mobile around 3 years back. It started facing problems after 1 year and I had to get them resolved. Recently I got another Samsung mobile - and within a month I realized that this purchase was a big mistake. Within a month, the mobile stopped charging. Now this is a very interesting story.

I went to the samsung service center in sector 18, noida. The service center had closed and there was no clue as to where it had shifted. Even the customer care was not aware that the service center had closed. I went to a samsung dealer in noida sector 18 to enquire about another service center - and he did not know of any service centers. It seems that once the product has been sold, the dealers forget the customer. What happened to the "customer is king" motto? So I moved to another service center - the one in sector 29, Noida. The scene there was humorous and frustrating. There was a person who had the exact same problem with his samsung mobile. He had been trying to get his mobile rectified for the past 1 month - but the service center was unable to fix it. They never have the required parts.

I had to sit in a queue and wait for my turn. Once I was at the reception, I told him my problem. The "technician" checked my mobile and verified the issue. He then opened up the mobile completely and told me that I would have to deposit the mobile for 7 days to get it fixed. It seems that the charging jack was faulty. I was like, hey - what about my data? And why 7 days? Well, according to them my data would be safe and 7 days were required to get the new part and fix it. And there was no guarantee that the phone would not be scratched. The service people were so adamant that they refused to take the phone when I wrote "no scratch" on the form.

That was when I called up the customer care again and he connected me to the manager - Mr Ranjit. I had to repeat my story 2-3 times - I spent around an hour on the phone - after which the manager told me that I did not need to deposit the phone - all I needed to do was give a photocopy of my bill. They would arrange for the part and once it was there he would personally call me to get it fixed. I was about to say "liar liar!!!", but I let it go. After waiting for 3 days, I tried calling the service center - but no one picked up the phone. I tried calling again after 6 days - it is easier to get your calls picked up in government offices. Eventually I gave up. No point in running after them. Maybe 6 months down the line, when my phone is all used up and scratched, I will dump it at the service center and purchase a "non" samsung phone.

It is just a random thought - whether there is a problem between me and Samsung, in that Samsung products do not suit me, or whether it is a general case. For one thing, I never had a samsung TV. My Onida and LG TVs have both been running for more than 10 years now without a single issue. It makes more sense to provide better products rather than "limitless" service (troubles) for endless problems on inferior products.

Friday, October 08, 2010

Foursquare outage post mortem

As many of you are aware, Foursquare had a significant outage this week. The outage was caused by capacity problems on one of the machines hosting the MongoDB database used for check-ins. This is an account of what happened, why it happened, how it can be prevented, and how 10gen is working to improve MongoDB in light of this outage.

It’s important to note that throughout this week, 10gen and Foursquare engineers have been working together very closely to resolve the issue.

Some history
Foursquare has been hosting check-ins on a MongoDB database for some time now. The database was originally running on a single EC2 instance with 66GB of RAM. About 2 months ago, in response to
increased capacity requirements, Foursquare migrated that single instance to a two-shard cluster. Now, each shard was running on its own 66GB instance, and both shards were also replicating to a slave for redundancy. This was an important migration because it allowed Foursquare to keep all of their check-in data in RAM, which is essential for maintaining acceptable performance.

The data had been split into 200 evenly distributed chunks based on user id. The first half went to one server, and the other half to the other. Each shard had about 33GB of data in RAM at this point, and the whole system ran smoothly for two months.

What we missed in the interim
Over these two months, check-ins were being written continually to each shard. Unfortunately, these check-ins did not grow evenly across chunks. It’s easy to imagine how this might happen: assuming certain subsets of users are more active than others, it’s conceivable that their updates might all go to the same shard. That’s what occurred in this case, resulting in one shard growing to 66GB and the other only to 50GB. [1]

What went wrong
On Monday morning, the data on one shard (we’ll call it shard0) finally grew to about 67GB, surpassing the 66GB of RAM on the hosting machine. Whenever data size grows beyond physical RAM, it becomes necessary to read and write to disk, which is orders of magnitude slower than reading and writing RAM. Thus, certain queries started to become very slow, and this caused a backlog that brought the site down.

We first attempted to fix the problem by adding a third shard. We brought the third shard up and started migrating chunks. Queries were now being distributed to all three shards, but shard0 continued to hit disk very heavily. When this failed to correct itself, we ultimately
discovered that the problem was due to data fragmentation on shard0. In essence, although we had moved 5% of the data from shard0 to the new third shard, the data files, in their fragmented state, still needed the same amount of RAM. This can be explained by the fact that Foursquare check-in documents are small (around 300 bytes each), so many of them can fit on a 4KB page. Removing 5% of these just made each page a little more sparse, rather than removing pages altogether.[2]

After the first day's outage it had become clear that chunk migration, sans compaction, was not going to solve the immediate problem. By the time the second day's outage occurred, we had already moved 5% of the data off of shard0, so we decided to start an offline process to
compact the database using MongoDB’s repairDatabase() feature. This process took about 4 hours (partly due to the data size, and partly because of the slowness of EBS volumes at the time). At the end of the 4 hours, the RAM requirements for shard0 had in fact been reduced
by 5%, allowing us to bring the system back online.

Afterwards
Since repairing shard0 and adding a third shard, we’ve set up even more shards, and now the check-in data is evenly distributed and there is a good deal of extra capacity. Still, we had to address the fragmentation problem. We ran a repairDatabase() on the slaves, and
promoted the slaves to masters, further reducing the RAM needed on each shard to about 20GB.

How is this issue triggered?
Several conditions need to be met to trigger the issue that brought down Foursquare:
  • Systems are at or over capacity. How capacity is defined varies; in the case of Foursquare, all data needed to fit into RAM for acceptable performance. Other deployments may not have such strict RAM requirements.
  • Document size is less than 4k. Such documents, when moved, may be too small to free up pages and, thus, memory.
  • Shard key order and insertion order are different. This prevents data from being moved in contiguous chunks.

Most sharded deployments will not meet these criteria. Anyone whose documents are larger than 4KB will not suffer significant fragmentation because the pages that aren’t being used won’t be
cached.

Prevention
The main thing to remember here is that once you’re at max capacity, it’s difficult to add more capacity without some downtime when objects are small. However, if caught in advance, adding more shards on a live system can be done with no downtime.

For example, if we had notifications in place to alert us 12 hours earlier that we needed more capacity, we could have added a third shard, migrated data, and then compacted the slaves.

Another salient point: when you’re operating at or near capacity, realize that if things get slow at your hosting provider, you may find yourself all of a sudden effectively over capacity.

Final Thoughts
The 10gen tech team is working hard to correct the issues exposed by this outage. We will continue to work as hard as possible to ensure that everyone using MongoDB has the best possible experience. We are thankful for the support that we have received from Foursquare and our
community during this unfortunate episode. As always, please let us know if you have any questions or concerns.

[1] Chunks get split when they reach 200MB, into two 100MB halves. This means that even if the number of chunks on each shard is the same, the data size is not always so. This is something we are going to be addressing in MongoDB, by making the splitting/balancing logic look for this kind of imbalance so that it can act on it.

[2] The 10gen team is working on doing online incremental compaction of both data files and indexes. We know this has been a concern in non-sharded systems as well. More details about this will be coming in the next few weeks.


** copied from mongodb google group **

Friday, October 01, 2010

Database speed tests (mysql and postgresql) - part 3 - code

Here is the code structure

dbfuncs.php : is the file which contains classes and functions for firing queries on mysql and pgsql
mysqlinsert.php : creates and fires inserts on mysql
mysqlselect.php : creates and fires selects on mysql
pgsqlinsert.php : creates and fires inserts on pgsql
pgsqlselect.php : creates and fires selects on pgsql
benchmark.php : script used to control concurrency and number of queries per script

Please refer to http://jayant7k.blogspot.com/2010/09/database-speed-tests-mysql-and_29.html and http://jayant7k.blogspot.com/2010/09/database-speed-tests-mysql-and.html for benchmarks of selects and inserts respectively.

And the code....

dbfuncs.php
abstract class dbfuncs
{
  abstract function insertqry($qry);
  abstract function selectqry($qry);
  abstract function connectdb();

  public function log($str)
  {
    $file = "error.log";
    $fp = fopen($file, "a");
    fwrite($fp, "$str\n");
    fclose($fp);
  }
}

class mysqldb extends dbfuncs
{
  private $user = "root";
  private $pass = "jayant";
  private $host = "localhost";
  //private $port = 3307;
  private $socket = "/tmp/mysql.sock";
  private $database = "benchmark";

  public $db;

  function __construct()
  {
    $this->connectdb();
  }

  public function connectdb()
  {
    $this->db = mysql_connect($this->host.':'.$this->socket, $this->user, $this->pass) or die(mysql_error());
    mysql_select_db($this->database, $this->db);
  }

  public function insertqry($qry)
  {
    if(!mysql_ping($this->db))
      $this->connectdb();

    mysql_query($qry, $this->db) or $this->log(mysql_error());
  }

  public function selectqry($qry)
  {
    if(!mysql_ping($this->db))
      $this->connectdb();

    $rs = mysql_query($qry, $this->db) or $this->log(mysql_error());
    return $rs;
  }
}

class pgsqldb extends dbfuncs
{
  private $dsn = "host=localhost port=5432 user=jayant password=12qwaszx dbname=benchmark";

  public $db;

  function __construct()
  {
    $this->connectdb();
  }

  public function connectdb()
  {
    $this->db = pg_connect($this->dsn) or die(pg_last_error());
  }

  public function insertqry($qry)
  {
    if(!pg_ping($this->db))
      $this->connectdb();

    pg_query($this->db, $qry) or $this->log(pg_last_error($this->db));
  }

  public function selectqry($qry)
  {
    if(!pg_ping($this->db))
      $this->connectdb();

    $rs = pg_query($this->db, $qry) or $this->log(pg_last_error($this->db));
    return $rs;
  }
}

function logtime($str)
{
    $file = "benchmark.log";
    $fp = fopen($file, "a");
    fputs($fp, $str);
    fclose($fp);
}

mysqlinsert.php
include "dbfuncs.php";

$scriptno = $argv[1]+1;
$count = $argv[2];

$mysql = new mysqldb();
$start = microtime(true);
for($x=0; $x<$count; $x++)
{
  $xx = $x*$scriptno;
  $qry = "insert into data (val, txt) values ('$xx','$x in $scriptno')";
  $mysql->insertqry($qry);
}
$end = microtime(true);
$log = "\nMysql innodb Time to insert $count in run $scriptno = ".($end-$start);
logtime($log);

pgsqlinsert.php
include "dbfuncs.php";

$scriptno = $argv[1]+1;
$count = $argv[2];

$pgsql = new pgsqldb();
$start = microtime(true);
for($x=0; $x<$count; $x++)
{
  $xx = $x*$scriptno;
  $qry = "insert into data (val, txt) values ('$xx','$x in $scriptno')";
  $pgsql->insertqry($qry);
}
$end = microtime(true);
$log = "\nAvg Pgsql Time to insert $count in run $scriptno = ".($end-$start);
logtime($log);

mysqlselect.php
include "dbfuncs.php";

$scriptno = $argv[1]+1;
$count = $argv[2];

$mysql = new mysqldb();
$start = microtime(true);
for($x=0; $x<$count; $x++)
{
  $xx = $x*$scriptno;
  $qry = "select * from `data` where val ='$xx'";
  $mysql->selectqry($qry);
}
$end = microtime(true);
$log = "\nMysql innodb Time to select $count in run $scriptno = ".($end-$start);
logtime($log);

pgsqlselect.php
include "dbfuncs.php";

$scriptno = $argv[1]+1;
$count = $argv[2];

$pgsql = new pgsqldb();
$start = microtime(true);
for($x=0; $x<$count; $x++)
{
  $xx = $x*$scriptno;
  $qry = "select * from data where val ='$xx'";
  $pgsql->selectqry($qry);
}
$end = microtime(true);
$log = "\nPgsql Time to select $count in run $scriptno = ".($end-$start);
logtime($log);

benchmark.php
$count = 100000;
$concurrency = 40;

for($i=0; $i<$concurrency; $i++)
{
   exec("php -q mysqlselect.php $i $count > /dev/null &");
//   exec("php -q pgsqlselect.php $i $count > /dev/null &");
//   exec("php -q mysqlinsert.php $i $count > /dev/null &");
//   exec("php -q pgsqlinsert.php $i $count > /dev/null &");
}

All test runs were individual - meaning that only one script was run at a time, with different count and concurrency values. For inserts, $count was set to 10,000 and for selects $count was set to 100,000, while $concurrency kept varying.

I am using ubuntu 10.04 with a 32 bit 2.6.32-24 kernel and the ext4 file system. My system has around 3GB of RAM and an Intel Core 2 Duo T5870 @ 2.0 GHz.

Wednesday, September 29, 2010

Database speed tests (mysql and postgresql) - part 2

Here is the comparison between mysql and postgresql for selects (only). I used the same table that I had created earlier http://jayant7k.blogspot.com/2010/09/database-speed-tests-mysql-and.html while comparing insertion speed. I created nearly 10 million records (9,755,366 to be exact) in the table and ran selects on them. I also modified the configuration of both mysql and postgresql to enable faster selects.

Mysql

In mysql I specifically disabled the query_cache - the reason being that I would use innodb for tables with a large number of inserts, due to its support for row level locking. And with every insert the query cache is marked as dirty. So enabling the query cache would lead to caching of queries, which would not be the scenario in a live setup.

Mysql Configuration :
innodb_buffer_pool_size = 256M
key_buffer_size = 256M
read_buffer_size = 512K
sort_buffer_size = 512K
query_cache_size = 0
thread_cache_size = 32
table_open_cache = 64

table information :
No of records : 9755366
data (idb) size : 692 MB

Mysql : (average time per instance, each instance firing 100000 selects)

Avg time for 500000 selects with concurrency = 5 : 58.67
Avg time for 1000000 selects with concurrency = 10 : 122.8
Avg time for 2000000 selects with concurrency = 20 : 225.67
Avg time for 3000000 selects with concurrency = 30 : 351.66
Avg time for 4000000 selects with concurrency = 40 : 452.3


PostgreSQL :

Mysql has a more compact storage format as compared to postgres. The same data is around 700 MB in innodb, while in postgres it is around 900 MB.

Postgres configuration :
shared_buffers = 128MB
work_mem = 1MB
random_page_cost = 4.0
effective_cache_size = 256MB


table information :
No of records : 9755366
data size : 912 MB

Pgsql : (average time per instance, each instance firing 100000 selects)

Avg time for 500000 selects with concurrency = 5 : 86.8
Avg time for 1000000 selects with concurrency = 10 : 144.74
Avg time for 2000000 selects with concurrency = 20 : 274.37
Avg time for 3000000 selects with concurrency = 30 : 402.92
Avg time for 4000000 selects with concurrency = 40 : 528.17



Mysql seems to perform better with selects. The numbers also show that with the increase in concurrency, selects in innodb take less time than in postgresql.

So, why would you switch from mysql to postgresql? Only if you have a very high ratio of inserts as compared to selects. The benefit in inserts outweighs the loss in selects to some extent.

Monday, September 27, 2010

Database speed tests (mysql and postgresql) - part 1

There have been major changes in mysql and postgres over the last couple of years. Mysql has been focusing on improving and optimizing innodb. Postgres, on the other hand, has been focusing on database replication and hot standby.

Recently postgres came out with version 9.0, which has built-in replication and hot standby - the two most requested features in postgresql. Earlier, people used to shy away from postgres because there was no proper "easily deployable" solution available for replication. With this release, postgres has taken a major step forward. Here http://www.postgresql.org/docs/9.0/static/release-9-0 is a list of features that have been introduced in postgres 9.0.

Mysql has released the rc version of Mysql 5.5, which has a bunch of improvements over the previous version of mysql. Support for multi-core cpus, changes in Innodb for effective use of available I/O capacity, and semisynchronous replication are some of the features that mysql 5.5 promises. Here is a list of all the new features in MySQL 5.5: http://dev.mysql.com/doc/refman/5.5/en/mysql-nutshell.html

It has been a long time since I posted my last benchmark http://jayant7k.blogspot.com/2008/06/mysql-versus-postgresql.html. And I believe it is time I did some rough benchmarks and posted them. The scope is to check out innodb tables in mysql 5.5.6 versus the tables in postgresql 9.0. I am focusing on pure inserts and selects only. This blog covers only inserts; I will focus on selects in my next blog. I am running these tests on my laptop, which has an Intel Core 2 Duo T5870 @ 2.00 GHz and 3 GB of RAM.

I have created a simple php script to perform the benchmark, which spawns multiple php scripts that work in the background. Let me know if you need the scripts and I will share them with you.

Mysql:

Innodb Settings in my.cnf :

innodb_buffer_pool_size = 256M
innodb_additional_mem_pool_size = 8M
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table

Table structure :
CREATE TABLE `data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`val` int(11) NOT NULL,
`txt` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_val` (`val`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1


Avg Time for 20000 inserts with concurrency of 2 : 1367 seconds
Load on machine : 3.4
size on disk : 9 MB

Avg Time for 50000 inserts with concurrency of 5 : 1537.2 seconds
Load on machine : 4.6
Size on disk : 13 MB

Avg Time for 100000 inserts with concurrency of 10 : 1255 seconds
Load on machine : 3.5
Size on disk : 17 MB

Avg Time for 200000 inserts with concurrency of 20 : 1403 seconds
Load on machine : 4.2
Size on disk : 26 MB

Avg Time for 400000 inserts with concurrency of 40 : 1730 seconds
Load on machine : 6.6
Size on disk : 52 MB


Postgresql :

Settings in postgresql.conf:
shared_buffers = 64MB
work_mem = 1MB
synchronous_commit = on


Table structure :
 Column |         Type          |                     Modifiers                      |  Storage
--------+-----------------------+----------------------------------------------------+----------
 id     | integer               | not null default nextval('data_id_seq'::regclass)  | plain
 val    | integer               | not null                                           | plain
 txt    | character varying(20) |                                                    | extended
Indexes:
    "data_pkey" PRIMARY KEY, btree (id)
    "idx_val" btree (val)
Has OIDs: no

Avg Time for 20000 inserts with concurrency of 2 : 221.3 seconds
Load on machine : 2.0
size on disk : 2 MB

Avg Time for 50000 inserts with concurrency of 5 : 232 seconds
Load on machine : 2.0
Size on disk : 4.8 MB

Avg Time for 100000 inserts with concurrency of 10 : 248.75 seconds
Load on machine : 4.0
Size on disk : 9.8 MB

Avg Time for 200000 inserts with concurrency of 20 : 276.34 seconds
Load on machine : 3.6
Size on disk : 19 MB

Avg Time for 400000 inserts with concurrency of 40 : 350.11 seconds
Load on machine : 5.8
Size on disk : 38 MB



The numbers show that mysql is heavy as compared to pgsql. The base timings are almost 5 times higher in mysql as compared to pgsql. Also, as the concurrency goes up, the time required for inserts in mysql rises more steeply than that required for postgres.

I did a sample run on mysql with innodb_flush_log_at_trx_commit=2 and the benefit I got was huge:

Avg Time for 20000 inserts with concurrency of 2 (innodb_flush_log_at_trx_commit=2) : 5.2 seconds
Avg Time for 100000 inserts with concurrency of 10 (innodb_flush_log_at_trx_commit=2) : 18.69 seconds

Similarly, I disabled synchronous_commit on postgres and did a sample run:

Avg Time for 20000 inserts with concurrency of 2 (synchronous_commit = off) : 2.95 seconds
Avg Time for 100000 inserts with concurrency of 10 (synchronous_commit = off) : 15.06 seconds

PS : The average time is the time for 10000 inserts (inserts per instance)

Let's see what the selects tell us - in the next blog.

Tuesday, September 14, 2010

Strategies to address scalability

There are some very fundamental scalability strategies which I will try to list here. These may have detailed and varied implementations.

  • Horizontal scalability - distribute the load : A very basic strategy for scalability is the ability to distribute the load or the requests across multiple processing units. Your code should be modular enough to allow horizontal scalability. Also, the less information is shared among multiple requests, the easier it is to scale horizontally. You should aim for a shared-nothing architecture.
  • Sharding or partitioning is another way of scaling horizontally. The main idea here is to spread the load across many components by routing an individual request to the component that owns the data specific to that request. So you divide the complete data across multiple components and spread the requests among them. There are two ways of creating shards:

    • Vertical partitioning : Spread the load across multiple processing units by distributing the functionalities across the components. So separate functions are handled by different processing units. Column based databases are a very good example of vertical partitioning.
    • Horizontal partitioning : Spread a single type of data element across multiple processing units. It is commonly implemented by hashing. Hashing based on userid to spread users across multiple data stores is an example of horizontal partitioning (see the sketch after this list).
  • Parallelization is another way of achieving scalability. Parallelization can be achieved by working on the same task in parallel on multiple processing units. For example, a web page having multiple components can fetch and process the components in parallel from multiple processing units.
  • Queueing or batch processing : Queueing and processing data in batches also provide a solution to scalability, because the overhead of an operation is amortized across multiple requests.
  • Relaxation of data constraints : Many different techniques and trade-offs with regards to how fast data can be accessed/processed or stored fall under this strategy. Caching, giving up foreign keys, and replication or denormalization of data are some examples of this strategy.
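As a minimal sketch of horizontal partitioning, here is how a request could be routed to the shard owning a user's data by hashing the userid. The shard hosts and the crc32-based scheme are made up for the illustration; note that a plain modulo remaps almost all users when a shard is added or removed, a problem that consistent hashing (covered in a separate post below) avoids:

<?php
// illustrative shard list - each shard owns a subset of the users
$shards = array(
    'db1.example.com',
    'db2.example.com',
    'db3.example.com',
);

// route a user to the shard that owns his data by hashing the userid
function shardForUser($userId, $shards)
{
    // abs() because crc32() can return a negative integer on 32 bit systems
    $hash = abs(crc32((string)$userId));
    return $shards[$hash % count($shards)];
}

echo shardForUser(12345, $shards)."\n";
?>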

Of course, scalability cannot be addressed using only one of these strategies. The solution for scalability depends on how the data is held and what compromises we are able to make with it. The solution might be a combination of a few of these strategies and more.

Original post : http://thebigsoftwareblog.blogspot.com/2010/08/scalability-fundamentals-and.html

Friday, September 10, 2010

Converting myisam tables to innodb

Why should you convert myisam tables to innodb ?

For the perfectly simple reason that innodb tables do not get locked by concurrent selects & inserts. So if you find that your myisam table is suffering from too many locks due to concurrent selects and inserts, it is time for you to convert the table to innodb.

The simple query which does the trick is

Alter table myisam_table_name engine = innodb;

This query is good for small tables, which get converted in a flash (I am referring to tables smaller than 1 GB). But when you try to run this query to alter bigger tables, it takes a huge amount of time. You have to wait - maybe hours, if not days, to get your job done.

Recently I had a 30 GB table which I wanted to convert to innodb, and the alter query went on for 3 days, after which I gave up and killed the query. Then I went looking for ways to make this alter happen faster.

There are multiple ways to make your alter faster -

1. Create a new table with engine = innodb, disable unique checks, and insert into the table in batches. The sequence of sql statements for this is given below (a rough batching sketch in php follows them):

create table new_table_name like old_table_name;
alter table new_table_name engine = innodb;
set unique_checks=0;
-- loop till all records are ported
insert into new_table_name select * from old_table_name where key > something and key <= something;
set unique_checks=1;

If you have UNIQUE constraints on secondary keys, this speeds up a table import by turning off the uniqueness checks temporarily during the import operation.
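A rough php sketch of that batching loop is given below. It assumes an integer primary key named id; the table names, credentials and batch size are placeholders for the illustration:

<?php
// copy rows from the myisam table to the innodb table in primary key ranges,
// so that each insert..select stays small
mysql_connect('localhost', 'root', 'password') or die(mysql_error());
mysql_select_db('mydatabase');

mysql_query("set unique_checks=0");

$batchSize = 100000;
$row = mysql_fetch_row(mysql_query("select max(id) from old_table_name"));
$maxId = $row[0];

for($start = 0; $start < $maxId; $start += $batchSize)
{
    $end = $start + $batchSize;
    mysql_query("insert into new_table_name select * from old_table_name where id > $start and id <= $end") or die(mysql_error());
}

mysql_query("set unique_checks=1");
?>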

2. Increase the buffers. In order to achieve higher I/O throughput, you can increase the buffers - innodb_buffer_pool_size and innodb_log_file_size. If you are using a 64 bit machine, make innodb_buffer_pool_size almost 50% of your available memory, and innodb_log_file_size around 128 MB (at least) - you can make it 256 MB to be a little more aggressive and 512 MB to be very aggressive. But it should be less than 25% of your innodb_buffer_pool_size.

in my.cnf enter

innodb_buffer_pool_size = 2048M
innodb_log_file_size = 256M
innodb_log_buffer_size = 16M


Stop mysql.
Delete ib_logfile0 & ib_logfile1 (they will be recreated with the new size on startup).
Start mysql.

Now running the alter query should be much faster.

A combination of both the above methods could be even more beneficial. Remember to re-tune your innodb settings as per your requirements before putting live load on the server.

BTW, I tried using the second method to alter the 30 GB table, and looking at its speed, it seems that it will be done in around 10 hours - instead of the 3 days it took earlier.

Tuesday, September 07, 2010

Hashing algos : Consistent Hashing

Hashing is a way of mapping keys to locations. Normally you would hash using a simple key % n algorithm, which ensures that keys are mapped evenly across n slots. The problem with this algorithm is that adding or removing a node (or a slot) requires a complete rehash of all the keys. And if you have a huge data set, it is simply not feasible to rehash and redistribute all the keys.

Consistent hashing is a way of hashing that ensures that adding or removing a slot or node does not change the mapping of keys to slots significantly. When using consistent hashing, only K/n keys need to be remapped on average - where K is the number of keys and n is the number of slots.

The way this works is that both keys and slots are mapped to points on the edge of a circle. That is, all slots are mapped to a series of angles around a circle, and the bucket where each item should be stored is chosen by selecting the next highest angle to which an available bucket maps. So each bucket contains the resources mapping to angles between it and the next smallest angle. If a bucket becomes unavailable, the keys that mapped to it get mapped to the next highest bucket (the next bucket on the circle), so only the keys which were in the bucket that became unavailable are affected. Similarly, when a bucket is added, the keys between the new bucket and the next smallest bucket get mapped to the new bucket; keys which now belong to the new bucket but were stored elsewhere previously become unavailable.

[figure 1 and figure 2 : objects 1-4 and slots A, B, C (and later D) mapped around the circle]

Here is an example. Objects 1,2,3 and 4 map to slots A,B and C. To find which slot an object goes in, we move around the circle until we find a slot. So here objects 1 and 4 go into slot A, 2 goes into slot B and 3 goes into slot C. If C is removed, object 3 would belong to slot A. If another slot D is added as shown in figure 2, it will take objects 3 and 4 and only leave object 1 belonging to A.

Let's look at a php example which does consistent hashing for us.
<?php
class ConsistentHasher
{
  private $nodeDistribution;
  private $virtualNodes;

  // nodeDistribution is the replica count per node.
  public function __construct($nodenames, $nodedistribution)
  {
    $this->nodeDistribution = $nodedistribution;
    $this->virtualNodes = array();

    for($i=0; $i<count($nodenames); $i++)
    {
      $this->addNode($nodenames[$i]);
    }
  }

  // The addNode() function takes a name and creates virtual nodes (or replicas) by 
  // appending the index of the local loop to the node name.
  // The hash value of a virtual node is an MD5 hash, base converted into an integer. 
  // The virtual node is added to a list and sorted by its value so that we ensure 
  // a lowest to highest traversal order for looking up previous and next virtual nodes
  public function addNode($name)
  {
    for($i=0; $i<$this->nodeDistribution; $i++)
    {
      //int representation of $key (8 hex chars = 4 bytes = 32 bit)
      $virtualNodeHashCode = base_convert(substr(md5($name.$i, false),0,8),16,10);
      $this->virtualNodes[$name.$i] = $virtualNodeHashCode;
    }
    asort($this->virtualNodes, SORT_NUMERIC);
  }

  public function dump()
  {
    print_r($this->virtualNodes);
    echo "\n\n";
  }

  public function sortCompare($a, $b)
  {
    if($a == $b)
    {
      return 0;
    }
    return ($a < $b) ? -1 : 1;
  }

  // The removeNode() function takes a name and removes its corresponding virtual nodes 
  // from the virtualNode list.
  // We then resort the list to ensure a lowest to highest traversal order.
  public function removeNode($name)
  {
    for($i=0; $i<$this->nodeDistribution; $i++)
    {
      unset($this->virtualNodes[$name.$i]);
    }
    asort($this->virtualNodes, SORT_NUMERIC);
  }

  // The hashToNode() function takes a key and locates the node where its value resides.
  // We loop through our virtual nodes, stopping at the first one that has a
  // hash greater than the key's hash and returning the previous virtual node.
  // Keys that hash lower than every virtual node wrap around the conceptual
  // circle and belong to the last virtual node, as do keys that hash higher
  // than every virtual node.

  public function hashToNode($key)
  {
    $keyHashCode = base_convert(substr(md5($key, false),0,8),16,10);
    $virtualNodeNames = array_keys($this->virtualNodes);
    // start at the last virtual node so that low-hashing keys wrap around the circle
    $prevNodeName = $virtualNodeNames[count($virtualNodeNames)-1];

    foreach($this->virtualNodes as $name => $hashCode)
    {
      if($keyHashCode < $hashCode)
        return $prevNodeName;

      $prevNodeName = $name;
    }
    // the key hashes higher than every virtual node - it belongs to the last one
    return $prevNodeName;
  }
}

// demo

$hash = new ConsistentHasher(array("node1","node2","node3","node4","node5"),10);
$hash->dump();

$hash->removeNode("node2");
$hash->dump();

$hash->addNode("node6");
$hash->dump();

echo $hash->hashToNode("testing123")."\n";
echo $hash->hashToNode("key1111")."\n";
echo $hash->hashToNode("data4444")."\n";
?>


Here is a library which provides consistent hashing for php:
http://code.google.com/p/flexihash/

Memcached is a widely used distributed cache, and its clients use consistent hashing very effectively to map keys to caching nodes.

References:
http://en.wikipedia.org/wiki/Consistent_hashing
http://www.osconvo.com/post/view/2010/9/1/distributed-hashing-algorithms-by-example-consistent-hashing