Wednesday, September 29, 2010

Database speed tests (mysql and postgresql) - part 2

Here is the comparison between mysql and postgresql for selects (only). I had used the same table that i had created earlier while comparing insertion speed. I have created approximately 1,000,000 records in the table and ran selects on them. I also modified the configuration of both mysql and postgresql to enable faster selects.


In mysql I specially disabled query_cache - the reason being that I would use innodb for tables with large number of inserts - due to its support for row level locking. And with every insert t
he query cache is marked as dirty. So enabling query cache would lead to caching of queries which will not be the scenario in a live setup.

Mysql Configuration :
innodb_buffer_pool_size = 256MB
key_buffer_size = 256MB
read_buffer_size = 512KB
sort_buffer_size = 512KB
query_cache_size = 0
thread_cache_size = 32
table_open_cache = 64

table information :
No of records : 9755366
data (idb) size : 692 MB

Mysql : (time for 100000 selects)

Avg time for 500000 selects with concurrency = 5 : 58.67
Avg time for 1000000 selects with concurrency = 10 : 122.8
Avg time for 2000000 selects with concurrency = 20 : 225.67
Avg time for 3000000 selects with concurrency = 30 : 351.66
Avg time for 4000000 selects with concurrency = 40 : 452.3

PostgreSQL :

Mysql has better table compression as compared to postgres. Same data in innodb is of around 700 MB while that in Postgres is of around 900 MB.

Postgres configuration :
shared_buffers = 128MB
work_mem = 1MB
random_page_cost = 4.0
effective_cache_size = 256MB

table information :
No of records : 9755366
data size : 912 MB

Pgsql : (time for 100000 selects)

Avg time for 500000 selects with concurrency = 5 : 86.8
Avg time for 1000000 selects with concurrency = 10 : 144.74
Avg time for 2000000 selects with concurrency = 20 : 274.37
Avg time for 3000000 selects with concurrency = 30 : 402.92
Avg time for 4000000 selects with concurrency = 40 : 528.17

Mysql seems to perform better with selects. The graph also shows that with increase in concurrency, selects in innodb take lesser time than that in postgresql.

So, why would you switch from mysql to postgresql - only if you have a very high ratio of inserts as compared to selects. The benefit in inserts outweigh the loss in selects to some extent.

Monday, September 27, 2010

Database speed tests (mysql and postgresql) - part 1

There has been major changes in mysql and postgres over a couple of years. Mysql has been focusing on improving and optimizing innodb. Postgres on the other hand has been focusing on database replication and hot standby.

Recently postgres came out with version 9.0 which has built-in replication and hot standby - the two most requested feature in postgresql. Earlier people used to shy away from postgres because there was no proper "easily deployable" solution available for replication. Now with this release, postgres had taken a major step forward. Here is a list of features that has been introduced in postgres 9.0

Mysql has released the rc version of Mysql 5.5 which has a bunch of improvements over the previous version of mysql. Support for multi-core cpus, Changes in Innodb for effective use of available I/O capacity, semisynchronous replication - are some of the features that mysql 5.5 promices. Here is a list of all the new features in MySQL 5.5

It has been a long time, since posted my last benchmark And i believe it is time i do some rough benchmarks and post it out. The scope is to check out innodb tables in mysql 5.5.6 versus the tables in postgresql 9.0. I am focusing only on inserts and selects. And i will be benchmarking pure inserts and selects only. Thie blog focuses only on inserts. I will be focusing on selects in my next blog. I am running these tests on my laptop which has a Intel Core 2 Duo T5870 @ 2.00 GHz and 3 GB of RAM

I have created a simple php script to perform the benchmark. Which spawns out multiple php scripts that work on background. Let me know if you need the scripts and i will share it with you.


Innodb Settings in my.cnf :

innodb_buffer_pool_size = 256M
innodb_additional_mem_pool_size = 8M
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50

Table structure :
`val` int(11) NOT NULL,
`txt` varchar(20) DEFAULT NULL,
KEY `idx_val` (`val`)

Avg Time for 20000 inserts with concurrency of 2 : 1367 seconds
Load on machine : 3.4
size on disk : 9 MB

Avg Time for 50000 inserts with concurrency of 5 : 1537.2 seconds
Load on machine : 4.6
Size on disk : 13 MB

Avg Time for 100000 inserts with concurrency of 10 : 1255
Load on machine : 3.5
Size on disk : 17 MB

Avg Time for 200000 inserts with concurrency of 20 : 1403
Load on machine : 4.2
Size on disk : 26 MB

Time for 400000 inserts with concurrency of 40 : 1730
Load in machine : 6.6
Size on disk : 52 MB

Postgresql :

Settings in postgresql.conf:
shared_buffers = 64MB
work_mem = 1MB
synchronous_commit = on

Table structure :
Column | Type | Modifiers | Storage | Description
id | integer | not null default nextval('data_id_seq'::regclass) | plain |
val | integer | not null | plain |
txt | character varying(20) | | extended |
"data_pkey" PRIMARY KEY, btree (id)
"idx_val" btree (val)
Has OIDs: no

Avg Time for 20000 inserts with concurrency of 2 : 221.3 seconds
Load on machine : 2.0
size on disk : 2 MB

Avg Time for 50000 inserts with concurrency of 5 : 232 seconds
Load on machine : 2.0
Size on disk : 4.8 MB

Avg Time for 100000 inserts with concurrency of 10 : 248.75 seconds
Load on machine : 4.0
Size on disk : 9.8 MB

Avg Time for 200000 inserts with concurrency of 20 : 276.34
Load on machine : 3.6
Size on disk : 19 MB

Time for 400000 inserts with concurrency of 40 : 350.11
Load in machine : 5.8
size on disk : 38 MB

The graph shows that mysql is heavy as compared to pgsql. The base timings are almost 5 times more in mysql as compared to pgsql. Also as the concurrency goes up the time required for inserts in mysql spikes up more steeply as compared to that required for postgres.

I did a sample run on mysql by turning innodb_flush_logs_at_trx_commit=2 and the benefit I got was a lot

Avg Time for 20000 inserts with concurrency of 2 (innodb_flush_logs_at_trx_commit=2) : 5.2 seconds
Avg Time for 100000 inserts with concurrency of 10 (innodb_flush_logs_at_trx_commit=2) : 18.69 seconds

Similarly i disabled synchronous_commit on postgres and did a sample run

Avg Time for 20000 inserts with concurrency of 2 (synchronous_commit = off) : 2.95 seconds
Avg Time for 100000 inserts with concurrency of 10 (synchronous_commit = off) : 15.06 seconds

PS : The average time is the time for 10000 inserts (inserts per instance)

Lets see what do the selects tell - in the next blog.

Tuesday, September 14, 2010

Strategies to address scalability

There are some very fundamental scalability strategies which i will try to list here. These may have detailed and varied implementations.

  • Horizontal scalability - distribute the load : A very basic strategy for scalability is the ability to distribute the load or the requests across multiple processing units. Your code should be modular enough to allow horizontal scalability. Also the less information is shared among multiple requests, the easier it would be for you to scale horizontally. You should aim for a shared-nothing architecture
  • Sharding or partitioning is another way of scaling horizontally. The main idea here is to spread the load across many components by routing an individual request to a component that owns the data specific to that request. So you divide the complete data across multiple components and spread the requests among the components. There are two ways of creating shards:

    • Vertical partitioning : Spread the load across multiple processing units by distributing the functionalities across the components. So separate functions are being handled by different processing units. Column based databases are a very good example of vertical partitioning.
    • Horizontal Partitioning : Spread a single type of data element across multiple processing units. It is also referred to as hashing. Hashing based on userid to spread users across multiple datapoints can be an example of horizontal partitioning.
  • Parallelization is another way of achieving scalability. Parallelization can be achieved by working on the same task in parallel on multiple processing units. For example a web page having
    multiple components can process and fetch the components in parallel from multiple processing units.
  • Queueing or batch processing : Queueing and processing data in batches also provide a solution to scalability. Because the overhead of an operation is ammortized across multiple requests.< /li>
  • Relaxation of data constraints : Many different techniques and trade-offs with regards to how fast data can be accessed/processed or stored fall in this strategy. Caching, giving away fore
    ign keys and replication or denormalization of data are some of the examples of this strategy.

Ofcourse scalability cannot be addressed using only one of these strategies. The solution for scalability would depend on how the data is held and what compromises are we able to make with it. The solution might be a combination of a few of these strategies and more.

Original post :

Friday, September 10, 2010

Converting myisam tables to innodb

Why should you convert myisam tables to innodb ?

For the perfectly simple reason that innodb tables do not get locked by concurrent selects & inserts. So if you find that your myisam table is suffering for too many locks - due to concurrent selects and inserts, it is time for you to covert the table to innodb.

The simple query which does the trick is

Alter table myisam_table_name engine = innodb;

This query is good for small tables, which get converted in a flash (I am refering to tables smaller than 1 GB). But when you try to run this query to alter bigger tables, it takes a huge amount of time. You have to wait - maybe hours if not days to get your job done.

Recently I had a 30 GB table which i wanted to convert to innodb and the alter query went on for 3 days after which i gave up and killed the query. And then went on finding ways to make this alter happen fast.

There are multiple ways to make your alter fast -

1. create a temporary table with engine = innodb, disable unique checks and insert into the table in batches. The sequence of sql statements for this are

create table new_table_name like old_table_name;
alter table new_table_name engine = innodb;
set unique_checks=0;
// loop till all records are ported
insert into new_table_name select * from old_table_name where key > something and key <= something; set unique_checks=1;

If you have UNIQUE constraints on secondary keys, this speeds up a table import by turning off the uniqueness checks temporarily during the import operation.

2. Increase the buffers. In order to achieve higher I/O, you can increase the buffers - innodb_buffer_pool and innodb_log_file_size. If you are using a 64 bit machine - make the innodb_buffer_pool almost 50% of your available memory and innodb_log_file_size to around 128 MB (atleast) - you can make it 256MB to be a little more aggressive and 512 MB to be very aggressive. But it should be less than 25% of your innodb_buffer_pool size.

in my.cnf enter

innodb_buffer_pool = 2048M
innodb_log_file_size = 256M
innodb_log_buffer_size = 16M

Stop mysql.
Delete the ib_logfile0 & ib_logfile1.
start mysql.

Now running the alter query should be much faster.

A combination of both the above methods could be even more beneficial. Remember to re-tune your innodb settings - as per your requirement before putting live load on the server.

BTW, i tried using the second method to alter the 30 GB table, and looking at its speed, it seems that it will be done in around 10 hours - instead of the 3 days it took earlier.

Tuesday, September 07, 2010

Hashing algos : Consistent Hashing

Hashing is a way of mapping keys to locations. Normally you would hash by using a simple Key%n algorithm - which ensures that keys are mapped evenly across n splits. The problem with this algo is that adding or removing a node (or a split) would require a complete rehash of all the keys. And if you have a huge data set, it is ideally not feasable to rehash and re-distribute the keys.

Consistent hashing is a way of hashing that ensures that adding or removing a slot or node does not change the mapping of keys to slots significantly. When using consistent hashing, only K/n keys need to be remapped on average - where K is the number of keys and n is the number of slots.

The way this works is that both keys and slots are mapped to edges of a circle. Meaning that all slots are mapped on to a series of angles around a circle. And the bucket where each item should be stored is chosen by selecting the next highest angle which an available bucket maps to. So, each bucket contains resources mapping to an angle between it and the next smallest angle. If a bucket becomes unavailable, the keys being mapped to that bucket get mapped to the next highest bucket (or the next bucket in the circle). So, only keys which were in the bucket which became unavailable is lost. Similarly when a bucket is added, the keys between the new bucket and the next smallest bucket is mapped to the new bucket. Keys which should be associated with the new bucket and were stored previously will become unavailable.

figure 2
figure 1

Here is an example. Objects 1,2,3 and 4 map to slots A,B and C. To find which slot an object goes in, we move around the circle until we find a slot. So here objects 1 and 4 go into slot A, 2 goes into slot B and 3 goes into slot C. If C is removed, object 3 would belong to slot A. If another slot D is added as shown in figure 2, it will take objects 3 and 4 and only leave object 1 belonging to A.

Lets look at a php example which does consistent hasing us.
class ConsistentHasher
  private $nodeDistribution;
  private $virtualNodes;

  // nodeDistribution is the replica count per node.
  public function __construct($nodenames, $nodedistribution)
    $this->nodeDistribution = $nodedistribution;
    $this->virtualNodes = array();

    for($i=0; $i<count($nodenames); $i++)

  // The addNode() function takes a name and creates virtual nodes (or replicas) by 
  // appending the index of the local loop to the node name.
  // The hash value of a virtual node is an MD5 hash, base converted into an integer. 
  // The virtual node is added to a list and sorted by its value so that we ensure 
  // a lowest to highest traversal order for looking up previous and next virtual nodes
  public function addNode($name)
    for($i=0; $i<$this->nodeDistribution; $i++)
      //int representation of $key (8 hex chars = 4 bytes = 32 bit)
      $virtualNodeHashCode = base_convert(substr(md5($name.$i, false),0,8),16,10);
      $this->virtualNodes[$name.$i] = $virtualNodeHashCode;
    asort($this->virtualNodes, SORT_NUMERIC);

  public function dump()
    echo "\n\n";

  public function sortCompare($a, $b)
    if($a == $b)
      return 0;
    return ($a < $b) ? -1 : 1;

  // The removeNode() function takes a name and removes its corresponding virtual nodes 
  // from the virtualNode list.
  // We then resort the list to ensure a lowest to highest traversal order.
  public function removeNode($name)
    for($i=0; $i<$this->nodeDistribution; $i++)
    asort($this->virtualNodes, SORT_NUMERIC);

  // The hashToNode() function takes a key and locates the node where its value resides.
  // We loop through our virtual nodes, stopping before the first one that has a
  // hash greater than that of the key’s hash.
  // If we come to the end of the virtual node list, we loop back to the beginning to 
  // form the conceptual circle.

  public function hashToNode($key)
    $keyHashCode = base_convert(substr(md5($key, false),0,8),16,10);
    $virtualNodeNames = array_keys($this->virtualNodes);
    $firstNodeName = $virtualNodeNames[0];
    $lastNodeName = $virtualNodeNames[count($virtualNodeNames)-1];
    $prevNodeName = $lastNodeName;

    foreach($this->virtualNodes as $name => $hashCode)
      if($keyHashCode < $hashCode)
        return $prevNodeName;

      if($name == $lastNodeName)
        return $firstNodeName;

      $prevNodeName = $name;
    return $prevNodeName;

// demo

$hash = new ConsistentHasher(array("node1","node2","node3","node4","node5"),10);



echo $hash->hashToNode("testing123")."\n";
echo $hash->hashToNode("key1111")."\n";
echo $hash->hashToNode("data4444")."\n";

Here is a library which provides consistent hasing for php

Memcache is a widely used distributed cache which uses consistent hashing very efficiently to map keys to caching nodes.