Wednesday, June 04, 2008

MySQL versus PostgreSQL - part II

My earlier post mysql versus postgresql brought me a lot of negative comments saying that I did not compare the transactional database of pgsql with the transactional engine (InnoDB) of mysql. The main reason I did not do that was that I had found InnoDB to be very slow compared to MyISAM.

But after all those comments I ran the benchmarks again, using the same scripts and the same technology on the same machine (my laptop), and here are the results. I created a new table in both mysql (using the InnoDB engine) and pgsql, and I disabled binary logging in mysql to speed up insert/update/delete queries. Please refer to the earlier post for the setup information.

The following notation is used:

<operation(select/insert/update/delete)> : <no_of_threads> X <operations_per_thread>
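
To make the notation concrete, here is a stripped-down sketch of the kind of driver script I used - a minimal sketch only, with placeholder table/query names and connection details, not my exact benchmark code:

import threading, time

def worker(connect, query, make_params, n_ops):
    # each thread gets its own connection and cursor - nothing is shared
    conn = connect()
    cur = conn.cursor()
    for i in xrange(n_ops):
        cur.execute(query, make_params(i))
    conn.commit()
    cur.close()
    conn.close()

def run_benchmark(connect, jobs):
    # jobs: list of (query, make_params, n_threads, ops_per_thread),
    # i.e. the "<operation> : <no_of_threads> X <operations_per_thread>" notation
    threads = []
    for query, make_params, n_threads, ops in jobs:
        for _ in xrange(n_threads):
            threads.append(threading.Thread(
                target=worker, args=(connect, query, make_params, ops)))
    start = time.time()
    for t in threads: t.start()
    for t in threads: t.join()
    print "Time : %.2f Sec" % (time.time() - start)

# e.g. "Insert : 1 X 100000" against mysql would be something like:
# import MySQLdb
# run_benchmark(lambda: MySQLdb.connect(db='test'),
#     [("INSERT INTO bench (val) VALUES (%s)", lambda i: (i,), 1, 100000)])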


  • First, I ran a single thread of inserts, both before and after disabling binary logging in mysql
    Mysql Insert : 1 X 100000
    Time : 65.22 Sec (binary logging enabled)
    Time : 32.62 Sec (binary logging disabled)
    So disabling binary logging in mysql roughly halved the time taken by these queries (see the config sketch after this list).
    Pgsql Insert : 1 X 100000
    Time : 53.07 Sec
    With binary logging disabled, inserts in mysql are clearly faster.

  • Selects : 2 X 100000
    Mysql time : 30.1 Sec
    Pgsql time : 29.92 Sec
    Both take about the same time.

  • Updates : 2 X 50000
    Mysql time : 29.38 Sec
    Pgsql time : 36.98 Sec
    Mysql updates are faster

  • Then I ran 4 threads with different numbers of operations per thread
    Run 1 [Select : 1 X 100000, Insert : 1 X 50000, Update : 1 X 50000, Delete : 1 X 20000]
    Mysql time : 40.86 Sec
    Pgsql time : 45.03 Sec
    Run 2 [Select : 1 X 100000, Insert : 1 X 100000, Update : 1 X 50000, Delete : 1 X 10000]
    Mysql time : 49.91 Sec
    Pgsql time : 63.38 Sec
    Run 3 [Select : 1 X 100000, Insert : 1 X 20000, Update : 1 X 20000, Delete : 1 X 1000]
    Mysql time : 29.83 Sec
    Pgsql time : 29.3 Sec
    It can be seen that increasing the share of insert/update/delete queries hurts pgsql's performance; pgsql does better when selects dominate the workload, whereas mysql-innodb held up well across all three runs.

  • Finally, I did 4 runs with different numbers of threads.
    Run 1: 12 threads [Select : 2X30000 + 3X20000, Insert : 1X20000 + 2X10000, Update : 2X10000, Delete : 2X1000]
    Mysql time : 31.16 Sec
    Pgsql time : 30.46 Sec
    Run 2: 12 threads [Select : 2X50000 + 2X40000 + 1X30000, Insert : 1X20000 + 2X15000, Update : 2X15000, Delete : 2X2000]
    Mysql time : 52.25 Sec
    Pgsql time : 53.03 Sec
    Run 3: 20 Threads [Select : 4X50000 + 4X40000 + 2X30000, Insert : 2X20000 + 3X15000, Update : 2X20000 + 1X15000, Delete : 2X5000]
    Mysql time : 169.81 Sec
    Pgsql time : 136.04 Sec
    Run 4: 30 Threads [Select : 2X50000 + 3X40000 + 3X30000 + 3X20000 + 4X10000, Insert : 1X30000 + 2X20000 + 3X10000, Update : 3X20000 + 3X10000, Delete : 1X10000 + 2X5000]
    Mysql time : 200.25 Sec
    Pgsql time : 156.9 Sec
    So, it can be said that for a small system with low concurrency mysql performs better, but as concurrency increases pgsql pulls ahead. I also saw that the system load while running the pgsql benchmark was about twice that of the mysql benchmark.
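
As promised in the first bullet above, here is what "disabling binary logging" amounts to. Binary logging in mysql is controlled by the log-bin option in my.cnf, so turning it off is just a matter of commenting that line out and restarting the server (the log name below is a placeholder):

[mysqld]
# log-bin = mysql-bin     <- commented out, so no binary log is written

A session with the SUPER privilege can also skip the binary log with SET sql_log_bin = 0; but commenting out log-bin keeps it off for all connections.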


Enabling mysql binary logging for replication would of course add an overhead, and enabling trigger based replication in pgsql would be another overhead. The fact that replication in mysql is closely integrated with the database server makes building a high availability system easier, whereas creating slaves using replication in pgsql is not that easy - all the replication products available for pgsql are external, third-party software. Still, for a high concurrency system pgsql would be the better choice.
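
To illustrate how tightly integrated mysql replication is, this is roughly all it takes to point a fresh slave at a master that has binary logging enabled (host, credentials and log coordinates below are placeholders for your own setup):

-- on the slave, from the mysql client:
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G   -- check Slave_IO_Running and Slave_SQL_Running

(Both servers also need a unique server-id in my.cnf, and the master needs log-bin enabled.) In pgsql you would instead set up something external like Slony-I, with its own daemons and configuration.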

11 comments:

Anonymous said...

Seems it is not really concurrent selects, since python uses a GIL http://docs.python.org/api/threads.html

So, try launching separate scripts, maybe with more selects, and repost your benchmark?

thanks for your blogpost

Anonymous said...

Still not quite apples to apples. Postgres does 'binary logging' by default already. It's also important to do minimal tuning to each type of database - I could easily tune Postgres, for example, and have it perform much worse than, and then much better than, a default MySQL install. And vice-versa.

ryan said...

Nice work -- with the caveat that this doesn't reflect production quality hardware, I think this is much more representative of the real performance differences.

There are of course ways to tweak and improve MySQL performance in a highly concurrent environment, Mark's post to the planet today about innodb_thread_concurrency being one good place to start.

Hopefully the concurrency and multi core work via the Google patches and Sun/MySQL/InnoDB engineers will help address this in a future GA. And the Falcon/InnoDB rivalry will hopefully spur on some serious improvements in both engines for 6.0 release. :-)

Ken Jacobs said...

Thanks for this info. It would be interesting if you could try this test using the new InnoDB Plugin for 5.1.24, which is available here: http://www.innodb.com/innodb_plugin/download/

You could try creating the table with and without compression (e.g., use KEY_BLOCK_SIZE=8 to compress the data).

It would be interesting to know the sizes of data files: MySQL (with MyISAM, with InnoDB uncompressed, InnoDB compressed and PostgreSQL).

Unknown said...

Ok, that's an improvement, but you didn't specify what innodb settings you used. There are several variables that can drastically affect performance in high concurrency situations:

innodb_flush_log_at_trx_commit
innodb_thread_concurrency
innodb_buffer_pool_size
thread_cache_size

Turning off the binary logging is an interesting choice. It probably does create a more even playing field for postgres. I'm willing to bet most people don't do that when directly comparing them. But it might be a better choice altogether to enable the binary log for mysql and the most popular third party replication product for postgres. I think in most cases people who are concerned about database performance are usually using replication.

gamegeek said...

I am not that good at python threads, but I believe the GIL would apply to objects shared between different threads. What I have done is use different objects for different threads, so my threads should be able to run independently. Please correct me if I am wrong.

Thanks for the info regarding binary logging. I will look into the postgres config to disable binary logging. Meanwhile, some tips for a better postgres configuration would be very helpful.

@ryan : innodb will be gone in some time, and then we will have to look forward to falcon. I am also looking forward to major improvements in the myisam, maria and falcon engines in the 6.0 release.

@ water outbreaks : here are the variables that I set for my box
innodb_flush_log_at_trx_commit=1
innodb_thread_concurrency=2
innodb_buffer_pool_size=128M
thread_cache_size=25
Someone mentioned that pgsql also does some binary logging. Maybe I will look into turning that off as well...
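
For context, here are the same settings as they appear in my.cnf, with what each one does (a small laptop-sized setup - the buffer pool would normally be much larger on a real server):

[mysqld]
innodb_flush_log_at_trx_commit = 1   # flush the log at every commit - durable but slowest (2 or 0 trade safety for speed)
innodb_thread_concurrency = 2        # cap on threads running inside InnoDB at once
innodb_buffer_pool_size = 128M       # cache for InnoDB data and indexes
thread_cache_size = 25               # server threads kept around for reuse between connections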

Anonymous said...

Essentially, you can't turn off postgresql's binary logging - it is used for crash recovery. On the upside, you can use this logging to set up warm standby machines with virtually no additional overhead, so if you demand a failover solution (as one poster alluded to), the "fair" comparison is postgresql vs. mysql with binary logging enabled.
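
A sketch of that warm standby setup, assuming PostgreSQL 8.2+ and a shared archive directory (the paths are placeholders):

# postgresql.conf on the primary - ship each completed WAL segment away:
archive_command = 'cp %p /shared/wal_archive/%f'

# recovery.conf on the standby - replay segments as they arrive
# (pg_standby is a contrib module in 8.3; on 8.2 a small wait-and-copy script does the same):
restore_command = 'pg_standby /shared/wal_archive %f %p'

(On 8.3 you also need archive_mode = on for archive_command to take effect.)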

Anonymous said...

Thanks for the detailed info. Good going!

test said...

You said "All available products for replication in pgsql are external"... what about http://www.postgres-r.org/ ? I know it's 3rd-party, but it's patched against postgres, so it's not really external. (Unless you meant something else by external?)

SK said...

Worth comparing licences too

Anonymous said...

http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL