There have been major changes in MySQL and Postgres over the last couple of years. MySQL has been focusing on improving and optimizing InnoDB. Postgres, on the other hand, has been focusing on database replication and hot standby.
Recently Postgres came out with version 9.0, which has built-in replication and hot standby - the two most requested features in PostgreSQL. Earlier, people used to shy away from Postgres because there was no proper, easily deployable solution available for replication. With this release, Postgres has taken a major step forward. Here is a list of features introduced in Postgres 9.0: http://www.postgresql.org/docs/9.0/static/release-9-0
MySQL has released the RC version of MySQL 5.5, which has a bunch of improvements over the previous version. Support for multi-core CPUs, changes in InnoDB for effective use of available I/O capacity, and semisynchronous replication are some of the features that MySQL 5.5 promises. Here is a list of all the new features in MySQL 5.5: http://dev.mysql.com/doc/refman/5.5/en/mysql-nutshell.html
It has been a long time since I posted my last benchmark: http://jayant7k.blogspot.com/2008/06/mysql-versus-postgresql.html. I believe it is time I do some rough benchmarks and post them. The scope is to compare InnoDB tables in MySQL 5.5.6 against tables in PostgreSQL 9.0. I am benchmarking pure inserts and pure selects only. This post covers only inserts; I will focus on selects in the next one. I am running these tests on my laptop, which has an Intel Core 2 Duo T5870 @ 2.00 GHz and 3 GB of RAM.
I have created a simple PHP script to perform the benchmark, which spawns multiple PHP scripts that work in the background. Let me know if you need the scripts and I will share them with you.
MySQL:
InnoDB settings in my.cnf :
innodb_buffer_pool_size = 256M
innodb_additional_mem_pool_size = 8M
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table
Table structure :
CREATE TABLE `data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`val` int(11) NOT NULL,
`txt` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_val` (`val`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
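The PHP scripts are not listed here; presumably each worker just loops over single-row, autocommitted inserts of roughly this shape against the table above (the values are illustrative, not taken from the actual script):

-- each statement is its own transaction when autocommit is on
INSERT INTO data (val, txt) VALUES (42, 'some short text');

With innodb_flush_log_at_trx_commit = 1 on the MySQL side (and synchronous_commit = on on the Postgres side), every one of those commits has to be flushed to disk, which is why the relaxed-durability runs further below are so much faster.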
Avg Time for 20000 inserts with concurrency of 2 : 1367 seconds
Load on machine : 3.4
Size on disk : 9 MB
Avg Time for 50000 inserts with concurrency of 5 : 1537.2 seconds
Load on machine : 4.6
Size on disk : 13 MB
Avg Time for 100000 inserts with concurrency of 10 : 1255 seconds
Load on machine : 3.5
Size on disk : 17 MB
Avg Time for 200000 inserts with concurrency of 20 : 1403 seconds
Load on machine : 4.2
Size on disk : 26 MB
Time for 400000 inserts with concurrency of 40 : 1730 seconds
Load on machine : 6.6
Size on disk : 52 MB
PostgreSQL:
Settings in postgresql.conf:
shared_buffers = 64MB
work_mem = 1MB
synchronous_commit = on
Table structure :
Column | Type | Modifiers | Storage | Description
--------+-----------------------+---------------------------------------------------+----------+-------------
id | integer | not null default nextval('data_id_seq'::regclass) | plain |
val | integer | not null | plain |
txt | character varying(20) | | extended |
Indexes:
"data_pkey" PRIMARY KEY, btree (id)
"idx_val" btree (val)
Has OIDs: no
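For reference, the same table expressed as DDL (reconstructed here from the \d output above, not copied from the original post) would be roughly:

CREATE TABLE data (
    id  serial PRIMARY KEY,   -- expands to integer NOT NULL DEFAULT nextval('data_id_seq')
    val integer NOT NULL,
    txt varchar(20)
);
CREATE INDEX idx_val ON data (val);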
Avg Time for 20000 inserts with concurrency of 2 : 221.3 seconds
Load on machine : 2.0
Size on disk : 2 MB
Avg Time for 50000 inserts with concurrency of 5 : 232 seconds
Load on machine : 2.0
Size on disk : 4.8 MB
Avg Time for 100000 inserts with concurrency of 10 : 248.75 seconds
Load on machine : 4.0
Size on disk : 9.8 MB
Avg Time for 200000 inserts with concurrency of 20 : 276.34 seconds
Load on machine : 3.6
Size on disk : 19 MB
Time for 400000 inserts with concurrency of 40 : 350.11 seconds
Load on machine : 5.8
Size on disk : 38 MB
The numbers show that MySQL is heavy compared to pgsql. The base timings are roughly five to six times higher in MySQL than in pgsql. Also, as the concurrency goes up, the time required for inserts in MySQL spikes up more steeply than it does for Postgres.
I did a sample run on MySQL with innodb_flush_log_at_trx_commit = 2, and the benefit I got was huge:
Avg Time for 20000 inserts with concurrency of 2 (innodb_flush_log_at_trx_commit=2) : 5.2 seconds
Avg Time for 100000 inserts with concurrency of 10 (innodb_flush_log_at_trx_commit=2) : 18.69 seconds
Similarly, I disabled synchronous_commit on Postgres and did a sample run:
Avg Time for 20000 inserts with concurrency of 2 (synchronous_commit = off) : 2.95 seconds
Avg Time for 100000 inserts with concurrency of 10 (synchronous_commit = off) : 15.06 seconds
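For completeness, both of these durability knobs can also be flipped at runtime rather than by editing the config files; this is just a sketch of how such a run could be set up (the statements are valid on MySQL 5.5 and PostgreSQL 9.0 respectively):

-- MySQL: flush the InnoDB log roughly once per second instead of at every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- PostgreSQL: do not wait for the WAL to be flushed to disk at commit (can also be set per session)
SET synchronous_commit = off;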
PS : The average time is the average time taken per instance, each concurrent instance doing 10000 inserts.
Let's see what the selects tell us - in the next post.
Wednesday, June 04, 2008
MySQL versus PostgreSQL - part II
My earlier post, mysql versus postgresql, brought me lots of negative comments - that I did not compare the transactional database pgsql with the transactional engine (InnoDB) of mysql. The main reason I did not do that was that I had found InnoDB to be very slow compared to MyISAM.
But after all those comments, I ran the benchmarks again using the same scripts and the same technology on the same machine (my laptop), and here are the results. I created a new table in both mysql (using the InnoDB engine) and pgsql, and I disabled binary logging in mysql to speed up insert/update/delete queries. Please refer to the earlier post for the setup information.
The following notation will be used :
<operation(select/insert/update/delete)> : <no_of_threads> X <operations_per_thread>
- First, I ran a single thread doing inserts, both before and after disabling binary logging in mysql.
Mysql Insert : 1 X 100000
Time : 65.22 Sec (binary logging enabled)
Time : 32.62 Sec (binary logging disabled)
So disabling binary logging in mysql would make your insert/update/delete queries take half the time.
Pgsql Insert : 1 X 100000
Time : 53.07 Sec
Inserts in mysql are very fast.
- Selects : 2 X 100000
Mysql time : 30.1 Sec
Pgsql time : 29.92 Sec
Both are about the same.
- Updates : 2 X 50000
Mysql time : 29.38 Sec
Pgsql time : 36.98 Sec
Mysql updates are faster.
- Ran 4 threads with different no_of_operations/thread
Run 1 [Select : 1 X 100000, Insert : 1 X 50000, Update : 1 X 50000, Delete : 1 X 20000]
Mysql time : 40.86 Sec
Pgsql time : 45.03 Sec
Run 2 [Select : 1 X 100000, Insert : 1 X 100000, Update : 1 X 50000, Delete : 1 X 10000]
Mysql time : 49.91 Sec
Pgsql time : 63.38 Sec
Run 3 [Select : 1 X 100000, Insert : 1 X 20000, Update : 1 X 20000, Delete : 1 X 1000]
Mysql time : 29.83 Sec
Pgsql time : 29.3 Sec
It can be seen that increasing the proportion of insert/update/delete queries affects the performance of pgsql. Pgsql would perform better when the number of selects is very high, whereas mysql-innodb performs better in all of these cases.
- Had 4 runs with different numbers of threads
Run 1: 12 threads [Select : 2X30000 + 3X20000, Insert : 1X20000 + 2X10000, Update : 2X10000, Delete : 2X1000]
Mysql time : 31.16 Sec
Pgsql time : 30.46 Sec
Run 2: 12 threads [Select : 2X50000 + 2X40000 + 1X30000, Insert : 1X20000 + 2X15000, Update : 2X15000, Delete : 2X2000]
Mysql time : 52.25 Sec
Pgsql time : 53.03 Sec
Run 3: 20 Threads [Select : 4X50000 + 4X40000 + 2X30000, Insert : 2X20000 + 3X15000, Update : 2X20000 + 1X15000, Delete : 2X5000]
Mysql time : 169.81 Sec
Pgsql time : 136.04 Sec
Run 4: 30 Threads [Select : 2X50000 + 3X40000 + 3X30000 + 3X20000 + 4X10000, Insert : 1X30000 + 2X20000 + 3X10000, Update : 3X20000 + 3X10000, Delete : 1X10000 + 2X5000]
Mysql time : 200.25 Sec
Pgsql time : 156.9 Sec
So, it can be said that for a small system with low concurrency, mysql would perform better. But as concurrency increases, pgsql would perform better. I also saw that while running the pgsql benchmark the system load was about twice what it was while running the mysql benchmark.
Enabling mysql binary logging for replication would of course add an overhead. Similarly, enabling trigger-based replication in pgsql would be another overhead. The fact that replication in mysql is very closely integrated with the database server makes building a high availability system easier, whereas creating slaves using replication in pgsql is not that easy - all the available replication products for pgsql are external, third-party software. Still, for a high concurrency system pgsql would be a better choice.
Saturday, May 31, 2008
From MySQL to PostgreSQL
Why? In brief, it is said that mysql is not as stable as postgresql. Postgresql (or pgsql) focuses on a single database engine, as compared to mysql, which has a pluggable architecture with multiple engines. Postgresql is also better designed than mysql, and the psql console is much nicer than the mysql console (you will realize it when you use it). It is supposed to be more scalable and have better performance than mysql, and pgsql uses a more standard SQL dialect than mysql.
Both have fulltext search capabilities. It is said that the fulltext search in pgsql is better than in mysql, but I still have to look into it. And most important of all - pgsql is a full-fledged RDBMS, whereas mysql (using the default MyISAM engine) is a DBMS. Data integrity is better in pgsql than in mysql. Also, I have seen lots of corrupt tables in mysql (might be because I have used mysql throughout my career), and have heard (on the web) that mysql is more crash prone than pgsql.
The final question then comes down to speed. Which one is faster? Logically, since mysql (as a plain DBMS with MyISAM) does not maintain foreign key relations, it should be faster than pgsql. I will benchmark and blog the results sometime later.
Basically, pgsql is more stable & reliable than mysql. Pgsql has a rich set of features and data integrity is well maintained in pgsql.
Let's look at some of the steps you need to follow if you decide to switch from mysql to pgsql:
- First of all, download the database engine from www.postgresql.org. Untar it and compile it.
$ tar -xvjf postgresql-8.3.1.tar.bz2
$ cd postgresql-8.3.1
$ ./configure --with-openssl --enable-thread-safety --with-libxml
(check out the options that you need using ./configure --help)
$ make
$ sudo make install
By default this will install pgsql in the /usr/local/pgsql directory.
- Next, create a user postgres and the database directory for pgsql
$ adduser postgres
$ mkdir /usr/local/pgsql/data
$ chown postgres.postgres /usr/local/pgsql/data
- Create the pgsql database cluster:
Firstly, log into mysql and do
mysql> show variables like '%character%';
+--------------------------+-----------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.1.24-rc-linux-x86_64-glibc23/share/charsets/ |
+--------------------------+-----------------------------------------------------------------+
You will see here that the character set for the database and server is latin1, even though character_set_system is utf8. If you create the pgsql database using the default command, it will create a utf8 database and you will face problems importing your latin1 data into it.
So, create a latin1 database for pgsql
$ su - postgres
$ initdb -D /usr/local/pgsql/data -E latin1 --locale=C
- Now you have postgresql installed and you can create your own databases and tables. But before proceeding, run psql from the postgres user's shell.
start pgsql:
$ /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >~/logfile 2>&1 &
$ psql
Welcome to psql 8.3.1, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
postgres=#
The =# prompt says that you are logged in as a superuser. To log in as another user, create that user and give it the required privileges, then su to the user and log in to pgsql from there.
postgres=# CREATE USER jayant WITH CREATEDB INHERIT LOGIN;
postgres=# \q
$ su - jayant
$ createdb test
$ psql test
test=>
Here the => prompt says that I am not a superuser. Now I can create my tables in the test database or create more databases.
For help on how to create users do
test=> \h CREATE USER
And it will give you the syntax for creating a user in postgres.
- To convert your table from mysql format to pgsql format, just dump the table structure and the data separately from mysql, using mysqldump --no-data and mysqldump --no-create-info respectively.
And you will get two files <mysql_table>.sql & <mysql_table_data>.sql
Download the perl script mysql2pgsql.perl, which can convert the sql for your table into the appropriate format for pgsql.
- Now load the table and data into pgsql
$ psql -f <mysql_table>.sql
$ psql -f <mysql_table_data>.sql
It looks simple, but it is not that simple. Here are some points that need to be remembered while using pgsql:
- An auto_increment column is defined using the SERIAL keyword. So, to create an auto_increment primary key column, the syntax would be
CREATE TABLE MYTABLE(ID SERIAL NOT NULL PRIMARY KEY, ....);
This would also create a sequence mytable_id_seq.
- While importing data from the mysql dump you may get warnings about backslash escapes in your sql. To get rid of these warnings you will have to set escape_string_warning to off.
Edit your postgresql.conf file, which can be found in the /usr/local/pgsql/data directory, change the variable to off, and restart pgsql.
- To start and stop pgsql, the following commands can be used:
start pgsql
$ /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >~/logfile 2>&1 &
stop pgsql
$ /usr/local/pgsql/bin/pg_ctl stop -D /usr/local/pgsql/data
- Another thing to note is that you don't have queries like 'insert ignore into', 'replace into' or 'insert into ... on duplicate key ...' in pgsql. You will need to create rules on the table to handle these cases.
test=> \h create rule
Command: CREATE RULE
Description: define a new rewrite rule
Syntax:
CREATE [ OR REPLACE ] RULE name AS ON event
TO table [ WHERE condition ]
DO [ ALSO | INSTEAD ] { NOTHING | command | ( command ; command ... ) }
So, for example, to create an 'insert ignore into' type of rule, the following rule needs to be created:
test=> CREATE OR REPLACE RULE "insert_ignore_mytable" AS ON INSERT TO "mytable" WHERE EXISTS (SELECT 1 FROM "mytable" WHERE id = NEW.id) DO INSTEAD NOTHING;
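A quick illustration of the effect, assuming mytable has an integer primary key id and a hypothetical txt column (the column is made up for this example):

test=> INSERT INTO mytable (id, txt) VALUES (1, 'first');   -- the row goes in
test=> INSERT INTO mytable (id, txt) VALUES (1, 'second');  -- silently skipped by the rule, no duplicate key error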
- Remember SHOW PROCESSLIST in mysql, which lists all the running processes? To list all the processes in pgsql, run the following command:
test=> select * from pg_stat_activity;
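And if you only want the sessions that are actually running a query (closer to what SHOW PROCESSLIST gives you), something like this works on 8.3/9.0, where the columns are still named procpid and current_query:

test=> SELECT procpid, usename, current_query FROM pg_stat_activity WHERE current_query <> '<IDLE>';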
We will be looking more and more into pgsql.