Whatever....: innodb

Showing posts with label innodb. Show all posts

Wednesday, September 29, 2010

Database speed tests (mysql and postgresql) - part 2

Here is the comparison between mysql and postgresql for selects (only). I had used the same table that i had created earlier http://jayant7k.blogspot.com/2010/09/database-speed-tests-mysql-and.html while comparing insertion speed. I have created approximately 1,000,000 records in the table and ran selects on them. I also modified the configuration of both mysql and postgresql to enable faster selects.

Mysql

In mysql I specially disabled query_cache - the reason being that I would use innodb for tables with large number of inserts - due to its support for row level locking. And with every insert t
he query cache is marked as dirty. So enabling query cache would lead to caching of queries which will not be the scenario in a live setup.

Mysql Configuration :
innodb_buffer_pool_size = 256MB
key_buffer_size = 256MB
read_buffer_size = 512KB
sort_buffer_size = 512KB
query_cache_size = 0
thread_cache_size = 32
table_open_cache = 64

table information :
No of records : 9755366
data (idb) size : 692 MB

Mysql : (time for 100000 selects)

Avg time for 500000 selects with concurrency = 5 : 58.67
Avg time for 1000000 selects with concurrency = 10 : 122.8
Avg time for 2000000 selects with concurrency = 20 : 225.67
Avg time for 3000000 selects with concurrency = 30 : 351.66
Avg time for 4000000 selects with concurrency = 40 : 452.3

PostgreSQL :

Mysql has better table compression as compared to postgres. Same data in innodb is of around 700 MB while that in Postgres is of around 900 MB.

Postgres configuration :
shared_buffers = 128MB
work_mem = 1MB
random_page_cost = 4.0
effective_cache_size = 256MB

table information :
No of records : 9755366
data size : 912 MB

Pgsql : (time for 100000 selects)

Avg time for 500000 selects with concurrency = 5 : 86.8
Avg time for 1000000 selects with concurrency = 10 : 144.74
Avg time for 2000000 selects with concurrency = 20 : 274.37
Avg time for 3000000 selects with concurrency = 30 : 402.92
Avg time for 4000000 selects with concurrency = 40 : 528.17

Mysql seems to perform better with selects. The graph also shows that with increase in concurrency, selects in innodb take lesser time than that in postgresql.

So, why would you switch from mysql to postgresql - only if you have a very high ratio of inserts as compared to selects. The benefit in inserts outweigh the loss in selects to some extent.

Friday, September 10, 2010

Converting myisam tables to innodb

Why should you convert myisam tables to innodb ?

For the perfectly simple reason that innodb tables do not get locked by concurrent selects & inserts. So if you find that your myisam table is suffering for too many locks - due to concurrent selects and inserts, it is time for you to covert the table to innodb.

The simple query which does the trick is

Alter table myisam_table_name engine = innodb;

This query is good for small tables, which get converted in a flash (I am refering to tables smaller than 1 GB). But when you try to run this query to alter bigger tables, it takes a huge amount of time. You have to wait - maybe hours if not days to get your job done.

Recently I had a 30 GB table which i wanted to convert to innodb and the alter query went on for 3 days after which i gave up and killed the query. And then went on finding ways to make this alter happen fast.

There are multiple ways to make your alter fast -

1. create a temporary table with engine = innodb, disable unique checks and insert into the table in batches. The sequence of sql statements for this are

create table new_table_name like old_table_name;
alter table new_table_name engine = innodb;
set unique_checks=0;
// loop till all records are ported
insert into new_table_name select * from old_table_name where key > something and key <= something; set unique_checks=1;
If you have UNIQUE constraints on secondary keys, this speeds up a table import by turning off the uniqueness checks temporarily during the import operation.

2. Increase the buffers. In order to achieve higher I/O, you can increase the buffers - innodb_buffer_pool and innodb_log_file_size. If you are using a 64 bit machine - make the innodb_buffer_pool almost 50% of your available memory and innodb_log_file_size to around 128 MB (atleast) - you can make it 256MB to be a little more aggressive and 512 MB to be very aggressive. But it should be less than 25% of your innodb_buffer_pool size.

in my.cnf enter

innodb_buffer_pool = 2048M
innodb_log_file_size = 256M
innodb_log_buffer_size = 16M

Stop mysql.
Delete the ib_logfile0 & ib_logfile1.
start mysql.

Now running the alter query should be much faster.

A combination of both the above methods could be even more beneficial. Remember to re-tune your innodb settings - as per your requirement before putting live load on the server.

BTW, i tried using the second method to alter the 30 GB table, and looking at its speed, it seems that it will be done in around 10 hours - instead of the 3 days it took earlier.

Monday, June 07, 2010

Innodb now supports native AIO on Linux

With the exception of Windows InnoDB has used ’simulated AIO’ on all other platforms to perform certain IO operations. The IO requests that have been performed in a ’simulated AIO’ way are the write requests and the readahead requests for the datafile pages. Let us first look at what does ’simulated AIO’ mean in this context.

We call it ’simulated AIO’ because it appears asynchronous from the context of a query thread but from the OS perspective the IO calls are still synchronous. The query thread simply queues the request in an array and then returns to the normal working. One of the IO helper thread, which is a background thread, then takes the request from the queue and issues a synchronous IO call (pread/pwrite) meaning it blocks on the IO call. Once it returns from the pread/pwrite call, this helper thread then calls the IO completion routine on the block in question which includes doing a merge of buffered operations, if any, in case of a read. In case of a write, the block is marked as ‘clean’ and is removed from the flush_list. Some other book keeping stuff also happens in IO completion routine.

What we have changed in the InnoDB Plugin 1.1 is to use the native AIO interface on Linux. Note that this feature requires that your system has libaio installed on it. libaio is a thin wrapper around the kernelized AIO on Linux. It is different from Posix AIO which requires user level threads to service AIO requests. There is a new boolean switch, innodb_use_native_aio, to choose between simulated or native AIO, the default being to use native AIO.

How does this change the design of the InnoDB IO subsystem? Now the query thread instead of enqueueing the IO request actually dispatches the request to the kernel and returns to the normal working. The IO helper thread, instead of picking up enqueued requests, waits on the IO wait events for any completed IO requests. As soon as it is notified by the kernel that a certain request has been completed it calls the IO completion routine on that request and then returns back to wait on the IO wait events. In this new design the IO requesting thread becomes kind of a dispatcher while the background IO thread takes on the role of a collector.

What will this buy us? The answer is simple – scalability. For example, consider a system which is heavily IO bound. In InnoDB one IO helper thread works on a maximum of 256 IO requests at one time. Assume that the heavy workload results in the queue being filled up. In simulated AIO the IO helper thread will go through these requests one by one making a synchronous call for each request. This means serialisation forcing the request that is serviced last to wait for the other 255 requests before it gets a chance. What this implies is that with simulated AIO there can be at most ‘n’ IO requests in parallel inside the kernel where ‘n’ is the total number of IO helper threads (this is not entirely true because query threads are also allowed to issue synchronous requests as well, but I’ll gloss over that detail for now). In case of native AIO all 256 requests are dispatched to the kernel and if the underlying OS can service more requests in parallel then we’ll take advantage of that.

The idea of coalescing contiguous requests is now off loaded to the kernel/IO scheduler. What this means is that which IO scheduler you are using or the properties of your RAID/disk controller may now have more affect on the overall IO performance. This is also true because now many more IO requests will be inside the kernel than before. Though we have not run tests to specifically certify any particular IO scheduler the conventional wisdom has been that for database engine workloads perhaps no-op or deadline scheduler would give optimal performance. I have heard that lately a lots of improvements have gone in cfq as well. It is for you to try and as always YMMV. And we look forward to hear your story.

NOTE:InnoDB h as always used native AIO on Windows and it continues to do so in Plugin 1.1. innodb_use_native_aio will have no affect on Windows.

Friday, October 02, 2009

SHOW ENGINE INNODB STATUS;

Innodb monitors show information about innodb internal status - which could be used for performance tuning. Lets break down the output of show engine innodb status and get a look at what is happening and how it can be improved. Just fire the "Show engine innodb status" command and check the Output.

mysql> show engine innodb status\G
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
091002 9:44:20 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 45 seconds

To get a better picture ensure that the output is sampled for some time (say more than 15 seconds - the more the better). If the sampling time is less, just run the same command again to get another sample.

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 20636234, signal count 20151488
Mutex spin waits 0, rounds 486769929, OS waits 6568865
RW-shared spins 19231656, OS waits 8487916; RW-excl spins 13940968, OS waits 4652674

This section reports threads waiting for a semaphore and statistics on how many times threads have needed a spin or a wait on a mutex or a rw-lock semaphore. A large number of threads waiting for semaphores may be a result of disk I/O, or contention problems inside InnoDB. Contention can be due to heavy parallelism of queries or problems in operating system thread scheduling. Setting the innodb_thread_concurrency system variable smaller than the default value might help in such situations.

A more explicit output could be (obtained from mysql.com)

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 413452, signal count 378357
--Thread 32782 has waited at btr0sea.c line 1477 for 0.00 seconds the
semaphore: X-lock on RW-latch at 41a28668 created in file btr0sea.c line 135
a writer (thread id 32782) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 731
Last time write locked in file btr0sea.c line 1347
Mutex spin waits 0, rounds 0, OS waits 0
RW-shared spins 108462, OS waits 37964; RW-excl spins 681824, OS waits 375485

Thread wait information is available only if mysql is running in a very high concurrency environment so innodb has to fall back to OS waits. It would require a bit of knowledge of innodb source code to figure out where you are getting stuck - like in this piece of output we are getting stuck at btr0sea.c lines 731 & 1347.

The reservation count and signal count information show how actively innodb uses internal sync array - how frequently slots are allocated in it and how frequently threads are signaled using sync array. High OS Waits are bad - it means that there is a lot of context switching in the OS. Spin waits and spin rounds are cheaper as compared to OS waits but they also waste CPU cycles. So a large amount of Spin waits and spin rounds mean cpu resources are being wasted. innodb_sync_spin_loops can be used to balance and improve these status variables.

InnoDB uses spin-loop in hopes thread locked mutex is very fast and will release mutex while current thread runs in spins, so there is saving of expensive context switching. However we can't run spin very long as it eats CPU resources and after some loops it is reasonable to give CPU resource to concurrent threads.

------------------------
LATEST FOREIGN KEY ERROR
------------------------
030709 13:00:59 Transaction:
TRANSACTION 0 290328284, ACTIVE 0 sec, process no 3195, OS thread id 34831 inserting
15 lock struct(s), heap size 2496, undo log entries 9
MySQL thread id 25, query id 4668733 localhost heikki update
insert into ibtest11a (D, B, C) values (5, 'khDk' ,'khDk')
Foreign key constraint fails for table test/ibtest11a:,
CONSTRAINT `0_219242` FOREIGN KEY (`A`, `D`) REFERENCES `ibtest11b` (`A`,
`D`) ON DELETE CASCADE ON UPDATE CASCADE
Trying to add in child table, in index PRIMARY tuple:
0: len 4; hex 80000101; asc ....;; 1: len 4; hex 80000005; asc ....;; 2:
len 4; hex 6b68446b; asc khDk;; 3: len 6; hex 0000114e0edc; asc ...N..;; 4:
len 7; hex 00000000c3e0a7; asc .......;; 5: len 4; hex 6b68446b; asc khDk;;
But in parent table test/ibtest11b, in index PRIMARY,
the closest match we can find is record:
RECORD: info bits 0 0: len 4; hex 8000015b; asc ...[;; 1: len 4; hex
80000005; asc ....;; 2: len 3; hex 6b6864; asc khd;; 3: len 6; hex
0000111ef3eb; asc ......;; 4: len 7; hex 800001001e0084; asc .......;; 5:
len 3; hex 6b6864; asc khd;;

If you are using foreign keys and there has been an error, you would also have a Foreign key error section. This section provides information about the most recent foreign key constraint error.

------------------------
LATEST DETECTED DEADLOCK ------------------------
091001 14:43:14
*** (1) TRANSACTION:
TRANSACTION 0 611308122, ACTIVE 0 sec, process no 13095, OS thread id 459504560 setting auto-inc lock
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 320, 1 row lock(s)
MySQL thread id 675137455, query id 2100136547 update
REPLACE INTO TEST (ID, CONTENT) VALUES('2012258', 'xina0157')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `TEST` trx id 0 611308122 lock mode AUTO-INC waiting
*** (2) TRANSACTION:
TRANSACTION 0 611308121, ACTIVE 0 sec, process no 13095, OS thread id 458902448 updating or deleting, thread declared inside InnoDB 492
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1024, 11 row lock(s), undo log entries 6
MySQL thread id 791833830, query id 2100136545 10.208.67.50 rx update
REPLACE INTO TEST (ID, CONTENT) VALUES('2012257', 'bhuvneshahuja'), ('2012257', 'lalitgarg007'), ('2012257', 'yatin1904'), ('2012257', 'vinay168'), ('2012257', 'Bhupennsnghn'), ('2012257'
, 'vulnerabl003'), ('2012257', 'manollom'), ('2012257', 'Solankiccoutantom'), ('2012257', 'gprashad@rediffmaiom'), ('2012257', 'deep1_1'), ('2012257', 'ankur20584'), ('2012257', 'amarrock'), ('2012257',
'jyotima_89yaon')
*** (2) HOLDS THE LOCK(S):

TABLE LOCK table `TEST` trx id 0 611308121 lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 0 page no 1822478 n bits 480 index `ID` of table `TEST` trx id 0 611308121 lock_mode X waiting
Record lock, heap no 314 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

0: len 4; hex 801eb462; asc b;; 1: len 8; hex 78696e6130313537; asc xina0157;; 2: len 4; hex ee5c72a8; asc \r ;;

*** WE ROLL BACK TRANSACTION (1)

This section provides information about the most recent deadlock. It is not present if no deadlock has occurred. The contents show which transactions are involved, the statement each was attempting to execute, the locks they have and need, and which transaction InnoDB decided to roll back to break the deadlock. Innodb only prints information about few of the locks which transaction is holding. Also only last statement from each transactions is displayed, while locks rows could be locked by one of previous statements.

Here we can see that query 1 was waiting for lock to obtain auto_increment_id from for the table TEST - which has been locked by query 2. So, auto_increment_id is the one creating problems here.

------------
TRANSACTIONS
------------
Trx id counter 0 612244468
Purge done for trx's n:o < 0 612244461 undo n:o < 0 0
History list length 11
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 13095, OS thread id 524331952
MySQL thread id 793300751, query id 2104652418 localhost root
show engine innodb status
---TRANSACTION 0 612244464, not started, process no 13095, OS thread id 422017968
MySQL thread id 793301055, query id 2104652413 10.208.67.111 rx
---TRANSACTION 0 612244463, not started, process no 13095, OS thread id 288115632
MySQL thread id 793301051, query id 2104652405 10.208.67.46 rx
---TRANSACTION 0 612244465, not started, process no 13095, OS thread id 290991024
MySQL thread id 793301050, query id 2104652414 10.208.67.111 rx
---TRANSACTION 0 612244457, not started, process no 13095, OS thread id 54217648
MySQL thread id 793301046, query id 2104652381 10.208.67.46 rx
---TRANSACTION 0 612244467, not started, process no 13095, OS thread id 521587632
MySQL thread id 793301039, query id 2104652417 10.208.67.46 rx
---TRANSACTION 0 612244453, not started, process no 13095, OS thread id 469560240
MySQL thread id 793301038, query id 2104652371 10.208.67.111 rx
---TRANSACTION 0 612244444, not started, process no 13095, OS thread id 502590384
MySQL thread id 793301028, query id 2104652341 10.208.67.111 rx
---TRANSACTION 0 612244438, not started, process no 13095, OS thread id 220674992
MySQL thread id 793301027, query id 2104652325 10.208.67.103 rx
---TRANSACTION 0 612244423, not started, process no 13095, OS thread id 498207664
MySQL thread id 793301014, query id 2104652276 10.208.67.103 rx
---TRANSACTION 0 612244417, not started, process no 13095, OS thread id 291793840
MySQL thread id 793301013, query id 2104652269 10.208.67.103 rx
---TRANSACTION 0 612244397, not started, process no 13095, OS thread id 450493360
MySQL thread id 793300979, query id 2104652179 10.208.67.111 rx
---TRANSACTION 0 612244426, not started, process no 13095, OS thread id 459504560
MySQL thread id 675137455, query id 2104652279 Has read all relay log; waiting for the slave I/O thread to update it

If this section reports lock waits, your applications might have lock contention. If you have small number of connections all connections will be printed in transaction list, if you have large number of connections Innodb will only print number of them,so the output will not grow too large. Transaction id is current transaction identifier - it is incremented for each transaction.

"Purge done for trx's n:o" is number of transaction to which purge is done. Innodb can only purge old versions if they there are no running transactions potentially needing them. Old stale uncommitted transactions may block purge process eating up resources. By looking at transaction counter difference between current and last purged transaction you will be able to spot it. In some rare cases purge also could have hard time to keep up with update rate, in this case difference between these values will also grow.
"undo n:o" will show the undo log record number which purge is currently processing, if it is active otherwise it will be zero.
"History list length 11" is number of unpurged transactions in undo space. It is increased as transactions which have done updates are commited and decreased as purge runs.
For each of connections for MySQL there will be ether "not started" state if there is no active Innodb transaction for this connection, or "ACTIVE" if transaction is active.
Also transaction status is reported which is basically what transaction is doing it can be "fetching rows", "updating" and couple of other values.
"Total number of lock structs in row lock hash table 0" is number of row lock structures allocated by all transactions. Note not same as number of locked rows - there are normally many rows for each lock structure.
"Thread declared inside InnoDB 400" sometimes appears. It means thread is running inside Innodb kernel and still has 400 tickets to use. Innodb tries to limit thread concurrency allowing only innodb_thread_concurrency threads to run inside Innodb kernel at the same time. If thread is not runniing inside innodb kernel status could be "waiting in InnoDB queue" or "sleeping before joining InnoDB queue".
To avoid too many threads competing to enter innodb queue at the same time Innodb makes thread to sleep for some time before trying to wait (if no free slot was available). This may cause number of threads active inside kernel being less than number of threads allowed by "innodb_thread_concurrency". For certain workloads it may help to decrease the time thread waits before it enters the queue. This is done by adjusting "innodb_thread_sleep_delay variable". Value is specified in microseconds.
"mysql tables in use 1, locked 0" is number of tables used by transaction in question and number of tables locked by transactions. Innodb does not lock tables for normal operation so number of tables locked normally stays 0, unless it is ALTER TABLE or similar statement, or if LOCK TABLES was used.
In addition to Innodb specific information, there is generic statement information which is visible in SHOW PROCESSLIST showed in SHOW INNODB STATUS, such as statement which is being executed, query id, query status etc.

--------
FILE I/O --------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
35472867 OS file reads, 512713655 OS file writes, 33949612 OS fsyncs
0.29 reads/s, 16384 avg bytes/read, 20.76 writes/s, 1.33 fsyncs/s

This section shows state of file IO helper threads - insert buffer thread, log thread, read thread and write thread. These are responsible appropriately for insert buffer merges, asynchronous log flushes, read-ahead and flushing of dirty buffers. Normal reads originated from query executions are executed by threads running queries. On Unix/Linux you will always see 4 helper threads, on Windows it however can be adjusted by innodb_file_io_threads variable. For each helper thread you can see thread state - if thread is ready - waiting for i/o request or if it is executing certain operation.
Number of pending operation is shown for each of helper threads - these are amount of operations queued for execution or being executed at the same time. Also number of pending fsync operations is displayed. For writes Innodb has to ensure data makes it to the disk - just passing it to OS cache is not enough. This is typically done by calling fsync() for modified files. Constant high values for any of these variables is indication of IO bound workload.
Next we see the number of IO operations & the averages of these operations which could be used for graphing/monitoring purposes.

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX -------------------------------------
Ibuf: size 1, free list len 5, seg size 7,
7567365 inserts, 7567365 merged recs, 4848077 merges
Hash table size 4425293, used cells 1294280, node heap has 1502 buffer(s)
14.07 hash searches/s, 78.40 non-hash searches/s

This section shows insert buffer and adaptive hash status. First line shows status of insert buffer - segment size and free list as well as if there are any records is insert buffer. Next it shows how many inserts were done in insert buffer, how many recs were merged and how many merges did it took. Ratio of number of merges to number of inserts is pretty much insert buffer efficiency.

Adaptive hash index is hash index Innodb builds for some pages to speed up row lookup replacing btree search with hash search. This section shows hash table size, number of used cells and number of buffers used by adaptive hash index. You can also see number of hash index lookups and number of non-hash index lookups which is indication of hash index efficiency.

---
LOG
---
Log sequence number 263 105379218
Log flushed up to 263 105373095
Last checkpoint at 263 102251451
0 pending log writes, 0 pending chkp writes
391975179 log i/o's done, 20.00 log i/o's/second

Log section provides information about log subsystem of Innodb. You can see current log sequence number - which is amount of bytes Innodb has written in log files since system tablespace creation. You can also see up to which point logs have been flushed - so how much data is unflushed in log buffer as well as when last checkpoint was performed. Innodb uses fuzzy checkpointing so this line hold log sequence, all changes up to which has been flushed from buffer pool. Changes having higher log sequences may still only be recored in logs and not flushed from buffer pool so such log sequences can't be over written in log files. By monitoring log sequence number and value up to which logs have been flushed you can check if your innodb_log_buffer_size is optimal - if you see more than 30% of log buffer size being unflushed you may want to increase it.

You also can see number of pending normal log writes and number of checkpoint log writes. Number of log/io operations allows to separate tablespace related IO from log related IO so you can see how much IO your log file requires. Note depending on your innodb_flush_log_at_trx_commit value your log writes may be more or less expensive. If innodb_flush_logs_at_trx_commit=2 log writes are done to OS cache, and being sequential writes these logs writes are pretty fast.

----------------------
BUFFER POOL AND MEMORY ----------------------
Total memory allocated 1177422796; in additional pool allocated 5829888
Dictionary memory allocated 94872
Buffer pool size 65536
Free buffers 1
Database pages 64033
Modified db pages 476
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 58886259, created 19378118, written 210440562
0.29 reads/s, 0.42 creates/s, 0.96 writes/s
Buffer pool hit rate 1000 / 1000

This section shows Buffer pool activity and memory usage. You can see total memory allocated by Innodb (sometimes it is higher than you anticipated), amount of memory allocated in additional memory pool (so you can check if it is sized right), total number of pages in buffer pool, number of pages free, pages allocated by database pages and dirty pages. From these values you can learn if your buffer pool is sized well - if you have constantly a lot of pages free, it probably means your active database size is smaller than allocated buffer pool size so you can tune it down. Even if free pages is zero as in this case database pages will not be equal to total size of buffer pool, because buffer pool also stores lock information, adaptive hash indexes and some other system structures.

Pending reads and writes are pending requests on buffer pool level. Innodb may merge multiple requests to one on file level so these are different. We can also see different types of IO submited by Innodb - pages to be flushed via LRU pages - dirty pages which were not accessed long time, flush list - old pages which need to be flushed by checkpointing process and single page - independent page writes.

We can also see number of pages being read and written. Created pages is empty pages created in buffer pool for new data - when previous page content was not read to the buffer pool.

Finally you can see buffer pool hit ratio which measures buffer pool efficiency. 1000/1000 corresponds to 100% hit rate. It is hard to tell what buffer pool hit rate is good enough - it is very workload dependent. Sometimes 950/1000 will be enough, sometimes you can see IO bound workload with hit rate of 995/1000.
--------------
ROW OPERATIONS --------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 13095, id 538459056, state: sleeping
Number of rows inserted 748462338, updated 421356848, deleted 323128079, read 3220854064
5.13 inserts/s, 8.00 updates/s, 0.00 deletes/s, 43.71 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

Row operations show activity on the row basics and some system information.

It shows innodb thread queue status - how many threads are waiting and being active. How many read views are open inside Innodb - this is when transaction was started but no statement is currently active, state of Innodb main thread which controls scheduling of number of system operations - flushing dirty pages, checkpointing, purging, flusing logs, doing insert buffer merge. Values for "state" field are rather self explanatory.

You can also see number of rows operation since system startup as well as average values. Row operations is very good measure of Innodb load. Not all row operations are created equal of course and accessing of 10 byte rows is much cheaper than accessing 10MB blog, but it is still much more helpful than number of queries, which is even more different.

Wednesday, June 04, 2008

MySQL versus PostgreSQL - part II

My earlier post mysql versus postgresql brought me lots of negative comments - that i did not compare the transactional database of pgsql with the transactional engine (innodb) of mysql. The main reason why i did not do that was because i had found InnoDB to be very slow as compared to MyISAM.

But after all those comments i ran the benchmarks again using the same scripts and the same technology on the same machine (my laptop) and here are the results. I created a new table in both Mysql (using InnoDB engine) and pgsql. And i disabled the binary logging in mysql to speed up insert/update/delete queries. Please refer to the earlier post for the setup information.

Following notification would be used :

<operation(select/insert/update/delete)> : <no_of_threads> X <operations_per_thread>

Firstly i ran single thread with inserts both before and after disabling binary logging in mysql
Mysql Insert : 1 X 100000
Time : 65.22 Sec (binary logging enabled)
Time : 32.62 Sec (binary logging disabled)
So disabling binary logging in mysql would make your insert/update/delete queries take half the time.
Pgsql Insert : 1 X 100000
Time : 53.07 Sec
Inserts in mysql are very fast.

Selects : 2 X 100000
Mysql time : 30.1 Sec
Pgsql time : 29.92 Sec
Both are same

Updates : 2 X 50000
Mysql time : 29.38 Sec
Pgsql time : 36.98 Sec
Mysql updates are faster

Ran 4 threads with different no_of_operations/thread
Run 1 [Select : 1 X 100000, Insert : 1 X 50000, Update : 1 X 50000, Delete : 1 X 20000]
Mysql time : 40.86 Sec
Pgsql time : 45.03 Sec
Run 2 [Select : 1 X 100000, Insert : 1 X 100000, Update : 1 X 50000, Delete : 1 X 10000]
Mysql time : 49.91 Sec
Pgsql time : 63.38 Sec
Run 3 [Select : 1 X 100000, Insert : 1 X 20000, Update : 1 X 20000, Delete : 1 X 1000]
Mysql time : 29.83 Sec
Pgsql time : 29.3 Sec
It could be seen that increasing the amount of insert/update/delete queries affects the performance of pgsql. Pgsql would perform better if number of selects are very high. Whereas mysql-innodb performs better in all cases

Had 4 runs with different no of threads.
Run 1: 12 threads [Select : 2X30000 + 3X20000, Insert : 1X20000 + 2X10000, Update : 2X10000, Delete : 2X1000]
Mysql time : 31.16 Sec
Pgsql time : 30.46 Sec
Run 2: 12 threads [Select : 2X50000 + 2X40000 + 1X30000, Insert : 1X20000 + 2X15000, Update : 2X15000, Delete : 2X2000]
Mysql time : 52.25 Sec
Pgsql time : 53.03 Sec
Run 3: 20 Threads [Select : 4X50000 + 4X40000 + 2X30000, Insert : 2X20000 + 3X15000, Update : 2X20000 + 1X15000, Delete : 2X5000]
Mysql time : 169.81 Sec
Pgsql time : 136.04 Sec
Run 4: 30 Threads [Select : 2X50000 + 3X40000 + 3X30000 + 3X20000 + 4X10000, Insert : 1X30000 + 2X20000 + 3X10000, Update : 3X20000 + 3X10000, Delete : 1X10000 + 2X5000]
Mysql time : 200.25 Sec
Pgsql time : 156.9 Sec
So, it can be said that for a small system with less concurrency, mysql would perform better. But as concurrency increases, pgsql would perform better. I also saw that while running the pgsql benchmark, the system load was twice than while running mysql benchmark.

Enabling mysql binary logging for replication would ofcourse add an over head. Similarly enabling trigger based replication in pgsql would be another overhead. The fact that replication in mysql is very closely linked with the database server helps in making a high availability system easier. Whereas creating slaves using replication in pgsql is not that easy. All available products for replication in pgsql are external - 3rd party softwares. Still, for a high concurrency system pgsql would be a better choice.

Saturday, June 24, 2006

mysql storage engines

I would be discussing mysql storage engines/table types over here. MySQL supports pluggable storage engine architecture, which means that you can create your own storage engine and ask MySQL to store data for you using the sql query syntax defined in MySQL. Sounds cool right, but lets not dig that deep now. Lets first see what types of storage engines does MySQL provide and their benefits/drawbacks.

MYISAM Storage Engine

Based on the older IASM storage engine and currently the default storage engine in MySQL. So, if you create a table in MySQL and dont define a storage engine then, the table would use MYIASM as the storage engine. MySQL databases are generally directories and MYISAM tables are stored using 3 files -
.frm => the table format
.MYD => table data
.MYI => table indexes

Points to Remember:

The tables are machine/os independent and can be shifted by simply copying the 3 files to another installation of mysql on another machine.

Limit of 2^32 rows which can be increased to (2^32)^2.

Maximum no of indexes per table = 64, which can be increased upto 128 by recompiling.

Maximum no of columns per index = 16.

Blob and TEXT columns can be indexed.

Table size is almost unlimited (256TB). Though the table gives an error of "Table is full" when it reaches 4 GB. Then some parameters (MAX_ROWS & AVG_ROW_LENGTH) in the table definition needs to be changed to make the table expandable.

Transactions are not supported
Fulltext indexing and searching is supported though it is very slow.

Granuality of locking is TABLE. So if an insert and select happens one after another, the insert will lock the complete table and the select will have to wait till the lock is removed.

The tables do get corrupted sometimes and to recover it, you would need to run the "REPAIR TABLE" command.

Does not provide "Foreign key" usage.

InnoDB Storage Engine

InnoDB provides transactions and relational database support in MySQL - supports foreign keys and commit, rollback and crash recovery capabilities. InnoDB creates two log files namely "ib_logfile0" and "ib_logfile1" and a data file "ibdata1" in the MySQL data directory where it stores its tables. You can specify the path for creation of data file and the initial/max size of the file and also the no of data files to create. Placing different data files on different disks would of course lead to increased efficiency.

The table definitions are stored in database directory with a .frm extension whereas the data is stored in the "ibdata1" - tablespace. InnoDB has something known as "clustered index" where the data for the rows is stored. Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O.

innodb_data_home_dir = /ibdata
innodb_data_file_path=ibdata1:50M;ibdata2:50M:autoextend:max:500M

Points to remember:

Moving databases/tables is not easy. You need to take a dump and import it in the other installation of mysql

Table cannot contain more than 1000 columns

Although InnoDB supports row sizes larger than 65535 internally, you cannot define a row containing VARCHAR columns with a combined size larger than 65535.

combined size of the InnoDB log files must be less than 4GB

Minimum tablespace size is 10MB. And maximum tablespace size is 64TB. This is also the maximum size for a table

Does not support FULLTEXT indexes.

InnoDB does not keep an internal count of rows in a table. So to process a select count(*) from tablename, InnoDB must scan the complete index which may take some time.

Delete from tablename does not regenerate the table, but deletes rows one by one.

Supports foreign keys constrains and transactions (commit and rollback)

Has row level locking and also provides consistent non-locking read in select statements. Which provides better multi-user concurrency and performance.

Memory/Heap Storage Engine

As the name suggests, the tables are made in memory/RAM. Since the data for the tables are stored in memory they are lost if the mysql server is shut down or if it crashes, but the tables continue to exist since the table definition is a .frm file stored in the database directory. On a mysql restart, all data would have been lost and the tables would be empty. The indexes made on these tables are HASH indexes which makes access to these tables extremely fast. You can create temporary HEAP tables which can be used for high frequency reads & writes.

Points to remember:

Can have 32 indexes per table and 16 columns per index.

Type of indexing can be decided to be either HASH or BTREE.

Using Hash indexes can lead to slower updates and deletes on the table if the degree of key duplication is high.

Cannot contain BLOB or TEXT columns

Server needs sufficient memory to maintain all MEMORY tables at one point of time

To populate a MEMORY table when MySQL server is started, the --init-file option can be used

It has table level locking

Memory can be freed up by using either truncate/drop table command

Merge Storage Engine

Using MERGE storage engine, we can make a collection of identical MYISAM tables look and work like one. Each MERGE table is made up of two files
.frm - table definition/format
.MRG - names of tables that should be used as one
MERGEd tables support SELECT, DELETE, UPDATE and INSERT. On DROPPING a merge table, only the merge specification is dropped. The underlying tables are unaffected.

Points to remember:

MERGE table cannot enforce uniqueness over the underlying table.

MERGE table can be used to spilt big tables into parts and refer to them using the merged table name

MERGE table does not have its own indexes, it uses indexes of its underlying tables.

You cannot search the MERGE tables using FULLTEXT indexes. The FULLTEXT indexes can be created on the underlying MyIASM tables but they cannot be used through the MERGE table.

Merge tables use more file descriptors. If 10 clients are using a MERGE table that maps to 10 tables, the server uses (10 × 10) + 10 file descriptors.

Key reads are slower, since the on performing a SELECT, the MERGE storage engine has to issue a call on all the underlying tables one by one.

INSERTS can be directed to either the first or the last table using INSERT_METHOD parameter during MERGE table creation.

REPLACE does not work

When you create a MERGE table, there is no check to ensure that the underlying tables exist and have identical structures.

There is a limit of 2^32 rows to a MERGE table.

BDB(BerkeleyDB) Storage Engine

Sleepycat's Berkeley DB is a database engine that provides developers with fast, reliable, local persistence with zero administration (check www.sleepycat.com ). Mysql has been provided the BDB transactional storage engine. You need to have BDB installed on your system and compile MySQL with BDB to use the BDB storage engine. Each BDB table is stored in two files
.frm -> table definition/format
.db -> contains table data and indexes

Points to remember:

BDB tables can have up to 31 indexes per table and 16 columns per index.

MySQL needs a primary key for each BDB table. If not provided, MySQL would create a hidden primary key and use it.

If all columns that you access in a BDB table are part of the same index or part of the primary key, MySQL can execute the query without having to access the actual row.

Sequential scanning is slower for BDB tables than for MyISAM tables because the data in BDB tables is stored in B-trees and not in a separate data file.

There are often holes in the BDB table to allow you to insert new rows in the middle of the index tree. This makes BDB tables somewhat larger than MyISAM tables.

SELECT COUNT(*) FROM tablename is slow for BDB tables, because no row count is maintained in the table.

Internal locking in BDB tables is done at the page level.

Opening many BDB tables at the same time may be quite slow.

Each BDB table stores in its .db file the path to the file as it was created. So it is not possible to move BDB table files from one database directory to another. To shift a BDB table mysqldump and import are required to be done.

FEDERATED Storage Engine

The federated storage engine can be used to access data in tables of remote databases locally. When a federated table is created a .frm file is created in the database directory. This file just contains the table definition. The actual table remains in the remote mysql database.
Flow of information in case of federated tables is as follows :

SQL calls issued locally
MySQL handler API (data in handler format)
MySQL client API (data converted to SQL calls)
Remote database -> MySQL client API
Convert result sets (if any) to handler format
Handler API -> Result rows or rows-affected count to local

The structure of the federated table must be exactly similar to the remote table structure except that the ENGINE should be FEDERATED and CONNECTION parameter should contain the connection string that indicates to the federated engine how to connect to the remote mysql server and access the table over there.

connection string format:
mysql://user_name[:password]@host_name[:port_num]/db_name/tbl_name

Points to remember:

The remote table that a FEDERATED table points to must exist before you try to access the table through the FEDERATED table.

It is possible for one FEDERATED table to point to another, but you must be careful not to create a loop.

Transactions are not supported

There is no way for the FEDERATED engine to know if the remote table has changed.

The FEDERATED storage engine supports SELECT, INSERT, UPDATE, DELETE, and indexes. It does not support ALTER TABLE, or any Data Definition Language statements other than DROP TABLE. The DROP TABLE statement would drop only the local table and not the remote table.

FEDERATED tables do not work with the query cache.

ARCHIVE Storage Engine

Used for storing large amounts of data without indexes in a very small footprint. When an archive table is created, following files are created in the database directory.
.frm -> table definition
.ARZ -> DATA file
.ARM -> METADATA file
The engine uses zlib lossless data compression (check www.zlib.net ).

The Archive table is almost 75% smaller in size then the MyISAM table and almost 83% smaller in size then the transactional InnoDB engine. Even after compression of MyISAM table, Archive tables are still 7% smaller. Insert operations on an Archive table is faster than that on a MyISAM table. If table scans on both tables are compared, table scan on an Archive table is found to be faster.

Points to remember:

Supports only INSERTS and SELECTS

Does NOT support DELETE, REPLACE and UPDATE

Has row level locking

Rows are compressed as they are inserted.

A SELECT statement performs a complete table scan and uncompresses rows on demand. There is no ROW cache.

CSV Storage Engine

Stores data in text files using comma-separated values format. When a table is created 2 files are created in the database directory
.frm -> table definition
.CSV -> data file. A normal text file with comma separated values.

This engine does not support INDEXING. The data file (.csv) file can be read by spreadsheet applications like openoffice, MS office or star office.

In addition to these, there are two other storage engines:

EXAMPLE Storage engine: It is a stub engine and does nothing. You cannot perform anything on this engine except creation of tables. Tables are created as .frm files in the database directory. No data file is created and no data can be stored using this engine.
BLACKHOLE Storage engine: It acts as a black hole, accepts data and throws it away. It does not store anything. Selects always return an empty result set.

This summarizes the mysql storage engines. For details the following url can be referred.

http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html

The following link can be used to select a storage engine.

http://dev.mysql.com/doc/refman/5.1/en/pluggable-storage-choosing.html

Next ... lets c what i can write