Whatever....: September 2009

Saturday, September 19, 2009

innodb configuration and optimization

Just thought i should have a quick guide for configuring an optimized for innodb my.cnf

use innodb_data_home_dir & innodb_data_file_path to specify the location and size of the tablespace.

innodb_data_home_dir = /path/to/data
innodb_data_file_path = ibdata1:50M;ibdata2:50M:autoextend:max:10G
innodb_autoextend_increment = 10

This asks innodb to create two files ibdata1 & ibdata2 (as tablespace) and keep them in the /path/to/data directory. Both would be of 50 MB initially, but ibdata2 would expand automatically by innodb_autoextend_increment MBs (yes the value is in MB) till the max size of 10 GB. InnoDB forms the directory path for each data file by textually concatenating the value of innodb_data_home_dir to the data file name, adding a path name separator (slash or backslash) between values if necessary.

By default innodb creates all its tables in the tablespace specified by the innodb_data_file_path directory - in a single file - as specified. But there is another option using which you can ask innodb to create tablespace for each table.

innodb_file_per_table

Innodb would now store each newly created table in its own table_name.ibd file - which would contain both the data and the index.

Lets look at some optimization and performance settings for innodb

innodb_additional_mem_pool_size=4M

If you have a large number of innodb tables, you will need to increase this value. It is used to store data dictionary information and other internal data structures. If the value is low, you might see warning messages in the mysql error log.

innodb_autoinc_lock_mode=2

Innodb uses some table level locking for generating auto increment values - which is configurable. If the value is set to 0 (traditional mode), for all insert statements, a table level auto_inc lock is obtained and maintained till the end of the insert statement. If the value is set to 1(consecutive lock mode), all bulk insert-like statements obtain and maintain the auto_inc lock on the whole table. But simple inserts use a new locking method where a lightweight mutex is used during the allocation of auto-increment values, and no table-level AUTO-INC lock is used. If the value is set to 2 (interleaved lock mode), no insert-like statements use the table-level AUTO-INC lock, and multiple statements can execute at the same time. This is the fastest and most scalable lock mode, but it is not safe when using statement-based replication or recovery scenarios when SQL statements are replayed from the binary log. Here auto-increment values are guaranteed to be unique and monotonically increasing across all concurrently executing insert-like statements. However, because multiple statements can be generating numbers at the same time (that is, allocation of numbers is interleaved across statements), the values generated for the rows inserted by any given statement may not be consecutive.

innodb_buffer_pool_size=1024M

The larger the buffer pool the less disk I/O would be needed to access data in tables. Generally buffer pool should be around 50% of available memory size. Very large values can result in paging and slow down the system.

innodb_commit_concurrency=4

The number of threads that can commit at the same time. Should be equal to the number of available cpus or cores.

innodb_flush_log_at_trx_commit=2

If the value is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit. When the value is 1 (the default), the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. When the value is 2, the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues.

The default value of 1 is the value required for ACID compliance. Better performance can be achieved by setting the value different from 1, but then you can lose at most one second worth of transactions in a crash. With a value of 0, any mysqld process crash can erase the last second of transactions. With a value of 2, then only an operating system crash or a power outage can erase the last second of transactions.

innodb_log_buffer_size=8M

The size of buffer that innodb uses to write log files on disk. A large buffer allows large transactions to run without a need to write the log to disk before the transactions commit.

innodb_log_file_size=512M

The size of each log file in a log group. The combined size of log files must be less than 4GB. Sensible values range from 1MB to 1/N-th of the size of the buffer pool, where N is the number of log files in the group. The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But larger log files also mean that recovery is slower in case of a crash.

innodb_log_files_in_group=2

The number of log files in the log group.

innodb_open_files=300

The maximum no of tablespaces that innodb can keep open at one time.

innodb_support_xa=false

If this is enabled, InnoDB support for two-phase commit in XA transactions is enabled, which causes an extra disk flush for transaction preparation. If you do not wish to use XA transactions, you can disable this variable to reduce the number of disk flushes and get better InnoDB performance.

innodb_thread_concurrency=8

InnoDB tries to keep the number of operating system threads concurrently inside the engine less than or equal to the limit given by this variable. Once the number of threads reaches this limit, additional threads are placed into a wait state within a FIFO queue for execution. Threads waiting for locks are not counted in the number of concurrently executing threads. By default this value should be twice the number of available cpus/cores.

Using some of the data here, you can easily setup a machine to perform better using innodb. You will always have to choose between performance and reliability as both are inversely proportional. As performance increases the reliability of data would decrease. The variables that i have set here are for high performance only.

Tuesday, September 08, 2009

mad world - tears for fears

All around me are familiar faces
Worn out places, worn out faces
Bright and early for their daily races
Going nowhere, going nowhere

Their tears are filling up their glasses
No expression, no expression
Hide my head I want to drown my sorrow
No tommorow, no tommorow

And I find it kind of funny, I find it kind of sad
The dreams in which Im dying are the best I've ever had
I find it hard to tell you, I find it hard to take
When people run in circles its a very, very
Mad world, Mad World

Children waiting for the day they feel good
Happy birthday, happy birthday
And I feel the way that every child should
Sit and listen, sit and listen

Went to school and I was very nervous
No one knew me, no one knew me
Hello teacher tell me whats my lesson
Look right through me, look right through me

And I find it kind of funny, I find it kind of sad
The dreams in which I'm dying are the best I've ever had
I find it hard to tell you, I find it hard to take
When people run in circles its a very, very
Mad world, Mad World, enlarging your world
Mad World

Monday, September 07, 2009

moniter and improve lucene search

How ?
Simple - use lucidgaze available at http://www.lucidimagination.com/Downloads/LucidGaze-for-Lucene

With lucidgaze you could analyze

Read/write stats on index utilization, distribution and throughput

Query efficiency stats - show how effectively user input is analyzed and decomposed for processing by the index

Mapping of tokenizers, token-streams and analyzers - makes transparent how text is processed and indexed

So, we went ahead and downloaded the api but were unable to find any example program or source code. After some amount of tinkering around we were finally able to figure out how to use it.

You could get the code here : http://ai-cafe.blogspot.com/2009/09/lucid-gaze-tough-nut.html

Wednesday, September 02, 2009

DB Basics : Types of joins

Join Types

So, you have designed a normalized database design for your application. Maybe up to the 3rd normal form. And now, when you need to run queries, you would need to join the tables in the query to get the required information. There has to be some common data that allow those tables to be connected in some meaningful way. Although it’s possible to have more than one common column between two tables, most often, the join key will be the primary key of one table and a foreign key in the other.

Lets perform queries against the following table.

company : company_id (pk), company_name
model : model_id(pk), company_id(fk), model_name

Here, we have a company table and a model table in 1:n relationship. A car company can have multiple models in the market. Let us take some sample data

company_id	company_name
1	Audi
2	GM
3	Ford
4	Toyota
5	Tata
6	BMW

model_id	company_id	model_name
1	1	A4
2	1	A6
3	1	A8
4	1	Q7
5	2	Chevrolet
6	2	Hummer
7	3	Ikon
8	3	Endeavor
9	3	Fiesta
10	4	Corola
11	4	Camry
12	4	Innova
13	5	Indica
14	5	Nano
15	5	Safari
16	Null	Custom