Showing posts with label comparison. Show all posts

Tuesday, August 11, 2009

Document oriented data stores

A document oriented database or data store does not use tables for storing data. It stores each record as a document with certain characteristics. So multiple documents in this type of store can have different characteristics, which means a different number of fields per record and different fields in each record. The benefit is that if you are using a document oriented database to store a large number of records in a huge database, a change in the number or type of fields does not need an alter on the table. All you need to do is insert new documents with the new structure and they are stored alongside the existing ones in the current data store.
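To make the idea concrete, here is a minimal sketch of a toy in-memory "collection" in Python. It is only an illustration of the concept: the names collection, insert and find are made up for this example and are not the API of any of the data stores compared below.

```python
# A toy in-memory "collection": each document is a dict with its own fields.
# This is an illustration only, not the API of any real data store.
collection = []

def insert(doc):
    """Store a document as-is; no schema is checked or altered."""
    collection.append(dict(doc))

def find(field, value):
    """Return the documents whose `field` equals `value`."""
    return [d for d in collection if d.get(field) == value]

insert({"field1": "abc", "field2": "xyz", "n": 1})
# A later record with a different set of fields needs no "alter table" step.
insert({"field1": "abc", "n": 2, "tags": ["new", "shape"]})

matches = find("field1", "abc")
field_sets = [sorted(d.keys()) for d in matches]
```

Both documents come back from the same query even though their field lists differ, which is the property described above.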

I went ahead and compared some document-oriented data stores - Tokyo Tyrant, MongoDB and CouchDB - and I also compared them to MySQL 5.4 to get an idea of the performance advantage.

I created 3 scripts in all - 2 for insert and 1 for select - and created a 50,00,000 (5 million) record table with a very ordinary record structure:

field1 -> string, indexed
field2 -> string, indexed
date -> date, not indexed
n -> integer, indexed

The size of the original MySQL table was 214 MB data & 205 MB index - total 419 MB for 50,00,000 records. The 3 scripts can be described as follows:


  • insert script 1 : went from 1 to 25,00,000, pulling 1000 records in a single query from the source database and inserting the records one by one into the target data store.

  • insert script 2 : went from 50,00,000 down to 24,99,999, pulling 1000 records in a single query from the source database and inserting the records one by one into the target data store.

  • select script : picks records from the source database, builds 2 queries, one for field1 and one for field2, and fires both queries on the target data store.
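The original scripts were written in PHP and are not reproduced here. As a rough sketch of the pattern used by insert script 1 (pull a batch of 1000 in one query, then insert one record at a time), something like the following Python would do. The source and target are plain lists standing in for the real databases, and fetch_batch and copy_ascending are hypothetical names invented for this example.

```python
def fetch_batch(source, start, size):
    """Stand-in for one source query returning up to `size` rows from id `start`."""
    return [row for row in source if start <= row["id"] < start + size]

def copy_ascending(source, target, first_id, last_id, batch_size=1000):
    """Pull `batch_size` rows per query, then insert them one at a time."""
    copied = 0
    start = first_id
    while start <= last_id:
        size = min(batch_size, last_id - start + 1)
        for row in fetch_batch(source, start, size):
            target.append(row)  # one insert call per record
            copied += 1
        start += size
    return copied

# Small stand-in data set with the same record structure as the test table.
source = [{"id": i, "field1": f"a{i}", "field2": f"b{i}", "n": i}
          for i in range(1, 2501)]
target = []
copied = copy_ascending(source, target, first_id=1, last_id=2500)
```

Insert script 2 would follow the same shape but walk the ids downwards from the top of the table.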



Test Machine configuration:

CPU : Intel Xeon 1.6 GHz - Quad Core [4 MB L2 cache] 64 Bit
Memory : 8 GB RAM
OS : Centos 5.2 - Kernel 2.6.18 64 bit


Mysql

Used mysql version 5.4
Load during execution : 3.5
Time taken by Insert script 1 : 19682.673708916 sec for 2501000 records
Time taken by Insert script 2 : 19608.902073145 sec for 2499000 records
Time taken by Select script : 20465.380266905 sec for 8099782 records
Table engine : MyISAM
Total database size in mysql 5.4 : 215+233 = 448 MB

Of course I used MyISAM, which resulted in table locks, and I had indexes on three fields, which increased locking times on the table during inserts & selects.

MongoDB - www.mongodb.org

Used version for testing : 0.9.1
The current version available is 0.9.7, which has a lot of bug fixes and performance improvements. MongoDB is written in C++ and stores data in BSON format, which is a binary serialization of JSON documents. It has a console (similar to mysql) which can be used to fire queries - so unlike bdb, you don't have to write programs to fire queries. MongoDB has full indexing support for different columns. You can also do query profiling (like explain in mysql) to check the execution path of a query and then make optimizations if possible. It provides replication support. It can be used to store large binary data like videos very efficiently.

The one good thing about the future of MongoDB is that it will provide auto sharding of data for cloud level scalability. This feature is in alpha stage now, but it should mature in some time and could be used then.

MongoDB provides drivers/APIs for a lot of languages including Python, PHP, Java, Ruby, C++ and Perl. I downloaded the PHP driver, compiled it and installed the extension mongo.so in the PHP extensions directory. Then I ran the same tests to check out the speed of MongoDB.

Load during execution : 2.5
Time taken by Insert script 1 : 1006.6038348675 sec for 2501000 records
Time taken by Insert script 2 : 1435.0536739826 sec for 2499000 records
Time taken by Select script : 2942.2539789677 sec for 9999914 records
Total database size in mongodb : 4 GB (both data & index)

Wow, so going by the insert timings, MongoDB turns out to be approximately 16 times faster than simple MySQL MyISAM tables. On the select script it was roughly 7 times faster.
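As a quick check on those ratios, the timings reported above can be plugged into a few lines of Python. Only the numbers come from the test runs; the variable names are mine. Note that the two select runs covered different record counts (8099782 for MySQL, 9999914 for MongoDB), so the select ratio is a rough wall-clock comparison only.

```python
# Timings in seconds, copied from the MySQL and MongoDB results above.
mysql = {"insert1": 19682.673708916, "insert2": 19608.902073145,
         "select": 20465.380266905}
mongo = {"insert1": 1006.6038348675, "insert2": 1435.0536739826,
         "select": 2942.2539789677}

# Combined insert time of both scripts, MySQL relative to MongoDB.
insert_speedup = ((mysql["insert1"] + mysql["insert2"])
                  / (mongo["insert1"] + mongo["insert2"]))

# Wall-clock ratio for the select script (record counts differ between runs).
select_speedup = mysql["select"] / mongo["select"]

print(round(insert_speedup, 1), round(select_speedup, 1))
```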

But it takes up a huge amount of space. Why is that? Well, MongoDB creates data files with predefined sizes so that it does not have to grow or shrink the files as requirements change. The first file it creates is 64 MB, the next file is 128 MB and so on up to 2 GB. After that, all remaining files are 2 GB each. So even if we exceed the current 2 GB file by a single byte, another 2 GB file will be created - even if it is 98% empty.
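The allocation pattern described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this post (files doubling from 64 MB up to a 2 GB cap), not MongoDB's actual allocator, and allocated_mb is a made-up helper name.

```python
def allocated_mb(data_mb, first_mb=64, cap_mb=2048):
    """Total MB of files created to hold `data_mb` of data.

    Model: file sizes double from `first_mb` until they reach `cap_mb`,
    after which every new file is `cap_mb`.
    """
    total = 0
    size = first_mb
    while total < data_mb:
        total += size
        size = min(size * 2, cap_mb)
    return total

# Space on disk for a few data sizes, in MB.
usage = {mb: allocated_mb(mb) for mb in (1, 65, 4032, 4033)}
```

Under this model the files 64 + 128 + 256 + 512 + 1024 + 2048 MB add up to 4032 MB, which is in line with the roughly 4 GB reported above, and a single extra megabyte beyond that would add another full 2 GB file.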

Another important fact about MongoDB is that its storage engine uses memory-mapped files for performance. This limits the data size on a 32 bit machine to around 2GB. I had hit this limitation earlier, so I used a 64 bit machine for testing. But this architecture of data files in the MongoDB storage engine allows the code to be much plainer and simpler, and open to embracing the 64 bit world. MongoDB does not support traditional locking, which is another reason it is fast.

More info / References :
32 bit limitation => blog.mongodb.org/post/137788967/32-bit-limitations
performance testing => www.mongodb.org/display/DOCS/Performance+Testing
regarding mongodb => www.mongodb.org/display/DOCS/Home
Quickstart => www.mongodb.org/display/DOCS/Quickstart
Production deployments => www.mongodb.org/display/DOCS/Production+Deployments
Locking in mongodb => www.mongodb.org/display/DOCS/Atomic+Operations

Tokyo Cabinet/Tyrant

The Tokyo family is a set of 3 applications:
- tokyo cabinet : the embedded data store => tokyocabinet.sourceforge.net/spex-en.html
- tokyo tyrant : network api => tokyocabinet.sourceforge.net/tyrantdoc/
- tokyo dystopia : fulltext search system => tokyocabinet.sourceforge.net/dystopiadoc/

I explored tokyo cabinet-1.4.29 & tokyo tyrant-1.1.30. So after installing tokyo cabinet, I had to go ahead and install tokyo tyrant on top of tokyo cabinet for the networking support. Tokyo tyrant provides a set of binaries for talking to the tokyo cabinet embedded database. So you can use tokyo tyrant to insert and select data from a tokyo cabinet server.

I got an api for php to interact with the tokyo tyrant server. It is known as Php Tyrant and it can be obtained from mamasam.indefero.net/p/tyrant/. It is written completely in php and is not a C extension.

If you go through the documentation, you will see that Tokyo Cabinet mainly supports 4 types of databases.

Hash database - a key value store which uses a hash algorithm to retrieve records. The time complexity in this case is constant [ O(1) ].
B+ tree database - a B+Tree is slower than the hash database. Records of a B+Tree are sorted and arranged in logical pages, so the time complexity of record retrieval is O(log n). But the size of a B+Tree database is half that of a hash database.
Fixed length database - It is faster than a hash database. But it has the restriction that each key has to be a natural number and the length of each value is limited. The whole region of the database is mapped to memory by the mmap call, which reduces overhead related to file I/O.
Table database - It is closer to a document oriented database than to a key value store. You can also create indexes on columns in the table to improve search and sorting performance. Indices are implemented as separate files of B+Tree database.
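The difference between the hash and B+ tree lookups can be illustrated with plain Python structures. This is an analogy only and does not use Tokyo Cabinet: a dict stands in for the hash database, and a sorted key list searched with bisect stands in for the ordered B+ tree. All the names here are made up for the example.

```python
import bisect

# Hash-style store: average constant-time lookup by exact key.
records = {f"key{i:04d}": i for i in range(1000)}

# Ordered store: keys kept sorted, values in a parallel list.
sorted_keys = sorted(records)
sorted_values = [records[k] for k in sorted_keys]

def hash_get(key):
    """O(1) average lookup, like the hash database."""
    return records.get(key)

def ordered_get(key):
    """O(log n) lookup by binary search over sorted keys."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return sorted_values[i]
    return None

def ordered_range(lo, hi):
    """Keys in [lo, hi); easy because the keys are already sorted."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    return sorted_keys[start:end]
```

Because the ordered structure keeps its keys sorted, it can also answer range queries such as ordered_range("key0010", "key0013"), which a plain hash cannot do without scanning every key.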

So I created a simple table database with the structure I wanted, created indexes on the required columns, and started running my scripts. I found that Tokyo Cabinet is fast in the beginning, but it suffers from file locks: any operation happening on the table locks the complete file. This caused a lot of problems with simultaneous inserts & selects. In fact I had cases where the server stopped responding totally, and it took 1 hr 17 minutes to push in just 661 inserts & fire 270 selects. So I stopped running the select script and focused on running multiple insert scripts, and then ran the select script separately.

This means that if you implement Tokyo Tyrant on a huge table, you should be aware of the file locks and should run the selects on a slave of the main database. A master can handle multiple inserts while a slave can handle multiple selects. Or you can form a queue so that queries are fired sequentially instead of simultaneously.

I ran 2 insert scripts simultaneously and the results were even worse than MySQL.
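The queueing idea can be sketched in Python with the standard queue and threading modules. I did not use this in the tests; it is only an illustration, with a plain dict standing in for the Tokyo Tyrant table, and run_serialized, make_insert and make_select are hypothetical helper names.

```python
import queue
import threading

def run_serialized(operations):
    """Run callables one at a time on a single worker, returning results in order."""
    jobs = queue.Queue()
    results = []

    def worker():
        while True:
            op = jobs.get()
            if op is None:  # sentinel: no more work
                break
            results.append(op())

    t = threading.Thread(target=worker)
    t.start()
    for op in operations:
        jobs.put(op)
    jobs.put(None)
    t.join()
    return results

store = {}  # stand-in for the table; only the worker thread touches it

def make_insert(key, value):
    def op():
        store[key] = value
        return ("insert", key)
    return op

def make_select(key):
    return lambda: ("select", store.get(key))

results = run_serialized([
    make_insert("a", 1), make_select("a"),
    make_insert("b", 2), make_select("b"),
])
```

Since only one operation touches the store at a time, inserts and selects never compete for the file lock. The price is that throughput is limited to what a single worker can do.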

Load during execution : 2.9
Time taken by Insert script 1 : 65671.21945715 sec for 2501000 records
Time taken by Insert script 2 : 63807.51564312 sec for 2499000 records
Total database size : 1095 MB [715 MB data + 380 MB index]

And i ran 2 select scripts simultaneously and the results were comparable to that of mysql.

Load during execution : 2.1
Time taken by Select script 1 : 20269.342120886 sec for 9999914 records
Time taken by Select script 2 : 20115.848437071 sec for 9999914 records

What could be said about tokyo tyrant is that it should be used as a key-value store - maybe by using a hash database or a fixed length database for persistent session storage.

Couchdb
couchdb.apache.org/

The third database I looked into was CouchDB, which I had heard a lot about. The good thing about CouchDB is that it provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests. CouchDB is written in Erlang and is quite fast. Again, it does not have a C extension. There is a PHP API known as phpillow which wraps all the functionality of CouchDB in its function calls.

You can download phpillow at arbitracker.org/phpillow.html

Also note that it works only with PHP 5.3. With PHP 5.2.x it still has a lot of bugs. Firstly, there is no proper documentation, so it took me some time to figure out how to write insertion and select scripts and how to go about creating views (CouchDB calls its indexes views). When I ran the scripts I found that it was crashing a lot. So I did not go ahead with proper testing of CouchDB. CouchDB also supports master-master replication (with developer supplied conflict resolution).
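Because the API is plain HTTP plus JSON, a wrapper library is not strictly needed. As a minimal sketch, this is how a request that stores one document could be built with Python's standard library. The host, database name (testdb) and document id (rec1) are assumptions made up for the example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

def build_put_document(base_url, db, doc_id, doc):
    """Build (but do not send) the HTTP request that stores `doc` under `doc_id`."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical server address, database and document id.
req = build_put_document(
    "http://localhost:5984", "testdb", "rec1",
    {"field1": "abc", "field2": "xyz", "n": 1},
)
```

Passing req to urllib.request.urlopen against a running CouchDB server with an existing database would send it. A GET on the same URL reads the document back.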

You could go ahead and read the comparison of couchdb & mongodb at mongodb's page

www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
www.mongodb.org/display/DOCS/MongoDB%2C+CouchDB%2C+MySQL+Compare+Grid

Monday, November 12, 2007

xbox 360 vs sony ps3 vs nintendo wii

Yes, another comparison. But from an end user perspective. I have recently got an Xbox 360.



Each gaming console has its own advantages and disadvantages. Let's go through the tech specs first to see which one is the best technically...

Microsoft Xbox 360 :

Cost - 20,000/- INR for the pro console which contains a 20 GB hdd and various wires for connecting with various devices. 18,000 for the core console (basic without hdd and wires).

Processor - 3.2 GHz PowerPC with 3 dual threaded processor cores

Graphics - ATI based custom processor @ 500 MHz Clock speed.

Video RAM - Up to 512 MB GDDR3 system RAM (700 MHz) plus 10 MB embedded DRAM (eDRAM) frame buffer

Video memory bandwidth - 21.6 GBps to system RAM; 256 GBps to eDRAM

Video resolution - 16:9 widescreen 720p, 1080i, 1080p (will downsample to standard definition). HDTV output supported.

Sound - Dolby Pro-Logic II(analog), 5.1 channel Dolby Digital. Number of voices is software based and limited only by the CPU and memory available.

System memory - 512 MB GDDR3 RAM (700 MHz), shared with GPU. Memory bandwidth of 22.4 GBps.

Drives - Optical 12X dual layered DVD drive and 20 GB removable hard drive.

Memory card ports - 2 Xbox 360 memory units (64 MB or 512 MB).

USB 2.0 ports - 3

Networking - 1 ethernet port 100 Mbps. No Wi-Fi. Online gaming via Xbox LIVE. You can also download full length movies through Xbox LIVE Gold.


Sony Playstation 3 (ps3) :

Cost - 30,000/- INR for the console which includes a 60 GB Hard drive and almost everything. A higher model is also available with 80 GB Hard drive.

Processor - 3.2 GHz Cell processor with 7 single-threaded synergistic processing unit cores.

Graphics - NVIDIA-based RSX "Reality Synthesizer" @ 550 MHz

Video RAM - 256MB GDDR3 (700MHz)

Video memory bandwidth - 22.4 GBps

Video resolution - 480i, 480p, 720p, 1080i, 1080p (will downsample to standard definition). HDTV output supported.

Sound - Stereo (Analog sound). 5.1 Channel Dolby digital and 7.1 Channel LPCM. Number of voices is limited to 320 compressed channels if hardware based. For software based voices the number is limited only by the available CPU and memory.

System memory - 256 MB XDR RAM (3.2GHz). Memory bandwidth of 25.6 GBps.

Drives - Optical Blu-Ray and 60 GB or 80 GB replaceable hard drive.

Memory card ports - Flash memory card reader (supports Memory Stick, Compact Flash and SD/MMC).

USB 2.0 ports - 4

Networking - 1 ethernet port 1 Gbps. Bluetooth 2.0. Bluetooth controller interface. Wi-Fi. Free PlayStation Network with micropayment system; includes a Web browser. Individual game makers can choose to charge for online services.



Nintendo wii :

Cost - Sorry, did not check the price in india. In USD it is $249. There is only 1 model. It is the cheapest of all consoles.

Processor - 729 MHz IBM Broadway processor with 5 execution units

Graphics - ATI Hollywood processor @ 243 MHz

Video RAM - 24 MB of system RAM (486 MHz) plus 3 MB of embedded DRAM (eDRAM)

Video memory bandwidth - 3.9 GBps

Video resolution - 853 x 480 (480p) in widescreen or 4:3 aspect ratio. HDTV output NOT supported.

Sound - Dolby Pro-Logic II (Analog sound). No digital sound output. Number of voices is limited to Hardware DSP with 64+ channels

System memory - 64 MB GDDR3 RAM. Memory bandwidth of 1.9 GBps.

Drives - Proprietary optical drive. No hard drive. Has 512 MB internal flash memory for storing saved games, downloaded games and other data.

Memory card ports - 1 SD card slot, 2 GameCube memory card ports.

USB 2.0 ports - 2

Networking - No ethernet ports. Integrated Wi-Fi for networking and internet access. Bluetooth 2.0 for controllers. Wii Network online service includes online shopping, Web browsing, messaging, and other features.

So from all this, the following things are clear:

-> Wii is underpowered compared to the other two.
-> Xbox 360 is ok, but PS3 has the best configuration.
-> In terms of networking and interconnectivity between devices, Xbox is the best. When connected to a windows media centric PC, it can be used to record and play live media. In addition, media can be downloaded and played.
-> PS3 seems more like a PC than a gaming machine.
-> In terms of games, the Xbox 360 has numerous options to choose from. Though the Xbox 360 does not support most of the legacy Xbox games, it still has a wide range of titles. The PS3 does not have that many titles of its own, but since PS2 and PS1 games can also be played on the PS3, the number of titles available for it also goes up. And the Wii supports almost all original GameCube titles. This means that if you have a PS2 or a GameCube, it would make more sense to upgrade to a PS3 or a Wii respectively, since you would then be able to play your older games on the new machine.
-> The best thing about the PS3 is its video graphics, which are said to be awesome, in addition to the next gen Blu-Ray drive and Wi-Fi. Wi-Fi on the Xbox 360 would have made it irresistible.
-> And the thing about Wii is its wireless remote which uses accelerometers to sense how players swing, point, and tilt the controller, encouraging game titles to incorporate activity. So to swing a golf club in Wii Sports, which is bundled with the console, you literally swing the remote controller as if it were a golf club. That is something which is unavailable with xbox 360 or PS3.
-> If you are looking at power consumption factor then both Xbox 360 and PS3 are power hogs. Wii is very efficient (uses only 18 Watts of power). Xbox 360 is still a bit better than PS3 in terms of power consumption.

If you are looking at a complete entertainer with lots of games to choose from and easy connectivity between various "windows" based devices that you have, then Xbox 360 is recommended.

But before going out and purchasing an Xbox 360, please be aware that it is a Microsoft product and is made to crash. The "Ring of 3 red lights of death" - that's what is supposed to happen with the Xbox. The problem, as I found out after some googling, is that the Xbox 360 unit is not properly cooled. So if you decide to buy an Xbox-360, be prepared to have a fan (preferably an A/C) on top of it cooling it down. Microsoft has, though, been prompt in replacing defective Xbox-360s with new ones, so in case your Xbox-360 dies, you can call up Microsoft and ask for a replacement. PS3 and Wii have not had any issue of this type till now. They seem to be more properly tested.

There is also this funda of modding an Xbox-360. You can go to specific places where they place chips in your Xbox-360 for around 2000/-, after which you can play most of the pirated games, which come for 200-300 rs. The original Xbox-360 games cost around 2000-2500/-, which is too much to spend for a normal person.

The thing I liked about the Xbox was its support - not from Microsoft, but from the local community. It is difficult to hear about PS3 or Wii, but you hear about the Xbox-360 a lot because there are lots of people using it. And once you go live on Xbox-360 (it requires a minimum of 512 KBps internet connection), you have the whole world to play against.

Wii also has tonnes of games, but it is more appealing to small kids. Xbox has games both for serious players and kids. Whereas PS3 has some games which are mostly for serious players.

Source:

http://www.winsupersite.com/showcase/xbox360_ps3_wii.asp
http://www.hardcoreware.net/reviews/review-356-1.htm
http://www.winsupersite.com/showcase/xbox360_how_to_choose.asp
And some other sites...