Tuesday, April 24, 2007

realtime fulltext index

For past some days, i have been struggling with getting something which can allow me to create, insert, update and search fulltext indexes - big fulltext indexes. Not very large but somewhere around 3-4 GB of data.

You would suggest mysql, but with mysql the inserts on table with fulltext indexes is very slow. For every insert, data is put in the index which leads to slow inserts.

What else? Well, i had come across some other fulltext engines like sphinx and senna. Compiled mysql with senna but there was no benefit. The index size was almost double and the searches were also slow - as slow as mysql fulltext index.

What about sphinx. Well, had integrated sphinx with mysql. But the sphinx engine has lots of limitations - like the first 2 columns must be integer and the 3rd column should be a text. Rest all columns should be integers. So if i use sphinx, i would need to write a stored procedure and trigger it to port data from my current table to a parallel sphinx table whenever a new row is inserted. And what would i do if i have to run a query - i would be joining both the tables. Would that be fast. Dont know. Let me figure out how to get this thing running...

You would say - what about lucene - my favorite search engine. Well dear, lucene is in java and there is no way i could integrate it with mysql if i have to do it in a jiffy. I would need to design and build a library to get this thing running. Forget it. In fact dbsight provides a similar type of library. Maybe i will get that thing working and check it out. At a small price, I might get what the organization requires and that too without wasting any resources on building it.

Hope i get some solution to the situation i am in right now. I have solutions for it, but not a solution which requires least amount of resources to get it running. Till i get one, the search will continue...

3 comments:

Phil said...

Choose a way to partition the data - split it out, make a lookup table to find out where it is, and make a load of small tables. Then you have a real-time index :)

Shipra said...

hmm..so mysql inserts are too slow, senna does not give any improvement, sphinx is too limited, and lucene takes time to integrate...so have you tried out phil's suggested solution till now??

Jayant Kumar said...

The solution suggested by phil is the ultimate solution - what i call as "divide and rule". But it has to be designed in a manner that resource usage is minimized.

And each application has specific needs which needs special designing to satisfy its needs and keep the app up and running...

So, lets c..