Tuesday, August 05, 2008

Berkeley DB

I had a chance to look into BDB-XML some time back, but I did not find its performance that good. BDB, on the other hand, promises fast storage and retrieval of key-value pairs. For newcomers, BDB is an embedded database API that you use to build your own database and run a defined set of queries on it. BDB does not provide an SQL interface or even a database server. All you have is a set of APIs with which you write programs to create your database, populate it, define your own cache and indexes, and then retrieve the values for a particular key.

With BDB, there is no SQL layer, query parser or query optimizer between you and the database.
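
To make the "key-value, no SQL" idea concrete, here is a minimal in-memory sketch in plain Java. It is not the BDB API: the class KeyValueSketch and its methods are hypothetical names made up for illustration, and a TreeMap merely stands in for a store that keeps its keys sorted. The point is only that every lookup and every range scan is something you code yourself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a key-value store; this is not BDB code.
final class KeyValueSketch {
    // Keys are kept in sorted order; values are stored as opaque bytes.
    private final TreeMap<String, byte[]> records = new TreeMap<String, byte[]>();

    // Store (or overwrite) the value for a key.
    void put(String key, String value) {
        records.put(key, value.getBytes(StandardCharsets.UTF_8));
    }

    // Return the value for a key, or null if the key is absent.
    String get(String key) {
        byte[] raw = records.get(key);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    // Return all keys k with from <= k < to: a hand-written "query".
    List<String> keysInRange(String from, String to) {
        return new ArrayList<String>(records.subMap(from, to).keySet());
    }

    public static void main(String[] args) {
        KeyValueSketch kv = new KeyValueSketch();
        kv.put("user:1", "alice");
        kv.put("user:2", "bob");
        kv.put("order:9", "book");

        System.out.println(kv.get("user:1"));
        System.out.println(kv.keysInRange("user:", "user:~"));
    }
}
```

There is no query language anywhere in this sketch. If you want "all users", you decide on a key layout (here a "user:" prefix) and write the scan yourself, which is roughly the trade-off you accept with an embedded key-value engine.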

BDB comes in several flavours:

1.] BDB -> The original BDB with a C/C++ API. You write programs in C/C++ to create and access your database.
2.] BDB-JAVA -> The Java API to BDB. It uses JNI behind the scenes to communicate with the native BDB library (written in C/C++).
3.] BDB-JE -> The Java Edition of the BDB engine. It is a pure Java implementation and does not use JNI.
4.] BDB-XML -> A very sophisticated version of BDB in which you store XML documents and retrieve them using any of the keys in the XML document. It has an XQuery interface through which you can run XML-based queries and retrieve results.

The original BDB is, of course, the fastest.

To start with, we will take a look at the DPL API of BDB-JAVA. DPL stands for Direct Persistence Layer and is generally used for storing and managing Java objects in the database. DPL works best with a static database schema and requires Java 1.5 or later.

To create a database using DPL, you generally need an entity class. You then create/open a database environment and insert entity objects into an entity store within that environment. Sounds like Greek, right? Let's see an example.

Entity Class

import com.sleepycat.persist.*;
import com.sleepycat.db.*;
import com.sleepycat.persist.model.*;
import static com.sleepycat.persist.model.Relationship.*;

@Entity
public class SimpleEntityClass {

   // Primary key is pKey
   @PrimaryKey
   private String pKey;

   // Secondary key is the sKey
   @SecondaryKey(relate=MANY_TO_ONE)
   private String sKey;

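   // DPL needs a default constructor to instantiate the entity
   // when reading it back from the store; it may be private
   private SimpleEntityClass()
   {
   }

   public SimpleEntityClass(String pk, String sk)
   {
      this.pKey = pk;
      this.sKey = sk;
   }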

   public void setpKey(String data)
   {
      pKey = data;
   }

   public void setsKey(String data)
   {
      sKey = data;
   }

   public String getpKey()
   {
      return pKey;
   }

   public String getsKey()
   {
      return sKey;
   }
}


Then create a Database Access class which can insert and retrieve Entity objects from the database.

import java.util.ArrayList;
import com.sleepycat.db.*;
import com.sleepycat.persist.*;

public class SimpleDA
{
   // Primary index: primary key (String) -> entity
   PrimaryIndex<String, SimpleEntityClass> pIdx;
   // Secondary index: secondary key (String) -> primary key (String) -> entity
   SecondaryIndex<String, String, SimpleEntityClass> sIdx;

   public SimpleDA(EntityStore store) throws DatabaseException
   {
      pIdx = store.getPrimaryIndex(String.class, SimpleEntityClass.class);
      sIdx = store.getSecondaryIndex(pIdx, String.class, "sKey");
   }

   public void addEntry(String pk, String sk) throws DatabaseException
   {
      pIdx.put(new SimpleEntityClass(pk, sk));
   }

   public SimpleEntityClass findByPk(String pk) throws DatabaseException
   {
      // Returns null when no entity has this primary key
      return pIdx.get(pk);
   }

   public ArrayList<SimpleEntityClass> findBySk(String sk) throws DatabaseException
   {
      ArrayList<SimpleEntityClass> ret = new ArrayList<SimpleEntityClass>();
      // The sub-index holds only the entities whose secondary key equals sk
      EntityCursor<SimpleEntityClass> scursor = sIdx.subIndex(sk).entities();
      try
      {
         for(SimpleEntityClass sec1 = scursor.first(); sec1!=null; sec1 = scursor.next())
         {
            ret.add(sec1);
         }
      }
      finally
      {
         // Always close the cursor, even if the iteration fails
         scursor.close();
      }
      return ret;
   }
}


Create/open the environment to put and retrieve records from the database

import java.io.*;
import java.util.ArrayList;
import com.sleepycat.db.*;
import com.sleepycat.persist.*;

public class SimpleStore
{
   private static File envHome = new File("./bdbjava");
   private Environment env;
   private EntityStore store;
   private SimpleDA sda;

   public void setup() throws DatabaseException
   {
      // Put all config options here.
      EnvironmentConfig envConfig = new EnvironmentConfig();
      envConfig.setAllowCreate(true);
      envConfig.setCacheSize(536870912); //512 MB
      envConfig.setCacheCount(2); // 2 caches of 256 MB each
      envConfig.setTxnWriteNoSync(true);
      envConfig.setInitializeCache(true);
      envConfig.setThreaded(true);
      envConfig.setInitializeLogging(true);
      envConfig.setTransactional(true);

      StoreConfig sConfig = new StoreConfig();
      sConfig.setAllowCreate(true);

      try
      {
         env = new Environment(envHome, envConfig);
         store = new EntityStore(env, "MyDatabaseName", sConfig);
      }catch(Exception ex)
      {
         ex.printStackTrace();
         System.exit(-1);
      }
   }

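   // setup() declares DatabaseException, so the constructor must declare it too
   public SimpleStore() throws DatabaseException
   {
      setup();
   }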

   public void putData(String pk, String sk) throws Exception
   {
      sda = new SimpleDA(store);
      sda.addEntry(pk, sk);
   }

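   public void getDataPk(String pk) throws Exception
   {
      sda = new SimpleDA(store);
      SimpleEntityClass sec = sda.findByPk(pk);
      // findByPk returns null when the primary key is not in the store
      if(sec == null)
      {
         System.out.println("no record found for pk = "+pk);
         return;
      }
      System.out.println("pk = "+sec.getpKey()+", sk = "+sec.getsKey());
   }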

   public void getDataSk(String sk) throws Exception
   {
      sda = new SimpleDA(store);
      // A typed list lets us read entities back without casting
      ArrayList<SimpleEntityClass> data = sda.findBySk(sk);
      for(int x=0; x<data.size(); x++)
      {
         SimpleEntityClass sec = data.get(x);
         System.out.println("pk = "+sec.getpKey()+", sk = "+sec.getsKey());
      }
   }

   public void closeAll() throws Exception
   {
      store.close();
      env.close();
   }

   // The calls below throw checked exceptions, so main must declare them
   public static void main(String[] args) throws Exception
   {
      SimpleStore ss = new SimpleStore();
      ss.putData("pk1","sk1");
      ss.putData("pk2","sk2");
      ss.putData("pk3","sk1");
      ss.putData("pk4","sk3");

      ss.getDataPk("pk1");

      ss.getDataSk("sk1");

      ss.closeAll();
   }
}


So now you have your own program that creates entities of type SimpleEntityClass and stores them in the database in serialized form. These objects can be retrieved using either the primary or the secondary key.

For the relationship between primary and secondary keys please refer to http://www.oracle.com/technology/documentation/berkeley-db/db/java/com/sleepycat/persist/model/Relationship.html
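
As a rough mental model of what MANY_TO_ONE means for the secondary key used above, here is a small in-memory sketch in plain Java. It is not how BDB implements its indexes: the class SecondaryIndexSketch and its method names are hypothetical, and two maps simply stand in for the primary and secondary indexes. Many primary keys may share one secondary key value, so a lookup by secondary key returns a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of a MANY_TO_ONE secondary key; not BDB code.
final class SecondaryIndexSketch {
    // Primary index: each primary key has exactly one secondary key value.
    private final Map<String, String> primary = new HashMap<String, String>();
    // Secondary index: one secondary key value maps to many primary keys.
    private final Map<String, TreeSet<String>> secondary =
        new TreeMap<String, TreeSet<String>>();

    // Insert or overwrite a record, keeping both indexes consistent.
    void put(String pk, String sk) {
        String oldSk = primary.put(pk, sk);
        if (oldSk != null && !oldSk.equals(sk)) {
            // The record moved to another secondary key: drop the stale entry.
            TreeSet<String> stale = secondary.get(oldSk);
            stale.remove(pk);
            if (stale.isEmpty()) {
                secondary.remove(oldSk);
            }
        }
        TreeSet<String> pks = secondary.get(sk);
        if (pks == null) {
            pks = new TreeSet<String>();
            secondary.put(sk, pks);
        }
        pks.add(pk);
    }

    // Lookup by secondary key: returns every matching primary key, sorted.
    List<String> findPrimaryKeys(String sk) {
        TreeSet<String> pks = secondary.get(sk);
        return pks == null ? new ArrayList<String>() : new ArrayList<String>(pks);
    }

    public static void main(String[] args) {
        SecondaryIndexSketch idx = new SecondaryIndexSketch();
        idx.put("pk1", "sk1");
        idx.put("pk2", "sk2");
        idx.put("pk3", "sk1");
        idx.put("pk4", "sk3");
        System.out.println(idx.findPrimaryKeys("sk1")); // prints [pk1, pk3]
    }
}
```

The sketch uses the same four records as the main method of SimpleStore, so looking up "sk1" yields two primary keys. With a ONE_TO_ONE relationship, by contrast, each secondary key value could belong to at most one record.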

Since you are storing complete objects rather than just the fields you need, the database will be relatively large, and that slows things down a bit.
