Showing posts with label java. Show all posts
Showing posts with label java. Show all posts

Wednesday, June 08, 2011

Data structures - TRIE

Trie is a data structure used for data retrieval. It is a multiway tree structure used to store strings over an alphabet. It is mainly used for large dictionaries of english words in spell-checking and natural language understanding programs. The idea is that all strings which have a common stem or prefix hang off a common node. the elements in a string can be recovered by scanning from the root to the leaf that ends the string. All strings in a trie can be recovered by doing a depth first scan of the tree.

The time to insert, delete or find is almost identical because the code paths followed for each are almost identical. As a result, for inserting, deleting or searching, tries can easily beat binary search trees or even hash tables.

Advantages of Trie over Binary Search Tree
1. Looking up of keys is faster. Worst case scenario a key of length m takes O(m) time. BST performs search by doing O(log(n)) comparisons. In worst case BST takes O(m log n) time. Simple operations that tries use during lookup such as array indexing using a character are fast on real machines.
2. Tries are more space efficient when they contain a large number of short keys, because nodes are shared between keys with common initial subsequences.
3. Tries facilitate longest-prefix matching, helping to find the key sharing the longest possible prefix of characters all unique.
4. The number of internal nodes from root to leaf equals the length of the key. Balancing the tree is therefore no concern.

Advantages of Trie over Hash Tables
1. Tries support ordered iteration, whereas iteration over a hash table will result in a pseudorandom order given by the hash function (also, the order of hash collisions is implementation defined), which is usually meaningless
2. Tries facilitate longest-prefix matching, but hashing does not, as a consequence of the above. Performing such a "closest fit" find can, depending on implementation, be as quick as an exact find.
3. Tries tend to be faster on average at insertion than hash tables because hash tables must rebuild their index when it becomes full - a very expensive operation. Tries therefore have much better bounded worst case time costs, which is important for latency sensitive programs.
4. By avoiding the hash function, tries are generally faster than hash tables for small keys like integers and pointers.

Some disadvantages of Trie
1. Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random access time is high compared to main memory
2. Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless a bitwise trie can handle standard IEEE single and double format floating point numbers.

Code for implementing trie in java
import java.util.HashMap;
import java.util.Map;

public class Trie {
    private static class Node {
        private boolean isWord = false; // indicates a complete word
        private int prefixes = 0; // indicates how many words have the prefix
        private Map children = new HashMap(); // references to all possible children
    }

    private Node root = new Node();

    /**
     * Inserts a new word into the trie
     * @param word
     */
    public void insertWord(String word){
        if(searchWord(word) == true) return;

        Node current = root;
        for(char c : word.toCharArray()){
            if(current.children.containsKey(Character.valueOf(c))){
                Node child = current.children.get(Character.valueOf(c));
                child.prefixes++;

                current = child;
            }else{
                Node child = new Node();
                child.prefixes = 1;

                current.children.put(Character.valueOf(c), child);
                current = child;
            }
        }
        // we have reached the endof the word, hence mark it true
        // if during a search we reach the end of the search string and this
        // flag is still false, then the search string is not registered as a valid
        // word in the trie but is a prefix
        current.isWord = true;
    }

    /**
     * Searches for a word in the trie
     * @param word
     */
    public boolean searchWord(String word){
        Node current = root;
        for(char c : word.toCharArray()){
            if(current.children.containsKey(Character.valueOf(c))){
                current = current.children.get(Character.valueOf(c));
            }else{
                return false;
            }
        }
        // if at the end of the search of entire word the boolean variable is
        // still false means that the given word is not regitered as a valid
        // word in the trie, but is a prefix
        return current.isWord;
    }

    /**
     * Deletes a word from the trie
     * @param word
     */
    public void deleteWord(String word){
        if(searchWord(word) == false) return;

        Node current = root;
        for(char c : word.toCharArray()){
            Node child = current.children.get(Character.valueOf(c));
            if(child.prefixes == 1){
                current.children.remove(Character.valueOf(c));
                return;
            }else{
                child.prefixes--;
                current = child;
            }
        }
        // since the word is removed now, set the flag to false
        current.isWord = false;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insertWord("ball");
        trie.insertWord("balls");
        trie.insertWord("bat");
        trie.insertWord("doll");
        trie.insertWord("dork");
        trie.insertWord("dorm");
        trie.insertWord("send");
        trie.insertWord("sense");

        // testing deletion
        System.out.println("Testing deletion : ");
        System.out.println(trie.searchWord("ball"));
        trie.deleteWord("ball");
        System.out.println(trie.searchWord("ball"));
        System.out.println(trie.searchWord("balls"));

        // testing insertion
        System.out.println("Testing insertion : ");
        System.out.println(trie.searchWord("dumb"));
        trie.insertWord("dumb");
        System.out.println(trie.searchWord("dumb"));
        trie.deleteWord("dumb");
        System.out.println(trie.searchWord("dumb"));
    }
} 

OUTPUT : java Trie

Testing deletion :
true
false
true
Testing insertion :
false
true

Source : http://en.wikipedia.org/wiki/Trie,
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie/

Wednesday, June 16, 2010

Writing your first map-reduce program on hadoop

Before we go ahead with the actual program, lets have a look at what map-reduce is and its usage.

Map-Reduce is a programming model or concept which helps in implementing and processing large scale data sets. MapReduce model consists of two functions map() and reduce(). The map() function is applied to all input items in the input data set converting them to a set of key-value pairs. Multiple map jobs can be run in parallel over different processing nodes to process large data sets.

Map (K1, V1) = List (K2, V2).

The list of processed pairs (K2,V2) is collected by the MapReduce framework. All pairs with the same key are grouped together, creating one group for each one of the generated keys. The reduce function is then applied to each group in parallel to produce a collection of "similar" values.

Reduce (K2, List(V2)) = List (V3)

Since both map and reduce jobs can be run in parallel, they could be distributed over a large number of processing nodes to process large data sets.

Lets write a simple program to count the number of characters in each string using the MapReduce concept but without using any framework. Our program here would create some random strings and use MapReduce concept to distribute the task over a number of threads and eventually output the total number of characters.

// MyMapReduce.java
import java.util.*;

public class MyMapReduce
{
  List<List<String>> buckets = new ArrayList<List<String>>();
  List<String> intermediateresults = new ArrayList<String>();
  List<String> values = new ArrayList<String>();

  public void init()
  {
    for(int i = 1; i<=30; i++)
    {
      values.add("zyx" + new Integer(i).toString());
    }
    System.out.println("**Running Conversion into Buckets**\n");
    //convert the input data in smaller chunks. Here dividing 30 strings into chunks of 6 chunks of 5.
    List b = step1ConvertIntoBuckets(values,5);
    System.out.println("*DONE*\n");
    System.out.println("**Running #Map Function# concurrently for all Buckets\n");
    List res = step2RunMapFunctionForAllBuckets(b);
    System.out.println("*MAP Done*\n");
    System.out.println("**Running #Reduce Function# for collating Intermediate Results and Printing Results\n");
    step3RunReduceFunctionForAllBuckets(res);
    System.out.println("*REDUCE Done*\n");
  }
  public List step1ConvertIntoBuckets(List list,int numberofbuckets)
  {
    int n = list.size();
    int m = n / numberofbuckets;
    int rem = n% numberofbuckets;

    int count = 0;
    System.out.println("BUCKETS");
    for(int j =1; j<= numberofbuckets; j++)
    {
      List<String> temp = new ArrayList<String>();
      for(int i=1; i<= m; i++)
      {
        temp.add((String)values.get(count));
        count++;
      }
      buckets.add(temp);
      temp = new ArrayList<String>();
    }
    if(rem != 0)
    {
      List<String> temp = new ArrayList<String>();
      for(int i =1; i<=rem;i++)
      {
        temp.add((String)values.get(count));
        count++;
      }
      buckets.add(temp);
    }
    System.out.println(buckets);
    return buckets;
  }

  public List step2RunMapFunctionForAllBuckets(List list)
  {
    for(int i=0; i< list.size(); i++)
    {
      List<String> elementList = (ArrayList)list.get(i);
      new StartThread(elementList).start();
    }
    try
    {
      Thread.currentThread().sleep(1000);
    }catch(Exception e)
    {    }
    return intermediateresults;
  }

  public void step3RunReduceFunctionForAllBuckets(List list)
  {
    int sum =0;
    for(int i=0; i< list.size(); i++)
    {
      //you can do some processing here, like finding max of all results etc
      int t = Integer.parseInt((String)list.get(i));
      sum += t;
    }
    System.out.println("\nTotal Count is "+ sum+"\n");
  }

  class StartThread extends Thread
  {
    private List<String> tempList = new ArrayList<String>();
    public StartThread(List<String> list)
    {
      tempList = list;
    }
    public void run()
    {
      System.out.println("In Map...");
      for (String str : tempList)
      {
        synchronized(this)
        {
          intermediateresults.add(new Integer(str.length()).toString());
        }
      }
    }
  }
  public static void main(String[] args)
  {
    MyMapReduce my = new MyMapReduce();
    my.init();
  }
}

In this program, you have an arraylist of 30 strings. It is being divided into 5 batches of 6 strings. Each Batch is run in parallel in a thread and the processed information is collected in an intermediate arraylist. The reduce function is then called which loops through the intermediate arraylist and sums up the counts to output the information. Here only the Map function is being distributed over multiple threads. The reduce simply loops through the output of map and sums up the information.

Compile and run the program - output is :

$ java MyMapReduce 
**Running Conversion into Buckets**

BUCKETS
[[zyx1, zyx2, zyx3, zyx4, zyx5, zyx6], [zyx7, zyx8, zyx9, zyx10, zyx11, zyx12], [zyx13, zyx14, zyx15, zyx16, zyx17, zyx18], [zyx19, zyx20, zyx21, zyx22, zyx23, zyx24], [zyx25, zyx26, zyx27, zyx28, zyx29, zyx30]]
*DONE*

**Running #Map Function# concurrently for all Buckets

In Map...
In Map...
In Map...
In Map...
In Map...
*MAP Done*

**Running #Reduce Function# for collating Intermediate Results and Printing Results


Total Count is 141

*REDUCE Done*

A mapreduce framework like hadoop provides the distributed setup to run such mapreduce programs over a large number of processing nodes. It can automatically handle the spawning of processing threads and store the intermediatory information in some temporary file among its nodes.

Pre-requisite : setup hadoop on multiple nodes refer - http://jayant7k.blogspot.com/2008/10/setting-up-hadoop.html

Here is the same code written in the hadoop 0.20.1 map-reduce framework

import java.io.*;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount extends Configured implements Tool {

  public static class MapClass extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  /**
   * A reducer class that just emits the sum of the input values.
   */
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  static int printUsage() {
    System.out.println("wordcount [-r <reduces>] <input> <output>");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "WordCount example for hadoop 0.20.1");

    job.setJarByClass(WordCount.class);
    job.setMapperClass(MapClass.class);
    job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);

    // the keys are words (strings)
    job.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    job.setOutputValueClass(IntWritable.class);


    List<String> other_args = new ArrayList<String>();
    for(int i=0; i < args.length; ++i) {
      try {
        // The number of map tasks was earlier configurable, 
        // But with hadoop 0.20.1, it is decided by the framework.
        // Since this heavily depends on the input data size and how it is being split.
        if ("-r".equals(args[i])) {
          job.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " +
            args[i-1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " + other_args.size() + " instead of 2.");
      return printUsage();
    }
    FileInputFormat.addInputPath(job, new Path(other_args.get(0)));
    FileOutputFormat.setOutputPath(job, new Path(other_args.get(1)));

    //submit job and wait for completion. Also show output to user.
    job.waitForCompletion(true);
    return 0;
  }
  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}
Compile, create jar and run the file
$ javac -classpath ../hadoop-0.20.1/hadoop-0.20.1-core.jar -d wordcount_classes/ WordCount.java
$ jar -cvf wordcount.jar -C wordcount_classes/ .
$ cd ../hadoop-0.20.1
$ ./bin/hadoop jar ../progs/wordcount.jar WordCount -r 2 /user/hadoop/input /user/hadoop/output

10/06/16 17:00:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/06/16 17:00:48 INFO input.FileInputFormat: Total input paths to process : 2
10/06/16 17:00:48 INFO mapred.JobClient: Running job: job_201006141203_0008
10/06/16 17:00:49 INFO mapred.JobClient:  map 0% reduce 0%
10/06/16 17:00:58 INFO mapred.JobClient:  map 100% reduce 0%
10/06/16 17:01:10 INFO mapred.JobClient:  map 100% reduce 100%
10/06/16 17:01:12 INFO mapred.JobClient: Job complete: job_201006141203_0008
10/06/16 17:01:12 INFO mapred.JobClient: Counters: 17
10/06/16 17:01:12 INFO mapred.JobClient:   Job Counters 
10/06/16 17:01:12 INFO mapred.JobClient:     Launched reduce tasks=2
10/06/16 17:01:12 INFO mapred.JobClient:     Launched map tasks=2
10/06/16 17:01:12 INFO mapred.JobClient:     Data-local map tasks=2
10/06/16 17:01:12 INFO mapred.JobClient:   FileSystemCounters
10/06/16 17:01:12 INFO mapred.JobClient:     FILE_BYTES_READ=2009
10/06/16 17:01:12 INFO mapred.JobClient:     HDFS_BYTES_READ=1467
10/06/16 17:01:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4142
10/06/16 17:01:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1356
10/06/16 17:01:12 INFO mapred.JobClient:   Map-Reduce Framework
10/06/16 17:01:12 INFO mapred.JobClient:     Reduce input groups=0
10/06/16 17:01:12 INFO mapred.JobClient:     Combine output records=142
10/06/16 17:01:12 INFO mapred.JobClient:     Map input records=33
10/06/16 17:01:12 INFO mapred.JobClient:     Reduce shuffle bytes=2021
10/06/16 17:01:12 INFO mapred.JobClient:     Reduce output records=0
10/06/16 17:01:12 INFO mapred.JobClient:     Spilled Records=284
10/06/16 17:01:12 INFO mapred.JobClient:     Map output bytes=2200
10/06/16 17:01:12 INFO mapred.JobClient:     Combine input records=190
10/06/16 17:01:12 INFO mapred.JobClient:     Map output records=190
10/06/16 17:01:12 INFO mapred.JobClient:     Reduce input records=142
Here is an explanation of the program

Configuration is the configuration which is replicated at all nodes in HDFS. All HDFS nodes run on their own JVM. JobTracker runs on master and accepts jobs from clients. It distributes the job into parts and sends the parts over to all the nodes via TaskTracker. taskTracker communicates with the jobtracker and keeps on asking for more job to run - as and when it finishes its task. Tasktracker can run multiple instances for multiple tasks. You set the mapper class and the reducer class in job.

job.setMapperClass();
job.setReducerClass();

You can also specify the input and output paths in the job

FileInputFormat.addInputPath(job, Path);
FileOutputFormat.setOutputPath(job, Path);

You can also set a number of other options in the Job.

job.setNumReduceTasks(); //set number of reduce tasks
job.setOutputFormat(); //set the output format

Job is submitted by calling either of.

job.submit(); //send the job to the cluster and return - non blocking
job.WaitForCompletion(); // send the job to the cluster and wait for its completion

Job determines the proper division of input into parts and sends job data to master jobtracker server. Tasktracker asks the jobtracker for its inputsplit data and the job to run. It calls mapper for each record received from the inputSplit. Mapper should extend the mapreduce.Mapper class which has code for some of the required functions of mapper. All mappers run separately on each node inside the tasktracker running on separate instances of jvm. There is no sharing of data. If you set a static variable in one mapper, you will not get its value in another mapper on another tasktracker.

Map(Object key, Object value, Context context)

To allow serialization and transfer of all types of data, java defines its own writable class. These box classes like Text (for String), IntWritable (for integers), LongWritable (for long) are instances of base class Writable (for values), and instances of WritableComparable (for Keys). Context is used to collect and write the ouput into intermediate as well as final files.

context.write() takes (Key,Value)

Partitioner is used to get the partition number for a given key. By default HashPartitioner is used which uses key.hashCode() to return the partition number. You can use job to specify custo
m partitioner.

Reducer(Object Key, Iterator Values, Context context)

Keys and values sent to one partition all go to the same reduce task. Reduce calls are sorted by key. Reducer sends the data to the recordwriter which writes the data to a output file.

Hadoop provides lots of flexibility to override the components and customize the input and output data.

HadoopStreaming can be used to interface the hadoop map-reduce framework with arbitary program code. It uses stdin and stdout for data flow. You can define a separate program for each of map
per and reducer in any language of your choice.

Friday, May 28, 2010

Quick sort algorithm

Quicksort follows the simple strategy of dividing the data into parts and sorting it.

Simple steps for the algorithm are:
  • Pick an element, called a pivot, from the list.
  • Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
  • Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

Time complexity of the quick sort algorithm is O(n log n). And the worst case time complexity is O(n^2). Quicksort has a space complexity of O(log n).

Lets see a java program to do quick sort

public class QuickSort {

    public int[] data;
    public QuickSort(int[] data)
    {
        this.data = data;
    }

    public void sortMe()
    {
        this.sortMe(data, 0, data.length-1);
    }

    public void sortMe(int[] array, int start, int end)
    {
        int i=start;
        int k=end;

        if((end - start) < 1) //return if there is < 1 element
            return;

        int pivot = array[start]; //take an integer as pivot
        while(k > i) //while the left and right indices have not met
        {
            //from left to right find first element greater than pivot
            while(array[i] <= pivot && i <= end && k > i) 
                i++;
            //from right to left find first element smaller than pivot
            while(array[k] > pivot && k >= start && k >= i)
                k--;
            //if left index is still smaller than right, swap the elements
            if(k > i)
                swap(array, i, k);
        }
        //swap the last element in the left partition with the pivot.
        swap(array, start, k);

        //quicksort left partition
        sortMe(array, start, k-1);
        //quicksort right partition
        sortMe(array, k+1, end);
    }

    public void swap(int[] array, int idx1, int idx2)
    {
        int temp=array[idx1];
        array[idx1] = array[idx2];
        array[idx2] = temp;
    }

    public void printMe(int[] data)
    {
        for(int x=0; x<data.length; x++)
        {
            System.out.print(data[x]+", ");
        }
        System.out.println();
    }

    public static void main(String[] args)
    {
        int[] data = {9,2,6,4,11,88,66,44,55,32,23,5,32,10};
        QuickSort qs = new QuickSort(data);
        System.out.print("Before Sorting : ");
        qs.printMe(data);
        qs.sortMe();        
        System.out.print("After Sorting : ");
        qs.printMe(qs.data);
    }

}

Friday, April 09, 2010

An intro to Thrift

What is thrift - a software framework for scalable cross language service development. It combines software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, Ruby, Erlang, Perl, Smalltalk etc.

Thrift allows you to define data types and service interfaces in a simple definition file. Taking the definition file as input the thrift compiler generates code that can be used to easily build RPC clients and servers that communicate seamlessly across programming languages.

You can find all this stuff at http://incubator.apache.org/thrift/.

My intention here is to provide a step by step guide to installing and building simple services in java/php using thrift. Since till date there is no proper tutorial, this should be helpful.

To download thrift, you can either get it from http://incubator.apache.org/thrift/download/ or do

$ svn co http://svn.apache.org/repos/asf/incubator/thrift/trunk thrift
$ cd thrift

Installation steps

$ ./bootstrap.sh
$ ./configure
$ make
$ make install


thrift depends on c++ boost libraries. To install boost libraries on ubuntu do

$ sudo apt-get install libboost-dev libboost-doc libboost-thread-dev libboost-python-dev

or

$ sudo apt-get install libboost-*

One problem i ran into while doing "sudo make install" was

Error: JAVA_HOME is not defined correctly.
We cannot execute java

The solution to this is

$ su
$ <enter password>
$ export JAVA_HOME=/path/to/jdk
$ export ANT_HOME=/path/to/ant
$ export PATH=$JAVA_HOME/bin:$ANT_HOME/bin:$PATH
$ make install


It would install the required libraries for all supported languages.

Now lets try writing a very simple service using thrift and use php/java clients to communicate with the service.

First we need to create a thrift definition file.

$ mkdir test # create a separate working directory
$ cd test
$ vim time.thrift # paste the following code in time.thrift

namespace java tserver.gen  // define namespace for java code
namespace php tserver  // define namespace for php code

typedef i64 Timestamp

service TimeServer {  // define the service you want to implement
  Timestamp time(),   // define the funcionalities the service would provide.
            string date(),
            i64 multiply(1:i32 num1, 2:i32 num2),
}
Namespaces are the places where code would be generated - what we call package in java.

Create a separate directory to store source files

$ mkdir src
$ cd src
$ vim TimeServerImpl.java

Create a java source file for implementing the functions that we had defined in the time.thrift file. The name of the file is the Service name (as defined in the thrift file) followed by "Impl". Provide the body of the functions in this file.

package server;

import java.util.*;
import org.apache.thrift.*;
import tserver.gen.*;

class TimeServerImpl implements TimeServer.Iface
{
  public long time() throws TException
  {
    long time = System.currentTimeMillis();
    System.out.println("Current time : "+time);
    return time;
  }

  public String date() throws TException
  {
    Calendar now = Calendar.getInstance();
    String dt = now.get(Calendar.YEAR)+"-"+(now.get(Calendar.MONTH)+1)+"-"+now.get(Calendar.DAY_OF_MONTH)+" "+now.get(Calendar.HOUR_OF_DAY)+":"+now.get(Calendar.MINUTE)+":"+now.get(Calendar.SECOND);
    System.out.println(" date : "+dt);
    return dt;
  }

  public long multiply(int num1, int num2) throws TException
  {
    System.out.println("Got : "+num1+", "+num2);
    return num1*num2;
  }
}

$ vim Server.java

Create a file for providing the service. This program simply has a main function which binds the service to a particular port and makes the server ready to accept connections and provide response. This code will generally remain constant unless you want to provide additional functionality at server level.

package server;

import java.io.*;
import org.apache.thrift.protocol.*;
import org.apache.thrift.protocol.TBinaryProtocol.*;
import org.apache.thrift.server.*;
import org.apache.thrift.transport.*;
import tserver.gen.*;

public class Server
{
  private void start()
  {
    try
    {
      TServerSocket serverTransport = new TServerSocket(7911);
      TimeServer.Processor processor = new TimeServer.Processor(new TimeServerImpl());
      Factory protFactory = new TBinaryProtocol.Factory(true, true);
      TServer server = new TThreadPoolServer(processor, serverTransport, protFactory);
      System.out.println("Starting server on port 7911 ...");
      server.serve();
    }catch(TTransportException e)
    {
      e.printStackTrace();
    }
  }

  public static void main(String[] args)
  {
    Server srv = new Server();
    srv.start();
  }
}

$ vim TimeClient.java

And create a client which will connect to the server on the specified port and call the provided functions.
package client;

import java.net.*;
import org.apache.thrift.*;
import org.apache.thrift.protocol.*;
import org.apache.thrift.transport.*;

import tserver.gen.TimeServer.Client;

public class TimeClient {
  private void start(){
    TTransport transport;
    try {
      transport = new TSocket("localhost", 7911);
      TProtocol protocol = new TBinaryProtocol(transport);
      // this client depends on the client class in the gen-java/tserver/gen/TimeServer.java file
      // public static class Client implements TServiceClient, Iface
      Client client = new Client(protocol); 
      transport.open();
      long time = client.time();
      System.out.println("Time from server:" + time);
      String dt = client.date();
      System.out.println("Got date from server: "+dt);
      long res = client.multiply(30,100);
      System.out.println("Multiply from server: "+res);
      transport.close();
    } catch (TTransportException e) {
      e.printStackTrace();
    } catch (TException e) {
      e.printStackTrace();
    }
  }

  public static void main(String[] args) {
    TimeClient c = new TimeClient();
    c.start();
  }
}
Now lets move ahead and generate the thrift code. And compile everything.

$ thrift --gen java time.thrift
$ vim build.xml


Write a simple build.xml to build these files
<project name="tserver" default="tserver" basedir=".">
<description>Time Server Tutorial</description>

<property name="src" location="src" />
<property name="gen" location="gen-java" />
<property name="build" location="build" />
<property name="cpath" location="/usr/local/lib/libthrift.jar:/usr/local/lib/slf4j-api-1.5.11.jar" />

<target name="init">
<tstamp />
<mkdir dir="${build}"/>
</target>

<target name="compile" depends="init">
<javac srcdir="${gen}" destdir="${build}" classpath="${cpath}" >  
<compilerarg value="-Xlint:deprecation"/> 
</javac>
<javac srcdir="${src}" destdir="${build}" classpath="${cpath}:${gen}" >   
<compilerarg value="-Xlint:deprecation"/> 
</javac>
</target>

<target name="tserver" depends="compile">
<jar jarfile="tserver.jar" basedir="${build}"/>
</target>

<target name="clean">
<delete dir="${build}" />
<delete file="tserver.jar" />
</target>

</project>
If you will check out the build file, you will see that the classpath contains slf4j-api-x.y.z.jar file. Download these files from http://www.slf4j.org/download.html and put two files (slf4j-api-x.y.z.jar and slf4j-simple-x.y.z.jar) from the archive in the /usr/local/lib directory to satisfy the required dependencies.

To complile the file run ant and create two files runClient and runServer to run the server and clients.

$ ant
$ cat runClient
java -cp tserver.jar:/usr/local/lib/libthrift.jar:/usr/local/lib/slf4j-api-1.5.11.jar:/usr/local/lib/slf4j-simple-1.5.11.jar client.TimeClient
$ cat runServer
java -cp tserver.jar:/usr/local/lib/libthrift.jar:/usr/local/lib/slf4j-api-1.5.11.jar:/usr/local/lib/slf4j-simple-1.5.11.jar server.Server


On compiling you will get the tserver.jar file created.

Now run the server and the client

$ ./runServer
Starting server on port 7911 ...

$ ./runClient
Time from server:1270726146716
Got date from server: 2010-4-8 16:59:6
Multiply from server: 3000


Now lets try building a client in php. First we will generate the thrift-code for php and then write the client to interact with the java server.

$ thrift --gen php time.thrift
$ mkdir php # create a directory to store your php client source code.
$ cd php
$ vim PhpClient.php

<?php

//location of thrift php libraries
$GLOBALS['THRIFT_ROOT'] = './lib';

require_once $GLOBALS['THRIFT_ROOT'].'/Thrift.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/THttpClient.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

error_reporting(E_ALL);
$GEN_DIR = '../gen-php';
require_once $GEN_DIR.'/time/TimeServer.php';
require_once $GEN_DIR.'/time/time_types.php';
error_reporting(E_ALL);

try {
  $socket = new TSocket('localhost', 7911);
  $transport = new TBufferedTransport($socket, 1024, 1024);
  $protocol = new TBinaryProtocol($transport);
  // this client depends on the client class in the gen-php/time/TimeServer.php file
  // class TimeServerClient implements TimeServerIf
  $client = new TimeServerClient($protocol);

  $transport->open();

  $time = $client->time();
  print "time frm server = $time\n";
  $res = $client->multiply(100,5000);
  print "multiply frm server = $res\n";
  $transport->close();

} catch (TException $tx) {
  print 'TException: '.$tx->getMessage()."\n";
}
?>
As you can see, in the code, the thrift library for php is included in the lib directory inside the php directory.
Create the lib directory and copy the required library files from thrift installation directory to the lib directory.

$ mkdir lib
$ cp -r /path/to/thrift/lib/php/src/* lib/


Now try running your php client. Remember to fire up your java server.

$ php -q PhpClient.php

You may get tons of errors "failed opening './lib/packages/time/time_types.php' for inclusion". For this you will need to edit the generated php files. The time_types.php file is in the generated directory "gen-php".

$ vim ../gen-php/time/TimeServer.php

Add the following lines

include_once '../gen-php/time/time_types.php';
//include_once $GLOBALS['THRIFT_ROOT'].'/packages/time/time_types.php';

now "php -q PhpClient.php" works...

Monday, March 22, 2010

Design patterns : Decorator pattern

A decorator is a class which wraps the original class. This allows the user to extend the functionality of certain objects at runtime. The pattern is designed in a way that multiple decorators can be stacked on top of each other. Decorators are used to avoid the rigidity that subclassing provides.

php example
<?php                                                                                                                                                                   
/**                                                                                                                                                                     
 * The smallest cohesive interface we can think of for this type        
 * of Decorator. This is the Component interface.
*/                                                                                                                                                                     
interface HtmlElement                                                   
{ /**                                                                                                * @return string    html code
*/                                                                                                                                                                   
  public function __toString();                                         

  /**
   * @return string    the name of the POST request key for this element,
   *                   aka the "name" attribute.                         
   */                                                                    
  public function getName();                                             
}                                                                        

/**
 * Represents a <input type="text" /> html element.
 * It can be created programmatically and then printed.  
 * This is the only ConcreteComponent.                   
 */                                                      
class InputText implements HtmlElement                   
{                                                        
  protected $_name;                                      

  public function __construct($name)
  {                                 
    $this->_name = $name;           
  }                                 

  public function getName()
  {                        
    return $this->_name;   
  }                        

  public function __toString()
  {                           
    return "<input type=\"text\" id=\"{$this->_name}\" name=\"{$this->_name}\" />\n";
  }                                                                                        
}                                                                                          

/**
 * Very simple base class to share the wrapping code between Decorators.
 * This is the Decorator participant.                                   
 */                                                                     
abstract class HtmlDecorator implements HtmlElement                     
{                                                                       
  protected $_element;                                                  

  public function __construct(HtmlElement $input)
  {                                              
    $this->_element = $input;                    
  }                                              

  /**
   * All operations are delegated by default, not changing anything
   * of the original behavior.                                     
   * ConcreteDecorators will override the methods they are interested in.
   */                                                                    
  public function getName()                                              
  {                                                                      
    return $this->_element->getName();                                   
  }                                                                      

  public function __toString()
  {                           
    return $this->_element->__toString();
  }                                      
}                                        

/**
 * Adds a custom <label> element alongside the <input> one.
 * Example of ConcreteDecorator.                           
 */                                                        
class LabelDecorator extends HtmlDecorator                 
{
  protected $_label;                                       

  public function setLabel($label)
  {                               
    $this->_label = $label;       
  }                               

  public function __toString()
  {                           
    $name = $this->getName(); 
    return "<label for=\"{$name}\">{$this->_label}</label>\n"
      . $this->_element->__toString();                                      
  }                                                                         
}                                                                           

/**
 * Adds a <span> containing an error message after the <input> element.
 * Example of ConcreteDecorator.                                       
 */                                                                    
class ErrorDecorator extends HtmlDecorator                             
{                                                                      
  protected $_error;                                                   

  public function setError($message)
  {                                 
    $this->_error = $message;       
  }                                 

  public function __toString()
  {                           
    return $this->_element->__toString()
      . "<span>{$this->_error}</span>\n";
  }                                                     
}                                                       

$input = new InputText('nickname');
$labelled = new LabelDecorator($input);
$labelled->setLabel('Nick:');          
echo "Using labelDecorator => \n",$labelled, "\n";

$input = new InputText('nickname');
$error = new ErrorDecorator($input);
$error->setError('You must enter a unique nickname');
echo "Using errorDecorator => \n",$error, "\n";   

// how can we obtain a LabelledErrorInputText, which has both the <label>
// and <span> elements?                                                  
$input = new InputText('nickname');                                      
$labelled = new LabelDecorator($input);                                  
$labelled->setLabel('Nick:');                                            
$error = new ErrorDecorator($labelled); // a Decorator wrapping another one
$error->setError('You must enter a unique nickname');                      
echo "Using both labelDecorator & errorDecorator => \n".$error;         
?>                                                                         

Output : 

Using labelDecorator =>
<label for="nickname">Nick:</label>
<input type="text" id="nickname" name="nickname" />

Using errorDecorator =>
<input type="text" id="nickname" name="nickname" />
<span>You must enter a unique nickname</span>

Using both labelDecorator & errorDecorator =>
<label for="nickname">Nick:</label> 
<input type="text" id="nickname" name="nickname" />
<span<You must enter a unique nickname>/span>


Some code in java

interface iComponent                                             
{                                                                
  public void doStuff();                                         
}                                                                

class component implements iComponent
{                                    
  public void doStuff()              
  {                                  
    System.out.println("component does stuff");
  }                                            
}                                              

interface decorator extends iComponent
{                                     
  public void addedBehaviour();       
}                                     

class concreteDecorator implements decorator
{                                           
  iComponent comp;                          
  public concreteDecorator(iComponent comp) 
  {                                         
    super();                                
    this.comp = comp;                       
  }                                         

  public void addedBehaviour()
  {                           
    System.out.println("Added behaviour in decorator");
  }                                                    

  public void doStuff()
  {                    
    comp.doStuff();    
    addedBehaviour();  
  }                    
}                      

class concreteDeco2 implements decorator
{                                       
  iComponent comp;                      
  public concreteDeco2(iComponent comp) 
  {                                     
    super();                            
    this.comp = comp;                   
  }                                     

  public void addedBehaviour()
  {
    System.out.println("Added behaviour in deco no 2");
  }

  public void doStuff()
  {
    comp.doStuff();
    addedBehaviour();
  }

}

public class decoClient
{
  public static void main(String[] args)
  {
    iComponent comp = new component();
    decorator deco = new concreteDecorator(comp);
    deco.doStuff();
    decorator deco2 = new concreteDeco2(comp);
    deco2.doStuff();
  }
}

Output :

component does stuff
Added behaviour in decorator
component does stuff
Added behaviour in deco no 2

Wednesday, February 24, 2010

Design Patterns : Composite Design Pattern

A composite design pattern builds complex objects out of simple objects and itself like a tree structure. Its goal is managing a hierarchy of objects where both leaf objects and composition of other objects conform to a common interface.

The client sends a message to the head component. The component declares the interface that various parts of the graph should respect. A leaf is a concrete class that has no children. A composite is a concrete class that composes other components.

PHP Code :

/**
 * Component interface.
 * The Client depends only on this abstraction, whatever graph is built using
 * the specializations.
 */
interface HtmlElement
{
  /**
   * @return string   representation
   */
  public function __toString();
}

/**
 * Leaf sample implementation.
 * Represents an <h1> element.
 */
class H1 implements HtmlElement
{
  private $_text;

  public function __construct($text)
  {
    $this->_text = $text;
  }

  public function __toString()
  {
    return "<h1>{$this->_text}</h1>";
  }
}

/**
 * Leaf sample implementation.
 * Represents a <p> element.
 */
class P implements HtmlElement
{
  private $_text;

  public function __construct($text)
  {
    $this->_text = $text;
  }

  public function __toString()
  {
    return "<p>{$this->_text}</p>";
  }
}

/**
 * A Composite implementation, which accepts as children generic Components.
 * These children may be H1, P or even other Divs.
 */
class Div implements HtmlElement
{
  private $_children = array();

  public function addChild(HtmlElement $element)
  {
    $this->_children[] = $element;
  }

  public function __toString()
  {
    $html = "<div>\n";
    foreach ($this->_children as $child) {
      $childRepresentation = (string) $child;
      $childRepresentation = str_replace("\n", "\n    ", $childRepresentation);
      $html .= '    ' . $childRepresentation . "\n";
    }
    $html .= "</div>";
    return $html;
  }
}

// Client code
$div = new Div();
$div->addChild(new H1('Title'));
$div->addChild(new P('Lorem ipsum...'));
$sub = new Div();
$sub->addChild(new P('Dolor sit amet...'));
$div->addChild($sub);
echo $div, "\n";


Output

<div>
  <h1>Title</h1>
  <p>Lorem ipsum...</p>
  <div>
    <p>Dolor sit amet...</p>
  </div>
</div>

Java Code : 

import java.util.List;
import java.util.ArrayList;

/** "Component" */
interface Graphic {

  //Prints the graphic.
  public void print();
}

/** "Composite" */
class CompositeGraphic implements Graphic {

  //Collection of child graphics.
  private List mChildGraphics = new ArrayList();

  //Prints the graphic.
  public void print() {
    for (Graphic graphic : mChildGraphics) {
      graphic.print();
    }
  }

  //Adds the graphic to the composition.
  public void add(Graphic graphic) {
    mChildGraphics.add(graphic);
  }

  //Removes the graphic from the composition.
  public void remove(Graphic graphic) {
    mChildGraphics.remove(graphic);
  }
}

/** "Leaf" */
class Ellipse implements Graphic {

  String myname = "Ellipse";

  public Ellipse(int i){
    this.myname = myname + ":"+i;
  }

  //Prints the graphic.
  public void print() {
    System.out.println(this.myname);
  }
}

/** Client */
public class Program {

  public static void main(String[] args) {
    //Initialize four ellipses
    Ellipse ellipse1 = new Ellipse();
    Ellipse ellipse2 = new Ellipse();
    Ellipse ellipse3 = new Ellipse();
    Ellipse ellipse4 = new Ellipse();

    //Initialize three composite graphics
    CompositeGraphic graphic = new CompositeGraphic();
    CompositeGraphic graphic1 = new CompositeGraphic();
    CompositeGraphic graphic2 = new CompositeGraphic();

    //Composes the graphics
    graphic1.add(ellipse1);
    graphic1.add(ellipse2);
    graphic1.add(ellipse3);

    graphic2.add(ellipse4);

    graphic.add(graphic1);
    graphic.add(graphic2);

    //Prints the complete graphic (four times the string "Ellipse").
    graphic.print();
  }
}

OUTPUT:

Ellipse:1
Ellipse:2
Ellipse:3
Ellipse:4

Wednesday, February 10, 2010

Design patterns : Bridge Design Pattern

Bridge pattern decouples the abstraction from its implementation so that both can vary independently. It is quite similar to the adapter pattern. An adapter pattern makes two unrelated, existing classes work together, when the two participants were not thought to be aware of each other during design. A bridge pattern separates concerns and is chosen at the design level before the creation of participating classes.

There is an abstraction that the client sees. There is an implementor which provides the interface for actual implementation. A refined abstraction which provides extension to the abstraction's functionalities. And a ConcreteImplementor which is an implementation of the implementor.

PHP Code :

abstract class BridgeBook
{
  private $bbAuthor;
  private $bbTitle;
  private $bbImp;
  function __construct($author_in, $title_in, $choice_in)
  {
    $this->bbAuthor = $author_in;
    $this->bbTitle  = $title_in;
    if ('STARS' == $choice_in)
    {
      $this->bbImp = new BridgeBookStarsImp();
    } else
    {
      $this->bbImp = new BridgeBookCapsImp();
    }
  }
  function showAuthor()
  {
    return $this->bbImp->showAuthor($this->bbAuthor);
  }
  function showTitle()
  {
    return $this->bbImp->showTitle($this->bbTitle);
  }
}

class BridgeBookAuthorTitle extends BridgeBook
{
  function showAuthorTitle()
  {
    return $this->showAuthor() . "'s " . $this->showTitle();
  }
}

class BridgeBookTitleAuthor extends BridgeBook
{
  function showTitleAuthor()
  {
    return $this->showTitle() . ' by ' . $this->showAuthor();
  }
}

abstract class BridgeBookImp
{
  abstract function showAuthor($author);
  abstract function showTitle($title);
}

class BridgeBookCapsImp extends BridgeBookImp
{
  function showAuthor($author_in)
  {
    return strtoupper($author_in);
  }
  function showTitle($title_in)
  {
    return strtoupper($title_in);
  }
}

class BridgeBookStarsImp extends BridgeBookImp
{
  function showAuthor($author_in)
  {
    return Str_replace(" ","*",$author_in);
  }
  function showTitle($title_in)
  {
    return Str_replace(" ","*",$title_in);
  }
}

echo('BEGIN'."\n");

echo('test 1 - author title with caps');
$book = new BridgeBookAuthorTitle('Larry Truett','PHP for Cats','CAPS');
echo($book->showAuthorTitle());
echo("\n");

echo('test 2 - author title with stars');
$book = new BridgeBookAuthorTitle('Larry Truett','PHP for Cats','STARS');
echo($book->showAuthorTitle());
echo("\n");

echo('test 3 - title author with caps');
$book = new BridgeBookTitleAuthor('Larry Truett','PHP for Cats','CAPS');
echo($book->showTitleAuthor());
echo("\n");

echo('test 4 - title author with stars');
$book = new BridgeBookTitleAuthor('Larry Truett','PHP for Cats','STARS');
echo($book->showTitleAuthor());
echo("\n");

echo('END'."\n");

Output : 
BEGIN
test 1 - author title with capsLARRY TRUETT's PHP FOR CATS
test 2 - author title with starsLarry*Truett's PHP*for*Cats
test 3 - title author with capsPHP FOR CATS by LARRY TRUETT
test 4 - title author with starsPHP*for*Cats by Larry*Truett
END



Java Code :

/** "Implementor" */
interface DrawingAPI 
{
  public void drawCircle(double x, double y, double radius);
}

/** "ConcreteImplementor" 1/2 */
class DrawingAPI1 implements DrawingAPI 
{
  public void drawCircle(double x, double y, double radius) 
  {
    System.out.printf("API1.circle at %f:%f radius %f\n", x, y, radius);
  }
}

/** "ConcreteImplementor" 2/2 */
class DrawingAPI2 implements DrawingAPI 
{
  public void drawCircle(double x, double y, double radius) 
  { 
    System.out.printf("API2.circle at %f:%f radius %f\n", x, y, radius);
  }
}

/** "Abstraction" */
interface Shape 
{
  public void draw();                             // low-level
  public void resizeByPercentage(double pct);     // high-level
}

/** "Refined Abstraction" */
class CircleShape implements Shape 
{
  private double x, y, radius;
  private DrawingAPI drawingAPI;
  public CircleShape(double x, double y, double radius, DrawingAPI drawingAPI) 
  {
    this.x = x;  
    this.y = y;  
    this.radius = radius; 
    this.drawingAPI = drawingAPI;
  }

  // low-level i.e. Implementation specific
  public void draw() 
  {
    drawingAPI.drawCircle(x, y, radius);
  }   
  // high-level i.e. Abstraction specific
  public void resizeByPercentage(double pct) 
  {
    radius *= pct;
  }
}

/** "Client" */
public class Bridge
{
  public static void main(String[] args) 
  {
    Shape[] shapes = new Shape[2];
    shapes[0] = new CircleShape(1, 2, 3, new DrawingAPI1());
    shapes[1] = new CircleShape(5, 7, 11, new DrawingAPI2());

    for (Shape shape : shapes) 
    {
      shape.resizeByPercentage(2.5);
      shape.draw();
    }
  }
}

Output :

API1.circle at 1.000000:2.000000 radius 7.500000
API2.circle at 5.000000:7.000000 radius 27.500000

Tuesday, February 09, 2010

Design patterns : Adapter Design Pattern

Adapter design pattern converts the interface of a class into another interface that the client expects. It lets the classes work together that otherwise could not because of incompatible interfaces. It wraps the existing class in a compatible interface acceptible to the the client.

The actors here are the client - which makes use of an object which implements the target interface, the target is the point of extension of the client module, adaptee - object of different module or library and the adapter is an implementation of the target which forwards real work to the adaptee and also hides him from the client.

PHP Code :

class SimpleBook
{
  private $author;
  private $title;
  function __construct($author_in, $title_in)
  {
    $this->author = $author_in;
    $this->title  = $title_in;
  }
  function getAuthor()
  {
    return $this->author;
  }
  function getTitle()
  {
    return $this->title;
  }
}

class BookAdapter
{
  private $book;
  function __construct(SimpleBook $book_in)
  {
    $this->book = $book_in;
  }
  function getAuthorAndTitle()
  {
    return $this->book->getTitle().' by '.$this->book->getAuthor();
  }
}

// client

echo "BEGIN\n";

$book = new SimpleBook("PHP Author", "PHP Book on Design patterns");
$bookAdapter = new BookAdapter($book);
echo "Author and Title: ".$bookAdapter->getAuthorAndTitle()."\n";

echo "END\n";

Output:

BEGIN
Author and Title: PHP Book on Design patterns by PHP Author
END

JAVA Code:

class LegacyLine
{
  public void draw(int x1, int y1, int x2, int y2)
  {
    System.out.println("line from (" + x1 + ',' + y1 + ") to (" + x2 + ','  + y2 + ')');
  }
}

class LegacyRectangle
{
  public void draw(int x, int y, int w, int h)
  {
    System.out.println("rectangle at (" + x + ',' + y + ") with width " + w  + " and height " + h);
  }
}

interface Shape
{
  void draw(int x1, int y1, int x2, int y2);
}

class Line implements Shape
{
  private LegacyLine adaptee = new LegacyLine();
  public void draw(int x1, int y1, int x2, int y2)
  {
    adaptee.draw(x1, y1, x2, y2);
  }
}

class Rectangle implements Shape
{
  private LegacyRectangle adaptee = new LegacyRectangle();
  public void draw(int x1, int y1, int x2, int y2)
  {
    adaptee.draw(Math.min(x1, x2), Math.min(y1, y2), Math.abs(x2 - x1),   Math.abs(y2 - y1));
  }
}

public class AdapterDemo
{
  public static void main(String[] args)
  {
    Shape[] shapes = 
    {
      new Line(), new Rectangle()
    };
    // A begin and end point from a graphical editor
    int x1 = 10, y1 = 20;
    int x2 = 30, y2 = 60;
    for (int i = 0; i < shapes.length; ++i)
      shapes[i].draw(x1, y1, x2, y2);
  }
}

Output : 

line from (10,20) to (30,60)
rectangle at (10,20) with width 20 and height 40

Design Patterns : Prototype pattern

The prototype pattern specifies the kind of objects to create using a prototypical instance, and creates new objects by copying the prototype. A clone method is available in the class which can be used to create new objects of the same kind.

The pattern consists of a prototype - an interface of the cloneable classes. A ConcretePrototype implements the clone operations and a client actually clones the prototype instance to use the clones as new objects.

Simple examples

PHP
abstract class BookPrototype
{
  protected $title;
  protected $topic;
  abstract function __clone();
  function getTitle()
  {
    return $this->title;
  }
  function setTitle($titleIn)
  {
    $this->title = $titleIn;
  }
  function getTopic()
  {
    return $this->topic;
  }
}

class PHPBookPrototype extends BookPrototype
{
  function __construct()
  {
    $this->topic = 'PHP';
  }
  function __clone()
  {  }
}

class SQLBookPrototype extends BookPrototype
{
  function __construct()
  {
    $this->topic = 'SQL';
  }
  function __clone()
  {  }
}

echo("BEGIN \n");

$phpProto = new PHPBookPrototype();
$sqlProto = new SQLBookPrototype();

$book1 = clone $sqlProto;
$book1->setTitle('SQL Book prototype');
echo('Book 1 topic: '.$book1->getTopic()."\n");
echo('Book 1 title: '.$book1->getTitle()."\n");

$book2 = clone $phpProto;
$book2->setTitle('PHP Book prototype');
echo('Book 2 topic: '.$book2->getTopic()."\n");
echo('Book 2 title: '.$book2->getTitle()."\n");

$book3 = clone $sqlProto;
$book3->setTitle('SQL Book prototype II');
echo('Book 3 topic: '.$book3->getTopic()."\n");
echo('Book 3 title: '.$book3->getTitle()."\n");

echo("END\n");

Output:

BEGIN 
Book 1 topic: SQL
Book 1 title: SQL Book prototype
Book 2 topic: PHP
Book 2 title: PHP Book prototype
Book 3 topic: SQL
Book 3 title: SQL Book prototype II
END

Java :

public class proto 
{
  interface Xyz 
  {
    Xyz cloan();
  }

  static class Tom implements Xyz 
  {
    public Xyz cloan()    
    {
      return new Tom();
    }
    public String toString() 
    {
      return "tom prototype";
    }
  }

  static class Dick implements Xyz 
  {
    public Xyz    cloan()    
    {
      return new Dick();
    }
    public String toString() 
    {
      return "dick prototype";
    }
  }

  static class Harry implements Xyz 
  {
    public Xyz    cloan()    
    {
      return new Harry();
    }
    public String toString() 
    {
      return "harry prototype";
    }
  }

  static class Factory 
  {
    private static java.util.Map prototypes = new java.util.HashMap();
    static 
    {
      prototypes.put( "tom",   new Tom() );
      prototypes.put( "dick",  new Dick() );
      prototypes.put( "harry", new Harry() );
    }
    public static Xyz makeObject( String s ) 
    {
      return ((Xyz)prototypes.get(s)).cloan();
    }
  }

  public static void main( String[] args ) 
  {
    for (int i=0; i < args.length; i++) 
    {
      System.out.println( Factory.makeObject( args[i] ) + "  " );
    }
  }
}

Output : 
$ java proto tom dick harry harry tomtom prototype  
dick prototype  
harry prototype  
harry prototype  
tom prototype 

Monday, February 08, 2010

Design patterns : Factory Method

Factory method defines an interface for creating an object, but lets the subclasses decide which class to instantiate. Factory method lets a class defer instantiation to subclasses.

Factory method makes the design more customizable and only a little more complicated. Other design patterns require new classes whereas factory method requires only a new operation.

Factory method is quite similar to the Abstract Factory. In fact every relevant method of Abstract Factory is a Factory Method. Factory methods could be moved out of their classes in subsequent refactoring and evolve in an external Abstract Factory.

The participating actors are the "Product" - the abstraction of the created object, a "concrete product" - an implementation of the product. "Creator" is the base class which declares a default Factory Method and "ConcreteCreator" is a subclass of the creator which overrides the Factory Method to return a Concrete Product.

Lets check some code

PHP : lets check some code similar to those we wrote for Abstract Factory
abstract class AbstractFactoryMethod
{
  abstract function makeBook($param);
}

class OReillyFactoryMethod extends AbstractFactoryMethod
{
  private $context = "OReilly";
  function makeBook($param)
  {
    $book = NULL;
    switch($param)
    {
      case "PHP":
        $book = new OReillyPHPBook;
      break;
      case "MySQL":
        $book = new OReillyMySQLBook;
      break;
      default:
      $book = new OReillyPHPBook;
      break;
    }
    return $book;
  }
}

class SAMSFactoryMethod extends AbstractFactoryMethod
{
  private $context = "SAMS";
  function makeBook($param)
  {
    $book = NULL;
    switch($param)
    {
      case "PHP":
        $book = new SamsPHPBook;
      break;
      case "MySQL":
        $book = new SamsMySQLBook;
      break;
      default:
      $book = new SamsMySQLBook;
      break;
    }
    return $book;
  }
}

abstract class AbstractBook
{
  abstract function getAuthor();
  abstract function getTitle();
}

class OReillyPHPBook extends AbstractBook
{
  private $author;
  private $title;
  function __construct()
  {
    $this->author = "Author - OReilly php";
    $this->title = "Title - OReilly php";
  }

  function getAuthor() {return $this->author;}
  function getTitle() {return $this->title;}
}

class SamsPHPBook extends AbstractBook
{
  private $author;
  private $title;
  function __construct()
  {
    $this->author = "Author - SAMS php";
    $this->title = "Title - SAMS php";
  }

  function getAuthor() {return $this->author;}
  function getTitle() {return $this->title;}
}

class OReillyMySQLBook extends AbstractBook
{
  private $author;
  private $title;
  function __construct()
  {
    $this->author = "Author - OReilly MySQL";
    $this->title = "Title - OReilly MySQL";
  }

  function getAuthor() {return $this->author;}
  function getTitle() {return $this->title;}
}

class SamsMySQLBook extends AbstractBook
{
  private $author;
  private $title;
  function __construct()
  {
    $this->author = "Author - SAMS MySQL";
    $this->title = "Title - SAMS MySQL";
  }

  function getAuthor() {return $this->author;}
  function getTitle() {return $this->title;}
}

function testFactoryMethod($factoryMethodInstance)
{
  $phpBook = $factoryMethodInstance->makeBook("PHP");
  echo "php author : ".$phpBook->getAuthor()."\n";
  echo "php title : ".$phpBook->getTitle()."\n";

  $mysqlBook = $factoryMethodInstance->makeBook("MySQL");
  echo "mysql author : ".$mysqlBook->getAuthor()."\n";
  echo "mysql title : ".$mysqlBook->getTitle()."\n";
}

echo("Begin\n");
echo("Testing OReillyFactoryMethod\n");
$bookMethodInstance = new OReillyFactoryMethod;
testFactoryMethod($bookMethodInstance);
echo("----\n");

echo("Testing SamsFactoryMethod\n");
$bookMethodInstance = new SamsFactoryMethod;
testFactoryMethod($bookMethodInstance);
echo("----\n");

Output : 

Begin
Testing OReillyFactoryMethod
php author : Author - OReilly php
php title : Title - OReilly php
mysql author : Author - OReilly MySQL
mysql title : Title - OReilly MySQL
----
Testing SamsFactoryMethod
php author : Author - SAMS php
php title : Title - SAMS php
mysql author : Author - SAMS MySQL
mysql title : Title - SAMS MySQL
----

And some example in java

abstract class Pizza 
{
  public abstract int getPrice(); // count the cents
}

class HamAndMushroomPizza extends Pizza 
{
  public int getPrice() 
  {
    return 850;
  }
}

class DeluxePizza extends Pizza 
{
  public int getPrice() 
  {
    return 1050;
  }
}

class HawaiianPizza extends Pizza 
{
  public int getPrice() 
  {
    return 1150;
  }
}

class PizzaFactory 
{
  public enum PizzaType 
  {
    HamMushroom,
      Deluxe,
      Hawaiian
  }

  public static Pizza createPizza(PizzaType pizzaType) 
  {
    switch (pizzaType) 
    {
      case HamMushroom:
        return new HamAndMushroomPizza();
      case Deluxe:
        return new DeluxePizza();
      case Hawaiian:
        return new HawaiianPizza();
    }
    throw new IllegalArgumentException("The pizza type " + pizzaType + " is not recognized.");
  }
}

class myPizza
{
  // Create all available pizzas and print their prices
  public static void main (String args[]) 
  {
    for (PizzaFactory.PizzaType pizzaType : PizzaFactory.PizzaType.values()) 
    {
      System.out.println("Price of " + pizzaType + " is " + PizzaFactory.createPizza(pizzaType).getPrice());
    }
  }
}

Output : 

$ java myPizza 
 Price of HamMushroom is 850
 Price of Deluxe is 1050
 Price of Hawaiian is 1150


Friday, February 05, 2010

Design patterns : Builder

The purpose of a builder is to separate the construction process of a complex object from its representation so that the same construction process can be used to create different representations.

The participating actors are a "director" which interprets the information and invokes the "builder" to get the object built. The builder creates parts of a complex object each time it is called and maintains the state of the object. Eventually when the product is complete, the client can retrieve the product from the "builder".

Lets check out some code:

In PHP

abstract class AbstractPageBuilder //abstract builder
{
  abstract function getPage();
}

abstract class AbstractPageDirector //abstract director
{
  abstract function __construct(AbstractPageBuilder $builder_in);
  abstract function buildPage();
  abstract function getPage();
}

class HTMLPage //product 
{
  private $page = NULL;
  private $page_title = NULL;
  private $page_heading = NULL;
  private $page_text = NULL;
  function __construct() 
  {  }
  function showPage() 
  {
    return $this->page;
  }
  function setTitle($title_in) 
  {
    $this->page_title = $title_in;
  }
  function setHeading($heading_in) 
  {
    $this->page_heading = $heading_in;
  }
  function setText($text_in) 
  {
    $this->page_text .= $text_in;
  }
  function formatPage() 
  {
    $this->page  = "<html>\n";
    $this->page .= "<head><title>".$this->page_title."</title></head>\n";
    $this->page .= "<body>\n";
    $this->page .= "<h1>".$this->page_heading."</h1>\n";
    $this->page .= $this->page_text;
    $this->page .= "\n</body>\n";
    $this->page .= "</html>";
  }
}

class HTMLPageBuilder extends AbstractPageBuilder //concrete builder
{
  private $page = NULL;
  function __construct() 
  {
    $this->page = new HTMLPage();
  }
  function setTitle($title_in) 
  {
    $this->page->setTitle($title_in);
  }
  function setHeading($heading_in) 
  {
    $this->page->setHeading($heading_in);
  }
  function setText($text_in) 
  {
    $this->page->setText($text_in);
  }
  function formatPage() 
  {
    $this->page->formatPage();
  }
  function getPage() 
  {
    return $this->page;
  }
}

class HTMLPageDirector extends AbstractPageDirector //concrete director
{
  private $builder = NULL;
  public function __construct(AbstractPageBuilder $builder_in) 
  {
    $this->builder = $builder_in;
  }
  public function buildPage() 
  {
    $this->builder->setTitle('Testing HTMLPage Title');
    $this->builder->setHeading('Testing HTMLPage Heading');
    $this->builder->setText('Body1 Testing, testing, testing!');
    $this->builder->setText('Body2 Testing, testing, testing, or!');
    $this->builder->setText('Body3 Testing, testing, testing, more!');
    $this->builder->formatPage();
  }
  public function getPage() 
  {
    return $this->builder->getPage();
  }
}

echo("BEGIN\n");

$pageBuilder = new HTMLPageBuilder();
$pageDirector = new HTMLPageDirector($pageBuilder);
$pageDirector->buildPage();
$page = $pageDirector->GetPage();
echo $page->showPage();
echo "\nEND\n";

Output:

BEGIN
<html>
<head><title>Testing HTMLPage Title</title></head>
<body>
<h1>Testing HTMLPage Heading</h1>
Body1 Testing, testing, testing!Body2 Testing, testing, testing, or!Body3 Testing, testing, testing, more!
</body>
</html>
END


The classic pizza example in java

/* "Product" */
class Pizza 
{
  private String dough = "";
  private String sauce = "";
  private String topping = "";

  public void setDough(String dough)     { this.dough = dough; }
  public void setSauce(String sauce)     { this.sauce = sauce; }
  public void setTopping(String topping) { this.topping = topping; }

  public void showPizza()
  {
    System.out.println("PIZZA........");
    System.out.println("Dough : "+this.dough);
    System.out.println("Sauce : "+this.sauce);
    System.out.println("Topping : "+this.topping);
    System.out.println("----------");
  }
}

/* "Abstract Builder" */
abstract class PizzaBuilder 
{
  protected Pizza pizza;

  public Pizza getPizza() { return pizza; }
  public void createNewPizzaProduct() 
  { 
    pizza = new Pizza(); 
  }

  public abstract void buildDough();
  public abstract void buildSauce();
  public abstract void buildTopping();

}

/* "ConcreteBuilder" */
class HawaiianPizzaBuilder extends PizzaBuilder 
{
  public void buildDough()   { pizza.setDough("cross"); }
  public void buildSauce()   { pizza.setSauce("mild"); }
  public void buildTopping() { pizza.setTopping("ham+pineapple"); }
}

/* "ConcreteBuilder" */
class SpicyPizzaBuilder extends PizzaBuilder 
{
  public void buildDough()   { pizza.setDough("pan baked"); }
  public void buildSauce()   { pizza.setSauce("hot"); }
  public void buildTopping() { pizza.setTopping("pepperoni+salami"); }
}

/* "Director" */
class Waiter 
{
  private PizzaBuilder pizzaBuilder;

  public void setPizzaBuilder(PizzaBuilder pb) { pizzaBuilder = pb; }
  public Pizza getPizza() { return pizzaBuilder.getPizza(); }

  public void constructPizza() 
  {
    pizzaBuilder.createNewPizzaProduct();
    pizzaBuilder.buildDough();
    pizzaBuilder.buildSauce();
    pizzaBuilder.buildTopping();
  }
}

/* A customer ordering a pizza. */
public class BuilderExample 
{
  public static void main(String[] args) 
  {
    Waiter waiter = new Waiter();
    PizzaBuilder hawaiian_pizzabuilder = new HawaiianPizzaBuilder();
    PizzaBuilder spicy_pizzabuilder = new SpicyPizzaBuilder();

    waiter.setPizzaBuilder( hawaiian_pizzabuilder );
    waiter.constructPizza();

    Pizza pizza = waiter.getPizza();
    pizza.showPizza();
  }
}

Output:

PIZZA........
Dough : cross
Sauce : mild
Topping : ham+pineapple
----------

Design patterns : Abstract Factory

The abstract factory pattern provides an interface for creating families of related or dependent objects without specifying their concrete classes. It encapsulates the possibility of creation of a suite of "products" which otherwise would have required a sequence of "if .. then .. else" statements. The abstract factory has the responsibility for providing creation services for the entire family of objects. Clients never create the objects directly - they ask the factory to do that for them.

This mechanism makes changing the entiry family of objects easy - because the specific class of factory object appears only once in the application - where it was instantiated. The application can replace the entire family of objects by simply instantiating a different instance of the abstract factory. It also provides for lazy creation of objects.

The participating actors here are the client, the abstract factory, the abstract product, the concrete factory and the concrete product. The abstract factory defines which objects the concrete factory will need to be able to create. The concrete factory must create the correct objects for its context, ensuring that all objects created by the concrete factory are able to work correctly for a given circumstance.

Lets take an example to explain the situation. The AbstractBookFactory specifies that two classes - the AbstractPHPBook and the AbstractMySQLBook will need to be created by the concrete factory. The concreate factory OReillyBookFactory extends the AbstractBookFactory and can create the objects for OReillyPHPBook and OReillyMySQLBook which are correct classes for the context of OReilly.

abstract class AbstractBookFactory
{
  abstract function makePHPBook();
  abstract function makeMySQLBook();
}

//for objects in context of OReilly
class OReillyBookFactory extends AbstractBookFactory
{
  private $context = "OReilly";
  function makePHPBook()
  {
    return new OReillyPHPBook;
  }
  function makeMySQLBook()
  {
    return new OReillyMySQLBook;
  }
}

//for objects in context of Sams
class SamsBookFactory extends AbstractBookFactory
{
  private $context = "Sams";
  function makePHPBook()
  {
    return new SamsPHPBook;
  }
  function makeMySQLBook()
  {
    return new SamsMySQLBook;
  }
}

//Classes for books
abstract class AbstractBook
{
  abstract function getAuthor();
  abstract function getTitle();
}

abstract class AbstractPHPBook
{
  private $subject = "PHP";
}

abstract class AbstractMySQLBook
{
  private $subject = "MySQL";
}

class OReillyPHPBook extends AbstractPHPBook
{
  private $author;
  private $title;

  function __construct()
  {
    $this->author = "OReilly : php author 1";
    $this->title = "OReilly : title for php";
  }

  function getAuthor()
  {
    return $this->author;
  }

  function getTitle()
  {
    return $this->title;
  }
}

class SamsPHPBook extends AbstractPHPBook
{
  private $author;
  private $title;

  function __construct()
  {
    $this->author = "Sams : php author";
    $this->title = "Sams : title for php";
  }

  function getAuthor()
  {
    return $this->author;
  }

  function getTitle()
  {
    return $this->title;
  }
}

class OReillyMySQLBook extends AbstractMySQLBook
{
  private $author;
  private $title;

  function __construct()
  {
    $this->author = "OReilly : MySQL author";
    $this->title = "OReilly : title for MySQL";
  }

  function getAuthor()
  {
    return $this->author;
  }

  function getTitle()
  {
    return $this->title;
  }
}

class SamsMySQLBook extends AbstractMySQLBook
{
  private $author;
  private $title;

  function __construct()
  {
    $this->author = "Sams : MySQL author";
    $this->title = "Sams : title for MySQL";
  }

  function getAuthor()
  {
    return $this->author;
  }

  function getTitle()
  {
    return $this->title;
  }
}

//testing 

echo("Begin\n");
echo("Testing OReillyBookFactory\n");
$bookFactoryInstance = new OReillyBookFactory;
testConcreteFactory($bookFactoryInstance);
echo("----\n");

echo("Testing SamsBookFactory\n");
$bookFactoryInstance = new SamsBookFactory;
testConcreteFactory($bookFactoryInstance);
echo("----\n");

function testConcreteFactory($bookFactoryInstance)
{
  $phpBook = $bookFactoryInstance->makePHPBook();
  echo "php author : ".$phpBook->getAuthor()."\n";
  echo "php title : ".$phpBook->getTitle()."\n";

  $mysqlBook = $bookFactoryInstance->makeMySQLBook();
  echo "mysql author : ".$mysqlBook->getAuthor()."\n";
  echo "mysql title : ".$mysqlBook->getTitle()."\n";
}

//Output

Begin
Testing OReillyBookFactory
php author : OReilly : php author 1
php title : OReilly : title for php
mysql author : OReilly : MySQL author
mysql title : OReilly : title for MySQL
----
Testing SamsBookFactory
php author : Sams : php author
php title : Sams : title for php
mysql author : Sams : MySQL author
mysql title : Sams : title for MySQL
----

Another example using java

interface GUIFactory //Abstract Factory 
{
  public Button createButton();
}

class WinFactory implements GUIFactory //concrete Factory
{
  public Button createButton() 
  {
    return new WinButton();
  }
}

class OSXFactory implements GUIFactory //Concrete Factory
{
  public Button createButton() 
  {
    return new OSXButton();
  }
}

interface Button //Abstract Product 
{
  public void paint();
}

class WinButton implements Button //concrete Product
{
  public void paint() 
  {
    System.out.println("I'm a WinButton");
  }
}


class OSXButton implements Button //concrete Product
{
  public void paint() 
  {
    System.out.println("I'm an OSXButton");
  }
}


class Application //execution free of product type 
{
  public Application(GUIFactory factory)
  {
    Button button = factory.createButton();
    button.paint();
  }
}

public class ApplicationRunner 
{
  public static void main(String[] args) 
  {
    if(args.length != 1)
    {
      System.out.println("usage : java ApplicationRunner <0/1>");
      System.exit(1);
    }
    new Application(createOsSpecificFactory(args[0]));
  }

  public static GUIFactory createOsSpecificFactory(String ostype) 
  {
    if (ostype.equals("0"))
    {
      return new WinFactory();
    } 
    else 
    {
      return new OSXFactory();
    }
  }
}

Output
------

$ java ApplicationRunner 0
I'm a WinButton
$ java ApplicationRunner 1
I'm an OSXButton

Friday, January 29, 2010

Design patterns : Singleton pattern

Singleton design pattern is required when you want to allow only one instance of the class to be created. Database connections and filesystems are examples where singleton classes might be required. You can also use a singleton class to store variables which need global access - thereby limiting the scope of those variables and keeping the global space variable free.

With singleton design pattern, following points need to be ensured
  • Ensure that only a single instance of the class is created.
  • Provide a global point of access to the object.
  • The creation of the object should be thread safe to avoid multiple instances in a multi-threaded environment.

An illustration in java (from wikipedia)

public class Singleton 
{
  // Private constructor prevents instantiation from other classes
  private Singleton() {}
         
  /**
   * SingletonHolder is loaded on the first execution of Singleton.getInstance() 
   * or the first access to SingletonHolder.INSTANCE, not before.
  */
  private static class SingletonHolder 
  { 
    private static final Singleton INSTANCE = new Singleton();
  }
                                      
  public static Singleton getInstance() 
  {
    return SingletonHolder.INSTANCE;
  }
}

Another illustration in php

class singleton
{
  private static $_instance;
  
  //private constructor to prevent instantiation from other classes
  private function __construct()
  {  }

  private function __destruct()
  {  }

  // only first call to getInstance creates and returns an instance. 
  // next call returns the already created instance.
  public static function getInstance()
  {
    if(self::$_instance === null)
    {
      self::$_instance = new self();
    }
    return self::$_instance;
  }

}

Tuesday, October 06, 2009

intro to lucene 2.9

What crap!!!. Why do they have to come out with a new version every now and then. And make people rewrite their code to upgrade to a new version. How much do they still have to improve their code. Just because of their frequent upgrades, i have to change my code every now and then. Why
should i upgrade to lucene 2.9?

To answer this question - it could be said that you build something and then you figure out that - oh no if this could have been done in this way then it would have been better. So for example you make an omlette and you figure out that putting a bit of cheese and pepper would have improved its taste. So next time you try that, but then you figure out that making it in butter would bring out more taste. Or you buy a Pentium 3 pc with 1 GB ram and after 2 years you see that it is outdated - the softwares have grown and so have the processing powers. To run the currently available softwares, you would need to upgrade your pc to a Pentium 4 - core 2 duo and maybe upgrade your graphics card to ATI Radeon 4870 X2 from the previous nvidia 9800 GT to play the recent games more effectively. And maybe upgrade your 20 inch CRT television to a 42 inch HD LCD for better graphics display.

It is the same reason that lucene keeps on optimizing its code and improving the features - they realize that better code leads to faster indexing and searching on the same machine.

The reason why you shud upgrade your lucene version is defined by the list of features that lucene 2.9 provides:


  • Per segment searching and caching (can lead to much faster reopen among other things). FieldCache - takes advantage of the fact that most segments of the index are static, only processes the parts that change, save on time and memory. Also faster searching among multiple segments.

  • Near real-time search capabilities added to IndexWriter - new way to search the current in-memory segment before index is written to disk.

  • New Query types

  • Smarter, more scalable multi-term queries (wildcard, range, etc)

  • A freshly optimized Collector/Scorer API

  • Improved Unicode support and the addition of Collation contrib

  • A new Attribute based TokenStream API

  • A new QueryParser framework in contrib with a core QueryParser replacement impl included.

  • Scoring is now optional when sorting by Field, or using a custom Collector, gaining sizable performance when scores are not required.

  • New analyzers (PersianAnalyzer, ArabicAnalyzer, SmartChineseAnalyzer)

  • New fast-vector-highlighter for large documents

  • Lucene now includes high-performance handling of numeric fields (NumericField & NumericRangeQuery). Such fields are indexed with a trie structure, enabling simple to use and much faster numeric range searching without having to externally pre-process numeric values into textual values. This improves the Lucene number indexing, and is faster for searching numbers, geo-locations, and dates, faster for sorting, and hugely faster for range searching.



For the newbies, all this techno-rant is just there to make you feel good about upgrading. In brief - faster search and more features.

Lets take a look at how you would go ahead with indexing and searching using lucene 2.9

Here is a very rough example. What i have done is use the twitter api to search for keywords in twitter and fetch the micro-blogs and create an index using lucene 2.9. And then use the same program to open the index and run a search - displaying only the n results. You can fetch the twitter api from http://yusuke.homeip.net/twitter4j/en/index.html


import twitter4j.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.store.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;
import org.apache.lucene.document.*;
import java.io.*;
import java.util.Date;
import java.util.ArrayList;
import java.util.List;

public class lucene
{
public static void main(String[] args) throws Exception
{
if(args.length != 3)
{
System.out.println("Usage : java lucene <index/search> <dirname> <string>");
System.exit(1);
}

if(!args[0].equalsIgnoreCase("index") && !args[0].equalsIgnoreCase("search"))
{
System.out.println("Usage : java lucene <index/search> <dirname> <string>");
System.exit(1);
}
System.out.println(args[0]+","+args[1]+","+args[2]);

lucene lu = new lucene(args[0], args[1]);
if(args[0].equalsIgnoreCase("index"))
lu.indexFiles(args[2]);
else if(args[0].equalsIgnoreCase("search"))
lu.searchFiles(args[2]);


}

File index_dir;
String action;

public lucene(String action, String dirname) throws Exception
{
this.index_dir = new File(dirname);
this.action = action;

if(index_dir.exists() && action.equalsIgnoreCase("index"))
{
System.out.println("Index already exisits... enter another another directory for indexing...");
System.exit(1);
}
}

public void indexFiles(String searchstr) throws Exception
{
Twitter tw = new Twitter();
System.out.println("Getting tweets for "+searchstr);
twitter4j.Query qry = new twitter4j.Query("source:twitter4j "+searchstr);
qry.setRpp(50);

QueryResult res = tw.search(qry);
List<Tweet> tweets = res.getTweets();
System.out.println("Got "+tweets.size()+" tweets in "+res.getCompletedIn()+" : "+res.getMaxId());

// constructor changed from lucene 2.4.1
IndexWriter iw = new IndexWriter(NIOFSDirectory.open(this.index_dir), new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);

int docs = 0;
for(int z=0; z<tweets.size(); z++)
{
Tweet twt = (Tweet)(tweets.get(z));
String user = twt.getFromUser();
String usrTwt = twt.getText();
System.out.println("Got : "+user+" => "+usrTwt);

Document d = new Document();
// constructor for Field changed - introduced new constants ANALYZED & NOT_ANALYZED. Not storing NORMS improve performance.
d.add(new Field("user", user, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS, Field.TermVector.YES));
d.add(new Field("tweet", usrTwt, Field.Store.YES, Field.Index.ANALYZED_NO_NORMS, Field.TermVector.WITH_POSITIONS_OFFSETS));

iw.addDocument(d);
docs++;
}

System.out.println("optimizing..."+docs+" docs");
iw.optimize();
iw.close();
}

public void searchFiles(String searchstr) throws Exception
{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
QueryParser parser = new QueryParser("tweet",new WhitespaceAnalyzer());
// New constructor in 2.9 - pass true to open in readonly mode.
IndexReader ir = IndexReader.open(NIOFSDirectory.open(this.index_dir), true);
Searcher searcher = new IndexSearcher(ir);
int ResultsPerPage = 5;
do
{
org.apache.lucene.search.Query qry = parser.parse(searchstr);
System.out.println("Searching for : "+searchstr);

//use TopScoreDocCollector to get results and do paging. Get 2 page in a go. Do not sort on score.
TopScoreDocCollector collector = TopScoreDocCollector.create(2*ResultsPerPage, false);
searcher.search(qry, collector);
//get total no of hits found;
int totalResults = collector.getTotalHits();
int start = 0;
int end = Math.min(totalResults, ResultsPerPage);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

System.out.println("Total hits : "+totalResults+", end : "+end);

for(int i=start; i<end; i++)
{
Document doc = searcher.doc(hits[i].doc);
System.out.println(i+"] "+doc.get("user")+" => "+doc.get("tweet"));
}


System.out.print("\nQuery (enter \"quit\" to exit): ");
searchstr = br.readLine();
if(searchstr == null || searchstr.length() == -1)
{
break;
}
searchstr.trim();
if(searchstr.length()==0)
{
break;
}

}while(!searchstr.equalsIgnoreCase("quit"));

}
}

Monday, September 07, 2009

moniter and improve lucene search

How ?
Simple - use lucidgaze available at http://www.lucidimagination.com/Downloads/LucidGaze-for-Lucene

With lucidgaze you could analyze

  • Read/write stats on index utilization, distribution and throughput

  • Query efficiency stats - show how effectively user input is analyzed and decomposed for processing by the index

  • Mapping of tokenizers, token-streams and analyzers - makes transparent how text is processed and indexed



So, we went ahead and downloaded the api but were unable to find any example program or source code. After some amount of tinkering around we were finally able to figure out how to use it.

You could get the code here : http://ai-cafe.blogspot.com/2009/09/lucid-gaze-tough-nut.html

Monday, August 25, 2008

Thread Executors in java

With Java 1.4, when we used to write threaded applications, we did not have much options in limiting and reusing the same thread for the available tasks. I had written applications earlier where each process that needed to be executed freely (or a process which can be executed on a thread) used to create its own thread. And once a thread was created and used, it was simply discarded. New threads were created whenever needed.

With java 1.5, there is an executor class which allows you to create controlled thread pools and execute the list of tasks on the same number of threads. If the number of tasks exceeds the number of threads then these extra tasks just wait for a thread to become available and then start their execution.

A simple threaded server using java 1.4 :

1 import java.io.*;
2 import java.net.*;
3 import java.util.*;
4
5 public class testPool implements Runnable
6 {
7   private int port;
8   private int active, total;
9
10   public testPool(int port)
11   {
12     this.port = port;
13     System.out.println("Thread pool constructor - creating thread");
14     new Thread(this).start();
15   }
16
17   public void run()
18   {
19     try
20     {
21       System.out.println("Thread pool - new thread created");
22       ServerSocket ss = new ServerSocket(port);
23       while(true)
24       {
25         Socket s = ss.accept();
26         total++;
27         new Handler(s);
28       }
29     }catch(Exception ex)
30     {
31       ex.printStackTrace();
32     }
33   }
34
35   public class Handler implements Runnable
36   {
37     private Socket socket;
38     public Handler(Socket s)
39     {
40       this.socket = s;
41       System.out.println("Handler constructor - creating thread");
42       new Thread(this).start();
43     }
44     public void run()
45     {
46       System.out.println(Thread.currentThread().getName()+" - Handler - new thread created");
47       active++;
48       boolean loop = true;
49       try
50       {
51         BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
52         DataOutputStream out = new DataOutputStream(socket.getOutputStream());
53         while(loop)
54         {
55           String cmd = in.readLine();
56           if(cmd.equals("QUIT"))
57           {
58             loop = false;
59             socket.close();
60           }
61           else if(cmd.equals("INFO"))
62           {
63             System.out.println("Active Connections : "+active);
64             System.out.println("Total Connections : "+total);
65           }
66           else
67           {
68             System.out.println("Command = "+cmd);
69           }
70         }
71       }catch(Exception ex)
72       {
73         ex.printStackTrace();
74       }
75       active--;
76     }
77   }
78
79   public static void main(String[] args)
80   {
81     System.out.println("Starting Daemon");
82     new testPool(Integer.parseInt(args[0]));
83   }
84 }


Compile and run the program

jayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor - creating thread
Thread pool - new thread created


Now, as you keep on connecting to localhost 5000 port, you can see that new threads are created. Even when the old connections are closed, the same thread is not used again but new threads are created.

so, if i do

telnet localhost 5000

from 3 consoles and issue the "INFO" command from 4th console, the output is

jayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor - creating thread
Thread pool - new thread created
Handler constructor - creating thread
Thread-1 - Handler - new thread created
Handler constructor - creating thread
Thread-2 - Handler - new thread created
Handler constructor - creating thread
Thread-3 - Handler - new thread created
Handler constructor - creating thread
Thread-4 - Handler - new thread created
Active Connections : 1
Total Connections : 4

4 threads are created. In this case, everytime you connect to the server, a new thread would be created.

Now, lets modify the code to work on java 1.5 using thread Executors...

Add the following code snippets at or after the following line numbers

4 import java.util.concurrent.*;
10  private int pool;
12  this.pool=2;
22  ExecutorService threadExecutor = Executors.newFixedThreadPool(pool);
27  threadExecutor.execute(new Handler(s));

And comment the following 2 lines

27  new Handler(s);
42  new Thread(this).start();


Not lets run the program again on port 5000 and try connecting to it. We have created a pool of 2 threads here. Observer that:


  • pool-1-thread-1 & pool-1-thread-2 are the two threads created.

  • After two connections to the server, the connections are not rejected, but are queued.

  • If a thread becomes available, a connection from the queue is assigned to the thread for processing

  • The same two threads are used again and again. No new threads are created.

  • Connections which are in queue are not able to process any requests unless they are assigned to a worker thread



Process for testing

console 1: telnet localhost 5000
console 2: telnet localhost 5000
console 3: telnet localhost 5000
console 1: from 1
console 2: from 2
console 3: from 3
console 1: INFO
console 1: 1 quitting
console 1: QUIT


Output for the testing process


jjayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor - creating thread
Thread pool - new thread created
Handler constructor - creating thread
pool-1-thread-1 - Handler - new thread created
Handler constructor - creating thread
pool-1-thread-2 - Handler - new thread created
Handler constructor - creating thread
Command = from 1
Command = from 2
Active Connections : 2
Total Connections : 3
Command = 1 quitting
pool-1-thread-1 - Handler - new thread created
Command = from 3


As seen, processing of connection 3 starts only after connection 1 is closed by the client.

The benefits of using a thread executor instead of a normal thread are

a] Overhead of creating and destroying threads is avoided.
b] A queue of threads is created automatically when the no of tasks exceeds the no of threads in the pool. The tasks in queue are automatically executed when the threads become free.
c] A limit (maximum no of threads) could be imposed thus controlling the available resources for delivering better performance.