Wednesday, June 08, 2011

Data structures - TRIE

Trie is a data structure used for data retrieval. It is a multiway tree structure used to store strings over an alphabet. It is mainly used for large dictionaries of english words in spell-checking and natural language understanding programs. The idea is that all strings which have a common stem or prefix hang off a common node. the elements in a string can be recovered by scanning from the root to the leaf that ends the string. All strings in a trie can be recovered by doing a depth first scan of the tree.

The time to insert, delete or find is almost identical because the code paths followed for each are almost identical. As a result, for inserting, deleting or searching, tries can easily beat binary search trees or even hash tables.

Advantages of Trie over Binary Search Tree
1. Looking up of keys is faster. Worst case scenario a key of length m takes O(m) time. BST performs search by doing O(log(n)) comparisons. In worst case BST takes O(m log n) time. Simple operations that tries use during lookup such as array indexing using a character are fast on real machines.
2. Tries are more space efficient when they contain a large number of short keys, because nodes are shared between keys with common initial subsequences.
3. Tries facilitate longest-prefix matching, helping to find the key sharing the longest possible prefix of characters all unique.
4. The number of internal nodes from root to leaf equals the length of the key. Balancing the tree is therefore no concern.

Advantages of Trie over Hash Tables
1. Tries support ordered iteration, whereas iteration over a hash table will result in a pseudorandom order given by the hash function (also, the order of hash collisions is implementation defined), which is usually meaningless
2. Tries facilitate longest-prefix matching, but hashing does not, as a consequence of the above. Performing such a "closest fit" find can, depending on implementation, be as quick as an exact find.
3. Tries tend to be faster on average at insertion than hash tables because hash tables must rebuild their index when it becomes full - a very expensive operation. Tries therefore have much better bounded worst case time costs, which is important for latency sensitive programs.
4. By avoiding the hash function, tries are generally faster than hash tables for small keys like integers and pointers.

Some disadvantages of Trie
1. Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random access time is high compared to main memory
2. Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless a bitwise trie can handle standard IEEE single and double format floating point numbers.

Code for implementing trie in java
import java.util.HashMap;
import java.util.Map;

public class Trie {
    private static class Node {
        private boolean isWord = false; // indicates a complete word
        private int prefixes = 0; // indicates how many words have the prefix
        private Map children = new HashMap(); // references to all possible children
    }

    private Node root = new Node();

    /**
     * Inserts a new word into the trie
     * @param word
     */
    public void insertWord(String word){
        if(searchWord(word) == true) return;

        Node current = root;
        for(char c : word.toCharArray()){
            if(current.children.containsKey(Character.valueOf(c))){
                Node child = current.children.get(Character.valueOf(c));
                child.prefixes++;

                current = child;
            }else{
                Node child = new Node();
                child.prefixes = 1;

                current.children.put(Character.valueOf(c), child);
                current = child;
            }
        }
        // we have reached the endof the word, hence mark it true
        // if during a search we reach the end of the search string and this
        // flag is still false, then the search string is not registered as a valid
        // word in the trie but is a prefix
        current.isWord = true;
    }

    /**
     * Searches for a word in the trie
     * @param word
     */
    public boolean searchWord(String word){
        Node current = root;
        for(char c : word.toCharArray()){
            if(current.children.containsKey(Character.valueOf(c))){
                current = current.children.get(Character.valueOf(c));
            }else{
                return false;
            }
        }
        // if at the end of the search of entire word the boolean variable is
        // still false means that the given word is not regitered as a valid
        // word in the trie, but is a prefix
        return current.isWord;
    }

    /**
     * Deletes a word from the trie
     * @param word
     */
    public void deleteWord(String word){
        if(searchWord(word) == false) return;

        Node current = root;
        for(char c : word.toCharArray()){
            Node child = current.children.get(Character.valueOf(c));
            if(child.prefixes == 1){
                current.children.remove(Character.valueOf(c));
                return;
            }else{
                child.prefixes--;
                current = child;
            }
        }
        // since the word is removed now, set the flag to false
        current.isWord = false;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insertWord("ball");
        trie.insertWord("balls");
        trie.insertWord("bat");
        trie.insertWord("doll");
        trie.insertWord("dork");
        trie.insertWord("dorm");
        trie.insertWord("send");
        trie.insertWord("sense");

        // testing deletion
        System.out.println("Testing deletion : ");
        System.out.println(trie.searchWord("ball"));
        trie.deleteWord("ball");
        System.out.println(trie.searchWord("ball"));
        System.out.println(trie.searchWord("balls"));

        // testing insertion
        System.out.println("Testing insertion : ");
        System.out.println(trie.searchWord("dumb"));
        trie.insertWord("dumb");
        System.out.println(trie.searchWord("dumb"));
        trie.deleteWord("dumb");
        System.out.println(trie.searchWord("dumb"));
    }
} 

OUTPUT : java Trie

Testing deletion :
true
false
true
Testing insertion :
false
true

Source : http://en.wikipedia.org/wiki/Trie,
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie/

2 comments:

Shrikant Thakare said...

Great post.You just need to cast object you get from hashmap to Node like (Node) current.children.get(Character.valueOf(c))

Anonymous said...

Hi , how we can write printTree method for this data structure?
thank you :-)