Saturday, April 30, 2011

Hadoop 0.21 update

This is an update on setting up Hadoop. There have been some changes to the configuration files and the startup/shutdown scripts.

The following configuration files need to be created in the <hadoop_directory>/conf folder:

  • hdfs-site.xml
    <configuration>

    <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
    </property>


    <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
    </property>

    </configuration>

  • core-site.xml

    <configuration>

    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
    </property>

    </configuration>

  • mapred-site.xml

    <configuration>

    <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
    </property>

    </configuration>


Earlier, all of these settings lived in a single file, hadoop-site.xml.

Just as the configuration has been split up, the scripts that start the different services have been separated as well. There are now separate scripts to start and stop DFS, the MapReduce trackers, and the balancer.

From <hadoop_directory>, run:

./bin/start-dfs.sh
./bin/stop-dfs.sh
./bin/start-mapred.sh
./bin/stop-mapred.sh
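
As a rough sketch of how the four scripts above fit together, the helper below starts DFS before MapReduce and stops them in the reverse order, since MapReduce in this setup runs on top of HDFS. The function name hadoop_ctl and its arguments are hypothetical and not part of Hadoop; the only assumption is that the four scripts listed above exist under <hadoop_directory>/bin.

```shell
# hadoop_ctl <hadoop_directory> start|stop
# Hypothetical wrapper (not shipped with Hadoop) around the four scripts above.
hadoop_ctl() {
    hadoop_dir="$1"
    case "$2" in
        start)
            # Bring up DFS first, then the MapReduce trackers.
            "$hadoop_dir/bin/start-dfs.sh" && "$hadoop_dir/bin/start-mapred.sh"
            ;;
        stop)
            # Shut down in the reverse order: trackers first, then DFS.
            "$hadoop_dir/bin/stop-mapred.sh" && "$hadoop_dir/bin/stop-dfs.sh"
            ;;
        *)
            echo "usage: hadoop_ctl <hadoop_directory> start|stop" >&2
            return 2
            ;;
    esac
}
```

For example, hadoop_ctl /path/to/hadoop start would run both start scripts in order, where /path/to/hadoop stands in for your own <hadoop_directory>.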


The rest of the configuration should remain the same.

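If you get the error java.io.IOException: Incompatible namespaceIDs, the DataNode's stored namespaceID no longer matches the NameNode's. This typically happens after the NameNode has been reformatted while the DataNode still holds its old data. The solution is to change the namespaceID in

<hadoop data directory>/dfs/data/current/VERSION

to match the namespaceID in

<hadoop data directory>/dfs/name/current/VERSION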
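
The namespaceID fix above can also be scripted. This is only a minimal sketch: the function name sync_namespace_id is hypothetical, and it assumes both VERSION files sit under one data directory as in the paths above and contain a line of the form namespaceID=<number>. Stop the Hadoop daemons before running it.

```shell
# sync_namespace_id <hadoop data directory>
# Hypothetical helper: copies the NameNode's namespaceID into the DataNode's VERSION file.
sync_namespace_id() {
    data_root="$1"
    name_version="$data_root/dfs/name/current/VERSION"
    data_version="$data_root/dfs/data/current/VERSION"

    # Read the value after "namespaceID=" from the NameNode's VERSION file.
    ns_id=$(sed -n 's/^namespaceID=//p' "$name_version")
    if [ -z "$ns_id" ]; then
        echo "no namespaceID found in $name_version" >&2
        return 1
    fi

    # Keep a backup, then rewrite only the namespaceID line of the DataNode's file.
    cp "$data_version" "$data_version.bak" || return 1
    sed "s/^namespaceID=.*/namespaceID=$ns_id/" "$data_version.bak" > "$data_version"
}
```

The original DataNode file is kept as VERSION.bak next to the rewritten one, so the change can be undone by copying it back.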