Here are quick step-by-step instructions for setting up SolrCloud on a few machines. The setup mirrors a production environment: ZooKeeper runs as a separate ensemble, and Solr sits inside Tomcat instead of the default Jetty.
Apache ZooKeeper is a centralized service for maintaining configuration information. In SolrCloud, the Solr configuration information is maintained in ZooKeeper. Let's set up ZooKeeper first.
We will be setting up SolrCloud on 3 machines. Add the following entries to the /etc/hosts file on all machines.
172.22.67.101 solr1
172.22.67.102 solr2
172.22.67.103 solr3
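Let's set up the ZooKeeper ensemble on 2 machines. Note that ZooKeeper needs a majority of its servers to be up, so a 2-server ensemble stops working if either server fails; production ensembles normally use an odd number of servers (3 or more).
Download and untar ZooKeeper into the /opt/zookeeper directory on both servers, solr1 and solr2. Then do the following on both servers.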
root@solr1$ mkdir /opt/zookeeper/data
root@solr1$ cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
root@solr1$ vim /opt/zookeeper/conf/zoo.cfg
Make the following changes in the zoo.cfg file
dataDir=/opt/zookeeper/data
server.1=solr1:2888:3888
server.2=solr2:2888:3888
Save the zoo.cfg file.
In server.x=[hostname]:nnnnn[:nnnnn], x must match the server id stored in the myid file in the ZooKeeper data directory.
Assign a different id to each ZooKeeper server.
On solr1:
root@solr1$ echo 1 > /opt/zookeeper/data/myid
On solr2:
root@solr2$ echo 2 > /opt/zookeeper/data/myid
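The link between the server.x lines and the myid file can be checked with a few lines of code. The following is a minimal Python sketch, not part of ZooKeeper itself; the function name server_id_for_host is hypothetical, and the config text is just the example from this post.

```python
import re

# Example zoo.cfg content from this post (only the relevant lines).
ZOO_CFG = """\
dataDir=/opt/zookeeper/data
server.1=solr1:2888:3888
server.2=solr2:2888:3888
"""


def server_id_for_host(cfg_text, hostname):
    """Return the id x from the 'server.x=hostname:...' line for hostname.

    This is the value that must be written to <dataDir>/myid on that host.
    Raises KeyError if the host has no server.x entry.
    """
    for line in cfg_text.splitlines():
        match = re.match(r"\s*server\.(\d+)\s*=\s*([^:\s]+):", line)
        if match and match.group(2) == hostname:
            return int(match.group(1))
    raise KeyError(hostname)


if __name__ == "__main__":
    for host in ("solr1", "solr2"):
        print(host, "-> myid", server_id_for_host(ZOO_CFG, host))
```

If the number in myid does not match the server.x entry for that host, the server will not take its intended place in the ensemble, so it is worth checking before starting ZooKeeper.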
Start zookeeper on both the servers
root@solr1$ cd /opt/zookeeper
root@solr1$ ./bin/zkServer.sh start
Note: if you later need to reset the cluster/shard information, do the following:
root@solr1$ ./bin/zkCli.sh -server solr1:2181
[zk: solr1:2181(CONNECTED) 0] rmr /clusterstate.json
Now let's set up Solr and start it with the external ZooKeeper.
Install Solr on all 3 machines. I installed it in the /opt/solr folder.
Start the first Solr instance and upload the Solr configuration into the ZooKeeper cluster.
root@solr1$ cd /opt/solr/example
root@solr1$ java -DzkHost=solr1:2181,solr2:2181 -Dbootstrap_confdir=solr/collection1/conf/ -DnumShards=2 -jar start.jar
Here the number of shards is specified as 2, so our cluster will have 2 shards, and nodes beyond the first two join as replicas of those shards.
Now start Solr on the remaining servers.
root@solr2$ cd /opt/solr/example
root@solr2$ java -DzkHost=solr1:2181,solr2:2181 -DnumShards=2 -jar start.jar
root@solr3$ cd /opt/solr/example
root@solr3$ java -DzkHost=solr1:2181,solr2:2181 -DnumShards=2 -jar start.jar
Note: It is important to pass the "numShards" parameter; otherwise numShards gets reset to 1.
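The three start commands differ only in whether the configuration is uploaded. As a rough illustration, here is a small Python sketch that assembles the same command lines; the helper name solr_start_command is hypothetical and the values are the ones used in this post.

```python
def solr_start_command(zk_hosts, num_shards, bootstrap_confdir=None):
    """Build the argument list for starting the example Jetty-based Solr."""
    args = ["java", "-DzkHost=" + ",".join(zk_hosts)]
    if bootstrap_confdir:
        # Only the first node uploads the configuration to ZooKeeper.
        args.append("-Dbootstrap_confdir=" + bootstrap_confdir)
    # Passed on every node, as recommended in the note above.
    args.append("-DnumShards=%d" % num_shards)
    args += ["-jar", "start.jar"]
    return args


if __name__ == "__main__":
    zk = ["solr1:2181", "solr2:2181"]
    print(" ".join(solr_start_command(zk, 2, "solr/collection1/conf/")))  # solr1
    print(" ".join(solr_start_command(zk, 2)))  # solr2 and solr3
```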
Point your browser to http://172.22.67.101:8983/solr/
Click on "cloud" -> "graph" in the left section and you can see that there are 2 nodes in shard 1 and 1 node in shard 2.
Let's feed some data and see how it is distributed across the shards.
root@solr3$ cd /opt/solr/example/exampledocs
root@solr3$ java -jar post.jar *.xml
Let's check how the data was distributed between the two shards. Head back to the Solr admin cloud graph page at http://172.22.67.101:8983/solr/.
Click on the first shard of collection1; you may have 14 documents in this shard.
Click on the second shard of collection1; you may have 18 documents in this shard.
Check the replicas for each shard; they should have the same counts.
At this point, we can start issuing some queries against the collection:
Get all documents in the collection:
http://172.22.67.101:8983/solr/collection1/select?q=*:*
Get all documents in the collection belonging to shard1:
http://172.22.67.101:8983/solr/collection1/select?q=*:*&shards=shard1
Get all documents in the collection belonging to shard2:
http://172.22.67.101:8983/solr/collection1/select?q=*:*&shards=shard2
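The same queries can be built from a script, for example to compare document counts per shard. Below is a minimal Python sketch; the helper names select_url and num_found are hypothetical, rows and wt are standard Solr query parameters, and the sketch only builds URLs and reads a parsed response, it does not contact a server.

```python
from urllib.parse import urlencode

BASE_URL = "http://172.22.67.101:8983/solr"  # node used in this post


def select_url(collection, query="*:*", shards=None, rows=None):
    """Build a /select URL, optionally restricted to specific shards."""
    params = {"q": query, "wt": "json"}
    if shards:
        params["shards"] = shards
    if rows is not None:
        # rows=0 returns only the hit count, no documents.
        params["rows"] = rows
    # Keep '*' and ':' readable instead of percent-encoding them.
    return "%s/%s/select?%s" % (BASE_URL, collection, urlencode(params, safe="*:"))


def num_found(response):
    """Extract the hit count from a parsed Solr JSON response."""
    return response["response"]["numFound"]


if __name__ == "__main__":
    for shard in (None, "shard1", "shard2"):
        print(select_url("collection1", shards=shard, rows=0))
```

Fetching each URL and passing the decoded JSON to num_found should give the total for the collection and the per-shard counts, which should add up to the total.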
Let's check what ZooKeeper has in its cluster state.
root@solr3$ cd /opt/zookeeper/
root@solr3$ ./bin/zkCli.sh -server solr1:2181
[zk: 172.22.67.101:2181(CONNECTED) 1] get /clusterstate.json
{"collection1":{
"shards":{
"shard1":{
"range":"80000000-ffffffff",
"state":"active",
"replicas":{
"172.22.67.101:8983_solr_collection1":{
"shard":"shard1",
"state":"active",
"core":"collection1",
"collection":"collection1",
"node_name":"172.22.67.101:8983_solr",
"base_url":"http://172.22.67.101:8983/solr",
"leader":"true"},
"172.22.67.102:8983_solr_collection1":{
"shard":"shard1",
"state":"active",
"core":"collection1",
"collection":"collection1",
"node_name":"172.22.67.102:8983_solr",
"base_url":"http://172.22.67.102:8983/solr"}}},
"shard2":{
"range":"0-7fffffff",
"state":"active",
"replicas":{"172.22.67.103:8983_solr_collection1":{
"shard":"shard2",
"state":"active",
"core":"collection1",
"collection":"collection1",
"node_name":"172.22.67.103:8983_solr",
"base_url":"http://172.22.67.103:8983/solr",
"leader":"true"}}}},
"router":"compositeId"}}
It can be seen that there are 2 shards. Shard 1 has 2 replicas and shard 2 has only 1 replica. As you keep adding nodes, the number of replicas per shard keeps increasing.
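Reading the replica layout off the raw JSON by eye is error-prone, so here is a minimal Python sketch that summarizes it. The function name summarize is hypothetical, and the embedded JSON is a trimmed copy of the clusterstate.json output shown above, keeping only the fields the sketch uses.

```python
import json

# Trimmed copy of the clusterstate.json shown above.
CLUSTER_STATE = """
{"collection1": {
  "shards": {
    "shard1": {"range": "80000000-ffffffff", "replicas": {
      "172.22.67.101:8983_solr_collection1": {"state": "active", "leader": "true"},
      "172.22.67.102:8983_solr_collection1": {"state": "active"}}},
    "shard2": {"range": "0-7fffffff", "replicas": {
      "172.22.67.103:8983_solr_collection1": {"state": "active", "leader": "true"}}}},
  "router": "compositeId"}}
"""


def summarize(state_text, collection):
    """Return {shard: (replica_count, leader_name)} for one collection."""
    shards = json.loads(state_text)[collection]["shards"]
    summary = {}
    for shard, info in shards.items():
        replicas = info["replicas"]
        # The leader is the replica carrying "leader": "true".
        leader = next(
            (name for name, r in replicas.items() if r.get("leader") == "true"),
            None,
        )
        summary[shard] = (len(replicas), leader)
    return summary


if __name__ == "__main__":
    for shard, (count, leader) in sorted(summarize(CLUSTER_STATE, "collection1").items()):
        print(shard, "replicas:", count, "leader:", leader)
```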
Now let's configure Solr to run in Tomcat and add the ZooKeeper-related and numShards configuration to Tomcat for Solr.
Install Tomcat on all machines in the /opt folder, and create the following file on all machines.
cat /opt/apache-tomcat-7.0.40/conf/Catalina/localhost/solr.xml
<?xml version="1.0" encoding="UTF-8"?>
<Context path="/solr/home"
docBase="/opt/solr/example/webapps/solr.war"
allowLinking="true"
crossContext="true"
debug="0"
antiResourceLocking="false"
privileged="true">
<Environment name="solr/home" override="true" type="java.lang.String" value="/opt/solr/example/solr" />
</Context>
On solr1 and solr2, create the solr.xml file for shard1:
root@solr1$ cat /opt/solr/example/solr/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" zkHost="solr1:2181,solr2:2181">
<cores defaultCoreName="collection1" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8080" hostContext="solr">
<core loadOnStartup="true" shard="shard1" instanceDir="collection1/" transient="false" name="collection1"/>
</cores>
</solr>
On solr3, create the solr.xml file for shard2:
root@solr3$ cat /opt/solr/example/solr/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" zkHost="solr1:2181,solr2:2181">
<cores defaultCoreName="collection1" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8080" hostContext="solr">
<core loadOnStartup="true" shard="shard2" instanceDir="collection1/" transient="false" name="collection1"/>
</cores>
</solr>
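The two solr.xml files differ only in the shard attribute, so they can be generated from a host-to-shard mapping instead of being edited by hand. The following is a minimal Python sketch under that assumption; the names SHARD_FOR_HOST and render_solr_xml are hypothetical, and the attribute values are copied from the files above. The output omits the XML declaration line.

```python
import xml.etree.ElementTree as ET

# Host-to-shard assignment used in this post.
SHARD_FOR_HOST = {"solr1": "shard1", "solr2": "shard1", "solr3": "shard2"}


def render_solr_xml(host, zk_host="solr1:2181,solr2:2181"):
    """Return the solr.xml content for the given host as a string."""
    solr = ET.Element("solr", persistent="true", zkHost=zk_host)
    cores = ET.SubElement(solr, "cores", {
        "defaultCoreName": "collection1",
        "adminPath": "/admin/cores",
        "zkClientTimeout": "${zkClientTimeout:15000}",
        "hostPort": "8080",
        "hostContext": "solr",
    })
    ET.SubElement(cores, "core", {
        "loadOnStartup": "true",
        "shard": SHARD_FOR_HOST[host],  # the only per-host difference
        "instanceDir": "collection1/",
        "transient": "false",
        "name": "collection1",
    })
    return ET.tostring(solr, encoding="unicode")


if __name__ == "__main__":
    for host in sorted(SHARD_FOR_HOST):
        print(host, render_solr_xml(host))
```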
Set the numShards variable as part of the Solr startup environment on all machines.
root@solr1$ cat /opt/apache-tomcat-7.0.40/bin/setenv.sh
export JAVA_OPTS=' -Xms4096M -Xmx8192M -DnumShards=2 '
To be on the safe side, clean up the clusterstate.json in ZooKeeper. Now start Tomcat on all machines and check the catalina.out file for errors, if any. Once all nodes are up, you should be able to point your browser to http://172.22.67.101:8080/solr -> cloud -> graph and see the 3 nodes which form the cloud.
Let's add a new replica to shard 2. The Core Admin CREATE request creates the core on the node that receives it, so it is sent to solr3 here (on the Tomcat port 8080), and the new core becomes a replica of the current node on shard 2.
http://172.22.67.103:8080/solr/admin/cores?action=CREATE&name=collection1_shard2_replica2&collection=collection1&shard=shard2
And let's check the clusterstate.json file now.
root@solr3$ cd /opt/zookeeper/
root@solr3$ ./bin/zkCli.sh -server solr1:2181
[zk: 172.22.67.101:2181(CONNECTED) 2] get /clusterstate.json
{"collection1":{
"shards":{
"shard1":{
"range":"80000000-ffffffff",
"state":"active",
"replicas":{
"172.22.67.101:8080_solr_collection1":{
"shard":"shard1",
"state":"active",
"core":"collection1",
"collection":"collection1",
"node_name":"172.22.67.101:8080_solr",
"base_url":"http://172.22.67.101:8080/solr",
"leader":"true"},
"172.22.67.102:8080_solr_collection1":{
"shard":"shard1",
"state":"active",
"core":"collection1",
"collection":"collection1",
"node_name":"172.22.67.102:8080_solr",
"base_url":"http://172.22.67.102:8080/solr"}}},
"shard2":{
"range":"0-7fffffff",
"state":"active",
"replicas":{
"172.22.67.103:8080_solr_collection1":{
"shard":"shard2",
"state":"active",
"core":"collection1",
"collection":"collection1",
"node_name":"172.22.67.103:8080_solr",
"base_url":"http://172.22.67.103:8080/solr",
"leader":"true"},
"172.22.67.103:8080_solr_collection1_shard2_replica2":{
"shard":"shard2",
"state":"active",
"core":"collection1_shard2_replica2",
"collection":"collection1",
"node_name":"172.22.67.103:8080_solr",
"base_url":"http://172.22.67.103:8080/solr"}}}},
"router":"compositeId"}}
Similar to adding a replica, you can unload and delete one. The request goes to the node hosting the core, and the core name must match the one created above.
http://172.22.67.103:8080/solr/admin/cores?action=UNLOAD&core=collection1_shard2_replica2&deleteIndex=true
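Both Core Admin calls follow the same URL pattern, so a small helper keeps the core name and node consistent between CREATE and UNLOAD. This is a minimal Python sketch; the helper name core_admin_url is hypothetical, the core name is the one that appears in the clusterstate.json output above, and the node is the one hosting it there (172.22.67.103:8080). The sketch only builds the URLs.

```python
from urllib.parse import urlencode


def core_admin_url(node_base_url, action, **params):
    """Build a Core Admin URL for the given node, action and parameters."""
    query = {"action": action}
    query.update(params)
    return "%s/admin/cores?%s" % (node_base_url, urlencode(query))


NODE = "http://172.22.67.103:8080/solr"  # node that hosts the replica
CORE = "collection1_shard2_replica2"

create_url = core_admin_url(NODE, "CREATE", name=CORE,
                            collection="collection1", shard="shard2")
unload_url = core_admin_url(NODE, "UNLOAD", core=CORE, deleteIndex="true")

if __name__ == "__main__":
    print(create_url)
    print(unload_url)
```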
More details can be obtained from
http://wiki.apache.org/solr/SolrCloud