Tuesday, April 10, 2012

introducing varnish-cache

Varnish is a web accelerator. It is used as a reverse proxy in front of the actual web server. You must have used either nginx or lighttpd as a reverse proxy in front of you apache( or any other web server ). Why Varnish? Varnish claims that it is very fast - really really fast. The plus point that i can see over here is that varnish has been designed from root up as a reverse proxy. Where as nginx and lighttpd can also be used as a web server.

Lets compile varnish and try setting it up.

Get varnish from https://www.varnish-cache.org/releases. I got the source code of 3.0.2. To compile simply run.

./configure
make
sudo make install

If you went ahead and installed varnish at the default prefix /usr/local, you will be able to find the varnish configuration file at

/usr/local/etc/varnish/default.vcl

The very basic configuration required for starting varnish is the setting of the backend servers. Open up the default.vcl file and put

backend default {
     .host = "";
     .port = "";
}

To start varnish simply run

varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,2G -T 127.0.0.1:2000 -a 0.0.0.0:8080
This states that varnish
-f should use the default.vcl configuration file.
-s has been allocated memory of 2GB.
-T The administration interface is running on localhost at port 2000
-a varnish should listen to port 8080. You will need to change it to 80 when you want to make it live.

Ideally varnish does not cache any page which has a cookie header attached to it. If you have a dynamic web site and are using cookies heavily, you will find that your varnish hit ratio is too low. You can check the hit ratio on varnish using the varnishstat command.

There are few ways to get around it.

1. cache the content along with the cookie in the hash key. This results in a per user cache and there can be hit ratio but it is low.
sub vcl_hash {
    set req.hash += req.http.cookie;
}

2. Remove setcookie from the backend for a particular path. Can be used for static content
sub vcl_recv {
    if (req.url ~ "^/images") {
        unset req.http.cookie;
    }
}

sub vcl_fetch {
    if (req.url ~ "^/images") {
        unset beresp.http.set-cookie;
    }
}

3. Throw away the cookie header for certain file extensions. Mostly js/css and images.

sub vcl_recv {
 if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
    lookup;
 }
}

# strip the cookie before the image is inserted into cache.
sub vcl_fetch {
 if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
   unset beresp.http.set-cookie;
}





Varnish can also be used as a load balancer with multiple backends. Lets see the configuration.

First create multiple backends in the config file

backend web1 {
     .host = "192.168.1.1";
     .port = "81";
     .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1 s;
        .window = 5;
        .threshold = 3;
     }
}

backend web2 {
     .host = "192.168.1.2";
     .port = "81";
     .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1 s;
        .window = 5;
        .threshold = 3;
     }
}

For each backend there is a health check probe. Varnish should fetch the "/" every 5 sec. If it takes more than 1 sec, it is considered a failure.
If more than 3 out of last 5 probes are ok, the backend is considered healthy.

Now create a director. There are a number of directors - random, client, hash, round-robin, DNS and fallback. Lets configure a random director and we will see what can be done using the different directors. To configure random director

director web random {
        {
                .backend = web1;
                .weight = 1;
        }
        {
                .backend = web2;
                .weight = 1;
        }
}

Now tell your requests to use the "web" director for serving requests

sub vcl_recv {
   if (req.http.host ~ "^(www.)?mysite.com$") {
       set req.backend = web;
   }
}

Lets see what the different directors are there for.

The client director
       The client director picks a backend based on the clients identity. You can set the VCL variable client.identity to identify the client by picking up the value of a  session  cookie.

The hash director
       The hash director will pick a backend based on the URL hash value. This is useful is you are using Varnish to load balance in front of other Varnish caches or other web accelerators as objects won't be duplicated across caches.      

The round-robin director
       The round-robin director does not take any options. It will use the first backend for the first request, the second backend for the second request and so on, and start from the top again when it gets to the end. If a backend is unhealthy or Varnish fails to connect, it will be skipped.  The round-robin director will try all the backends once before giving up.

The DNS director
       The DNS director can use backends in two different ways. Either like the random or round-robin director or using .list:

       director directorname dns {
               .list = {
                       .host_header = "www.example.com";
                       .port = "80";
                       .connect_timeout = 0.4s;
                       "192.168.15.0"/24;
                       "192.168.16.128"/25;
               }
               .ttl = 5m;
               .suffix = "internal.example.net";
       }


       This will specify 384 backends, all using port 80 and a connection timeout of 0.4s. Options must come before the list of IPs in the .list statement. The .ttl defines the cache duration of the DNS lookups. Health checks are not thoroughly supported. DNS round robin balancing is supported. If a hostname resolves to multiple backends, the director will divide the traffic between all of them in a round-robin manner.

The fallback director
     The fallback director will pick the first backend that is healthy. It considers them in the order in which they are listed in its definition. The fallback director does not take any options.

       An example of a fallback director:

       director b3 fallback {
         { .backend = www1; }
         { .backend = www2; } // will only be used if www1 is unhealthy.
         { .backend = www3; } // will only be used if both www1 and www2
                              // are unhealthy.
       }

There is a huge list of configurations that can be done in varnish. You can check the list here and see which suits your needs.

https://www.varnish-cache.org/trac/wiki/VCLExamples

No comments: