
Tuesday, April 10, 2012

introducing varnish-cache

Varnish is a web accelerator, used as a reverse proxy in front of the actual web server. You may have used nginx or lighttpd as a reverse proxy in front of your Apache (or any other web server). Why Varnish? Varnish claims to be very fast - really, really fast. The plus point I see here is that Varnish has been designed from the ground up as a reverse proxy, whereas nginx and lighttpd can also be used as web servers.

Let's compile Varnish and try setting it up.

Get Varnish from https://www.varnish-cache.org/releases. I got the source code of 3.0.2. To compile, simply run:

./configure
make
sudo make install

If you went ahead and installed varnish at the default prefix /usr/local, you will be able to find the varnish configuration file at

/usr/local/etc/varnish/default.vcl

The very basic configuration required for starting Varnish is setting up the backend servers. Open the default.vcl file and point the default backend at your web server:

backend default {
     .host = "127.0.0.1";   # your web server's address
     .port = "80";          # your web server's port
}

To start Varnish, simply run:

varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,2G -T 127.0.0.1:2000 -a 0.0.0.0:8080
This tells Varnish to:
-f use the default.vcl configuration file.
-s allocate 2GB of memory for the cache (malloc storage).
-T run the administration interface on localhost, port 2000.
-a listen on port 8080. You will need to change this to 80 when you want to make it live.

By default, Varnish does not cache any response that involves cookies: a request carrying a Cookie header or a response carrying a Set-Cookie header is passed through to the backend. If you have a dynamic web site and use cookies heavily, you will find that your Varnish hit ratio is very low. You can check the hit ratio using the varnishstat command.
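The hit ratio is just cache hits divided by total lookups, using the cache_hit and cache_miss counters that varnishstat reports. A quick sketch, with made-up sample numbers:

```python
def hit_ratio(cache_hit, cache_miss):
    # Ratio of lookups served from cache; 0.0 when there were no lookups.
    total = cache_hit + cache_miss
    return cache_hit / float(total) if total else 0.0

# e.g. 1500 hits out of 10000 lookups is a 15% hit ratio - too low
# for a cache to be earning its keep.
print("%.2f" % hit_ratio(1500, 8500))
```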

There are a few ways to get around this.

1. Include the cookie in the hash key. This results in a per-user cache; you get some hits, but the ratio stays low.
sub vcl_hash {
    hash_data(req.http.cookie);
}

2. Strip the cookie on the request and the Set-Cookie on the response for a particular path. Can be used for static content.
sub vcl_recv {
    if (req.url ~ "^/images") {
        unset req.http.cookie;
    }
}

sub vcl_fetch {
    if (req.url ~ "^/images") {
        unset beresp.http.set-cookie;
    }
}

3. Force a cache lookup for certain file extensions - mostly js/css and images - and throw away the Set-Cookie header on the way into the cache.

sub vcl_recv {
    if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
        return (lookup);
    }
}

# strip the cookie before the object is inserted into the cache
sub vcl_fetch {
    if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
        unset beresp.http.set-cookie;
    }
}





Varnish can also be used as a load balancer with multiple backends. Let's see the configuration.

First create multiple backends in the config file

backend web1 {
     .host = "192.168.1.1";
     .port = "81";
     .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
     }
}

backend web2 {
     .host = "192.168.1.2";
     .port = "81";
     .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
     }
}

For each backend there is a health-check probe: Varnish fetches "/" every 5 seconds, and a fetch that takes more than 1 second counts as a failure.
If at least 3 of the last 5 probes succeeded, the backend is considered healthy.
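The window/threshold logic can be sketched like this (an illustration of the semantics, not Varnish's actual code):

```python
def is_healthy(probe_results, window=5, threshold=3):
    # probe_results: list of booleans, oldest first.
    # A backend is healthy when at least `threshold` of the
    # last `window` probes succeeded.
    recent = probe_results[-window:]
    return sum(recent) >= threshold

print(is_healthy([True, True, False, True, False]))   # 3 of the last 5 ok
print(is_healthy([False, True, False, True, False]))  # only 2 of 5 ok
```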

Now create a director. There are a number of directors - random, client, hash, round-robin, DNS and fallback. Let's configure a random director first; further down we will see what the other directors can do.

director web random {
        {
                .backend = web1;
                .weight = 1;
        }
        {
                .backend = web2;
                .weight = 1;
        }
}
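The random director picks a backend with probability proportional to its weight. Roughly (an illustrative sketch, not Varnish's implementation):

```python
import random

def pick_backend(backends):
    # backends: list of (name, weight) pairs, like the VCL above.
    names = [name for name, weight in backends]
    weights = [weight for name, weight in backends]
    return random.choices(names, weights=weights, k=1)[0]

web = [("web1", 1), ("web2", 1)]
print(pick_backend(web))  # web1 or web2, 50/50 with equal weights
```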

Now tell Varnish to use the "web" director for serving matching requests:

sub vcl_recv {
   if (req.http.host ~ "^(www\.)?mysite\.com$") {
       set req.backend = web;
   }
}

Let's see what the different directors are for.

The client director
       The client director picks a backend based on the client's identity. You can set the VCL variable client.identity to identify the client, for example by picking up the value of a session cookie.

The hash director
       The hash director picks a backend based on the URL hash value. This is useful if you are using Varnish to load balance in front of other Varnish caches or other web accelerators, as objects won't be duplicated across caches.
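The idea can be sketched like this (illustrative only - Varnish uses its own internal hash):

```python
import hashlib

def backend_for_url(url, backends):
    # The same URL always maps to the same backend, so each object
    # lives in exactly one cache of the tier.
    h = int(hashlib.md5(url.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

backends = ["cache1", "cache2", "cache3"]
print(backend_for_url("/images/logo.png", backends))
```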

The round-robin director
       The round-robin director does not take any options. It will use the first backend for the first request, the second backend for the second request and so on, and start from the top again when it gets to the end. If a backend is unhealthy or Varnish fails to connect, it will be skipped.  The round-robin director will try all the backends once before giving up.
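The round-robin behaviour - try each backend once, skipping unhealthy ones, give up if none respond - can be sketched as (illustrative, not Varnish's code):

```python
def round_robin(backends, healthy, start=0):
    # Try each backend once, starting after the last one used,
    # skipping unhealthy ones; None when all are down.
    n = len(backends)
    for i in range(n):
        candidate = backends[(start + i) % n]
        if healthy.get(candidate, False):
            return candidate
    return None

backends = ["web1", "web2", "web3"]
state = {"web1": True, "web2": False, "web3": True}
print(round_robin(backends, state, start=1))  # web2 is skipped -> web3
```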

The DNS director
       The DNS director can use backends in two different ways. Either like the random or round-robin director or using .list:

       director directorname dns {
               .list = {
                       .host_header = "www.example.com";
                       .port = "80";
                       .connect_timeout = 0.4s;
                       "192.168.15.0"/24;
                       "192.168.16.128"/25;
               }
               .ttl = 5m;
               .suffix = "internal.example.net";
       }


       This specifies 384 backends (256 from the /24 plus 128 from the /25), all using port 80 and a connect timeout of 0.4s. Options must come before the list of IPs in the .list statement. The .ttl defines the cache duration of the DNS lookups. Health probes are not fully supported for DNS backends. DNS round-robin balancing is supported: if a hostname resolves to multiple backends, the director will divide the traffic between all of them in a round-robin manner.
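The backend count follows directly from the prefix lengths, which is easy to check with Python's ipaddress module:

```python
import ipaddress

# The two prefixes from the .list above.
nets = ["192.168.15.0/24", "192.168.16.128/25"]
total = sum(ipaddress.ip_network(n).num_addresses for n in nets)
print(total)  # 256 + 128 = 384
```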

The fallback director
     The fallback director will pick the first backend that is healthy. It considers them in the order in which they are listed in its definition. The fallback director does not take any options.

       An example of a fallback director:

       director b3 fallback {
         { .backend = www1; }
         { .backend = www2; } // will only be used if www1 is unhealthy.
         { .backend = www3; } // will only be used if both www1 and www2
                              // are unhealthy.
       }

There is a huge list of configurations that can be done in varnish. You can check the list here and see which suits your needs.

https://www.varnish-cache.org/trac/wiki/VCLExamples

Saturday, October 15, 2011

Nginx with php-fpm versus apache with modphp

I have been using Apache with mod_php for ages now. It works like a charm. But as concurrency increases, Apache chews up a lot of resources. I came across nginx some time back. It is a very lightweight server which I had earlier used to serve static content. I thought, why not try it to serve dynamic pages as well? After some searching, I found that nginx with php-fpm is a lethal combination as a dynamic web server.

php-fpm stands for PHP FastCGI Process Manager. It is a FastCGI implementation of PHP with some additional features. Have a look here: http://php-fpm.org/

So without getting too much into the technicalities, let's focus on the benchmark. I benchmarked on my laptop, which has an i5 processor and 4 GB of RAM, running Ubuntu 11.10 (kernel 3.0.0-12, 64 bit). The software versions I used were:

php-5.3.8
apache 2.2.19
nginx 1.1.5
XCache v1.3.2


Benchmark process:
First I compiled PHP with Apache using the following configure command.

'./configure' --with-apxs2=/usr/local/apache2/bin/apxs '--with-gd' '--with-curl' '--with-mysql=mysqlnd' '--with-mysqli=mysqlnd' '--with-pdo-mysql=mysqlnd' '--enable-mbstring'

Ran my tests, then recompiled PHP with fpm, configured nginx and ran the benchmark again.

To compile PHP with fpm, do:

'./configure' '--enable-fpm' '--with-fpm-user=jayant' '--with-fpm-group=jayant' '--with-gd' '--with-curl' '--with-mysql=mysqlnd' '--with-mysqli=mysqlnd' '--with-pdo-mysql=mysqlnd' '--enable-mbstring'


I ran tests for 3 levels of concurrency - 100, 500 and 1000 - for 10 minutes each. The code I was benchmarking was a simple phpinfo() with a random number display. I used siege for benchmarking.

x.php
------
<?php
echo "Random : ".rand(1,100000)."\n";
phpinfo();
?>

nginx with php-fpm vs apache with mod_php. (The original post showed these results side by side in two columns; here the runs for each concurrency level appear one after the other, nginx first.)

nginx + php-fpm
concurrency : 100
Time : 10 min
siege -i -c 100 -t 10m http://localhost/x.php
Load : 0.10

Lifting the server siege... done.
Transactions: 118171 hits
Availability: 100.00 %
Elapsed time: 599.56 secs
Data transferred: 6611.79 MB
Response time: 0.00 secs
Transaction rate: 197.10 trans/sec
Throughput: 11.03 MB/sec
Concurrency: 0.96
Successful transactions: 118171
Failed transactions: 0
Longest transaction: 0.07
Shortest transaction: 0.00

apache + mod_php
concurrency : 100
Time : 10 min
siege -i -c 100 -t 10m http://localhost/x.php
Load : 0.25

Lifting the server siege... done.
Transactions: 118688 hits
Availability: 100.00 %
Elapsed time: 599.55 secs
Data transferred: 7278.54 MB
Response time: 0.01 secs
Transaction rate: 197.96 trans/sec
Throughput: 12.14 MB/sec
Concurrency: 0.99
Successful transactions: 118688
Failed transactions: 0
Longest transaction: 0.09
Shortest transaction: 0.00

nginx + php-fpm
siege -i -c 500 -t 10m http://localhost/x.php
concurrency : 500
Time : 10 min
Load : 2.0

Lifting the server siege... done.
Transactions: 589098 hits
Availability: 100.00 %
Elapsed time: 599.44 secs
Data transferred: 32960.63 MB
Response time: 0.01 secs
Transaction rate: 982.75 trans/sec
Throughput: 54.99 MB/sec
Concurrency: 7.59
Successful transactions: 589098
Failed transactions: 0
Longest transaction: 3.23
Shortest transaction: 0.00

apache + mod_php
siege -i -c 500 -t 10m http://localhost/x.php
concurrency : 500
Time : 10 min
Load : 20

siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc
Transactions: 45954 hits
Availability: 97.36 %
Elapsed time: 50.84 secs
Data transferred: 2818.13 MB
Response time: 0.02 secs
Transaction rate: 903.89 trans/sec
Throughput: 55.43 MB/sec
Concurrency: 14.47
Successful transactions: 45954
Failed transactions: 1248
Longest transaction: 3.30
Shortest transaction: 0.00

nginx + php-fpm
siege -i -c 1000 -t 10m http://localhost/x.php
concurrency : 1000
Time : 10 min
Load : 48

Lifting the server siege... done.
Transactions: 941105 hits
Availability: 99.98 %
Elapsed time: 599.43 secs
Data transferred: 52655.81 MB
Response time: 0.14 secs
Transaction rate: 1570.00 trans/sec
Throughput: 87.84 MB/sec
Concurrency: 213.57
Successful transactions: 941105
Failed transactions: 167
Longest transaction: 21.17
Shortest transaction: 0.00

apache + mod_php
siege -i -c 1000 -t 10m http://localhost/x.php
concurrency : 1000
Time : 10 min
Load : 58

siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc
Transactions: 45454 hits
Availability: 96.86 %
Elapsed time: 36.27 secs
Data transferred: 2787.47 MB
Response time: 0.19 secs
Transaction rate: 1253.21 trans/sec
Throughput: 76.85 MB/sec
Concurrency: 240.04
Successful transactions: 45454
Failed transactions: 1475
Longest transaction: 9.37
Shortest transaction: 0.00

As you can see, Apache buckles and stops responding at a concurrency of 500: the load shot up to 20 in just 50 seconds, there were lots of socket errors, and siege gave up the benchmark stating that there were too many of them. nginx + php-fpm, meanwhile, handled a concurrency of 500 with 0 failed transactions. At a concurrency of 1000, siege aborted the apache + mod_php run in just 36 seconds due to excessive errors, while nginx ran for the whole 10 minutes with a success rate of 99.98%.
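The siege numbers above are internally consistent: the reported transaction rate is just transactions divided by elapsed time, which is easy to sanity-check against a couple of the rows.

```python
def rate(transactions, elapsed_secs):
    # siege's "Transaction rate" field: hits per second of wall time.
    return transactions / elapsed_secs

print("%.2f" % rate(118171, 599.56))  # nginx @ 100  -> siege reported 197.10
print("%.2f" % rate(941105, 599.43))  # nginx @ 1000 -> siege reported 1570.00
```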

Without any doubt, I can conclude that nginx with php-fpm is the web server of choice for large-scale websites.

Thursday, May 15, 2008

python on web - getting started : building your first app

I would be focusing here on the following tasks.


  • install pylons.

  • create a helloworld app using pylons.

  • install genshi.

  • change template engine.

  • deploy on apache using mod_wsgi.



Install Pylons

Run the following:

$ easy_install Pylons==0.9.6.1
OR
$ easy_install -f http://pylonshq.com/download/ Pylons==0.9.6.1

More details at http://wiki.pylonshq.com/display/pylonsdocs/Installing+Pylons


Create helloworld app

$ paster create -t pylons helloworld

A directory helloworld with the following structure is created

jayant@jayantbox:~/myprogs/python/helloworld$ ls -lh
total 44K
drwxr-xr-x 4 jayant jayant 4.0K 2008-04-25 17:47 data
-rwxr-xr-x 1 jayant jayant 1.5K 2008-05-15 10:41 development.ini
drwxr-xr-x 2 jayant jayant 4.0K 2008-04-25 17:47 docs
drwxr-xr-x 2 jayant jayant 4.0K 2008-04-25 17:47 ez_setup
drwxr-xr-x 9 jayant jayant 4.0K 2008-05-14 19:52 helloworld
drwxr-xr-x 2 jayant jayant 4.0K 2008-04-25 17:47 helloworld.egg-info
-rwxr-xr-x 1 jayant jayant 79 2008-04-25 17:47 MANIFEST.in
-rwxr-xr-x 1 jayant jayant 463 2008-04-25 17:47 README.txt
-rwxr-xr-x 1 jayant jayant 1.2K 2008-04-25 17:47 setup.cfg
-rwxr-xr-x 1 jayant jayant 865 2008-04-25 17:47 setup.py
-rwxr-xr-x 1 jayant jayant 507 2008-04-25 17:47 test.ini

To run the app, from within the helloworld directory do

jayant@jayantbox:~/myprogs/python/helloworld$ paster serve --reload development.ini
Starting subprocess with file monitor
Starting server in PID 15555.
serving on 0.0.0.0:5000 view at http://127.0.0.1:5000

Create a test.html in the <path_to_helloworld>/helloworld/helloworld/public directory

<html>
<body>
Hello World!!
</body>
</html>


And point your browser to http://127.0.0.1:5000/test.html to see the "Hello World!!" page.

Now let's create a controller and try printing info through a template.

jayant@jayantbox:~/myprogs/python/helloworld$ paster controller hello
Creating /home/jayant/myprogs/python/helloworld/helloworld/controllers/hello.py
Creating /home/jayant/myprogs/python/helloworld/helloworld/tests/functional/test_hello.py

Edit the <path_to_helloworld>/helloworld/helloworld/controllers/hello.py file and put in some python code

import logging

from helloworld.lib.base import *

log = logging.getLogger(__name__)

class HelloController(BaseController):

    def index(self):
        # Return a rendered template
        # return render('/some/template.mako')
        # or, Return a response
        return 'Hello World'
    def serverinfo(self):
        import cgi
        import pprint
        c.pretty_environ = cgi.escape(pprint.pformat(request.environ))
        c.name = 'Jayant Kumar'
        return render('serverinfo.mako')

Now create a template serverinfo.mako in <path_to_helloworld>/helloworld/helloworld/templates

<h2>
Server info for ${request.host}
</h2>

<p>
The URL you called: ${h.url_for()}
</p>

<p>Hi there ${c.name or c.full_name or "Joe Smith"}</p>

<p>
The name you set: ${c.name}
</p>

<p>The WSGI environ:<br>
<pre>${c.pretty_environ}</pre>
</p>

Edit the <path_to_helloworld>/helloworld/helloworld/config/routing.py file and check that the following code is there in the "CUSTOM ROUTES" section

map.connect('', controller='hello', action='index')
map.connect(':controller/:action/:id')
map.connect('*url', controller='template', action='view')

This means that an empty URL is matched to the index action of the hello controller. Otherwise, the route mapper looks for URLs of the form controller/action/id; if the controller or action is not matched, the request is routed to the view action of the templates controller (created by the Pylons template), which raises a 404 error by default.
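A simplified sketch of how the mapper resolves a path against those routes (illustrative only, not the actual Routes library code):

```python
def match(path):
    # Mimic the three routes above, in order.
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts:                       # '' -> hello/index
        return {"controller": "hello", "action": "index"}
    if len(parts) <= 3:                 # :controller/:action/:id
        keys = ["controller", "action", "id"]
        return dict(zip(keys, parts))
    return {"controller": "template", "action": "view", "url": path}

print(match("/"))                  # -> the hello controller's index action
print(match("/hello/serverinfo"))  # -> the hello controller's serverinfo action
```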

Now go to http://localhost:5000/hello/hello and http://localhost:5000/hello/serverinfo and check out the output.

Installing genshi and switching the template engine

$ easy_install Genshi

To enable genshi in your project edit the <path_to_helloworld>/helloworld/helloworld/config/environment.py file and change the template engine from mako to genshi

    config.init_app(global_conf, app_conf, package='helloworld',
        template_engine='genshi', paths=paths)


Now create a new template in <path_to_helloworld>/helloworld/helloworld/templates, say serverinfo.html and put in the following code

<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:py="http://genshi.edgewall.org/"
lang="en">
<h2>
Testing...
</h2>

<p>Hi there ${c.name}</p>

<p>
The name you set: ${c.name}
</p>

<p>The WSGI environ:<br/>
<div py:content="c.pretty_environ">Pretty Environ</div>
</p>
</html>

And change your <path_to_helloworld>/helloworld/helloworld/controllers/hello.py to render the new template

        return render('serverinfo')

And check the output at http://localhost:5000/hello/hello and http://localhost:5000/hello/serverinfo

Deploy using mod_wsgi

For those who have not yet configured mod_wsgi, you can get mod_wsgi from http://code.google.com/p/modwsgi/.

Simply untar it and do a

./configure --with-apxs=/path/to/apache/bin/apxs
make
sudo make install


Bingo - you now have the mod_wsgi.so file in your apache/modules directory.

Change the httpd.conf and add the following

LoadModule wsgi_module modules/mod_wsgi.so

to load the wsgi module and then deploy the hello world application

WSGIDaemonProcess hello threads=25
WSGIScriptAlias /hello "/path/to/apache/anydirectory/hello.wsgi"
<Location /hello>
        WSGIProcessGroup hello
        WSGIReloadMechanism Process
</Location>


This says that the application helloworld will run in a separate daemon process with 25 threads. And since we have also enabled the process reload mechanism available with mod_wsgi 2.0, all that is needed to restart/reload the application is to touch the wsgi script, i.e. change its modification time.

Wait a minute - we did not create the hello.wsgi script. Create a directory in /path/to/apache, or anywhere apache has read access, where you want to keep your application startup scripts. What I did was:

$ mkdir /path/to/apache/wsgi

And create a hello.wsgi inside it

cd /path/to/apache/wsgi
vim hello.wsgi


Add the following code here

import os, sys
sys.path.append('/path/to/python/application/helloworld')

from paste.deploy import loadapp

application = loadapp('config:/path/to/python/application/helloworld/development.ini')


And we are done. Please note that /path/to/python/application should be readable and executable by apache. Or you can do (very unsafe - not recommended on production servers):

chmod -R a+rx /path

Now simply restart apache and point your browser to

http://localhost/hello/
and
http://localhost/hello/hello/serverinfo

To see the output.

Source:
http://wiki.pylonshq.com/display/pylonsdocs/Installing+Pylons
http://wiki.pylonshq.com/display/pylonsdocs/Getting+Started
http://wiki.pylonshq.com/display/pylonsdocs/Using+Other+Template+Languages
http://code.google.com/p/modwsgi/wiki/InstallationInstructions
http://code.google.com/p/modwsgi/wiki/IntegrationWithPylons

Python on web - getting started : selecting the tools

Why am I writing this? Well, because I felt that with so many options, it is difficult to pick one. I was confused for quite some time about which framework to use, which templating engine, which ORM layer, and how to deploy the app on the web.

First of all, let's start off with the framework. The major frameworks are Django, TurboGears and Pylons. Django is supposed to be the best of the three. But Django is very tightly integrated with its template engine and ORM layer, and any sort of customization is not only difficult but also slows down the app. Still, it is a good way to start off with python-on-web as a beginner, and its tight integration also accounts for its speed - it is the fastest framework of them all. Take a look at some of the benchmarks on the web here:
http://www.rkblog.rk.edu.pl/w/p/simple-python-frameworks-benchmark/ and http://www.alrond.com/en/2007/feb/04/in-addition-to-the-test-of-mvc-frameworks/

Django is a good framework if you are building a small site with default settings. But when you need an enterprise-level site with customizations, it is better to use something like TurboGears or Pylons. TurboGears is good and has "TurboGears Widgets", but it lacks proper documentation. TurboGears uses CherryPy, whereas Pylons uses Paste. I felt Pylons was easier to use and easier to configure. Pylons has a very small base, though it is not as fast as Django. There are lots of comparisons available on the net - just google for them. http://nxsy.org/unscientific-and-biased-comparison-of-django-pylons-and-turbogears gives a detailed comparison of the features of all three frameworks.

After selecting Pylons as my framework, I had tons of options for the templating engine: the default Mako, or Kid, Genshi or Cheetah (complete list available at http://projects.dowski.com/projects/buffet). I took a look at the performance benchmarks and ease of use of each (from Google, of course). What I could figure out was that Mako is the fastest template engine available. Genshi is a successor of Kid and has a very clean XML-based integration, which means Genshi templates stay valid markup that an HTML designer can work with without tripping over embedded code. So I would prefer to use Mako or Genshi, depending on the requirements. A list of the different types of available template engines can be found at http://wiki.python.org/moin/Templating. Benchmarks are available here: http://www.kuwata-lab.com/tenjin/pytenjin-users-guide.html#benchmark. Changing the template engine for Pylons is described here: http://wiki.pylonshq.com/display/pylonsdocs/Using+Other+Template+Languages

I did not do any DB operations, so I did not go into selecting the ORM layer. Anyway, I always prefer to write my own queries instead of letting the ORM layer decide what query to fire. SQLAlchemy seems good, though, if you plan to use an ORM layer. SQLObject is also there, but it is old.

For deployment, a few available options are mod_wsgi, mod_python and the default paster. The paster server comes integrated with Pylons and of course cannot be used for production deployment. mod_wsgi looked easy to use and is also supposed to be a bit faster than mod_python. You can look at http://wiki.pylonshq.com/display/pylonscookbook/Production+deployment+using+mod_python for mod_python, or http://code.google.com/p/modwsgi/wiki/IntegrationWithPylons for mod_wsgi integration.

Next we will look at building and deploying the "hello world" application.