Configuring Varnish

At $WORK I’m currently deploying a pool of Varnish servers to sit in front of some Apache servers running Pressflow. On our current infrastructure we’ve been running Squid for the past few years with very good success, minus a hiccup or two along the way, one involving memory fragmentation (thank you, tcmalloc). Varnish has a few nice features that Squid lacks:

  • The ability to PURGE objects using wildcards
  • Better support for multiple processors (Squid can benefit from multi procs when using AUFS)
  • Grace period that can be configured to serve objects from the cache after they’ve expired while fetching the new content from the backend. You can also use this to serve up stale content if your backend is down
  • Ships with a nice set of command-line tools (varnishtop, varnishlog, varnishstat, varnishhist, etc.)
  • A very flexible scripting/configuration language (you can even do inline C if you’re feeling saucy) that allows you to manipulate the objects at any point in the request or response (See flow chart)
There are many others, but these are just a few off the top of my head, and I’m still discovering what other capabilities Varnish has.

The site has not gone live yet, so I’m still testing on a dev version of the site and have not had an opportunity to perform any load testing. So far, with my current working configuration, I’ve made the following tweaks:

  • Stripped cookies off static objects
  • Stripped Google analytics cookies
  • Removed empty cookies
  • Configured a graceful period to serve up stale objects from cache
  • Added a debugging header to show whether the object was a cache HIT or MISS
  • The use of mod_expires on the Apache backend controls cache times for static assets (CSS, JS, images, etc.)

    In my googling around while reading about Varnish, I see a lot of people setting cache times in their VCLs. IMO you should let the backend or the application itself control the TTLs on objects. Within your application you can set more specific TTLs for certain sections of your site, or even certain types of dynamic content, without having to rely on complex VCL rules or deal with deploying those rules into Varnish. While Varnish does support a “graceful” style restart, it’s not quite as elegant as doing service apache graceful. Kristian Lyngstol (one of the Varnish devs) has a good post on his blog on dealing with this. Also, with mod_expires you can set TTLs based on MIME type within Apache.

    One other thing I see a lot of people blindly recommending, to deal with Varnish’s default behavior of not caching requests that carry cookies, is to take the cookie value and add it to Varnish’s hash of the object, e.g.

    sub vcl_hash {
        set req.hash += req.http.cookie;
    }

    If a light bulb just went off in your head as to why this is a bad idea, kudos to you. What you’re basically doing is creating a cache per-user on your Varnish server. Your hit ratio will plummet from this config. There are scenarios where this can be used in a good way. In talking with some folks in #varnish on irc.linpro.no, a scenario where you’d want this is if say you had a cookie that was a display filter on your site or some sort of site customization that didn’t have a large number of combinations.
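To make the effect on the hit ratio concrete, here is a rough Python simulation of what happens to the number of distinct cache objects once the cookie is mixed into the key. It is only an illustration: the `cache_key` function, the SHA-256 hashing, the host name, and the cookie names are all made up for this sketch and are not how Varnish computes its hash internally.

```python
import hashlib


def cache_key(url, host, cookie=None):
    """Build a cache key from URL and host, optionally mixing in the Cookie header."""
    parts = [url, host]
    if cookie is not None:
        parts.append(cookie)
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# 1000 hypothetical visitors, each with a unique session cookie,
# all requesting the same page.
requests = [("/front-page", "example.com", f"SESS={n}") for n in range(1000)]

# Key on URL + host only: every visitor shares one cached object.
without_cookie = {cache_key(url, host) for url, host, _ in requests}

# Key on URL + host + session cookie: one cached object per visitor.
with_session = {cache_key(url, host, cookie) for url, host, cookie in requests}

# A low-cardinality cookie, such as a display filter with three possible values.
filters = ["view=list", "view=grid", "view=compact"]
with_filter = {
    cache_key(url, host, filters[i % len(filters)])
    for i, (url, host, _) in enumerate(requests)
}

print(len(without_cookie), len(with_session), len(with_filter))
```

With a per-user cookie, every request for the same page produces its own object, so nothing is ever shared between visitors. With the three-value filter cookie, the same 1000 requests collapse to three objects, which is the "good" scenario described above.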

    One thing that bothers me about Varnish currently is that its admin interface is completely unsecured. By default it listens on localhost, but without any authentication, so anyone with a shell on your Varnish box can bring down your Varnish instance or modify the config in any way they see fit. For those who allow devs on production servers to debug logs, this is a bit of a security concern. I’m not really sure of a workaround for this, so if anyone has any ideas, leave them in the comments below.

    If you use Cacti for trending, there are some great templates available over at the Cacti forums. They utilize a Python script that needs access to the admin interface.

    I’ll probably post some more in the future on Varnish as I do further reading and testing with it.

    Thinkpad Trackpoint sensitivity on Ubuntu

    A while back I found some notes on configuring the sensitivity of the trackpoint on my Thinkpad T43 and took the time to tweak the values to get it just right. The commands were:

    /bin/echo -n 171 > /sys/devices/platform/i8042/serio1/serio2/sensitivity
    /bin/echo -n 119 > /sys/devices/platform/i8042/serio1/serio2/speed

    In order to keep those values across reboots, I placed those commands in /etc/rc.local. I rebooted and… the values got reset. After struggling a bit and giving up on the issue, I ended up making a shell script that I would execute by hand after each boot (crappy solution). Finally I got annoyed enough with the issue to research it some more and stumbled upon this post on Ubuntu Forums. A couple of things I learned from that…

    1. The proper way to set values in sysfs is to use udev rules (see “Writing udev rules”)
    2. Even if I used a udev rule, there appears to be an issue where the device does not exist yet at the time the rule is processed to set the value for the device

    Solution (per the forum post): create a file /etc/udev/rules.d/trackpoint.rules and place the following in it:

    SUBSYSTEM=="serio", DRIVERS=="psmouse", ATTR{description}=="Synaptics pass-through", WAIT_FOR="/sys/devices/platform/i8042/serio1/serio2/sensitivity", ATTR{sensitivity}="171", ATTR{speed}="119"

    Check for missing reverse DNS entries on network

    A quick way to check your network for IPs missing reverse DNS entries:

    nmap -PE -sP 10.0.0.0/24 | awk '{if ($2 ~ /^[[:digit:]]/ ) print $2}'
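If nmap isn’t handy, or its output format differs from what the awk filter expects, a similar check can be sketched in Python with the standard library. Note the difference in approach: this version does not ping anything. It simply attempts a reverse lookup for every host address in the range, so it also reports unused addresses. The function name `missing_reverse_dns` and the `resolver` parameter are made up for this sketch.

```python
import ipaddress
import socket


def missing_reverse_dns(network, resolver=socket.gethostbyaddr):
    """Return the host addresses in `network` whose reverse lookup fails."""
    missing = []
    for ip in ipaddress.ip_network(network).hosts():
        try:
            resolver(str(ip))
        except (socket.herror, socket.gaierror):
            # The resolver could not map this address back to a name.
            missing.append(str(ip))
    return missing


# Example usage (performs real DNS lookups, so it can be slow on a large range):
#   for address in missing_reverse_dns("10.0.0.0/24"):
#       print(address)
```

The `resolver` argument is only there so the function can be exercised with a fake lookup instead of real DNS.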

    Support for authorized-keys.d/

    Why is there no subdir inside .ssh called authorized-keys.d where I can just throw my ssh keys and easily manage them by file name, instead of having to edit the authorized_keys(2) file?

    I need to do some googling on this; a quick search yields this Debian bug report wishing for support for one.

    Python Quote module

    Over the past couple of years I’ve been wanting to learn Python more seriously but really haven’t sat down and just done it. I’ve written a couple of scripts for personal and work use but always felt they weren’t coded in a “pythonic” way. I’m now reading Learning Python (4th edition) from the beginning and making sure I learn things properly from the ground up. When I was writing my code in the past I wasn’t aware of which objects were immutable vs. mutable or how generators worked. I basically knew what I wanted to accomplish before writing my code and would reference the online documentation and just go at it. Over time I picked up on some “pythonic” methods from looking at code examples, for example, using optparse for handling arguments passed into your program.

    The first project I decided to tackle as I go through my book is a module that fetches stock quotes from Yahoo’s Finance page and stores them in memcached. I just pushed it out to GitHub, available @ http://github.com/jlintz/python_quote. I’m hoping to get some feedback from people on where I can improve my code, and hopefully it will be useful to someone else out there. As I go through the book I plan on going back over the code to see what I can refactor, and I’m sure along the way I’ll probably re-architect things.
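For anyone unfamiliar with the fetch-then-cache idea, here is a minimal sketch of the general pattern. It is not the code from the python_quote repository: `TTLCache`, `get_quote`, and the `fetch` callable are hypothetical names, and the in-memory dictionary is only a stand-in for a real memcached client so the example runs on its own.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a memcached-style get/set with expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def get_quote(symbol, cache, fetch, ttl=60):
    """Return a cached quote if it is still fresh, otherwise fetch and cache it."""
    key = "quote:" + symbol.upper()
    quote = cache.get(key)
    if quote is None:
        # Cache miss: call the (hypothetical) fetcher and remember the result.
        quote = fetch(symbol)
        cache.set(key, quote, ttl)
    return quote
```

Passing the fetcher and the cache in as arguments keeps the lookup logic independent of where quotes come from, which also makes it easy to test without touching the network.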

    One thing I know I need to do is write some unit tests for the module. When I was in my Computer Science program in college, we never really had any exposure to unit tests. The unit tests consisted of assert statements, and we never had any real lessons on them. I need to do more reading on unittest. I know many developers write their unit tests before writing a single line of code, in a test-driven development approach. It’s something I want to look at more and see if I should consider picking it up.

    Also as part of this coding project, it was a good excuse to get some exercise with Git.  It’s pretty easy to work with and I haven’t even scratched the surface of its potential.  I just need to begin looking at some of its more advanced features and incorporating them into my work flow.

    Vim is another program I’m getting some finger exercise in. When I started my job 3 years ago I had never really opened Vim/Vi, but one day I forced myself to learn it because, as a sysadmin, Vi will always be there for you, like your friend Stewie. In my day-to-day use of Vi, I just used the basics, hjkl and :wq. Slowly I picked up some commands from co-workers and got used to using them daily. When writing code in Vim, there are a lot of tricks and commands to aid you. I’ve read about many, but now it’s a matter of incorporating them so I don’t have to think much, and breaking my old habits to use the new commands I’ve read about.