
Geoff Papilion has made a living running infrastructure for the past 15 years. He is currently employed at Wikia.com, scaling the infrastructure to 1.5 billion requests per day. Geoffrey is a DZone MVB, is not an employee of DZone, and has posted 26 posts at DZone.

The Pitfalls of Web Caches

09.21.2012

At Wikia we’ve written a lot of code and used many different tools to improve performance. We have profilers, and we’ve used every major Linux web server, APC, XCache, memcache, and Redis, not to mention all the work we’ve put into the application itself. But nothing gets us as much bang for our buck as our caching layer built around Varnish. We’ve used it to solve problems big and small, and for the most part it does its job and saves us hundreds of thousands per year in server costs. That does not mean there aren’t problems. If you’re looking at using something like Varnish, here are a few things to keep in mind.

Performance Problems

When deploying your caching layer, remember that you haven’t really “fixed” anything. Your page loads will improve because you’ll skip the high cost of rendering a raw page, but on a cache miss the page will be as slow as it ever was. This matters if you pass sessioned clients to your backend application servers, since they will suffer the slowest load times. So use caching to cut load times, but also try to fix the underlying performance problems.
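To make the point concrete, here is a minimal sketch of how average load time depends on the hit ratio. The timings (5 ms for a hit, 800 ms for a full render) and the 95% hit ratio are made-up illustrative numbers, not measurements from Wikia.

```python
def expected_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency when a fraction `hit_ratio` of requests is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Illustrative numbers only: 5 ms for a cache hit, 800 ms for a full render.
anonymous = expected_latency_ms(0.95, 5.0, 800.0)

# Sessioned clients that are passed straight to the backend never hit the
# cache, so their hit ratio is effectively zero.
sessioned = expected_latency_ms(0.0, 5.0, 800.0)

print(f"anonymous: {anonymous:.2f} ms, sessioned: {sessioned:.2f} ms")
```

With these assumed numbers the anonymous average comes out at roughly 45 ms, while sessioned clients still pay the full render cost on every request. The cache hides the slow page from most users; it does not make the page any faster.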

Rewrites

We use a lot of rewrites. They are extremely useful for normalizing URLs for purging and for hiding ugly URLs from end users; no one wants to go to wiki/index.php?title=Foo when they can just go to wiki/Foo. Unfortunately, applying this pattern leads to very complex logic that relies on using the request URL as an accumulator: each rule rewrites the URL in place, and later rules match against whatever the earlier ones left behind. The resulting code is very difficult to understand, since multiple changes can apply to one URL. Rewrites are also very difficult to test, since they are built into your caching server. Don’t forget that you’re essentially making your cache part of your application; there isn’t much difference between a rewrite in Varnish and a route in Rails.

If you have the choice, don’t put rewrites in your caching layer; they are very difficult to debug and can be very fragile. Use redirects if at all possible. They may be slower, but they are easier to figure out when something goes wrong.
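As a rough illustration of the redirect approach, the sketch below keeps the URL rule in application code, where it can be unit tested like any other function. The function name `canonical_redirect`, the leading `/wiki/` path, and the choice to redirect only when `title` is the sole query parameter are assumptions made for this example, not a description of Wikia's setup.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit


def canonical_redirect(url: str) -> Optional[str]:
    """Return the pretty path to redirect to, or None if no redirect applies."""
    parts = urlsplit(url)
    if parts.path != "/wiki/index.php":
        return None
    params = parse_qs(parts.query)
    # Only redirect the plain "view this title" form. Anything with extra
    # parameters (for example action=edit) is left for the application.
    if set(params) != {"title"} or len(params["title"]) != 1:
        return None
    title = params["title"][0]
    return "/wiki/" + quote(title, safe="")


# The application would answer with a 301 to the returned path.
assert canonical_redirect("/wiki/index.php?title=Foo") == "/wiki/Foo"
assert canonical_redirect("/wiki/index.php?title=Foo&action=edit") is None
assert canonical_redirect("/wiki/Foo") is None
```

Because the function takes one URL and returns one answer, there is no accumulated state to reason about, and a failing case can be reproduced in a test rather than by inspecting live cache traffic.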

Complicated Logic

Rewrites aren’t the only place you can make things complicated. It’s very easy to build complex logic in something like Varnish. You can have several levels of if statements, and several points at which you move on to the next step in document delivery. This, just like rewrites, can lead to problems that are very difficult to debug, since several conditions may apply to the same request.

If you find yourself doing this, ask yourself whether it’s needed. You may find that while it’s fast to implement in a caching layer, you would be better off building it into the application itself. Remember that caching layers are difficult to test, and you may not know that your logic is fully working for at least one eviction cycle.
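One way to move this kind of decision into the application is to have it emit standard Cache-Control headers and let the cache simply obey them. The sketch below is hypothetical: the cookie name `session`, the `is_static` flag, and the TTL values are all invented for illustration, and it assumes a cache configured to honor Cache-Control from the backend.

```python
from typing import Dict, Mapping


def cache_headers(cookies: Mapping[str, str], is_static: bool) -> Dict[str, str]:
    """Decide cacheability in the application instead of in the cache config."""
    # Hypothetical rule 1: logged-in users get personalized pages, so a
    # shared cache must not store them.
    if "session" in cookies:
        return {"Cache-Control": "private, no-store"}
    # Hypothetical rule 2: static assets can be cached for a day everywhere.
    if is_static:
        return {"Cache-Control": "public, max-age=86400"}
    # Default: anonymous page views are cached briefly by shared caches only.
    return {"Cache-Control": "public, s-maxage=300"}


assert cache_headers({"session": "abc"}, False) == {"Cache-Control": "private, no-store"}
assert cache_headers({}, True) == {"Cache-Control": "public, max-age=86400"}
assert cache_headers({}, False) == {"Cache-Control": "public, s-maxage=300"}
```

Each rule is a flat, ordered check that can be verified before deployment, rather than after waiting an eviction cycle to see what the cache actually did.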

Wrapping it Up

My advice for using a web caching server is easy: keep it simple. Try to keep as much logic out of it as possible, don’t ignore your performance problems, and use it for what it’s best at: caching.

Published at DZone with permission of Geoffrey Papilion, author and DZone MVB. (source)
