Code Name Velocity

4 June 2008 –

At TechEd 2008 (which I unfortunately couldn't attend, though a few colleagues have been sending me continuous updates), Microsoft announced a number of things that matter a lot to me. New breakthroughs in voice recognition are always cool, but I am a web guy at my core, so the web announcements are what turn my crank.

I have a few side projects that need a distributed in-memory cache. These projects are all in .NET, so my obvious choices were SharedCache, ScaleOut StateServer, NCache, or the Win32 port of memcached with one of several .NET clients. Memcached is far more freely available than ScaleOut StateServer or NCache and better regarded than SharedCache, so last week I started a little personal project to port memcached to .NET. My C is pretty rusty, but memcached is relatively simple, and it seems like a managed version could perform quite well. And since I know C# better than English, a C# port would make enhancements and customizations to my caching layer much easier.

Thankfully I didn't burn too much time on memcached.NET before Microsoft announced Velocity, a distributed cache written to natively support the .NET stack. After digging into the CTP code, I came away very impressed. In the past I've viewed an application's caching layer very simplistically from a usage perspective: a persistent dictionary of keys and frequently used objects. Distributing it across multiple machines adds complexity to the service but not to its consumers. Velocity has some features with huge implications for the kind of value a cache can deliver:

  • Lookups by tag. To me this means one thing: multi-dimensional object keying. No more ugly pointer entries, and no more storing a separate copy of an object under every key it can be found by. That makes the caching layer far more valuable than a simple key-value table.
  • In-process storage. With a typical cache service, the cache stores a serialized copy of User::Bob in memory. Bob is very popular in our application, so his user object is deserialized into the application process on almost every request. This is wasteful and adds to memory pressure, since we are essentially storing the same object at least twice, sometimes more. In-process storage lets heavily used items like User::Bob stay in the application's process space instead of making the round trip through the cache service and being deserialized again on every request.
  • Out-of-the-box session provider. Velocity ships with a SessionStateStoreProvider implementation, which gives you a high-performance, high-availability state server for free, without putting any pressure on your SQL box (relieving that pressure is probably why you are considering Velocity in the first place).
  • Atomic operations and concurrency management. Optimistic locking with versioning opens the door to much more sophisticated functionality than would be practical with typical pessimistic locking. I predict we'll see applications that align themselves with actual usage patterns much more intelligently.
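To make the tag and versioning ideas concrete, here is a toy, single-process model in Python. It is only a sketch under stated assumptions: the names `TaggedVersionedCache`, `put`, `get_by_tag` and `expected_version` are hypothetical and are not Velocity's API, and a real distributed cache would perform the version check atomically on the server that owns the key. It shows one object reachable under several tags without duplicate copies, and a writer holding a stale version being refused rather than silently overwriting someone else's change.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable, Optional


class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


@dataclass
class _Entry:
    value: Any
    version: int
    tags: frozenset


class TaggedVersionedCache:
    """Toy, single-process model of tag lookups plus optimistic concurrency.

    Hypothetical illustration only: these names are not Velocity's API, and a
    real distributed cache would do the version check atomically server-side.
    """

    def __init__(self) -> None:
        self._items: dict[str, _Entry] = {}
        self._by_tag: dict[str, set[str]] = {}  # tag -> keys carrying that tag

    def get(self, key: str) -> tuple[Any, int]:
        """Return (value, version); the version is the caller's ticket for a later put."""
        entry = self._items[key]
        return entry.value, entry.version

    def put(self, key: str, value: Any, tags: Iterable[str] = (),
            expected_version: Optional[int] = None) -> int:
        """Store value under key, replacing its tags; return the new version.

        If expected_version is given and the key has been written since the
        caller read it, refuse the write instead of silently overwriting.
        """
        entry = self._items.get(key)
        current = entry.version if entry else 0
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current}")
        if entry:
            for tag in entry.tags:  # remove the key from its old tag index entries
                self._by_tag[tag].discard(key)
        new_entry = _Entry(value, current + 1, frozenset(tags))
        self._items[key] = new_entry
        for tag in new_entry.tags:
            self._by_tag.setdefault(tag, set()).add(key)
        return new_entry.version

    def get_by_tag(self, tag: str) -> dict[str, Any]:
        """Return every item carrying the tag, keyed by cache key."""
        return {k: self._items[k].value for k in sorted(self._by_tag.get(tag, ()))}


if __name__ == "__main__":
    cache = TaggedVersionedCache()
    cache.put("user:bob", {"name": "Bob"}, tags={"users", "popular"})
    cache.put("user:ann", {"name": "Ann"}, tags={"users"})
    print(sorted(cache.get_by_tag("users")))  # ['user:ann', 'user:bob']

    value, version = cache.get("user:bob")  # version == 1
    cache.put("user:bob", {"name": "Robert"}, tags={"users"}, expected_version=version)
    try:
        # A second writer still holding version 1 loses the race.
        cache.put("user:bob", {"name": "Bobby"}, expected_version=version)
    except VersionConflict as err:
        print(err)  # user:bob: expected v1, found v2
```

The optimistic approach never blocks readers: a conflicting writer simply gets an error and can re-read and retry, which is what makes it cheaper than holding a lock for the whole read-modify-write cycle.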

As I noted in the beginning, memcached is pretty simple. And as my short list of highlights suggests, Velocity is anything but. Although memcached and Velocity solve the same problem - offloading high-volume requests from the data tier to an in-memory intermediary - they serve surprisingly different purposes. Scott Watermasysk summarizes the distinction neatly:

On one hand you have Memcached which treats the cache as something you should never rely on. It is there to help but you should always assume it is going to fail on you and even more importantly (to Memcached) you should accept that as a fact. If you read the Memcached FAQ you can almost hear the author laughing when talking about fault tolerance. On the other side of the fence you have features like replication and high availability.

The practical implications are significant. What kind of service are you running? What kind of data are you moving? Who is your audience? These are very high-level architectural questions that help decide between a memcached-like solution and a Velocity-like solution. Jim Benedetto, in an interview with Baseline Magazine, described MySpace's answer to these questions:

100% reliability is not necessarily [the] top priority. "That's one of the benefits of not being a bank, of being a free service"...on MySpace the occasional glitch might mean the Web site loses track of someone's latest profile update, but it doesn't mean the site has lost track of that person's money. "That's one of the keys to the Web site's performance, knowing that we can accept some loss of data"

Usually my default reaction to such an idea is "no way!" Loosely coupling system components to the point where data in one layer might not always make it to the next makes me very uncomfortable. Embracing discomfort is necessary for growth, though, so let's give it a big bear hug and dive in.

The original purpose of memcached was to support LiveJournal. In that environment, a failure in the caching layer might just mean a few extra round trips to the database. As long as the caching layer works some of the time, we still see a benefit. But then things start to get sticky. What if the failure comes when we try to invalidate a cache item because a journal post was updated? The old version will be exposed to the public longer than intended. Is this acceptable behavior? I obviously don't know the answer in LiveJournal's case. But the point is significant: how does reliability affect the business value of your application? For MySpace, responding promptly to a hundred million requests and failing a percentage of them delivers greater value than rolling service brownouts or throttling everyone.
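The two failure modes in the LiveJournal scenario can be sketched in a few lines of Python. This is a minimal cache-aside model, assuming a write path that updates the database and then deletes the cached entry; `ToyCache`, `PostStore` and the `available` switch are hypothetical stand-ins, not memcached's or Velocity's client API. A failed read only costs an extra database round trip, while a failed invalidation quietly leaves the old post visible.

```python
from __future__ import annotations


class CacheUnavailable(Exception):
    """Raised by the toy cache while it is 'down'."""


class ToyCache:
    """Hypothetical in-memory stand-in for a cache server; flip `available` to simulate an outage."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.available = True

    def _check(self) -> None:
        if not self.available:
            raise CacheUnavailable()

    def get(self, key: str) -> str | None:
        self._check()
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self._check()
        self.data[key] = value

    def delete(self, key: str) -> None:
        self._check()
        self.data.pop(key, None)


class PostStore:
    """Cache-aside reads and invalidate-on-write, treating the cache as best effort."""

    def __init__(self, cache: ToyCache) -> None:
        self.cache = cache
        self.db: dict[int, str] = {}  # stands in for the database
        self.db_reads = 0             # round trips the cache did not absorb

    def read(self, post_id: int) -> str:
        key = f"post:{post_id}"
        try:
            cached = self.cache.get(key)
            if cached is not None:
                return cached
        except CacheUnavailable:
            pass  # an outage costs a database round trip, nothing more
        self.db_reads += 1
        value = self.db[post_id]
        try:
            self.cache.set(key, value)
        except CacheUnavailable:
            pass  # best effort: the next read simply hits the database again
        return value

    def update(self, post_id: int, text: str) -> None:
        self.db[post_id] = text
        try:
            self.cache.delete(f"post:{post_id}")
        except CacheUnavailable:
            pass  # the sticky case: the old cached copy survives the edit


if __name__ == "__main__":
    cache = ToyCache()
    store = PostStore(cache)
    store.db[1] = "first draft"
    store.read(1)
    store.read(1)
    print(store.db_reads)  # 1 (second read served from the cache)

    cache.available = False
    store.update(1, "edited")  # database updated, invalidation lost
    cache.available = True
    print(store.read(1))  # first draft (stale copy still served)
```

Swallowing the failed `delete` is the "accept that the cache will fail" attitude in code form. A design that cares more about freshness might retry the invalidation or give entries a short expiry, which at least bounds how long a stale copy can live.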

Ultimately, the Velocity announcement has brought these questions to the forefront of my mind. I'll be relaxing in Ireland for the next week and a half, but I'm sure I'll be thinking about how to answer these questions for my apps. Make sure to do the same for yours.

© 2002-2009 Rex Morgan.
Content available under a Creative Commons license.
Site code and design may not be reproduced.