Every now and then @Ruzzie and I talk about our pet projects and the issues we encounter. I really enjoy those talks, as most of the time we end up on the edge of things that are not common in everyday programming.
So just the other day @Ruzzie and I were talking about a challenge he was having with one of his pet projects: caching the results of some frequent operations. The requirements, maybe not uncommon, were the following:
- Fixed size: everything is done in memory, so control over the maximum size is needed to prevent out-of-memory errors
- Multithreaded: multiple readers and writers access the cache concurrently
- Fast as hell
- Newer added values are expected to be read more frequently
On top of that we concluded we had some loose semantics about the following topic:
- Doesn't need a strict eviction order; happy with 'good enough'
Our first thoughts went to data structures like ring buffers or linked lists. A ring buffer has the advantage that it can be implemented as an array, which avoids the per-node overhead that a linked list has. A linked list is a bit easier to implement because readers and writers can't run into the buffer boundaries. But both structures share the same drawbacks:
- Contention: insertion of items leads to contention as multiple threads 'fight' over the insertion point
- O(n) lookup: keys are found by iterating over the entries
- Difficult and costly to guard against duplicate keys
So we both concluded that we'd rather use a hash-based solution, as that gives O(1) lookup. But what about expiring items? How would we know which entry to evict once the cache hits its limit? We decided we actually didn't care: it could be any key, because we didn't care about a strict eviction order.
So the resulting idea is quite simple. We define a fixed-size array of key-value pairs and store each pair at a position derived from the hash of its key, like any hash-based solution. But, unlike a 'real' hashmap/dictionary, we don't allow multiple keys for the same hash.
Looking up a value is nothing more than jumping to the right position in the array based on the hash and checking whether the keys match. Writing works the same way and, here's the kicker, we simply overwrite whatever key-value pair currently occupies that slot. In terms of concurrency, the only synchronization we have to do is make sure that the reads and writes to and from the array follow volatile semantics. No need for locks, so no blocked threads, no spinlocks, no CAS loops.
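To make the idea concrete, here is a minimal sketch in Java. The names are mine, not the real FlashCache API: one fixed-size slot array, the slot index derived from the key's hash, and all slot access going through `AtomicReferenceArray`, which provides the volatile read/write semantics described above without any locks or CAS loops.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch of the single-slot-per-hash cache idea.
final class HashSlotCache<K, V> {
    // An immutable key-value pair; published atomically as one reference.
    private static final class Entry<K, V> {
        final K key; final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final AtomicReferenceArray<Entry<K, V>> slots;
    private final int mask; // size is a power of two, so index = hash & mask

    HashSlotCache(int sizePowerOfTwo) {
        slots = new AtomicReferenceArray<>(sizePowerOfTwo);
        mask = sizePowerOfTwo - 1;
    }

    private int indexOf(K key) {
        int h = key.hashCode();
        h ^= (h >>> 16); // spread the high bits down, like java.util.HashMap does
        return h & mask;
    }

    // Overwrites whatever pair currently occupies the slot: no collision handling.
    void put(K key, V value) {
        slots.set(indexOf(key), new Entry<>(key, value));
    }

    // Returns null on a miss, i.e. when the slot is empty or holds another key.
    V get(K key) {
        Entry<K, V> e = slots.get(indexOf(key));
        return (e != null && e.key.equals(key)) ? e.value : null;
    }
}
```

Note that each entry is a single immutable object set with one volatile write, so readers always see a consistent key-value pair, never a key from one write and a value from another.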
This, of course, comes at a price: some entries can sit in the cache forever, never overwritten but also never used, while new values can be overwritten again fairly quickly. So make sure that:
- You tune the size of the array: too big is a waste of space, too small and entries get overwritten too quickly.
- Your hash method is of good quality, so that keys spread evenly over the array.
Furthermore, we concluded it's best to give the array a size that is a power of two: it avoids modulus operations, since the position in the array can be computed with a single bitwise AND.
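The power-of-two trick looks like this (the values here are illustrative): for a non-negative hash, `hash % size` and `hash & (size - 1)` give the same result whenever `size` is a power of two.

```java
// With a power-of-two capacity, masking replaces the modulus:
// one cheap bitwise AND instead of an integer division.
final int size = 1024;                         // power of two
final int mask = size - 1;                     // binary: ten 1-bits
int hash = "someKey".hashCode() & 0x7fffffff;  // force non-negative
int index = hash & mask;                       // equivalent to hash % size
```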
All that was left was to give it a name, so for now we decided on FlashCache, mainly because it rhymes and because of the volatile nature of the entries.