I’ve been thinking a lot about caching recently, after reading this great write-up from Nick Craver on how they do it over at StackOverflow.

One thing that struck me is how much faster a lookup against local memory is compared to one against a distributed cache. Assuming you can do that Redis lookup inside of 0.5ms, local memory is up to 5000x faster. From the performance side, it’s a no-brainer. Local cache all the things! But then my pessimistic side kicks in and I start thinking about the downsides of doing so.
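To make the comparison concrete, here’s a rough sketch of how you might measure it yourself, assuming a Redis instance on localhost and the redis-py client; the exact numbers are entirely environment-dependent and aren’t from Nick’s post.

```python
import time
import redis  # assumes the redis-py client and a Redis server on localhost

local_cache = {"user:42": "cached value"}
r = redis.Redis(host="localhost", port=6379)
r.set("user:42", "cached value")

# Time an in-process dictionary lookup.
start = time.perf_counter()
for _ in range(10_000):
    local_cache["user:42"]
local_ns = (time.perf_counter() - start) / 10_000 * 1e9

# Time a round-trip GET against Redis.
start = time.perf_counter()
for _ in range(10_000):
    r.get("user:42")
remote_ns = (time.perf_counter() - start) / 10_000 * 1e9

print(f"local: ~{local_ns:.0f} ns/lookup, redis: ~{remote_ns:.0f} ns/lookup")
```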

Distributed caching introduces inconsistency as a failure state between what data our source of truth (the database) holds and what data our code has access to. Local caching introduces a much more insidious version of this failure state, where the inconsistency can happen across hosts rather than across layers of our stack. This is IMO a much more significant and undesirable type of inconsistency. “Layer-level” inconsistency is actually fairly easy to debug and deal with. My code says “a”, the database says “b”. We purge the entry in the distributed cache and the problem is solved. Because the inconsistency is global, it is very easy to reason about which users/requests would have been affected. This makes potential remediation a lot easier, as it’s just based on the window of time between when the database was updated and when you purged the stale cache entries.
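As a simplified sketch of that remediation, assuming Redis as the distributed cache (the `update_database` helper here is a hypothetical stand-in for writing to the source of truth): update the database, then purge the cache entry, and the stale window is bounded by the gap between those two calls.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def update_database(key, value):
    """Hypothetical stand-in for writing to the source of truth."""
    ...

def write_and_purge(key, value):
    # 1. Write to the source of truth first.
    update_database(key, value)
    # 2. Purge the distributed cache entry; every host now misses the cache
    #    and re-reads the fresh value from the database on its next lookup.
    r.delete(key)
```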

Host-level inconsistency by definition only happens some percentage of the time, based on how many hosts fell out of sync versus how many didn’t. This produces an inconsistent experience for your users, where they can hit your API/site 10 times and get an incorrect result on 1 or 2 of those attempts. Because it doesn’t surface as easily, it may go undetected and unresolved for longer. When we do identify and try to fix it, it’s quite difficult to know who would have been affected unless we are logging every single HTTP request in full, along with which host the load balancer routed it to.

You can use pub/sub mechanisms to communicate refreshes/invalidations to all hosts, but you still have a period of inconsistency between when the write happens and when every single host has received the invalidation. Such mechanisms also don’t work for Functions-as-a-Service runtime environments such as Lambda, where the containers that run your code are paused and resumed to prevent cold starts. This could be mitigated somewhat if Lambda populated the invocation context with a flag indicating that the container running your code was “unpaused” to service this invocation, though you’d still need some manual priming step before handling the request to ensure it was running against a correct view of the world.
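For illustration, here’s a minimal sketch of that pub/sub approach built on Redis pub/sub (the channel name and function names are mine, not from any particular implementation). Every host runs the listener and evicts the named key from its local cache when a message arrives; any request served between the write and the callback firing still sees the stale local value, which is exactly the window described above.

```python
import threading
import redis

r = redis.Redis(host="localhost", port=6379)
local_cache = {}

def publish_invalidation(key):
    # Called by whichever host performed the write.
    r.publish("cache-invalidations", key)

def invalidation_listener():
    # Runs on every host; evicts any key named on the channel.
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidations")
    for message in pubsub.listen():
        if message["type"] == "message":
            local_cache.pop(message["data"].decode(), None)

threading.Thread(target=invalidation_listener, daemon=True).start()
```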

Local caching is memory-inefficient. If each machine in the system needs to cache item A for the system to be consistent, then we need to store N copies of the data instead of 1. As the amount of data we cache and the number of “hosts” running our code grow, so does the total amount of memory we require to keep them all consistent. Minimizing the memory footprint of our programs is becoming extremely important in a containerized-everything world. The smaller your server’s memory footprint, the smaller the “cracks” of under-utilization it can fill.
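To put rough, purely illustrative numbers on that (these are assumed figures, not measurements from anywhere):

```python
# Back-of-the-envelope arithmetic, with assumed figures.
cache_size_mb = 200   # hot data each host would cache locally (assumption)
host_count = 50       # hosts/containers running the code (assumption)

local_total_mb = cache_size_mb * host_count  # N copies, one per host
shared_total_mb = cache_size_mb              # a single copy in a shared cache

print(f"local caching:       {local_total_mb} MB of RAM across the fleet")  # 10000 MB
print(f"distributed caching: {shared_total_mb} MB in one place")            # 200 MB
```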

Beyond the containerization angle, network I/O latency has been decreasing much faster than memory latency in recent years. This post shows an average 0.09ms round-trip inside the same AZ, which is almost 2x faster than the 0.17ms round-trip quoted by Nick Craver in his post. According to this post, a typical 200-300m datacentre round trip takes about 1 microsecond at the speed of light, so it seems feasible that the 0.09ms AWS round-trip could be shaved down by an order of magnitude. While it will always be slower to go over the wire, these advances in networking make the trade-off discussion a lot more interesting.
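A quick sanity check on that physical floor (my arithmetic, not a claim from either post):

```python
# Propagation time for a 300m round trip at the speed of light.
speed_of_light_m_per_s = 3e8
round_trip_m = 300
propagation_us = round_trip_m / speed_of_light_m_per_s * 1e6
print(f"~{propagation_us:.1f} microseconds")  # ~1.0 µs

# The measured 0.09ms (90 µs) same-AZ round trip is therefore roughly 90x the
# propagation floor, which is why an order-of-magnitude improvement seems plausible.
```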

In conclusion, distributed caching is slower, but it provides better consistency guarantees. Its inherent memory efficiency allows you to scale your code out using FaaS runtimes in a cost-effective manner. Yes, there is an additional cost to running the cache, but it nets out to a saving by avoiding consistency bugs, requiring less memory per executing host and not requiring any pub/sub mechanism to keep the hosts consistent. In my opinion it should be the default mode of caching, and we should only drop down to local caching when the trade-offs are worth it.