The (non)sense of caching

I have seen several projects where the developers had implemented caching all over the place. The caches caused a large increase in heap usage, and users complained that they were not seeing the latest data.

My opinion is that the decision to add caching should not be taken lightly. A cache introduces a lot of additional (so-called accidental) complexity, and it also has a functional impact on users.

Adding a cache raises a lot of questions that need to be answered:

  • When the underlying data is updated, should the cached record be updated or evicted?
  • What should we do in a distributed environment: use a distributed cache? Does that distributed cache scale?
  • Do we actually get the performance improvements we’re expecting?
  • What is an acceptable delay for users to see the updated data?
  • How many elements should we store in the cache?
  • What eviction policy do we need when not all data fits in the cache?
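To make the last two questions concrete, here is a minimal sketch (my own illustration, not tied to any particular caching framework) of a bounded cache with a least-recently-used eviction policy, built on `LinkedHashMap` in access order:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: a LinkedHashMap in access order that evicts the
// least recently used entry once maxEntries is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" becomes the eldest entry
        cache.put("c", "3"); // exceeds capacity: "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Even this toy version shows that “how many elements” and “which eviction policy” are decisions you must make explicitly; a real cache adds expiry, invalidation, and concurrency on top.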

Why would you want to add a cache?

  • To improve response times
  • To reduce server load and/or increase maximum server throughput

Alternatives to caching

So you want to improve performance? Caching is not your only option. There are many alternatives when it comes to performance improvements, across all layers of your system. Some examples:

  • add database indices
  • optimize queries
  • use a more efficient architecture for reads: skip the ORM and use a Fast Lane Reader for data access, or apply CQRS
  • optimize the code paths that are executed most often
  • fix obvious inefficiency mistakes (for example, using a list where a set or map is appropriate)
  • fix the ‘N+1 queries to the database’ problem
  • solve network bandwidth issues by compressing HTTP responses
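As an illustration of the “list instead of set” mistake from the list above (a hypothetical example of my own): membership checks on a `List` are O(n) per lookup, while a `HashSet` does them in O(1) on average, which matters when the lookup sits inside a loop.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Counting how many incoming ids are already known: with a HashSet each
// contains() is O(1) on average, instead of O(n) with a List.
public class MembershipCheck {
    public static long countKnown(Set<Integer> known, List<Integer> incoming) {
        return incoming.stream().filter(known::contains).count();
    }

    public static void main(String[] args) {
        Set<Integer> known = new HashSet<>(List.of(1, 2, 3));
        List<Integer> incoming = List.of(2, 3, 4, 5);
        System.out.println(countKnown(known, incoming)); // 2
    }
}
```

A fix like this can remove the performance problem entirely, with none of the complexity of a cache.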

Knowledge is key here. Always measure where the most time is spent (processing time * number of invocations): that is where you can potentially make the largest improvements. And measure again after making a modification, to make sure it actually made a difference.

Disadvantages of adding a cache

  • adding a lot of complexity
  • dependency on additional frameworks, need for additional configuration
  • (heap) memory usage increase
  • users may see stale data

Ideal areas for caching

  • When data is frequently accessed, but not often updated
  • When it does not hurt if a user sees data that is (slightly) outdated
  • For aggregated data: for example, many rows in the database that reduce to a simple representation on the client side. If your application has layers, cache close to the client.
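When those conditions hold, even a simple cache pays off. Here is a minimal read-through cache sketch with a time-to-live (my own illustration; real frameworks such as Caffeine or Ehcache offer this and much more): entries older than the TTL are reloaded, which bounds how stale the data a user can see ever gets.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read-through cache with a time-to-live: get() returns the cached value
// unless it is missing or older than ttlMillis, in which case it reloads.
public class TtlCache<K, V> {
    private record Entry<V>(V value, long loadedAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long ttlMillis;

    public TtlCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null || System.currentTimeMillis() - e.loadedAt() > ttlMillis) {
            e = new Entry<>(loader.apply(key), System.currentTimeMillis());
            entries.put(key, e); // stale window is at most ttlMillis
        }
        return e.value();
    }
}
```

The TTL is exactly the “acceptable delay for users to see updated data” from the questions above: choosing its value is a functional decision, not a technical one.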

The conclusion is that you should be careful about adding caching to your application. Caching is no silver bullet; only in certain situations is it the right way to go. Do you have performance issues? Can they not be fixed easily by some other approach? Is it acceptable to your users to see outdated information? Then maybe you should give caching a try. In any other case, don’t say I didn’t warn you…