Building your own self-refreshing cache in Java EE

If you have read my previous post about caching, The (non)sense of caching, and have not been discouraged by it, I invite you to build your own cache. In this post we will build a simple cache implementation that refreshes its data automatically, using Java EE features.

Context: A slow resource

Let's describe the situation. We are building a service that uses an external resource with some reference data. The data is not frequently updated, and it's all right to use data that's up to 1 hour old. The external resource is very slow: retrieving the data takes about 30 seconds. Our service needs to respond within 2 seconds, so we obviously can't call the resource each time we need it. To solve our problem we decide to introduce some caching. We are going to retrieve the entire dataset, keep it in memory and allow retrieval of specific cached values by their corresponding keys.

Step 1: Getting started

A very basic implementation of a key-value cache is a (java.util.)Map, so that's where we'll start. One step at a time we will extend this implementation until we have a fully functional cache.

public class MyFirstCache {
    private Map<String, String> cache;
}

Step 2: Populating the cache

We will inject a bean that serves as a facade to our slow external resource. To keep things simple in this example, the bean returns a list of SomeData objects that contain a key and a value.

@Inject
MySlowResource mySlowResource;

private Map<String, String> createFreshCache() {
    Map<String, String> map = new HashMap<>();
    List<SomeData> dataList = mySlowResource.getSomeData();
    for (SomeData someData : dataList) {
        map.put(someData.getKey(), someData.getValue());
    }
    return map;
}
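The types MySlowResource and SomeData are not shown in this post. Below is a minimal sketch of what they could look like, written in plain Java so it can be tried without a container. The shapes of SomeData and MySlowResource, the class name CachePopulationSketch and the constructor that replaces @Inject are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value object: one key-value pair of reference data.
class SomeData {
    private final String key;
    private final String value;

    SomeData(String key, String value) {
        this.key = key;
        this.value = value;
    }

    String getKey() { return key; }

    String getValue() { return value; }
}

// Hypothetical facade to the slow external resource.
interface MySlowResource {
    List<SomeData> getSomeData();
}

// Plain Java stand-in for the bean: the resource is passed in through the
// constructor instead of being injected by the container.
class CachePopulationSketch {
    private final MySlowResource mySlowResource;

    CachePopulationSketch(MySlowResource mySlowResource) {
        this.mySlowResource = mySlowResource;
    }

    // Same logic as createFreshCache above: copy every entry into a new map.
    Map<String, String> createFreshCache() {
        Map<String, String> map = new HashMap<>();
        for (SomeData someData : mySlowResource.getSomeData()) {
            map.put(someData.getKey(), someData.getValue());
        }
        return map;
    }
}
```

Because MySlowResource has a single method in this sketch, a test can replace the slow resource with a lambda that returns a fixed list.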

Step 3: Keeping state between requests

Now that we can populate the cache, we need to keep its state so that future requests can also make use of it. That's where we use the Singleton bean. A singleton session bean is instantiated once per application and exists for the lifecycle of the application; see also the JEE tutorial page about Session Beans. Note that, when we run our application in a clustered environment, each instance of our application will have its own Singleton bean instance.

@Singleton
public class MyFirstCache { }

Step 4: Populating the cache before first use

We can use the @PostConstruct annotation to fill the cache with the reference data when the bean is created. If we want the cache to load at application startup instead of on first access, we use the @Startup annotation.

@Startup
@Singleton
public class MyFirstCache {

    @PostConstruct
    private void populateCache(){
        cache = createFreshCache();
    }
}

Step 5: Accessing the cached data

To make the data available, we create a public method getData, which retrieves the cached value by its key.

public String getData(String key){
    return cache.get(key);
}

Step 6: Refreshing the cache periodically

As the cached data becomes outdated over time, we want to refresh the dataset automatically after a specified time period. JEE offers a solution with automatic timers. See also the JEE tutorial page about the Timer Service. We refresh every 30 minutes, which keeps the data well within the 1 hour limit, and we configure the timer to be non-persistent.

@Schedule(minute = "*/30", hour = "*", persistent = false)
@PostConstruct
private void populateCache(){
    cache = createFreshCache();
}
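To try the refresh behaviour outside a container, the same idea can be approximated in Java SE. The sketch below swaps the EJB timer for a java.util.concurrent.ScheduledExecutorService, so it is an analogue and not the mechanism used in this post. The class name ScheduledRefreshSketch and the loader parameter, which stands in for createFreshCache, are assumptions.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Java SE analogue of the @Schedule-driven refresh; this is not an EJB.
class ScheduledRefreshSketch {
    // Hypothetical loader standing in for createFreshCache().
    private final Supplier<Map<String, String>> loader;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // volatile so that reader threads see the map installed by the scheduler thread.
    private volatile Map<String, String> cache = Collections.emptyMap();

    ScheduledRefreshSketch(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Loads a complete new map and replaces the old one in a single assignment.
    void populateCache() {
        cache = loader.get();
    }

    // Loads once immediately (the role of @PostConstruct), then repeats at a
    // fixed rate (the role of @Schedule).
    void start(long period, TimeUnit unit) {
        populateCache();
        scheduler.scheduleAtFixedRate(this::refreshQuietly, period, period, unit);
    }

    // Stops the background thread so the JVM can exit.
    void stop() {
        scheduler.shutdownNow();
    }

    String getData(String key) {
        return cache.get(key);
    }

    private void refreshQuietly() {
        try {
            populateCache();
        } catch (RuntimeException e) {
            // Keep serving the previous map. An exception escaping from the
            // task would suppress all later runs of scheduleAtFixedRate.
        }
    }
}
```

One difference to keep in mind: the @Schedule expression is calendar based and fires on the hour and the half hour, while scheduleAtFixedRate counts from the moment start is called.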

Step 7: Manage concurrency

Finally, we need to make sure concurrency is handled correctly. In JEE, you can do this with either Container-Managed or Bean-Managed Concurrency. For Singleton Session Beans the default is Container-Managed Concurrency with a Write Lock on each public method. Whenever the bean is called, all subsequent calls will be held until the lock is released. This is safe, even if you are modifying the data, hence the name Write Lock. We can improve on this by allowing concurrent read access on methods that are only reading data, in our case the getData method. To do that we add the @Lock(LockType.READ) annotation. This way, calls to the getData method are only held while a method with a Write Lock is being executed.

@Lock(LockType.READ)
public String getData(String key){
    return cache.get(key);
}

(In our simple case we could get away without any locking at all, because updating the object reference of our instance variable cache in the populateCache method is an atomic operation, provided the field is declared volatile so that other threads are guaranteed to see the new reference. In practice, though, you don't want to depend on implementation details of the populateCache method.) For more information about Container-Managed Concurrency check the JEE tutorial page about Managing Concurrent Access.
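To make the read and write lock semantics more tangible, here is a rough Java SE analogue built on java.util.concurrent.locks.ReentrantReadWriteLock. It resembles what you would write with Bean-Managed Concurrency and is not a description of how the container implements @Lock. The class name ReadWriteLockSketch and the loader parameter are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hand-written analogue of @Lock(READ) and @Lock(WRITE); not container code.
class ReadWriteLockSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Hypothetical loader standing in for createFreshCache().
    private final Supplier<Map<String, String>> loader;
    // Only accessed while holding the read or the write lock.
    private Map<String, String> cache = new HashMap<>();

    ReadWriteLockSketch(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Comparable to @Lock(LockType.READ): any number of threads may read at
    // the same time, as long as no thread holds the write lock.
    String getData(String key) {
        lock.readLock().lock();
        try {
            return cache.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Comparable to @Lock(LockType.WRITE): exclusive access while the
    // reference is replaced.
    void populateCache() {
        // The slow load runs before the lock is taken.
        Map<String, String> fresh = loader.get();
        lock.writeLock().lock();
        try {
            cache = fresh;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

One deliberate difference: the sketch calls the loader before taking the write lock, so readers only wait for the reference swap. With @Lock on a method, the lock covers the whole method call, so it is worth checking how long getData callers may have to wait while populateCache is running.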

Practical use

The example code above is perfectly usable, but there are several things to consider:

In the example we load the entire dataset into memory. This is only feasible if the dataset is not too big, e.g. a list of Currency Conversion Rates.
When you deploy on multiple servers, each of them will have its own cache. Because they will each be refreshing independently, they might not hold the exact same dataset. This might confuse the users of your application.

Conclusion

We have created a simple cache with a minimal amount of code. By making use of built-in Java EE features, a lot of complex tasks are managed by the JEE container, making our job easier.