About Jeroen Resoort

Jeroen Resoort is an Enterprise Java Developer at JDriven. He has over 10 years of experience writing enterprise applications. He is passionate about writing good code, learning new stuff and sharing his knowledge with others.

Running SonarQube on your laptop using Docker

I recently wanted to do some source code analysis and found it difficult to find a good Eclipse plugin. Luckily, it’s now very easy to get your own SonarQube server running. Basically, you only need a Docker installation and a few simple steps.

To start a SonarQube instance you run the following command:
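With Docker installed this is a single command, using the official sonarqube image from Docker Hub (image name and port shown are the defaults at the time of writing):

```shell
# Start a SonarQube container in the background and expose the web UI on port 9000
docker run -d --name sonarqube -p 9000:9000 sonarqube
```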

Starting the SonarQube server will take several minutes. After it has started, you can generate a Sonar report of your Maven application with the following command:
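From the root of your Maven project (assuming the server runs on the default localhost:9000):

```shell
# Analyze the project and push the results to the local SonarQube server
mvn sonar:sonar -Dsonar.host.url=http://localhost:9000
```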

If all goes well, you’ve just created your first report and can access it at http://localhost:9000! Because the local SonarQube server stores every analysis in an internal H2 database, you can even see what has changed since the last run. Have fun!

[image: sonarcube-issues]

It seems that this project needs some attention…

Please note that this setup is not recommended for production use. If you want to know more, check out the SonarQube Docker page.

Building your own self refreshing cache in Java EE

If you have read my previous post about caching, The (non)sense of caching, and have not been discouraged by it, I invite you to build your own cache. In this post we will build a simple cache implementation that refreshes its data automatically, using Java EE features.

Context: A slow resource

Let’s describe the situation. We are building a service that uses an external resource with some reference data. The data is not updated frequently and it’s all right to use data that is up to one hour old. The external resource is very slow: retrieving the data takes about 30 seconds.

Our service needs to respond within 2 seconds. Obviously we can’t call the resource each time we need it. To solve our problem we decide to introduce some caching. We are going to retrieve the entire dataset, keep it in memory and allow retrieval of specific cached values by their corresponding keys.

Step 1: Getting started

A very basic implementation of a key-value cache is a (java.util.)Map, so that’s where we’ll start. One step at a time, we will extend this implementation until we have a fully functional cache.
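A minimal sketch of this starting point (plain Java; the class and method names are illustrative, not from the original post):

```java
import java.util.HashMap;
import java.util.Map;

// At its core, a key-value cache is just a map.
public class DataCache {

    private Map<String, String> cache = new HashMap<>();

    public String getData(String key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        DataCache dataCache = new DataCache();
        // Nothing has been cached yet, so this prints "null"
        System.out.println(dataCache.getData("EUR"));
    }
}
```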

Step 2: Populating the cache

We will inject a bean that serves as a facade to our slow external resource. To keep things simple in this example, the bean returns a list of SomeData objects that contain a key and a value.
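A sketch of the populate step. The facade and its getAllData method are illustrative stand-ins here; in the real bean the facade would be injected by the container (e.g. with @EJB) instead of instantiated directly:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simple data holder with a key and a value
class SomeData {
    private final String key;
    private final String value;

    SomeData(String key, String value) {
        this.key = key;
        this.value = value;
    }

    String getKey() { return key; }
    String getValue() { return value; }
}

// Stand-in for the facade to the slow external resource
class SlowResourceFacade {
    List<SomeData> getAllData() {
        // Stub data; the real implementation would take ~30 seconds
        List<SomeData> data = new ArrayList<>();
        data.add(new SomeData("EUR", "1.00"));
        data.add(new SomeData("USD", "1.09"));
        return data;
    }
}

public class DataCache {

    private SlowResourceFacade slowResourceFacade = new SlowResourceFacade();
    private Map<String, String> cache = new HashMap<>();

    public void populateCache() {
        // Build a fresh map, then swap it in with a single assignment
        Map<String, String> newCache = new HashMap<>();
        for (SomeData someData : slowResourceFacade.getAllData()) {
            newCache.put(someData.getKey(), someData.getValue());
        }
        cache = newCache;
    }

    public String getData(String key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        DataCache dataCache = new DataCache();
        dataCache.populateCache();
        System.out.println(dataCache.getData("EUR")); // prints "1.00"
    }
}
```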

Step 3: Keeping state between requests

Now that we can populate the cache, we need to keep its state so that future requests can also make use of it. That’s where the Singleton bean comes in. A singleton session bean is instantiated once per application and exists for the lifecycle of the application; see also the JEE tutorial page about Session Beans.
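As a sketch, the bean skeleton could look like this (it needs a Java EE container to run; the class and member names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

import javax.ejb.EJB;
import javax.ejb.Singleton;

@Singleton
public class DataCache {

    @EJB
    private SlowResourceFacade slowResourceFacade; // facade to the slow resource

    // One instance per application, so this map survives between requests
    private Map<String, String> cache = new HashMap<>();
}
```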

Note that, when we run our application in a clustered environment, each instance of our application will have its own Singleton bean instance.

Step 4: Populating the cache before first use

We can use the @PostConstruct annotation to fill the cache with the reference data when the bean is created. If we want the cache to load at application startup instead of on first access, we use the @Startup annotation.
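Combined, that could look like the following sketch (the facade method getAllData and the SomeData getters are assumptions from the earlier step):

```java
import javax.annotation.PostConstruct;
import javax.ejb.Singleton;
import javax.ejb.Startup;

@Startup // eager initialization: create the bean at application startup
@Singleton
public class DataCache {

    // ... injected facade and cache map as before ...

    @PostConstruct
    public void populateCache() {
        Map<String, String> newCache = new HashMap<>();
        for (SomeData someData : slowResourceFacade.getAllData()) {
            newCache.put(someData.getKey(), someData.getValue());
        }
        cache = newCache; // swap in the freshly built map
    }
}
```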

Step 5: Accessing the cached data

To make the data available, we create a public method getData that retrieves a cached value by its key.
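A minimal sketch:

```java
public String getData(String key) {
    return cache.get(key);
}
```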

Step 6: Refreshing the cache periodically

As the cached data becomes outdated over time, we want to refresh the dataset automatically after a specified time period. JEE offers a solution with automatic timers; see also the JEE tutorial page about the Timer Service. We configure the timer to be non-persistent.
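For example, an automatic timer that refreshes the cache every hour could look like this sketch (the schedule values are illustrative):

```java
import javax.ejb.Schedule;

// Fires every hour on the hour; persistent = false means the timer is not
// stored by the container and missed timeouts are not replayed after a restart
@Schedule(hour = "*", minute = "0", persistent = false)
public void refreshCache() {
    populateCache();
}
```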

Step 7: Manage concurrency

Finally, we need to make sure concurrency is handled correctly. In JEE, you can do this either container-managed or bean-managed. For Singleton Session Beans the default is Container-Managed Concurrency with a Write Lock on each public method: while the bean is handling a call, all other calls are held until the lock is released. This is safe even if you are modifying the data, hence the name Write Lock.

We can improve on this by allowing concurrent read access on methods that only read data, in our case the getData method. To do that, we add the @Lock(LockType.READ) annotation. This way, calls to the getData method are only held while a method with a Write Lock is being executed.
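Applied to our bean, that could look like this sketch:

```java
import javax.ejb.Lock;
import javax.ejb.LockType;

@Lock(LockType.READ) // concurrent reads are allowed
public String getData(String key) {
    return cache.get(key);
}

@Lock(LockType.WRITE) // the default for singleton beans, shown for clarity
public void populateCache() {
    // ... rebuild the map as before ...
}
```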

(In our simple case we could get away without any locking at all, because updating the object reference of our instance variable cache in the populateCache method is an atomic operation. In practice, though, you don’t want to depend on implementation details of the populateCache method.)

For more information about Container-Managed Concurrency check the JEE tutorial page about Managing Concurrent Access.

Practical use

The example code above is perfectly usable, but there are several things to consider:

  • In the example we load the entire dataset into memory. This is only feasible if the dataset is not too big, e.g. a list of currency conversion rates.
  • When you deploy on multiple servers, each of them will have its own cache. Because they will each be refreshing independently, they might not hold the exact same dataset. This might confuse the users of your application.

Conclusion

We have created a simple cache with a minimal amount of code. By making use of built-in Java EE features, a lot of complex tasks are managed by the JEE container, making our job easier.

Mission to Mars follow up

Last week I presented my talk ‘MISSION TO MARS: EXPLORING NEW WORLDS WITH AWS IOT’ at IoT Tech Day 2016 and it was great fun! In the presentation I showed how to build a small robot and control it over MQTT messaging via Amazon’s AWS IoT platform. The room was packed and the demo went well too.

[image: mission_to_mars_presentation]

I promised to share some info about it on my blog so here we are. I’ve composed a shopping list and a collection of useful links:
Mission to Mars – Shopping list
Mission to Mars – Useful links

The original presentation is available here:
Mission_to_Mars-Jeroen_Resoort-IoT_Tech_Day.pdf

So what’s next? I should publish my Pi robot and Mission Control Center web client code on GitHub. Maybe I’ll extend the Python code for controlling the mBot over a serial connection and make a proper library for it. I’ll keep you updated…

The (non)sense of caching

I have seen several projects where the developers had implemented caching all over the place. Caches were causing a large increase of heap usage, and users were always complaining that they were not seeing the latest data.

My opinion on this is that a decision to add caching should not be taken lightly. Adding a cache means adding a lot of additional (or so-called accidental) complexity and also has a functional impact on the users.

Adding a cache raises a lot of questions that need to be answered:

  • What if cached data is updated, should the cached record be updated or evicted too?
  • What should we do in a distributed environment, use a distributed cache? Is this distributed cache scalable?
  • Do we get the performance improvements we’re expecting?
  • What is an acceptable delay for users to see the updated data?
  • How many elements should we store in the cache?
  • What eviction policy do we need when not all data fits in the cache?

Why would you want to add a cache?

  • To improve response times
  • To reduce server load and/or increase maximum server throughput

Alternatives to caching

So you want to improve performance? Caching is not your only option here. There are a lot of alternatives to caching when it comes to performance improvements, across all layers of your system. Some examples:

  • add database indices
  • optimize queries
  • use a more efficient architecture for reads: skip the ORM and use fast lane reader data access, or use CQRS
  • improve the code in the areas that are executed most often
  • fix obvious inefficiency mistakes (using a list instead of a set or map, for example)
  • fix ‘N+1 queries to the database’ problem
  • solve network bandwidth issues by compressing http responses
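As an illustration of the list-versus-set point above: membership checks on a List scan every element, while a HashSet does a hash lookup, which makes a large difference in hot code paths (a small standalone demo, not from the original post):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
            set.add(i);
        }

        long t0 = System.nanoTime();
        boolean inList = list.contains(n - 1); // O(n): scans the whole list
        long t1 = System.nanoTime();
        boolean inSet = set.contains(n - 1);   // O(1): single hash lookup
        long t2 = System.nanoTime();

        System.out.printf("list: %b in %d us, set: %b in %d us%n",
                inList, (t1 - t0) / 1000, inSet, (t2 - t1) / 1000);
    }
}
```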

Knowledge is key here. Always measure where the most time is spent (processing time * number of invocations). That is where you can potentially make the largest improvements. And measure again after making a modification to make sure it makes a difference.

Disadvantages of adding a cache

  • adding a lot of complexity
  • dependency on additional frameworks, need for additional configuration
  • (heap) memory usage increase
  • users may see stale data

Ideal areas for caching

  • When data is frequently accessed, but not often updated
  • When it does not hurt if a user sees data that is (slightly) outdated
  • In case of aggregated data, for example a lot of rows in the database, but simple representation on the client side. If your application has layers, cache close to the client.

The conclusion is that you should be careful when adding caching to your application. Caching is no silver bullet; only in certain situations is it the right way to go. Are you having performance issues? Can’t they be fixed more easily by some other approach? Is it acceptable to users that they see outdated information? Then maybe you should give caching a try… In any other case, don’t say I didn’t warn you…