Use Jsoniter for easy and fast JSON parsing in Scala

Although Jsoniter is not as popular as json4s, circe, or spray, its simplicity and impressive performance benchmarks caught my team’s attention. How easy would it be to integrate into an existing codebase? And would it really deliver on its promises in real projects? It sounded almost too good to be true, so we decided to give it a go!

"What if we just switched the library we used to parse JSON?"

We asked ourselves this question during a team brainstorm, looking for quick performance wins.

Our team manages several streaming pipelines that handle tens of millions of JSON messages daily. Given this high throughput, we’re always looking for ways to reduce latency and improve efficiency. Could switching to Jsoniter be one of those low-effort, high-reward moves?

Our backfill processing, in particular, was where we hoped to see the biggest improvements. When replaying and reprocessing large amounts of historical data, the efficiency of JSON parsing can significantly impact the overall processing time.

Code Example: Integrating Jsoniter in Scala

Getting started couldn’t be easier. Here’s what you need to do:

Add the dependencies:

// jsoniterVersion is a val in your build that holds the jsoniter-scala release you want to use
libraryDependencies ++= Seq(
  "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core"   % jsoniterVersion,
  // The "compile-internal" scope is also supported
  "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % jsoniterVersion % "provided"
)

Create a case class to represent the JSON structure:

case class Person(name: String, age: Int, address: Address)
case class Address(street: String, number: Int)

Let Jsoniter derive a codec for you and parse the JSON input into your case class:

import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._

// There is no need to derive intermediate codecs for the inner (nested) classes.
given personCodec: JsonValueCodec[Person] = JsonCodecMaker.make

val jsonString = """{"name":"Lucas","age":32,"address":{"street":"my-street","number":1}}"""

// Name the target type explicitly so the right codec is picked up
val person = readFromString[Person](jsonString)

// -- BONUS -- The same codec can be used to encode the case class back into a JSON string
val backToJson = writeToString(person)

That’s it! You have successfully decoded a JSON string into a case class - and encoded it back again ;)

Good to know: Jsoniter also provides methods to read or stream JSON directly from disk or network without transforming it into a string first.

Supported input types include: Array[Byte], java.nio.ByteBuffer, and java.io.InputStream/java.io.FileInputStream.
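As a minimal sketch of those alternative inputs, the snippet below parses the same payload from a byte array, a ByteBuffer, and an InputStream. The in-memory bytes are only a stand-in for data that would normally come from a file or the network, and the Person and Address classes are repeated so the snippet is self-contained.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import java.io.ByteArrayInputStream
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

case class Address(street: String, number: Int)
case class Person(name: String, age: Int, address: Address)

given personCodec: JsonValueCodec[Person] = JsonCodecMaker.make

// Stand-in for bytes read from disk, a socket, or a message broker
val bytes: Array[Byte] =
  """{"name":"Lucas","age":32,"address":{"street":"my-street","number":1}}""".getBytes(UTF_8)

// Each call parses the raw bytes directly, without building an intermediate String
val fromArray  = readFromArray[Person](bytes)
val fromBuffer = readFromByteBuffer[Person](ByteBuffer.wrap(bytes))
val fromStream = readFromStream[Person](new ByteArrayInputStream(bytes))
```

For a real file, a java.io.FileInputStream can be passed to readFromStream in the same way.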

Putting Jsoniter to the Test: Our Results

Performance Gains

After implementing Jsoniter, we were impressed by the results:

40% to 60% reduction in deserialization time, depending on the size of the JSON messages. The larger the JSON, the greater the improvement.
Deserializing a year’s worth of JSON data dropped from ~18.5 hours to ~7.4 hours in one of our busiest pipelines, which cut the total backfilling time by about 10%.

We did need to write some custom codecs to work around bugs and inconsistencies in the data model of the JSON messages (we don’t control their source). This likely incurred a slight performance penalty.

Unexpected Data Quality Bonus

An unexpected bonus came up when Jsoniter flagged parsing errors we hadn’t noticed before. Our previous library had sensible configuration defaults, but would sometimes silently ignore or fail to parse certain properties that didn’t match the expected format.

Jsoniter, by default, is very strict. If the JSON input doesn’t exactly match the expected case class, it fails fast and loudly. Of course, you can configure it to be more lenient, but you have to do so explicitly.
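To make the fail-fast behaviour concrete, here is a small sketch with two hypothetical case classes (StrictPerson and LenientPerson are illustration names, not from our codebase). A missing required field makes the parse fail, while explicitly modelling the field as an Option is one way to opt into leniency.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import scala.util.Try

// Hypothetical models: the strict one requires "age", the lenient one makes it optional
case class StrictPerson(name: String, age: Int)
case class LenientPerson(name: String, age: Option[Int])

given strictCodec: JsonValueCodec[StrictPerson] = JsonCodecMaker.make
given lenientCodec: JsonValueCodec[LenientPerson] = JsonCodecMaker.make

// The "age" field is missing from this input
val incomplete = """{"name":"Lucas"}"""

// Fails with a JsonReaderException because a required field is absent
val strictResult = Try(readFromString[StrictPerson](incomplete))

// Succeeds: the missing optional field becomes None
val lenientResult = Try(readFromString[LenientPerson](incomplete))
```

Wrapping the call in Try is just one way to surface the failure; in a pipeline you might prefer to route the failing message to a dead-letter queue instead.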

Points of Attention

Overall, we’re very positive about Jsoniter. However, there are a couple of areas that could be improved:

Error Messages

The error messages are not very user-friendly. They mostly consist of hexadecimal dumps indicating where unexpected tokens were found. While this is a deliberate choice by the library authors to prioritize performance and safety, it can be intimidating, especially for new users.

expected digit, offset: 0x0000004d, buf:
+----------+-------------------------------------------------+------------------+
|          |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f | 0123456789abcdef |
+----------+-------------------------------------------------+------------------+
| 00000020 | 69 22 3a 22 32 30 30 37 2d 31 32 2d 30 33 54 31 | i":"2007-12-03T1 |
| 00000030 | 30 3a 31 35 3a 33 30 2e 30 30 31 5a 22 2c 22 6c | 0:15:30.001Z","l |
| 00000040 | 64 22 3a 22 32 30 30 37 2d 31 32 2d 33 22 2c 22 | d":"2007-12-3"," |
| 00000050 | 6c 64 74 22 3a 22 32 30 30 37 2d 31 32 2d 30 33 | ldt":"2007-12-03 |
| 00000060 | 54 31 30 3a 31 35 3a 33 30 22 2c 22 6c 74 22 3a | T10:15:30","lt": |
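If the hex dump is more noise than help, for example in log lines, it can be switched off per call through ReaderConfig. The sketch below assumes the withAppendHexDumpToParseException option is available in the version you use; the Event class and the broken input are made up for illustration.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._

// Hypothetical message type used only for this example
case class Event(id: Int)

given eventCodec: JsonValueCodec[Event] = JsonCodecMaker.make

// Reader configuration that leaves the hex dump out of parse error messages
val compactErrors = ReaderConfig.withAppendHexDumpToParseException(false)

// "id" is a string here, so parsing fails; we keep only the error message
val message: String =
  try {
    readFromString[Event]("""{"id":"oops"}""", compactErrors)
    "parsed"
  } catch {
    case e: JsonReaderException => e.getMessage
  }
```

The resulting message still reports what went wrong and at which offset, just without the table of bytes.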

Writing Custom Codecs

There’s no documentation on writing custom codecs. We had to dig through the source code, unit tests, and GitHub issues to figure it out. This isn’t ideal and could definitely be improved.
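To give an idea of what a custom codec looks like, here is a minimal sketch of a hand-written JsonValueCodec. It handles a hypothetical inconsistency (not necessarily one we hit ourselves): an integer that sometimes arrives as a number and sometimes as a quoted string. The name lenientIntCodec is ours; the three methods are the ones the JsonValueCodec trait requires.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._

// Hypothetical codec: accepts both 42 and "42" as an Int, always writes a plain number
given lenientIntCodec: JsonValueCodec[Int] = new JsonValueCodec[Int] {
  // Value used by the library when it needs a "null" placeholder for this type
  def nullValue: Int = 0

  def decodeValue(in: JsonReader, default: Int): Int = {
    // Peek at the next token, then step back so the read methods see the full value
    val token = in.nextToken()
    in.rollbackToken()
    if (token == '"') in.readStringAsInt() // quoted number such as "42"
    else in.readInt()                      // plain number such as 42
  }

  def encodeValue(x: Int, out: JsonWriter): Unit = out.writeVal(x)
}

val fromNumber = readFromString[Int]("42")
val fromString = readFromString[Int]("\"42\"")
val written    = writeToString(42)
```

The codec is used on its own here to keep the example small; how it is combined with macro-derived codecs for larger case classes is not shown.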

Give Jsoniter a Try

If you’re looking for performance gains, you might be surprised by how much Jsoniter can deliver. We definitely were!