Use jsoniter for easy and fast JSON parsing in Scala
Although not as popular as json4s, circe, or spray, jsoniter’s simplicity and impressive performance benchmarks caught my team’s attention. How easy was it to integrate into an existing codebase? And was it really going to deliver on its promises in real projects? It sounded almost too good to be true, so we decided to give it a go!
"What if we just switched the library we used to parse JSON?"
We asked ourselves this question during a team brainstorm, looking for quick performance wins.
In our team, we manage several streaming pipelines that handle tens of millions of JSON messages daily. Given this high throughput, we’re always looking for ways to reduce latency and improve efficiency. So, we asked ourselves: Could switching to Jsoniter be one of those low-effort, high-reward moves?
Our backfill processing, in particular, was where we hoped to see the biggest improvements. When replaying and reprocessing large amounts of historical data, the efficiency of JSON parsing can significantly impact the overall processing time.
Code Example: Integrating Jsoniter in Scala
Getting started couldn’t be easier. Here’s what you need to do:
- Add the dependencies:

    libraryDependencies ++= Seq(
      // Replace "jsoniterVersion" with the release you want to use
      "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core" % "jsoniterVersion",
      // The "compile-internal" scope is also supported
      "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % "jsoniterVersion" % "provided"
    )
- Create a case class to represent the JSON structure:

    case class Person(name: String, age: Int, address: Address)
    case class Address(street: String, number: Int)
- Let Jsoniter derive a codec for you and parse the JSON input into your case class:

    import com.github.plokhotnyuk.jsoniter_scala.macros._
    import com.github.plokhotnyuk.jsoniter_scala.core._

    // There is no need to derive intermediate codecs for the inner (nested) classes.
    given personCodec: JsonValueCodec[Person] = JsonCodecMaker.make

    val jsonString = """{"name":"Lucas","age":32,"address":{"street":"my-street","number":1}}"""

    // The explicit type parameter tells the reader which codec to use
    val person = readFromString[Person](jsonString)

    // -- BONUS -- The same codec can be used to encode the case class back into a JSON string
    val backToJson = writeToString(person)
That’s it! You have successfully decoded - and encoded ;) - a JSON string into a case class.
Good to know: Jsoniter also provides methods to read or stream JSON directly from disk or network without transforming it into a string first.
- Supported input types include: Array[Byte], java.nio.ByteBuffer, and java.io.InputStream / java.io.FileInputStream.
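As a rough illustration of those input types, the sketch below decodes the same payload from a byte array, a ByteBuffer, and an InputStream using `readFromArray`, `readFromByteBuffer`, and `readFromStream` from the core package. The `inputTypesDemo` entry point and the in-memory stream are assumptions made for the example; in a real pipeline the stream would typically come from a file or a network connection.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import java.io.ByteArrayInputStream
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

case class Address(street: String, number: Int)
case class Person(name: String, age: Int, address: Address)

given personCodec: JsonValueCodec[Person] = JsonCodecMaker.make

// Hypothetical entry point, only here to make the sketch runnable
@main def inputTypesDemo(): Unit = {
  val bytes =
    """{"name":"Lucas","age":32,"address":{"street":"my-street","number":1}}""".getBytes(UTF_8)

  // Decode straight from bytes, without building an intermediate String
  val fromArray = readFromArray[Person](bytes)
  val fromBuffer = readFromByteBuffer[Person](ByteBuffer.wrap(bytes))
  // An in-memory stream stands in for a FileInputStream or a socket stream
  val fromStream = readFromStream[Person](new ByteArrayInputStream(bytes))

  println(fromArray)
  println(fromArray == fromBuffer && fromBuffer == fromStream)
}
```

The same codec is reused for all three calls; only the reading function changes with the input type.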
Putting Jsoniter to the Test: Our Results
Performance Gains
After implementing Jsoniter, we were impressed by the results:
- 40% to 60% reduction in deserialization time, depending on the size of the JSON messages. The larger the JSON, the greater the improvement.
- Deserializing a year’s worth of JSON data dropped from ~18.5 hours to ~7.4 hours in one of our busiest pipelines, cutting the total backfilling time by 10%.
We did need to write some custom codecs to work around bugs and inconsistencies in the JSON messages data model (we don’t control the source of the JSONs). This likely incurred a slight performance penalty.
Unexpected Data Quality Bonus
An unexpected bonus came up when Jsoniter flagged parsing errors we hadn’t noticed before. Our previous library had sensible configuration defaults, but would sometimes silently ignore or fail to parse certain properties that didn’t match the expected format.
Jsoniter, by default, is very strict. If the JSON input doesn’t exactly match the expected case class, it fails fast and loudly. Of course, you can configure it to be more lenient, but you have to do so explicitly.
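To make the fail-fast behaviour concrete, here is a minimal sketch that reuses the Person example from earlier. The two malformed payloads and the `strictDemo` entry point are made up for illustration; they are not taken from our pipelines. Both inputs should be rejected with a `JsonReaderException` rather than being silently accepted.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._

case class Address(street: String, number: Int)
case class Person(name: String, age: Int, address: Address)

given personCodec: JsonValueCodec[Person] = JsonCodecMaker.make

// Hypothetical entry point, only here to make the sketch runnable
@main def strictDemo(): Unit = {
  // "age" is a JSON string here, but Person.age is an Int
  val wrongType =
    """{"name":"Lucas","age":"32","address":{"street":"my-street","number":1}}"""
  // The required "address" field is missing entirely
  val missingField = """{"name":"Lucas","age":32}"""

  for (json <- Seq(wrongType, missingField)) {
    try {
      val person = readFromString[Person](json)
      println(s"parsed: $person")
    } catch {
      case e: JsonReaderException =>
        // Keep only the first line of the message and drop the hex dump
        val firstLine = e.getMessage.takeWhile(_ != '\n')
        println(s"rejected: $firstLine")
    }
  }
}
```

Catching the exception at the edge of the pipeline is one way to route bad messages to a dead-letter store instead of dropping properties unnoticed.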
Points of Attention
Overall, we’re very positive about Jsoniter. However, there are a couple of areas that could be improved:
Error Messages
The error messages are not very user-friendly. They mostly consist of hexadecimal dumps indicating where unexpected tokens were found. While this is a deliberate choice by the library authors to prioritize performance and safety, it can be intimidating, especially for new users.
expected digit, offset: 0x0000004d, buf:
+----------+-------------------------------------------------+------------------+
| | 0 1 2 3 4 5 6 7 8 9 a b c d e f | 0123456789abcdef |
+----------+-------------------------------------------------+------------------+
| 00000020 | 69 22 3a 22 32 30 30 37 2d 31 32 2d 30 33 54 31 | i":"2007-12-03T1 |
| 00000030 | 30 3a 31 35 3a 33 30 2e 30 30 31 5a 22 2c 22 6c | 0:15:30.001Z","l |
| 00000040 | 64 22 3a 22 32 30 30 37 2d 31 32 2d 33 22 2c 22 | d":"2007-12-3"," |
| 00000050 | 6c 64 74 22 3a 22 32 30 30 37 2d 31 32 2d 30 33 | ldt":"2007-12-03 |
| 00000060 | 54 31 30 3a 31 35 3a 33 30 22 2c 22 6c 74 22 3a | T10:15:30","lt": |
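If the dump is more noise than help, for example in application logs, the reader can be configured to leave it out. The sketch below assumes the `withAppendHexDumpToParseException` option on `ReaderConfig`; the `Event` case class and the `shortErrorsDemo` entry point are hypothetical and only mimic the date field from the dump above.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import java.time.LocalDate
import scala.util.Try

// Hypothetical model with a single date field, similar to "ld" in the dump above
case class Event(ld: LocalDate)

given eventCodec: JsonValueCodec[Event] = JsonCodecMaker.make

// Hypothetical entry point, only here to make the sketch runnable
@main def shortErrorsDemo(): Unit = {
  // The day is written as "3" instead of "03", so parsing fails
  val badJson = """{"ld":"2007-12-3"}"""

  // Reader configuration that omits the hex dump from the exception message
  val withoutDump = ReaderConfig.withAppendHexDumpToParseException(false)

  val result = Try(readFromString[Event](badJson, withoutDump))
  result.failed.foreach(e => println(e.getMessage))
}
```

The message still carries the error description and the byte offset, which is usually enough to locate the offending value once you have the raw payload at hand.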
Writing Custom Codecs
There’s no documentation on writing custom codecs. We had to dig through the source code, unit tests, and GitHub issues to figure it out. This isn’t ideal and could definitely be improved.
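To give an idea of what a hand-written codec looks like, here is a minimal sketch of a `JsonValueCodec[Int]` that accepts a number written either as `32` or as `"32"`. It is an illustration of the general shape, not one of the codecs we actually wrote; the name `lenientIntCodec` and the entry point are made up for the example.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._

// Hypothetical codec: reads an Int from a JSON number (32) or a JSON string ("32")
given lenientIntCodec: JsonValueCodec[Int] = new JsonValueCodec[Int] {
  def decodeValue(in: JsonReader, default: Int): Int = {
    // Peek at the next non-whitespace token, then step back so the read starts from it
    val quoted = in.isNextToken('"')
    in.rollbackToken()
    if (quoted) in.readStringAsInt() else in.readInt()
  }

  // Always write a plain JSON number
  def encodeValue(x: Int, out: JsonWriter): Unit = out.writeVal(x)

  def nullValue: Int = 0
}

// Hypothetical entry point, only here to make the sketch runnable
@main def customCodecDemo(): Unit = {
  println(readFromString[Int]("32"))
  println(readFromString[Int]("\"32\""))
  println(writeToString(32))
}
```

A codec implements three members: `decodeValue` reads from a `JsonReader`, `encodeValue` writes to a `JsonWriter`, and `nullValue` supplies the default. Derived codecs are meant to pick up such codecs from implicit scope for matching field types, but check that against the version you use.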
Give Jsoniter a Try
If you’re looking for performance gains, you might be surprised by how much Jsoniter can deliver. We definitely were!