- A side by side example using JMH and JLBH for Date serialisation
- Measuring Date serialisation in a microbenchmark
- Measuring Date serialisation as part of a proper application
- How to add a probe to your JLBH benchmark
- Understanding the importance of measuring code in context
In the last post, 'Introducing JLBH', we introduced JLBH, the latency testing tool that Chronicle used to test Chronicle-FIX and which is now available as open source.
In the next few posts we are going to look at some example applications:
All the code for the examples can be found here in my GitHub project:
One of the points I made in the introduction to JLBH was that it is important to benchmark code in context. That means benchmarking code in an environment as close to how it will be running in real life as possible. This post demonstrates this in practice.
Let's look at a relatively expensive Java operation - Date Serialisation - and see how long it takes:
First here's a JMH benchmark:
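The original JMH listing is not included in this extract; a plausible reconstruction, using the `DateSerialiseJMH.test` name that appears in the results below, might look like this (the exact serialisation code in the GitHub project may differ — here I assume a plain Java round trip through `ObjectOutputStream`):

```java
import java.io.*;
import java.util.Date;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Thread)
public class DateSerialiseJMH {
    private final Date date = new Date();

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)               // sample mode, as in the output below
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public Date test() throws Exception {
        // Serialise the Date to bytes and read it straight back
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(date);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (Date) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(DateSerialiseJMH.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}
```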
Running on my laptop (MBP i7) these are the results I get:
Result "test":
4.578 ±(99.9%) 0.046 us/op [Average]
(min, avg, max) = (3.664, 4.578, 975.872), stdev = 6.320
CI (99.9%): [4.533, 4.624] (assumes normal distribution)
Samples, N = 206803
mean = 4.578 ±(99.9%) 0.046 us/op
min = 3.664 us/op
p( 0.0000) = 3.664 us/op
p(50.0000) = 4.096 us/op
p(90.0000) = 5.608 us/op
p(95.0000) = 5.776 us/op
p(99.0000) = 8.432 us/op
p(99.9000) = 24.742 us/op
p(99.9900) = 113.362 us/op
p(99.9990) = 847.245 us/op
p(99.9999) = 975.872 us/op
max = 975.872 us/op
# Run complete. Total time: 00:00:21
Benchmark Mode Cnt Score Error Units
DateSerialiseJMH.test sample 206803 4.578 ± 0.046 us/op
A mean time of ~4.5us for the operation.
We get pretty much the same results running the test with JLBH:
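As a sketch, the JLBH version of the benchmark could look like the following. The option values mirror the output below (400,000 warm-up iterations, 1,000,000 samples per run, 100,000/s target throughput, 3 runs, coordinated-omission correction on); the API names are assumed from the JLBH classes published in Chronicle-Core (`net.openhft.chronicle.core.jlbh`), and the serialisation body is the same illustrative round trip as in the JMH case:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Date;

import net.openhft.chronicle.core.jlbh.JLBH;
import net.openhft.chronicle.core.jlbh.JLBHOptions;
import net.openhft.chronicle.core.jlbh.JLBHTask;

public class DateSerialiseJLBH implements JLBHTask {
    private JLBH jlbh;
    private final Date date = new Date();

    public static void main(String[] args) {
        JLBHOptions options = new JLBHOptions()
                .warmUpIterations(400_000)   // matches "Warm up complete (400000 iterations ...)"
                .iterations(1_000_000)       // samples per run, as in the results above
                .throughput(100_000)         // 1 message every 10us
                .runs(3)
                .recordOSJitter(true)
                .accountForCoordinatedOmission(true)
                .jlbhTask(new DateSerialiseJLBH());
        new JLBH(options).start();
    }

    @Override
    public void init(JLBH jlbh) {
        this.jlbh = jlbh;
    }

    @Override
    public void run(long startTimeNS) {
        serialiseDate();
        jlbh.sample(System.nanoTime() - startTimeNS);  // end-to-end latency
    }

    Date serialiseDate() {
        // The operation under test: serialise the Date to a byte stream
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(date);
            return date;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note how JLBH drives the task at a fixed target throughput rather than back to back, which is what the "1 message every 10us" line in the output refers to.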
These are the results:
Warm up complete (400000 iterations took 2.934s)
-------------------------------- BENCHMARK RESULTS (RUN 1) ---------
Run time: 10.0s
Correcting for co-ordinated:true
Target throughput:100000/s = 1 message every 10us
End to End: (1,000,000) 50/90 99/99.9 99.99/99.999 - worst was 4.2 / 5.8 352 / 672 803 / 901 - 934
OS Jitter (13,939) 50/90 99/99.9 99.99 - worst was 8.4 / 17 639 / 4,130 12,850 - 20,450
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 2) ---------
Run time: 10.0s
Correcting for co-ordinated:true
Target throughput:100000/s = 1 message every 10us
End to End: (1,000,000) 50/90 99/99.9 99.99/99.999 - worst was 4.2 / 5.8 434 / 705 836 / 934 - 967
OS Jitter (11,016) 50/90 99/99.9 99.99 - worst was 8.4 / 17 606 / 770 868 - 1,340
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 3) ---------
Run time: 10.0s
Correcting for co-ordinated:true
Target throughput:100000/s = 1 message every 10us
End to End: (1,000,000) 50/90 99/99.9 99.99/99.999 - worst was 4.2 / 5.8 434 / 737 901 / 999 - 1,030
OS Jitter (12,319) 50/90 99/99.9 99.99 - worst was 8.4 / 15 573 / 737 803 - 901
-------------------------------- SUMMARY (end to end)---------------
Percentile run1 run2 run3 % Variation
50: 4.22 4.22 4.22 0.00
90: 5.76 5.76 5.76 0.00
99: 352.26 434.18 434.18 0.00
99.9: 671.74 704.51 737.28 3.01
99.99: 802.82 835.58 901.12 4.97
worst: 901.12 933.89 999.42 4.47
--------------------------------------------------------------------
A typical (50th percentile) time of ~4.2us for the operation.
Note: this is a case where there is no advantage in using JLBH over JMH; the code is included just as a comparison.
Now we're going to run exactly the same operation, but inside a TCP call. The code works like this:
- The client sends the server a FIX message over TCP loopback (localhost)
- The server reads the message
- The server performs the date serialisation
- The server returns a message to the client
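This is also where a probe comes in: to get a separate latency distribution for just the date serialisation step, the task feeds extra timings into a sampler obtained from `jlbh.addProbe(...)`. A compressed sketch (the TCP plumbing is elided as comments to keep it short, and the `NanoSampler` API is assumed from Chronicle-Core's JLBH; the "date serialisation" probe name matches the extra section in the results below):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Date;

import net.openhft.chronicle.core.jlbh.JLBH;
import net.openhft.chronicle.core.jlbh.JLBHTask;
import net.openhft.chronicle.core.jlbh.NanoSampler;

public class DateSerialiseViaTCP implements JLBHTask {
    private JLBH jlbh;
    private NanoSampler dateProbe;

    @Override
    public void init(JLBH jlbh) {
        this.jlbh = jlbh;
        // Timings fed to this sampler get their own "date serialisation"
        // section in the benchmark results, alongside "End to End"
        this.dateProbe = jlbh.addProbe("date serialisation");
        // ... start the echo server thread and connect the client socket here ...
    }

    @Override
    public void run(long startTimeNS) {
        // 1. client writes the fix message to the socket
        // 2. server reads the message, then times just the serialisation step:
        long beforeSerialise = System.nanoTime();
        serialiseDate();
        dateProbe.sampleNanos(System.nanoTime() - beforeSerialise);
        // 3. server writes the reply; 4. client reads it back
        jlbh.sample(System.nanoTime() - startTimeNS);  // full round trip
    }

    private void serialiseDate() {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(new Date());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```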
This time the results look like this:
Warm up complete (50000 iterations took 3.83s)
-------------------------------- BENCHMARK RESULTS (RUN 1) ------------------------
Run time: 6.712s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (100,000) 50/90 99/99.9 99.99 - worst was 822,080 / 1,509,950 1,711,280 / 1,711,280 1,711,280 - 1,711,280
date serialisation (100,000) 50/90 99/99.9 99.99 - worst was 11 / 19 31 / 50 901 - 2,420
OS Jitter (64,973) 50/90 99/99.9 99.99 - worst was 8.1 / 16 40 / 1,540 4,850 - 18,350
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 2) ---------
Run time: 6.373s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (100,000) 50/90 99/99.9 99.99 - worst was 1,107,300 / 1,375,730 1,375,730 / 1,375,730 1,375,730 - 1,375,730
date serialisation (100,000) 50/90 99/99.9 99.99 - worst was 11 / 19 29 / 52 901 - 1,670
OS Jitter (40,677) 50/90 99/99.9 99.99 - worst was 8.4 / 16 34 / 209 934 - 1,470
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 3) ---------
Run time: 5.333s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (100,000) 50/90 99/99.9 99.99 - worst was 55,570 / 293,600 343,930 / 343,930 343,930 - 343,930
date serialisation (100,000) 50/90 99/99.9 99.99 - worst was 9.0 / 16 26 / 38 770 - 1,030
OS Jitter (32,042) 50/90 99/99.9 99.99 - worst was 9.0 / 13 22 / 58 737 - 934
--------------------------------------------------------------------
-------------------------------- SUMMARY (end to end)---------------
Percentile run1 run2 run3 % Variation
50: 822083.58 1107296.26 55574.53 92.66
90: 1509949.44 1375731.71 293601.28 71.07
99: 1711276.03 1375731.71 343932.93 66.67
99.9: 1711276.03 1375731.71 343932.93 66.67
99.99: 1711276.03 1375731.71 343932.93 66.67
worst: 1711276.03 1375731.71 343932.93 66.67
--------------------------------------------------------------------
-------------------------------- SUMMARY (date serialisation )------
Percentile run1 run2 run3 % Variation
50: 11.01 11.01 8.96 13.22
90: 18.94 18.94 15.62 12.44
99: 31.23 29.18 26.11 7.27
99.9: 50.18 52.22 37.89 20.14
99.99: 901.12 901.12 770.05 10.19
worst: 2424.83 1671.17 1032.19 29.21
--------------------------------------------------------------------
As can be seen, the very same Date serialisation takes over twice as long, rising from ~4.5us to ~10us.
This isn't really the place to go into detail about why code takes longer to run in context, but in short it comes down to the CPU caches being filled up in between calls to the date serialisation.
When the Date serialisation is all we are running (as in the microbenchmark), its code and data fit nicely into the CPU caches and never need to be evicted. When other work happens between calls, however, the serialisation's code and data get evicted from the caches and have to be reloaded.
JLBH allows you to benchmark code in context, and that's an important part of latency benchmarking.
