Java Flight Recorder (JFR) is a Java profiler which lets you investigate the runtime characteristics of your code. Typically you will use a profiler to determine which parts of your code are causing large amounts of memory allocation or consuming excess CPU.
There are plenty of products out there. In the past I've used YourKit, OptimizeIt, JProfiler, NetBeans and others. Each has its benefits and it is largely a matter of personal preference as to which you choose. My current personal favourite is YourKit. It integrates nicely into IntelliJ, has a relatively low overhead and presents its reports well.
The truth is that profiling is a very inexact science and it is often worth looking at more than one profiler to build up a clearer picture of what exactly is going on in your program. To my knowledge most of the profilers rely on the JVMPI/JVMTI agents to probe the Java program. A major problem with this is safe points: your Java program can only be probed when it is at a safe point. This means that you will get a false picture of what is really going on in your program, especially if much of the activity is between safe points. Also, all profilers add overhead to a varying degree. Profiler overhead will change the characteristics of your program and may cause misleading results from your analysis. Much more information here.
Enter JFR. JFR has been bundled with the JDK since release 7u40. JFR is built with direct access to the JVM. This means not only that there is a very low overhead (claimed to be less than 1% in nearly all cases) but also that it does not rely on safe points. Have a look here at an example of how radically different an analysis from YourKit and JFR can look.
To run JFR you need to add these switches to your Java command line:
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder
JFR is located in Java Mission Control (JMC). To launch JMC just type jmc in your command line and, if you have the JDK in your path, the JMC console will launch. You should see your Java program in the left-hand pane. Right-click on your program and then start a flight recording.
You will be presented with a dialog box where you can just accept the defaults (sample for a minute) and then your results will be displayed. It's worth playing around with the options to find how this will work best for you. As with all good products, this GUI is fairly intuitive.
As you can tell from the command line switches, it is a commercial feature. I'm not exactly sure what that means but you can read more about that in the documentation here. You can also run this from the command line; it's all in the documentation.
One problem I did find was that when I downloaded the latest Java 8 snapshot (at this time 1.8.0_40-ea) I was unable to launch my program and got the following message:
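As a rough sketch of the command-line route, the small program below is a hypothetical workload (the class name AllocationWorkload and its sizes are made up for illustration) that allocates enough to give a recording something to show. The launch line in the comment assumes an Oracle JDK 7u40 or 8 build where the commercial features are available; the -XX:StartFlightRecording switch with duration and filename asks the JVM to record for a minute and write the result to a file, which can then be opened in JMC.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical workload to record; the class name and sizes are illustrative only.
// Assumed launch line (Oracle JDK 7u40+ / 8 with commercial features available):
//   java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
//        -XX:StartFlightRecording=duration=60s,filename=recording.jfr AllocationWorkload
class AllocationWorkload {

    // Builds and discards many short-lived strings so the recording has allocations to show.
    static long churn(int iterations) {
        long totalLength = 0;
        for (int i = 0; i < iterations; i++) {
            List<String> batch = new ArrayList<>();
            for (int j = 0; j < 100; j++) {
                batch.add("item-" + j);
            }
            for (String s : batch) {
                totalLength += s.length();
            }
        }
        return totalLength;
    }

    public static void main(String[] args) {
        // Keep running slightly longer than the assumed 60 second recording window.
        long deadline = System.currentTimeMillis() + 70_000;
        long total = 0;
        while (System.currentTimeMillis() < deadline) {
            total += churn(1_000);
        }
        System.out.println(total);
    }
}
```

The file name recording.jfr is just a placeholder; any writable path should do.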
/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/
Error: Trying to use 'UnlockCommercialFeatures', but commercial features are not available in this VM.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
In summary, JFR is a great addition to any developer's toolkit and, as long as you are using JDK release 7u40 or above, it's certainly worth trying it out on your code.
(I encourage you to have a look at a previous post First rule of performance optimisation in conjunction with JFR)
Friday, 16 January 2015
Friday, 9 January 2015
First rule of performance optimisation
Let's start with a system with no obvious performance bottlenecks. By that I mean that there are no glaring algorithmic problems which are grinding your system to a halt, for example a tight loop which reads a property from a file without caching the result.
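For reference, the kind of glaring problem being set aside here is usually cured by loading the value once and reusing it. The sketch below is only an illustration, not code from any particular system: CachedValue is a hypothetical helper, and the loader stands in for whatever reads the property from the file.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs the (expensive) loader once and reuses the result.
// Not thread-safe; a sketch for single-threaded use only.
class CachedValue<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    CachedValue(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            value = loader.get(); // the file read would happen here, once
            loaded = true;
        }
        return value;
    }
}
```

A tight loop would then call get() on each iteration and only the first call would touch the file.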
You want your system to run as fast as possible, so where do you start? Most profilers (e.g. my current favourite YourKit) have modules for memory tracing and CPU tracing. Since the aim of the exercise is for your program to run faster, you start by looking at the CPU, right? Wrong! The first place to start is by looking at the memory and in particular at object allocation.
What you should always try to do in the first instance is to reduce your object allocations as much as possible. The reason that this is not intuitive (at least it wasn't to me) is that we know that object allocation is fast; in fact it's super fast even compared to languages like C. (There is lots of discussion on the web about exactly when and in what circumstances it can be faster, but it's undeniably fast.) So, if object allocation is so fast, why is it slowing my program down and why should I start by minimising my object allocation?
- It puts extra pressure on the garbage collector. Having more objects in the system (especially if they are not short-lived) will give your garbage collector more work and slow down the system that way.
- It fills up your CPU caches with garbage, forcing them to flush and making the CPU go higher up the stack to L2 and L3 cache and then to main memory to retrieve the data. Roughly speaking, each level up the stack from which data has to be fetched takes an order of magnitude longer (see graphic below). So even if object allocation is fast, it causes cache misses and thus many wasted CPU cycles, which will slow your program down.
- Do the easy things first. It's far easier in general to reduce allocation (by caching etc.) than it is to fix algorithms when looking at a CPU performance trace. Changing the allocations may completely change the performance characteristics of your program, and it may be that any changes to algorithms carried out prior to that will have been a waste of time.
- Profilers lie (this is a must-watch video). It's really hard to know, when looking at CPU traces, where exactly the bottlenecks lie. Profilers, however, do not lie about the allocations.
- High object allocation is often a bad smell in the code. Looking for excess object allocation will lead you to algorithmic issues.
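To make the idea of reducing allocation concrete, here is a minimal sketch of one common case, accidental boxing. The class and method names are hypothetical and the example is deliberately tiny: both methods compute the same sum, but the first uses a boxed Long accumulator, which may create a new object on most iterations, while the second stays with primitives and allocates nothing in the loop. How much the first version really allocates depends on the JVM and the JIT, so treat it as something to check in an allocation profile rather than a guaranteed cost.

```java
// Hypothetical example: same result, very different allocation behaviour.
class AllocationExample {

    // Boxed accumulator: each "total += i" unboxes, adds and re-boxes,
    // which may allocate a new Long for values outside the small Long cache.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: no objects are created inside the loop.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }
}
```

An allocation trace of the boxed version would be expected to show java.lang.Long near the top, which is the kind of hint the points above suggest looking for first.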

