Flight Recorder's license doesn't allow production use without paying for a license, but you can use it freely during development. See the Oracle Binary Code License Agreement for the Java SE Platform Products for the official word; a text search for "production" will take you to the right section.
Oracle claims 2-3% runtime overhead, and my measurements tend to agree. The data produced is compact, with several days of run time coming in at under a gigabyte. The profiler doesn't aggregate anything at run time; it just logs the events, so you pay the processing cost later when you open the recording, which is as it should be.
The defaults are conservative for development purposes. You want to enable logging data to disk, crank up the amount of data kept, and drop the threshold for reported pauses from 10 milliseconds to 1. There is a bunch of GC-related stuff that isn't tracked by default, but the defaults will give you parity with basic GC logging, which is enough if you already have a handle on how your application uses the heap.
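As a rough sketch of what that looks like on an Oracle JDK 7u40+/8 command line (the repository path, retention limits, and jar name are placeholders, and option names shift between JDK versions):

```
java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
     -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=/var/tmp/jfr,maxage=3d,maxsize=1g \
     -jar your-app.jar   # repository, maxage, and maxsize are example values, not recommendations
```

The pause thresholds live in the event templates shipped with the JDK (default.jfc and profile.jfc, under jre/lib/jfr on the JDK 8 builds I've looked at); copy one, lower the threshold on the pause-style events, and point your recording at the copy. The exact event paths vary by version, but the entries look roughly like this:

```
<event path="java/file_read">
  <setting name="enabled">true</setting>
  <setting name="stackTrace">true</setting>
  <setting name="threshold">1 ms</setting>  <!-- was 10 ms -->
</event>
```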
Flight Recorder tracks a variety of key performance metrics, some of which aren't covered at all by other profilers. It has the usual things like stack sampling to track where CPU time is going and lock contention tracking that shows blockers and blockees, but it also tracks disk IO and socket IO, so you can see when a thread blocks for longer than a configurable threshold. I was shocked to see several hundred milliseconds of blocked socket IO when Volt only does non-blocking IO on sockets!
|File and Socket IO view|
|Summary of GC, individual GC descriptions|
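If you'd rather not restart the JVM with recording flags, jcmd can drive a recording against a running process; this is a sketch assuming an Oracle JDK 7u40+/8 jcmd, with the recording name and output path as placeholders:

```
jcmd <pid> JFR.start name=blocked-io settings=profile
# ...let the application run long enough to catch the slow socket reads...
jcmd <pid> JFR.dump name=blocked-io filename=/tmp/blocked-io.jfr
jcmd <pid> JFR.stop name=blocked-io
```

Open the resulting .jfr file in Java Mission Control to get at views like the file and socket IO one above.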
The stack sampling is better than in most profilers because Flight Recorder doesn't wait for a safepoint to retrieve a stack. The presentation is decent if limited; you can't get a code heat map the way some profilers provide. There is a section that gives you a summary of hot code, as well as a section that gives you a summary of hot threads and the hot code in those threads. These break down into expandable call stacks that show you where the time goes as you drill into each method.
|Wow checksums are expensive|
|What is java.lang.ref.Reference$Lock?|
|Selecting a subset of recorded data|
Either they are going to sink a lot of time into it, or I am going to sink a lot of time into exploring results that don't fit because I don't have confidence in them. Benchmark hygiene is hard, and it's not something you want to throw away by doing manual benchmarks. Having the why bundled up with the what allows you (or someone in your stead) to react a lot faster, and that means more performance issues get fixed sooner and non-issues get pushed to the bottom of the priority list.