It has been almost a year and a half since I last posted anything about Groovy performance in general. As I mentioned previously, I have kept working on this topic, and now it's time for a short update.
The technique is to hook the call-site caching process. Each call site that is about to be called is profiled, its runtime information is analysed, and faster code is then generated for it. The call site is replaced by its equivalent direct call. As in the previous version of GJIT, the replacement is done through JVMTI. However, the concept behind this version of GJIT has changed from fully automatic to manually controlled: only selected call sites are optimised, and you, as a developer, control this optimisation via the mechanism provided. This concept is not new; I posted about it a while ago.
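To make the idea concrete, here is a minimal sketch in plain Java of a call site that is profiled and then swapped to a direct call. It is not GJIT's code and does not use JVMTI: the names CallSiteSketch, ProfiledCallSite, slowPlus, the "selected" flag and the call-count threshold are all hypothetical, chosen only to illustrate "profile, analyse, replace" for an integer addition.

```java
import java.util.function.IntBinaryOperator;

final class CallSiteSketch {

    /** Hypothetical call site for "a + b": starts on a slow dynamic path, may be swapped to a direct call. */
    static final class ProfiledCallSite {
        private final boolean selected;   // manual control: only selected sites are ever optimised
        private final int threshold;      // assumed number of int/int calls to observe before optimising
        private int intArgCalls = 0;      // profile data gathered at run time
        private IntBinaryOperator direct; // null until the site has been optimised

        ProfiledCallSite(boolean selected, int threshold) {
            this.selected = selected;
            this.threshold = threshold;
        }

        Object call(Object a, Object b) {
            boolean bothInts = a instanceof Integer && b instanceof Integer;
            // Fast path: a guarded direct call, valid only for the argument types that were profiled.
            if (direct != null && bothInts) {
                return direct.applyAsInt((Integer) a, (Integer) b);
            }
            // Slow path: dispatch on the runtime types, as a dynamic language would.
            Object result = slowPlus(a, b);
            // Profile the call and, once enough evidence is collected, install the direct call.
            if (selected && bothInts && ++intArgCalls >= threshold) {
                direct = Integer::sum;
            }
            return result;
        }

        boolean isOptimised() {
            return direct != null;
        }
    }

    /** Stand-in for generic dynamic dispatch of the "+" operator. */
    static Object slowPlus(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            return (Integer) a + (Integer) b;
        }
        if (a instanceof Number && b instanceof Number) {
            return ((Number) a).doubleValue() + ((Number) b).doubleValue();
        }
        return String.valueOf(a) + b;
    }

    public static void main(String[] args) {
        ProfiledCallSite site = new ProfiledCallSite(true, 3);
        for (int i = 0; i < 5; i++) {
            System.out.println(site.call(i, i) + " optimised=" + site.isOptimised());
        }
    }
}
```

The guard on the fast path matters: if the site later sees arguments of other types, it simply falls back to the slow path instead of producing a wrong result.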
The latest results I have are from micro-benchmarks. In the long run, once the related call sites have been optimised, all of them run at the same speed as the equivalent Java programs.
Let's examine the graph of the Fibonacci benchmark:
You can see two interesting regions in the graph before the Groovy program reaches the same speed as Java.
1. At the beginning, there is the overhead of profiling Groovy's call-site instantiation system.
2. For background optimisation, an optimiser thread requires time for analysis and code generation. (I have also marked the overhead of the HotSpot VM, just for reference.)
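The second region can be pictured with a small Java sketch: the program keeps running on the slow path while a separate optimiser thread prepares the fast one, and the call site only speeds up after the swap. This is an illustration under my own assumptions, not GJIT's implementation; BackgroundOptimiserSketch, SwappableSite and the two Fibonacci functions are hypothetical names, and "code generation" is reduced to publishing an already written fast function.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntUnaryOperator;

final class BackgroundOptimiserSketch {

    /** Hypothetical call site whose target can be replaced while the program is running. */
    static final class SwappableSite {
        private final AtomicInteger slowCalls = new AtomicInteger();
        private final AtomicReference<IntUnaryOperator> target;

        SwappableSite(IntUnaryOperator slow) {
            // Until optimisation finishes, every call goes through the (counted) slow path.
            this.target = new AtomicReference<>(n -> {
                slowCalls.incrementAndGet();
                return slow.applyAsInt(n);
            });
        }

        int call(int n) {
            return target.get().applyAsInt(n);
        }

        int slowCalls() {
            return slowCalls.get();
        }

        /** Hands the work to the optimiser thread; callers keep using the slow path meanwhile. */
        Future<?> optimiseInBackground(ExecutorService optimiser, IntUnaryOperator fast) {
            return optimiser.submit(() -> target.set(fast));
        }
    }

    /** Stand-in for the unoptimised program. */
    static int slowFib(int n) {
        return n < 2 ? n : slowFib(n - 1) + slowFib(n - 2);
    }

    /** Stand-in for the code the optimiser would generate. */
    static int fastFib(int n) {
        int a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            int next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService optimiser = Executors.newSingleThreadExecutor();
        SwappableSite site = new SwappableSite(BackgroundOptimiserSketch::slowFib);
        System.out.println(site.call(20));   // served by the slow path
        site.optimiseInBackground(optimiser, BackgroundOptimiserSketch::fastFib).get();
        System.out.println(site.call(20));   // served by the fast path after the swap
        System.out.println("slow calls: " + site.slowCalls());
        optimiser.shutdown();
    }
}
```

Any call that arrives before the optimiser thread finishes still pays the slow-path cost, which is the kind of delay the second region of the graph shows.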
After the optimisation phase is complete, the Groovy program's speed curve overlaps with Java's. This approach works well for a long-running program, but not for scripting. Why? I leave this as an exercise for the reader ;-)