Friday, August 15, 2008

A Java-near-speed Groovy

Almost 2 months that I have not posted here. I'm still in the final phase of Google Summer of Code. The world is going to be wonderful when you have got a really good result from your hard work, is that right ?

I'm currently be able to make Groovy speed step closer to Java. It's a high aim, and it's clear to be possible with JVMIT now. I was working on optimising callsites, but could not find a way to make it faster than Alex T.'s implementation because he did really good job of implementing it. So I went back to look at my old GJIT code. What I found is that something had been fooling my eyes for almost a year. There is a way to avoid JVM 6 specific code to implement an agent for JVM 5. Then I re-read AspectJ source again, and found something useful. I has started to re-write a GJIT agent for JVM 5 from then.

My prototype of the second generation of GJIT was done with Soot framework, where I really learned a lot of useful optimisation tricks (e.g., SourceValue). Later, I found that Soot is not suite for implementing JIT, because of 1. it's somewhat big structure 2. it's class loading problem. I then have had to implement a whole thing using ASM. It really was a nightmare because there is no Jimple to help your analysis. I re-wrote at least 3-4 version of ASM optimisers, and finally got the current one, which has a good enough structure to patch.

Now my goal is to beat all shootout benchmarks. Here's the result from partial sums benchmark compared with Java on my machine.


$ export JAVA_OPTS="-server"
$ time `java -cp bin alioth.PartialSumsJ 5000000`
real 0m10.246s
user 0m0.030s
sys 0m0.015s

$ time `groovy.bat test/partial_sums.groovy 5000000`
real 0m27.110s
user 0m0.030s
sys 0m0.016s

$ time `groovy.bat test/alioth/PartialSums.groovy 5000000`
real 0m28.409s
user 0m0.030s
sys 0m0.015s

$ export JAVA_OPTS="-server
-javaagent:C:\\groovy-ck1\\healthy\\target\\install\\lib\\gjit-0.1.jar"
$ time `groovy.bat test/alioth/PartialSums.groovy 5000000`
real 0m13.434s
user 0m0.030s
sys 0m0.030s



It's 200% faster than current Groovy, and ~20-30% slower than Java. There's still room to go. I hope to take some more steps closer to Java's speed.

11 comments:

the-very said...

WOW! Awesome!
BTW, are you using groovy 1.5.6 or 1.6-beta1 in benchmark?

chanwit said...

@the-very:

Thanks you :-)
It ran against 1.6-beta2-snapshot from last week's trunk.

Bearophile said...

Can your JIT modified to work with Jython too?

Jacek Furmankiewicz said...

You guys should rush like crazy to make this a stable official build and get as close to Java as possible (if you get within 10% I think that will be enough for many to take notice).

Compared to the lack of progress on Java lang features in Java 7 this may truly give us Java developers a reason to look at Groovy more seriously.

Great work, much respect.

Artur said...

This is quite interesting comparison - but unfortunately, this benchmark is just testing speed of sin and cos in java.

Try to replace two lines in the code with
def sk = k
def ck = k

Obviously, results doesn't make sense anymore, but there is still some computation (and SAME computation) going on.

On my PC, results have changed from 6.5s for java and 21s for groovy, to 0.8s for java and 15.5s for groovy. Which means that around 6 seconds in benchmark is spend in sin/cos (which obviously is same speed in both languages) and rest in something which we can call 'language speed'.

Which means that by default groovy seems to be 20 times slower than java, as long as you compare groovy, not heavy native calls. Which is still a HUGE improvement, as I remember groovy being 300 times slower than java few versions ago (and around 60x slower in beta 1 on this benchmark)

If you are working on optimizing language speed, you should try to find a benchmark which actually spends most time in your code, not in native code.

This also means that your changes are probably not 2x improvement like it seems from first look, but 4+x improvement (as 6seconds part is constant and cannot be optimized).

I still think that all the rest of the time is spent in pow call. After removing 2 other Math power calls, java is around 0.2s, groovy around 11s. This looks more or less reasonable - beta2 snapshot is now around 50-60 times slower than java, which would mean 5 times improvement from times I remember.

Before arguments starts about real world usage etc - we can make a test with jdbc, performing a 1 hour query against database. Groovy will be same fast/slow as java. If we are comparing language/compiler speeds, we need to remove the external noise from equation - and in this case, native calls to Math are exactly the noise.

I would suggest doing the benchmark on some sorting code. Try to do a quicksort on array of ints with huge size - this should give a rough idea of pure execution speed.

chanwit said...

@Bearophile:
Don't know about that. To my knowledge, Jython is different from Groovy internally, but they would share some concepts in common.

@Jacek:
Thanks for that. To be honest, it's really hard to get into 10% margin at the moment ;-)

@Artur:
As I've mentioned, benchmarks from shootout will be used to evaluate GJIT. Shootout is a quite well-known place for people to compare languages performance, and Groovy should have a good position there.

I understand that there's a different in term of performace for realworld usage, and benchmark things.
Anyway, GJIT is now in a good shape for me to put some more optimisation techniques into it, and I think
we can later have tuning for specific applications (like you said, e.g. for JDBC APIs or Grails APIs).

Artur said...

I think I have failed to express my point (as I meant to NOT compare JDBC as it is hiding the actual language speed). English is not my native language - sorry for that.

In short - please check your results with spectral-norm. This way, you will be measuring improvements you have done, without noise introduced by trig functions.

chanwit said...

@Artur:

Of course, English is not my first language as well. So probably it's my side that cannot catch your point correctly. Thank for your suggestion anyway.

Eric said...

This was interesting to read. Would you mind telling me a bit more about what your problems were with using Soot? I am maintaining Soot at the moment and we are always trying to find out how we can improve the framework.

chanwit said...

@Eric:

Glad to see your comment here. The following are what I can think of from my Soot experience at the moment.

1. I was using "instanceof" mainly to check that a statement is an assignment, an invocation, etc. or not.
Is it possible to have this checking cheap ? i.e. pseudo opcode (ASSIGNMENT, INVOCATION, etc.)

2. for JVMTI, I want a result as byte[]. To produce this, I need to call Jasmin to compile the generated code to be byte[] for me.
Is it possible to integrate ASM into Soot and use it to produce byte[] directly with out Soot -> String -> Jasmin -> byte[] ?

3. I wanted only Jimple for my analysis, but a whole Soot contains other thing I don't want (for example Dava).
Is it possible to have separated package i.e. Soot-core, Soot-baf, Soot-jimple, etc.

4. The current Pack model is flexible for research, but not practical, IMO. I want to perform several optimizations in the same iteration to save time. Currently in Soot, you need n iterations for n optimisation algorithms, which is fine for offline optimisation, but not for JIT.

5. Class resolving is mandatory in Soot, but I don't need it. Is it possible to have a quick class or method reference using a string as its descriptor? Currently in Soot, I need to get a class reference from Scene.v(), but for example in ASM, it's just a string. This can improve performance a lot when doing JIT compilation.

Eric said...

Ok let's see.

@1: Yes, you can dispatch on the statement type by using a visitor:
http://www.sable.mcgill.ca/soot/doc/soot/jimple/StmtSwitch.html
There is also a visitor for expressions: http://www.sable.mcgill.ca/soot/doc/soot/jimple/ExprSwitch.html

@2: You don't need to use ASM. Jasmin generates a sequence of bytes, which we then write to disk using an OutputStream. You would just have to modify Soot so that this sequence of bytes is written to a ByteArrayOutputStream instead. Then you can get your byte array from there.

@3: Unfortunately there's no stripped-down version of Soot at the moment. You are right, this is because it's a research tool.

@4: I think you could just pply multiple optimizations within the same "Transform". That should solve your problem if I understood it correctly.

@5: Yes, this is indeed a bit of a problem. Sometimes you can get away with the "-allow-phantom-refs" command line option. This allows Soot to create a "phantom class" for any class that it cannot find on the command line. As I understand that's similar to this ASM option. However it may lead to suboptimal results because Soot may be missing potentially important information.

Hope this helps.