Saturday, June 21, 2008

Expression-Wide Optimisation (Primitive Callsite #2)

This is a technique I proposed for optimizing Groovy primitive expressions.

Let's say, we have:
public double a(int i, int j) {
return 1 / ((i+j)*(i+j+1)/2.0D +i+1);
This expression consists of 7 callsites. If we can make sure that the following 3 conditions
1. All variables are primitive.
2. The metaclass of each callsite is default
3. The meta method of each callsite is default
are true, the we can replace the entire expression with primitive and native Java bytecodes. Thus, we can by-pass any overhead here.

When some meta-method has been replaced, then the above condition is invalid. The original code will be performed instead of the fast one.

This technique is reasonable to implement because:
1. Callsite makes this checking cheap. We can define "isDefault" flag for a "default" callsite.
2. Instead of checking the default flag one-by-one, checking these flags for a whole expression makes this even cheaper. This also results in easier code generation.

However, we have to generate code twice, say fast and slow paths, for each expression-to-be-optimised.

final boolean exprCallSet = $getExprCallSet(0);
if(exprCallSet) {
return 1 / ((i+j)*(i+j+1)/2.0D +i+1);
} else {
$reevaluateExprCallSet.get()[0] = true;
return DefaultTypeTransformation
acallsite[0].call($const$0, acallsite[1].call(
acallsite[4].call(acallsite[5].callOII(i, j),

The above snippet is a mock code I'm working on. The next step is to generalise this and modify the Groovy class generator. With this technique, we can unleash all power of Java primitives which will be > 800% faster than the normal callsite caching in 1.6-beta1.

Tuesday, June 17, 2008

Primitive Callsite Again (Part 1)

Current Groovy implementation always do boxing, unboxing regardless types of primitive parameters. This is the reason why it's slow down for arithmetic operations. I've come up with some experiments to optimise primitive for Groovy.

The idea is that I'll generate primitive callsite, or primitive calls when:

1. typing is obvious. For example,

int a = 5
int b = a * (a + 5)

in the above example, the second expression will have 1 box/unbox call and 1 primitive call because of type of (a+5).

With (a +5), this can be generated as Object call((int)a, (int)5) because type of a is obvious, also '5'.
But we won't know type of the result. So that a * (...) will be generated using Object call ((int)a, Object) rather than Object call(int, int).

One may argue that this might be not worth, as for a very long and complex expression this optimisation will be useless. For example,

private double a(int i, int j) {
return 1 / ((i+j)*(i+j+1)/2.0D +i+1)

The above optimisation can do it job only just for 2 inner most expressions. But you will be surprised that it's actually ~23% faster ! Thus IMHO, it's worth to do that.

2. This second part is to trying to optimise that long expression. The entire expression can be replaced with all primitive calls if only if their meta-class and meta-methods are "DEFAULT". I hope I could detect this via a property of CallSite, then generate 1. fast path, 2. original path. This second idea has not been fully experimented yet. So wait and see what can be done.