Sunday, November 04, 2007

Convention over Configuration for Map-Reduce

I'd just like to take a quick look at Hadoop's word-count example and see whether it can be written in Groovy.

Disclaimer: this code doesn't work!

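import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.Text

class WordCountMapReduce {
    // Reusable Hadoop writables, as in the Java word-count example
    def word = new Text()
    def one = new IntWritable(1)

    // Emit (word, 1) for every token in the input line
    def map = { key, value, output, reporter ->
        def line = value.toString()
        def itr = new StringTokenizer(line)
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken())
            output.collect(word, one)
        }
    }

    // Sum the counts collected for one word
    def reduce = { key, values, output, reporter ->
        int sum = 0
        values.each {
            sum += it.get()
        }
        output.collect(key, new IntWritable(sum))
    }
}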
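
To make the "convention" part a bit more concrete, here is a rough sketch of how a runner could pick up closures named map and reduce without any configuration. It is not Hadoop: LocalWordCount, ListCollector and runLocally are hypothetical names made up for this example, everything runs in memory, and plain Strings and Integers stand in for Text and IntWritable.

```groovy
// Same shape as the class above, but with plain Strings and Integers.
class LocalWordCount {
    def map = { key, value, output, reporter ->
        value.toString().tokenize().each { output.collect(it, 1) }
    }

    def reduce = { key, values, output, reporter ->
        output.collect(key, values.sum())
    }
}

// Hypothetical in-memory stand-in for Hadoop's output collector.
class ListCollector {
    def pairs = []
    void collect(key, value) { pairs << [key, value] }
}

// Hypothetical local runner: it only relies on the job having
// closure properties called 'map' and 'reduce'.
def runLocally(job, List<String> lines) {
    def mapped = new ListCollector()
    // Map phase: the line index plays the role of the input key.
    lines.eachWithIndex { line, i -> job.map.call(i, line, mapped, null) }

    // "Shuffle" phase: group the emitted pairs by key.
    def grouped = mapped.pairs.groupBy { it[0] }

    // Reduce phase: one call per distinct key.
    def reduced = new ListCollector()
    grouped.each { key, pairs ->
        job.reduce.call(key, pairs.collect { it[1] }, reduced, null)
    }
    reduced.pairs.collectEntries { [(it[0]): it[1]] }
}

def counts = runLocally(new LocalWordCount(), ['to be or not', 'to be'])
println counts
```

The job class carries no interfaces, base classes or job configuration. The runner finds what it needs by name, which is the whole idea.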



Is it time for Groovy to go for massive computation?

1 comment:

Alan Ho said...

I just finished building one. There are so many advantages to using Groovy, especially the GPath and XML support. Think about it: XML is the lingua franca of the internet, yet it's such a pain in the ass to write code that maps XML to Java objects. If you do everything in Groovy, you can just use XmlParser on the XML records and you have fully usable Groovy objects. No craziness.
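
A minimal sketch of that idea, assuming Groovy 3 or later (where the parser lives in groovy.xml.XmlParser; older releases ship it as groovy.util.XmlParser). The <page> record and its fields are made up for this example.

```groovy
import groovy.xml.XmlParser

// Hypothetical XML record, invented for illustration.
def xml = '''
<page>
  <title>Hadoop</title>
  <body>map reduce map</body>
</page>
'''

// Parse the record into a tree of Groovy nodes.
def page = new XmlParser().parseText(xml)

// GPath navigation: no hand-written XML-to-object mapping classes.
def title = page.title.text()
def words = page.body.text().tokenize()

println "${title}: ${words}"
```

The words list could then be fed straight into a map closure like the one in the post.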