

Jacques Mattheij

Hi there, I came across your site while looking for information about map-reduce using MPI, and I was wondering what your take is on the Hadoop way of doing things: moving the code to the data. Apparently they migrate the programs to be as close to the data as possible during the 'map' stage, which seems to me a good way of doing things. Unfortunately they decided to write their stuff in Java, so now you need all kinds of silly trickery to access the data from non-Java programs. 10 for the idea, but -3 for implementation...

best regards,

Jacques Mattheij


We typically spread our data out across the cluster so that the input files for the mappers are on the node the mapper runs on.

We've looked at Hadoop some. Our implementation is a lot simpler, but we think we will have scaling issues if we move into thousands of nodes (currently we have 132 CPUs on 33 nodes). One of the things that Hadoop does is implement its own distributed file system. This has some advantages, but adds overhead and may be part of what Jacques is talking about.
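The locality idea described above — placing each mapper on the node that already holds its input file — can be sketched roughly as follows. This is a simplified illustration, not how Hadoop or our cluster actually schedules tasks; the node names, slot counts, and the `assign_map_tasks` helper are all hypothetical:

```python
def assign_map_tasks(file_locations, node_slots):
    """Assign each map task to a node, preferring data locality.

    file_locations: dict mapping input filename -> node that stores it
    node_slots: dict mapping node -> number of free mapper slots
    Returns a dict mapping filename -> node the mapper should run on.
    """
    assignment = {}
    for filename, home_node in file_locations.items():
        if node_slots.get(home_node, 0) > 0:
            # Data-local case: run the mapper where its input already lives,
            # so no data crosses the network during the map stage.
            assignment[filename] = home_node
            node_slots[home_node] -= 1
        else:
            # Fallback: the home node is full, so pick the node with the
            # most free slots; that input file must then be fetched remotely.
            node = max(node_slots, key=node_slots.get)
            assignment[filename] = node
            node_slots[node] -= 1
    return assignment

# Hypothetical example: three input files spread across two nodes.
files = {"part-0": "node1", "part-1": "node2", "part-2": "node1"}
slots = {"node1": 1, "node2": 2}
print(assign_map_tasks(files, slots))
```

Here `part-0` and `part-1` run data-local, while `part-2` spills over to `node2` because `node1` has only one mapper slot; the cost of that spillover is the network transfer Hadoop's scheduler tries to minimize.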

