[Beowulf] Hadoop

Tim Cutts tjrc at sanger.ac.uk
Sun Dec 28 01:24:51 PST 2008

On 27 Dec 2008, at 9:04 pm, Jeff Layton wrote:

> I hate to tangent (hijack?) this subject, but I'm curious about your  
> class poll. Did the people who were interested in Matlab consider  
> Octave?

I can't speak for Joe's class, but when I've asked Matlab users here  
about using Octave instead, they're generally not interested.  Partly  
this is a somewhat irrational "it doesn't have Matlab on the cover"  
thing, but largely it's the Matlab toolboxes they want access to.

A recurring theme we find with anyone working in a package of this  
kind, be it Matlab, Stata, R or whatever, is that they hit massive  
difficulties when they try to scale up their analysis to  
larger problem sizes.  If they come to me early in their project, I  
will point this out in advance, and suggest that while such languages  
are great for prototyping, they don't scale well.  Usually, of course,  
the warning is ignored (they didn't pay for this advice, so no need to  
take it, right?) and six months down the line their stuff doesn't work  
any more when they try to run it on large production datasets, and I  
have to avoid the "I told you so" speech...  :-)

People have mentioned Hadoop here, and it might suit some of the  
work being done (I'm interested in the data-locality feature of its  
filesystem that was mentioned earlier in the thread), but no-one's  
using it at this site yet as far as I know.
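
For anyone unfamiliar with the data-locality idea: the filesystem knows  
which nodes hold a copy of each block of a file, and the scheduler tries  
to run each task on a node that already has the block it needs, so the  
data does not have to cross the network.  Below is a toy sketch of that  
idea in Python.  It is not Hadoop's actual scheduler or API; the function  
name assign_tasks, the node names and the "free slots" bookkeeping are  
all made up for illustration.

```python
def assign_tasks(block_replicas, slots_per_node):
    """Toy locality-aware scheduler (illustrative only, not Hadoop code).

    block_replicas: dict mapping block id -> list of nodes holding a copy
    slots_per_node: dict mapping node name -> number of free task slots
    Returns a dict mapping block id -> (chosen node, was_local flag).
    """
    free = dict(slots_per_node)  # work on a copy so the input is untouched
    assignment = {}
    for block, replicas in block_replicas.items():
        # Prefer a node that already stores the block and has a free slot.
        candidates = [n for n in replicas if free.get(n, 0) > 0]
        if not candidates:
            # No local slot available: fall back to any node with a free
            # slot, which means the block must be read over the network.
            candidates = [n for n, s in free.items() if s > 0]
        if not candidates:
            raise RuntimeError("no free task slots left")
        # Pick the least-loaded candidate (first one wins on ties).
        node = max(candidates, key=lambda n: free[n])
        free[node] -= 1
        assignment[block] = (node, node in replicas)
    return assignment


if __name__ == "__main__":
    # Hypothetical cluster: three blocks, four nodes, one slot on most nodes.
    blocks = {"b0": ["n1", "n2"], "b1": ["n1"], "b2": ["n3"]}
    slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
    for block, (node, local) in assign_tasks(blocks, slots).items():
        print(block, "->", node, "(local)" if local else "(remote read)")
```

The greedy choice here is deliberately naive: it can hand a node's only  
slot to a block that had another local option, forcing a later block to  
read remotely.  A real scheduler has to deal with that, plus rack  
topology and failures, which this sketch ignores.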

