[Beowulf] Hadoop
Tim Cutts
tjrc at sanger.ac.uk
Sun Dec 28 01:24:51 PST 2008
On 27 Dec 2008, at 9:04 pm, Jeff Layton wrote:
> I hate to tangent (hijack?) this subject, but I'm curious about your
> class poll. Did the people who were interested in Matlab consider
> Octave?
I can't speak for Joe's class, but when I've asked Matlab users here
about using Octave instead, they're generally not interested. Partly
this is a somewhat irrational "it doesn't have Matlab on the cover"
thing, but largely it's the Matlab toolboxes they want access to.
A recurrent theme we find with anyone using this kind of package,
be it Matlab, Stata, R or whatever, is that they hit massive
difficulties when they try to scale up their analysis to larger
problem sizes. If they come to me early in their project, I point
this out in advance, and suggest that while such languages are great
for prototyping, they don't scale well. Usually, of course, the
warning is ignored (they didn't pay for this advice, so no need to
take it, right?). Six months down the line their stuff stops working
when they try to run it on large production datasets, and I
have to avoid the "I told you so" speech... :-)
People have mentioned Hadoop here, and it might suit some of the
work being done (I'm interested in the data-locality feature of its
filesystem that was mentioned earlier in the thread), but no-one's
using it at this site yet as far as I know.
Tim
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.