[Beowulf] Hadoop

Jeff Layton laytonjb at att.net
Sun Dec 28 07:17:37 PST 2008

I think I understand why people want the toolboxes - they make coding easy. From what I've seen, people then stay with the "prototype" code and never move to a compiled language such as C or Fortran. It's been a long, long time, but I did all of the code prototyping for my PhD in Matlab and then rewrote it in Fortran. I was easily able to get a 10x speedup. I guess people don't like 10x improvements in performance any more :)

An option for people is to use the Matlab compiler to build faster code. I've never used it, but the reports I've seen say that it works quite well. I'm not sure how the toolboxes work with it - whether you compile them as well or they run interpreted alongside the compiled code.

Octave has a number of toolboxes. I'm not sure whether they cover what various people are doing, but they are out there.

For larger problems, what about using ScaleMP? You can get a very large SMP system for running large problems. You may not need the extra CPUs, but they do get you the extra memory. That's one of the neat things about ScaleMP. You buy a couple of nodes with really fast CPUs and memory, and then buy the remaining nodes with really slow (cheap) processors and lots of memory. This allows you to tailor the SMP box to the application. Just be sure to pin the code to the fastest CPUs (numactl).
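
To make the numactl step concrete, here is a rough sketch of the kind of command line I have in mind. It assumes the fast CPUs show up as NUMA nodes 0 and 1 (check with `numactl --hardware`; the numbering on a real system will differ), and the helper name `numactl_command` and the `./solver input.dat` program are made up for illustration. It keeps execution on the fast nodes and interleaves memory pages across all nodes so the memory on the slow nodes still gets used. I have not tried this on a ScaleMP box, so treat it as a starting point only.

```python
import shlex


def numactl_command(program, cpu_nodes, mem_policy="interleave", mem_nodes="all"):
    """Build a numactl command line (hypothetical helper, not part of numactl).

    program    -- list with the executable and its arguments
    cpu_nodes  -- NUMA node numbers holding the fast CPUs (an assumption here)
    mem_policy -- "interleave" spreads pages round-robin over mem_nodes,
                  "membind" restricts allocation to mem_nodes
    """
    cpu_list = ",".join(str(n) for n in cpu_nodes)
    cmd = ["numactl", f"--cpunodebind={cpu_list}"]
    if mem_policy == "interleave":
        cmd.append(f"--interleave={mem_nodes}")
    elif mem_policy == "membind":
        cmd.append(f"--membind={mem_nodes}")
    else:
        raise ValueError(f"unknown memory policy: {mem_policy}")
    cmd.extend(program)
    return cmd


if __name__ == "__main__":
    # Run on nodes 0 and 1 (assumed fast), interleave memory over every node.
    cmd = numactl_command(["./solver", "input.dat"], cpu_nodes=[0, 1])
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Whether interleaving or a plain bind is the better memory policy depends on the access pattern of the code, so it is worth timing both.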

Thanks for the feedback.


From: Tim Cutts <tjrc at sanger.ac.uk>
To: Jeff Layton <laytonjb at att.net>
Cc: landman at scalableinformatics.com; Beowulf Mailing List <beowulf at beowulf.org>
Sent: Sunday, December 28, 2008 4:24:51 AM
Subject: Re: [Beowulf] Hadoop

On 27 Dec 2008, at 9:04 pm, Jeff Layton wrote:

> I hate to tangent (hijack?) this subject, but I'm curious about your  
> class poll. Did the people who were interested in Matlab consider  
> Octave?

I can't speak for Joe's class, but when I've asked Matlab users here  
about using Octave instead, they're generally not interested.  Partly  
this is a somewhat irrational "it doesn't have Matlab on the cover"  
thing, but largely it's the Matlab toolboxes they want access to.

A recurrent theme we find with anyone doing stuff with any similar  
package, be it Matlab, Stata, R or whatever, is that they always hit  
massive difficulties when they try to scale up their analysis to
larger problem sizes.  If they come to me early in their project, I  
will point this out in advance, and suggest that while such languages  
are great for prototyping, they don't scale well.  Usually, of course,  
the warning is ignored (they didn't pay for this advice, so no need to  
take it, right?) and six months down the line their stuff doesn't work  
any more when they try to run it on large production datasets, and I  
have to avoid the "I told you so" speech...  :-)