[Beowulf] $2500 cluster. What it's good for?

Robert G. Brown rgb at phy.duke.edu
Mon Dec 20 05:58:24 PST 2004

On Sun, 19 Dec 2004, Douglas Eadline, Cluster World Magazine wrote:

> On Sat, 18 Dec 2004, Jim Lux wrote:
> > I think it would be interesting to contemplate potential uses of a $2500
> > cluster.  Once you've had the thrill of putting it together and rendering
> > something with POVray, what next?
> That is the $64,000 question. Here is my 2 cent answer.
> BTW, your ideas are great. I would love to see a discussion like this
> continue because we all know the hardware is the easy part!

I'll kick in a pennysworth.

What I use my somewhat more disorganized home cluster for is largely
prototyping and development.  Production clusters hate giving up cycles
to little test runs, as they tend to slow down the entire (tightly
coupled) computation.  Having a small/cheap cluster that is big enough
to be able to learn something useful about the scaling of the task and
to debug parallel code is very useful.

In addition, as already noted, all sorts of embarrassingly parallel (EP)
computations can be run on the test cluster WHILE it is being used this
way without much loss of efficiency, as EP tasks can finish wherever, and
if you steal five minutes from them here or there it is no big loss.
The CWM cluster is already better than one of the "remnant" clusters
that I run at Duke (systems with the dread capacitor problem on the
mobos, gradually dying as the capacitors eventually blow), and that
cluster is certainly useful in production, old/slow as it is.
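
As a minimal sketch of why EP work tolerates interruption: each work
unit below depends only on its own input, so units can run anywhere,
finish in any order, and simply be re-run if a node is taken away for a
test.  The simulate() function is a stand-in for a real computation and
all names here are hypothetical.

```python
import random


def simulate(seed, samples=10_000):
    """One independent work unit: a toy Monte Carlo average (a stand-in
    for a real simulation). It needs nothing but its own seed."""
    rng = random.Random(seed)
    return seed, sum(rng.random() for _ in range(samples)) / samples


def run_sweep(seeds, mapper=map):
    """Run every unit through `mapper` and collect results keyed by seed,
    so the order in which units are handed out does not matter."""
    return dict(mapper(simulate, seeds))


if __name__ == "__main__":
    # Serial run; a parallel `mapper` can be passed in instead.
    results = run_sweep(range(8))
    for seed in sorted(results):
        print(seed, round(results[seed], 4))
```

The serial map default can be swapped for the map method of a
concurrent.futures.ProcessPoolExecutor to spread units over local cores;
on a real cluster a batch scheduler, PVM, or MPI would play that role.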

Learning has been mentioned -- I'm informally advising some five or six
different students at different institutions in India who have picked
building a cluster as a serious academic project.  For these students
the issue is building a cluster and then running some toy tasks that can
demonstrate parallel scaling (generalized Amdahl's Law) relations, such
as the ones that I published last year in CWM or that are available in
examples directories in e.g. PVM.  They also learn all sorts of useful
things about networking and systems administration that are excellent
preparation for a career in IT, cluster-oriented or not.  
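
The kind of scaling relation these toy tasks are meant to exhibit can be
sketched in a few lines.  The first function is the basic form of
Amdahl's Law; the second adds a communication cost that grows linearly
with the number of nodes, which is only one assumed way of generalizing
it and not necessarily the form used in the CWM articles.  The function
names and the sample timings are illustrative.

```python
def amdahl_speedup(serial_time, parallel_time, nodes):
    """Basic Amdahl's Law: only the parallelizable part is divided
    among the nodes; the serial part is paid in full."""
    total = serial_time + parallel_time
    return total / (serial_time + parallel_time / nodes)


def speedup_with_overhead(serial_time, parallel_time, nodes, comm_per_node):
    """Assumed generalization: add a communication cost that grows
    linearly with the node count (a toy model, not a measured one)."""
    total = serial_time + parallel_time
    overhead = comm_per_node * nodes if nodes > 1 else 0.0
    return total / (serial_time + parallel_time / nodes + overhead)


if __name__ == "__main__":
    # Hypothetical task: 10 s serial, 90 s parallelizable,
    # 0.5 s of communication per node.
    for n in (1, 2, 4, 8, 16, 32):
        print(n,
              round(amdahl_speedup(10, 90, n), 2),
              round(speedup_with_overhead(10, 90, n, 0.5), 2))
```

With the overhead term the speedup rises, peaks, and then falls as nodes
are added, which is the behavior a student can look for when timing a
toy task on a small cluster.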

Many of these students cannot afford to spend even $2500 on a cluster --
they make them out of obsolete or cast-off systems, adding perhaps $100
worth of additional/new hardware and a lot of figurative elbow grease.
This is fine -- five year old systems (even ones that run Windows as
well as Linux in a dual boot or diskless boot configuration) are
precisely what I was lusting after >>six<< years ago to do real work!
They don't make economic sense now for production when a single new
system is much faster than a whole obsolete cluster (Moore's Law is
brutal) but they are fabulous for learning.

Finally, (Doug's remarks below notwithstanding) I actually think that it
would be lovely if e.g. Octave had a fully parallel component.  We
currently have several MATLAB clusters on campus.  MATLAB, Mathematica,
Octave -- these sorts of environments are perfectly great for a
particular class of researcher.  They fill a very similar niche to the
one Perl or Python fills for programmers.  For these researchers, the
time required to "do a parallel computation right" vastly exceeds the
time saved by doing the parallel computation right, if "right" is
interpreted as maximally efficiently with PVM or MPI or raw sockets or
something.  If "right" means "in such a way as to maximize the
productive work done per unit of their invested time and money" then
using MATLAB with a suitable parallel library that hides the detail of
parallelism from them entirely is as right as it can get.  The
alternative is an investment of as long as years learning C or Fortran,
studying parallel algorithms, learning PVM or MPI, analyzing their task,
and efficiently implementing their problem in parallel code (when
perhaps their problem is just to solve a set of coupled equations that
parallelizes well and transparently behind a single call).

So sure, a little minicluster like this can be very useful indeed for
folks who do this sort of work, although in a lot of cases tools like
MATLAB/Octave are memory hogs and the nodes will need to be equipped
with a lot more than 256 MB of RAM to be useful.

OK, so more than just a pennysworth...


> There is part of this project which has a "build it and they will come
> (and write software)" dream. Not being that naive, I believe there are
> some uses for systems like this. The intended audience is not the
> uber-cluster-geeks on this list, but rather the education, home, hacker
> crowd. In regard to education, I think if cluster technology is readily
> available, then perhaps students will look to these technologies to solve
> problems. And who knows maybe the "Lotus 123 of the cluster" will be built
> by some person or persons with some low cost hardware and an idea everyone
> said would not work.
> If you have followed the magazine, you will see that we highlighted 
> many open projects that are useful today. From an educational standpoint, 
> a small chemistry/biology department that can do quantum chemistry, 
> protein folding, or sequence analysis  is pretty interesting to me. 
> There are other areas as well.
> There are also some other immediate things like running Mosix or Condor
> on the cluster. A small group that has a need for a computation server
> could find this useful for single process computational jobs.
> I also have an interest in seeing a cluster version of Octave or SciLab
> set to work like a server. (as I recall rgb had some reasons not to use
> these high level tools, but we can save this discussion for later)
> What I can say as part of the project, we will be collecting a software 
> list of applications and projects.
> Finally, once we all have our local clusters and software running to our
> hearts' content, maybe we can think about a grid to provide spare compute
> cycles to educational and public projects around the world. 
> Oh well, enough Sunday afternoon philosophizing. 
> Doug
> > 
> > You want to avoid the "gosh, I can run 8 times as many SETI@home units as I
> > could before" or "Look, I can calculate Pi" kind of
> > not-particularly-value-laden-to-the-casual-observer tasks.
> > 
> > Sure, there's some value in learning how to build and manage a cluster, but
> > I think the real value is in doing something useful with that $2500.  So,
> > what sort of "useful" could one do? Say you were to negotiate with your
> > spouse to get $2500 to play with (or you were able to get a "mini-grant" at
> > a high school).  Is there something that is useful to the "general consumer
> > public" that could be done better with a cluster than with a $2500 desktop
> > machine?
> > 
> > One computationally intensive task that might be applicable is making
> > panoramas from multiple digital photos.  It's incredibly tedious and time
> > consuming to stitch together 30 or 40 digital photos into one seamless
> > panorama (google for PanoTools and PTGui for ideas).
> > 
> > What about kids in school? Is there some simulation that, if clusterized,
> > would be more interactive and useful?
> > 
> > What about interactive rendering from one of NASA's world view databases:
> > layering the terrain models and imagery to do "fly bys"?
> > 
> > Are there consumer type iterative optimization problems that could profit
> > from a cluster?  In my own fooling around, I do lots of antenna simulations,
> > which are essentially embarrassingly parallel.  The ham radio community likes
> > "scrounged and homebuilt" solutions to problems, so the $2500 cluster is a
> > potential winner there.
> > 
> > What about outreach to poverty-stricken branches of academe who don't use
> > computers much?  Literary analysis searching texts for common phrases?
> > Figuring out how to fit potsherds together?
> > 
> > Jim Lux

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
