Small cluster

Mon Oct 28 11:18:42 PST 2002

On Mon, 28 Oct 2002, Garriss, Michael wrote:

> I have a $5000 budget to build a sort of "proof of concept" cluster before a
> larger sum of money gets spent.  Any suggestions on getting the most bang
> for the buck.  The application is a simulation that would benefit from
> parallel computation and lots'o RAM.

The key questions are:

  a) how many IPCs (interprocessor communications), and in what pattern,
pass between the parallelized subtasks; and

  b) are the tasks synchronous (with "barriers" that each subtask must
reach, typically to enable a round of IPCs, before any subtask is
allowed to proceed)?

The range of possible answers to these two questions drives one through a
huge range of design optimization decisions, from (as a very possible answer)
"no cluster I can afford for $5000 will show an advantage" to "the
most/fastest/cheapest nodes wired together with dixie cups and string
will show linear speedup for any number of nodes".
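
To make the two extremes concrete, here is a toy speedup estimate in
Python.  It is only a sketch under assumed conditions: the cost model
(IPC time growing linearly with the number of nodes, as when a master
talks to each slave in turn) and all the timing numbers are made up for
illustration and are not measurements of any real task.

```python
def speedup(n_nodes, t_serial, t_parallel, t_ipc_per_node):
    """Estimated speedup on n_nodes under an ASSUMED toy cost model.

    time on 1 node  = t_serial + t_parallel
    time on N nodes = t_serial + t_parallel / N + t_ipc_per_node * N
    All times are in the same (arbitrary) units.
    """
    if n_nodes == 1:
        return 1.0  # no IPC cost when the job runs on a single node
    t_one = t_serial + t_parallel
    t_n = t_serial + t_parallel / n_nodes + t_ipc_per_node * n_nodes
    return t_one / t_n


# Hypothetical tasks: (label, parallelizable work, IPC cost per node)
tasks = [("coarse grained", 1000.0, 0.01), ("fine grained", 10.0, 5.0)]
for label, t_par, t_ipc in tasks:
    for n in (1, 2, 4, 8):
        s = speedup(n, 0.0, t_par, t_ipc)
        print(f"{label:>14}  N={n}  speedup={s:.2f}")
```

With these made-up numbers the coarse grained task scales almost
linearly, while the fine grained one is already slower on two nodes than
on one.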

To understand the range of answers and how to BEGIN to architect a
cluster that will work for your problem, you might look at my online
book on engineering a compute cluster on www.phy.duke.edu/brahma.  It
will at least summarize some of the extremes for how you might want/need
to spend your money -- mostly on CPU vs mostly on network -- and
indicate a bit about how you might make measurements or estimates of
your tasks' requirements to enable you to do the engineering.

The short version is:

  a) if IPCs are a tiny-to-small fraction of compute time, and the task
is coarse grained (with the embarrassingly parallel archetypal ideal
being SETI or RC5) then get the cheapest/fastest/most nodes, trying to
optimize aggregate single-threaded task completion and spend relatively
little on a cheapo 100BT network.  For $5000, you ought to be able to
get something like 6-8 nodes (cheap stripped Celeron, P4, or Athlon
single processor) and a 100BT switch as a demo unit.

  b) If !a), try to figure out what sort of network your task requires.
VERY crudely (and even so I'm likely to be corrected by listbots more
advanced than myself):  if your communication pattern is synchronous and
fine grained (lots of small messages, barriers) you're likely to need
Myrinet or an equivalent low-latency, high-bandwidth network, and
probably cannot afford both it and any nodes.  If your communication
consists of fewer but very large messages (big blocks of data shipped
between nodes or between slaves and a master) you might get by with
gigabit ethernet, which has mediocre latency but, well, gigabit
bandwidth.  In this case you can probably afford four or five nodes and
a gigabit switch that is likely to have more capacity than you really
need but gives you room for growth later.
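
The latency vs bandwidth tradeoff above can be sketched with the usual
first-order estimate, time per message = latency + size/bandwidth.  The
latency and bandwidth figures below are rough, assumed ballpark values
chosen only for illustration; measure your own hardware before trusting
any of them.

```python
# ASSUMED ballpark figures: (latency in seconds, bandwidth in bytes/second).
NETWORKS = {
    "100BT": (100e-6, 12.5e6),
    "gigabit": (50e-6, 125e6),
    "low-latency": (10e-6, 250e6),  # stand-in for a Myrinet-class network
}


def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order time to deliver one message of size_bytes."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Compare one small message against one big block of data.
for size in (64, 10_000_000):
    for name, (lat, bw) in NETWORKS.items():
        t = message_time(size, lat, bw)
        print(f"{size:>10} bytes on {name:<12} {t * 1e6:12.1f} microseconds")
```

For the small message nearly all of the time is latency, so extra
bandwidth buys almost nothing; for the big block the bandwidth term
dominates and latency hardly matters.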

Really, if !a), be sure to do ALL your homework before spending your
money, as a fine grained task on a cheap solution like a) is very likely
to run more SLOWLY in parallel, even on as few as two nodes, than it
does on one...;-)

Last warning:  The beowulf book by Sterling, Becker, etc... notes that
one has to do the scaling computations that guide the a) vs b) choices
taking into account BOTH the IPC scaling AND the task scaling.  That is,
a lot of times the ratio between IPCs and computation can be shifted by
e.g. making the problem a lot bigger, so a task that is fine grained at
small sizes becomes effectively coarse grained at larger sizes.  So
don't be discouraged if you do benchmarks of small versions of your
application(s) and get "fine grained" sorts of numbers -- look at how
the IPCs scale relative to CPU as you crank up to production sizes and
try to figure out if you'll still be fine grained there.
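
As an illustration of that scaling, here is a small sketch for a
HYPOTHETICAL task (not your simulation, which I know nothing about): a
cubic lattice of side L cut into slabs, one per node, where each node
computes on its L^3/N sites and exchanges its two L^2 boundary faces
every step.  The per-site and per-byte costs are assumed numbers.

```python
def ipc_to_compute_ratio(side, n_nodes, t_site=1e-7, t_byte=1e-7,
                         bytes_per_site=8):
    """Ratio of IPC time to compute time per step for a slab decomposition.

    ASSUMED model: compute scales with the slab volume (side**3 / n_nodes),
    communication with the two boundary faces (2 * side**2 sites).
    """
    t_compute = (side ** 3 / n_nodes) * t_site
    t_ipc = 2 * side ** 2 * bytes_per_site * t_byte
    return t_ipc / t_compute


# Same code, same 4 nodes, growing problem size.
for side in (16, 64, 256, 1024):
    ratio = ipc_to_compute_ratio(side, 4)
    print(f"L={side:>5}  IPC/compute = {ratio:.4f}")
```

In this model the ratio falls off like N/L, so a small test run can look
hopelessly fine grained while the production-size run is comfortably
coarse grained.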

HTH

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu