[Beowulf] 96 Processors Under Your Desktop (fwd from brian-slashdotnews at hyperreal.org)

Robert G. Brown rgb at phy.duke.edu
Tue Aug 31 05:48:21 PDT 2004


On Tue, 31 Aug 2004, Mark Hahn wrote:

> > Also, a low power use cluster is the only way I can have a significant 
> > cluster in my apartment, so it was to be this way, or no way. At 
> 
> no insult intended, but do you need a cluster in your apt?  I've
> personally found no real need for more than a terminal (old athlon
> and nice screen) at home, since the cable monopoly gives me a 14ms 
> link to plenty o clusters at work.

Now now, Mark, I cherish MY cluster at home.  It serves many useful
purposes, including making it easy to test lots of strange things and
programs on a bunch of boxes that AREN'T in production and which WON'T
screw up anybody's workflow but my own if I crash them, drive them
through the wall swap/memory-wise, and so forth.  Sometimes I even do
production work on it, although the systems tend to be behind the
performance curve (back to where I can afford them).  It is absolutely
critical to my writing the CWM column -- sometimes I need to run stuff
like xpvm or other X graphical apps, which can be total pigs when run
over even a DSL link.

The electricity issue is a pain -- I probably pay close to $600/year to
run my cluster at home (although most of the systems do dual duty as
desktops and would have to be here anyway) -- but so far there hasn't
been any cost-effective low-power alternative to building my own $500
vanilla boxes.  If a low-power alternative costs me $1000, I lose money
over the lifetime of the equipment, AND the low-power systems are
typically slower, because of the unavoidable connection between clock
frequency and switching power that Jim republishes from time to time on
the list.
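
(For anyone who hasn't seen those numbers, the back-of-the-envelope
version -- a rough scaling argument only, ignoring leakage and process
details -- is that dynamic switching power goes roughly as

   P_dynamic ~ C * V^2 * f

where C is the switched capacitance, V the core voltage, and f the
clock.  Since V generally has to rise along with f to keep the
transistors switching reliably, power climbs a good deal faster than
linearly in clock speed, and running the argument backwards means the
low-power parts are more or less forced to be the slow ones.)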

Power and heat will continue to be major issues for all of us.  I
attended roadmap talks by both Intel and AMD this summer.  Chip design
appears to be headed more and more toward parts that just plain slow
down under load to avoid melting down (so there will be a larger and
larger discrepancy between "maximum speed" and "speed under a sustained
load", especially for certain operations).  You can only go so far with
any given fabrication process before the laws of physics kick in...

   rgb

> > I can think of 
> > many situations where it would be desirable to have a deskside cluster 
> > for computation, development, or testing, 

I think it is an issue of differentiating production clusters (which are
not easily or cheaply diverted from their primary task) from development
clusters, which might be older machines no longer suitable for
production, or might be brand-new bleeding-edge machines that you want
to play with for weeks or months NOT in production before endorsing them
for the next generation of production machines.  My home cluster is an
example of the former sort of development cluster (older machines),
which doesn't strictly have to be at home, but as noted it is convenient
and even appropriate, given my often non-research applications (such as
my kids running Diablo II under Winex when I'm not prototyping or
developing:-).  We have old/dying boxes at Duke that I use for similar
purposes when they aren't in low-grade production (which they usually
are, old or not).  An example of the latter would be buying a small
stack of Opterons (which I, among others, did early on) to test and
benchmark before "endorsing" them for other campus groups to buy more of
in production clusters.  A few boxes of the small stack operated
"deskside" for convenience for a few months, because it is a PITA to go
down into our server room -- it is cold, noisy, inaccessible to others
who might be looking for help, and not the best place for humans to work
for extended periods of time.  At least, not as good as an office.

> > A 450 watt, 10 GFLOP parallel computing machine for about $10K seems 
> 
> again, it depends on your code - the Orion machine will work well for 
> embarrassingly parallel, cache-friendly codes.  nothing wrong with them!
> 
> more interesting is what this says about the "blade" market.  I looked 
> at the IBM bladecenter again today, configured with the dual-ppc blades.
> it's reasonable as blades go, but it's pretty obvious that it doesn't 
> compare all that well versus this Orion stuff.  maybe Orion's real niche
> will be web-hosting ;)
> 
> btw, did anyone notice whether their ram is ECC or not?  like memory
> bandwidth or interconnect latency, that's another men-from-boys HPC issue.

Here I agree with you.  With the usual caveat (what IS a "flop",
anyway?), $1K/GFLOP at a power cost of roughly $45/year (figuring $0.06
or thereabouts per kilowatt-hour) has to be compared to $500/GFLOP at
roughly $100/year, and Amdahl's law usually favors fewer, faster systems
over a larger number of slower systems adding up to the same aggregate
capacity, because the former give superior parallel scaling.
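
To make that concrete, here is the sort of back-of-the-envelope
arithmetic I have in mind -- a toy sketch in C, using the dollar figures
above as assumptions and an assumed three-year lifetime, so plug in your
own numbers:

  /* lifetime_cost.c -- toy cost-per-GFLOP comparison.  All inputs are
   * rough assumptions lifted from the discussion above, not vendor data. */
  #include <stdio.h>

  #define LIFETIME_YEARS 3.0   /* assumed useful life / service contract */

  /* purchase price per GFLOP plus electricity per GFLOP over the lifetime */
  static double lifetime_cost(double dollars_per_gflop,
                              double power_dollars_per_year)
  {
      return dollars_per_gflop + LIFETIME_YEARS * power_dollars_per_year;
  }

  /* convert a sustained draw in watts into $/year at a given $/kWh rate */
  static double yearly_power_cost(double watts, double dollars_per_kwh)
  {
      return watts * 24.0 * 365.0 / 1000.0 * dollars_per_kwh;
  }

  int main(void)
  {
      printf("low power: $%.0f per GFLOP over %.0f years\n",
             lifetime_cost(1000.0, 45.0), LIFETIME_YEARS);
      printf("vanilla:   $%.0f per GFLOP over %.0f years\n",
             lifetime_cost(500.0, 100.0), LIFETIME_YEARS);
      /* handy conversion: every 100 W of sustained draw at $0.06/kWh */
      printf("100 W of sustained draw: $%.0f/year in electricity\n",
             yearly_power_cost(100.0, 0.06));
      return 0;
  }

At those numbers the cheaper, hotter boxes still come out ahead over the
lifetime of the hardware, which is exactly why the low-power premium
only makes sense when power or space is the binding constraint.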

When somebody makes low-power systems that cost the same and have the
same performance characteristics as higher-power systems, they are
obviously desirable.  However, the low-power systems I generally see
discussed -- with a few exceptions, e.g. second-generation steppings or
low-power versions of standard AMD and Intel chips that fit into
commodity designs at little to no marginal cost -- typically come with a
markup that makes them attractive primarily to a relatively small market
segment for whom power and available space are THE primary scarce
resources and hence design constraints.

Maybe this newest offering will be an exception, but I'd really like to
see a full suite of e.g. stream, lmbench, cpu_rate, or other
microbenchmark numbers for a sweep of memory sizes.  "Flops" don't mean
much to me (not even "linpack flops"), but I can sometimes tease a fuzzy
conceptualization of expected performance out of a broad suite of
numbers that represent specific subsystems.  High-end complex
application suites such as SPEC (when published as individual scores on
the component applications) can also be useful, although <sigh> your own
application IS the best benchmark.
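
By a sweep of memory sizes I mean something like the following -- a
crude, illustrative triad in the spirit of stream (NOT a substitute for
the real stream/lmbench/cpu_rate runs; the sizes and repetition counts
are arbitrary).  Run it and watch the number fall off a cliff as the
working set leaves cache:

  /* triad_sweep.c -- toy STREAM-style triad swept across working-set
     sizes.  Illustrative only; use the real benchmarks for real work.
     Error checking omitted for brevity. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/time.h>

  static double now(void)
  {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return tv.tv_sec + 1.0e-6 * tv.tv_usec;
  }

  /* approximate MB/s for a[i] = b[i] + s*c[i] over n doubles */
  static double triad_mbs(size_t n)
  {
      double *a = malloc(n * sizeof(double));
      double *b = malloc(n * sizeof(double));
      double *c = malloc(n * sizeof(double));
      double s = 3.0, t0, dt, check;
      size_t i;
      int r, reps = (int)(((size_t)1 << 25) / n) + 1; /* more reps when small */

      for (i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }
      t0 = now();
      for (r = 0; r < reps; r++)
          for (i = 0; i < n; i++)
              a[i] = b[i] + s * c[i];
      dt = now() - t0;
      check = a[n - 1];           /* touch the result so the compiler
                                     can't throw the loop away */
      free(a); free(b); free(c);
      /* three 8-byte arrays are touched per element per pass */
      return 3.0 * 8.0 * n * reps / dt / 1.0e6 + 0.0 * check;
  }

  int main(void)
  {
      size_t n;
      /* from comfortably in-cache (8 KB arrays) to well out of it (32 MB) */
      for (n = 1024; n <= ((size_t)1 << 22); n *= 4)
          printf("%9lu doubles (%8.1f KB/array): %8.1f MB/s\n",
                 (unsigned long)n, 8.0 * n / 1024.0, triad_mbs(n));
      return 0;
  }

The exact numbers don't matter much; the shape of the curve, and where
it breaks, tells you far more about how a real memory-bound code will
behave than a single headline flops figure.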

With that sort of performance matrix in hand, plus a complete picture of
the infrastructure requirements and real-world prices (quoted as
delivered, with 3+ year onsite service contracts), one can actually do a
real cost-benefit analysis.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





