[Beowulf] Re: Finally, a solution for the 64 core 4TB RAM market

Jason Riedy jason at acm.org
Fri May 29 05:56:09 PDT 2009

And Mark Hahn writes:
> the question is how much volume there is in the >= 8-socket market,
> and I don't mean "how many PHB's can be persuaded they need one
> because they're important".

I know of a few large companies that each bought a handful of
high-end Starfires for their database systems.  Not much volume
(fewer than 100 machines total for these folks), but a fair bit of
profit, plus obscenely expensive support contracts.

The processor count (or performance) mattered less than the amount
of memory available.  I suspect this semi-vapor-hardware
announcement was aimed at current Sun users...  Showing a steady
upgrade path may move them to IBM+Intel, even if not to these
particular systems.  And, because one party is IBM, they may sell
these with 1, 2, 4, or 8 sockets activated according to your
contract.  Keeps the hardware volume up.  ;)

And I'm mostly being hopeful because I want a box using these 8x8
boards to replace something I'm suffering with.  "Imagine a
Beowulf cluster of these!"  (with a bit more latency tolerance,
although there may be evidence of diminishing returns)

>> Likely replacing current mid-range, <100-node clusters with a
>> single box.
> unclear to me.  a current mid-range 100-node cluster is 800 cores,
> and I don't think we're talking about that in an SMP.  Intel's recent
> nehalem-ex preview was 128 hyperthreads (64 real).

That 100-node cluster likely has 400-1600 GiB of memory (4-16 GiB
per node), which is a bit smaller than 4 TiB (4096 GiB).  But that
4 TiB figure depends on *really* expensive memory.

Plus, I imagine a Larrabee successor (or a merged CPU/Larrabee part)
could drop into these boards for more compute-heavy workloads.  That
may be 3 years off, but I can see ramping up the core counts while
keeping the relatively inexpensive but fast interconnect being quite
useful.  If your code is latency sensitive (unlike, say, one-sided
linear algebra decompositions), fewer cores, more memory, and a
faster, cheaper interconnect may end up being faster overall.

But then I'm more accustomed to poorly designed systems that have 2
cores per node, an expensive interconnect, and NFS as the only
shared file space.  ;) Replacing one of *those* is a no-brainer,
which is about as much thought as went into it in the first place...


