[Beowulf] Re: Finally, a solution for the 64 core 4TB RAM market
jason at acm.org
Fri May 29 05:56:09 PDT 2009
And Mark Hahn writes:
> the question is how much volume there is in the >= 8-socket market,
> and I don't mean "how many PHB's can be persuaded they need one
> because they're important".
I know a few large companies bought a handful of high-end Starfires
each for their database systems. Not much in volume (fewer than
100 total across these folks), but a bit in profits and obscenely
expensive support contracts.
The processor count (or performance) had less impact than the
amount of memory available. I suspect this semi-vapor-hardware
announcement was targeted at current Sun users... Showing a steady
upgrade path may move them to IBM+Intel even if not these
particular systems. And, because one party is IBM, they may sell
these with 1, 2, 4, or 8 sockets activated according to your
contract. Keeps the hardware volume up. ;)
And I'm mostly being hopeful because I want a box using these 8x8
boards to replace something I'm suffering against. "Imagine a
Beowulf cluster of these!" (with a bit more latency tolerance,
although there may be evidence of a diminishing return)
>> Likely replacing current mid-range, <100-node clusters with a
>> single box.
> unclear to me. a current mid-range 100-node cluster is 800 cores,
> and I don't think we're talking about that in an SMP. Intel's recent
> nehalem-ex preview was 128 hyperthreads (64 real).
That 100-node cluster likely has 400-1600 GiB of memory, which is a
bit smaller than 4000 GiB. But that 4 TiB number includes *really*
Plus, I imagine a Larrabee-successor or merge could drop into these
boards for workloads heavier on computing. That may be 3 years
off, but I can see ramping up the core counts and keeping the
relatively inexpensive but fast interconnect as quite useful. If
your code is latency sensitive (i.e. not one-sided linear algebra
decompositions), fewer cores, more memory, and a fast+cheaper
interconnect may end up being faster.
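
To put rough numbers on the fewer-cores, more-memory trade-off, here is
a back-of-the-envelope sketch using only figures quoted in this thread
(100 nodes and 800 cores with 400-1600 GiB for the cluster; 64 real
cores with 4000 GiB for the big box). The variable and function names
are made up for illustration, and the numbers are the thread's
estimates, not measurements.

```python
# Figures as quoted in this thread; these are estimates, not measurements.
CLUSTER_CORES = 800                 # "a current mid-range 100-node cluster is 800 cores"
CLUSTER_MEM_GIB = (400, 1600)       # low and high guesses for total cluster memory
SMP_CORES = 64                      # "128 hyperthreads (64 real)"
SMP_MEM_GIB = 4000                  # the headline memory figure for the single box


def gib_per_core(mem_gib, cores):
    """Memory available per core, in GiB (hypothetical helper)."""
    return mem_gib / cores


cluster_low = gib_per_core(CLUSTER_MEM_GIB[0], CLUSTER_CORES)
cluster_high = gib_per_core(CLUSTER_MEM_GIB[1], CLUSTER_CORES)
smp = gib_per_core(SMP_MEM_GIB, SMP_CORES)

print(f"cluster: {cluster_low:.1f}-{cluster_high:.1f} GiB/core")
print(f"big box: {smp:.1f} GiB/core")
print(f"ratio vs. best-case cluster: {smp / cluster_high:.2f}x")
```

On these assumptions the single box has a small fraction of the cores
but far more memory behind each one, which is the case where
memory-bound or latency-sensitive codes might come out ahead.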
But then I'm more accustomed to poorly designed systems that have 2
cores per node, an expensive interconnect, and NFS as the only
shared file space. ;) Replacing one of *those* is a no-brainer,
which is about what went into it in the first place...