[Beowulf] cluster flexibility vs. design specificity, cluster vs. supercomputing

Mark Hahn hahn at physics.mcmaster.ca
Mon Aug 7 14:15:25 PDT 2006

>   Myself, I prefer that every cluster be specifically designed, with "rooted"
> firmware and even hardware design, and benchmarked with a specific application.
> That is my understanding of a cluster for supercomputing in the scientific community.

there are communities like that, generally characterized by single/few
applications and fairly homogeneous user communities.  benchmarking is indeed
a good way to address them.

I think some people overlook the prevalence of general-purpose HPC clusters:
extremely disparate user communities, cross-discipline, cross-institution,
with O(nusers) applications.  benchmarking such clusters is either an epic
undertaking or has to rely on microbenchmarks.
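
to make "microbenchmark" concrete: the idea is to time one low-level operation
in isolation rather than a whole application.  the sketch below is only an
illustration under assumptions, not taken from any real benchmark suite; the
function name copy_bandwidth_mb_s, the buffer sizes and the repeat count are
all made up for the example.  it times a plain memory copy and reports the
best rate seen.

```python
import time


def copy_bandwidth_mb_s(size_bytes=8 * 1024 * 1024, repeats=5):
    """Return the best observed memory-copy rate in MB/s for one buffer size.

    Hypothetical helper for illustration; sizes and repeats are arbitrary.
    """
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full copy of the buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer on tiny buffers.
    best = max(best, 1e-9)
    return (size_bytes / 1e6) / best


if __name__ == "__main__":
    # Sweep a few buffer sizes; results depend on the cache hierarchy.
    for size in (64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        print(f"{size:>10d} bytes: {copy_bandwidth_mb_s(size):10.1f} MB/s")
```

a number like this says something about one node's memory system and nothing
about how any particular user's application will behave, which is exactly the
tradeoff of microbenchmarks on a general-purpose cluster.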

>    But, in real life, "cluster" diverges from "supercomputing", in order to

huh?  IMO, "supercomputing" is just computing beyond small scale.  whether
it's a cluster or a vector machine makes no difference; neither does being
single- or multi-vendor.

> expand the cluster without being tangled by the product provider, and, in order
> to manage an evolving cluster, we have to allow certain/great flexibility for
> the cluster. In my mind, that is not really supercomputing or "high
> performance computing" in terms of computing efficiency (e.g. possible conflicts
> from mixtures of hardware within the cluster?). Or, I am just plain wrong.

the interesting part of this is how "openness" fits in here.
vendors love lock-in, which is the opposite of openness.
the distinction above (dedicated vs general-purpose) maps well 
onto this: if you are buying a dedicated, single-purpose machine,
then you probably don't mind lock-in.  if you're a general-purpose 
HPC provider with approximately as many applications as users,
and intend to operate for more than a couple years, you'll need to 
think about evolution.

IMO, the real value of open standards is that they eliminate lock-in.
ethernet+IP is a great example: it's about as widely interoperable as
anything gets, and the world would be a sad, pathetic place without it.
Fibre Channel, on the other hand, is basically a failure, though one which
still has adherents, and even some sensible use-cases.  it's violently
non-interoperable, and has always been part of vendors' lock-in strategy.

I'm sure you can imagine other examples, perhaps involving MSFT or DRM.

this open-vs-lockin war is big, diffuse, obscure and confusing, and deserves
much more media play than it gets, since it's determining the shape of our 
computational future.

>   Can any expert clarify the confusion?  How can we independently (not being
> tangled by a specific vendor) expand and keep evolving the cluster without
> sacrificing computing efficiency? Or do I just worry too much? Normally, if we
> mix different hardware together (like a traditional beowulf cluster), how badly
> could it affect the specifically designed cluster?

it's not about "specific design", it's about conforming to open standards.
the world has gotten a lot better in recent years, in the sense that you can
easily buy 1U dual-socket nodes from any number of vendors (even most large
ones) which correctly support PXE booting, IPMI, etc.

a Linux god recently stated that "Linux is evolution, not intelligent design".
I think he's right, and I'm always cheered when the darwinian ecosystem of
the market is working well enough to quash vendor lock-in attempts.  remember
who said "embrace and extend"?  it doesn't sound very threatening until you
finish the thought: "... and lock-in"!

practically, you _can_ require open-standard conformance when you buy
hardware.  in fact, the act of doing so is what has improved the world.
it's entirely possible to keep adding hardware to a cluster, and maintain
its integrity.  every bit of non-standard stuff (power management based on
daisy-chained serial, etc) makes it harder, though.  vendors don't like
this approach because it means the only lock-in left to them is the kind
earned through sustained excellence.  we should expect it of them, though: better
reliability, service, price, performance, power dissipation, even appearance.
