[Beowulf] Anybody using Redhat HPC Solution in their Beowulf

Hearns, John john.hearns at mclaren.com
Tue Oct 26 01:16:47 PDT 2010


> I don't think you could find a statement more orthogonal to the spirit
> of the Beowulf list than, "Please, please don't "roll your own"
> system..."  Isn't Beowulfery about the drawing together of inexpensive
> components in an intelligent fashion suited just for your particular
> application while using standardized (and thereby cheap by the law of
> scale) hardware?  I'm not suggesting Richard build his own NIC - but
> there is nothing wrong with using even a distribution of Linux not
> intended for HPC (so long as you're smart about it) and picking and
> choosing the software (queuing managers, tracers, etc) he finds works
> best.
> 
> Also, I would argue if a company is selling you an HPC solution, it's
> either:
> 1. A true Beowulf in terms of using COTS hardware, in which case you are
> likely getting less than your money is worth or


Ellis, I am going to politely disagree with you - now there's a
surprise!

I have worked as an engineer for two HPC companies - Clustervision and
Streamline.
My slogan on this issue is "Any fool can go down PC World and buy a bunch
of PCs."
By that I mean that CPU is cheap these days, but all you will get is a
bunch of boxes on your loading bay. As you say, and you are right, you
then have the option of installing Linux plus a cluster management stack
and getting a cluster up and running.

However, as regards price, I would say that you will actually be paying
a very, very small premium for getting a supported, tested and
pre-assembled cluster from a vendor.
Academic margins are razor thin - the companies are not growing fat on
academic deals.
They can also get special pricing from Intel/AMD if the project can be
justified - probably ending up at a price per box near to what you pay
at PC World.

Or take (say) rack top switches. Do you want to have a situation where
the company which supports your cluster
has switches sitting on a shelf, so when a switch fails someone (me!) is
sent out the next morning to deliver
a new switch in a box, cable it in and get you running?
Or do you want to deal direct with the returns department at $switch
vendor, or even (shudder) take the route
of using the same switches as the campus network - so you don't get to
choose on the basis of performance or
suitability, but just depend on the warm and fuzzies your campus IT
people have.


We then come to support - say you buy that heap of boxes from a Tier 1 -
say it is the same company your
campus IT folks have a campus wide deal with. You'll get the same type
of support you get for general
servers running Windows - and you'll deal with first line support staff
on the phone every time.
Me, I've been there, seen it, done it with Tier 1 support like that.
As a for instance, HPC workloads tend to stress the RAM in a system, and
you get frequent ECC errors on a young system as it is bedding in. Try
phoning support every time a light comes on and getting talked through
the "have you run XXX diagnostic" routine; it soon gets wearing.
Before Tier 1 companies cry foul, of course both of the companies I
mentioned above and all other cluster companies integrate Tier 1
servers - but that is a different scenario from getting boxes delivered
through your campus agreement with $Tier1.











