[Beowulf] A couple of interesting comments

Prentice Bisbal prentice at ias.edu
Tue Sep 23 12:51:58 PDT 2008


Gerry,

I wanted to let you know off-list that I'm going through the same
problems right now. I thought you'd like to know you're not alone.  We
purchased a cluster from *allegedly* the same vendor. The PXE boot and
keyboard errors were the least of our problems.

First, our cluster was delayed 2 months due to shortages of the network
hardware we specified. It was not the vendor's standard for clustering,
but it was still a brand they resold.

When it did arrive, the doors were damaged by the inadequately equipped
delivery company.

When the technician arrived to finish setting up the cluster, he
discovered that the IB cables provided were too short to be within spec:
the bend radius would be too tight, and they were too short to be
supported from above the connectors.

And the final problem I'm going to mention: the fiber network cables to
connect our Ethernet switches to each other (we have Ethernet and IB
networks in this cluster) were missing.

It's been over two weeks since our cluster arrived, and one week since
the technician noticed these shortages and reported them. We still
haven't had these problems rectified, and the technician will have to
fly to our site again in a couple of weeks to complete the installation.

I'm writing an article about this experience for Doug to publish. I
haven't posted this to the mailing list because I'm not sure what my
management will be happy with me sharing (the article will be reviewed
by them before publishing).

--
Prentice


> We recently purchased a set of hardware for a cluster from a hardware 
> vendor.  We've encountered a couple of interesting issues with bringing 
> the thing up that I'd like to get group comments on.  Note that the RFP 
> and negotiations specified this system was for a cluster installation, 
> so there would be no misunderstanding...
> 
> 1.  We specified "No OS" in the purchase so that we could install CentOS 
> as our base.  We got a set of systems with a stub OS, and an EULA for 
> the diagnostics embedded on the disk.  After clicking thru the EULA, it 
> tells us we have no OS on the disk, but does not fail over to PXE.
> 
> 2.  BIOS had a couple of interesting defaults, including warn on 
> keyboard error (Keyboard?  Not intentionally.  This is a compute node, 
> and should never require a keyboard.  Ever.)  We also find the BIOS is 
> set to boot from hard disk THEN PXE. But due to item 1, above, we never 
> can fail over to PXE unless we load up a keyboard and monitor, and hit 
> F12 to drop to PXE.
> 
> In discussions with our sales rep, I'm told that we'd have had to pay 
> extra to get a real bare hard disk, and that, for a fee, they'd have 
> been willing to custom-configure the BIOS. OK, with the BIOS this isn't 
> too unreasonable: They have a standard BIOS for all systems and if you 
> want something special, paying for it's the norm...  But, still, this is 
> a CLUSTER installation we were quoted, not a desktop.
> 
> Also, I'm now told that "almost every customer" ordered their cluster 
> configuration service at several kilobucks per rack.  Since the team I'm 
> working with has some degree of experience in configuring and installing 
> hardware and software on computational clusters, now measured in at 
> least 10 separate cluster installations, this seemed like an unnecessary 
> expense.  However, we're finding vendor gotchas that are annoying at the 
> least, and sometimes cause significant work-around time/effort.
> 
> Finally, our sales guy yesterday was somewhat baffled as to why we'd 
> ordered without OS, and further why we were using Linux over Windows for 
> HPC.  Not trying to revive the recent rant-fest about Windows HPC 
> capabilities, can anyone cite real HPC applications generally run on 
> significant clusters (I'll accept Cornell's work, although I remain 
> personally convinced that the bulk of their Windows HPC work has been 
> dedicated to maintaining grant funding rather than doing real work)?
> 
> No, I won't identify the vendor.
> -- 
> Gerry Creager -- gerry.creager at tamu.edu
> Texas Mesonet -- AATLT, Texas A&M University
> Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843



