[Beowulf] A couple of interesting comments

Bill Broadley bill at cse.ucdavis.edu
Fri Jun 6 10:00:29 PDT 2008


Gerry Creager wrote:
> 1.  We specified "No OS" in the purchase so that we could install CentOS 
> as our base.  We got a set of systems with a stub OS, and an EULA for 
> the diagnostics embedded on the disk.  After clicking thru the EULA, it 
> tells us we have no OS on the disk, but does not fail to PXE.

If you want to avoid hooking up a KVM to each node and rebooting it once or
twice, I'd suggest putting "Nodes must PXE boot by default" in your specifications.

> 2.  BIOS had a couple of interesting defaults, including warn on 
> keyboard error (Keyboard?  Not intentionally.  This is a compute node, 
> and should never require a keyboard.  Ever.)  We also find the BIOS is 
> set to boot from hard disk THEN PXE. But due to item 1, above, we never 
> can fail over to PXE unless we load up a keyboard and monitor, and hit 
> F12 to drop to PXE.

That's a very strange default for a server, let alone a cluster node.

> In discussions with our sales rep, I'm told that we'd have had to pay 
> extra to get a real bare hard disk, and that, for a fee, they'd have 
> been willing to custom-configure the BIOS. OK, with the BIOS this isn't 
> too unreasonable: They have a standard BIOS for all systems and if you 
> want something special, paying for it's the norm...  But, still, this is 
> a CLUSTER installation we were quoted, not a desktop.

This whole thing sounds strangely like the vendor has already been picked.
Certainly, changing any default in the pipeline can cost money; even removing a
floppy or CD/DVD drive can cost money if the machine ships to the integration
center with it installed.  That said, when someone charges an unreasonable
amount for such customizations, they lose the bid and someone else wins.

> Also, I'm now told that "almost every customer" ordered their cluster 
> configuration service at several kilobucks per rack.  Since the team I'm

Not sure of the relevance here.  Sounds like the upsell and padding that
sales folks love; it's their job to sell equipment, preferably high-margin
equipment.  That seems way high for a BIOS reset, less so if it includes
premounted rails and a cabling harness for power, console, and network.  Again,
if it's a bid process....

> working with has some degree of experience in configuring and installing 
> hardware and software on computational clusters, now measured in at 
> least 10 separate cluster installations, this seemed like an unnecessary 
> expense.  However, we're finding vendor gotchas that are annoying at the 
> least, and sometimes cause significant work-around time/effort.

Well, there are two choices: either deal with the gotchas, or make them part
of the specifications.  All vendors have their differences, defaults, and cost
structures.  Do you want a cluster that could conceivably allow users to
start submitting jobs within a day?  Or do you want to play BIOS games and do
testing and integration that might take a week or two?  Every time I order a
cluster (well over 10 now), I get vendor queries along the lines of "Sounds
like X might mean you need Y, which costs $Z."  I'm always very clear: it's in
the spec, and not meeting the spec will mean the bid isn't considered.  It
definitely seems like some high-margin items end up included... without the
margin.

> Finally, our sales guy yesterday was somewhat baffled as to why we'd 
> ordered without OS, and further why we were using Linux over Windows for 
> HPC.

Heh, some sales folks seem to think they have a right to exert pressure on
cluster design; not sure why you're even entertaining that one.  If you want to
be particularly friendly, I'd just point them at top500.org and note that Linux
is the standard, not the exception, for Beowulf clusters.

> No, I won't identify the vendor.

How about the number of letters in their name ;-).  In general I find that the
big vendors build in large profits (i.e. negotiating down to 50% of list price
is not unusual), and their preferred cluster defaults often mean higher costs
instead of lower ones, despite the typically higher-volume purchases of
identical compute nodes that don't need a DVD drive, don't need an OS, don't
(typically) need a redundant power supply, etc.  The smaller cluster-specific
shops (usually) default to mostly reasonable cluster configurations, and seem
to settle for smaller margins.  In my experience, writing a spec that welcomes
both ends up with the best deals.  Even something trivial like specifying 14
or 15 disks in an array (often the max for an external array) instead of 16
(common for direct attached) can make the difference in allowing a
competitive bid from a big vendor.  Sometimes Intel or AMD intercedes to get
a design win, and sometimes a big vendor decides to get more competitive.

Of course, these specifications directly affect costs and lead to endless
discussions on this list.  KVM over IP?  Serial console?  Any console access
at all?  IPMI or just switched PDUs?  But in my experience, things like "must
boot from PXE" are not a big deal, and not worth several kilobucks.




