[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?

Jan Heichler jan.heichler at gmx.net
Mon Apr 6 12:00:36 PDT 2009


Hello Prentice,

On Monday, April 6, 2009, you wrote:

PB> Mark Hahn wrote:
>> buying an extended warranty might help.  buying a shrink-wrapped cluster
>> might help too.

PB> Not really. My cluster was a "shrink-wrapped" cluster from Dell. Turns
PB> out Dell hired someone from a 3rd-party to actually turn on the cluster
PB> (for the first time) and install all the software (nothing more than a
PB> vanilla ROCKS installed, without even a queuing system!) *after* the
PB> cluster arrived at our site.

What was in the "Statement of Work"? I have learnt that if you don't
specify everything you need or want, some company will submit an offer
without those features, even if one of them is crucial to the solution.

PB> An arrangement like this just muddies the situation even further. If I
PB> had a software problem, do I call cluster, or the 3rd-party hired to
PB> install the software?

What does your "Statement of Work" say about this?

Contractors can be "first point of contact" for the customer. So you
always call them - they tell you when you have to call Dell (or call
Dell for you if it is covered by the contract).

PB> I think you mean "buy a shrink-wrapped cluster from a well-respected,
PB> cluster-specific vendor that has proven in-house cluster expertise"

Right! Go for the specialists. There are some "hardware-independent"
companies. They use whatever hardware you like. As a customer (of a
certain size) you can even make the big vendors work with a small
specialised company. The big guys just care about the number of
servers they sell... whatever makes that happen is okay...


Jan



