[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?

Rahul Nabar rpnabar at gmail.com
Mon Apr 6 12:06:06 PDT 2009


On Mon, Apr 6, 2009 at 12:37 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
> what's the term of the warranty?

Thanks Mark. All I have from my quote is the pretty generic "Dell
Hardware Warranty Plus Onsite Service Initial Year (985-5117)"

I'll have to dig deeper to find the full legal terms and conditions of
the warranty.

> buying an extended warranty might help.  buying a shrink-wrapped cluster
> might help too.
>
We do have an extended warranty in place.
############################################
Basic Enterprise Support: Business Hrs 5X10 Next Business Day Onsite
Service Post ProblemDiagnosis 2 Year Extended (980-2932)
#################################################

Actually, in hindsight, I'm not sure if this means "extended warranty"
or just "extended support".

>
> well, I think it's worth asking whether you're sure your power feed
> is in good shape.

I think it is. This is a dedicated cluster room with conditioned
power. It hosts many more machines than ours, and I haven't heard any
other complaints. Of course, I have to admit I haven't run any power
quality analysis myself. Besides, only some of these machines crash.
A power line problem ought to have shown up in a correlated manner
across machines, I feel.
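
One way I could test that assumption is to line up the crash times from
all nodes and see whether any fall close together. A minimal sketch in
Python, assuming I have (node, timestamp) pairs for each crash; the node
names, the timestamps, and the 5-minute window below are made up for
illustration and are not real data from our cluster:

```python
from datetime import datetime, timedelta

def correlated_crashes(events, window=timedelta(minutes=5)):
    """Group crash events whose gap to the previous event is within
    `window`, and return only groups involving more than one node."""
    ordered = sorted(events, key=lambda e: e[1])
    groups, current = [], []
    for node, when in ordered:
        # Start a new group when the gap to the previous event exceeds the window.
        if current and when - current[-1][1] > window:
            groups.append(current)
            current = []
        current.append((node, when))
    if current:
        groups.append(current)
    # A group with a single distinct node is an isolated failure, not a shared event.
    return [g for g in groups if len({node for node, _ in g}) > 1]

# Hypothetical crash log: (node name, crash time). Illustrative only.
events = [
    ("node03", datetime(2009, 3, 1, 2, 14)),
    ("node11", datetime(2009, 3, 9, 17, 40)),
    ("node07", datetime(2009, 3, 9, 17, 42)),
    ("node03", datetime(2009, 3, 20, 8, 5)),
]

for group in correlated_crashes(events):
    print([node for node, _ in group])
```

If a shared cause like the power feed were to blame, I would expect
several multi-node groups; mostly isolated single-node crashes would
point back at the individual machines.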

>
> IMO, no.  not without some indication that the fault is well reproducible
> and actually fault is theirs...
>

Yes, but that does load the dice against me. It is fairly difficult for
me to identify which of the numerous hardware components is causing
the problem, and blaming problems on the OS or apps is easy. I feel
the onus of proof should not be on me, especially for symptoms that
are trapped at the baseboard-controller level and flag a clear error.
The machine does not even reboot from the front button at that point.
To me this strongly indicates a hardware fault. But I might be wrong.

> my organization has been an HP shop, more or less, since inception in 2001,
> for reasons I won't go into.  I believe they've done well by us - I could
> criticize prices, some hardware design issues, etc, but they're quite
> responsible and responsive to problems.

Thanks! Maybe we will give HP a shot too the next time.

-- 
Rahul



