[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?
Greg Lindahl lindahl at pbm.com
Mon Apr 6 01:08:36 PDT 2009
On Mon, Apr 06, 2009 at 02:54:23AM -0500, Rahul Nabar wrote:

> Eventually they send me swaps for the Motherboard and CPU. No
> go. Still hangs at random.

Unfortunately there is no magic bullet. I have seen bad batches of
power supplies cause a problem like this. In another case, an
integrator-supplied Linux kernel built with some unfortunate debugging
options turned on was causing all the hangs.

From your symptoms, the power supply seems to be the next thing to
suspect. From your switch of distros, it's probably not a particular
bad Linux kernel. You have a few completely new machines that don't
hang; move the known good power supplies to other nodes with suspect
mobos and cpus.

> I haven't really pored at all the legalese in our contracts but is
> there a "lemon-law" analog for computers? If 20% of the machines are
> bad in the first one year do you think I can press for a better
> resolution from Dell?

Your university's boilerplate T's & C's probably have some text that
says something like "the stuff you sell us has to work, even if the
way it fails isn't something explicitly discussed in the contract."

But, after an entire year, it will be hard to do anything. You lost
leverage when you paid Dell. It's more likely that Dell will convince
your University purchasing people that you are an idiot than the
reverse.

> Just wanting to hear more about how I can best resolve this issue. For
> our future purchases would changing vendors help?

Not really. I don't think there's any global trend among vendors; you
find people with horror stories all over.

Have I ever told the story of the mobo with the exploding caps? 1/1000
chance of blowing up each time it was power cycled. Kinda obvious in a
1000 node cluster... how it slipped through the mobo vendor's QA? ...

-- greg
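
A rough check on the exploding-caps arithmetic above. This is only a sketch: it assumes each board fails independently with the quoted 1/1000 chance per power cycle, and the function names are made up for illustration.

```python
def p_at_least_one_failure(p_single: float, n_nodes: int) -> float:
    """Chance that at least one of n_nodes boards fails in one power cycle.

    Assumes failures are independent across boards (an assumption, not
    something the anecdote states).
    """
    return 1.0 - (1.0 - p_single) ** n_nodes


def expected_failures(p_single: float, n_nodes: int) -> float:
    """Average number of boards lost per cluster-wide power cycle."""
    return p_single * n_nodes


if __name__ == "__main__":
    p = p_at_least_one_failure(1 / 1000, 1000)
    print(f"P(at least one failure per power cycle) = {p:.3f}")  # 0.632
    print(f"Expected failures per power cycle = {expected_failures(1 / 1000, 1000):.1f}")
```

Under those assumptions, a full power cycle of a 1000 node cluster loses about one board on average, and at least one board roughly 63% of the time. A vendor testing a handful of boards would almost never see it.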
