[Beowulf] Tyan 2466 crashes, no obvious reason why
David Mathog mathog at mendel.bio.caltech.edu
Tue Oct 12 11:38:10 PDT 2004
Just thought I'd share the final outcome of this. After much swapping around of components and days of running memtest86, the problem turned out to move with the power supply. Swapping in the spare PS fixed it, and that node has not so much as hiccupped in the month since. Note in particular that all of the voltages seen by the motherboard were always in range. My working hypothesis is that the PS either passes too much noise or just glitches occasionally (for instance, an intermittent internal short).

The PS was a Zippy power supply with a power cord that attached via spades to the socket at the back of the 2U case:

model AX2-5300FB-2S
P/N 6AX2-300B055
ser no: T21905564M1A977732

Big EMACS logo, tiny www.zippy.com.tw down at the bottom. It was still under Zippy's warranty, and the good folks at PSSC handled the exchange promptly.

A day (!) after the replacement unit came in, a second node started doing exactly the same thing: unexplained crashes and lockups with nothing in the log file. Logging lm_sensors every 2 minutes showed nothing untoward up through the last entry. Crashes came every few hours. This time I just swapped the PS first thing, and the node has now been OK for over 4 days. Same type of power supply inside, this one with serial no. T21905562M1A977732, which differs by only one digit from the first one that failed.

It could be a coincidence, but I'm beginning to suspect that there may be a bad component in this lot of power supplies, in which case an unpleasant series of node failures can probably be expected in the not too distant future.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
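The post mentions logging lm_sensors every 2 minutes to capture a node's state right up to a crash. A minimal Python sketch of such a logger follows, assuming the lm_sensors `sensors` command is installed and on PATH. The script name, log path, and entry format are hypothetical, not what was actually used on these nodes. Each reading is flushed and fsync'ed so the last entry before a hard lockup has a better chance of reaching the disk, though nothing can guarantee that on a node that locks up mid-write.

```python
#!/usr/bin/env python3
"""Append timestamped lm_sensors readings to a log file at a fixed interval.

Sketch only: assumes the lm_sensors `sensors` command is on PATH.
The entry format and 120 s interval are illustrative choices.
"""
import os
import subprocess
import sys
import time
from datetime import datetime

INTERVAL_S = 120  # every 2 minutes, as described in the post


def read_sensors(command=("sensors",)):
    """Run the sensors command and return its stdout, or an error note."""
    try:
        result = subprocess.run(list(command), capture_output=True,
                                text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"ERROR running {command[0]}: {exc}\n"
    return result.stdout


def log_once(path, command=("sensors",)):
    """Append one timestamped reading and force it out to disk.

    Without fsync, the last readings before a hard lockup may still sit
    in the page cache and never be written.
    """
    stamp = datetime.now().isoformat(timespec="seconds")
    entry = f"=== {stamp} ===\n{read_sensors(command)}"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(entry)
        fh.flush()
        os.fsync(fh.fileno())
    return entry


def run_forever(path):
    """Log one reading every INTERVAL_S seconds until killed."""
    while True:
        log_once(path)
        time.sleep(INTERVAL_S)


# Only start the endless loop when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    run_forever(sys.argv[1])
```

Started as, e.g., `nohup python3 sensors_log.py /var/log/sensors-history.log &` (hypothetical name and path). Keep in mind that a 2-minute sample can easily miss a brief supply glitch between readings, which fits the observation above that the logged values looked normal up to the last entry.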
