[Beowulf] Tyan 2466 crashes, no obvious reason why
Joshua Baker-LePain jlb17 at duke.edu
Fri Sep 3 12:01:30 PDT 2004
On Fri, 3 Sep 2004 at 11:31am, David Mathog wrote

> One of 20 identical nodes containing
>
> Tyan 2466
> Single Athlon MP 2200+
> 1GB ECC memory
>
> is starting to flake out.
>
> For no apparent reason it just drops dead (as far as
> linux is concerned) after a few minutes to a few days.
> At that point the network is down, the serial lines are down,
> and near as I can tell the OS just blew up. There is zip,
> nothing, nada in the log files to indicate a problem.

I haven't put the time into this yet that you have, so this is more of a
"me too" than anything else. But, FWIW, I have 12 similar nodes, and have
had several of them start doing this. When the first one died, I swapped
its RAM with a "known good" node, and that worked for a while (that is,
the problem followed the RAM, so I thought I'd found the culprit). But,
eventually, it started happening to the original node again. And then
another. And then...

One thing I'd say is that 10 min worth of memtest86 (or, better yet,
memtest86+) is not enough. Run it over the weekend and see if it catches
anything.

Good luck.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
