[Beowulf] Keeping the Athlon MP cluster limping along

David Mathog mathog at mendel.bio.caltech.edu
Wed Dec 8 14:25:55 PST 2004

It's official, the Tyan S2466 nodes get the "biggest PITA award"
for systems that I've used. The two nodes that were crashing
frequently had their power supplies replaced and then they
were stable for a couple of months. Now they've both become
unstable again.

Evil motherboard juju eating power supplies? Who knows?
Not that I can make them crash at will, oh no, that
would be too easy.  

cpuburn (20 minutes) doesn't even make them hiccup.
They run memtest86+ for 24 hours without a glitch. The problem
never moved with memory anyway.
But leave them running Linux, doing not much of anything, and
you never know when they're going to come down.
Sometimes there's an oops, sometimes not.  When there
is an oops it can be in any piece of software.

Today I upgraded the BIOS to 4.06 and, naturally, it didn't
fix any of the many little annoyances the S2466N
produces, i.e., "who me boot?".  I don't seriously expect it
to fix the instability.

So rather than keep trying to fix these monsters I'm starting
to think about the cheapest way to keep the cluster running by
replacing just the mobo/CPU with something else (as I'm not
expecting enough $$$ anytime soon to do more, and obtaining
Athlon MPs and S2466N mobos now is problematic anyway).
I'll happily give up Tyan's serial line BIOS access for a system
where I don't have to employ that feature quite so often!

The S2466N is an ATX form factor board. Each node has one Athlon MP
2200+, 1 GB of PC2100 DDR RAM, a 40 GB ATA disk, a floppy,
and a little PCI graphics card in a 2U case.  If I could
find a nice mobo/CPU combo for, oh, <$200 that could
replace the S2466N and Athlon MP, and still do ECC, then I'd
probably go that route to patch systems up as they break.
Best if it has at least as much cache as the MP though.
Is there anything out there fitting that description?
Historically, ECC support isn't something that shows up
on cheap mobos, but maybe on some low end Athlon 64 variant?


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

