[Beowulf] longevity of 1U servers?

Gus Correa gus at ldeo.columbia.edu
Mon Feb 9 10:52:58 PST 2009


David Mathog wrote:
> The market seems to have gone over pretty much completely to 1U compute
> nodes, with 2U or higher reserved for head nodes or other storage.  The
> other trend is towards constant upgrading on relatively short time
> scales, with nodes being replaced every couple of years.   Which makes
> me wonder what sort of longevity one would expect for 1U servers made in
> the last few years.
> 
> Some of us have to keep our hardware running a very long time, and I'm
> concerned that a lot of the 1U hardware isn't going to hold up in the
> long haul.  I have some 2U nodes which are still cranking along in their
> 7th year.  (Yes, it is well past time to replace them.)  In that
> interval around 5 80mm fans (case or power supply, of 80, these were all
> rated at 50000 hours MTBF) have been replaced, most only in the last few
> years, and 1 disk out of 20 failed (in the 7th year).  

Our old cluster's 2U nodes are also 7 years old.
They are still alive and kicking, and producing science.
However, I can't report a level of component failure as low as
David reported.

Disk failure was higher: 12 out of 33 have failed so far.
However, the original disks were IBM Deskstars
(http://en.wikipedia.org/wiki/Deskstar_75GXP).
None of the replacement Maxtor disks has failed yet.
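
To put these counts on a common per-year scale, here is a rough sketch.
It assumes a constant yearly failure probability over the whole period,
which is a simplification, and it treats the 7 years as exact.
The function name is made up for illustration; the counts are the ones
quoted in this thread.

```python
def annualized_failure_rate(failed, total, years):
    """Constant yearly failure probability that would leave
    (total - failed) survivors out of `total` after `years`.

    Assumption: every unit was in service for the full period and
    the failure probability did not change from year to year.
    """
    surviving_fraction = 1.0 - failed / total
    return 1.0 - surviving_fraction ** (1.0 / years)


# Counts as reported in this thread; "7 years" is approximate.
reported = [
    ("our original disks (12 of 33)", 12, 33),
    ("David's disks (1 of 20)", 1, 20),
    ("David's 80mm fans (about 5 of 80)", 5, 80),
]

for label, failed, total in reported:
    rate = annualized_failure_rate(failed, total, 7)
    print(f"{label}: {rate:.1%} per year")
```

This is only a back-of-the-envelope comparison. It says nothing about
whether failures cluster early or late in a component's life.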

Other components also failed.
All of the original Tyan S2460 motherboards had
to be replaced with S2466 boards, because nodes kept
going south for no apparent reason. That took a lot of yelling
at the now-defunct vendor.
After months of denial, the vendor told us the S2460
could not supply enough power to the dual Athlon MP processors,
and replaced the motherboards.

Later I had to replace three of the S2466 mobos.
Another one seems to be flaky now, but there is little hope of
finding a replacement at this point.

Two power supplies failed.

Bad optical transducers made the Myrinet cards fail.
However, Myricom took full responsibility
for replacing all the cards
and the line cards on their switch,
which is a very good thing.



> My limited
> experience with (older) 1U nodes was that the shrieky little fans were
> failure prone and didn't move enough air to keep the innards of the 1U
> case as cool as a 2U case.  
> Heat is bad for longevity.
> 

Why do so many retirees move to Florida, then? :)
A compromise between longevity and cost of maintenance, perhaps?

> Realistically, how long should one expect current 1U servers to last?
> 

We have the same concern.

I would extend the question to 1U *twin* nodes,
and to blade servers.

> Thanks,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Thank you,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


