[Beowulf] Cooling vs HW replacement
Josip Loncaric josip at lanl.gov
Tue Jan 18 08:30:03 PST 2005
At my old job, we had the unfortunate experience of the AC failing on the hottest days of the year. Despite providing plenty of circulating fresh air at 35-40 deg. C, we lost hardware, mainly disks. In fact, we would start losing hard drives (even high-quality SCSI drives in our servers) whenever the ambient temperature approached 30 deg. C.

Based on this experience, I'd say that keeping the ambient temperature under about 25-27 deg. C is a good policy. As Robert has pointed out, the cost of lost productivity while the system is down for hard drive replacement and reconstruction, not to mention the manpower required, can make an unreliable system "AWESOMELY expensive."

In fact, I'd recommend installing a temperature-activated kill switch in any cluster computing room. Remember: dissipating 5-10 kW in a small enclosed space can overheat your expensive cluster within minutes of an AC failure, certainly faster than your system administrator can respond to an alarm triggered at 2am on a Sunday. Even a forced shutdown (when the ambient temperature exceeds about 30 deg. C for more than a few minutes) is cheaper to fix than replacing and rebuilding several failed hard drives.

Sincerely,
Josip
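The post does not say how the kill switch should be built; a hardware thermostat cutting power is one option. As a rough illustration only, the shutdown rule it describes (ambient above about 30 deg. C for more than a few minutes) could be expressed in software as below. This is a minimal sketch under stated assumptions: the 180 s hold time is a stand-in for "a few minutes", the class name `OverTempKillSwitch` is hypothetical, and reading the sensor and actually powering nodes off are left to whatever a given site uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OverTempKillSwitch:
    """Trip once ambient temperature stays above a threshold for a hold period.

    Hypothetical sketch: 30 deg. C comes from the post; the 180 s hold time
    is an assumed stand-in for "a few minutes".
    """

    threshold_c: float = 30.0
    hold_s: float = 180.0
    # Time (in s) at which the current over-temperature excursion began.
    _over_since: Optional[float] = field(default=None, init=False, repr=False)
    tripped: bool = field(default=False, init=False)

    def update(self, temp_c: float, now_s: float) -> bool:
        """Feed one sensor reading; return True once the switch has tripped."""
        if self.tripped:
            return True  # latched: stays tripped until a human resets it
        if temp_c > self.threshold_c:
            if self._over_since is None:
                self._over_since = now_s  # excursion starts now
            if now_s - self._over_since >= self.hold_s:
                self.tripped = True
        else:
            self._over_since = None  # back under the threshold; restart the clock
        return self.tripped


if __name__ == "__main__":
    # Simulated readings (time in s, temperature in deg. C) after an AC failure.
    readings = [(0, 24.0), (60, 27.5), (120, 30.5), (180, 31.8), (240, 33.0), (300, 34.1)]
    switch = OverTempKillSwitch()
    for t, temp in readings:
        if switch.update(temp, t):
            # A real deployment would call its own (site-specific) shutdown hook here.
            print(f"t={t}s: {temp} deg. C, shutting down")  # prints "t=300s: 34.1 deg. C, shutting down"
            break
```

Two design choices are worth noting: the timer resets whenever the temperature dips back below the threshold, so a brief spike does not trigger a shutdown, and the switch latches once tripped, so the machines are not power-cycled as the room cools and reheats.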
