[Beowulf] Re: Cooling vs HW replacement
David Mathog mathog at mendel.bio.caltech.edu
Fri Jan 21 11:09:41 PST 2005
> > or "Server" grade disks still cost a lot more than that. For
>
> this is a very traditional, glass-house outlook. it's the same one
> that justifies a "server" at $50K being qualitatively different
> from a commodity 1U dual at $5K. there's no question that there
> are differences - the only question is whether the price justifies
> those differences.

The MTBF rates quoted by the manufacturers are one indicator of disk reliability, but from a practical point of view the number of years of warranty coverage on the disk is a more useful metric. The manufacturer has an incentive to make sure that disks with a 5-year warranty really will last 5 years. It's unclear to me what incentive they have to stand behind the MTBF figures, since only a sustained and careful testing regimen over many, many disks could challenge them. And who would run such an analysis??? Buy the 5-year disk and you'll have a working disk, or a replacement for it, for 5 years. In some uses it would clearly be cheaper to use (S)ATA disks and replace them as they fail, so long as they don't fail 4x faster than the Cheetahs. Google around for "disk reliability", though, and you'll find some real horror stories about disk failure rates in, for instance, SCSI-to-ATA RAID arrays.

> the real question is whether "server" disks make sense in your application.
> what are the advantages?
>
> 1. longer warranty - 5yrs vs typical 3yrs for commodity disks.
> this rule is currently being broken by Seagate. the main caveat
> is whether you will want that disk (and/or server) in 3-5 years.

Generally yes, we do want that disk to still be working at 5 years. We can't predict whether or not the hardware will have been replaced before then.

> 2. higher reliability - typically 1.2-1.4M hours, and usually
> specified under higher load. this is a very fuzzy area, since
> commodity disks often quote 1Mhr under "lower" load.

Exactly.
It's very, very hard to figure out just how much reliability one is trading for the lower price. Anecdotally, for heavy disk usage, it's apparently a lot. Anecdotally, for low disk usage, ATA disks aren't all that reliable either.

> 3. very narrow recording band, higher RPM, lower track density.
> these are all features that optimize for low and relatively
> consistent seek performance. in fact, the highest RPM disks actually
> *don't* have the highest sustained bandwidth - "consumer" disks are
> lower RPM, but have higher recording density and bandwidth.

Right. On the other hand, anecdotal evidence suggests that an application like, for instance, a busy Oracle database running on top of RAID-ATA storage will see a very high rate of disk failure, whereas the equivalent RAID-SCSI/FC Cheetah solution will not. Again, that's from Google results, not personal experience. Well, not much personal experience: we do have a 4-disk FC RAID in one Sun server and have not lost a disk yet (coming up on 2 years). My personal experience with ATA disks in servers has been limited. A smallish Solaris server configured with "cutting edge, large capacity" ATA disks went through an IBM disk and then its Western Digital replacement, each failing within a month. Backing way off on capacity and going to older 40 GB IBM ATA disks did the trick, with no further disk failures in 3 years.

> 4. SCSI or FC. always has been and apparently always will be
> significantly more expensive infrastructure than PATA was
> or SATA is.

Agreed. I'd be perfectly happy to buy SATA or PATA disks _IF_ they were as reliable as the more expensive SCSI or FC disks. It would help a lot to have some objective measure of that. When Seagate starts selling 5-year SATA disks I'll consider buying them.

> so really, you have to work to imagine the application that
> perfectly suits a "server" disk.
> for instance, you can obtain whatever level of reliability
> you want from raid, rather than ultra-premium-spec disks.

In theory. In practice, local experience (another lab) was that the RAID-ATA solution failed, twice, and was unable to rebuild from what was left, with all data lost. Maybe that was the controller or just a really bad set of disks. I wasn't there to witness the teeth gnashing and finger pointing. This wasn't a tier one storage vendor (Sun, EMC, HP, etc.), so they saved some money. Or did they???

There's also a school of thought that RAID arrays should be "disk scrubbed" frequently (all blocks on all disks read) to force hardware failures and block remapping to happen early, while the array still has enough redundant information to rebuild from what's left. The alternative worst case is data that is written once, not touched for a year, and then lost unrecoverably when a read hits multiple bad blocks.

> is your data
> access pattern really one which requires a disk optimized for seeks?

On the beowulf, not so much. Most of the workload has been configured so that the compute nodes keep their data cached in memory and only read the disks hard when booting up and the first time they read their databases. On the Sun Oracle server, much more so.

> under what circumstances will you have a 100% duty cycle?

Probably never? But where between 0% and 100% is the cutover point at which the cost of the increased disk failure rate just equals the savings from using cheaper disks?

> in summary: there is a place for super-premium disks, but it's just plain
> silly to assume that if you have a server, it therefore needs SCSI/FC.
> you need to look at your workload, and design the disk system based on
> that, using raid for sure, and probably most of your space on 5-10x
> cheaper SATA-based storage.

I'd be a lot more comfortable buying the cheaper disks if there were some objective measure that accurately predicted their actual longevity.
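The cutover question above can at least be framed with a little arithmetic. What follows is a minimal Python sketch under stated assumptions: the quoted MTBF is read as a constant (exponential) failure rate, failed disks are replaced promptly, and every failure costs a fixed amount in parts, labor and downtime. Over Y years a fleet of N disks then costs N·price + N·Y·rate·cost_per_failure, so the cheaper disks stop saving money once their failure rate reaches rate_premium + (price_premium - price_cheap) / (Y·cost_per_failure). The function names, disk prices and per-failure costs are hypothetical placeholders, not figures from this thread or from any vendor.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h: assumes the disk is powered on all year


def failures_per_disk_year(mtbf_hours: float, power_on_hours: float = HOURS_PER_YEAR) -> float:
    """Expected failures per drive slot per year, treating the quoted MTBF as a
    constant (exponential) failure rate with failed disks replaced promptly."""
    return power_on_hours / mtbf_hours


def fleet_cost(n_disks: int, years: float, unit_price: float,
               rate_per_year: float, cost_per_failure: float) -> float:
    """Purchase price plus the expected cost of handling failures over the period."""
    expected_failures = n_disks * years * rate_per_year
    return n_disks * unit_price + expected_failures * cost_per_failure


def break_even_rate(premium_price: float, cheap_price: float, premium_rate: float,
                    years: float, cost_per_failure: float) -> float:
    """Failure rate (per disk-year) at which the cheap disks stop saving money.
    Solves fleet_cost(cheap) == fleet_cost(premium); the disk count cancels out."""
    return premium_rate + (premium_price - cheap_price) / (years * cost_per_failure)


if __name__ == "__main__":
    # All prices and per-failure costs below are hypothetical placeholders.
    premium_rate = failures_per_disk_year(1.2e6)  # "server" disk quoting 1.2M hours
    for cost_per_failure in (200.0, 20_000.0):     # cheap swap vs. costly outage
        r = break_even_rate(premium_price=300.0, cheap_price=100.0,
                            premium_rate=premium_rate, years=5.0,
                            cost_per_failure=cost_per_failure)
        print(f"cost/failure ${cost_per_failure:,.0f}: break-even rate {r:.4f}/disk-year, "
              f"{r / premium_rate:.1f}x the premium rate, "
              f"equivalent MTBF {HOURS_PER_YEAR / r:,.0f} h")
```

With these placeholder numbers, a $200-per-failure swap lets the cheaper disks fail about 28 times as often as the 1.2M-hour disk before they cost more, while a $20,000-per-failure outage shrinks that margin to about 1.3 times. The result is only as trustworthy as the failure rates fed into it, and those field rates are exactly the objective data the post says is missing.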
I tend to look at it from the other direction. A disk failure on the head node is a much bigger deal than a disk failure on a compute node. Also, the head node is likely to have fewer disks than the compute nodes combined. That is, one might have 10 disks in a RAID on the head node but 70 ATA disks out on the compute nodes. So it might cost a couple of thousand more to use the most reliable disks available on the head node, but it's most likely worth it to avoid having to replace those critical components. Conversely, losing a single compute node usually isn't critical, so there's less reason to pay for more expensive disks there.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
