[Beowulf] Not quite Walmart, or, living without ECC?
Mark Hahn hahn at mcmaster.ca
Fri Nov 16 13:56:16 PST 2007
> I just asked the local NT goon, "do you use ECC for the servers?" and
> he answered, "you have to". What he considers a server-class mobo
> requires ECC

whether you need ECC depends on many things. first, how much memory your machine has - my experience is that most generic servers (web, file, mail, etc) don't have much - maybe a few GB. the chance of needing ECC also depends on how "hard" you use the ram (again, mundane servers are pretty lightly utilized), as well as factors like altitude, ram quality, and the ever popular "how important is your data".

for clusters, I would say that ECC is basically a necessity, unless all the jobs can be run in a "checking" mode (ie, perform a search or optimization, then verify the results in case the hit was due to a bit flip.)

that said, ECC events are not all that common. I have a 768-node cluster here, each node dual-socket opteron with 8GB PC3200 ddr. I just checked all nodes with mcelog, and 35 have reported corrected events over roughly the last 20 days. one may have hit an uncorrectable event (but in our clusters, corrected ECC rate is not a good predictor for uncorrectable ones...)

> and he added that the tendency is now to FB-DIMM (fully
> buffered, http://en.wikipedia.org/wiki/FBDIMM). This suggests to me
> that next year(s) commodity mobos will be ECC.

nah on both counts. I don't think anyone would claim that FBD is tearing up the market - you can reasonably argue that it was a stopgap to let Intel increase the memory capacity of chipsets whose MCH had inadequate fan-out. FBD is not a dumb idea, just not necessarily valuable enough to win.

- the extra AMB has been a heat problem in the past and is, no matter how improved, still extra cost and space.

- the design trades off latency and expandability. FBDIMMS were designed with 8 dimms/channel and up to 6 channels. that's pretty huge capacity - afaik, 4ch is the max implemented and then with 2-4/ch. presumably to avoid taking too bad of a latency hit, since FBD's are daisy-chained, and even one of them is slower than an AMB-less dimm...

- FBD would be more attractive if dram chips themselves were not increasing in capacity (like cpus and disks - all area-based, and thus following moore's law).

- attaching memory directly to cpus has the nice property of scaling with server "size". AMD led Intel to this realization ;)

- it's unclear to me whether more cores onchip will lead to a push for more memory capacity per system. then again, I don't think the world is crying out for 8-core chips, either.

I suspect that FBD will have only a little more market/history footprint than RDRAM did.
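
for a rough sense of scale, the cluster figures quoted above (768 nodes, 8GB each, 35 nodes reporting corrected events over roughly 20 days) can be turned into rates. this is only a back-of-the-envelope sketch: it counts nodes that reported at least one corrected event, not individual events, so the rates are a lower bound on the true event rate. the function and variable names are made up for illustration and are not part of mcelog.

```python
# Back-of-the-envelope rates from the figures quoted in the post.
# Assumption: each of the 35 nodes counts as one "affected node";
# the post does not say how many events each node logged.

NODES = 768              # nodes in the cluster
GB_PER_NODE = 8          # memory per node, in GB
NODES_WITH_EVENTS = 35   # nodes reporting at least one corrected event
DAYS = 20                # approximate observation window


def corrected_event_rates(nodes, gb_per_node, nodes_with_events, days):
    """Return simple rates of affected nodes, normalized several ways."""
    total_gb = nodes * gb_per_node
    return {
        "total_gb": total_gb,
        # share of nodes that saw at least one corrected event
        "fraction_of_nodes": nodes_with_events / nodes,
        # affected nodes per node per day
        "per_node_day": nodes_with_events / (nodes * days),
        # affected nodes per GB of installed memory per day
        "per_gb_day": nodes_with_events / (total_gb * days),
    }


rates = corrected_event_rates(NODES, GB_PER_NODE, NODES_WITH_EVENTS, DAYS)
for name, value in rates.items():
    print(f"{name}: {value:.6g}")
```

so roughly 4.6% of nodes saw a corrected event in the window. whether that is "common" depends on how many node-days a job consumes, which is why the per-node-day figure is the more useful one for cluster work.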
