[Beowulf] cheap PCs this christmas
Mark Hahn
hahn at physics.mcmaster.ca
Wed Nov 23 07:02:29 PST 2005
> I work with systems for which, literally, "failure is not an
> option" (actually, we call that criticality=1.. loss of life or
so what probability value (FIT, MTBF, etc) do you assign to this?
"cannot fail" is not an option in reality, of course...
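
to make the question concrete, here is a minimal sketch of how a FIT figure
maps to an MTBF and to a failure probability over some operating period.
FIT is failures per 10^9 device-hours; the constant failure rate (exponential
model) and the example FIT value are assumptions for illustration, not numbers
from any real system.

```python
import math

HOURS_PER_FIT_UNIT = 1e9  # FIT = failures per 10^9 device-hours


def fit_to_mtbf_hours(fit):
    """MTBF in hours for a given FIT rate."""
    return HOURS_PER_FIT_UNIT / fit


def failure_probability(fit, hours):
    """Probability of at least one failure within `hours`.

    Assumes a constant failure rate (exponential model).
    """
    rate_per_hour = fit / HOURS_PER_FIT_UNIT
    return -math.expm1(-rate_per_hour * hours)


if __name__ == "__main__":
    example_fit = 1000.0  # hypothetical value, chosen only for illustration
    print(fit_to_mtbf_hours(example_fit))
    print(failure_probability(example_fit, 24 * 365))
```

whatever the FIT value, the probability is nonzero for any nonzero operating
time, which is the point: a requirement has to name an acceptable number.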
> But how many of those corruptions would have resulted in an error had they
> not been caught?
unclear, but something I've always wondered about. it's easy to imagine
bit-flips that would be made permanent (files being written out, etc),
but it's also easy to imagine flips that would have no effect (a flip
in a part of the data you've already processed, for instance), as well
as flips which would fail nicely (segv, etc).
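
a toy illustration of the first two cases, assuming a program that just sums
a buffer in order: a flip in an element that has already been consumed leaves
the result untouched, while a flip in an element still to be read silently
changes it. the function name and parameters are made up for this sketch, and
the "fails nicely" case (a segv) is not modelled.

```python
def running_sum_with_flip(values, flip_at_step, index, bit):
    """Sum `values` in order, flipping one bit in memory partway through.

    Just before element number `flip_at_step` is read, bit `bit` of
    memory[index] is inverted, standing in for an uncorrected bit flip.
    """
    memory = list(values)
    total = 0
    for step in range(len(memory)):
        if step == flip_at_step:
            memory[index] ^= 1 << bit  # the simulated bit flip
        total += memory[step]
    return total


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    clean = sum(data)
    # flip in data already processed: no effect on the result
    harmless = running_sum_with_flip(data, flip_at_step=2, index=0, bit=0)
    # flip in data not yet processed: result is silently wrong
    harmful = running_sum_with_flip(data, flip_at_step=2, index=3, bit=0)
    print(clean, harmless, harmful)
```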
> Is the cache in your processor ECC? What's the impact on your performance
> of cache hit/miss vis a vis ECC and/or bit flips.
yes, commodity processors do ECC on cache and datapaths these days.
it's unclear how many applications could notice the extra cycle of
DRAM latency due to ECC - something pointer-oriented like GCC would
probably show a small effect. lots of codes are pretty cache-friendly, though,
or else bandwidth-intensive enough not to notice a cycle of latency.
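
a back-of-envelope model of that effect, as a sketch only: if some fraction
of runtime is spent stalled on DRAM latency, and each access gets a fixed
number of extra cycles, only that fraction is stretched. the function name,
the stall fractions, and the base latency in cycles are all hypothetical
parameters, not measurements.

```python
def relative_runtime(stall_fraction, base_latency_cycles, extra_cycles=1):
    """Runtime relative to the no-ECC case (1.0 means no slowdown).

    stall_fraction: share of runtime spent waiting on DRAM latency.
    base_latency_cycles: DRAM access latency without the ECC cycle.
    extra_cycles: added latency per access attributed to ECC.
    """
    stretched = (base_latency_cycles + extra_cycles) / base_latency_cycles
    return (1.0 - stall_fraction) + stall_fraction * stretched


if __name__ == "__main__":
    # hypothetical numbers: a pointer-chasing code vs. a cache-friendly one
    print(relative_runtime(stall_fraction=0.5, base_latency_cycles=100))
    print(relative_runtime(stall_fraction=0.05, base_latency_cycles=100))
```

under this model the slowdown is stall_fraction * extra_cycles /
base_latency_cycles, so it stays small unless the code is dominated by
latency-bound accesses.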