[Beowulf] Big storage
Chris Samuel
csamuel at vpac.org
Sun Sep 16 00:31:58 PDT 2007
On Sunday 16 September 2007 04:48:56 Greg Lindahl wrote:
> Several people have commented that fsprobe doesn't check existing files.
> For your system binaries, you can test them using rpm -V.
One "interesting" problem I've experienced on a non-HPC server is a latent
memory error corrupting files as they were read from disk. This showed up as
occasional random SEGVs, bus errors, illegal instructions, etc., and as MD5
checksum errors when checked with dlocate (this is a Debian box).
Removing the dodgy DIMM meant everything went back to normal. The pre-existing
files weren't corrupt on disk, and fortunately the server wasn't in production
use yet (it was being repurposed from minor duties, so the issues hadn't
surfaced before).
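As a rough illustration of the kind of check described above, here is a minimal Python sketch that verifies files against an md5sum-style manifest several times and separates mismatches that occur on every pass from ones that come and go. It is not how dlocate or rpm -V work internally. The function names, the `passes` parameter and the three labels are all hypothetical, and the manifest is assumed to use the usual `<digest>  <path>` layout with paths relative to `root`.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse md5sum-style lines into (expected_digest, relative_path) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, _, rel_path = line.partition(" ")
        # Drop the second separator character (space, or '*' for binary mode).
        entries.append((expected.lower(), rel_path.lstrip(" *")))
    return entries


def check_passes(manifest_text, root="/", passes=3):
    """Verify each manifest entry `passes` times; return {path: [bool, ...]}."""
    results = {}
    for expected, rel_path in parse_manifest(manifest_text):
        full_path = os.path.join(root, rel_path.lstrip("/"))
        outcomes = []
        for _ in range(passes):
            try:
                outcomes.append(md5_of(full_path) == expected)
            except OSError:
                # Missing or unreadable files count as a failed pass.
                outcomes.append(False)
        results[rel_path] = outcomes
    return results


def classify(outcomes):
    """Label a list of per-pass results (hypothetical labels, for illustration)."""
    if all(outcomes):
        return "ok"
    if not any(outcomes):
        return "consistent mismatch"
    return "intermittent mismatch"
```

An "intermittent mismatch" only hints that the data is being damaged somewhere on the read path (memory, controller, cabling) rather than on disk, and a "consistent mismatch" may simply be a file that was legitimately modified. Repeated reads are also often served from the page cache, so a clean result here does not rule out a memory fault.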
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency