[Beowulf] Big storage

Chris Samuel csamuel at vpac.org
Sun Sep 16 00:31:58 PDT 2007


On Sunday 16 September 2007 04:48:56 Greg Lindahl wrote:

> Several people have commented that fsprobe doesn't check existing files.
> For your system binaries, you can test them using rpm -V.

One "interesting" problem I've experienced on a non-HPC server was a latent 
memory error corrupting files as they were read from disk. This showed up as 
occasional random SEGVs, bus errors, illegal instructions, etc., and as MD5 
checksum errors when checked with dlocate (this is a Debian box).

Removing the dodgy DIMM brought everything back to normal. The pre-existing 
files weren't corrupt on disk, and fortunately the server wasn't in production 
use yet (it was being repurposed from minor duties, so the issues hadn't 
surfaced before).
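
For anyone wanting to tell this kind of failure apart from real on-disk 
corruption, here is a rough sketch of the idea, not what dlocate or rpm -V 
actually do internally. It checksums the same file several times and compares 
the results against a known-good MD5. The names md5_of and classify_reads, 
the number of passes and the chunk size are all made up for illustration. 
Note that repeated reads may be served from the page cache, so a consistent 
mismatch does not by itself prove the disk copy is bad.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_reads(path, expected_md5, passes=5):
    """Checksum the file several times and summarise what was seen.

    Hypothetical labels (assumptions for this sketch):
      "ok"       - every pass matched the expected digest
      "unstable" - passes disagreed with each other, which hints at a
                   fault in the read path (e.g. RAM) rather than bad
                   data on disk
      "mismatch" - every pass gave the same wrong digest
    """
    digests = [md5_of(path) for _ in range(passes)]
    if all(d == expected_md5 for d in digests):
        return "ok"
    if len(set(digests)) > 1:
        return "unstable"
    return "mismatch"
```

An "unstable" result is the pattern I'd expect from a fault like the one 
above, but treat it as a hint to go and run a proper memory test rather than 
as a diagnosis.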

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


