[Beowulf] Use and stability of Lustre in cluster context ...

Guy Coates gmpc at sanger.ac.uk
Mon Dec 19 08:49:17 PST 2005


On Mon, 19 Dec 2005, Richard Walsh wrote:

> with large numbers of small files, applications which have bursts of
> intense file creation activity like swarms of DNA sequence analysis
> work).  These problems are intermittent and the idea that it is solely
> a Lustre problem is not certain.
>
> We are running version 1.2 on an older 2.4.21 kernel ... perhaps this is
> the problem.

There are a couple of nasty bugs in Lustre < v1.4, one of which we could
trigger with small-file-intensive workloads. The other was a bug in mmap,
which caused binaries located on a Lustre partition to bus-error at
random intervals.

Both of those bugs went away for us when we went from v1.2 -> v1.4.

(Disclaimer: we actually run the HP-packaged version of Lustre, but as far
as we are aware both bugs existed in both the CFS and HP codestreams.)
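
For anyone who wants to exercise their own installation with a workload of
the small-file kind, here is a rough sketch of a burst of file creation,
read-back and deletion. It is only a generic illustration, not a reproducer
for the bug itself; the function name, file naming scheme and default
counts are all made up for the example, and the directory you point it at
(e.g. one on a Lustre mount) is up to you.

```python
import os
import tempfile
import time


def small_file_burst(directory, count=1000, size=128):
    """Create `count` files of `size` bytes, read them back, then delete them.

    Hypothetical helper for illustration only: returns the number of files
    created, the total bytes read back, and the elapsed wall-clock seconds.
    """
    payload = b"x" * size
    paths = []
    start = time.perf_counter()

    # Burst of small-file creation.
    for i in range(count):
        path = os.path.join(directory, f"burst_{i:06d}.dat")
        with open(path, "wb") as fh:
            fh.write(payload)
        paths.append(path)

    # Read everything back so the data path is exercised as well.
    total = 0
    for path in paths:
        with open(path, "rb") as fh:
            total += len(fh.read())

    # Clean up, which adds a burst of unlink operations.
    for path in paths:
        os.remove(path)

    elapsed = time.perf_counter() - start
    return count, total, elapsed


if __name__ == "__main__":
    # A temporary directory is used here; substitute a directory on the
    # filesystem under test.
    with tempfile.TemporaryDirectory() as d:
        created, bytes_read, elapsed = small_file_burst(d)
        print(f"{created} files, {bytes_read} bytes read, {elapsed:.3f} s")
```

Running several copies at once from different clients, each in its own
directory, would be closer to a swarm of analysis jobs than a single process.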

Our experience with 1.4 has been very positive. We haven't killed the
filesystem, despite our best efforts. Our main problem is getting enough
network bandwidth between the clients and OSTs; we keep filling it up.

Cheers,

Guy

-- 
Dr. Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 494919





