[Beowulf] High Performance for Large Database

Mon Nov 15 08:51:52 PST 2004

>
> I would still prefer the model of PVFS1/2 and Lustre where the data is
> distributed amongst the compute nodes
>
Lustre data isn't distributed across the compute nodes; it sits on dedicated
nodes called OSTs. You can't mount the Lustre filesystem back onto nodes
that are OSTs, because you hit all sorts of race conditions in the VFS layer.

One filesystem to look at is GPFS from IBM. You can run it in a direct
SAN-attached mode, or in a mode where the storage is distributed across the
local disks of the compute nodes. We run both configurations on our cluster.

GPFS will also do "behind the scenes replication", so you can tolerate up
to two node failures per node group and still have a complete filesystem.
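
To make "tolerate two node failures" concrete, here is a toy model. It is
not how GPFS actually places or replicates data. It assumes a hypothetical
six-node group with three copies of every block on distinct nodes; the
copy count, the round-robin placement and all the names are assumptions
made for illustration only.

```python
from itertools import combinations


def place_replicas(num_blocks, nodes, copies):
    """Toy round-robin placement: block b is copied to `copies` consecutive nodes.

    Assumes copies <= len(nodes), so the copies of a block land on distinct nodes.
    """
    n = len(nodes)
    return {
        b: {nodes[(b + i) % n] for i in range(copies)}
        for b in range(num_blocks)
    }


def survives(placement, failed):
    """True if every block still has at least one copy on a live node."""
    failed = set(failed)
    return all(replicas - failed for replicas in placement.values())


def max_tolerated_failures(placement, nodes):
    """Largest k such that any k simultaneous node failures lose no block."""
    k = 0
    for size in range(1, len(nodes) + 1):
        if all(survives(placement, combo) for combo in combinations(nodes, size)):
            k = size
        else:
            break
    return k


# Hypothetical node group: six nodes, twelve blocks, three copies per block.
nodes = ["node%d" % i for i in range(6)]
placement = place_replicas(12, nodes, 3)
print(max_tolerated_failures(placement, nodes))
```

In this model any two nodes can fail and every block is still readable,
but losing the three nodes that hold all copies of one block (node0, node1
and node2 here) leaves the filesystem incomplete.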

Cheers,

Guy Coates

-- 
Dr. Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244 ex 7199