[Beowulf] commercial clusters

Buccaneer for Hire. buccaneer at rocketmail.com
Sat Sep 30 15:50:51 PDT 2006


Our traditional software has a methodology for striping across a number of nodes, and we have done that for years.  The new software is different: it builds a 10-12GB sparse file, for instance, and as each node finishes, that node updates the information in its portion of the file.  The problem is that the head node is bragging about doing close to 200MB/sec over NFS while the EMC is telling us it's pushing 25MB/sec.
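
To make the access pattern concrete, here is a minimal sketch of it in Python. The function names, the fixed region size, and the rank-to-offset layout are assumptions for illustration only, not how the actual application lays out its file.

```python
import os

def create_sparse(path, size):
    """Create a file of `size` bytes without writing any data blocks."""
    with open(path, "wb") as f:
        f.truncate(size)

def update_region(path, rank, region_size, data):
    """Write one node's result into its own fixed-size slot of the shared file.

    Assumed layout (hypothetical): node `rank` owns the byte range
    [rank * region_size, (rank + 1) * region_size).
    """
    if len(data) > region_size:
        raise ValueError("data does not fit in this node's region")
    with open(path, "r+b") as f:
        f.seek(rank * region_size)
        f.write(data)
```

Every node ends up opening and writing the same file, which is why the file system underneath has to cope with many concurrent writers to a single file.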

So when I choose which GFS to test, to my way of thinking, it will need the ability to write to the same file across multiple head nodes.

----- Original Message ----
From: Stu Midgley <sdm900 at gmail.com>
To: Buccaneer for Hire. <buccaneer at rocketmail.com>
Cc: Beowulf List <beowulf at beowulf.org>
Sent: Friday, September 29, 2006 10:10:29 PM
Subject: Re: [Beowulf] commercial clusters

hmmm...  200 nodes writing to the same file.  That is a hard problem.
In all my testing of global FS's I haven't found one that is capable
of doing this while delivering good performance.  One might think
that MPI-IO would deliver performance while writing to the same file
(on something like Lustre), but in my experience MPI-IO is more about
functionality than performance.

In any code that I write that needs a lot of bandwidth, I always write
an n-m I/O routine.  That is, your n-processor task can read the
previous m checkpoint chunks (produced by an earlier m-processor
job).  Then, when writing out the checkpoint or output file, you get
each process to open its own individual file and dump its data to it.
This gives you maximum bandwidth and stops meta-data thrashing on your
cluster FS.  It is also quite easy to write single-CPU tools which
concatenate the files together...
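
A minimal sketch of that idea in Python follows. The chunk naming scheme (base.0000, base.0001, ...) and every function name here are assumptions for illustration, not taken from any real code, and the "n reads m" side is reduced to reading a byte range from the m chunks viewed as one logical stream.

```python
import os

def chunk_path(directory, base, rank):
    # Hypothetical naming scheme: one file per process, e.g. ckpt.0003
    return os.path.join(directory, f"{base}.{rank:04d}")

def write_chunk(directory, base, rank, data):
    """Each process dumps its own data to its own file."""
    with open(chunk_path(directory, base, rank), "wb") as f:
        f.write(data)

def read_range(directory, base, m, start, length):
    """Read `length` bytes at logical offset `start` from the m chunks,
    treating them as one stream concatenated in rank order."""
    out = bytearray()
    offset = 0  # logical offset of the current chunk's first byte
    for rank in range(m):
        path = chunk_path(directory, base, rank)
        size = os.path.getsize(path)
        lo = max(start, offset)
        hi = min(start + length, offset + size)
        if lo < hi:  # requested range overlaps this chunk
            with open(path, "rb") as f:
                f.seek(lo - offset)
                out += f.read(hi - lo)
        offset += size
    return bytes(out)

def concatenate_chunks(directory, base, m, out_path, bufsize=1 << 20):
    """Single-CPU tool: join the m chunk files in rank order."""
    with open(out_path, "wb") as out:
        for rank in range(m):
            with open(chunk_path(directory, base, rank), "rb") as f:
                while True:
                    buf = f.read(bufsize)
                    if not buf:
                        break
                    out.write(buf)
```

In a restart on n processors, each rank would call read_range with its own start and length, so n does not have to equal m.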

Alternatively, you can write a simple client-side FUSE file system
which sort of joins multiple NFS mounts together into a single FS.  In
this way, you can stripe your IO over multiple NFS mounts...  very
similar to the cluster file system that was present in the
Digital/Compaq SC machines.  In this fashion, your file in the FUSE FS
looks consistent and coherent while in the underlying NFS directories
you see your file split up into bits (file.1 file.2 file.3 file.4 etc
for a 4 NFS mount system).  It is a simple way to get your bandwidth up
(especially if your NFS mounts are coming in over different GigE
NICs) but it still gives REALLY crap bandwidth when trying to have
multiple threads writing to the same file...
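
The striping arithmetic behind such a layer can be sketched in plain Python, leaving out the FUSE binding itself. The round-robin layout, the 1MB stripe unit, and all names below are assumptions for illustration; the mounts are just directories, and part i of a file lives in the i-th mount as name.(i+1).

```python
import os

STRIPE_SIZE = 1 << 20  # assumed stripe unit (1MB), purely illustrative

def locate(offset, nmounts, stripe_size=STRIPE_SIZE):
    """Map a logical file offset to (mount index, offset within that part)."""
    stripe = offset // stripe_size
    mount = stripe % nmounts  # round-robin over the mounts
    local = (stripe // nmounts) * stripe_size + offset % stripe_size
    return mount, local

def part_path(mounts, name, mount):
    # file.1 lives in the first mount, file.2 in the second, and so on
    return os.path.join(mounts[mount], f"{name}.{mount + 1}")

def striped_write(mounts, name, offset, data, stripe_size=STRIPE_SIZE):
    """Write `data` at logical `offset`, splitting it on stripe boundaries."""
    pos = 0
    while pos < len(data):
        mount, local = locate(offset + pos, len(mounts), stripe_size)
        n = min(stripe_size - (offset + pos) % stripe_size, len(data) - pos)
        path = part_path(mounts, name, mount)
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(local)
            f.write(data[pos:pos + n])
        pos += n

def striped_read(mounts, name, offset, length, stripe_size=STRIPE_SIZE):
    """Read `length` bytes at logical `offset` back from the parts."""
    out = bytearray()
    pos = 0
    while pos < length:
        mount, local = locate(offset + pos, len(mounts), stripe_size)
        n = min(stripe_size - (offset + pos) % stripe_size, length - pos)
        with open(part_path(mounts, name, mount), "rb") as f:
            f.seek(local)
            out += f.read(n)
        pos += n
    return bytes(out)
```

A large sequential transfer touches every mount in turn, which is where the extra bandwidth comes from.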

Try Lustre :)









