[Beowulf] MPI-IO + nfs - alternatives?

Rob Latham robl at mcs.anl.gov
Tue Oct 19 07:25:37 PDT 2010


On Wed, Sep 29, 2010 at 05:24:13PM +0100, Robert Horton wrote:
> 1) Does anyone have any hints for improving the nfs performance under
> these circumstances? I've tried using jumbo frames, different
> filesystems, having the log device on an SSD and increasing the nfs
> block size to 1MB, none of which have any significant effect.

The challenge with MPI-IO and NFS is correctness as much as
performance.  NFS consistency semantics make correct shared-file
access quite difficult, so we turn off the attribute cache, and the
MPI-IO library also locks around every I/O operation in an attempt to
flush the client cache.  Even those steps do not always work.
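
For concreteness, the pattern under discussion is the usual one where
every rank writes its own region of a single shared file.  A minimal
sketch (path and sizes are made up for the example); on NFS the
MPI-IO library locks around the underlying I/O as described above,
while on PVFS it does not need to:

    /* Illustrative only: each rank writes its own block of one
     * shared file.  The file path and COUNT are invented here. */
    #include <mpi.h>

    #define COUNT 1048576          /* ints per rank */

    int main(int argc, char **argv)
    {
        int rank;
        static int buf[COUNT];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_File_open(MPI_COMM_WORLD, "/mnt/nfs/testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* Collective write: rank i covers the byte range
         * [i*COUNT*sizeof(int), (i+1)*COUNT*sizeof(int)). */
        MPI_File_write_at_all(fh,
                              (MPI_Offset)rank * COUNT * sizeof(int),
                              buf, COUNT, MPI_INT, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }

Build with mpicc and run with mpiexec; the same program runs
unchanged against NFS, PVFS, or anything else with an MPI-IO driver.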

> 2) Are there any reasonable alternatives to nfs in this situation? The
> main possibilities seem to be:
> 
>  - PVFS or similar with a single IO server. Not sure what performance I
> should expect from this though, and it's a lot more complex than nfs.

You should expect pretty solid performance:  

First, your NFS server can go back to enabling caches for your other
workloads.

Second, the MPI-IO library is pretty well tuned to PVFS.  No
extraneous locks.

Third, if you decide one day you need a bit more performance, set up a
PVFS volume with more servers.  Presto-chango, clients will go faster
without any changes.

Fourth, is it really that much more complex?  I'm myopic on this
point, having worked with PVFS for a decade, but it's 90% userspace
with a small kernel module.  There's also a bunch of helpful people on
the mailing lists, which is where we should take any further PVFS
discussion.
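
To make the "tuned to PVFS" point concrete: if your MPI library's
ROMIO was built with PVFS support, you can prefix the file name so
ROMIO talks to PVFS through its own driver rather than through the
kernel mount.  The only change from the NFS sketch above is the path
(the volume and file name here are made up):

    /* Hypothetical PVFS path; the "pvfs2:" prefix makes ROMIO pick
     * its PVFS driver directly instead of the kernel module path. */
    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/pvfs2-fs/testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

Regular shell tools still go through the small kernel module and a
normal mount point, but the MPI-IO path doesn't have to.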

>  - Sharing a block device via iSCSI and using GFS, although this is also
> going to be somewhat complex and I can't find any evidence that MPI-IO
> will even work with GFS.

I haven't used GFS in a decade.  Back then it only supported
entire-file locks, which made parallel access difficult.  GFS has
since gained more fine-grained locking, so it might work well now.

If you thought standing up PVFS was complicated, wait until you check
out GFS (set up a quorum, set up the lock manager, set up kernel
modules, etc.).

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


