[Beowulf] integrating node disks into a cluster filesystem?

Alan Louis Scheinine alscheinine at tuffmail.us
Fri Sep 25 17:05:36 PDT 2009


I have done only a few experiments with parallel file systems but
I've run some benchmarks on each one I've encountered.

With regard to Joshua Baker-LePain's comment:
> I played with PVFS1 a bit back in the day.  My impression at the time was
> that they were focused on MPI-IO, and the POSIX layer was a bit of an
> afterthought -- access with "regular" tools (tar, cp, etc) was pretty slow.
> I don't know what the situation is with PVFS2.

Of the file systems I tested, PVFS2 over Myrinet, though on just 8 nodes, was
one of the best.  My impression is that all file systems have bugs, so a
parallel file system that has not had a decade of development should be used
only for scratch space.  I was on the PVFS developers' mailing list for many
years, and the unending reports of bugs are scary.  My guess is that other
file systems have similar problems.  File systems are subtly complex.  From
what little I have read, you cannot have both POSIX and an efficient parallel
file system.  If you plan on using the cluster for jobs that are not
embarrassingly parallel, but really need parallelism, then it would be a good
idea not to put the filesystem on the compute nodes, in order to avoid
unbalanced computation: for domain decomposition, just one laggard subdomain
can slow down the entire calculation.
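
To put a rough number on the laggard effect, here is a minimal sketch.  It
assumes a bulk-synchronous step in which every subdomain waits at a barrier,
so the step takes as long as the slowest subdomain.  The timings (seven nodes
at 1.0 and one node at 1.5 because it is also serving file system requests)
are made up for illustration and are not measurements.

```python
def step_time(subdomain_times):
    """Wall time of one synchronized step: everyone waits for the slowest."""
    return max(subdomain_times)


def parallel_efficiency(subdomain_times):
    """Useful work divided by the node-time actually consumed in the step."""
    n = len(subdomain_times)
    return sum(subdomain_times) / (n * step_time(subdomain_times))


# Hypothetical timings: 7 balanced subdomains plus one slowed by I/O serving.
balanced = [1.0] * 8
one_laggard = [1.0] * 7 + [1.5]

print(step_time(balanced), parallel_efficiency(balanced))
print(step_time(one_laggard), parallel_efficiency(one_laggard))
```

In this toy model a single node that is 50% slower stretches every step by
50% for all 8 nodes, which is the argument for keeping storage service off
the compute nodes.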

> But there's a definite draw to a single global scratch space that
> scales automatically with the cluster itself.

Using a parallel filesystem efficiently is difficult; for example, you have
to avoid hotspots.  I've read that for large parallel jobs the "hits" on each
storage node can be effectively random, with collisions resulting in
inefficient use of the HDDs.  So for any parallel filesystem, the program
needs to be developed to use MPI-IO in a way that is flexible enough to deal
with the specifics of the filesystem: block size, number of stripes and
interconnection topology.
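
As a rough illustration of why block size and stripe count matter, the sketch
below assumes simple round-robin striping (stripe unit k lives on server
k mod N) and one contiguous block per rank starting at rank * block_size.
Both the layout rule and the function names are assumptions made for the
example; they are not taken from PVFS2 or any other particular filesystem.

```python
from collections import Counter


def server_for_offset(offset, stripe_size, num_servers):
    """Assumed round-robin layout: stripe unit k is stored on server k % N."""
    return (offset // stripe_size) % num_servers


def first_server_hits(num_ranks, block_size, stripe_size, num_servers):
    """Count how many ranks begin their block on each storage server."""
    hits = Counter()
    for rank in range(num_ranks):
        offset = rank * block_size  # each rank owns one contiguous block
        hits[server_for_offset(offset, stripe_size, num_servers)] += 1
    return hits


STRIPE = 64 * 1024  # hypothetical 64 KiB stripe unit
SERVERS = 8         # hypothetical number of storage servers

# Block equals the full stripe width: every rank starts on server 0.
hotspot = first_server_hits(16, SERVERS * STRIPE, STRIPE, SERVERS)

# Block equals one stripe unit: the 16 ranks spread evenly, 2 per server.
spread = first_server_hits(16, STRIPE, STRIPE, SERVERS)

print(sorted(hotspot.items()))
print(sorted(spread.items()))
```

In the first case all 16 ranks begin on the same server and, if they write in
lockstep, move on to the next server together.  In the second they are spread
two per server.  A real code would want these parameters to be adjustable
rather than hard-wired.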

Alan

-- 

  Alan Scheinine
  200 Georgann Dr., Apt. E6
  Vicksburg, MS  39180

  Email: alscheinine at tuffmail.us
  Mobile phone: 225 288 4176


