[Beowulf] Re: scratch File system for small cluster

David Mathog mathog at caltech.edu
Thu Sep 25 12:00:37 PDT 2008


Joe Landman <landman at scalableinformatics.com> wrote:

> Glen Beane wrote:
> > I am considering adding a small parallel file system ~(5-10TB) my small
> > cluster (~32 2x dual core Opteron nodes) that is used mostly by a
> > handful of regular users.  Currently the only storage accessible to all
> > nodes is home directory space which is provided by the Lab's IT
> > department (this is a SAN volume connected to the head node by 2x FC
> > links, and NFS exported to the compute nodes). I don't have to "worry"
> > about the IT provided SAN space - they back it up, provide redundant
> > hardware, etc.  The parallel file system would be scratch space (and not
> > backed up by IT).  We have a mix of home grown apps doing a pretty wide
> > range of things (some do a lot of I/O, others don't), and things like
> > BLAST and BLAT.
> 
> Hi Glen:
> 
>    BLAST uses mmap'ed IO.  This has some interesting ... interactions 
> ... with parallel file systems.

Right, and it isn't just the mapping of the databases and input file. 
One must also be careful with how BLAST output is directed.  Sending it
all to the same NFS-mounted file system as "node01.out", "node02.out",
etc. means every node writes to the one file server at the same time,
which will do very unpleasant things to both your network and the file
server.  Far better to write those locally to /tmp/nodeXX.out, and then
take some care in moving them back to the central file system later, so
that the data transfer can proceed without interference.
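
As a rough illustration of the "write locally, move back carefully" idea,
here is a minimal Python sketch.  Everything in it is an assumption for
the sake of the example and not something from our actual setup: the
function names, the "transfer.lock" directory used as a crude one-at-a-time
token, and the ".part" suffix are all made up.  Whether a directory lock
like this behaves well on a particular NFS server is something you would
have to test yourself.

```python
import os
import shutil
import subprocess
import time


def run_to_local(cmd, local_path):
    """Run cmd (a list of arguments) with stdout written to a node-local file."""
    with open(local_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


def copy_back_serialized(local_path, central_dir,
                         lock_name="transfer.lock", poll_seconds=1.0):
    """Copy local_path into central_dir, one node at a time.

    Hypothetical scheme: a directory in central_dir acts as a lock, so
    only one node is copying at any moment.
    """
    lock_dir = os.path.join(central_dir, lock_name)
    while True:
        try:
            os.mkdir(lock_dir)  # fails if another node already holds the lock
            break
        except FileExistsError:
            time.sleep(poll_seconds)
    try:
        final_path = os.path.join(central_dir, os.path.basename(local_path))
        tmp_path = final_path + ".part"
        shutil.copyfile(local_path, tmp_path)
        # Rename at the end so nobody reads a half-copied file.
        os.replace(tmp_path, final_path)
    finally:
        os.rmdir(lock_dir)
    return final_path
```

On a node this would be used roughly as run_to_local(blast_cmd,
"/tmp/node01.out") followed by copy_back_serialized("/tmp/node01.out",
central_dir), where blast_cmd and central_dir are whatever applies at
your site.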

This doesn't mean you have to wait until the end of the run and send
each node's entire output file back at once.  It can be more efficient,
but more complicated, to write the output files on each node in
reasonably sized chunks and then interleave the transfer of those to the
central store with the ongoing run.  Whether this is worth the extra
effort depends mostly on the number of queries in the input file and
the verbosity of the output file.
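
The chunked variant could look something like the following Python
sketch.  Again this is only an illustration under stated assumptions: the
ChunkedOutput class, its parameters, and the chunk naming scheme are all
hypothetical, and it assumes the caller hands over one complete query
result per write so that chunks break on query boundaries.  A background
thread ships each finished chunk to the central store while the run
continues.

```python
import os
import queue
import shutil
import threading


class ChunkedOutput:
    """Write results to local chunk files; ship finished chunks in the background."""

    def __init__(self, local_dir, central_dir, prefix, chunk_bytes):
        self.local_dir = local_dir
        self.central_dir = central_dir
        self.prefix = prefix
        self.chunk_bytes = chunk_bytes
        self._index = 0
        self._queue = queue.Queue()
        self._open_chunk()
        self._thread = threading.Thread(target=self._ship, daemon=True)
        self._thread.start()

    def _open_chunk(self):
        name = f"{self.prefix}.{self._index:04d}.out"
        self._path = os.path.join(self.local_dir, name)
        self._fh = open(self._path, "wb")
        self._written = 0

    def write_result(self, data):
        """Append one complete query result (bytes); rotate when the chunk is full."""
        self._fh.write(data)
        self._written += len(data)
        if self._written >= self.chunk_bytes:
            self._fh.close()
            self._queue.put(self._path)  # hand the finished chunk to the shipper
            self._index += 1
            self._open_chunk()

    def close(self):
        """Flush the last partial chunk and wait for all transfers to finish."""
        self._fh.close()
        if self._written > 0:
            self._queue.put(self._path)
        else:
            os.remove(self._path)  # nothing was written to this chunk
        self._queue.put(None)  # sentinel: no more chunks
        self._thread.join()

    def _ship(self):
        # Chunks are copied one after another, so a node never has more
        # than one transfer in flight.
        while True:
            path = self._queue.get()
            if path is None:
                break
            dest = os.path.join(self.central_dir, os.path.basename(path))
            shutil.copyfile(path, dest + ".part")
            os.replace(dest + ".part", dest)
            os.remove(path)  # free local scratch space once the copy is done
```

The chunk size is the knob that trades off the number of transfers
against how much output sits on the node at any moment.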

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


