[Beowulf] cluster storage design
apseyed at bu.edu
Wed Mar 23 15:11:41 PST 2005
I concur with David; when necessary, running jobs against local compute node disk takes an immense load off of a storage node / NFS file server. Here is some brief documentation and a template on our website for using this method (/scratch can be /tmp):
http://www.bu.edu/dbin/sph/departments/biostatistics/linga_documentation.php#scratch
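The pattern boils down to something like the sketch below (just an illustrative job script; the program name, input file, and paths are placeholders, not taken from our documentation):

    #!/bin/sh
    # Stage input from the NFS-mounted home directory onto fast local
    # disk, run the job against the local copy, then copy results back
    # to NFS once at the end.
    SCRATCH=/scratch/$USER/job.$$          # /scratch can be /tmp
    mkdir -p "$SCRATCH"
    cp /home/$USER/input.dat "$SCRATCH/"
    cd "$SCRATCH"
    /home/$USER/bin/my_program input.dat > output.dat   # all job I/O hits local disk
    cp output.dat /home/$USER/results/
    cd /
    rm -rf "$SCRATCH"                      # free the local disk when done
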
Cheers,
Patrice
>
> Joe Landman <landman at scalableinformatics.com> wrote:
> >
> > Brian Henerey wrote:
> >
> > > Hello all,
> > >
> > > I have a 32 node cluster with 1 master and 1 data storage server with
> > > 1.5 TBs of storage. The master used to have storage:/home mounted on
> > > /home via NFS. I moved the 1.5TB RAID array of storage so it was
> > > directly on the master. This decreased the time it took for our
> > > program to run by a factor of 4. I read somewhere that mounting the
> > > data to the master via NFS was a bad idea for performance, but am not
> > > sure what the best alternative is. I don't want to have to move data
> > > on/off the master each time I run a job because this will slow it down
> > > as more people are using it.
> > >
> >
> > If your problems are I/O bound, and you have enough local storage on
> > each compute node, and you can move the data in a reasonable amount of
> > time, the local I/O will likely be the fastest solution. You have
> > already discovered this when you moved to a local attached RAID. If you
> > have multiple parallel reads/writes to the data from each compute node,
> > you will want some sort of distributed system. If the master thread is
> > the only one doing IO, then you want the fast storage where it is.
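> >
> > A quick, rough way to tell whether the jobs really are I/O bound is to
> > watch a compute node while one runs (standard Linux tools; the sampling
> > interval is arbitrary):
> >
> >     vmstat 5        # high "wa" (I/O wait) with an idle CPU suggests I/O bound
> >     iostat -x 5     # per-disk utilization and throughput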
>
> Also keep in mind that if the data used on the nodes fits
> into memory _and_ you tend to run the same software over and
> over, then typically that data will only need to be read off disk once
> on each node and will subsequently be accessed from the file system
> cache. That mode of data access is many times faster
> than physically reading from a disk. So don't toss out the idea
> of local data storage if the cluster happens to have slowish disks
> on the compute nodes. The cache also works for data read over NFS, but
> it may take a very, very long time for all nodes to read it at once.
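>
> You can see the cache effect directly on a node by reading the same file
> twice (the path here is only an example):
>
>     time cat /data/bigfile > /dev/null   # first read: comes off the disk
>     time cat /data/bigfile > /dev/null   # second read: served from the cache,
>                                          # usually many times faster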
>
> Depending on your cluster topology, interconnect, and budget
> you might also consider multiple file servers. That will
> speed things up at the cost of a bit more hardware and more
> complexity (which node mounts which file server). Also, for that to
> work well, data access should be mostly reads, since writes to a common
> file need to go to M file servers instead of just one.
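>
> As a sketch, with the read-mostly data replicated onto two servers, half
> the nodes could mount one and half the other; the hostnames and mount
> options below are only illustrative:
>
>     # /etc/fstab on nodes 1-16
>     fs1:/export/data  /data  nfs  hard,intr,rsize=8192,wsize=8192  0 0
>     # /etc/fstab on nodes 17-32
>     fs2:/export/data  /data  nfs  hard,intr,rsize=8192,wsize=8192  0 0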
>
> Finally, and this effect can be surprisingly large - be careful about
> writes of results back to a single file server. When N nodes naively
> direct stdout back to a single NFS server, the line-by-line writes can
> drive that server into the ground. Conversely, if the nodes
> write to /tmp and then, when done, copy that file to the NFS server
> in one fell swoop, it may work better, especially if the processes
> finish asynchronously. If they all finish at the same time, think
> twice before having them all do:
>
> cp /tmp/${HOSTNAME}_output.txt /nfsmntpoint/accum_dir/
>
> simultaneously.
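>
> One crude way to avoid that collision is to stagger the copies with a
> random delay (bash-specific $RANDOM and $HOSTNAME; the 60 second spread
> is arbitrary):
>
>     ./my_program > /tmp/${HOSTNAME}_output.txt
>     sleep $((RANDOM % 60))
>     cp /tmp/${HOSTNAME}_output.txt /nfsmntpoint/accum_dir/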
>
>
> > NFS provides effectively a single point of data flow, and hence is a
> > limiting factor (generally).
>
> Also, double-check that NFS is using hard mounts. Otherwise you may
> fall prey to the dreaded "big block of nulls" problem.
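>
> On a Linux client you can check the options on the existing NFS mounts
> with something like:
>
>     grep nfs /proc/mounts    # if "soft" appears, remount with "hard"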
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>