[Beowulf] Re: scratch File system for small cluster
David Mathog mathog at caltech.edu
Thu Sep 25 12:00:37 PDT 2008
- Previous message: [Beowulf] Re: scratch File system for small cluster
- Next message: [Beowulf] Benchmark/apps showing benefit from instruction set advances
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Joe Landman <landman at scalableinformatics.com> wrote

> Glen Beane wrote:
> > I am considering adding a small parallel file system ~(5-10TB) my small
> > cluster (~32 2x dual core Opteron nodes) that is used mostly by a handful of
> > regular users. Currently the only storage accessible to all nodes is home
> > directory space which is provided by the Lab's IT department (this is a SAN
> > volume connected to the head node by 2x FC links, and NFS exported to the
> > compute nodes). I don't have to "worry" about the IT provided SAN space -
> > they back it up, provide redundant hardware, etc. The parallel file system
> > would be scratch space (and not backed up by IT). We have a mix of home
> > grown apps doing a pretty wide range of things (some do a lot of I/O, others
> > don't), and things like BLAST and BLAT.
>
> Hi Glen:
>
> BLAST uses mmap'ed IO. This has some interesting ... interactions
> ... with parallel file systems.

Right, and it isn't just the mapping of the databases and input file.
One must also be careful with how BLAST output is directed. Sending it
all to the same NFS mounted file system as "node01.out", "node02.out",
etc. will do very unpleasant things to both your network and the file
server. Far better to write those locally to /tmp/nodeXX.out, and then
take some care in moving them back to the central file system later, so
that the data transfer can proceed without interference.

This doesn't mean you have to wait until the end of the run and send
each node's entire output file back at once. It can be more efficient,
but more complicated, to write the output files on each node in
reasonable sized chunks and then interleave the transfer of those to the
central store with the ongoing run. Whether this is worth the extra
effort depends mostly on the number of queries in the input file and the
verbosity of the output file.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
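
As an illustration of the chunked approach described in the message above, here is a minimal Python sketch. It is not taken from the post: the class and function names (ChunkedWriter, ship_finished_chunks), the ".part" suffix convention, the file naming, and the default chunk size are all assumptions made for the example. It writes output to node-local scratch in chunks and moves only completed chunks to the central store. Coordinating which node ships when, so that transfers do not interfere with each other, is left out.

```python
import os
import shutil


class ChunkedWriter:
    """Write records to local scratch, starting a new file once the
    current chunk reaches roughly chunk_bytes (hypothetical helper)."""

    def __init__(self, local_dir, node, chunk_bytes=64 * 1024 * 1024):
        self.local_dir = local_dir
        self.node = node
        self.chunk_bytes = chunk_bytes
        self.index = 0
        self.size = 0
        self.fh = None
        self.path = None

    def _open(self):
        # ".part" marks a chunk that is still being written.
        name = f"{self.node}.{self.index:05d}.out.part"
        self.path = os.path.join(self.local_dir, name)
        self.fh = open(self.path, "w")
        self.size = 0

    def write(self, record):
        # Records are written whole, so one query's output is never
        # split across two chunks. Size is counted in characters,
        # which only approximates bytes.
        if self.fh is None:
            self._open()
        self.fh.write(record)
        self.size += len(record)
        if self.size >= self.chunk_bytes:
            self.close()

    def close(self):
        if self.fh is None:
            return
        self.fh.close()
        self.fh = None
        # Dropping the ".part" suffix signals that the chunk is complete.
        os.rename(self.path, self.path[: -len(".part")])
        self.index += 1


def ship_finished_chunks(local_dir, central_dir):
    """Move completed chunks to the central store; return names shipped."""
    shipped = []
    for name in sorted(os.listdir(local_dir)):
        if not name.endswith(".out"):
            continue  # skip chunks still being written
        src = os.path.join(local_dir, name)
        tmp = os.path.join(central_dir, name + ".tmp")
        # Copy under a temporary name, then rename, so readers of the
        # central store never see a half-transferred chunk.
        shutil.copyfile(src, tmp)
        os.rename(tmp, os.path.join(central_dir, name))
        os.remove(src)
        shipped.append(name)
    return shipped
```

In use, the job on each node would call write() once per query result and close() at the end of the run, while ship_finished_chunks() is called periodically (and once more after close()) with local_dir pointing at something like /tmp and central_dir at the shared file system.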
