<HTML>
<HEAD>
<TITLE>Re: [Beowulf] scratch File system for small cluster</TITLE>
</HEAD>
<BODY>
<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
<BR>
<BR>
On 9/25/08 10:19 AM, "Joe Landman" <<a href="landman@scalableinformatics.com">landman@scalableinformatics.com</a>> wrote:<BR>
<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Glen Beane wrote:<BR>
> I am considering adding a small parallel file system (~5-10 TB) to my small<BR>
> cluster (~32 nodes, each with two dual-core Opterons) that is used mostly by<BR>
> a handful of regular users. Currently the only storage accessible to all<BR>
> nodes is home directory space provided by the Lab's IT department (this is<BR>
> a SAN volume connected to the head node by 2x FC links, and NFS-exported to<BR>
> the compute nodes). I don't have to "worry" about the IT-provided SAN space -<BR>
> they back it up, provide redundant hardware, etc. The parallel file system<BR>
> would be scratch space (and not backed up by IT). We have a mix of<BR>
> home-grown apps doing a pretty wide range of things (some do a lot of I/O,<BR>
> others don't), and things like BLAST and BLAT.<BR>
<BR>
Hi Glen:<BR>
<BR>
BLAST uses mmap'ed I/O. This has some interesting ... interactions<BR>
... with parallel file systems.<BR>
<BR>
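For the curious, here is a minimal sketch of that access pattern in C<BR>
(illustrative only, not BLAST's actual source; the database file name<BR>
is a placeholder):<BR>
<PRE>
/* mmap'ed read of a database file, BLAST-style.  Every page we
 * touch becomes a fault that the file system client has to service,
 * which is exactly the path where a parallel FS client can stumble. */
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;unistd.h&gt;

int main(void)
{
    int fd = open("nt.nsq", O_RDONLY);   /* placeholder DB file name */
    if (fd == -1) { perror("open"); return 1; }

    struct stat sb;
    if (fstat(fd, &sb) == -1) { perror("fstat"); return 1; }

    /* Map the whole database read-only into our address space. */
    char *base = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch every page, as a sequential database scan would. */
    long sum = 0;
    for (off_t i = 0; i &lt; sb.st_size; i += 4096)
        sum += base[i];

    printf("checksum %ld over %lld bytes\n", sum, (long long)sb.st_size);
    munmap(base, sb.st_size);
    close(fd);
    return 0;
}
</PRE>
<BR>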
<BR>
For what it's worth, we use Paracel BLAST, and we are also considering mpiBLAST-pio to take advantage of a parallel file system.<BR>
<BR>
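For context, mpiBLAST-pio's advantage comes from MPI-IO collective reads.<BR>
A minimal sketch of that pattern (illustrative only, not mpiBLAST's<BR>
actual code; the file name is again a placeholder):<BR>
<PRE>
/* Every rank opens one shared database file and reads its own slice
 * with a collective call, which lets the MPI-IO layer coalesce the
 * requests before they hit the (parallel) file system. */
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "nt.nsq",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    MPI_Offset fsize;
    MPI_File_get_size(fh, &fsize);

    /* One contiguous slice per rank; the last rank takes the tail.
     * (The sketch assumes each slice fits in an int.) */
    MPI_Offset chunk  = fsize / nprocs;
    MPI_Offset offset = rank * chunk;
    if (rank == nprocs - 1)
        chunk = fsize - offset;

    char *buf = malloc(chunk);

    /* Collective read: all ranks participate at once. */
    MPI_File_read_at_all(fh, offset, buf, (int)chunk, MPI_BYTE,
                         MPI_STATUS_IGNORE);

    printf("rank %d read %lld bytes at offset %lld\n",
           rank, (long long)chunk, (long long)offset);

    free(buf);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
</PRE>
<BR>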
<BR>
<BR>
<BR>
><BR>
> Can anyone out there provide recommendations for a good solution for fast<BR>
> scratch space for a cluster of this size?<BR>
<BR>
Yes, but we are biased, as this is in part what we<BR>
design/build/sell/support. Link in .sig.<BR>
<BR>
> Right now I was thinking about PVFS2. How many I/O servers should I have,<BR>
> and how many cores and RAM per I/O server?<BR>
<BR>
It turns out that PVFS2 sadly has a significant problem with BLAST<BR>
and mpiBLAST due to the mmap'ed files. We found this out while trying<BR>
to help a customer with a small tier-1 cluster deal with file system<BR>
instability. We saw this with PVFS2 2.6.9 and 2.7.0, on both 32- and<BR>
64-bit platforms. The customer was going to update the PVFS2 group;<BR>
I haven't heard whether they have had a chance to trace this down and<BR>
fix it (I don't think it is a priority, as BLAST doesn't use MPI-IO,<BR>
which PVFS2 is quite good at).<BR>
<BR>
> Are there other recommendations for fast scratch space? (It doesn't have to<BR>
> be a parallel file system; something with less hardware would be nice.)<BR>
<BR>
Pure software: GlusterFS currently, Ceph in the near future. GFS won't<BR>
give you very good performance (meta-data shuttling limits what you can<BR>
do). You could go with Lustre, but then you need to build MDS/OSS<BR>
setups, so that is a hybrid approach.<BR>
<BR>
Pure hardware: Panasas (awesome kit, but not for the light-of-wallet),<BR>
DDN, BlueArc (the same comments apply to these as well).<BR>
<BR>
Reasonable-cost hardware with good performance: us and a few others. Put<BR>
any parallel FS atop this, or pure NFS. We have measured NFS-over-RDMA<BR>
speeds (on SDR IB at that) at 460 MB/s, on an RDMA adapter reporting<BR>
750 MB/s. The adapter sits in a 4x PCIe slot, so ~860 MB/s is the most<BR>
we should expect: four PCIe 1.x lanes deliver ~1 GB/s after 8b/10b<BR>
encoding, and packet overhead takes that down to roughly 860 MB/s.<BR>
Faster IB hardware should result in better performance, though you<BR>
still have to walk through the various software stacks, and they ...<BR>
remove efficiency ... (a nice PC way of saying that they slow things<BR>
down a bit :( )<BR>
<BR>
--<BR>
Joseph Landman, Ph.D<BR>
Founder and CEO<BR>
Scalable Informatics LLC,<BR>
email: <a href="landman@scalableinformatics.com">landman@scalableinformatics.com</a><BR>
web : <a href="http://www.scalableinformatics.com">http://www.scalableinformatics.com</a><BR>
<a href="http://jackrabbit.scalableinformatics.com">http://jackrabbit.scalableinformatics.com</a><BR>
phone: +1 734 786 8423 x121<BR>
fax : +1 866 888 3112<BR>
cell : +1 734 612 4615<BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
-- <BR>
Glen L. Beane<BR>
Software Engineer<BR>
The Jackson Laboratory<BR>
Phone (207) 288-6153<BR>
<BR>
</SPAN></FONT>
</BODY>
</HTML>