[Beowulf] NFS shared file system
Ted Sariyski
tsariysk at craft-tech.com
Mon Dec 5 08:37:05 PST 2005
Hi Mark,
Thanks for your comments. We combined the clusters into a single one in
anticipation of jobs running on more than ~60 CPUs and in the hope of
getting better utilization of the existing resources. While in the 3x30
state, the typical CPU load was ~1-5% and CFD IO was ~700-1000 req/s. It
required a lot of NFS tuning, but once tuned it worked fine. Now I am
trying to find out what the cheapest solution for a 90-node cluster is.
One solution in the 20k-30k range that we have been offered is a 3 TB
Fibre Channel nSTORE SAN with Montilio's RapidFile I/O engine. Does
anybody have experience with it? Lustre, GPFS, HP's SFS, Panasas, etc.
will be our next long-term step.
Thanks, Ted
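
A rough back-of-envelope sketch of why the combined storage struggles. It assumes the ~700-1000 req/s figure applied to each of the three 30-node clusters (the numbers above do not say whether it is per cluster or total), so a single server would now see roughly three times that. The function name is made up for illustration.

```python
def combined_load(per_cluster_req_s, clusters):
    """Scale a (low, high) request-rate range by the number of merged clusters.

    Assumption: every cluster generates the same load and all of it
    lands on one storage server after the merge.
    """
    low, high = per_cluster_req_s
    return low * clusters, high * clusters

low, high = combined_load((700, 1000), 3)
print(f"{low}-{high} req/s")  # prints "2100-3000 req/s"
```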
Mark Hahn wrote:
>>Each cluster had its own head node and its own cheap, in-house-built
>>RAID exported over GigE NFS. Recently we combined the existing clusters
>>
>>
>
>when it was in the 3x30 state, did you do any measurements of the RAID's
>internal performance, and performance when under "normal" load by the nodes?
>also, have you characterized the IO load of your CFD application?
>
>
>
>>into one and the first problem we have is with the mass storage:
>>occasionally it cannot handle the IO load. My question is, if I buy a
>>commercial NAS, what are the chances that after that I'll need to replace
>>GigE with Myrinet (e.g.)?
>>
>>
>
>well, the better question is why you got rid of two of the IO nodes -
>or did you?
>
>
>
>>clusters but from what I read in this newsgroup my understanding is
>>that 90 nodes is a small cluster and I didn't expect scalability
>>problems at this level.
>>
>>
>
>the traffic here is somewhat specialized, of course - people doing 16-node
>clusters are not having any problems, and so don't speak up ;)
>
>90 nodes is clearly enough to show real scaling problems if the load is
>reasonably intensive and from multiple nodes simultaneously. is it safe
>to assume you've done the basic first steps in tuning (lots of nfsd's,
>perhaps also higher attribute-cache (AC) timeouts on the client side,
>probably not using the default 32K packets?)
>
>
>
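
For reference, the client-side part of that tuning comes down to a handful of mount options. A minimal sketch, assuming the standard Linux NFS client options rsize, wsize and actimeo. The server name, export path, mount point and all values are placeholders, not recommendations. The number of nfsd threads is set on the server side and is not covered here.

```python
def nfs_mount_options(rsize: int, wsize: int, actimeo: int) -> str:
    """Build a comma-separated option string for an NFS client mount.

    rsize/wsize: read/write transfer size in bytes.
    actimeo: attribute-cache timeout in seconds (sets all ac* timers at once).
    """
    return f"rsize={rsize},wsize={wsize},actimeo={actimeo}"

def mount_command(server: str, export: str, mountpoint: str, options: str) -> str:
    """Format the mount command line; nothing is executed here."""
    return f"mount -t nfs -o {options} {server}:{export} {mountpoint}"

# Hypothetical values, to be replaced by whatever measurements suggest.
opts = nfs_mount_options(rsize=8192, wsize=8192, actimeo=60)
print(mount_command("nfs0", "/export/cfd", "/mnt/cfd", opts))
```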
>>If commercial storage optimized for IO is a
>>solution, what is the price I'm facing? Any recommendations?
>>
>>
>
>depends on what your IO goals are. do you insist on a single filesystem
>implemented across multiple server nodes? if so, you have to look into
>cluster-fs things like Lustre, GPFS, HP's SFS, Panasas, etc. the overhead
>(dollars and brains) is nontrivial.
>
>I would probably split the workload across three independent NFS servers,
>and also try some basic tuning. these are cheap, easy to do and will
>definitely improve performance.
>
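
The three-way split suggested above can be sketched as a simple node-to-server map, which would reproduce the old 3x30 layout inside the merged cluster. The host names and the function are hypothetical. In practice the mapping would end up in fstab or automounter maps rather than in a script.

```python
def assign_servers(nodes, servers):
    """Map each node to a server in contiguous, equally sized blocks."""
    block = -(-len(nodes) // len(servers))  # ceiling division
    return {node: servers[i // block] for i, node in enumerate(nodes)}

# Hypothetical naming scheme: node01..node90 and three NFS servers.
nodes = [f"node{i:02d}" for i in range(1, 91)]
servers = ["nfs0", "nfs1", "nfs2"]
mapping = assign_servers(nodes, servers)

for server in servers:
    count = sum(1 for s in mapping.values() if s == server)
    print(server, count)  # 30 nodes per server
```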
>more speculative things:
>
> - use LACP or related techniques to provide more bandwidth out
> of the NFS server(s). this will probably not improve the bandwidth
> seen by a single node, but should come close to doubling the
> aggregate.
>
> - try out fscache - this is an add-on layer being promulgated by
> RH which creates a local disk cache to unload your NFS.
>
>
>
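
The LACP point (more aggregate bandwidth, same per-node bandwidth) follows from how link aggregation places traffic: each flow is hashed onto one member link. A toy sketch of that idea, assuming a CRC32 hash over the client/server pair. Real switches and bonding drivers use their own hash policies over MAC, IP or port fields, so this only illustrates the principle, and the host names are hypothetical.

```python
import zlib

def pick_link(client: str, server: str, n_links: int) -> int:
    """Choose a member link for a flow by hashing its endpoints (toy hash)."""
    key = f"{client}->{server}".encode()
    return zlib.crc32(key) % n_links

def links_used(clients, server, n_links):
    """Return the set of member links that carry traffic for these clients."""
    return {pick_link(c, server, n_links) for c in clients}

clients = [f"node{i:02d}" for i in range(1, 91)]

# A single client always maps to exactly one link, so it never sees more
# than one link's worth of bandwidth.
print(len(links_used(clients[:1], "nfs0", 2)))

# Many clients are typically spread over the member links, which is where
# the gain in aggregate bandwidth comes from.
print(sorted(links_used(clients, "nfs0", 2)))
```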