cluster setup - handling user homeareas - from main public network storage device
Jeff Layton jeffrey.b.layton at lmco.com
Thu Mar 28 03:25:11 PST 2002
Shin,

Since nobody has jumped in yet, I guess I will :)

shin at guss.org.uk wrote:

> Hi,
>
> I'm just starting out with my first Beowulf, and have worked out
> mostly how I'm going to be setting things up but I'm unsure as to how
> to deal with user home areas.
>
> The cluster will probably be configured as a number of nodes on a
> class B private address (it's not going to grow too much), with the
> front end node (FEN) sitting on the private network and also the
> main public network.
>
> The FEN will allow ssh only and will NFS export a number of s/w
> packages to the nodes (to save installing s/w on each node). It will
> also have a small scratch area - as does each node.
>
> The main problem I'm looking for advice on is how to deal with the
> handling of user home areas. All the users have a large storage
> allocation on our main RAID (connected to sun/solaris kit, quota'd &
> backed up regularly) which sits on the main network, currently users
> produce very large (Gb's worth) data files and the smallish area on
> the cluster won't suffice - as I expect similarly sized output files
> on the cluster.
>
> Should I :
>
> 1. automount the users home area from the raid on to the FEN when each
> user logs in - but then how do I cope with the fact that the nodes are
> on the private network - do I get the FEN to re-export the homearea to
> the nodes so that the jobs can write the data back? Or is NAT the
> answer somehow here?

AFAIK you can't re-export. I remember someone saying that you could
maybe re-export using the user-space nfsd but not with the kernel
space nfsd. I wouldn't recommend it anyway.

Another option, if you can do it, is to add a NIC to the RAID box to
connect it directly to the cluster switch. Of course, you would have
to take down the box to install the hardware, but it should work
pretty easily.

I think a simpler solution would be to add a few good size disks in
FEN (if you can).
Good size IDE drives are fairly cheap but some of the smaller SCSI
drives are pretty good on price as well. Then just have the users
stream their data off of FEN onto the RAID box as part of their job
(that's the way we function - we have about 120 Gigs on the FEN and
then stream that off to some NAS boxes). Just have a good network
connection between the FEN and the RAID box.

>
>
> 2. Setup lots of scratch space (or even use PVFS or similar across all
> the nodes local disks) on the FEN which each node can write to and the
> users use scp to transfer files to/fro the RAID. Expect users to
> balk at the idea of using scp.

PVFS is really intended to be a high-speed filesystem, not a place to
put home directories. The idea is to use it as scratch space for jobs
that need some reasonably high-speed IO and then move the data from
PVFS onto a more consistent filesystem. In addition, I don't think you
can run binaries out of PVFS quite yet. There has been some work along
those lines, but it will probably be a while before this happens (you
can't have symlinks either).

>
>
> Also should I allow users to run jobs (interactive?) on the FEN or
> should it be used exclusively for logging in, NFS etc?

We let users run on the FEN if they need to. If it gets to be too much
(rarely happens) I just yell at them and then help them find a place
to run :)

>
>
> Additionally the main network uses NIS for authentication - and I
> wanted to try something similar on the cluster (which will have a
> far smaller number of users than the main network) so I was planning
> on running a separate small NIS domain on the cluster (with the FEN
> as master), rather than trying to sync passwd etc across nodes.

There was a good thread on this list about NIS and larger clusters. I
think the final conclusion was for larger clusters (100+ nodes?) that
NIS starts to eat network bandwidth quickly.
I understand why people want to use NIS since you can push lots of
configuration maps very easily and account maintenance is also easy.
But, in light of the comments that NIS can start eating bandwidth, we
have just stuck to copying the password/group files to the compute
nodes. Since we don't reconfigure our cluster, the other important
files in /etc have no real reason to be copied to the nodes. We keep a
copy of all of the relevant nodal information on the FEN (with backups
of course), so rebuilding a node is fairly trivial. In general I
REALLY believe in KISS for account maintenance and nodal creation.

Good Luck!

Jeff Layton

Lockheed-Martin

>
>
> Any ideas, practical advice/setups on how others are doing/dealing
> with user home areas would be appreciated,
>
> Many TIA
> Shin
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
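
The reply above mentions copying the password/group files from the FEN
to the compute nodes instead of running NIS, but does not say how the
copy is done. The following is only a minimal sketch of one way it
might be scripted. The use of scp, the root login, the node names and
the function names are all assumptions made for illustration and are
not taken from the message.

```python
import subprocess
from typing import Iterable, List, Sequence

# The two files the reply mentions; other files in /etc are left alone.
ACCOUNT_FILES = ("/etc/passwd", "/etc/group")


def build_push_commands(nodes: Iterable[str],
                        files: Sequence[str] = ACCOUNT_FILES) -> List[List[str]]:
    """Return one scp command per node that copies the files into /etc."""
    # Assumption: root can ssh from the FEN to each node without a password.
    return [["scp", "-p", *files, f"root@{node}:/etc/"] for node in nodes]


def push_account_files(nodes: Iterable[str], dry_run: bool = True) -> List[str]:
    """Run the copies (or only print them if dry_run); return failed nodes."""
    node_list = list(nodes)
    failed = []
    for node, cmd in zip(node_list, build_push_commands(node_list)):
        if dry_run:
            print(" ".join(cmd))
            continue
        # A non-zero exit status from scp marks the node as failed.
        if subprocess.run(cmd).returncode != 0:
            failed.append(node)
    return failed


if __name__ == "__main__":
    # Hypothetical node names; the default dry run only prints the commands.
    push_account_files([f"node{i:02d}" for i in range(1, 5)])
```

Run on the FEN after an account change, with dry_run=False once the
printed commands look right. On systems that use shadow passwords,
/etc/shadow would presumably need to be added to the file list as well.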
