[Beowulf] Some beginner's questions on cluster setup

Thu Jul 9 08:16:02 PDT 2009

Hi!

The diskless provisioning system is definitely the way to go. We use the 
cluster toolkit called Jesswulf, which is available at

<advertisement>

http://hpc.arc.georgetown.edu/mirror/jesswulf/

By default it runs on Red Hat/CentOS/Fedora systems, though it has been 
ported to Ubuntu and SuSE without too much trouble. Perceus/Warewulf 
also work well. We also teach cluster courses, which may be helpful.

http://training.arc.georgetown.edu/

</advertisement>

To answer some of your questions, I prefer the read-only NFSROOT 
approach with a small (less than 20 MB) ramdisk. We use this on all of 
our clusters (about 7 clusters) and it works fine. We even use it on 
heterogeneous systems. One cluster has a mix of P4 Xeons, dual-core 
Opterons, and quad-core Xeons all using the same NFSROOT, so you simply 
update one directory on the master node and *all* of the compute nodes 
have the new software. We love it! To cover the different hardware, we 
simply either compile the kernel or make the initrd with support for 
all of the nodes. We often use different hardware for the master and 
compute nodes, without issue. The only thing that we don't mix is 
32-bit and 64-bit. We have a couple of 32-bit clusters and the rest are 
64-bit.
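
As a rough illustration of the read-only NFSROOT idea (a minimal sketch, 
not what Jesswulf itself generates), the two pieces involved are an 
export entry on the master and the kernel arguments handed to each node 
at network boot. The path /nfsroot, the subnet 10.0.0.0/24, and the 
master address 10.0.0.1 are made-up values for the example. The export 
options (ro, no_root_squash, async) and the kernel parameters (root=, 
nfsroot=, ip=) are standard Linux ones, but the right combination for a 
given site is an assumption here.

```python
def exports_line(nfsroot_dir, subnet):
    """Build an /etc/exports entry that shares the root tree read-only."""
    # "ro" keeps compute nodes from modifying the shared root.
    return f"{nfsroot_dir} {subnet}(ro,no_root_squash,async)"


def kernel_append(server_ip, nfsroot_dir):
    """Build kernel command-line arguments for mounting root over NFS."""
    # "ro" mounts the root read-only; "ip=dhcp" configures the NIC at boot.
    return f"root=/dev/nfs nfsroot={server_ip}:{nfsroot_dir} ro ip=dhcp"


if __name__ == "__main__":
    # Hypothetical values: adjust the path, subnet, and master IP for your site.
    print(exports_line("/nfsroot", "10.0.0.0/24"))
    print(kernel_append("10.0.0.1", "/nfsroot"))
```

Because every node mounts the same directory, updating software means 
changing that one tree on the master. Anything a node must write (logs, 
temporary files) has to live in the ramdisk or on local disk.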

The main issue that you need to deal with is having a fast enough 
storage system for parallel jobs that generate a lot of data. We use the 
local hard drives in the compute nodes for "scratch" space and we have 
some type of shared file system. On the small clusters, we use NFS, but 
on the bigger clusters we use Glusterfs with Infiniband, which has 
proven to be very nice. If you are running MPI jobs with lots of data, 
you might want to consider adding Infiniband. Even the cheap ($125) 
Infiniband cards give much better performance than Gigabit Ethernet. And 
you can always run IP over IB for applications or services that need 
standard IP.
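
The local "scratch" pattern can be sketched roughly as follows: a job 
writes its intermediate files to the node's own disk and only copies 
the finished results to the shared file system at the end. This is a 
minimal sketch under assumptions, not our actual job scripts. The 
function name run_with_local_scratch, the work callback, and the 
directory arguments are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_local_scratch(work, scratch_root, shared_dir):
    """Run work(scratch_dir) on local disk, then copy result files to shared storage.

    Returns the sorted list of file names that were copied.
    """
    shared_dir = Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # The temporary directory lives on the node's local drive and is
    # removed automatically when the block exits.
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        scratch = Path(tmp)
        work(scratch)  # heavy I/O happens here, off the network
        for item in sorted(scratch.iterdir()):
            if item.is_file():
                shutil.copy2(item, shared_dir / item.name)
                copied.append(item.name)
    return copied
```

A job would call this with scratch_root pointing at a directory on the 
local drive and shared_dir pointing at the NFS or Glusterfs mount, so 
the shared file system only sees the final copy.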

You mention that you don't think that you will have too much MPI 
traffic, but that you will be copying the results back to the master. 
That is exactly when we see the highest load on our NFS file systems: 
when all of the compute nodes are writing at the same time, even on 
small clusters (fewer than 20 nodes). We've found that a clustered file 
system like Glusterfs shows much lower I/O wait than NFS when copying 
lots of files. You may consider picking up some of the cheap IB cards 
($125) and switches ($750 for 8 ports/$2400 for 24 ports) in order to 
do some relatively inexpensive testing. Here is one place where you can 
find them:

http://www.colfaxdirect.com/store/pc/viewCategories.asp?pageStyle=m&idCategory=6
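
For a back-of-the-envelope feel for why simultaneous writes hurt, 
consider what happens when every node copies results to a master that 
has a single Gigabit link. The sketch below assumes the link is shared 
evenly and ignores protocol overhead and disk speed, so real numbers 
will be worse. The 1 GB of results per node is a made-up figure for the 
example.

```python
# Theoretical Gigabit Ethernet throughput: 1e9 bits/s = 125e6 bytes/s.
GIGABIT_BYTES_PER_SEC = 1_000_000_000 / 8


def per_node_rate(link_bytes_per_sec, writers):
    """Bandwidth each writer gets if the link is shared evenly."""
    return link_bytes_per_sec / writers


def copy_time_seconds(bytes_per_node, link_bytes_per_sec, writers):
    """Time for every node to finish copying when all write at once."""
    return bytes_per_node / per_node_rate(link_bytes_per_sec, writers)


if __name__ == "__main__":
    one_gb = 1_000_000_000  # hypothetical result size per node
    print(copy_time_seconds(one_gb, GIGABIT_BYTES_PER_SEC, 1))   # 8.0
    print(copy_time_seconds(one_gb, GIGABIT_BYTES_PER_SEC, 20))  # 160.0
```

With 20 nodes writing together, each one gets about 6.25 MB/s at best, 
which is why the master's link (and the file system behind it) is the 
first thing to test.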

I'd be happy to talk to you. My phone number is below and you have my 
e-mail.

Jess

-- 
Jess Cannata
Advanced Research Computing &
High Performance Computing Training
Georgetown University
202-687-3661

P.R. wrote:
> Hi,
> I'm new to the list & also to cluster technology in general.
> I'm planning on building a small 20+ node cluster, and I have some basic
> questions.
> We're planning on running 5-6 motherboards with quad-core amd 3.0GHz
> phenoms, and 4GB of RAM per node.
> Off the bat, does this sound like a reasonable setup?
>
> My first question is about node file&operating systems:
> I'd like to go with a diskless setup, preferably using an NFS root for each
> node.
> However, based on some of the testing I've done, running the nodes off of the
> NFS share(s) has turned out to be rather slow & quirky.
> Our master node will be running on a completely different hardware setup
> than the slaves, so I *believe* it will make it more complicated & tedious
> to set up & update the nfsroots for all of the nodes (since it's not simply a
> matter of 'cloning' the master's setup & config).
> Is there any truth to this, or am I way off?
>
> Can anyone provide any general advice or feedback on how to best set up a
> diskless node?
>
>
> The alternative that I was considering was using (4GB?) USB flash drives to
> drive a full-blown,local OS install on each node...
> Q: does anyone have experience running a node off of a usb flash drive?
> If so, what are some of the pros/cons/issues associated with this type of
> setup?
>
>
> My next question(s) is regarding network setup.
> Each motherboard has an integrated gigabit nic.
>
> Q: should we be running 2 gigabit NICs per motherboard instead of one?
> Is there a 'rule-of-thumb' when it comes to sizing the network requirements?
> (i.e.,'one NIC per 1-2 processor cores'...)
>
>
> Also, we were planning on plugging EVERYTHING into one big (unmanaged)
> gigabit switch.
> However, I read somewhere on the net where another cluster was physically
> separating NFS & MPI traffic on two separate gigabit switches.
> Any thoughts as to whether we should implement two switches, or should we be
> ok with only 1 switch?
>
>
> Notes:
> The application we'll be running is NOAA's wavewatch3, in case anyone has
> any experience with it.
> It will utilize a fair amount of NFS traffic (each node must read a common
> set of data at periodic intervals), 
> and I *believe* that the MPI traffic is not extremely heavy or constant 
> (i.e., nodes do large amounts of independent processing before sending
> results back to master).
>
>
> I'd appreciate any help or feedback anyone would be willing & able to offer...
>
> Thanks,
> P.Romero