Questions and Sanity Check
Keith Underwood
keithu at parl.clemson.edu
Tue Feb 27 06:48:59 PST 2001
I would use larger hard drives. The incremental cost of going from 10 GB to
30 GB should be pretty small, and you may one day appreciate the extra space
if you use something like PVFS. I would also consider a gigabit uplink to
the head node if you are going to use Scyld; having a faster link to the
head drastically improved our cluster boot time.
Keith
On 26 Feb 2001, Ray Jones wrote:
> I'm involved in a group putting together a proposal for building a
> Beowulf for our research lab (www.merl.com). We feel like we've
> achieved a reasonable level of confidence in our design, but I wanted
> to run it past the list for a final sanity check, as well as tack on a
> few questions that we still need to answer.
>
>
> Hardware configuration questions:
>
> Our proposed system:
> 128 nodes - 64 RS-1200 servers from Racksaver
> Each node:
> 1 GHz AMD Thunderbird
> 10 GB IDE hard drive
> 512 MB CS2 memory
> floppy
> Intel Etherexpress 8460 NIC
>
> Switch:
> D-Link DES-6000 w/ 8 6003 16-port blades
>
> OS: Scyld (most likely)
>
> I realize that it's an ill-formed question, but does anyone see
> anything horribly wrong with the above?
>
>
> Fuzzier, cost-of-ownership questions:
>
> We have about 10 researchers that would be interested in using the
> system. They fall almost exclusively into two categories:
> - Matlab users
> - Users with embarrassingly parallel problems (tree search, graphics
> rendering, ...)
>
> For the Matlab users, we plan to use Matlab*p (aka MITMATLAB, aka
> Parallel Problems Server) to provide them access to the system. The
> others will probably receive a bit of an introduction to MPI and a bit
> of handholding while they get used to running parallel batch jobs.
> How much they'll need is one of the questions below.
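For the MPI introduction, a tiny static work-splitting example is usually
all the embarrassingly parallel crowd needs to get going. Something along
these lines, purely as a sketch; do_task() here is a stand-in for whatever
independent work item they actually have (a frame to render, a subtree to
search):

    #include <stdio.h>
    #include <mpi.h>

    /* Stand-in for one independent unit of work. */
    static double do_task(int task_id)
    {
        return (double)task_id * task_id;
    }

    int main(int argc, char **argv)
    {
        int rank, size, i;
        const int ntasks = 1000;        /* total independent tasks */
        double local = 0.0, total = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank takes every size-th task, so there is no
           communication at all until the final reduction. */
        for (i = rank; i < ntasks; i += size)
            local += do_task(i);

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d tasks = %g\n", ntasks, total);

        MPI_Finalize();
        return 0;
    }

Compile with mpicc, launch with whatever mpirun your MPI provides, and each
node just grinds through its own slice. In my experience most of the
handholding ends up being about job launch and data staging rather than the
MPI calls themselves.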
>
> Open questions for anyone with experience supporting multi-user
> access to Beowulf systems. I realize most of these are even more
> vague than my question above, but any input (no matter how anecdotal)
> would be helpful.
>
> 1- How much scheduling will we have to do? Will we see a graceful
> degradation of the system if multiple users ignore each other and run
> their jobs simultaneously? How will this affect things like Matlab*p
> and ScaLAPACK?
>
> 2- How many people are we going to need to dedicate to the software
> side of maintaining the cluster and helping researchers solve their
> problems, given that most of them are either doing batch parallelism
> or using tools (Matlab*p) that just make things magically happen? Is
> it going to be a full-time job to support 10 researchers who don't want
> to learn parallel programming?
>
>
> Specific questions:
>
> From playing with our test system running Scyld, it looks like the
> root node is a compute node as well, and so should be made homogeneous
> with the cluster. However, I haven't seen this stated explicitly
> anywhere. Is this the case?
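One quick way to settle that on your test system is to run a trivial MPI
job that reports where each rank actually lands; if the head node's
hostname shows up in the output, it is being handed compute work (this
assumes your nodes report distinct hostnames, and there is nothing
Scyld-specific in it):

    #include <stdio.h>
    #include <unistd.h>
    #include <mpi.h>

    /* Print which host each MPI rank runs on, to see whether the
       head node participates in computation. */
    int main(int argc, char **argv)
    {
        int rank;
        char host[256];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        gethostname(host, sizeof(host));
        printf("rank %d running on %s\n", rank, host);
        MPI_Finalize();
        return 0;
    }

Either way, keeping the head node's hardware close to the compute nodes
makes life simpler if you ever do want it to take part in runs.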
>
> Does anyone have any comments on the Racksaver RS-1200 compute node,
> in the 2-Athlon-in-1U configuration (or even the 2-Pentium-in-1U
> config)? We like the node we have for testing, but wonder what life
> with 64 of them will be like.
>
> Thanks,
> Ray Jones
> MERL
>
---------------------------------------------------------------------------
Keith Underwood Parallel Architecture Research Lab (PARL)
keithu at parl.clemson.edu Clemson University