Questions and Sanity Check
Ray Jones
rjones at merl.com
Mon Feb 26 14:48:20 PST 2001
I'm involved in a group putting together a proposal for building a
Beowulf for our research lab (www.merl.com). We feel like we've
achieved a reasonable level of confidence in our design, but I wanted
to run it past the list for a final sanity check, as well as tack on a
few questions that we still need to answer.
Hardware configuration questions:
Our proposed system:
128 nodes - 64 RS-1200 servers from Racksaver
Each node:
1 GHz AMD Thunderbird
10 GB IDE hard drive
512 MB CS2 memory
floppy
Intel Etherexpress 8460 NIC
Switch:
D-Link DES-6000 w/ 8 6003 16-port blades
OS: Scyld (most likely)
I realize that it's an ill-formed question, but does anyone see
anything horribly wrong with the above?
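As a quick arithmetic check on the numbers above, here is a small sketch. It assumes two nodes per RS-1200 server (which is what 128 nodes on 64 servers implies) and one switch port per node; the function name and that one-port-per-node assumption are illustrative, not vendor data.

```python
def port_budget(servers, nodes_per_server, blades, ports_per_blade):
    """Return (nodes, switch ports, spare ports) for a proposed layout."""
    nodes = servers * nodes_per_server   # compute nodes needing a port
    ports = blades * ports_per_blade     # ports the switch provides
    return nodes, ports, ports - nodes

# Figures from the proposal: 64 dual-node servers, 8 blades of 16 ports.
# The one-port-per-node mapping is an assumption.
nodes, ports, spare = port_budget(64, 2, 8, 16)
print(nodes, ports, spare)
```

Under those assumptions the switch is exactly full, so a front-end machine or an uplink would need a port from somewhere else.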
Fuzzier, cost of ownership questions:
We have about 10 researchers that would be interested in using the
system. They fall almost exclusively into two categories:
- Matlab users
- Users with embarrassingly parallel problems (tree search, graphics
rendering, ...)
For the Matlab users, we plan to use Matlab*p (aka MITMATLAB, aka
Parallel Problems Server) to provide them access to the system. The
others will probably receive a bit of an introduction to MPI and a bit
of handholding while they get used to running parallel batch jobs.
How much they'll need is one of the questions below.
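For the embarrassingly parallel users, the batch jobs we have in mind look roughly like the sketch below. It assumes every work item is independent, and it takes the process rank and the process count as plain arguments rather than getting them from MPI; the names my_share and run are hypothetical.

```python
def my_share(n_items, rank, size):
    """Indices of the work items handled by process `rank` out of `size`."""
    # Round-robin assignment: rank, rank + size, rank + 2*size, ...
    return list(range(rank, n_items, size))

def run(items, rank, size, work):
    """Apply `work` to this process's share of `items`; no communication needed."""
    return [work(items[i]) for i in my_share(len(items), rank, size)]

# Example: 10 independent items split across 4 processes.
shares = [my_share(10, r, 4) for r in range(4)]
print(shares)
```

Each process works only on its own indices, so results can be written to separate files and merged afterwards.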
Open questions for anyone with experience supporting multi-user
access to Beowulf systems. I realize most of these are even more
vague than my question above, but any input (no matter how anecdotal)
would be helpful.
1- How much scheduling will we have to do? Will we see a graceful
degradation of the system if multiple users ignore each other and run
their jobs simultaneously? How will this affect things like Matlab*p
and ScaLAPACK?
2- How many people are we going to need to dedicate to the software
side of maintaining the cluster and helping researchers solve their
problems, given that most of them are either doing batch parallelism
or using tools (Matlab*p) that just make things magically happen? Is
it going to be a full-time job to support 10 researchers who don't want
to learn parallel programming?
Specific questions: