Advice on "mini" Beowulf cluster
Miska Le Louarn
lelouarn at eso.org
Fri Feb 1 09:41:29 PST 2002
Hello all,
Sorry if my questions have been asked 1000 times, but I am new to the
PC-cluster business. Plus, hardware and software evolve so fast that
what was true a year ago is now history...
We have a simulation code which we have started to parallelize, and we
want to build a PC-based architecture to run it.
We will use MPI for message passing and write the main code in C/C++.
The software is mainly made of three parts:
- Creating a matrix
We basically send one integer to each node, let it crunch, and finally
get back a vector of, say, 20 000 floats.
We iterate this process a large number of times until our matrix is
filled. Parallelizing this is a piece of cake, since each operation is
independent (a sketch of this master/worker pattern follows the list
below).
- Doing an SVD (or a few matrix multiplies) on that matrix (which might
or might not be sparse, depending on some parameters of the system we
are modeling).
Eventually this will be the limiting part of the simulation. We want to
process the biggest matrix we can afford. We will use ScaLAPACK (or
something similar).
I suspect this might be network intensive, since chunks of the matrix
need to be passed to other nodes during the calculation.
Typically the matrix will be 10 000 x 10 000 doubles or larger.
I worry about this part because it might be a real network hog. We want
to be memory saturated, not network saturated! But I am far from an
expert in parallel SVD, so I don't really know... (A rough memory
budget for this stage follows the list below.)
- Doing a fair number of iterations, which are computationally mostly
moderate-sized FFTs plus a large matrix multiply.
The SVD'd matrix will be multiplied by a vector. Then some (~10)
independent FFTs (size 512^2 or 1024^2) need to be done. I suspect one
node can easily cope with one FFT, so there is no need for fancy
distributed FFT algorithms. Each node computes an FFT and returns the
resulting array. The total number of such FFTs is ~50 000. (A sketch of
the per-node FFT step follows the list below.)
The network will probably get some load from transferring the data to
be FFT'd and the results back. And of course there's the matrix-vector
multiply.
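To make the first part concrete, here is a minimal sketch of the
master/worker scheme I have in mind, in C with MPI. The column count,
vector length and compute_column() are placeholders for our real code,
not the actual simulation:

  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>

  #define NVEC  20000   /* floats returned per task               */
  #define NCOLS 200     /* number of tasks (columns) to farm out  */
  #define STOP  -1      /* poison pill telling a worker to quit   */

  /* stand-in for the real per-column computation */
  static void compute_column(int col, float *out)
  {
      int i;
      for (i = 0; i < NVEC; i++)
          out[i] = (float)col / (float)(i + 1);
  }

  int main(int argc, char **argv)
  {
      int rank, nproc;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nproc);

      if (rank == 0) {                     /* master */
          float *matrix = malloc((size_t)NCOLS * NVEC * sizeof(float));
          int w, next = 0, received = 0;
          MPI_Status st;

          /* prime every worker with one task (or tell it to stop) */
          for (w = 1; w < nproc; w++) {
              int task = (next < NCOLS) ? next++ : STOP;
              MPI_Send(&task, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
          }
          /* collect results; the message tag carries the column index */
          while (received < NCOLS) {
              int task;
              MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
              MPI_Recv(matrix + (size_t)st.MPI_TAG * NVEC, NVEC, MPI_FLOAT,
                       st.MPI_SOURCE, st.MPI_TAG, MPI_COMM_WORLD, &st);
              received++;
              task = (next < NCOLS) ? next++ : STOP;
              MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
          }
          free(matrix);
      } else {                             /* worker */
          float *vec = malloc(NVEC * sizeof(float));
          MPI_Status st;
          for (;;) {
              int col;
              MPI_Recv(&col, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
              if (col == STOP) break;
              compute_column(col, vec);
              MPI_Send(vec, NVEC, MPI_FLOAT, 0, col, MPI_COMM_WORLD);
          }
          free(vec);
      }
      MPI_Finalize();
      return 0;
  }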
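For the SVD stage, here is the rough memory budget I mentioned. These
are my own back-of-envelope numbers, assuming a ScaLAPACK-style
block-cyclic layout where each of P processes holds about N*N/P
elements of A, and the SVD also needs U and V^T plus workspace:

  #include <stdio.h>

  int main(void)
  {
      const double N = 10000.0;   /* matrix dimension                 */
      const int    P = 12;        /* processes, e.g. 6 dual-CPU nodes */
      double mb = 1024.0 * 1024.0;

      double a_total   = N * N * 8.0 / mb;   /* full matrix of doubles */
      double a_local   = a_total / P;        /* local piece of A       */
      double svd_local = 3.0 * a_local;      /* A + U + V^T, roughly   */

      printf("A, total        : %7.0f MB\n", a_total);
      printf("A, per process  : %7.0f MB\n", a_local);
      printf("A+U+V^T/process : %7.0f MB (plus workspace)\n", svd_local);
      return 0;
  }

This prints about 763 MB for the full matrix and under 200 MB per
process, so N = 10 000 fits easily in 2 GB per dual node, and something
like N = 20 000 should still fit before the 4 GB upgrade (if my factor
of 3 is not too optimistic).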
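And for the per-node FFT step, a sketch using FFTW. I am assuming the
FFTW 2.x uniprocessor API here (fftw2d_create_plan / fftwnd_one); the
MPI task farming around it would look like the matrix-filling example
above:

  #include <stdio.h>
  #include <stdlib.h>
  #include <fftw.h>

  #define NX 512
  #define NY 512

  int main(void)
  {
      fftw_complex *in  = malloc(sizeof(fftw_complex) * NX * NY);
      fftw_complex *out = malloc(sizeof(fftw_complex) * NX * NY);
      fftwnd_plan p;
      int i;

      /* dummy input; in the real code this arrives from the master */
      for (i = 0; i < NX * NY; i++) {
          in[i].re = (double)i / (NX * NY);
          in[i].im = 0.0;
      }

      /* plan once, reuse for all ~50 000 transforms of this size */
      p = fftw2d_create_plan(NX, NY, FFTW_FORWARD, FFTW_MEASURE);
      fftwnd_one(p, in, out);
      printf("out[0] = %g + %g i\n", out[0].re, out[0].im);

      fftwnd_destroy_plan(p);
      free(in);
      free(out);
      return 0;               /* compile with -lfftw -lm */
  }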
After this lengthy description (which I felt necessary because the
hardware must be adapted to the problem), here is the hardware we would
like to start with:
- 5-6 nodes, each with two Athlon MP 1600+ processors.
Motherboards based on the new AMD 760MPX chipset. Maybe ASUS, but I am
not done looking. Any recommendations?
Each node will start with 2 GB of RAM and can/will be extended to 4 GB
to allow larger SVDs...
We'll run Linux. The cost of a node is less than $2000, including the
memory, which I consider really cheap.
- Network: the big unknown. I would tend to buy a Gigabit switch: 16
gigabit ports cost ~$2000 (Netgear GS516TGE). I don't think we'll need
more than 16 nodes, but if 32 ports weren't so much more expensive...
I chose Gigabit to be on the safe side. But is that overkill? By going
to 10/100 I could save money and get one more node (and more ports,
just in case our project really expands).
I am really concerned that I might saturate the network if I take only
10/100 Mbit/s (see the back-of-envelope estimate after this list).
Could SVD / network experts comment on the real-life benefits of
Gigabit Ethernet?
- Network cards: Intel Pro1000T desktop, about $100 apiece. They seem
to have Linux support.
- Maybe a fast hard disk (U160 SCSI) for the master node, if data needs
to be distributed to the nodes (via NFS, for example).
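Regarding the Gigabit question above, here is the back-of-envelope
estimate I mentioned. The usable throughputs (roughly half the wire
speed for TCP on commodity NICs) are my own guesses, not benchmarks:

  #include <stdio.h>

  int main(void)
  {
      double frame = 512.0 * 512.0 * 16.0;  /* one 512^2 complex-double
                                               FFT frame, in bytes     */
      double fe    = 100e6 / 8.0 * 0.5;     /* ~6 MB/s usable Fast Eth */
      double ge    = 1e9   / 8.0 * 0.5;     /* ~60 MB/s usable GigE    */

      printf("frame size : %5.1f MB\n",         frame / 1e6);
      printf("100 Mbit/s : %5.2f s each way\n", frame / fe);
      printf("GigE       : %5.2f s each way\n", frame / ge);
      return 0;
  }

That works out to roughly 0.7 s per frame each way at 100 Mbit/s versus
0.07 s on GigE. Summed over ~50 000 frames (each shipped out and back),
the 100 Mbit case spends most of a day in pure transfer time versus a
couple of hours for GigE, if my throughput guesses are right. That is
what makes me lean towards Gigabit.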
Thanks for reading to the end of this long message. Sorry again if
this is a FAQ...
Any advice / comments will be appreciated!
Thanks in advance,
Miska
--
* Miska Le Louarn, PhD Phone: (49) 89 320 06 908 *
* European Southern Observatory FAX : (49) 89 320 23 62 *
* Karl Schwarzschild Str. 2 e-mail: lelouarn at eso.org *
* D-85748 Garching http://www.eso.org/~lelouarn *
* Germany *