[Beowulf] Re: building a new cluster

Wed Sep 1 14:27:59 PDT 2004

Thanks for your suggestions. I am summarizing the questions and answers here.

1. cluster usage

The cluster will be used solely for running numerical simulation (number-crunching) codes written in Fortran 90 with MPI, which involve finite-difference calculations and fast Fourier transforms for solving the 3D Navier-Stokes equations. The code calls mpi_alltoall a lot (for the FFTs) as well as mpi_send/recv, so it is communication-intensive. The problem is unsteady and 3D, so computation is also heavy. A typical run can take 1-2 weeks on 8-16 nodes (depending on the problem size).

We have been OK with a "hybrid" 25-node cluster (COMPAQ Alpha and Dell Xeon 2.4GHz) that is running right now with a 3Com 100 Mbps Ethernet switch and the LAM/MPI library.
I will post some benchmarks later.
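
To get a rough feel for why the all-to-all traffic matters, here is a back-of-the-envelope sketch in Python (it is not part of our Fortran code). It assumes a slab decomposition of an n^3 grid over p nodes, complex double-precision values (16 bytes each), and a hypothetical 256^3 grid on 16 nodes; the function names are made up for illustration. It ignores latency, protocol overhead, and switch contention, so the times are optimistic lower bounds only.

```python
def alltoall_bytes_per_node(n, p, bytes_per_value=16):
    """Bytes one node sends during a single all-to-all transpose.

    Assumes a slab decomposition: each node owns n^3 / p grid points
    and sends an equal share (n^3 / p^2 points) to each of the other
    p - 1 nodes.
    """
    points_per_peer = n**3 / p**2
    return (p - 1) * points_per_peer * bytes_per_value


def min_transfer_seconds(nbytes, link_mbps):
    """Lower bound on the send time over one link of the given speed."""
    return nbytes * 8 / (link_mbps * 1e6)


if __name__ == "__main__":
    n, p = 256, 16  # hypothetical problem size and node count
    sent = alltoall_bytes_per_node(n, p)
    for mbps in (100, 1000):  # Fast Ethernet vs. gigabit
        t = min_transfer_seconds(sent, mbps)
        print(f"{mbps:5d} Mbps: {sent / 2**20:.1f} MiB per node, >= {t:.3f} s per transpose")
```

Since a 3D FFT needs at least one such transpose per transform and the code does many transforms per time step, the per-transpose time gets multiplied many times over a 1-2 week run.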

2. Many people recommended Opteron (or at least encouraged a test run on Opteron) because it seems to be more cost-effective. I picked Xeon for the following reasons:

(1) The Intel Fortran 90 compiler is free, and it is also used on other individual workstations in our lab and on some supercomputers that we have access to (we are trying to avoid the hassle of switching between compilers when writing new codes)

(2) We have a few users sharing the cluster, so we have to get "enough" nodes

(3) Xeon seems to be more common, so it's easier to get consulting or support

BTW, what are the common Fortran 90 compilers that people use on Opteron? How do they compare with other compilers?

3. My MPI code periodically writes data files to local disk, so I do need a hard disk on every node. Diskless sounds good (cost, maintenance, etc.), but the data seem too large to transfer to the head node (technically it can be done, but I would rather just use local scratch disk).

4. Managed or unmanaged?

People have already recommended some switches, which I will not repeat here. However, I am still not clear about "managed" versus "unmanaged" switches. Some vendors told me that I need a managed one, while others said the opposite. I will need to study this more...

5. I only have wall-clock timings of my code on various platforms, so I don't know how sensitive it is to cache size. I guess the bigger the cache, the better, because the code operates on large arrays all the time.
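
As a first sanity check on the cache question, here is a small Python sketch that compares array sizes with the cache sizes in the node configuration quoted below (512K L2, 1MB L3). The grid dimensions are hypothetical (a 256^3 double-precision array split into slabs over 16 nodes), and the helper names are invented for illustration. It only compares sizes; it says nothing about access patterns, so it cannot replace actual timings.

```python
L2_BYTES = 512 * 1024   # 512K L2, from the quoted node configuration
L3_BYTES = 1024 * 1024  # 1MB L3, from the quoted node configuration


def array_bytes(nx, ny, nz, bytes_per_value=8):
    """Size in bytes of an nx * ny * nz array of double-precision reals."""
    return nx * ny * nz * bytes_per_value


def fits_in_cache(nbytes, cache_bytes):
    """True if the whole array could sit in a cache of the given size."""
    return nbytes <= cache_bytes


if __name__ == "__main__":
    # Hypothetical per-node slab: 256 x 256 x 16 points of one variable.
    slab = array_bytes(256, 256, 16)
    # A single grid line of 256 points, as used by a 1D FFT or stencil sweep.
    line = array_bytes(256, 1, 1)
    print(f"slab: {slab / 2**20:.1f} MiB, fits in L3: {fits_in_cache(slab, L3_BYTES)}")
    print(f"line: {line / 2**10:.1f} KiB, fits in L2: {fits_in_cache(line, L2_BYTES)}")
```

Under these assumptions the per-node slab of even one variable is several times larger than L3, while a single grid line is tiny, so how the loops traverse the arrays may matter as much as the cache size itself.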

I will post a further summary here if I find out more about these issues. Thanks.

SCH

SC Huang <schuang21 at yahoo.com> wrote:
Hi,

I am about to order a new cluster using a $100K grant for running our in-house MPI codes. I am trying to get at least 36-40 nodes (or more, if possible). The individual node configuration is:

dual Xeon 2.8 GHz
512K L2 cache, 1MB L3 cache, 533 FSB
2GB DDR RAM
gigabit NIC
80 GB IDE hard disk

The network will be based on a gigabit switch. Most vendors I talked to use HP Procurve 2148 or 4148.

Can anyone comment on the configuration (and the switch) above? Any other comments (e.g. recommended vendor, etc.) are also welcome.

Thanks!!!

---------------------------------
Do you Yahoo!?
Yahoo! Mail Address AutoComplete - You start. We finish.
