[Beowulf] Re: building a new cluster

Wed Sep 1 17:35:16 PDT 2004

SC Huang wrote:

> Thanks for your suggestions. I am summarizing the questions and answers 
> here.
>  
> 1. cluster usage
>  
> The cluster will be used solely for running numerical simulation 
> (number crunching) codes written in Fortran 90 and MPI, which involve 
> finite difference calculations and fast Fourier transforms for solving 
> the 3D Navier-Stokes equations. The code calls mpi_alltoall a lot (for 
> FFTs) as well as other mpi_send/recv, so communication is intensive. The 
> problem is unsteady and 3D, so computation is also heavy. A typical 
> run can take 1-2 weeks using 8-16 nodes (depending on the problem size).
>  
> We have been OK with a "hybrid" 25-node (COMPAQ Alpha & Dell Xeon 
> 2.4GHz) cluster running right now using a 3Com 100 Mbps 
> (ethernet) switch and a LAM/MPI library.
> I will post some benchmarks later.

Hmmm... Interconnect latency may be an issue as you scale up, depending 
critically on how you do your FFTs and how your other codes use MPI.
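
To see why, here is a rough back-of-the-envelope sketch in Python (not 
your code, and not a measurement). It assumes an all-to-all behaves like 
(P - 1) sequential pairwise exchanges, and that an FFT transpose of a 
fixed grid makes each pair of ranks exchange grid_bytes / P^2. Real MPI 
libraries may use other algorithms. The function names, the 100 
microsecond latency, the 12.5 MB/s peak for 100 Mbps Ethernet, and the 
256^3 grid are all assumptions for illustration; substitute numbers 
measured on your own switch.

```python
def transpose_message_bytes(grid_points, ranks, bytes_per_value=8):
    """Bytes each rank sends to each other rank in one transpose (assumed even split)."""
    return grid_points * bytes_per_value / ranks**2


def alltoall_time(ranks, bytes_per_pair, latency_s, bandwidth_bytes_per_s):
    """Estimate one all-to-all as (ranks - 1) sequential pairwise exchanges."""
    if ranks < 2:
        return 0.0
    per_message = latency_s + bytes_per_pair / bandwidth_bytes_per_s
    return (ranks - 1) * per_message


def latency_fraction(ranks, bytes_per_pair, latency_s, bandwidth_bytes_per_s):
    """Share of the estimated all-to-all time spent on per-message latency."""
    total = alltoall_time(ranks, bytes_per_pair, latency_s, bandwidth_bytes_per_s)
    if total == 0.0:
        return 0.0
    return (ranks - 1) * latency_s / total


# Assumed values, for illustration only.
GRID_POINTS = 256**3      # hypothetical problem size
LATENCY_S = 100e-6        # assumed per-message latency
BANDWIDTH = 12.5e6        # 100 Mbps theoretical peak, in bytes/s

for ranks in (8, 16, 32, 64):
    msg = transpose_message_bytes(GRID_POINTS, ranks)
    t = alltoall_time(ranks, msg, LATENCY_S, BANDWIDTH)
    frac = latency_fraction(ranks, msg, LATENCY_S, BANDWIDTH)
    print(f"ranks={ranks:3d}  msg={msg:12.0f} B  time={t:8.4f} s  latency share={frac:.4f}")
```

In this model the per-pair message shrinks as 1/P^2 while the number of 
messages grows with P, so the latency share rises as you add nodes. 
Whether it matters for your runs depends on your actual message sizes.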

>  
> 2. Many people recommended Opteron (or at least encourage a test run 
> on Opteron) because it seems to be more cost effective. I picked Xeon 
> because of the following reasons:
>  
> (1) free Intel FORTRAN 90 compiler, which is also used for other 
> individual workstations in our lab and some supercomputers that we 
> have access to (we are kind of trying to stay away from the hassle of 
> switching between compilers when writing new codes)

The Intel compiler can emit code for AMD64 (though they call it EM64T, 
it is effectively the same target).  The Intel compiler won't 
necessarily optimize as well for the Opteron as it would for the Nocona. 

>  
> (2) We have a few users to share the cluster, so we have to get 
> "enough" nodes

You wanted to spend about US$2-2.5k per node as I remember (including 
all networking and other stuff).

>  
> (3) Xeon seems to be more common, so it's easier to get consulting 
> or support

More common, yes, but this likely has to do with a longer time in market 
without a reasonable competitor.  Consulting?  Support?  You can get 
them from all quarters from independent folks, or on the net. 

>  
> BTW, what are the common fortran 90 compilers that people use on 
> Opteron? Any comparison to other compilers?
>  

I have used PathScale, Portland Group, and have done a little work with 
Absoft.  There are a few others. 

>  
> 3. My MPI code periodically writes out data files to local disk, so I 
> do need a hard disk on every node. Diskless sounds good (cost, 
> maintenance, etc.), but the data size seems too big to be transferred to 
> the head node (well, technically it can be done, but I would rather 
> just use local scratch disk).

Ok.  Not hard to do, though I think given your budget that you would 
rather spend your money on interconnect than disk, so aim for IDE.

>  
>  
> 4. Managed or unmanaged?
>  
> People already recommended some switches that I will not repeat here. 
> However, I am still not clear about "managed" and "unmanaged" 
> switches. Some vendors told me that I need a managed one, while others 
> said the opposite. Will need to study more...

You only need managed if you are going to do fancy things with the 
switch, which most folks who build and work with clusters will not do.  
Yes, being able to remotely tell if a port is live is nice ... but then 
again, I have ping, and it works pretty well.  In fact, it helped me 
diagnose a dead port on a 24 port SMC managed switch that the management 
software insisted was an operational port. 
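
If you want to script that check, a minimal sketch in Python follows. It 
assumes a Linux ping (iputils) where "-c 1" sends one packet and "-W 1" 
waits one second for the reply; other platforms use different flags. The 
function names and the node01..node24 host names in the usage note are 
hypothetical.

```python
import subprocess


def is_alive(host, run=subprocess.run):
    """Return True if a single ping to host gets a reply (Linux iputils flags assumed)."""
    result = run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def dead_nodes(hosts, run=subprocess.run):
    """Return the hosts that did not answer, in the order given."""
    return [host for host in hosts if not is_alive(host, run)]
```

For example, dead_nodes([f"node{i:02d}" for i in range(1, 25)]) would 
list the nodes that did not answer. A silent node can be a dead port, 
a bad cable, or a down machine, so swap cables before blaming the switch.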

Your choice, but management will double or triple the price of a 
lower-end switch.  If you are going to use more than 48 ports, you don't 
have a choice.  Otherwise, you might take that extra $1500 or so, and 
plow it into an extra node ... if you can find another $1000 somewhere. 

>  
> 5. I only have wall-clock timings of my code on various platforms. I 
> don't know how sensitive it is to cache size. I guess the bigger the 
> cache, the better, because the code is operating on large arrays all 
> the time.
>  

Would be interesting to find out.  It is always best to benchmark with 
your own code: test it the way you want to run it today, and with as 
reasonable a guess at future usage as you can make.
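
One way to probe cache sensitivity with nothing but wall-clock timings 
is to run the same kernel at several problem sizes and compare the time 
per grid point. A minimal sketch in Python follows. The function names 
are hypothetical, and stand_in_kernel is only a placeholder three-point 
stencil: in an interpreted language the interpreter overhead will mask 
cache effects, so in practice the timed call should launch your real 
Fortran code.

```python
import time


def best_wall_time(func, arg, repeats=3):
    """Smallest wall-clock time of func(arg) over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best


def time_per_point(func, sizes, repeats=3):
    """Map each problem size n to (best wall time of func(n)) / n."""
    return {n: best_wall_time(func, n, repeats) / n for n in sizes}


def stand_in_kernel(n):
    """Placeholder workload: 1D three-point stencil over n values."""
    u = [float(i) for i in range(n)]
    return [u[i - 1] - 2.0 * u[i] + u[i + 1] for i in range(1, n - 1)]


for n, seconds in time_per_point(stand_in_kernel, [10_000, 100_000]).items():
    print(f"n={n:8d}  {seconds * 1e9:8.1f} ns per point")
```

If the time per point stays flat as the working set grows past the cache 
size, the code is probably not very cache sensitive; a clear jump 
suggests a bigger cache would help.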

Joe

> I will post more summary here if I find out more information about 
> these issues. Thanks.
>  
> SCH
>

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615