[Beowulf] Re: Problems scaling performance to more than one node, GbE

Tue Feb 17 11:22:26 PST 2009

On Mon, 16 Feb 2009, Tiago Marques wrote:

> I must ask, doesn't anybody on this list run like 16 cores on two 
> nodes well, for a code and job that completes like in a week?

For GROMACS and other MD programs, the way a job runs depends on a lot 
of factors that define the simulation: the size of the molecular 
system, the force field in use, the cutoff distances, etc. 
Furthermore, what you call a job actually contains a very important 
variable - the number of MD steps, which can make the total runtime go 
from seconds to months (or more). Asking for someone who runs under 
the same conditions as you probably means that he/she has already done 
the simulations you are about to begin, which from a scientific point 
of view means that you would do better to invest your time in 
something else, as he/she would publish first ;-)
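
As a rough illustration of how much the number of MD steps matters, 
here is a minimal Python sketch. The cost of 10 ms per step is an 
assumed figure chosen for the example, not a measurement of GROMACS or 
any other code, and wall_time_days is a hypothetical helper name.

```python
# Illustrative only: wall-clock time of an MD run as steps * cost per step.
# seconds_per_step is an assumed value, not a measurement of any MD code.

def wall_time_days(n_steps, seconds_per_step):
    """Return the total runtime in days for n_steps MD steps."""
    return n_steps * seconds_per_step / 86400.0

seconds_per_step = 0.01  # assumption: 10 ms per MD step

# Same system, same hardware, only the number of steps changes.
for n_steps in (1_000, 1_000_000, 1_000_000_000):
    days = wall_time_days(n_steps, seconds_per_step)
    print(f"{n_steps:>13,d} steps -> {days:.4f} days")
```

With the same per-step cost, the runtime grows linearly with the 
number of steps, so two "jobs" on the same system are only comparable 
if the step count is stated as well.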

I have found several MD codes to scale rather poorly on clusters 
composed of 8-core nodes, especially when those 8 cores come from 
2 quad-core Intel CPUs; the poor scaling was also seen with 
InfiniBand (Mellanox ConnectX), so IB will not magically solve your 
problems. The setup that seemed to me like a good compromise was 
4-core nodes, where the 4 cores come from 2 dual-core CPUs, 
combined with Myrinet or IB.

You have to understand that, the way most MD programs are written 
these days, MD simulations of small molecular systems are simply not 
going to scale: the communication dominates the total runtime. 
Communication through shared memory is still the best way to scale 
such a job, so having a node with as many cores as possible and 
running a job that uses all of them is probably going to give you the 
best performance.
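
The argument can be sketched with a toy model in Python. Everything 
here is an assumption made for illustration: the per-step work and 
communication costs are invented numbers, communication inside a node 
(shared memory) is treated as free, and any job spanning more than one 
node pays a fixed network cost per step. Real MD codes behave in a 
more complicated way, so take this only as a picture of the trend.

```python
import math

# Toy model, not a description of any real MD code:
#   time per step = compute work / cores + network cost (if > 1 node)
# All numbers below are assumed, in milliseconds per MD step.

def step_time(work_ms, cores, cores_per_node, net_ms):
    """Time for one MD step on `cores` cores under the toy model."""
    nodes = math.ceil(cores / cores_per_node)
    compute = work_ms / cores
    # Simplification: shared-memory communication inside one node costs nothing.
    comm = net_ms if nodes > 1 else 0.0
    return compute + comm

def speedup(work_ms, cores, cores_per_node, net_ms):
    """Speedup relative to a single core on a single node."""
    serial = step_time(work_ms, 1, cores_per_node, net_ms)
    return serial / step_time(work_ms, cores, cores_per_node, net_ms)

NET_MS = 0.5  # assumed network cost per step once the job leaves the node

for label, work_ms in (("small system", 1.0), ("large system", 100.0)):
    for cores in (8, 16):
        s = speedup(work_ms, cores, 8, NET_MS)
        print(f"{label}: {cores:2d} cores on 8-core nodes -> speedup {s:.2f}")
```

In this model the small system runs slower on two nodes than on one, 
because the fixed network cost is larger than the compute time that 
the extra cores save, while the large system still gains from the 
second node.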

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de