[Beowulf] How much RAM per core is right?

Gus Correa gus at ldeo.columbia.edu
Thu Jul 17 08:47:16 PDT 2008

Dear Beowulf list subscribers

A quick question:

How much memory per core/processor is right for a Beowulf cluster node?

I am new to this list, so please forgive me if my question violates
list etiquette; I will withdraw it if so.

To qualify my question, here are some details of my problem:

I plan to buy 8GB per node on dual-processor quad-core machines (1GB
per core), most likely AMD Opteron.
I based this on classic scaling calculations for the programs we run
and the computations we do, then shrank the number somewhat due to
budget constraints.
I have also seen the same figure in several postings on other mailing
lists where people described their cluster configurations, which gave
me some confidence that 1GB per core is acceptable and in use by a
number of people out there.
Still, I would like to hear your thoughts and experiences about this.
I may be missing an important point, and your advice is important and
welcome.

Actually, not long ago the usual RAM-per-core ratio was 512MB per core
(cores were physical CPUs back then),
and it seems to me some non-PC HPC machines (IBM BlueGene, SiCortex
machines, etc.) still use the 512MB-per-core ratio.

PC server motherboards can hold 32, 64, even 128GB of RAM these days.
Hence one can grow really big, particularly for desktop/interactive
applications like Matlab.
However, what IBM and others do (on machines of admittedly very
different architecture)
makes me wonder whether the "big is better" philosophy is wise for PC
cluster nodes,
or whether the efficiency of the RAM-to-core ratio saturates at some point.

What do you think?
For PC-based cluster compute nodes, is 1GB per core right?
Is it too much?
Is it too little?
Or is "big is better" really best, and minimalism just an excuse for
the cheap and the poor?

We do climate model number crunching with MPI here.
We use domain decomposition, finite-differences, some FFTs, etc.
The models are "memory intensive":
big 3D arrays are read from and written to memory all the time,
and not-so-big 2D arrays (subdomain boundary values) are passed
between processes through MPI
at every time step of the simulation. I/O happens at a slower pace,
typically every ~100 time steps or more,
and can be either funneled through a master process or distributed
across all processes.
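
As a back-of-envelope check, here is the kind of arithmetic I have in
mind. Every number below (grid size, field count, core count) is a
made-up illustration, not our actual model configuration:

```python
# Rough memory estimate for a domain-decomposed model with many 3D arrays.
# All numbers are illustrative assumptions, not a real configuration.

nx, ny, nz = 1440, 720, 60     # assumed global grid (quarter-degree, 60 levels)
n_fields_3d = 25               # assumed number of 3D arrays held in memory
bytes_per_val = 8              # double precision
n_cores = 64                   # e.g. 8 nodes x 8 cores

points = nx * ny * nz
total_bytes = points * n_fields_3d * bytes_per_val
per_core_mb = total_bytes / n_cores / 2**20

print(f"Total 3D field storage: {total_bytes / 2**30:.2f} GiB")
print(f"Per core (ideal even split): {per_core_mb:.1f} MiB")
```

With these made-up numbers the 3D fields alone need a couple hundred
MiB per core, before halo regions, work arrays, MPI buffers, and the
OS take their share, which is why the headroom question matters.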

One goal of our new "cluster-to-be" is to run the programs at higher 
spatial resolution.
Most algorithms that march the solution in time are conditionally stable.
Therefore, due to the Courant-Friedrichs-Lewy (CFL) stability
condition, the time step must be proportional
to the smallest spatial grid interval.
Hence, for 3D climate problems, the computational effort scales as N**4,
where N is a typical number of grid points along one spatial dimension.
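
The N**4 scaling is easy to spell out: refining the grid by a factor r
in each dimension multiplies the number of points (and hence memory)
by r**3, and the CFL condition shrinks the time step by 1/r, so total
work grows by r**4. A trivial sketch:

```python
# CFL-driven cost scaling for an explicit 3D time-marching scheme.
# Points (memory) scale as r**3; the CFL condition adds a factor of r
# more time steps, so compute scales as r**4.

def relative_cost(r):
    """Memory and compute relative to the base resolution,
    for a grid refined by factor r in each dimension."""
    memory = r ** 3           # number of grid points
    compute = r ** 4          # points x extra time steps from CFL
    return memory, compute

for r in (1, 2, 4):
    mem, cpu = relative_cost(r)
    print(f"{r}x resolution: {mem}x memory, {cpu}x compute")
```

So compute grows faster than memory: doubling resolution needs 8x the
RAM but 16x the flops, which is one reason RAM-per-core need not grow
as fast as the problem size.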

Our old dual-processor single-core production cluster has 1GB per node 
(512MB per "core").
Most of our models fit this configuration.
The larger problems use up to 70-80% of RAM, but safely avoid memory
paging, process swapping, etc.
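
Since memory grows as the cube of the refinement factor, one can
extrapolate from the old cluster's usage. The numbers below (75%
usage, a 2x refinement, and an unchanged core count) are assumptions
for illustration only:

```python
# Extrapolating per-core RAM needs from the old cluster (memory ~ r**3).
# Assumes the SAME number of cores; more cores would shrink the per-core need.

old_ram_per_core_mb = 512
used_fraction = 0.75       # the "70-80% of RAM" cases, roughly
refinement = 2             # assumed target: double resolution in each dimension

needed_mb = old_ram_per_core_mb * used_fraction * refinement ** 3
print(f"Estimated RAM per core at {refinement}x resolution: {needed_mb:.0f} MB")
```

Of course, if the core count grows by the same r**3 factor as the
grid, the per-core footprint stays roughly constant; the question is
really how much the core count grows relative to the problem size.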

However, on multicore machines there are other issues to consider:
memory bandwidth, cache size vs. RAM size, NUMA, cache eviction, and so on.
So the classic scaling I mentioned above may need to be combined with
memory bandwidth and other factors of this kind.

In any case, at this point it seems to me that
"get as much RAM as your money can buy and your motherboard can fit" may 
not be a wise choice.
Is there anybody out there using 64 or 128GB per node?

I wonder if there is an optimal choice of RAM-per-core.
What is your rule of thumb?
Or does it depend?
And on what does it depend?

Many thanks,
Gus Correa

Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
