p4 v itanium

Josip Loncaric josip at icase.edu
Fri May 17 11:55:36 PDT 2002

Josh Fryman wrote:
> a better question is why a single node needs that kind of memory for an
> application?  seems like you'd be better off breaking it up under a real
> parallel process situation.

Breaking it up is sometimes inefficient.  Even if you use 3D CFD codes
based on domain decomposition, the optimal amount of memory per node
grows very quickly as a function of the CPU/communication speed ratio.

If you double the CPU speed but keep the same network, you can maintain
balance if your domain's volume/surface ratio doubles as well.  For a
cubic subdomain this happens when its linear dimension doubles: the
volume (computation) grows by a factor of eight and the surface
(communication) by a factor of four.  In other words, per-node memory
requirements can grow with the cube of the CPU/communication speed
ratio.

Of course, at some point you'll want a faster network, but CPU speeds
might grow by a factor of 4-5 before that happens, which means that
RAM per node may need to increase 64-125 fold (4^3 to 5^3).  This is
why more and more cluster users are hitting the 32-bit address space
limit.
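The scaling argument above can be sketched in a few lines of Python
(a hypothetical illustration of the arithmetic, not a benchmark; the
cubic-subdomain geometry is an assumption):

```python
# Sketch of the per-node memory scaling argument (hypothetical numbers).
# A cubic subdomain of linear size n holds n**3 cells (computation and
# memory) and exchanges data across faces proportional to n**2
# (communication), so the volume/surface ratio is proportional to n.

def memory_growth(cpu_comm_ratio):
    """Factor by which per-node memory must grow to keep the
    compute/communicate balance when the CPU/communication speed
    ratio grows by `cpu_comm_ratio`.

    Matching a ratio r requires the subdomain side to grow r-fold,
    so memory ~ n**3 grows as r**3.
    """
    return cpu_comm_ratio ** 3

for r in (2, 4, 5):
    print(f"CPU/comm ratio x{r}: per-node memory x{memory_growth(r)}")
# CPU speedups of 4-5 on the same network give the 64-125x figure above.
```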


P.S.  This also makes a good argument for investing in a faster network
as soon as communications become the dominant bottleneck.  With the old
network, for a fixed-size problem, doubling the CPU speed implies that
only 1/8 as many processors can be brought into the computation; each
remaining processor then does eight times the work at twice the speed,
so the wall clock time to solve the problem will actually quadruple.  A
faster network (if that's the active constraint) could allow speedups
because the problem would parallelize better across more processors.
Of course, if the bottleneck is CPU speed, one should invest in
more/faster processors instead.
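The fixed-size-problem arithmetic in the P.S. can be sketched the same
way (again a hypothetical illustration under the cubic-subdomain
assumption):

```python
# Fixed-size problem, old network: keeping the compute/communicate
# balance after a CPU speedup of s forces per-node volume up by s**3,
# so only 1/s**3 as many nodes can participate.

def wall_clock_factor(cpu_speedup):
    """Factor by which wall-clock time changes for a fixed-size
    problem when CPU speed grows by `cpu_speedup` on the same network.

    Each remaining node does cpu_speedup**3 times the work at
    cpu_speedup times the speed, so time grows as cpu_speedup**2.
    """
    work_per_node = cpu_speedup ** 3   # volume each node must now hold
    return work_per_node / cpu_speedup

print(wall_clock_factor(2))  # doubling CPU speed -> 4x wall-clock time
```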

Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134

More information about the Beowulf mailing list