p4 v itanium

Josip Loncaric josip at icase.edu
Fri May 17 11:55:36 PDT 2002

Josh Fryman wrote:
> a better question is why a single node needs that kind of memory for an
> application?  seems like you'd be better off breaking it up under a real
> parallel process situation.

Breaking it up is sometimes inefficient.  Even if you use 3D CFD codes
based on domain decomposition, the optimal amount of memory per node
grows very quickly as a function of the CPU/communication speed ratio.

If you double the CPU speed but keep the same network, you can maintain
balance if your domain's volume/surface ratio doubles as well.  For a
cubic subdomain this happens when its linear dimension doubles: the
volume (computation) grows by a factor of eight and the surface
(communication) by a factor of four.  In other words, per-node memory
requirements can grow with the cube of the CPU/communication speed
ratio.

Of course, at some point you'll want a faster network, but CPU speeds
might grow by a factor of 4-5 before that happens, which means that
RAM per node may need to increase 64-125 fold (4^3 to 5^3).  This is
why more and more cluster users are hitting the 32-bit address space
limit.
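The scaling argument above can be sketched in a few lines of Python
(a hypothetical illustration of the arithmetic, not a benchmark; the
cubic-subdomain geometry is an assumption):

```python
# Sketch of the per-node memory scaling argument (hypothetical numbers).
# A cubic subdomain of linear size n holds n**3 cells (computation and
# memory) and exchanges data across faces proportional to n**2
# (communication), so the volume/surface ratio is proportional to n.

def memory_growth(cpu_comm_ratio):
    """Factor by which per-node memory must grow to keep the
    compute/communicate balance when the CPU/communication speed
    ratio grows by `cpu_comm_ratio`.

    Matching a ratio r requires the subdomain side to grow r-fold,
    so memory ~ n**3 grows as r**3.
    """
    return cpu_comm_ratio ** 3

for r in (2, 4, 5):
    print(f"CPU/comm ratio x{r}: per-node memory x{memory_growth(r)}")
# CPU speedups of 4-5 on the same network give the 64-125x figure above.
```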


P.S.  This also makes a good argument for investing in a faster network
as soon as communications become the dominant bottleneck.  With the old
network, for a fixed-size problem, doubling the CPU speed implies that
only 1/8 as many processors can be brought into the computation; each
remaining processor then does eight times the work at twice the speed,
so the wall clock time to solve the problem will actually quadruple.  A
faster network (if that's the active constraint) could allow speedups
because the problem would parallelize better across more processors.
Of course, if the bottleneck is CPU speed, one should invest in
more/faster processors instead.
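The fixed-size-problem arithmetic in the P.S. can be sketched the same
way (again a hypothetical illustration under the cubic-subdomain
assumption):

```python
# Fixed-size problem, old network: keeping the compute/communicate
# balance after a CPU speedup of s forces per-node volume up by s**3,
# so only 1/s**3 as many nodes can participate.

def wall_clock_factor(cpu_speedup):
    """Factor by which wall-clock time changes for a fixed-size
    problem when CPU speed grows by `cpu_speedup` on the same network.

    Each remaining node does cpu_speedup**3 times the work at
    cpu_speedup times the speed, so time grows as cpu_speedup**2.
    """
    work_per_node = cpu_speedup ** 3   # volume each node must now hold
    return work_per_node / cpu_speedup

print(wall_clock_factor(2))  # doubling CPU speed -> 4x wall-clock time
```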

Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134

More information about the Beowulf mailing list